
CN118917353B - Evolutionary neural architecture search method and system based on performance-level agent assistance - Google Patents

Evolutionary neural architecture search method and system based on performance-level agent assistance Download PDF

Info

Publication number
CN118917353B
CN118917353B
Authority
CN
China
Prior art keywords
architecture
performance
level
hierarchy
architectures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411379555.1A
Other languages
Chinese (zh)
Other versions
CN118917353A (en)
Inventor
张晓磊
薛羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202411379555.1A priority Critical patent/CN118917353B/en
Publication of CN118917353A publication Critical patent/CN118917353A/en
Application granted granted Critical
Publication of CN118917353B publication Critical patent/CN118917353B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/086Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Physiology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides an evolutionary neural architecture search method and system based on performance-level agent assistance. The method explores an effective representation of neural network architectures from the perspective of graphs, using a graph neural network to extract the features of a network architecture. To reduce the high resource consumption of evolutionary neural architecture search, a proxy model that predicts the performance level of an architecture is proposed. Guided by the proxy model's predictions of architecture performance levels, the evolutionary search can fully explore the search space and find high-performance network architectures. The invention designs a crossover operator based on simple paths, which preserves the effective data-processing paths of the parent architectures as far as possible, ensuring that child architectures inherit the excellent characteristics of their parents and thereby improving performance stability. The agent-assisted evolutionary neural architecture search framework of the invention can find high-performance neural network architectures on task data sets at low cost, providing a new perspective and an efficient method for the automated design and optimization of neural network architectures.

Description

Method and system for searching evolutionary neural architecture based on performance level agent assistance
Technical Field
The invention relates to the field of automated machine learning, in particular to an evolutionary neural architecture searching method and system based on performance level agent assistance.
Background
Neural networks, an important class of machine learning models, have been used successfully in computer vision, natural language processing, medical health, and other fields. The network architecture determines the feature extraction and representation capability of the model; a well-designed architecture can effectively capture important features in the data and thereby improve model performance. Excellent architecture design brings higher accuracy, greater efficiency, and wider application scenarios. However, designing an efficient neural network architecture often requires extensive expertise and extensive experimental work, which makes network architecture design a complex and time-consuming task. To address this problem, neural architecture search has emerged. Neural architecture search is an automated process that systematically searches and optimizes the architecture of a neural network to find the network structure best suited to a particular task. It aims to reduce the complexity of manually designing network architectures while improving network performance and efficiency. Through architecture search, researchers and engineers can find better network architectures and thereby achieve better results in applications such as computer vision, natural language processing, and voice processing. This makes the design process of neural networks more intelligent and improves the overall performance of artificial intelligence systems.
While neural architecture search offers great potential for automating the design and optimization of neural network architectures, its expensive computational cost has become a bottleneck limiting its development. The search process requires evaluating a large number of candidate architectures, each of which typically requires complete training and validation; the search may take weeks or even months, especially with deep neural networks and large data sets, resulting in significant computational resource consumption and time costs. On the other hand, evolutionary computation is highly adaptable and can optimize neural network architectures for different tasks. The search space of neural network architectures is usually very large and complex, and evolutionary algorithms can effectively explore this space through mechanisms of randomness and diversity. However, evolutionary computation, being based on a population mechanism, requires the evaluation of a large number of architecture individuals, which has in turn constrained the further development and application of evolutionary neural architecture search.
In recent years, proxy-assisted neural architecture search frameworks have emerged. The proxy model is an effective way to reduce the resource consumption of neural architecture search. As a data-driven method, it establishes a mapping from architecture to performance using existing data, and thereby predicts the performance of unknown architectures. A proxy model can give predictions of architecture performance in milliseconds and is very inexpensive to build and maintain. Using inexpensive proxy model predictions in place of expensive architecture performance evaluations effectively reduces the high consumption of neural architecture search. However, existing proxy model methods still suffer from low prediction accuracy and unstable prediction results. How to select efficient proxy indicators and ensure stable, usable prediction results is a key factor determining the performance of a proxy model. Meanwhile, existing evolution operators are not designed around the characteristics of neural network architectures, and how to generate high-performance architectures through efficient evolution operators also directly influences the final search result. It is therefore necessary to provide an evolutionary neural architecture search method based on performance-level agent assistance to achieve an efficient search process.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an evolutionary neural architecture searching method and system based on performance level agent assistance. The method comprises the following steps:
Step 1, initializing a framework population, evolving a plurality of generations, actually evaluating all framework individuals in the population to obtain real performance, and recording the existing framework performance data;
Step 2, dividing the performance level of the architecture according to the real performance, marking a level label as training data, and training a proxy model by using the training data, wherein the proxy model establishes a mapping relation from the architecture to the performance level, so that the performance level of an unknown architecture can be predicted;
step 3, generating an architecture individual through a crossover operator and a mutation operator based on a simple path, predicting the performance level of the architecture individual by using a proxy model, reserving a high-level architecture, and directly eliminating a low-level architecture;
Step 4, the high-level architecture continues to participate in a one-time selection process, the real performance of the architecture finally reserved is evaluated, and the existing architecture performance data is added;
step 5, merging the evaluated high-level architectures with the current population, performing environment selection, and using tournament selection to generate a new population in order to maintain population diversity;
And 6, repeating the steps 3 to 5 until the stopping condition is met, and outputting the global optimal architecture.
In step 1, several generations of evolution are required in advance because the proxy model based on performance level prediction requires some high-performance architectures as training samples. Only if the training dataset contains both high-performance and low-performance architectures can the proxy model learn knowledge of the performance levels. Therefore, through several generations of evolution, the performance of the architecture population is improved to a certain extent, ensuring that the training data contains some high-performance architectures.
In step 1, after initializing a framework population P, G generations are evolved, where G ≤ 10, evaluating the performance of each framework on the task data set (the data set collected and prepared for a particular task or goal, used for training, validating and testing the model, helping the model learn the patterns and relationships required for the task), and saving the results in the record

Archive = {(Arch_i, y_i) | i = 1, 2, ..., N},

where (Arch_i, y_i) represents an architecture-performance pair, Arch_i is the i-th architecture in the record, y_i is the performance corresponding to the i-th architecture, and N is the total number of architectures in the record Archive.
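As a rough, non-authoritative sketch of this bookkeeping, the Python fragment below builds such a record from a toy population; the bit-string architecture encoding, the evaluate function, and all parameter values are illustrative assumptions, not taken from the patent (a real implementation would fully train each network on the task data set):

```python
import random

def random_architecture():
    # Toy stand-in: an architecture is encoded as a random bit string here.
    return tuple(random.randint(0, 1) for _ in range(16))

def evaluate(arch):
    # Toy stand-in for fully training/validating the network on the task data set.
    return sum(arch) / len(arch)

def mutate(arch):
    # Flip one randomly chosen bit of the encoding.
    i = random.randrange(len(arch))
    return arch[:i] + (1 - arch[i],) + arch[i + 1:]

def initialize_archive(pop_size=50, warmup_gens=5):
    population = [random_architecture() for _ in range(pop_size)]
    archive = [(a, evaluate(a)) for a in population]      # record Archive of (Arch_i, y_i) pairs
    for _ in range(warmup_gens):                          # evolve G <= 10 warm-up generations
        offspring = [mutate(random.choice(population)) for _ in range(pop_size)]
        archive += [(a, evaluate(a)) for a in offspring]  # every individual is truly evaluated
        population = sorted(population + offspring, key=evaluate, reverse=True)[:pop_size]
    return population, archive
```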
In step 2, a training data set for the proxy model is first produced by dividing the existing architecture performance data into levels, and the proxy model is trained on it. The number of levels is determined by jointly considering the size and complexity of the search space. The existing architecture data are ranked by performance, the architectures are then divided into levels by count, and the levels are labeled accordingly. This is used for the supervised training of the proxy model. Step 2 specifically comprises the following steps:
Step 2.1, dividing the performance level of the architecture, and making a training data set:
For the architectures in the record Archive, sorted by performance y_i from large to small, the number of performance levels is set to K, and the result of the division is:

Data = {T_1, T_2, ..., T_K},

where Data is the training data set of the agent Model, used for the supervised training of the Model; T_k represents the k-th divided level, N_k is the number of architectures within the k-th level, and the label k identifies the level of architecture Arch_i;
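The division itself reduces to sorting and chunking. A minimal sketch, assuming archive is a list of (architecture, performance) pairs and that level 1 holds the best architectures (function and variable names are ours, not the patent's):

```python
def make_level_dataset(archive, K):
    """Sort (arch, perf) pairs by performance (descending) and split them
    into K roughly equal levels; level 1 holds the best architectures."""
    ranked = sorted(archive, key=lambda pair: pair[1], reverse=True)
    q = len(ranked) // K                       # Q = floor(N / K) architectures per level
    data = []
    for k in range(K):
        # The last level absorbs any remainder so no architecture is dropped.
        chunk = ranked[k * q:(k + 1) * q] if k < K - 1 else ranked[k * q:]
        data += [(arch, k + 1) for arch, _ in chunk]   # (Arch_i, level label k)
    return data
```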
Step 2.2, designing the structure of the proxy model:
the agent Model comprises an architecture feature extractor and a hierarchy divider, wherein the architecture feature extractor and the hierarchy divider are mutually coupled to jointly train and complete tasks of architecture feature extraction and hierarchy division;
the architecture feature extractor comprises a graph convolution layer, and the architecture feature extractor finishes extracting architecture features of the neural network based on transmission and aggregation of information among nodes to obtain architecture representation;
the hierarchy divider comprises a full-connection layer, and the hierarchy divider completes prediction of the performance level of the neural network architecture based on learning of architecture representation and divides the corresponding architecture into the performance levels;
The neural network architecture can be viewed as a directed acyclic graph, with edges representing connections and nodes representing operations. For data with a graph structure, a graph neural network can obtain an effective architecture representation by exploiting the information encapsulated in the graph structure, based on the transmission and aggregation of information among nodes. Thus, the architecture feature extractor is made up of several stacked graph convolution layers. The features of each node are typically represented by a feature vector X. These features may be attributes of the node, embedded representations, or initial features obtained from other sources. In each graph convolution layer, the feature vector of each node is updated from the feature vectors of its neighboring nodes, which is described as:

H^(l+1) = σ( D^(-1/2) Ã D^(-1/2) H^(l) W^(l) ),

where σ is an activation function; Ã is the adjacency matrix of the graph with added self-loops; D is the diagonal degree matrix formed over all nodes, computed as D_aa = Σ_b Ã_ab, where D_aa is the element in row a, column a of the degree matrix D and Ã_ab is the element in row a, column b of the adjacency matrix Ã; W^(l) is the trainable weight matrix of the l-th layer of the architecture feature extractor, i.e., the parameter matrix of the l-th layer; H^(l) is the node feature matrix at the l-th layer, the node feature representation obtained by repeatedly transmitting and aggregating information through the above formula, with H^(0) = X representing the input of the entire architecture feature extractor. The node feature information extracted by the graph convolution layers is finally combined through a global pooling layer to obtain a representation of the architecture in the hidden space;
The architecture representation is then input into the hierarchy divider, which is made up of several fully connected layers, to establish a mapping from the architecture representation to the performance level and predict the level to which the corresponding architecture belongs. In the invention, the architecture feature extraction module and the performance level division module are mutually coupled, and the weight parameters of the two modules are trained and optimized simultaneously. This facilitates the cooperation of the two modules in jointly accomplishing the extraction of architecture features and the division of performance levels.
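A minimal PyTorch sketch of such a proxy model follows; the two-layer depth, the layer widths, mean pooling, and the 0-based level indexing are illustrative assumptions rather than the patent's exact design:

```python
import torch
import torch.nn as nn

class ProxyModel(nn.Module):
    """Architecture feature extractor (graph convolutions + global pooling)
    coupled with a hierarchy divider (fully connected layers)."""
    def __init__(self, in_feats=5, hidden=64, levels=5):
        super().__init__()
        self.gc1 = nn.Linear(in_feats, hidden)   # W^(0)
        self.gc2 = nn.Linear(hidden, hidden)     # W^(1)
        self.divider = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, levels))

    def embed(self, A, X):
        # A: (n, n) adjacency of the architecture DAG, X: (n, in_feats) node features.
        A_hat = A + torch.eye(A.size(0))         # add self-loops (A tilde)
        d = A_hat.sum(dim=1)                     # node degrees (>= 1 thanks to self-loops)
        D_inv_sqrt = torch.diag(d.pow(-0.5))     # D^(-1/2)
        S = D_inv_sqrt @ A_hat @ D_inv_sqrt      # normalized propagation matrix
        H = torch.relu(S @ self.gc1(X))          # H^(1)
        H = torch.relu(S @ self.gc2(H))          # H^(2)
        return H.mean(dim=0)                     # global pooling -> architecture vector

    def forward(self, A, X):
        return self.divider(self.embed(A, X))    # logits over the K performance levels
```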
Step 2.3: the invention designs a loss function for the training process of the proxy model according to the characteristics of its task. In neural architecture search, an effective architecture representation is critical to efficient search and optimization. On one hand, the architecture representation should adequately reflect the topology and operation information of the network architecture. On the other hand, the prediction and search process of the proxy model is facilitated if the architecture representations distribute themselves in the hidden space according to performance. In other words, the representations in the hidden space should, as far as possible, assume a distribution that follows performance. For example, the representations of high-performance and low-performance architectures in the hidden space should be separated as much as possible to minimize interference with the proxy model's predictions, while architectures of similar performance should be aggregated in similar regions of the representation space to facilitate those predictions. Accordingly, in the performance level prediction task, architecture representations of the same level should be as close as possible, and representations of different levels as far apart as possible. To achieve such an effective architecture representation, a special loss function must be designed.
First, the present invention employs a cross entropy loss function in hierarchical partitioning tasks. It is a widely used loss function in deep learning, especially for classification tasks. The method measures the difference between the actual label and the prediction probability distribution, is used for evaluating the prediction effect of the classification model, and guides model training by quantifying the inconsistency between the actual class label and the prediction probability distribution. The hierarchical division task may also be considered a classification task, so cross entropy loss functions may be used to guide the training of proxy models.
A loss function is designed to guide the training process of the proxy model:

The loss function of the proxy model is expressed as:

L = L_2 + λ·L_1,

L_1 = -(1/m) Σ_{i=1}^{m} Σ_{k=1}^{K} 1(t_i = k) · log p_{i,k},

L_2 = Σ_{i,j} [ y_ij · F(x_i, x_j)^2 + (1 - y_ij) · ψ(margin - F(x_i, x_j))^2 ],

where L_1 is the cross entropy loss function, m is the batch size of the input data, p_{i,k} is the predicted probability that the i-th architecture in the batch belongs to level k, t̂_i is the level of the i-th architecture predicted by the proxy model, and t_i is the true performance level of the i-th architecture. L_2 is the designed feature loss function; its aim is to guide the model to extract more advantageous architecture features. It reduces the distance between architecture features of the same level and expands the distance between features of different levels, ensuring that architectures of similar performance map to similar regions of the feature space. Here the function ψ(x) = max(x, 0); F(x_i, x_j) is the Euclidean distance between architecture representation x_i and architecture representation x_j; x_i and x_j are inputs of the hierarchy divider, i.e., outputs of the architecture feature extractor; the intermediate parameter y_ij = 1 when t_i = t_j, otherwise y_ij = 0; margin is a threshold parameter for controlling the feature distance between different levels: the larger the difference between levels, the larger the distance between the architecture features should be, while for architectures of the same level, the closer the better. λ is a hyper-parameter controlling the relative importance of the two loss functions; it can also be set as a learnable parameter, letting the model automatically determine the optimal value during training. The proxy model is trained with the total loss function L until convergence.
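Under these definitions, the total loss is straightforward to compute for a batch. The sketch below assumes the exact contrastive form of L_2 given above (our reconstruction) and 0-based level labels, with logits coming from the hierarchy divider and reps from the feature extractor:

```python
import torch
import torch.nn.functional as F

def proxy_loss(logits, reps, levels, margin=1.0, lam=1.0):
    """Total loss L = L2 + lambda * L1 for one batch.
    logits: (m, K) level scores, reps: (m, d) architecture representations,
    levels: (m,) true performance levels as 0-based long integers."""
    l1 = F.cross_entropy(logits, levels)                  # L1: level classification
    dist = torch.cdist(reps, reps)                        # pairwise Euclidean F(x_i, x_j)
    same = (levels.unsqueeze(0) == levels.unsqueeze(1)).float()   # y_ij indicator
    l2 = (same * dist.pow(2)                              # pull same-level pairs together
          + (1 - same) * torch.clamp(margin - dist, min=0).pow(2)  # push levels apart
          ).mean()
    return l2 + lam * l1
```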
In step 3: most existing evolutionary neural architecture searches adopt a crossover mode based on the adjacency matrix. The adjacency matrix is first flattened into a one-dimensional vector, the node operations are one-hot encoded and concatenated, and the two one-dimensional vectors are then spliced together as a representation of the architecture. On this vector representation, conventional crossover operators are applied. Since conventional crossover operators are generally designed for linear structures, they are not suitable for processing the graph structure of a neural network and may destroy the effective connections and structure of the original architecture, resulting in unstable performance of the generated sub-architectures. Moreover, common crossover operators tend to be unintuitive when applied to network architectures. Their lack of awareness of the structural characteristics of neural networks makes the generated child architectures largely random and unlikely to inherit the excellent characteristics of their parents.
The invention designs a crossover operator based on a simple path for generating a new architecture. The crossover operator based on the simple path is specially designed for the graph structure data, so that the topological structure of the neural network can be better processed, and the generated child architecture is ensured to accord with expectations. The crossover operator keeps the effective data processing path in the parent architecture as far as possible, ensures that the child architecture inherits the excellent characteristics of the parent, and improves the performance stability. The crossover process based on the simple path is more visual, is convenient to understand and realize, and can more effectively explore and optimize the neural network architecture. By adopting the crossover operator based on the simple path, the defects of the common crossover operator can be overcome, the neural architecture search can be more effectively carried out, and the performance and the stability of the final generated architecture are improved.
The step 3 specifically comprises the following steps:
Step 3.1, generating a new architecture by adopting a crossover operator based on a simple path;
and 3.2, predicting the performance level of the newly generated architecture by using the proxy model.
Step 3.1 comprises:
Step 3.1.1, selecting two father frameworks needing to be crossed, regarding the frameworks as directed graphs, numbering all nodes, and specifically, according to the fitness value of all framework individuals in a population, giving higher selection probability to the high-performance frameworks, giving lower selection probability to the low-performance frameworks, selecting the two father frameworks from the population according to a roulette algorithm, performing cross operation, regarding the two father frameworks as two directed graphs, and numbering the nodes in each graph from an input node to an output node in sequence;
Step 3.1.2, deconstructing two directed graphs to obtain all simple paths of the directed graphs to form a simple path candidate pool, wherein for each directed graph, all simple paths of the directed graph are found, namely, no repeated node sequence exists between an input node and an output node, all simple paths are combined into one path candidate pool, and node sequence numbers and operations are reserved;
In step 3.1.2, for each directed graph, all its simple paths, i.e. the order of nodes from the input node to the output node without repetition, are found, which in the network architecture are actually different data processing flows, such as an operation sequence of convolution-averaging pooling-convolution. All simple paths are combined into a path candidate pool, and node sequence numbers and operations of the path candidate pool are reserved. The candidate path pool provides diversified path selection, preserving efficient data processing paths from the parent architecture.
Step 3.1.3, randomly selecting a plurality of simple paths to reconstruct a directed graph of a sub-architecture, wherein the number of paths in a path candidate pool is M, a path number L is randomly generated from [1, M ], then, L simple paths are randomly selected from the path candidate pool again, and the sub-architecture graph is reconstructed by the L simple paths;
Step 3.1.4, inheriting the operations in different ways according to the node type: for a node inherited from only one parent architecture, its operation is retained directly; for a merged node, one operation is randomly retained from the node operations of the two parent architectures with a certain probability (generally 50%). A new sub-architecture is thus generated through the crossover of the two parent architectures. Because the simple-path-based crossover operator inherits the superior characteristics of the parents more effectively, it can accelerate the convergence of the algorithm and find high-performance architectures more quickly. For complex neural network structures, the simple-path-based crossover operator can better capture and exploit the hierarchical and modular characteristics of the network, and has stronger adaptability.
Step 3.1.5, repeating the steps 3.1.1 to 3.1.4 until a set number of new structures are generated.
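A compact sketch of steps 3.1.1 to 3.1.4 using networkx follows; it assumes each parent is a DiGraph whose integer node ids run from the input node (smallest id) to the output node (largest id) and whose nodes carry an 'op' attribute (conventions chosen here for illustration):

```python
import random
import networkx as nx

def simple_path_crossover(parent1, parent2, keep_prob=0.5):
    """Generate one child architecture from two parent DAGs by recombining
    their simple input-to-output paths (steps 3.1.1 to 3.1.4)."""
    pool = []
    for g in (parent1, parent2):
        src, dst = min(g.nodes), max(g.nodes)            # input and output nodes
        pool += list(nx.all_simple_paths(g, src, dst))   # candidate pool of simple paths
    L = random.randint(1, len(pool))                     # how many paths to inherit
    child = nx.DiGraph()
    for path in random.sample(pool, L):                  # rebuild the child graph
        nx.add_path(child, path)                         # equal node ids merge automatically
    for node in child.nodes:                             # inherit node operations
        ops = [g.nodes[node]['op'] for g in (parent1, parent2) if node in g]
        child.nodes[node]['op'] = (ops[0] if len(ops) == 1
                                   or random.random() < keep_prob else ops[1])
    return child
```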
Step 3.2 comprises:
Performance level prediction is performed on the newly generated architectures using the constructed proxy model for predicting the architecture performance level. For the set number of performance levels K, the newly generated architectures are divided into the K levels, expressed as:

{(Arch_i, t̂_i)}, t̂_i ∈ {1, 2, ..., K},

where Arch_i represents the i-th predicted architecture and t̂_i is the predicted performance level of the i-th architecture. For each new architecture, obtaining its true performance in turn is impractical, as this would cause huge computational and time consumption. Since a proxy model that predicts the architecture performance level has already been constructed, the invention uses it to perform performance level prediction on the newly generated architectures.
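Using the ProxyModel sketched above, this filtering reduces to an inexpensive forward pass per candidate. A sketch, assuming each candidate comes with its adjacency matrix and node features and that level index 0 denotes the highest level:

```python
import torch

def keep_high_level(model, candidates, top_level=0):
    """Keep only candidates whose predicted performance level is the highest;
    low-level architectures are discarded without any real training."""
    kept = []
    with torch.no_grad():                        # prediction only, no gradients
        for A, X, arch in candidates:            # (adjacency, node features, architecture)
            level = model(A, X).argmax().item()  # predicted level index
            if level == top_level:
                kept.append(arch)
    return kept
```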
Step 4 comprises:
step 4.1, obtaining all feature representations of the training data by using an architecture feature extractor in the proxy model, and then calculating a hierarchy center of the performance hierarchy;
Step 4.2, using an architecture feature extractor in the proxy model to obtain a feature representation of each architecture in the high level, calculating the distance between the feature representation and the center of the highest performance level, and selecting the architecture closest to the center for performance evaluation;
and 4.3, if the updating condition of the proxy model is met, updating the proxy model by using the latest architecture performance data, and carrying out iterative updating on the proxy model on line.
Step 4.1 comprises:
For m architectures, feature representations are extracted with the proxy model for all architectures in the training dataset Data, expressed as:

{X_1, X_2, ..., X_m},

where X_i represents the extracted feature representation of the i-th architecture Arch_i;

According to the performance level, the centers C of the K levels are calculated respectively; the architectures of each level are taken in turn and the center C of that level is calculated, expressed as:

C_t = (1/m_t) Σ_{i=1}^{m_t} X_i^t,

where C_t represents the center of the t-th level, m_t represents the number of architectures at the t-th level, and X_i^t is the feature representation of the i-th architecture in the t-th level.
Step 4.2 comprises:
From the results of the proxy model prediction, the feature representation of each architecture in the highest level is obtained, the distance between each representation and the center of the highest level is calculated, and the V architectures with the smallest distances are selected; these V architectures are the network architectures most worth fully training on the data set. Their performance on the task data set is then obtained, and the results are stored in the record Archive;
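Steps 4.1 and 4.2 amount to a per-level mean followed by a nearest-to-center ranking. A numpy sketch, assuming representations are stacked row-wise and that the smallest level index denotes the highest performance level:

```python
import numpy as np

def select_for_evaluation(train_reps, train_levels, cand_reps, V):
    """Compute each level's center from training data (step 4.1), then pick the
    V candidate architectures closest to the highest level's center (step 4.2).
    train_reps: (n, d), train_levels: (n,) ints, cand_reps: (M, d)."""
    centers = {t: train_reps[train_levels == t].mean(axis=0)
               for t in np.unique(train_levels)}         # level centers C_t
    best = centers[train_levels.min()]                   # center of the best level
    dists = np.linalg.norm(cand_reps - best, axis=1)     # distance to that center
    return np.argsort(dists)[:V]                         # indices of the V closest
```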
Step 4.3 comprises:
If the evolution reaches the set number of generations, i.e., after every 10 generations of evolution, the latest record Archive is used to produce the latest training Data set Data following the level division method, and the proxy model is updated with this training data set. While updating every generation would yield an ever better proxy model, the computational consumption and search time would increase correspondingly. On balance, updating the agent model at a certain frequency yields a sufficiently accurate proxy model while keeping the computational consumption low.
Step 5 includes merging the most recently evaluated architectures with the existing architectures in the current population. The merged population contains a variety of possible architectures, providing a rich set of candidates for the subsequent environment selection. On the one hand, to keep the current optimal architectures, an elite strategy is first adopted: the top-ranked portion of architectures is retained directly. Meanwhile, to ensure that population diversity is maintained during selection, the remaining architectures are chosen by tournament selection. Tournament selection is a common selection method: several individuals are randomly drawn from the population and compared, and the best-performing one among them is selected into the next generation. This process is iterated until a new population is generated. In this way, excellent individuals are kept while the population is prevented from converging prematurely to a local optimum, preserving its diversity and exploratory capacity. Ultimately, population quality is effectively improved during optimization, population diversity is ensured, and the probability of finding the optimal neural network architecture is increased.
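A sketch of this environment selection, combining elitism with k-way tournaments; the elite rate, tournament size, and fitness callable are assumptions for illustration (the embodiment below uses an elite rate of 0.1):

```python
import random

def environment_selection(population, fitness, pop_size, elite_rate=0.1, k=2):
    """Keep the top elite_rate fraction directly, then fill the rest of the
    next generation by k-way tournament selection over the merged population."""
    ranked = sorted(population, key=fitness, reverse=True)
    n_elite = max(1, int(elite_rate * pop_size))
    next_pop = ranked[:n_elite]                     # elitism: best individuals survive
    while len(next_pop) < pop_size:
        contenders = random.sample(population, k)   # random tournament of size k
        next_pop.append(max(contenders, key=fitness))
    return next_pop
```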
The invention also provides an evolutionary neural architecture search system based on performance level agent assistance, comprising:
the evolution operator module based on the simple path is used for generating a new neural network architecture and pushing the population evolution process;
The architecture feature extraction module based on the graph neural network is used for extracting topology information and operation information of a network architecture and obtaining vector representation of the network architecture;
and a proxy model based on performance hierarchy division, which is used for establishing a mapping relation between architecture representation and performance hierarchy and predicting the performance hierarchy of the neural network architecture.
According to the invention, by constructing a new proxy model, the computational consumption of evolutionary neural architecture search can be effectively reduced, the search process accelerated, the whole search space better explored, and the optimal neural network architecture stably found. The proxy model based on performance level prediction can accurately and stably give performance prediction results for unknown architectures and assist the search process. The invention also treats the neural network architecture as a directed acyclic graph and extracts architecture features with a graph neural network to obtain an effective architecture representation, one with greater adaptability and functionality than in previous studies. Meanwhile, a crossover operator based on simple paths is designed around the characteristics of network architectures; it retains the efficient data processing paths in the network architecture to the greatest extent, and its crossover process is more intuitive. These new perspectives of graph-structure representation and performance level prediction bring new breakthroughs and progress to the field of evolutionary neural architecture search. Compared with existing evolutionary neural architecture search methods, the evolutionary neural architecture search framework formed by the techniques provided by the invention achieves better results.
The beneficial effects of the invention are as follows:
For evolutionary neural architecture search, the invention provides a searching method based on performance-level agent assistance. The method extracts architecture feature representations through a graph neural network, and obtains efficient architecture representations using the designed loss function. Treating the network architecture as a directed graph, it extracts the topology and operation information of the neural network architecture through the information transmission mechanism of the graph neural network. The constructed proxy model based on performance level prediction can efficiently predict the performance of architectures from the new viewpoint of performance levels, giving a large number of candidate architecture performance predictions inexpensively in place of expensive architecture performance evaluations. Meanwhile, for the application of evolutionary computation to neural architecture search, a crossover operator conforming to the characteristics of network architectures is provided. The simple-path-based crossover operator both matches the graph-structure characteristics of neural networks and retains the effective data processing paths in the network to the greatest extent, with high interpretability. The performance-level agent-assisted evolutionary neural architecture search method provided by the invention increases the fit of evolutionary computation to neural architecture search, while the proposed proxy model method effectively reduces resource consumption; it can promote the application of neural architecture search to practical problems, and has high practical value and wide application scenarios.
Drawings
Fig. 1 is a diagram of an algorithm framework of the present invention.
Fig. 2 is a diagram showing a proxy model structure of the present invention.
FIG. 3 is a cross operator depiction of the present invention.
FIG. 4 is a graph of performance of an architecture and corresponding Gaussian kernel density estimation.
Fig. 5 is an effect diagram of the proxy model.
Detailed Description
The foregoing and/or other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings and detailed description.
In an embodiment of the present invention, as shown in fig. 1, an evolutionary neural architecture search method based on performance level agent assistance is provided. Fig. 1 is a framework diagram of the whole method; from population initialization to the output of results, the implementation flow and details of the method are described in detail on NAS-Bench-101, a reference data set for evaluating neural architecture search methods. The method specifically comprises the following steps:
Step 1: initializing a framework population P of size N and an empty record Archive; evaluating all frameworks in P on the task data set to obtain their actual performance values, which are recorded in Archive, namely:

Archive = {(Arch_i, y_i) | i = 1, 2, ..., N},

where (Arch_i, y_i) is an architecture-performance pair, Arch_i represents the i-th architecture, y_i represents the performance value corresponding to the i-th architecture, and N is the total number of architectures in the Archive.
A small number of generations is then evolved on population P, and each newly generated architecture is evaluated and likewise added to the record Archive.
And 2, on the record Archive, according to the existing architecture performance data division hierarchy, making a training data set of the proxy model so as to train the proxy model. The architecture is ranked according to the performance, then the architecture is divided into K layers according to the number, and the layers are correspondingly labeled. Then, the proxy model is trained to build a mapping model of architecture to performance level.
Step 2.1, sorting the architectures in Archive by performance from large to small, setting the number of performance levels to K, and dividing them equally by count into K levels, where the number Q of architectures in each level is:

Q = ⌊N / K⌋,

The result of the division is:

Data = {T_1, T_2, ..., T_K},

Data is the training dataset of the proxy Model for supervised training of the Model; T_k represents the k-th divided level, N_k is the number of architectures within this level, and the label k identifies the level of an architecture;
Step 2.2, constructing a proxy model as shown in fig. 2 (where C1 represents the number of channels of the input layer and C2 the number of channels of the output layer), consisting of an architecture feature extractor and a hierarchy divider. The architecture feature extractor is based on graph convolution layers and the hierarchy divider on fully connected layers; a batch of architectures is input, their features are extracted, the representations are fed into the hierarchy divider, and the predicted levels are finally output.
Step 2.3, before using the proxy Model, supervised training is performed on it with Data, adopting the designed loss function until the Model converges, yielding the proxy Model. Thereafter, feature representations X_i are extracted with the Model for all architectures in Data, and the centers C of the K levels are calculated respectively according to their performance levels, N denoting the total number of architectures in the record Archive.
Step 3, as shown in fig. 3, after selecting two father frameworks needing to be crossed, deconstructing all simple paths to form a path pool, randomly selecting a certain number of simple paths from the path pool to reconstruct a new sub-framework, and then predicting the performance level of all the generated frameworks by using a proxy Model to reserve the framework of a high performance level. The method comprises the following steps:
Step 3.1, generating new architectures with the simple-path-based crossover operator. The two parent frameworks are treated as directed graphs, and all nodes are numbered sequentially from the input node of the network architecture to the output node. Then, all simple paths from the input node to the output node are obtained. All simple paths found in the two parent frameworks are combined to form a path pool, a certain number of simple paths are randomly selected from the pool, and, according to the node sequence numbers in each simple path, nodes with the same sequence number are merged into one node while nodes with different sequence numbers are retained separately, thereby reconstructing a new architecture. This process is repeated multiple times, yielding 10N new architectures.
Step 3.2, predicting the performance level of the newly generated architectures with the proxy model. The Model predicts a performance level for each of these architectures, yielding:

{(Arch_i, t̂_i)}, t̂_i ∈ {1, 2, ..., K},

where t̂_i is the predicted performance level of the i-th architecture Arch_i.
And 4, selecting a promising architecture to evaluate the performance on the target data set, and updating the proxy model by using the latest Archive when the condition is met. The method comprises the following steps:
Step 4.1, obtaining the feature representations {X_1, X_2, ..., X_m} of all architectures in the record Archive with the Model; then each level center C is calculated, expressed as:

C_t = (1/m_t) Σ_{i=1}^{m_t} X_i^t,

where t denotes a performance level, m_t is the number of architectures in level t, and X_i^t is the feature representation of the i-th architecture in that level.
Step 4.2, selecting V high-level architectures. Three different situations can occur. First, if the number of architectures in the highest level is larger than V, the distance between the representation of each architecture in the highest level and the corresponding level center C is calculated, and the V architectures closest to the center are retained. Second, if the number of architectures in the highest level is smaller than V but not 0, all architectures in the highest level are retained, and the remaining architectures are drawn from the next-highest level, again selecting those closest to that level's center. Third, if the number of architectures in the highest level is 0, architectures are drawn from the subsequent levels until V are obtained, analogously to the first and second cases. Finally, the performance of the V architectures on the target dataset is evaluated and the results are incorporated into the Archive.
And 4.3, if the agent Model updating condition is met, carrying out hierarchical division on the Archive by adopting the division mode in the step 2 to obtain a new training Data set Data, and training the Model in the same mode until convergence.
Step 5, merging the evaluated sub-architectures with the current population to participate in environment selection and generate the next generation population P. An elite mechanism is adopted: with an elite rate of 0.1, the best 10% of architectures in the previous generation population P are selected and retained directly, ensuring that excellent architectures are preserved; then, to maintain population diversity, the remaining 90% of the next generation population is selected from the previous generation by tournament selection and combined with the elite architectures to form the next generation population.
Repeating the steps until the stopping condition is reached, and then outputting the globally optimal architecture as a result.
NAS-Bench-101 is a benchmark dataset for neural architecture search that provides a standardized platform for evaluating and comparing different NAS methods. Researchers can use the same data set and evaluation metrics for comparison, ensuring the consistency and reproducibility of results. NAS-Bench-101 contains more than 423000 unique neural network architectures, each of which was evaluated on the CIFAR-10 dataset. This large search space allows researchers to search and optimize over a large set of architectures. The present invention is tested on NAS-Bench-101 to verify its validity.
In examining the crossover operator of the present invention, a performance comparison is made against random sampling and a crossover operator based on the adjacency matrix. Crossover operators based on adjacency matrices are often adopted in existing studies and are therefore chosen as the baseline. 2000 architectures were randomly sampled from NAS-Bench-101, then 2000 architectures were generated by the adjacency-matrix-based crossover operator, and finally 2000 architectures were generated by the simple-path-based crossover operator. Next, the Gaussian kernel density of the generated architectures' performance was estimated, and the performance of the architectures together with the corresponding Gaussian kernel density curves was plotted, as shown in FIG. 4.
It can be seen that the architectures generated by the simple-path-based crossover operator have higher overall performance than those of the other approaches: the performance of a large number of them lies in a higher interval, and most of them outperform the architectures generated by the other methods. In addition, the effect of the proxy model is also verified on NAS-Bench-101, as shown in FIG. 5, which gives a confusion matrix of the proxy model's predictions; the prediction capability of the proxy model can be clearly seen from the confusion matrix, and it can be found that the proxy model predicts the performance levels of architectures with high accuracy.
The invention provides a method and a system for evolutionary neural architecture search based on performance-level agent assistance. There are many ways to implement this technical scheme, and the above description is only a preferred embodiment of the invention. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these should also be regarded as falling within the protection scope of the invention. Components not explicitly described in this embodiment can be implemented using the prior art.

Claims (5)

1. An evolutionary neural architecture search method based on performance level agent assistance, wherein the method is used for voice processing and comprises the following steps:
Step 1, initializing a framework population, evolving a plurality of generations, actually evaluating all framework individuals in the population to obtain real performance, and recording the existing framework performance data;
Step 2, dividing the performance level of the architecture according to the real performance, marking a level label as training data, and training a proxy model by using the training data, wherein the proxy model establishes a mapping relation from the architecture to the performance level, so that the performance level of an unknown architecture can be predicted;
step 3, generating an architecture individual through a crossover operator and a mutation operator based on a simple path, predicting the performance level of the architecture individual by using a proxy model, reserving a high-level architecture, and directly eliminating a low-level architecture;
Step 4, the high-level architecture continues to participate in a one-time selection process, the real performance of the architecture finally reserved is evaluated, and the existing architecture performance data is added;
step 5, merging the evaluated high-level architectures with the current population, performing environment selection, and using tournament selection to generate a new population;
Step 6, repeating the steps 3 to 5 until the stopping condition is met, and outputting a global optimal architecture;
in step 1, after initializing a framework population P, G generations are evolved, the performance of each framework is evaluated on a task data set, and the results are stored in the record

Archive = {(Arch_i, y_i) | i = 1, 2, ..., N},

wherein (Arch_i, y_i) represents an architecture-performance pair, Arch_i is the i-th architecture in the record, y_i is the performance corresponding to the i-th architecture, and N is the total number of architectures in the record Archive;
The step 2 comprises the following steps:
Step 2.1, dividing the performance level of the architecture, and making a training data set:
For the architectures in the record Archive, sorted by performance y_i from large to small, the number of performance levels is set to K, and the dividing result is:

Data = {T_1, T_2, ..., T_K},

Data is the training Data set of the agent Model, used for supervised training of the Model; T_k represents the k-th divided level, N_k is the number of architectures within the k-th level, and the label k identifies the level of architecture Arch_i;
Step 2.2, designing the structure of the proxy model:
the agent Model comprises an architecture feature extractor and a hierarchy divider, wherein the architecture feature extractor and the hierarchy divider are mutually coupled to jointly train and complete tasks of architecture feature extraction and hierarchy division;
the architecture feature extractor comprises a graph convolution layer, and the architecture feature extractor finishes extracting architecture features of the neural network based on transmission and aggregation of information among nodes to obtain architecture representation;
the hierarchy divider comprises a full-connection layer, and the hierarchy divider completes prediction of the performance level of the neural network architecture based on learning of architecture representation and divides the corresponding architecture into the performance levels;
In each graph convolution layer, the feature vector of each node is updated from the feature vectors of its neighboring nodes, which is described as:

H^(l+1) = σ( D^(-1/2) Ã D^(-1/2) H^(l) W^(l) ),

wherein σ is the activation function; Ã is the adjacency matrix of the graph with added self-loops; D is the degree matrix formed by all nodes, computed as D_aa = Σ_b Ã_ab; D_aa is the element of row a and column a in the degree matrix D, and Ã_ab is the element of row a and column b in the adjacency matrix Ã;

W^(l) is the trainable weight matrix of the l-th layer of the architecture feature extractor, H^(l) is the node feature vector of the l-th layer, and H^(0) = X, wherein the input H^(0) of the entire architecture feature extractor is the initial node feature vector X;
next, the architecture representation is input into a hierarchy divider, predicting the performance hierarchy to which the corresponding architecture belongs;
step 2.3, designing a loss function, and guiding the training process of the proxy model:
The loss function of the designed proxy model is expressed as:

L = L_2 + λ·L_1,

L_1 = -(1/m) Σ_{i=1}^{m} Σ_{k=1}^{K} 1(t_i = k) · log p_{i,k},

L_2 = Σ_{i,j} [ y_ij · F(x_i, x_j)^2 + (1 - y_ij) · ψ(margin - F(x_i, x_j))^2 ],

wherein L_1 is the cross entropy loss function, m is the batch size of the input data, p_{i,k} is the predicted probability that the i-th architecture belongs to level k, and L_2 is the designed feature loss function, wherein the function ψ(x) = max(x, 0); F(x_i, x_j) represents the Euclidean distance between architecture representation x_i and architecture representation x_j; x_i and x_j are inputs of the hierarchy divider, i.e., outputs of the architecture feature extractor; when t_i = t_j, the intermediate parameter y_ij = 1, otherwise the intermediate parameter y_ij = 0; margin is a threshold parameter for controlling the feature distance between different levels; λ is a hyper-parameter for controlling the importance between the two different loss functions; the total loss function L is adopted to guide the training of the proxy model until convergence;
The step 3 comprises the following steps:
Step 3.1, generating a new architecture by adopting a crossover operator based on a simple path;
step 3.2, predicting the performance level of the newly generated architecture by using the proxy model;
Step 3.1 comprises:
Step 3.1.1, selecting two father frameworks needing to be crossed, regarding the frameworks as directed graphs, numbering all nodes, and specifically, according to the fitness value of all framework individuals in a population, giving higher selection probability to the high-performance frameworks, giving lower selection probability to the low-performance frameworks, selecting the two father frameworks from the population according to a roulette algorithm, performing cross operation, regarding the two father frameworks as two directed graphs, and numbering the nodes in each graph from an input node to an output node in sequence;
Step 3.1.2, deconstructing two directed graphs to obtain all simple paths of the directed graphs to form a simple path candidate pool, wherein for each directed graph, all simple paths of the directed graph are found, namely, no repeated node sequence exists between an input node and an output node, all simple paths are combined into one path candidate pool, and node sequence numbers and operations are reserved;
Step 3.1.3, randomly selecting a plurality of simple paths to reconstruct a directed graph of a sub-architecture, wherein the number of paths in a path candidate pool is M, a path number L is randomly generated from [1, M ], then, L simple paths are randomly selected from the path candidate pool again, and the sub-architecture graph is reconstructed by the L simple paths;
Step 3.1.4, inheriting the operation in different modes according to different nodes, wherein the operation directly reserves the node inherited by the father architecture, combines the generated new node, and randomly reserves one node from the node operation of the two father architectures;
Step 3.1.5, repeating the steps 3.1.1 to 3.1.4 until a set number of new architectures are generated;
step 3.2 comprises:
performance level prediction is performed on the newly generated architecture by using the constructed proxy model for predicting the architecture performance level, wherein for the set performance level K, the newly generated architecture is divided into K levels, which are expressed as:
{(Arch_i, t̂_i)}, t̂_i ∈ {1, 2, ..., K},

wherein Arch_i represents the predicted i-th architecture, and t̂_i is the predicted performance level of the i-th architecture.
2. The method of performance-level agent-assisted evolutionary neural architecture search of claim 1, wherein step 4 comprises:
step 4.1, obtaining all feature representations of the training data by using an architecture feature extractor in the proxy model, and then calculating a hierarchy center of the performance hierarchy;
Step 4.2, using an architecture feature extractor in the proxy model to obtain a feature representation of each architecture in the high level, calculating the distance between the feature representation and the center of the highest performance level, and selecting the architecture closest to the center for performance evaluation;
and 4.3, if the updating condition of the proxy model is met, updating the proxy model by using the latest architecture performance data, and carrying out iterative updating on the proxy model on line.
3. The method of performance-level agent-assisted evolutionary neural architecture search of claim 2, wherein step 4.1 comprises:
for m architectures, feature representations are extracted with the proxy model for all architectures in the training dataset Data, expressed as:

{X_1, X_2, ..., X_m},

wherein X_i represents the feature representation of the extracted i-th architecture Arch_i;

According to the performance level, the centers C of the K levels are calculated respectively; the architectures of each level are taken in turn and the level center C is calculated, expressed as:

C_t = (1/m_t) Σ_{i=1}^{m_t} X_i^t,

wherein C_t represents the level center of the t-th level, m_t represents the number of architectures at the t-th level, and X_i^t represents the feature representation of the i-th architecture in the t-th level.
4. The method of performance-level agent-assisted evolutionary neural architecture search of claim 3, wherein step 4.2 comprises:
From the results of the proxy model prediction, the feature representation of each architecture in the highest level is obtained, the distance between each representation and the center of the highest level is calculated, and the V architectures with the smallest distances are selected; these V architectures are the network architectures most worth fully training on the data set, so their performance on the task data set is obtained and the results are stored in the record Archive;
Step 4.3 comprises:
if the evolution reaches the set number of generations, the latest Archive records are used to construct an up-to-date training dataset Data according to the hierarchy-division method, and the proxy model is updated using this training dataset Data (see the sketch below).
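A minimal sketch of the selection and online-update logic of steps 4.2 and 4.3; Euclidean distance is assumed here (the claims do not fix the metric), and build_level_dataset() and fit() are assumed helper methods on the proxy model rather than interfaces defined by the patent.

import numpy as np

def select_for_evaluation(top_feats, top_archs, top_center, V: int):
    # Step 4.2: pick the V architectures whose feature representations lie
    # closest to the highest-level center; these are fully trained on the task
    # dataset and their true performance is recorded in the Archive.
    dists = np.linalg.norm(np.asarray(top_feats) - top_center, axis=1)
    return [top_archs[i] for i in np.argsort(dists)[:V]]

def maybe_update_surrogate(surrogate, archive, generation: int, update_every: int):
    # Step 4.3: every set number of generations, rebuild the level-labelled
    # training set from the latest Archive and refit the proxy model online.
    if generation > 0 and generation % update_every == 0:
        data = surrogate.build_level_dataset(archive)  # hierarchy division (assumed helper)
        surrogate.fit(data)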
5. A performance-level agent-assisted evolutionary neural architecture search system implementing the performance-level agent-assisted evolutionary neural architecture search method according to any one of claims 1 to 4, comprising:
a simple-path-based evolution operator module for generating new neural network architectures and driving the population evolution process;
a graph-neural-network-based architecture feature extraction module for extracting the topology information and operation information of a network architecture and obtaining a vector representation of the network architecture (a sketch of such an extractor follows);
and a proxy model based on performance-hierarchy division, for establishing the mapping between architecture representations and performance levels and predicting the performance level of a neural network architecture.
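To make the feature extraction module concrete, here is a minimal GCN-style sketch, assuming an architecture is supplied as an adjacency matrix A and a one-hot node-operation matrix X, with weight matrices standing in for the trained extractor; the dimensions and the two-layer design are illustrative assumptions.

import numpy as np

def gnn_embed(A: np.ndarray, X: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    # Two rounds of normalised message passing over the architecture graph,
    # followed by a mean readout, yield the architecture's vector representation.
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))     # row-normalise the propagation
    H = np.maximum(D_inv @ A_hat @ X @ W1, 0.0)  # layer 1: propagate + ReLU
    H = np.maximum(D_inv @ A_hat @ H @ W2, 0.0)  # layer 2
    return H.mean(axis=0)                        # readout over nodes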
CN202411379555.1A 2024-09-30 2024-09-30 Evolutionary neural architecture search method and system based on performance-level agent assistance Active CN118917353B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411379555.1A CN118917353B (en) 2024-09-30 2024-09-30 Evolutionary neural architecture search method and system based on performance-level agent assistance

Publications (2)

Publication Number Publication Date
CN118917353A (en) 2024-11-08
CN118917353B (en) 2025-01-28

Family

ID=93308940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411379555.1A Active CN118917353B (en) 2024-09-30 2024-09-30 Evolutionary neural architecture search method and system based on performance-level agent assistance

Country Status (1)

Country Link
CN (1) CN118917353B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119623515B (en) * 2025-02-17 2025-05-23 南京信息工程大学 Evolutionary neural architecture searching method and system based on similarity agent assistance

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245436A (en) * 2019-06-19 2019-09-17 山东大学 A Parallel Analog Circuit Optimization Method Based on Genetic Algorithm and Machine Learning
CN115130483A (en) * 2022-07-13 2022-09-30 湘潭大学 Neural architecture searching method based on multi-target group intelligent algorithm and application

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7668797B2 (en) * 2006-04-07 2010-02-23 Gary Kuvich Active semiotic system for image and video understanding by robots and unmanned vehicles, methods and apparatus
US11360447B2 (en) * 2017-02-10 2022-06-14 Johnson Controls Technology Company Building smart entity system with agent based communication and control
US20220107628A1 (en) * 2020-10-07 2022-04-07 The Boeing Company Systems and methods for distributed hierarchical control in multi-agent adversarial environments
CN115860057A (en) * 2021-09-24 2023-03-28 中移(苏州)软件技术有限公司 Data processing method, device and equipment
CN117152568B (en) * 2023-11-01 2024-01-30 常熟理工学院 Deep integration model generation method and device and computer equipment
CN117874644A (en) * 2024-01-11 2024-04-12 南京信息工程大学 An improved sequential feature selection method, system and device based on agent model
CN118297102A (en) * 2024-04-03 2024-07-05 平安科技(深圳)有限公司 Neural network architecture searching method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN118917353A (en) 2024-11-08

Similar Documents

Publication Publication Date Title
Chen et al. Techniques for automated machine learning
CN114373101A (en) Image classification method for neural network architecture search based on evolution strategy
CN119623515B (en) Evolutionary neural architecture searching method and system based on similarity agent assistance
Pan et al. Neural architecture search based on evolutionary algorithms with fitness approximation
WO2022126448A1 (en) Neural architecture search method and system based on evolutionary learning
CN116629352A (en) Hundred million-level parameter optimizing platform
CN116108384B (en) A neural network architecture search method, device, electronic device and storage medium
CN118917353B (en) Evolutionary neural architecture search method and system based on performance-level agent assistance
CN119150925B (en) Generative adversarial network architecture search method and system based on hybrid convolution operation
He et al. Simple-Encoded evolving convolutional neural network and its application to skin disease image classification
CN113792845A (en) Prediction model training method and device based on meta-learning, electronic equipment and medium
WO2025091967A1 (en) Method and apparatus for generating deep ensemble model, and computer device
CN117034100A (en) Self-adaptive graph classification method, system, equipment and medium based on hierarchical pooling architecture
CN119167985A (en) Graph neural network pre-training method and system based on community structure commonality
CN119204082B (en) Particle swarm neural architecture searching method and system based on ternary comparison agent assistance
Wan et al. RSSM-Net: Remote sensing image scene classification based on multi-objective neural architecture search
CN119940484A (en) Neural network architecture search method based on existing knowledge to generate structure and coefficients
Ali et al. Recent trends in neural architecture search systems
Xue et al. Similarity surrogate-assisted evolutionary neural architecture search with dual encoding strategy
CN115620807B (en) Method for predicting interaction strength between target protein molecule and drug molecule
CN119988236B (en) Intelligent equipment software testing method based on convolutional neural network combined with genetic algorithm
Shi et al. JLR-GCN: Joint label-aware and relation-aware graph convolution neural network for heterogeneous graph representations
CN119151703B (en) Multi-target large-scale community detection method based on agent model self-adaptive selection
Li et al. Surrogate-Assisted Evolution of Convolutional Neural Networks by Collaboratively Optimizing the Basic Blocks and Topologies
CN120181178B (en) Multi-target hybrid evolutionary neural architecture searching method and system based on advantage classifier assistance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant