
CN112801264A - Dynamic differentiable space architecture searching method and system - Google Patents


Info

Publication number
CN112801264A
Authority
CN
China
Prior art keywords
space
matrix
subspace
search
spatial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011271696.3A
Other languages
Chinese (zh)
Other versions
CN112801264B (en)
Inventor
杨隆兴
胡瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202011271696.3A priority Critical patent/CN112801264B/en
Publication of CN112801264A publication Critical patent/CN112801264A/en
Application granted granted Critical
Publication of CN112801264B publication Critical patent/CN112801264B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Complex Calculations (AREA)

Abstract


The present invention proposes a dynamic differentiable space architecture search method and system that combines space sampling with differentiable search: differentiable search is performed only on the sampled space, while the probability distribution over the search space is updated during the search to guide the next sampling. After several iterations the algorithm converges, and the final searched structure is determined from the corresponding parameters. The search is thus accelerated by optimizing only a subspace, while sampling lets the search proceed over multiple subspaces, escaping the local optima caused by differentiable optimization and finding better network structures.


Description

Dynamic differentiable space architecture searching method and system
Technical Field
The invention relates to the field of neural network architecture search in deep learning, in particular to a dynamic differentiable space architecture search method and system.
Background
Neural network architecture search (NAS) is a branch of deep learning that aims to reduce the trial-and-error cost of manually designing network architectures by automatically searching for better-performing ones. Around the twin goals of improving search efficiency and finding better structures, NAS methods fall into three families: reinforcement learning, evolutionary algorithms, and differentiable methods. The first two require enormous search resources because every candidate structure must be retrained, so the current mainstream is the differentiable approach. A differentiable method integrates all structures of the search space with a weight-sharing mechanism, optimizes the weights and structural parameters by gradient descent, and finally selects a structure according to the structural parameters.
Differentiable search requires integrating the structure of the whole search space, which hurts both efficiency and performance. On one hand, integrating the search space produces a weight-sharing super-network, and training it during the search demands large GPU memory, raising the hardware bar; forward and backward propagation through it is also time-consuming, slowing the overall search. On the other hand, gradient-based search is constrained by its initial conditions and optimization trajectory: it easily falls into local optima and can even suffer structural collapse, making good structures hard to find. Such methods therefore often need multiple search runs, which further reduces efficiency.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a dynamic differentiable space and architecture search method, which comprises the following steps (a self-contained toy sketch of the full loop is given after the steps):
step 1, obtaining a search space by assembling a plurality of neural network architectures to be searched, generating a super-network by sharing weights across all neural network architectures in the search space, and representing the directed acyclic graph of the super-network as a matrix to obtain a space matrix, wherein the rows of the space matrix represent the edges of the graph and the columns represent candidate operations;
step 2, computing an upper confidence bound (UCB) value for each element of the space matrix, sorting the upper confidence bound values of all elements, and selecting the largest topK to generate a subspace matrix, the subspace matrix corresponding to a sampled subspace;
step 3, optimizing the weight parameters, spatial parameters and structural parameters of the candidate operations in the subspace matrix using a back-propagation algorithm; the optimization runs for M iterations, which controls the degree of convergence of the parameters in the subspace;
step 4, obtaining the importance of all candidate operations of each edge in the subspace matrix according to the spatial parameters and the structural parameters to construct an importance matrix, judging whether the current number of iterations has reached the preset number, executing step 2 again if not, and once it has, selecting the candidate operation with the greatest importance on each edge according to the importance matrix to form the final neural network architecture.
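The loop can be made concrete with a self-contained toy sketch in Python/NumPy. The matrix sizes, the constants, and the random perturbation that stands in for the M back-propagation steps of step 3 are all illustrative assumptions, not part of the patent:

```python
import numpy as np

# Toy sketch of the steps 1-4 loop. Sizes and constants are assumptions;
# the random perturbation below merely stands in for the M back-propagation
# steps of step 3 so that the sketch stays self-contained and runnable.
rng = np.random.default_rng(0)
E, M_OPS, TOP_K, ITERS, C = 6, 3, 12, 20, 1.44

zeta = np.zeros((E, M_OPS))        # spatial parameters (one per edge/operation)
alpha = np.zeros((E, M_OPS))       # structural parameters
importance = np.zeros((E, M_OPS))  # importance matrix, same shape as space matrix
t_counts = np.zeros((E, M_OPS))    # times each entry appeared in a sampled subspace

for i in range(1, ITERS + 1):
    # Step 2: UCB = spatial probability + c * sqrt(ln T / t), then keep top-K.
    p = 1.0 / (1.0 + np.exp(-zeta))
    ucb = p + C * np.sqrt(np.log(i) / np.maximum(t_counts, 1e-12))
    subspace = np.zeros(E * M_OPS, dtype=int)
    subspace[np.argsort(ucb.ravel())[-TOP_K:]] = 1
    subspace = subspace.reshape(E, M_OPS)
    t_counts += subspace

    # Step 3 (stand-in): the real method runs M back-propagation steps that
    # update the weights omega and the parameters zeta, alpha on the subspace.
    zeta += 0.1 * rng.standard_normal((E, M_OPS)) * subspace
    alpha += 0.1 * rng.standard_normal((E, M_OPS)) * subspace

    # Step 4: where sampled, importance += spatial prob * conditional structure prob.
    p = 1.0 / (1.0 + np.exp(-zeta))
    masked = np.where(subspace == 1,
                      np.exp(alpha - alpha.max(axis=1, keepdims=True)), 0.0)
    q = masked / np.clip(masked.sum(axis=1, keepdims=True), 1e-12, None)
    importance += subspace * p * q

print(importance.argmax(axis=1))  # chosen operation index on each edge
```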
In this dynamic differentiable space and architecture search method, the super-network comprises the neural network's number of layers, number of channels, stride, topology, and the candidate operation information between nodes.
In this method, each element of the space matrix takes the value 1 or 0, indicating whether the operation on that edge is a candidate operation of the search space.
In this method, the importance of all candidate operations on each edge in step 4 is computed as follows:
if the value at the corresponding position of the subspace matrix is 1, the importance at the i-th iteration = spatial probability × structural conditional probability + importance at the (i-1)-th iteration; otherwise, the importance at the i-th iteration = importance at the (i-1)-th iteration, where the spatial probability is given by the spatial parameter and the structural conditional probability by the structural parameter.
In this method, the specific process of computing the upper confidence bound value in step 2 is as follows:
an upper confidence bound value is computed for each element of the space matrix as UCB = spatial probability + c·sqrt(ln T / t), where UCB is the upper confidence bound value; the spatial probability comes from an initial value at the first iteration and from the spatial parameters of step 3 thereafter; c is a confidence coefficient; T is the total number of subspace samplings; and t is the number of times the subspace matrix value of the operation has been 1.
In this method, the optimization process of step 3 is specifically as follows:
the set of candidate operations in the subspace matrix is O, with o denoting a specific operation and ō the mixed operation; ω, ζ, α are the weight, spatial and structural parameters, and k denotes the k-th operation. The mixed operation between two nodes is then computed as

ō(x) = Σ_{k=1..n} Z_k · p_k · q_k · o_k(x)

where p_k and q_k respectively denote the spatial probability of the k-th operation and the conditional structure probability within this space:

p_k = sigmoid(ζ_k)
q_k = I(Z_k = 1) · exp(α_k) / Σ_{k'=1..n} I(Z_{k'} = 1) · exp(α_{k'})

Z_k is the subspace matrix value of the k-th operation between the nodes, and the weight, spatial and structural parameters are obtained through back-propagation updates. The back-propagation optimization runs for M steps, which controls the degree of convergence of the parameters in the subspace; M is chosen according to the size of the subspace. A larger M lets the subspace converge better but makes the search prone to overfitting into a local optimum; a smaller M leaves the subspace under-converged, so the assessment of its importance will be inaccurate.
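As a concrete reading of these formulas, the following PyTorch-style sketch computes the mixed operation on one edge. It assumes q_k is a softmax over α restricted to the sampled operations (the indicator I(Z_k = 1) above) and that at least one operation on the edge is sampled; it is an illustration, not the patent's reference implementation:

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """Mixed operation on one edge: o-bar(x) = sum_k Z_k * p_k * q_k * o_k(x)."""

    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)             # candidate operations o_k (weights omega)
        n = len(ops)
        self.zeta = nn.Parameter(torch.zeros(n))  # spatial parameters zeta_k
        self.alpha = nn.Parameter(torch.zeros(n)) # structural parameters alpha_k

    def forward(self, x, z):
        # z: 0/1 subspace mask Z_k for this edge; at least one entry must be 1.
        p = torch.sigmoid(self.zeta)                            # p_k = sigmoid(zeta_k)
        masked = self.alpha.masked_fill(z == 0, float('-inf'))  # restrict to sampled ops
        q = torch.softmax(masked, dim=0)                        # conditional structure prob q_k
        # Only operations with Z_k = 1 are evaluated, so unsampled ops cost nothing.
        return sum(p[k] * q[k] * op(x) for k, op in enumerate(self.ops) if z[k] == 1)

# Example usage (shapes are arbitrary): two convolutions and an identity op.
edge = MixedOp([nn.Conv2d(8, 8, 3, padding=1),
                nn.Conv2d(8, 8, 5, padding=2),
                nn.Identity()])
y = edge(torch.randn(1, 8, 16, 16), torch.tensor([1, 0, 1]))
```

During step 3, M forward and backward passes would update ω (inside the ops), ζ and α jointly; operations with Z_k = 0 never enter the forward pass and thus receive no gradient, which is what makes subspace training cheaper than full-space training.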
The invention also provides a dynamic differentiable space and architecture search system, comprising:
a module 1 for obtaining a search space by assembling a plurality of neural network architectures to be searched, generating a super-network by sharing weights across all neural network architectures in the search space, and representing the directed acyclic graph of the super-network as a matrix to obtain a space matrix, wherein the rows of the space matrix represent the edges of the graph and the columns represent candidate operations;
a module 2 for computing the upper confidence bound value of each element of the space matrix, sorting the upper confidence bound values of all elements, and selecting the largest topK to generate a subspace matrix, the subspace matrix corresponding to a sampled subspace;
a module 3 for optimizing the weight parameters, spatial parameters and structural parameters of the candidate operations in the subspace matrix using a back-propagation algorithm, where the optimization runs for M steps to control the degree of convergence of the parameters in the subspace;
a module 4 for obtaining the importance of all candidate operations of each edge in the subspace matrix according to the spatial parameters and the structural parameters to construct an importance matrix, and judging whether the current number of iterations (i.e., the number of samplings) has reached the preset number; if not, module 2 runs again until it has, and the candidate operation with the greatest importance on each edge is then selected according to the importance matrix to form the final neural network architecture.
In this dynamic differentiable space and architecture search system, the super-network comprises the neural network's number of layers, number of channels, stride, topology, and the candidate operation information between nodes.
In this system, each element of the space matrix takes the value 1 or 0, indicating whether the operation on that edge is a candidate operation of the search space.
In this system, the importance of all candidate operations on each edge in module 4 is computed as follows:
if the value at the corresponding position of the subspace matrix is 1, the importance at the i-th iteration = spatial probability × structural conditional probability + importance at the (i-1)-th iteration; otherwise, the importance at the i-th iteration = importance at the (i-1)-th iteration, where the spatial probability is given by the spatial parameter and the structural conditional probability by the structural parameter;
the specific process of computing the upper confidence bound value in module 2 is as follows:
an upper confidence bound value is computed for each element of the space matrix as UCB = spatial probability + c·sqrt(ln T / t), where UCB is the upper confidence bound value; the spatial probability comes from an initial value at the first iteration and from the spatial parameters of module 3 thereafter; c is a confidence coefficient; T is the total number of subspace samplings; and t is the number of times the subspace matrix value of the operation has been 1;
the module 3 optimization process specifically comprises the following steps:
the set of candidate operations in the subspace matrix is O, O representing a particular operation,
Figure BDA0002777885320000041
representing the blending operation, ω, ζ, α are weight, spatial, structural parameters, k is used to represent the kth operation, and the blending operation between two nodes is calculated as
Figure BDA0002777885320000042
Wherein p isk,qkRespectively representing the spatial probability of the k-th operation and the conditional structure probability in this space, pk=sigmoid(ζk),
Figure BDA0002777885320000043
Figure BDA0002777885320000044
ZkAnd obtaining weight, space and structure parameters for the subspace matrix value of the kth operation among the nodes through back propagation updating, wherein I is an indication function, the condition establishment value in the parenthesis is 1, and otherwise, the condition establishment value is 0.
According to the above scheme, the advantages of the invention are as follows: the invention combines space sampling with differentiable search, performing differentiable search only on the sampled space while updating the probability distribution over the search space during the search to guide the next sampling. After multiple iterations the algorithm converges, and the final searched structure is determined from the corresponding parameters. The search is thus accelerated by optimizing only a subspace, while sampling lets the search proceed over multiple subspaces, escaping the local optima caused by differentiable optimization and finding better network structures.
Drawings
FIG. 1 is a flow diagram of the dynamic differentiable space and architecture search method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a generated super-network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of subspace sampling according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of network architecture selection according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the dynamic differentiable space and architecture search according to an embodiment of the present invention.
Detailed Description
The sampled space is a subspace of the entire search space; in set terms it is a subset of the full search space, and it may be any non-empty subset of it. For a given sampling, the regions outside the sampled subspace are simply not sampled at that time and may be sampled later; sampling never reaches beyond the search space itself.
In a first aspect, the present invention provides a dynamic differentiable space and architecture search method, comprising the steps of:
Step 1, determining the search space and generating a weight-sharing super-network.
Determining the search space means determining the set of neural network architectures to be searched. For example, in a 10-layer network where each layer can choose between 2 convolution kernels, such as 3×3 and 5×5, there are 2^10 = 1024 network structures in total. Specifically, the super-network includes information such as the number of layers, number of channels, stride, topology, and the candidate operations between nodes. All structures in the set are then merged through a weight-sharing strategy into a super-network, i.e., a directed acyclic graph. This graph can be represented by a matrix, called the space matrix, whose rows represent the edges of the graph and whose columns represent candidate operations. Each value of the space matrix is 1 or 0: 1 indicates that the operation on that edge is a candidate operation of the search space, 0 that it is not. The matrix is initialized to all 1s, representing the full space.
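For instance, the 10-layer example above yields a 10×2 space matrix; treating each layer as one edge is an illustrative simplification, not a requirement of the method:

```python
import numpy as np

# The 10-layer example above: each layer is an edge choosing between two
# kernels (3x3 or 5x5), so the space matrix is 10 x 2 and the full space
# holds 2**10 = 1024 architectures. All-ones means the full search space.
num_layers, num_kernels = 10, 2
space_matrix = np.ones((num_layers, num_kernels), dtype=int)
print(space_matrix.shape, 2 ** num_layers)  # (10, 2) 1024
```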
Step 2, sampling a subspace from the search space, i.e., sampling a subgraph from the directed acyclic graph corresponding to the super-network.
Before sampling, the search space must be evaluated to balance exploration and exploitation. The evaluation follows the UCB (Upper Confidence Bound) principle: a UCB value is computed for each element of the matrix. For each candidate operation, UCB = spatial probability + c·sqrt(ln T / t). The spatial probability comes from an initial value in the first step and from the spatial probability of step 4 thereafter; c is a confidence coefficient, for example 1.44; T is the total number of subspace samplings; and t is the number of times the operation's subspace matrix value has been 1. Each sampling generates a space matrix in which K values are 1 and the rest are 0, and every position holding a 1 is counted once; the accumulated count of a position over several samplings is the number of times its matrix value has been 1. One position here corresponds to one operation. The UCB values of all elements are then sorted and the largest topK are selected. This produces a space matrix of 1s and 0s in which the K 1s correspond to the top-K positions. This space matrix is the subspace matrix, corresponding to the sampled subspace.
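The sampling step can be written as two small helpers; a sketch under the assumption that never-sampled entries (t = 0) should receive an effectively infinite exploration bonus, which the text does not spell out:

```python
import numpy as np

def ucb_values(spatial_prob, t_counts, total_samples, c=1.44):
    # UCB = spatial probability + c * sqrt(ln T / t), element-wise over the
    # space matrix. Entries with t = 0 get a huge bonus so they are explored
    # first (an assumption; the patent does not define the t = 0 case).
    bonus = c * np.sqrt(np.log(max(total_samples, 1)) / np.maximum(t_counts, 1e-12))
    return spatial_prob + bonus

def sample_subspace(ucb, top_k):
    # Keep the top-K entries of the UCB matrix: a 0/1 subspace matrix.
    subspace = np.zeros(ucb.size, dtype=int)
    subspace[np.argsort(ucb.ravel())[-top_k:]] = 1
    return subspace.reshape(ucb.shape)
```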
Step 3, performing differentiable search on the subspace.
Based on the subspace matrix from step 2, the method optimizes only those candidate operations in the super-network whose subspace matrix value is 1, using a back-propagation algorithm. Three kinds of parameters are optimized: weight parameters of the candidate operations (e.g., convolution kernel parameters), parameters of the spatial probability distribution, and parameters of the structural conditional probability distribution given the space. The spatial and structural parameters also form matrices of the same size as the space matrix. The weight parameters drive the super-network toward convergence and in turn influence the spatial and structural parameters.
Step 4, evaluating the importance of the candidate operations and the UCB values of the search space.
The importance is the variable that determines the final network architecture. It, too, forms a matrix, called the importance matrix, of the same size as the space matrix, with every entry a non-negative real number. The entries are updated as follows: if the value at the corresponding position of the subspace matrix is 1, the importance at the i-th iteration = spatial probability × structural conditional probability + importance at the (i-1)-th iteration; otherwise, the importance at the i-th iteration = importance at the (i-1)-th iteration. The spatial probability and structural conditional probability are computed from the spatial and structural parameters of step 3, respectively. The UCB value, in turn, balances exploration and exploitation and guides the next spatial sampling; its formula is given in step 2.
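In matrix form the update is one expression per iteration; the carry-over branch ("otherwise") falls out because unsampled positions add zero. A sketch, with q computed as the per-edge softmax over sampled operations described in step 3:

```python
import numpy as np

def update_importance(importance, subspace, zeta, alpha):
    # Where the subspace matrix is 1: add p_k * q_k; elsewhere: carry over.
    p = 1.0 / (1.0 + np.exp(-zeta))  # spatial probability, per (edge, op)
    masked = np.where(subspace == 1,
                      np.exp(alpha - alpha.max(axis=1, keepdims=True)), 0.0)
    q = masked / np.clip(masked.sum(axis=1, keepdims=True), 1e-12, None)
    return importance + subspace * p * q
```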
In a second aspect, the present invention provides an apparatus for dynamic differentiable space and architecture search, comprising the following modules:
A. a super-network generation module that builds a weight-sharing super-network for the determined search space. The super-network is a directed acyclic graph representable by a space matrix whose rows are edges and whose columns are candidate operations. Each value is 0 or 1: 1 indicates that the operation on the edge is a candidate operation of the search space, 0 that it is not. The matrix is initialized to all 1s, representing the full space.
B. a subspace sampling module that explores and exploits the search space, extracting for search the subspace with the greatest expected payoff, i.e., the one with the largest UCB values.
C. a differentiable optimization module that updates the weight, spatial and structural parameters by gradient.
D. an importance evaluation module that computes and updates the importance of each candidate operation.
E. a network architecture selection module that determines the final network architecture according to the importance.
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
Example 1
FIG. 1 shows the dynamic differentiable space and architecture search method provided by the present invention, comprising the following steps:
s11: and determining a search space, generating a weight-shared super network, and constructing a matrix for a directed acyclic graph of the super network.
In an embodiment of this step: as shown in FIG. 2, the super-network is formed by stacking n layers of cells, of which there are two types, normal cells and down-sampling (reduction) cells; the reduction cells are located at the n/3-th and 2n/3-th layers of the network, and normal cells occupy the remaining layers. The interior of a cell is a directed acyclic graph with v nodes and e edges, each edge carrying m candidate operations, such as the zero operation, convolutions and pooling. Determining the cell topology and the candidate operation types therefore determines the search space. Through weight sharing, a cell contains all searchable structures that make up the complete unit. The whole search space is likewise represented in matrix form, called the space matrix, whose rows represent edges and whose columns represent candidate operations. Each value is 1 or 0: 1 indicates that the operation on the edge is a candidate operation of the search space, 0 that it is not. The matrix is initialized to all 1s, representing the full space. In FIG. 2, a cell consists of 4 nodes with 6 edges and 3 candidate operations, so the matrix size is 6×3. Note that since the super-network is a stack of the two cell types, there is no need to build a matrix for the whole super-network; it suffices to build one matrix for each cell type.
S12: sample a subspace from the search space, i.e., sample a subgraph from the directed acyclic graph corresponding to the super-network, and perform topK selection based on the UCB (Upper Confidence Bound) values.
In an embodiment of this step: as shown in FIG. 3, the search space is evaluated before sampling to balance exploration and exploitation; the evaluation is based on the UCB (Upper Confidence Bound) value computed for each element of the matrix. The UCB value comes from an initial value at first and from the result of S14 thereafter. The UCB values of all elements are then sorted and the largest topK are selected, generating a space matrix of 1s and 0s in which the number of 1s is K, at the positions of the top K values. This space matrix is here called the subspace matrix; it represents a subgraph and corresponds to a subspace. Note that K in FIG. 3 is 12, and the UCB values shown are illustrative only, not necessarily values the method would produce.
S13: perform differentiable search on the subspace, differentiably optimizing the weight parameters, spatial parameters and structural parameters.
In an embodiment of this step: given the subspace matrix from S12, the invention optimizes only the candidate operations in the super-network whose subspace matrix value is 1. The optimization uses a back-propagation algorithm over three kinds of parameters: weight parameters of the candidate operations (e.g., convolution kernel parameters), parameters of the spatial probability distribution, and parameters of the structural conditional probability distribution given the space. More precisely, let O be the set of candidate operations, o ∈ O a specific operation, and ō the mixed operation; ω, ζ, α are the weight, spatial and structural parameters, and k denotes the k-th operation. The mixed operation between two nodes is computed as

ō(x) = Σ_{k=1..n} Z_k · p_k · q_k · o_k(x)

where x is a feature map and p_k, q_k denote the spatial probability of the k-th operation and the conditional structure probability within this space, computed as

p_k = sigmoid(ζ_k)
q_k = I(Z_k = 1) · exp(α_k) / Σ_{k'=1..n} I(Z_{k'} = 1) · exp(α_{k'})

Z_k is the subspace matrix value of the k-th operation between the nodes, k' is an index, and n is the number of operations. Forward propagation of the network is thus determined, and the three kinds of parameters are then updated by the back-propagation algorithm.
S14: evaluate the importance of the candidate operations and the UCB values of the search space; the importance is computed from the spatial and structural parameters optimized in S13, and the UCB value from the spatial parameters.
In an embodiment of this step: the importance is the variable used to determine the final network architecture; its matrix has the same size as the space matrix, and every entry is a non-negative real number. The entries are updated as follows: if the value at the corresponding position of the subspace matrix is 1, the importance at the i-th iteration = spatial probability × structural conditional probability + importance at the (i-1)-th iteration; otherwise, the importance at the i-th iteration = importance at the (i-1)-th iteration. The spatial probability and structural conditional probability are computed from the spatial and structural parameters, respectively, as detailed in S13. Meanwhile, a UCB value is computed for each candidate operation as UCB = spatial probability + c·sqrt(ln T / t), where c is a confidence coefficient, T is the total number of subspace samplings, and t is the number of times the operation's subspace matrix value has been 1.
S15: judge whether the algorithm has reached the specified number of iterations; if not, jump to S12; if so, jump to S16.
S16: select the final network architecture according to the importance of the candidate operations, choosing the operation with the greatest importance on each edge.
In an embodiment of this step: the selection strategy picks, on each edge, the candidate operation with the greatest importance, so that the selected operations form the final network architecture. This step is implemented as shown in FIG. 4; note that the values of the importance matrix there are illustrative only, not necessarily values the method would produce.
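The selection itself is a per-edge argmax over the importance matrix; the values below are toy numbers in the spirit of FIG. 4:

```python
import numpy as np

importance = np.array([[0.2, 0.7, 0.1],   # toy importance values for
                       [0.5, 0.1, 0.4],   # three edges and three ops
                       [0.3, 0.3, 0.9]])
chosen_ops = importance.argmax(axis=1)    # greatest importance per edge (row)
print(chosen_ops)                         # [1 0 2]
```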
Example 2
The embodiment of the invention further provides an apparatus for dynamic differentiable space and architecture search, comprising: a super-network generation module 21, a subspace sampling module 22, a differentiable optimization module 23, an evaluation module 24 and a network architecture selection module 25.
The super-network generation module 21 integrates all network structures to be searched into a super-network through weight sharing; the subspace sampling module 22 selects a subspace by topK over the UCB values; the differentiable optimization module 23 differentiably optimizes the weight, structural and spatial parameters in the generated sub-network; the evaluation module 24 evaluates the candidate-operation importance and the UCB values, the former used for network architecture selection and the latter to guide the next subspace sampling; and the network architecture selection module 25 selects the final network architecture according to the importance.
The following are system embodiments corresponding to the above method embodiments, and this embodiment can be implemented in cooperation with the embodiments above. The related technical details mentioned in the above embodiments remain valid here and are not repeated in order to reduce redundancy; conversely, the related technical details of this embodiment can also be applied to the above embodiments.
The invention also provides a dynamic differentiable space and architecture search system, comprising:
a module 1 for obtaining a search space by assembling a plurality of neural network architectures to be searched, generating a super-network by sharing weights across all neural network architectures in the search space, and representing the directed acyclic graph of the super-network as a matrix to obtain a space matrix, wherein the rows of the space matrix represent the edges of the graph and the columns represent candidate operations;
a module 2 for computing the upper confidence bound value of each element of the space matrix, sorting the upper confidence bound values of all elements, and selecting the largest topK to generate a subspace matrix, the subspace matrix corresponding to a sampled subspace;
a module 3 for optimizing the weight parameters, spatial parameters and structural parameters of the candidate operations in the subspace matrix using a back-propagation algorithm;
a module 4 for obtaining the importance of all candidate operations of each edge in the subspace matrix according to the spatial parameters and the structural parameters to construct an importance matrix, and judging whether the current number of iterations has reached the preset number; if not, module 2 runs again until it has, and the candidate operation with the greatest importance on each edge is then selected according to the importance matrix to form the final neural network architecture.
In this dynamic differentiable space and architecture search system, the super-network comprises the neural network's number of layers, number of channels, stride, topology, and the candidate operation information between nodes.
In this system, each element of the space matrix takes the value 1 or 0, indicating whether the operation on that edge is a candidate operation of the search space.
In this system, the importance of all candidate operations on each edge in module 4 is computed as follows:
if the value at the corresponding position of the subspace matrix is 1, the importance at the i-th iteration = spatial probability × structural conditional probability + importance at the (i-1)-th iteration; otherwise, the importance at the i-th iteration = importance at the (i-1)-th iteration, where the spatial probability is given by the spatial parameter and the structural conditional probability by the structural parameter;
the specific process of computing the upper confidence bound value in module 2 is as follows:
an upper confidence bound value is computed for each element of the space matrix as UCB = spatial probability + c·sqrt(ln T / t), where UCB is the upper confidence bound value; the spatial probability comes from an initial value at the first iteration and from the spatial parameters of module 3 thereafter; c is a confidence coefficient; T is the total number of subspace samplings; and t is the number of times the subspace matrix value of the operation has been 1;
the module 3 optimization process specifically comprises the following steps:
the set of candidate operations in the subspace matrix is O, O representing a particular operation,
Figure BDA0002777885320000101
representing the blending operation, ω, ζ, α are weight, spatial, structural parameters, k is used to represent the kth operation, and the blending operation between two nodes is calculated as
Figure BDA0002777885320000102
Wherein p isk,qkRespectively representing the spatial probability of the k-th operation and the conditional structure probability in this space, pk=sigmoid(ζk),
Figure BDA0002777885320000103
Figure BDA0002777885320000104
ZkAnd obtaining weight, space and structure parameters for the subspace matrix value of the kth operation among the nodes through back propagation updating.

Claims (10)

1. A method for dynamic differentiable space and architecture search, characterized by comprising:
step 1, obtaining a search space by assembling a plurality of neural network architectures to be searched, generating a super-network by sharing weights across all neural network architectures in the search space, and representing the directed acyclic graph of the super-network as a matrix to obtain a space matrix, wherein the rows of the space matrix represent the edges of the graph and the columns represent candidate operations;
step 2, computing an upper confidence bound value for each element of the space matrix, sorting the upper confidence bound values of all elements, and selecting the largest topK to generate a subspace matrix, the subspace matrix corresponding to a sampled subspace;
step 3, optimizing the weight parameters, spatial parameters and structural parameters of the candidate operations in the subspace matrix using a back-propagation algorithm;
step 4, obtaining the importance of all candidate operations of each edge in the subspace matrix according to the spatial parameters and the structural parameters to construct an importance matrix, judging whether the current number of iterations has reached a predetermined number and, if not, executing step 2 again until it has, and selecting, according to the importance matrix, the candidate operation with the greatest importance on each edge to form the final neural network architecture.

2. The method for dynamic differentiable space and architecture search according to claim 1, wherein the super-network comprises the neural network's number of layers, number of channels, stride, topology and the candidate operation information between nodes, and the number of iterations of the back-propagation optimization in step 3 is M, used to control the degree of convergence of the parameters in the subspace.

3. The method for dynamic differentiable space and architecture search according to claim 1, wherein each element of the space matrix has the value 1 or 0, indicating whether the operation on the edge is a candidate operation of the search space.

4. The method for dynamic differentiable space and architecture search according to claim 1, wherein the importance of all candidate operations of each edge in step 4 is: if the value at the corresponding position of the subspace matrix is 1, the importance at the i-th iteration = spatial probability × structural conditional probability + the importance at the (i-1)-th iteration; otherwise, the importance at the i-th iteration = the importance at the (i-1)-th iteration; wherein the spatial probability is given by the spatial parameter and the structural conditional probability by the structural parameter.

5. The method for dynamic differentiable space and architecture search according to claim 1, wherein the specific process of computing the upper confidence bound value in step 2 is: computing an upper confidence bound value for each element of the space matrix as UCB = spatial probability + c·sqrt(ln T / t), wherein UCB is the upper confidence bound value; the spatial probability comes from an initial value at the first iteration and from the spatial parameters of step 3 thereafter; c is a confidence coefficient; T is the total number of subspace samplings; and t is the number of times the subspace matrix value of the operation has been 1.

6. The method for dynamic differentiable space and architecture search according to claim 1, wherein the optimization process of step 3 is specifically: the set of candidate operations in the subspace matrix is O, o denotes a specific operation and ō the mixed operation; ω, ζ, α are the weight, spatial and structural parameters, and k denotes the k-th operation; the mixed operation between two nodes is computed as ō(x) = Σ_{k=1..n} Z_k · p_k · q_k · o_k(x), wherein p_k and q_k respectively denote the spatial probability of the k-th operation and the conditional structure probability within this space, p_k = sigmoid(ζ_k), q_k = I(Z_k = 1) · exp(α_k) / Σ_{k'=1..n} I(Z_{k'} = 1) · exp(α_{k'}), Z_k is the subspace matrix value of the k-th operation between the nodes, and the weight, spatial and structural parameters are obtained through back-propagation updates.

7. A system for dynamic differentiable space and architecture search, characterized by comprising:
a module 1 for obtaining a search space by assembling a plurality of neural network architectures to be searched, generating a super-network by sharing weights across all neural network architectures in the search space, and representing the directed acyclic graph of the super-network as a matrix to obtain a space matrix, wherein the rows of the space matrix represent the edges of the graph and the columns represent candidate operations;
a module 2 for computing an upper confidence bound value for each element of the space matrix, sorting the upper confidence bound values of all elements, and selecting the largest topK to generate a subspace matrix, the subspace matrix corresponding to a sampled subspace;
a module 3 for optimizing the weight parameters, spatial parameters and structural parameters of the candidate operations in the subspace matrix using a back-propagation algorithm;
a module 4 for obtaining the importance of all candidate operations of each edge in the subspace matrix according to the spatial parameters and the structural parameters to construct an importance matrix, judging whether the current number of iterations has reached a predetermined number and, if not, running module 2 again until it has, and selecting, according to the importance matrix, the candidate operation with the greatest importance on each edge to form the final neural network architecture.

8. The system for dynamic differentiable space and architecture search according to claim 7, wherein the super-network comprises the neural network's number of layers, number of channels, stride, topology and the candidate operation information between nodes, and the number of iterations of the back-propagation optimization in module 3 is M, used to control the degree of convergence of the parameters in the subspace.

9. The system for dynamic differentiable space and architecture search according to claim 7, wherein each element of the space matrix has the value 1 or 0, indicating whether the operation on the edge is a candidate operation of the search space.

10. The system for dynamic differentiable space and architecture search according to claim 7, wherein the importance of all candidate operations of each edge in module 4 is: if the value at the corresponding position of the subspace matrix is 1, the importance at the i-th iteration = spatial probability × structural conditional probability + the importance at the (i-1)-th iteration; otherwise, the importance at the i-th iteration = the importance at the (i-1)-th iteration; wherein the spatial probability is given by the spatial parameter and the structural conditional probability by the structural parameter;
the specific process of computing the upper confidence bound value in module 2 is: computing an upper confidence bound value for each element of the space matrix as UCB = spatial probability + c·sqrt(ln T / t), wherein UCB is the upper confidence bound value; the spatial probability comes from an initial value at the first iteration and from the spatial parameters of module 3 thereafter; c is a confidence coefficient; T is the total number of subspace samplings; and t is the number of times the subspace matrix value of the operation has been 1;
the optimization process of module 3 is specifically: the set of candidate operations in the subspace matrix is O, o denotes a specific operation and ō the mixed operation; ω, ζ, α are the weight, spatial and structural parameters, and k denotes the k-th operation; the mixed operation between two nodes is computed as ō(x) = Σ_{k=1..n} Z_k · p_k · q_k · o_k(x), wherein p_k and q_k respectively denote the spatial probability of the k-th operation and the conditional structure probability within this space, p_k = sigmoid(ζ_k), q_k = I(Z_k = 1) · exp(α_k) / Σ_{k'=1..n} I(Z_{k'} = 1) · exp(α_{k'}), Z_k is the subspace matrix value of the k-th operation between the nodes, and the weight, spatial and structural parameters are obtained through back-propagation updates.
CN202011271696.3A 2020-11-13 2020-11-13 A Dynamically Differentiable Spatial Architecture Search Method and System Active CN112801264B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011271696.3A CN112801264B (en) 2020-11-13 2020-11-13 A Dynamically Differentiable Spatial Architecture Search Method and System

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011271696.3A CN112801264B (en) 2020-11-13 2020-11-13 A Dynamically Differentiable Spatial Architecture Search Method and System

Publications (2)

Publication Number Publication Date
CN112801264A (en) 2021-05-14
CN112801264B CN112801264B (en) 2023-06-13

Family

ID=75806168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011271696.3A Active CN112801264B (en) 2020-11-13 2020-11-13 A Dynamically Differentiable Spatial Architecture Search Method and System

Country Status (1)

Country Link
CN (1) CN112801264B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115034A (en) * 2022-06-30 2022-09-27 深圳市腾讯计算机系统有限公司 Image processing network generation, image classification method and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851566A (en) * 2019-11-04 2020-02-28 沈阳雅译网络技术有限公司 An Improved Differentiable Network Structure Search Method
WO2020046719A1 (en) * 2018-08-31 2020-03-05 D5Ai Llc Self-supervised back propagation for deep learning
US20200082275A1 (en) * 2018-09-10 2020-03-12 Fujitsu Limited Neural network architecture search apparatus and method and computer readable recording medium
US20200104688A1 (en) * 2018-09-27 2020-04-02 Swisscom Ag Methods and systems for neural architecture search
CN111767983A (en) * 2020-05-29 2020-10-13 中国科学院大学 Discretized Differentiable Neural Network Search Method Based on Entropy Loss Function

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020046719A1 (en) * 2018-08-31 2020-03-05 D5Ai Llc Self-supervised back propagation for deep learning
US20200082275A1 (en) * 2018-09-10 2020-03-12 Fujitsu Limited Neural network architecture search apparatus and method and computer readable recording medium
US20200104688A1 (en) * 2018-09-27 2020-04-02 Swisscom Ag Methods and systems for neural architecture search
CN110851566A (en) * 2019-11-04 2020-02-28 沈阳雅译网络技术有限公司 An Improved Differentiable Network Structure Search Method
CN111767983A (en) * 2020-05-29 2020-10-13 中国科学院大学 Discretized Differentiable Neural Network Search Method Based on Entropy Loss Function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"自动化所智能感知与计算研究中心团队提出多自由度网络架构协同搜索新方法" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115034A (en) * 2022-06-30 2022-09-27 深圳市腾讯计算机系统有限公司 Image processing network generation, image classification method and related device
CN115115034B (en) * 2022-06-30 2025-01-24 深圳市腾讯计算机系统有限公司 Image processing network generation, image classification method and related device

Also Published As

Publication number Publication date
CN112801264B (en) 2023-06-13

Similar Documents

Publication Publication Date Title
CN119597493B (en) Intelligent evolution method and system of distributed computing resources based on digital twins
CN114626506B (en) A neural network unit structure search method and system based on attention mechanism
CN113283426A (en) Embedded target detection model generation method based on multi-target neural network search
CN113780146B (en) Hyperspectral image classification method and system based on lightweight neural architecture search
CN113780542B (en) A construction method of multi-objective network structure for FPGA
CN112381208A (en) Neural network architecture searching method and system with gradual depth optimization
CN119150925B (en) Generative adversarial network architecture search method and system based on hybrid convolution operation
CN109951392B (en) An intelligent routing method for medium and large networks based on deep learning
CN115062759B (en) A Fault Diagnosis Method Based on Improved Long Short-Term Memory Neural Network
CN113095501A (en) Deep reinforcement learning-based unbalanced classification decision tree generation method
CN119474644A (en) A small sample regression prediction method for tunnel blasting effect
CN119180305A (en) Multi-target neural architecture searching method and system based on gradient similarity super network
CN119167985A (en) Graph neural network pre-training method and system based on community structure commonality
WO2020248440A1 (en) Machine learning method and apparatus
CN112270058A (en) Optical network multi-channel transmission quality prediction method based on echo state network
CN112801264A (en) Dynamic differentiable space architecture searching method and system
CN118917353B (en) Evolutionary neural architecture search method and system based on performance-level agent assistance
CN119886226B (en) Neural architecture searching method based on diffusion evolution algorithm
CN120163113A (en) A wafer-level chip system design space construction and fast parameter search method
Wan et al. RSSM-Net: Remote sensing image scene classification based on multi-objective neural architecture search
CN119356848A (en) Application-aware configuration parameter self-evolution tuning method and system
CN119066944A (en) Spacecraft attitude control parameter optimization method considering dynamic partitioning and sorting model assistance
CN119988236B (en) Intelligent equipment software testing method based on convolutional neural network combined with genetic algorithm
Wang et al. Facilitating hardware-aware neural architecture search with learning-based predictive models
CN117689865A (en) Target detection method and system based on feature and fusion mode search

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant