
WO2020237688A1 - Method and device for searching network structure, computer storage medium and computer program product - Google Patents

Method and device for searching network structure, computer storage medium and computer program product Download PDF

Info

Publication number
WO2020237688A1
Authority
WO
WIPO (PCT)
Prior art keywords
network structure
training
parameters
search
search space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2019/089697
Other languages
French (fr)
Chinese (zh)
Inventor
蒋阳
庞磊
胡湛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Priority to CN201980009276.7A priority Critical patent/CN111684472A/en
Priority to PCT/CN2019/089697 priority patent/WO2020237688A1/en
Publication of WO2020237688A1 publication Critical patent/WO2020237688A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • This application relates to the field of machine learning, and in particular to a method and device for network structure search, computer storage media, and computer program products.
  • the embodiments of the present application provide a method and device for searching a network structure, a computer storage medium, and a computer program product.
  • the network structure search method in the implementation manner of this application includes:
  • a search-space definition step: determining the search space of the neural network model whose network structure is to be searched, the search space defining a variety of operations on the operation layer between every two nodes in the convolutional neural network;
  • a pre-training step: training the general map of the search space according to the first network structure with preset parameters of the first network structure, to obtain the general map with pre-training parameters, the general map being composed of the operations;
  • a training step: training the general map with the pre-training parameters according to the first network structure, and updating the first network structure according to the feedback amount of the first network structure.
  • The network structure search device of an embodiment of the present application includes a processor and a memory, the memory storing one or more programs. The processor is configured to define a search space: determine the search space of the neural network model whose network structure is to be searched, the search space defining a variety of operations on the operation layer between every two nodes in the convolutional neural network; to perform pre-training: train the general map of the search space according to the first network structure with the preset parameters of the first network structure to obtain the general map with pre-training parameters, the general map being composed of the operations; and to perform training: train the general map with the pre-training parameters according to the first network structure and update the first network structure according to the feedback amount of the first network structure.
  • the computer storage medium of the embodiment of the present application stores a computer program thereon, and when the computer program is executed by a computer, the computer executes the above-mentioned method.
  • A computer program product according to an embodiment of the present application contains instructions that, when executed by a computer, cause the computer to execute the above-mentioned method.
  • In the method and device for network structure search, the computer storage medium, and the computer program product of the embodiments of this application, before the general map and the first network structure are optimized, the general map is first pre-trained with the preset parameters of the fixed first network structure, so that the general map with pre-training parameters is fully trained. After the pre-training is completed, the parameters of the first network structure are released to train the general map and the first network structure together, thereby optimizing both the general map and the first network structure, avoiding the bias caused by optimizing the first network structure when training from scratch, improving the credibility of the first network structure, and ensuring that the searched model is globally optimal.
  • FIG. 1 is a schematic flowchart of a method for searching a network structure according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of modules of a network structure search device according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the principle of a method for searching a network structure in related technologies
  • FIG. 4 is a schematic diagram of a general diagram of a network structure search method according to an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a network structure search method according to another embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a network structure search method according to another embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a network structure search method according to another embodiment of the present application.
  • FIG. 8 is a schematic diagram of the principle of a network structure search method according to another embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a network structure search method according to still another embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a network structure search method according to still another embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a method for searching a network structure according to still another embodiment of the present application.
  • FIG. 12 is a schematic diagram of the penalty effect of the network structure search method in the embodiment of the present application.
  • Description of main reference numerals: network structure search device 10, memory 102, processor 104, communication interface 106.
  • first and second are only used for description purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, the features defined with “first” and “second” may explicitly or implicitly include one or more of the features. In the description of this application, “multiple” means two or more than two, unless otherwise specifically defined.
  • connection should be interpreted broadly unless otherwise clearly specified and limited.
  • it can be a fixed connection or a detachable connection.
  • Connected or integrally connected it can be mechanically connected, or electrically connected or can communicate with each other; it can be directly connected, or indirectly connected through an intermediate medium, it can be the internal communication of two components or the interaction of two components relationship.
  • an embodiment of the present application provides a method and device 10 for searching a network structure.
  • the network structure search method in the implementation manner of this application includes:
  • Step S12 Determine the search space of the neural network model to be searched for the network structure, the search space defines a variety of operations on the operation layer between every two nodes in the neural network model;
  • Pre-training step S14: training the general map (whole graph) of the search space according to the first network structure with preset parameters of the first network structure, to obtain the general map with pre-training parameters, the general map being composed of the operations;
  • Training step S16 training the general map with pre-training parameters according to the first network structure and updating the first network structure according to the feedback amount (ACC) of the first network structure.
  • the network structure search apparatus 10 of the embodiment of the present application includes a processor 104 and a memory 102.
  • the memory 102 stores one or more programs.
  • When the programs are executed by the processor, the processor 104 is configured to perform the search-space definition step: determine the search space of the neural network model whose network structure is to be searched, the search space defining a variety of operations on the operation layer between every two nodes in the neural network model; to perform the pre-training step: train the general map of the search space according to the first network structure with the preset parameters of the first network structure to obtain the general map with pre-training parameters, the general map being composed of the operations; and to perform the training step: train the general map with the pre-training parameters according to the first network structure and update the first network structure according to the feedback amount of the first network structure.
  • the network structure search method of the embodiment of the present application can be implemented by the network structure search apparatus 10 of the embodiment of the present application.
  • step S12, step S14, and step S16 may be implemented by the processor 104.
  • In this way, before the general map and the first network structure are optimized, the general map is pre-trained with the preset parameters of the fixed first network structure, so that the general map with pre-training parameters is fully trained. After the pre-training is completed, the parameters of the first network structure are released to train the general map and the first network structure together, thereby optimizing both, and avoiding the bias caused by optimizing the first network structure when training from scratch.
  • the number of processors 104 may be one.
  • the number of processors 104 may also be multiple, such as 2, 3, 5, or other numbers.
  • the steps of the network structure search method of the present application may be executed by different processors 104, for example, different processors 104 may execute step S14 and step S16 respectively.
  • the network structure search apparatus 10 may further include a communication interface 106 for outputting data processed by the network structure search apparatus 10, and/or input data to be processed by the network structure search apparatus 10 from an external device .
  • the processor 104 is used to control the communication interface 106 to input and/or output data.
  • network structure search is a technology that uses algorithms to automatically design neural network models.
  • the network structure search is to search out the structure of the neural network model.
  • In some embodiments, the neural network model whose network structure is to be searched is a convolutional neural network (CNN).
  • The problem to be solved by network structure search is to determine the operations between nodes in the neural network model; different combinations of operations between nodes correspond to different network structures. The nodes in the neural network model can be understood as the feature layers of the model, and the operation between two nodes refers to the operation required to transform the feature data on one node into the feature data on the other node. The operations mentioned in this application may be convolution operations, pooling operations, fully connected operations, or other neural network operations. The operations between two nodes can be considered to constitute the operation layer between those two nodes. Generally, there are multiple searchable operations, that is, multiple candidate operations, on the operation layer between two nodes, and the purpose of network structure search is to determine one operation for each operation layer.
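As an illustration only, a search space of this kind can be written down as a list of candidate operations per operation layer. The operation names and the layer count below are hypothetical, not taken from the application:

```python
# A minimal sketch of a search space: each operation layer between two nodes offers
# the same set of candidate operations, and the search has to pick one per layer.
# The operation names and the number of layers are illustrative assumptions only.
CANDIDATE_OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "sep_conv5x5",
                 "avg_pool3x3", "max_pool3x3"]
NUM_LAYERS = 5  # operation layers between nodes

search_space = [list(CANDIDATE_OPS) for _ in range(NUM_LAYERS)]

# One concrete network structure (a sub-graph of the general map) is one choice per layer:
example_structure = ["conv3x3", "sep_conv5x5", "max_pool3x3", "conv5x5", "sep_conv3x3"]
assert all(op in layer_ops for op, layer_ops in zip(example_structure, search_space))
```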
  • After the NAS establishes the search space, it usually uses the first network structure to sample a second network structure in the search space, then trains the second network structure to convergence to determine the feedback amount, and finally uses the feedback amount to update the first network structure.
  • The idea of NAS is to obtain a network structure in the search space through the first network structure, obtain the accuracy rate R of that network structure, and use the accuracy rate R as feedback to update the first network structure; the first network structure is then optimized further to obtain another network structure, and this is repeated until the best result is obtained.
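A rough sketch of that loop is given below; the controller interface (sample/update) and the helpers train_to_convergence and evaluate_accuracy are hypothetical placeholders, not an API described in the application:

```python
# Rough sketch of the basic NAS loop described above. The trainer and evaluator are
# passed in as placeholders; they stand for "train the sampled structure to convergence"
# and "measure its accuracy R", both of which are expensive in plain NAS.
def nas_search(controller, search_space, train_to_convergence, evaluate_accuracy, num_rounds):
    best_structure, best_acc = None, 0.0
    for _ in range(num_rounds):
        structure = controller.sample(search_space)   # sample a network structure
        model = train_to_convergence(structure)       # train it (to convergence, in plain NAS)
        acc = evaluate_accuracy(model)                # accuracy R
        controller.update(structure, reward=acc)      # feedback updates the first network structure
        if acc > best_acc:
            best_structure, best_acc = structure, acc
    return best_structure
```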
  • In some embodiments, the first network structure is constructed from a recurrent neural network (RNN). It can be understood that the first network structure may also be constructed from a convolutional neural network (CNN) or a long short-term memory (LSTM) network.
  • As shown in Fig. 4, in efficient network structure search based on weight sharing, the candidate operations are connected into a general map, and the optimal structure finally searched out is one of the sub-graphs of the general map. In the example of Fig. 4, the general map is formed by the operations between nodes, and the connection marked with bold edges is a sub-graph of the general map, namely the optimal structure.
  • ENAS adopts a weight-sharing strategy: after a network structure is sampled, it is no longer trained directly to convergence; instead, only one batch is trained. After multiple iterations, the general map can finally converge. Note that convergence of the general map is not equivalent to convergence of an individual network structure.
  • After that, the parameters of the general map can be fixed and the first network structure is trained: the general map is sampled to obtain a second network structure, and predictions are made with the second network structure to train the parameters of the first network structure and obtain the feedback amount of the first network structure, thereby updating the first network structure.
  • Efficient network structure search based on weight sharing saves time and improves the efficiency of network structure search, because the parameters that can be shared are shared each time a network structure is searched. For example, in the example of Fig. 4, if node 1, node 3 and node 6 were searched previously and the resulting network structure was trained, and node 1, node 2, node 3 and node 6 are searched this time, then the relevant parameters of the network structure trained when node 1, node 3 and node 6 were searched can be applied to the training of the network structure searched this time. In this way, efficiency is improved through weight sharing.
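A toy illustration of that sharing follows, assuming shared parameters are simply keyed by (layer, operation); the dictionary storage and the operation names are assumptions made here for clarity:

```python
# Weight sharing in miniature: every candidate operation on every layer owns one
# persistent parameter entry in the general map. Any sampled sub-graph reuses (and
# keeps training) exactly those entries, so later structures inherit earlier training.
shared_params = {}  # (layer_index, op_name) -> parameters (a plain float stands in here)

def get_params(layer_index, op_name):
    return shared_params.setdefault((layer_index, op_name), 0.0)

structure_a = ["conv3x3", "sep_conv5x5", "conv3x3"]
structure_b = ["conv3x3", "max_pool3x3", "conv3x3"]
params_a = [get_params(i, op) for i, op in enumerate(structure_a)]
params_b = [get_params(i, op) for i, op in enumerate(structure_b)]
# Layers 0 and 2 map to the same dictionary entries, so whatever training updated them
# for structure_a is immediately reused when structure_b is sampled.
```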
  • ENAS can increase the efficiency of NAS by more than 1000 times.
  • However, the searched network often has a large bias; that is, the network structure searched through ENAS always tends toward operations with a larger kernel size. For example, the first network structure will always search conv5*5. As a result, the searched model has more parameters. Even if computing power is sufficient and there is no restriction on the model parameters, a model with more parameters is easy to overfit, its generalization ability decreases, and it is difficult to debug and train.
  • In addition, the bias of the first network structure means that the first network structure has converged to a local optimal solution and cannot fully explore the search space. Such a first network structure does not have high credibility and cannot guarantee that the searched model is the global optimum.
  • The above-mentioned problems may be even more serious in some cases, and the first network structure needs to be tuned very finely so that it does not converge to a local optimal solution; otherwise, the problem invalidates the ENAS framework. But finely tuning the first network structure makes ENAS very cumbersome and defeats the original intention of AutoML, making it difficult to use. Moreover, fine tuning cannot guarantee a better first network structure.
  • In the method and device 10 for network structure search of the embodiments of the present application, before the general map and the first network structure are optimized, the general map is pre-trained with the preset parameters of the fixed first network structure, so that the general map with pre-training parameters is fully trained. After the pre-training is completed, the parameters of the first network structure are released to train the general map and the first network structure, thereby optimizing both, avoiding the bias caused by optimizing the first network structure when training from scratch, improving the credibility of the first network structure, and ensuring that the searched model is globally optimal.
  • the neural network structure obtained from the network structure search can generally be trained and verified through sample data.
  • the sample data includes verification samples and training samples.
  • the verification samples can be used to verify whether the network structure is good or not, and the training samples can be used to train the network structure searched by the network structure search method.
  • the training sample may also include a training set (train) and a test set (valid).
  • the training sample can be divided into a training set and a test set.
  • the training set and the test set may be divided by the user, or divided by the device 10 for searching the network structure.
  • the processor 104 may be used to divide the training sample into a training set and a test set.
  • the first network structure is constructed by LSTM.
  • The training set is used to train the parameters of the searched structure, for example the parameters of operations such as conv3*3 and sep5*5, while the cell of the LSTM is used to search the network structure and the LSTM parameters are trained on the test set.
  • the training set is used to train the parameters of the search structure
  • the test set is used to train the parameters of the LSTM.
  • the verification sample is used to verify whether the network structure searched after training is good.
  • the number of training samples is 10, and the training samples are divided into a training set of 8 and a test set of 2, and the training set of 8 is used to train the searched network structure.
  • a test set of 2 is used to train the LSTM.
  • the verification sample can be used to verify the network structure in this embodiment, that is, the verification sample in this embodiment can remain unchanged.
  • the training set in the training sample can be used in the pre-training step S14 and the training step S16, that is, the pre-training step S14 and the training step S16 can be trained on the same training set.
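For concreteness, one possible sketch of this split is shown below; the 8/2 proportion follows the example above, while the function name and the use of plain Python lists are assumptions:

```python
# Splitting the sample data as described above: the training samples are divided into a
# training set (used to train the parameters of the sampled sub-graphs / general map)
# and a test set (used to train the LSTM, i.e. the first network structure), while the
# verification samples are kept aside to check the finally searched structure.
def split_training_samples(training_samples, train_ratio=0.8):
    n_train = int(len(training_samples) * train_ratio)
    train_set = training_samples[:n_train]   # trains the operation (sub-graph) parameters
    test_set = training_samples[n_train:]    # trains the first network structure (LSTM)
    return train_set, test_set

samples = list(range(10))                               # 10 training samples, as in the example
train_set, test_set = split_training_samples(samples)   # 8 / 2 split
```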
  • step S14 includes:
  • Step S142 Set the parameters of the first network structure as preset parameters of the first network structure.
  • step S142 may be implemented by the processor 104, that is, the processor 104 may be configured to set the parameters of the first network structure as preset parameters of the first network structure.
  • the parameter setting of the first network structure in the pre-training process is realized.
  • the parameters of the first network structure in the pre-training process are fixed.
  • step S14 further includes:
  • Step S144 sampling an operation at each operation layer of the search space according to the first network structure and the preset parameters of the first network structure to obtain a sub-picture of the overall picture;
  • Step S146 Use a batch of training data of the training set to train the sub-images of the overall image to obtain the overall image with pre-training parameters.
  • Step S144 and step S146 can be implemented by the processor 104. That is, the processor 104 can be used to sample one operation at each operation layer of the search space according to the first network structure with the preset parameters of the first network structure so as to obtain a sub-graph of the general map, and to train that sub-graph of the general map using a batch of training data from the training set so as to obtain the general map with pre-training parameters.
  • the pre-training of the overall image is realized.
  • the first network structure samples operations with fixed preset parameters, so that the probability of sampling each operation is uniformly distributed. That is to say, in the pre-training process, when the first network structure samples corresponding operations with fixed preset parameters, the probability of sampling each operation is equal, and the size of the convolution kernel will not affect the first network structure. The probability of sampling to each operation.
  • step S144 and step S146 are iterated until the total number of first iterations is completed.
  • Steps S144 and S146 pre-train one sub-graph at a time; that is, after each pre-training pass, the parameters of one sub-graph in the general map with pre-training parameters have changed. Iterating steps S144 and S146 therefore allows multiple sub-graphs to be pre-trained. By controlling the total number of first iterations of pre-training, each sub-graph in the general map can be sampled, so that the general map is fully pre-trained, that is, a general map with pre-trained parameters is obtained.
  • the search space has 5 layers, and each layer has 4 optional operations, which is equivalent to a 4X5 graph.
  • Network structure search needs to select an operation at each layer, which is equivalent to path optimization on the graph.
  • step S144 randomly samples an operation in each operation layer of the search space, and then connects the sampled operations to obtain a network structure.
  • This network structure is a sub-graph in the general graph of the search space.
  • In step S146, this network structure is trained on one batch of data from the training set; then step S144 is repeated, another operation is randomly sampled at each layer to obtain another network structure, and in step S146 that network structure is trained on another batch of data from the training set.
  • Specifically, the training set can be divided into multiple batches of training data, and steps S144 and S146 are repeated until the batches of training data in the training set are used up, which completes one iteration (epoch). In this epoch, some sub-graphs of the general map are randomly sampled and those sub-graphs are pre-trained; the general map is then trained in the same way to complete the second epoch, and so on over multiple iterations, so that sufficient pre-training of the general map is achieved.
  • In this embodiment, the total number of first iterations may be 310. It can be understood that, in other embodiments, the total number of first iterations may be 100, 200, or another value.
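Putting the pieces of the pre-training step together, a minimal sketch might look as follows, assuming per-(layer, operation) shared weights and a placeholder batch-training routine; the uniform random choice reflects the fixed preset controller parameters:

```python
import random

# Pre-training (step S14): with the first network structure's parameters fixed at their
# preset values, every candidate operation is sampled with equal probability (uniform
# sampling), the sampled operations are connected into a sub-graph, and that sub-graph is
# trained on one batch. Using up all batches is one epoch; the epoch is repeated for the
# total number of first iterations (310 in this embodiment).
def pretrain_general_map(search_space, train_batches, total_first_iterations,
                         train_subgraph_on_batch):
    for _ in range(total_first_iterations):
        for batch in train_batches:
            # fixed preset controller parameters -> each operation is equally likely
            subgraph = [random.choice(layer_ops) for layer_ops in search_space]
            train_subgraph_on_batch(subgraph, batch)  # updates only this sub-graph's shared weights
```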
  • step S16 includes:
  • Training-the-general-map step S162: training the general map with pre-training parameters according to the first network structure;
  • Training-the-first-network-structure step S164: determining the feedback amount of the first network structure and updating the first network structure according to the feedback amount of the first network structure.
  • Step S162 and step S164 can be implemented by the processor 104; that is, the processor 104 can be used to train the general map with pre-training parameters according to the first network structure, and to determine the feedback amount and update the first network structure according to the feedback amount.
  • step S162 and step S164 are iterated until the total number of second iterations is completed.
  • the overall graph and the first network structure are alternately optimized. Specifically, the total number of second iterations can enable the overall graph and the first network structure to be fully trained.
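The alternation described here can be pictured with the following sketch; the split into two helper calls and the per-pass epoch count are illustrative assumptions:

```python
# Training (step S16): alternate between training the general map's shared weights with
# sub-graphs sampled by the first network structure (S162) and updating the first network
# structure from feedback amounts measured on the test set (S164), for the total number
# of second iterations.
def training_step(controller, train_general_map_one_epoch, train_first_network_structure,
                  total_second_iterations):
    for _ in range(total_second_iterations):        # e.g. 310 in this embodiment
        train_general_map_one_epoch(controller)     # S162: one epoch on the training set
        train_first_network_structure(controller)   # S164: e.g. 50 controller updates per pass
```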
  • step S162 includes:
  • Step S1622 Sampling an operation in each operation layer of the search space according to the first network structure to obtain a sub-graph of the overall graph with pre-training parameters;
  • Step S1624 Use a batch of training data in the training set to train the sampled subgraphs.
  • Step S1622 and step S1624 can be implemented by the processor 104; that is, the processor 104 can be used to sample an operation at each operation layer of the search space according to the first network structure to obtain a sub-graph of the general map with pre-training parameters, and to train the sampled sub-graph using a batch of training data from the training set.
  • As noted above, ENAS adopts a weight-sharing strategy: after a network structure is sampled, it is not trained directly to convergence; instead one batch is trained, and then the first network structure is trained. Note that convergence of the general map is not equivalent to convergence of an individual network structure.
  • The parameters of the first network structure can be released at this time; that is, in step S1622 sampling is no longer performed with the preset parameters of the first network structure, so the probability of sampling each operation may differ.
  • step S1622 and step S1624 can be performed cyclically.
  • Specifically, an operation is sampled at each layer, the sampled operations are connected to obtain a sub-graph, and this sub-graph is trained on one batch of data from the training set; then one operation is sampled again at each layer to obtain another sub-graph, which is trained on another batch of data from the training set; sampling and training continue in this way, one sub-graph and one batch at a time, until the data in the training set is used up, that is, one epoch is completed.
  • After an epoch is completed in the training-the-general-map step S162, the parameters of the general map are fixed and the training-the-first-network-structure step S164 is entered, so that the general map and the first network structure are optimized alternately.
  • It should be noted that the parameters of the general map change as training progresses; that is, within each iteration the parameters of the general map with pre-training parameters keep being updated during the training process until an epoch is completed.
  • In this embodiment, each time one epoch is completed in the training-the-general-map step S162, the parameters of the general map are fixed and the training-the-first-network-structure step S164 is entered. In other embodiments, step S162 may instead fix the parameters of the general map after every 10 epochs before entering step S164. The numbers of epochs completed in each pass of step S162 listed above are only examples and cannot be construed as limiting the application; the number of epochs completed each time in step S162 can be set according to actual conditions and is not specifically limited here.
  • the total number of second iterations is 310. It can be understood that, in other embodiments, the value of the total number of second iterations may be 100 times, 200 times, or other values.
  • the total number of first iterations and the total number of second iterations are the same, while in other embodiments, the total number of first iterations and the total number of second iterations may be different. There is no specific limitation here.
  • Each operation layer of the search space corresponds to a time step of the long short-term memory artificial neural network (LSTM).
  • In each time step, the cell of the long short-term memory artificial neural network outputs a hidden state, and step S1622 includes: mapping the hidden state into a feature vector, the dimension of the feature vector being the same as the number of operations on each operation layer; and sampling an operation at each operation layer according to the feature vector to obtain a sub-graph of the general map.
  • the processor 104 may be used to map the hidden state into a feature vector, and to sample an operation at each operation layer according to the feature vector to obtain a sub-image of the overall image.
  • the dimension of the feature vector is the same as the number of operations on each operation layer.
  • In Fig. 8, the solid arrows represent time steps: time 1 represents the first cell of the LSTM, time 2 represents the second cell, and so on. A square such as conv3×3 represents the operation of the corresponding layer in the model, and a circle represents the connection relationship between operation layers. At time 1, the hidden state output by the cell is used to compute conv3×3; conv3×3 is used as the input of the cell at time 2, and the hidden state output by the cell at time 1 is also used as input of the cell at time 2, from which circle 1 is computed. Circle 1 is then used as the input of the cell at time 3, and the hidden state output by the cell at time 2 is also used as input at time 3, from which the convolution sep5×5 is computed, and so on.
  • steps of sampling an operation at each operation layer according to the feature vector to obtain the network structure include:
  • Specifically, the hidden state output by the cell of the LSTM is encoded, that is, mapped to a vector with a dimension of 6, which is passed through a normalized exponential function (softmax) to become a probability distribution; sampling is performed according to this probability distribution to obtain the operation of the current layer. This is repeated for each layer to finally obtain a network structure.
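A framework-free sketch of that per-layer sampling is given below; the dimension of 6 follows the text above, while the linear encoding weights passed in as a plain list of rows are an assumption:

```python
import math
import random

NUM_OPS = 6  # dimension of the feature vector = number of candidate operations per layer

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample_operation(hidden_state, encoder_weights):
    """hidden_state: list of floats from the LSTM cell; encoder_weights: NUM_OPS rows,
    each the same length as hidden_state (a hypothetical linear encoding)."""
    logits = [sum(w * h for w, h in zip(row, hidden_state)) for row in encoder_weights]
    probs = softmax(logits)                      # normalized exponential function
    op_index = random.choices(range(NUM_OPS), weights=probs, k=1)[0]
    return op_index, probs
```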
  • step S164 the steps of determining the feedback amount include:
  • Step S1642 Sample an operation at each operation layer of the search space according to the first network structure to obtain the second network structure;
  • Step S1644 Use a batch of test data in the test set to predict the second network structure to determine the feedback amount of the first network structure.
  • Step S1642 and step S1644 can be implemented by the processor 104; that is, the processor 104 can be used to sample an operation at each operation layer of the search space according to the first network structure to obtain the second network structure, and to predict with the second network structure using a batch of test data from the test set to determine the feedback amount of the first network structure.
  • the searched second network structure can be predicted on the test set to obtain feedback to update the first network structure according to the aforementioned formula.
  • the LSTM is not directly trained on the test set.
  • step S164 of training the first network structure loops a preset number of times, and step S164 includes:
  • Step S1646: update the first network structure with the feedback amounts of the first network structure determined in each cycle, where the number of feedback amounts of the first network structure determined in each cycle is a preset number.
  • the processor 104 is configured to cyclically train the first network structure for a preset number of times, and is configured to update the first network structure by using the feedback amount determined in each cycle.
  • the number of feedback amounts determined in each cycle is a preset number.
  • In this embodiment, step S16 is looped 50 times; that is to say, the training-the-general-map step S162 and the training-the-first-network-structure step S164 are performed iteratively during training, and each time step S164 is executed, the first network structure is updated 50 times to optimize the first network structure.
  • The training-the-general-map step S162 in the next iteration then uses the most recently updated first network structure to train the general map.
  • the preset number of times may be 10, 20, 30 or other values, and the specific number of the preset times is not limited herein.
  • step S1642 and step S1644 are cyclically performed to obtain a preset number of feedback amounts of the first network structure.
  • the preset number is 20, that is, the feedback amount of the first network structure is 20. It can be understood that in other examples, the preset number may be 10, 15, 25 or other numerical values. The specific value of the preset number is not limited here.
  • In some embodiments, the first network structure is constructed according to a long short-term memory network model, and step S164 is implemented by the following conditional formula:

    \nabla_{\theta_c} J(\theta_c) \approx \frac{1}{m} \sum_{k=1}^{m} \sum_{t=1}^{T} \nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1}; \theta_c) \, R_k

  • where R_k is the k-th feedback amount;
  • \theta_c is the parameter of the long short-term memory network model;
  • P(a_t \mid a_{(t-1):1}; \theta_c) is the probability of sampling operation a_t given the previously sampled operations;
  • m is the total number of feedback amounts;
  • T is the number of hyperparameters predicted by the first network structure.
  • T includes the operation layers and the skip connections (jumpers).
  • other hyperparameters to be optimized may also be included.
  • the specific content of T is not limited here.
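As an illustration of how the formula above is typically turned into code, the following computes a surrogate objective whose gradient with respect to the controller parameters matches the expression; the data layout (lists of per-step log-probabilities) is an assumption here, and an automatic-differentiation framework would supply the actual gradient:

```python
# Surrogate objective for the controller update: differentiating this quantity with
# respect to the controller parameters theta_c gives the policy-gradient formula above.
# sampled_log_probs[k][t] is log P(a_t | a_(t-1):1; theta_c) for the t-th hyperparameter
# of the k-th sampled structure, and feedbacks[k] is the feedback amount R_k.
def controller_objective(sampled_log_probs, feedbacks):
    m = len(feedbacks)
    total = 0.0
    for log_probs_k, r_k in zip(sampled_log_probs, feedbacks):
        total += sum(log_probs_k) * r_k    # sum over the T predicted hyperparameters
    return total / m                       # maximize this (minimize its negative) w.r.t. theta_c
```

With the values used in this embodiment, m would be 20 feedback amounts per update, applied 50 times in each pass of step S164.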
  • the training of the overall graph and the update of the first network structure are performed in multiple iterations, and the total number of second iterations of alternate training of the overall graph and the first network structure is 310 times.
  • That is, step S162 and step S164 are iterated 310 times. In this way, a better second network structure can finally be obtained.
  • the preset number of cycles of step S16 is 50 times.
  • Each time, the first network structure is updated 50 times, that is, the above conditional formula is applied 50 times. It can be understood that setting the number of loops in step S16 to 50 can reduce the optimization randomness caused by sampling.
  • Each time step S16 is looped, 20 second network structures are sampled to obtain 20 feedback amounts, and the 20 feedback amounts are substituted into the above conditional formula as R_k to update the first network structure.
  • the value of m is 20.
  • When the training samples have all been traversed once, one iteration (epoch) is completed.
  • For example, if the number of training samples is 10 and every 2 samples form one batch of the training set used to train a sub-graph, then after 5 batches the training samples are used up and one iteration is completed.
  • step S164 the step of determining the feedback amount includes:
  • Step S1646 Adjust the feedback amount of the first network structure according to the preset penalty model.
  • step S1646 can be implemented by the processor 104, that is, the processor 104 can be used to adjust the feedback amount of the first network structure according to a preset penalty model.
  • Since the pre-training process may not train the general map to full convergence, the feedback amount is adjusted according to the preset penalty model, so as to avoid the bias caused by optimizing the first network structure while the general map has not sufficiently converged, thereby further improving the credibility of the first network structure and ensuring that the searched model is the global optimum.
  • step S1646 includes:
  • Step S1648 Determine the penalty item according to the preset information, the second network structure, the current number of iterations and the total number of second iterations;
  • Step S1649 Adjust the feedback amount according to the penalty item.
  • Step S1648 and step S1649 can be implemented by the processor 104; that is, the processor 104 is used to determine the penalty item according to the preset information, the second network structure, the current number of iterations and the total number of second iterations, and to adjust the feedback amount according to the penalty item.
  • In this way, the feedback amount is adjusted according to the penalty model. It can be understood that, in order to avoid introducing bias into the optimization of the first network structure when the general map has not sufficiently converged during the search, a penalty term is added for operations with larger convolution kernels when processing the feedback amount during prediction.
  • the preset information in step S1648 may be information input in advance by the user, or may be a result input by another calculation model.
  • the source of the preset information is not limited here.
  • step S162 and step S164 are performed iteratively.
  • The total number of second iterations refers to the total number of iterations of step S162 and step S164.
  • the current iteration number refers to how many iterations the current iteration is during the iteration process of step S162 and step S164.
  • the total number of the second iteration is 310. In the first iteration, the current iteration number is 1; in the second iteration, the current iteration number is 2; in the third iteration, the current iteration number is 3...and so on, step S162 and step S164 iterate 310 times and stop the iteration.
  • the total number of second iterations and the current number of iterations in each iteration can also be determined.
  • the total number of second iterations can be set by the user.
  • the current number of iterations can be accumulated as the iteration progresses.
  • the value of the penalty term can be gradually transitioned from 0.76 to 1.
  • the adjusted feedback amount gradually changes from ACC*0.76 to ACC.
  • the size of the penalty item may be related to the degree of convergence of the total graph during pre-training.
  • In some embodiments, the preset information may include the total number of first iterations of pre-training; that is, step S1648 may determine the penalty term based on the total number of first iterations of pre-training, the second network structure, the current number of iterations, and the total number of second iterations.
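One way such a penalty could be realized is sketched below; the linear 0.76-to-1 schedule, the kernel-size test, and every name in the snippet are assumptions made here for illustration rather than details specified by the application:

```python
# Hypothetical penalty model: feedback amounts of structures that use large-kernel
# operations are scaled by a factor that starts at 0.76 and transitions to 1.0 as the
# alternating training progresses, so early (less converged) feedback counts for less.
LARGE_KERNEL_OPS = {"conv5x5", "sep_conv5x5"}  # assumed set of large-kernel operations

def penalized_feedback(acc, structure, current_iteration, total_second_iterations,
                       start_factor=0.76):
    progress = current_iteration / total_second_iterations
    penalty = start_factor + (1.0 - start_factor) * progress  # 0.76 -> 1.0
    if any(op in LARGE_KERNEL_OPS for op in structure):
        return acc * penalty   # adjusted feedback goes from ACC*0.76 toward ACC
    return acc
```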
  • the embodiment of the present application also provides a computer storage medium on which a computer program is stored.
  • the computer program When the computer program is executed by a computer, the computer executes the method of any of the above embodiments.
  • the embodiment of the present application also provides a computer program product containing instructions, which when executed by a computer causes the computer to execute the method of any one of the foregoing embodiments.
  • The computer storage medium and the computer program product of the embodiments of the present application optimize the general map and the first network structure alternately and adjust the feedback amount according to the preset penalty model, which can avoid the bias introduced by optimizing the first network structure when the general map has not fully converged, thereby improving the credibility of the first network structure and ensuring that the searched model is globally optimal.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), etc.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • The division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

Provided is a method for searching a network structure, including: a search-space definition step (step S12): determining the search space of the neural network model whose network structure is to be searched, wherein the search space defines a variety of operations on the operation layer between every two nodes in the neural network model; a pre-training step (step S14): training the general map of the search space according to the first network structure with the preset parameters of the first network structure to obtain the general map with pre-training parameters, wherein the general map is composed of the operations; and a training step (step S16): training the general map with the pre-training parameters according to the first network structure and updating the first network structure according to the feedback amount of the first network structure. The present application also discloses a network structure search device, a computer storage medium and a computer program product.

Description

Method and device for searching network structure, computer storage medium and computer program product

Technical Field

This application relates to the field of machine learning, and in particular to a method and device for network structure search, a computer storage medium, and a computer program product.

Background

In the related art, machine learning algorithms, and deep learning algorithms in particular, have developed rapidly and been widely applied in recent years. As application scenarios and model structures become more and more complex, it becomes increasingly difficult to obtain the optimal model for an application scenario. Efficient Neural Architecture Search via Parameter Sharing (ENAS) can be used to improve the efficiency of Neural Architecture Search (NAS). However, the network structure searched by ENAS often has a large bias, that is, it always tends toward operations with a larger kernel size. This leads to models with more parameters that are difficult to debug and train. In addition, a biased controller means that the controller has converged to a local optimal solution and cannot fully explore the search space. Such a controller does not have high credibility and cannot guarantee that the searched model is the global optimum.

Summary of the Invention

The embodiments of the present application provide a method and device for searching a network structure, a computer storage medium, and a computer program product.

The network structure search method of the embodiments of this application includes:

a search-space definition step: determining the search space of the neural network model whose network structure is to be searched, the search space defining a variety of operations on the operation layer between every two nodes in the convolutional neural network;

a pre-training step: training the general map of the search space according to the first network structure with preset parameters of the first network structure to obtain the general map with pre-training parameters, the general map being composed of the operations;

a training step: training the general map with the pre-training parameters according to the first network structure, and updating the first network structure according to the feedback amount of the first network structure.

The network structure search device of the embodiments of this application includes a processor and a memory, the memory storing one or more programs. The processor is configured to define a search space: determine the search space of the neural network model whose network structure is to be searched, the search space defining a variety of operations on the operation layer between every two nodes in the convolutional neural network; to perform pre-training: train the general map of the search space according to the first network structure with the preset parameters of the first network structure to obtain the general map with pre-training parameters, the general map being composed of the operations; and to perform training: train the general map with the pre-training parameters according to the first network structure and update the first network structure according to the feedback amount of the first network structure.

The computer storage medium of the embodiments of this application stores a computer program which, when executed by a computer, causes the computer to execute the above-mentioned method.

A computer program product of the embodiments of this application contains instructions which, when executed by a computer, cause the computer to execute the above-mentioned method.

In the method and device for network structure search, the computer storage medium, and the computer program product of the embodiments of this application, before the general map and the first network structure are optimized, the general map is first pre-trained with the preset parameters of the fixed first network structure, so that the general map with pre-training parameters is fully trained. After the pre-training is completed, the parameters of the first network structure are released to train the general map and the first network structure together, thereby optimizing both, avoiding the bias caused by optimizing the first network structure when training from scratch, improving the credibility of the first network structure, and ensuring that the searched model is globally optimal.

Additional aspects and advantages of the embodiments of the present application will be partly given in the following description, and will partly become obvious from the following description or be understood through practice of the embodiments of the present application.

Brief Description of the Drawings

The above and/or additional aspects and advantages of the present application will become obvious and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic flowchart of a method for searching a network structure according to an embodiment of the present application;

FIG. 2 is a schematic diagram of the modules of a network structure search device according to an embodiment of the present application;

FIG. 3 is a schematic diagram of the principle of a network structure search method in the related art;

FIG. 4 is a schematic diagram of the general map in a network structure search method according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of a network structure search method according to another embodiment of the present application;

FIG. 6 is a schematic flowchart of a network structure search method according to another embodiment of the present application;

FIG. 7 is a schematic flowchart of a network structure search method according to yet another embodiment of the present application;

FIG. 8 is a schematic diagram of the principle of a network structure search method according to yet another embodiment of the present application;

FIG. 9 is a schematic flowchart of a network structure search method according to still another embodiment of the present application;

FIG. 10 is a schematic flowchart of a network structure search method according to still another embodiment of the present application;

FIG. 11 is a schematic flowchart of a network structure search method according to still another embodiment of the present application;

FIG. 12 is a schematic diagram of the penalty effect of the network structure search method according to an embodiment of the present application.

Description of main reference numerals:

Network structure search device 10, memory 102, processor 104, communication interface 106.

具体实施方式Detailed ways

The embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary, are intended only to explain the present application, and are not to be construed as limiting the present application.

In the description of the present application, it should be understood that the terms "first" and "second" are used for descriptive purposes only and cannot be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features indicated. Therefore, a feature defined by "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the present application, "multiple" means two or more, unless otherwise specifically defined.

In the description of the present application, it should be noted that, unless otherwise explicitly specified and defined, the terms "mounted", "connected", and "coupled" should be interpreted broadly. For example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection, an electrical connection, or mutual communication; it may be a direct connection or an indirect connection through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present application can be understood according to specific circumstances.

The following disclosure provides many different embodiments or examples for implementing different structures of the present application. To simplify the disclosure of the present application, components and arrangements of specific examples are described below. Of course, they are merely examples and are not intended to limit the present application. In addition, the present application may repeat reference numerals and/or reference letters in different examples; such repetition is for the purpose of simplification and clarity and does not in itself indicate a relationship between the various embodiments and/or arrangements discussed.

Referring to FIG. 1 and FIG. 2, embodiments of the present application provide a method and an apparatus 10 for network structure search.

The network structure search method according to an embodiment of the present application includes:

a search space defining step S12: determining a search space of a neural network model on which network structure search is to be performed, the search space defining multiple operations on an operation layer between every two nodes in the neural network model;

a pre-training step S14: training a whole graph of the search space according to a first network structure with preset parameters of the first network structure, to obtain a whole graph with pre-training parameters, the whole graph being composed of the operations; and

a training step S16: training the whole graph with the pre-training parameters according to the first network structure, and updating the first network structure according to a feedback amount (ACC) of the first network structure.

The network structure search apparatus 10 according to an embodiment of the present application includes a processor 104 and a memory 102. The memory 102 stores one or more programs which, when executed by the processor, cause the processor 104 to perform the search space defining step: determining a search space of a neural network model on which network structure search is to be performed, the search space defining multiple operations on an operation layer between every two nodes in the neural network model; the pre-training step: training a whole graph of the search space according to a first network structure with preset parameters of the first network structure, to obtain a whole graph with pre-training parameters, the whole graph being composed of the operations; and the training step: training the whole graph with the pre-training parameters according to the first network structure, and updating the first network structure according to a feedback amount of the first network structure.

In other words, the network structure search method of the embodiments of the present application can be implemented by the network structure search apparatus 10 of the embodiments of the present application. Specifically, step S12, step S14, and step S16 may be implemented by the processor 104.

In the network structure search method and apparatus 10 of the embodiments of the present application, before the whole graph and the first network structure are optimized, the whole graph is first pre-trained with fixed preset parameters of the first network structure, so that the whole graph with the pre-training parameters is sufficiently trained. After the pre-training is completed, the parameters of the first network structure are released, and the whole graph and the first network structure are trained, so that both are optimized. This avoids the bias introduced by optimizing the first network structure when training from scratch, improves the credibility of the first network structure, and helps ensure that the searched model is globally optimal.

It should be noted that the number of processors 104 may be one. The number of processors 104 may also be more than one, for example, 2, 3, 5, or another number. When there are multiple processors 104, the steps of the network structure search method of the present application may be executed by different processors 104 respectively; for example, step S14 and step S16 may be executed by different processors 104.

Optionally, the network structure search apparatus 10 may further include a communication interface 106 for outputting data processed by the network structure search apparatus 10 and/or for inputting, from an external device, data to be processed by the network structure search apparatus 10. For example, the processor 104 is configured to control the communication interface 106 to input and/or output data.

In recent years, machine learning algorithms, especially deep learning algorithms, have developed rapidly and been widely applied. As model performance continues to improve, model structures become more and more complex. In non-automated machine learning, these structures need to be designed and tuned manually by machine learning experts, which is a very laborious process. Moreover, as application scenarios and model structures become increasingly complex, obtaining an optimal model for an application scenario becomes increasingly difficult. Against this background, automated machine learning (AutoML) has received extensive attention from academia and industry, especially neural architecture search (NAS).

Specifically, network structure search is a technique that uses an algorithm to automatically design a neural network model, that is, to search for the structure of the neural network model. In the embodiments of the present application, the neural network model on which network structure search is to be performed is a convolutional neural network (CNN).

The problem to be solved by network structure search is to determine the operations between nodes in the neural network model. Different combinations of operations between nodes correspond to different network structures. Further, a node in the neural network model can be understood as a feature layer in the neural network model. An operation between two nodes refers to the operation required to transform the feature data on one of the nodes into the feature data on the other node. The operations mentioned in this application may be convolution operations, pooling operations, fully connected operations, or other neural network operations. The operations between two nodes can be regarded as constituting the operation layer between the two nodes. Generally, the operation layer between two nodes has multiple operations available for search, that is, multiple candidate operations. The purpose of network structure search is to determine one operation on each operation layer.

For example, conv3*3, conv5*5, depthwise3*3, depthwise5*5, maxpool3*3, averagepool3*3, and so on are defined as the search space. In other words, the operation of each layer of the network structure is sampled from these six choices.
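Purely as an illustration (not part of the original disclosure), such a six-operation search space could be represented in code roughly as follows; the operation names and the helper used to build each candidate layer are assumptions made for this sketch only.

```python
import torch.nn as nn

# Hypothetical list of candidate operations shared by every operation layer;
# the names mirror the six choices mentioned above.
CANDIDATE_OPS = [
    "conv3x3", "conv5x5",
    "depthwise3x3", "depthwise5x5",
    "maxpool3x3", "avgpool3x3",
]

def build_op(name: str, channels: int) -> nn.Module:
    """Instantiate one candidate operation (layer settings assumed)."""
    if name == "conv3x3":
        return nn.Conv2d(channels, channels, 3, padding=1)
    if name == "conv5x5":
        return nn.Conv2d(channels, channels, 5, padding=2)
    if name == "depthwise3x3":
        return nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
    if name == "depthwise5x5":
        return nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
    if name == "maxpool3x3":
        return nn.MaxPool2d(3, stride=1, padding=1)
    return nn.AvgPool2d(3, stride=1, padding=1)
```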

Referring to FIG. 3, after the search space is established, NAS typically uses a first network structure to sample a second network structure from the search space, trains the second network structure to convergence to determine a feedback amount, and finally updates the first network structure using the feedback amount.

Specifically, the idea of NAS is to obtain a network structure from the search space through a first network structure, obtain an accuracy rate R based on that network structure, and use the accuracy rate R as feedback to update the first network structure; the first network structure then continues to be optimized and yields another network structure, and this is repeated until the best result is obtained. In the example of FIG. 3, the first network structure is constructed by a recurrent neural network (RNN); of course, the first network structure may also be constructed by a convolutional neural network (CNN) or a long short-term memory network (LSTM). The specific way of constructing the first network structure is not limited here.
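For orientation only, the loop just described can be summarized in the following simplified sketch; the function names (sample_architecture, train_to_convergence, evaluate, update_controller) are placeholders assumed here, not an interface defined by this application.

```python
def naive_nas(controller, search_space, num_rounds):
    """Simplified NAS loop: sample, train to convergence, feed accuracy back."""
    best_arch, best_acc = None, 0.0
    for _ in range(num_rounds):
        arch = sample_architecture(controller, search_space)  # second network structure
        model = train_to_convergence(arch)                     # the expensive step
        acc = evaluate(model)                                  # feedback amount R
        update_controller(controller, acc)                     # update first network structure
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch
```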

However, training the second network structure to convergence is time-consuming. In this regard, the related art has proposed a variety of methods to improve the efficiency of NAS, for example, Efficient Architecture Search by Network Transformation, and Efficient Neural Architecture Search via Parameter Sharing (ENAS), which is based on weight sharing. Among them, efficient network structure search based on weight sharing is widely used.

Specifically, referring to FIG. 4, when efficient network structure search based on weight sharing is used, the operations are connected into a whole graph, and the final optimal structure found by the search is one of the subgraphs of the whole graph. In the example of FIG. 4, the whole graph is formed by connecting the operations between nodes. The connection indicated by the bold edges in FIG. 4, i.e. the optimal structure, is a subgraph of the whole graph.

ENAS adopts a weight sharing strategy. After a network structure is sampled each time, it is no longer trained directly to convergence; instead, one batch is trained. After many iterations, the whole graph can eventually converge. Note that the convergence of the graph is not equivalent to the convergence of a network structure.

After the whole graph is trained, the parameters of the whole graph can be fixed, and then the first network structure is trained. Specifically, the whole graph can be sampled to obtain a second network structure, and prediction is performed according to the second network structure to train the parameters of the first network structure and obtain the feedback amount of the first network structure, thereby updating the first network structure.

It can be understood that the efficient network structure search based on weight sharing shares the parameters that can be shared each time a network structure is searched, which saves time and thus improves the efficiency of network structure search. For example, in the example of FIG. 4, if node 1, node 2, node 3, and node 6 are searched this time after node 1, node 3, and node 6 were searched and the corresponding network structure was trained, then the relevant parameters of the network structure trained when node 1, node 3, and node 6 were searched can be applied to the training of the network structure searched this time. In this way, efficiency can be improved through weight sharing.
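A minimal sketch of the weight-sharing idea, under the assumption that every candidate operation of every layer owns one shared parameter module that is reused whenever that operation appears in a sampled subgraph; the class and variable names (and the build_op helper reused from the earlier sketch) are illustrative only.

```python
import torch.nn as nn

class SharedWholeGraph(nn.Module):
    """Holds one shared module per (layer, candidate op); a sampled subgraph
    simply picks one op per layer and reuses that op's parameters."""

    def __init__(self, num_layers, channels, candidate_ops, build_op):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.ModuleDict({name: build_op(name, channels) for name in candidate_ops})
            for _ in range(num_layers)
        ])

    def forward(self, x, sampled_arch):
        # sampled_arch: list of op names, one per layer (a subgraph of the whole graph)
        for layer, op_name in zip(self.ops, sampled_arch):
            x = layer[op_name](x)
        return x
```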

ENAS can improve the efficiency of NAS by a factor of more than 1000. However, in actual use, the following problem arises: the searched network often exhibits a large bias, that is, the network structure searched through ENAS always tends towards operations with a larger kernel size.

Taking the above search space as an example, with conv3*3, conv5*5, depthwise3*3, depthwise5*5, maxpool3*3, averagepool3*3, and so on defined as the search space, the first network structure will always end up selecting conv5*5. This leads to two relatively serious problems. First, the searched model has a large number of parameters. Even if the computing power is sufficient and there is no restriction on the model size, a model with more parameters is prone to overfitting, its generalization ability decreases, and it is difficult to tune and train. Second, a biased first network structure means that the first network structure converges to a local optimum and cannot sufficiently explore the search space. Such a first network structure does not have high credibility, and it cannot be guaranteed that the searched model is globally optimal.

The above problem is rather serious, and the first network structure needs to be tuned very finely so that it does not converge to a local optimum; otherwise, the problem will invalidate the ENAS framework. However, finely tuning the first network structure makes ENAS very cumbersome and hard to use, defeating the original purpose of AutoML. Moreover, fine tuning does not guarantee a better first network structure either.

Based on this, in the network structure search method and apparatus 10 of the embodiments of the present application, before the whole graph and the first network structure are optimized, the whole graph is first pre-trained with fixed preset parameters of the first network structure, so that the whole graph with the pre-training parameters is sufficiently trained. After the pre-training is completed, the parameters of the first network structure are released, and the whole graph and the first network structure are trained, so that both are optimized. This avoids the bias introduced by optimizing the first network structure when training from scratch, improves the credibility of the first network structure, and helps ensure that the searched model is globally optimal.

It can be understood that, in a conventional CNN, the neural network structure obtained by network structure search can generally be trained and verified with sample data. The sample data includes verification samples and training samples: the verification samples can be used to verify how good the network structure is, and the training samples can be used to train the network structure found by the network structure search method.

In this embodiment, the training samples may further include a training set (train) and a test set (valid).

In other words, in this embodiment, the training samples can be divided into a training set and a test set. Specifically, the training set and the test set may be divided by the user, or divided by the network structure search apparatus 10. When the division is performed by the network structure search apparatus 10, the processor 104 may be configured to divide the training samples into a training set and a test set.

In this embodiment, the first network structure is constructed by an LSTM. When searching the network structure, the training set is used to train the parameters of the searched structure, such as the parameters of the structure computed through conv3*3 and sep5*5. The cell is used to search the network structure, and, in order to assess the generalization ability of the searched network structure, the parameters of the LSTM are trained on the test set. In other words, the training set is used to train the parameters of the searched structure, and the test set is used to train the parameters of the LSTM. The verification samples are used to verify how good the network structure found after training is.

In an example, the number of training samples is 10, and the training samples are divided into a training set of 8 samples and a test set of 2 samples; the training set of 8 samples is used to train the searched network structure, and the test set of 2 samples is used to train the LSTM.

It should be noted that, as in a conventional CNN, the verification samples in this embodiment can be used to verify how good the network structure is; in other words, the verification samples in this embodiment can remain unchanged. The training set in the training samples can be used in both the pre-training step S14 and the training step S16, that is, the pre-training step S14 and the training step S16 can be trained on the same training set.

Optionally, referring to FIG. 5, step S14 includes:

Step S142: setting the parameters of the first network structure to the preset parameters of the first network structure.

Specifically, step S142 may be implemented by the processor 104, that is, the processor 104 may be configured to set the parameters of the first network structure to the preset parameters of the first network structure.

In this way, the parameter setting of the first network structure in the pre-training process is realized; at this point, the parameters of the first network structure in the pre-training process are fixed.

Optionally, step S14 further includes:

Step S144: sampling one operation at each operation layer of the search space according to the first network structure with the preset parameters of the first network structure, to obtain a subgraph of the whole graph; and

Step S146: training the subgraph of the whole graph with a batch of training data from the training set, to obtain the whole graph with the pre-training parameters.

Specifically, step S144 and step S146 may be implemented by the processor 104, that is, the processor 104 may be configured to sample one operation at each operation layer of the search space according to the first network structure with the preset parameters of the first network structure to obtain a subgraph of the whole graph, and to train the subgraph of the whole graph with a batch of training data from the training set to obtain the whole graph with the pre-training parameters.

In this way, pre-training of the whole graph is realized. In this embodiment, during pre-training, the first network structure samples operations with fixed preset parameters, which makes the probability of sampling each operation uniformly distributed. In other words, in the pre-training process, when the first network structure samples the corresponding operations with the fixed preset parameters, the probability of sampling each operation is equal, and the kernel size does not affect the probability with which the first network structure samples each operation.

Optionally, step S144 and step S146 are iterated until a first total number of iterations is completed.

It can be understood that step S144 and step S146 pre-train one subgraph at a time, that is, after each pre-training pass, the parameters of one subgraph in the whole graph with the pre-training parameters change. Therefore, iterating step S144 and step S146 allows multiple subgraphs to be pre-trained. By controlling the first total number of iterations of pre-training, every subgraph in the whole graph can be sampled, so that the whole graph can be sufficiently pre-trained, that is, the whole graph with the pre-training parameters is obtained.

In an example, the search space has 5 layers and each layer has 4 optional operations, which is equivalent to a 4x5 graph. Network structure search needs to select one operation at each layer, which is equivalent to path optimization on the graph. Initially, step S144 randomly samples one operation at each operation layer of the search space and connects the sampled operations to obtain a network structure; this network structure is a subgraph of the whole graph of the search space. Step S146 trains this network structure on a batch of data from the training set. Then step S144 is repeated: another operation is randomly sampled at each layer to obtain another network structure, and step S146 trains this network structure on another batch of data from the training set.

The training set can be divided into multiple batches of training data, and step S144 and step S146 are repeated continuously until the multiple batches of training data of the training set have been used up, which constitutes one iteration (epoch). After one epoch, some of the subgraphs in the whole graph have been randomly sampled and pre-trained accordingly; then the whole graph is trained in the same way to complete the second epoch, and so on. In this way, after multiple iterations, sufficient pre-training of the whole graph can be achieved.
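Purely as an illustrative sketch (not the application's reference implementation), the uniform-sampling pre-training of steps S144/S146 might look as follows; it assumes the SharedWholeGraph module from the earlier sketch ending in a classification head, a PyTorch DataLoader over the training set, and a cross-entropy objective.

```python
import random
import torch
import torch.nn.functional as F

def pretrain_whole_graph(graph, train_loader, optimizer, num_epochs,
                         candidate_ops, num_layers):
    """Steps S144/S146: sample a subgraph uniformly at random (controller fixed),
    then train that subgraph's shared weights on one batch of training data."""
    for _ in range(num_epochs):                       # first total number of iterations
        for images, labels in train_loader:           # one batch per sampled subgraph
            arch = [random.choice(candidate_ops) for _ in range(num_layers)]  # S144
            logits = graph(images, arch)              # forward through the sampled subgraph
            loss = F.cross_entropy(logits, labels)    # S146: update shared parameters
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```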

It can be understood that, by setting an appropriate first total number of iterations, every subgraph in the whole graph will have been trained after the whole graph is pre-trained, and the whole graph with the pre-training parameters is obtained.

In this embodiment, the first total number of iterations may be 310. It can be understood that, in other embodiments, the first total number of iterations may be 100, 200, or another value.

Optionally, referring to FIG. 6, step S16 includes:

a whole graph training step S162: training the whole graph with the pre-training parameters according to the first network structure; and

a first network structure training step S164: determining the feedback amount of the first network structure and updating the first network structure according to the feedback amount of the first network structure.

Specifically, step S162 and step S164 may be implemented by the processor 104, that is, the processor 104 may be configured to train the whole graph with the pre-training parameters according to the first network structure, and to determine the feedback amount and update the first network structure according to the feedback amount.

In this way, the optimization of the network structure is realized on the basis of the pre-training.

Optionally, step S162 and step S164 are iterated until a second total number of iterations is completed.

In this way, the whole graph and the first network structure are optimized alternately. Specifically, the second total number of iterations can be chosen such that the whole graph and the first network structure are sufficiently trained.

Optionally, referring to FIG. 7, step S162 includes:

Step S1622: sampling one operation at each operation layer of the search space according to the first network structure, to obtain a subgraph of the whole graph with the pre-training parameters; and

Step S1624: training the sampled subgraph with a batch of training data from the training set.

Specifically, step S1622 and step S1624 may be implemented by the processor 104, that is, the processor 104 may be configured to sample one operation at each operation layer of the search space according to the first network structure to obtain a subgraph of the whole graph with the pre-training parameters, and to train the sampled subgraph with a batch of training data from the training set.

In this way, training of the whole graph is realized. In this embodiment, ENAS adopts a weight sharing strategy: after a network structure is sampled each time, it is no longer trained directly to convergence; instead, one batch is trained, and then the first network structure is trained. Note that the convergence of the graph is not equivalent to the convergence of a network structure.

Since the whole graph has already been trained in the pre-training process, the parameters of the first network structure can now be released, that is, the sampling in step S1622 does not have to be performed with the preset parameters of the first network structure. Accordingly, in an operation layer of the whole graph with the pre-training parameters, the probabilities of sampling the individual operations may differ.

Accordingly, step S1622 and step S1624 can be performed in a loop. In an example, initially, one operation is sampled at each layer and the sampled operations are connected to obtain a subgraph, which is trained on one batch of data from the training set; then another operation is sampled at each layer to obtain another subgraph, which is trained on another batch of data from the training set; then yet another subgraph is sampled in the same way and trained on yet another batch of data from the training set, and so on, until the data in the training set has been used up, that is, one epoch is completed.

After the whole graph training step S162 completes one epoch, the parameters of the whole graph are fixed, and the first network structure training step S164 is entered, so that the whole graph and the first network structure are optimized alternately.

Then the whole graph is trained in the same way to complete the second epoch, the parameters of the whole graph are fixed, and the first network structure is trained.

Then the whole graph is trained in the same way to complete the third epoch, the parameters of the whole graph are fixed, and the first network structure is trained, and so on, until the second total number of iterations is completed, so that the whole graph and the first network structure are optimized alternately.

It should be noted that, in the whole graph training step S162, the parameters of the whole graph change as the training proceeds; that is, in each iteration, the parameters of the whole graph with the pre-training parameters are continuously updated during training until one epoch is completed.

In the embodiment discussed above, the first network structure training step S164 is entered each time the whole graph training step S162 completes one epoch and the parameters of the whole graph are fixed. Of course, in other embodiments, the whole graph training step S162 may complete 10 epochs before the parameters of the whole graph are fixed and the first network structure training step S164 is entered. The numbers of epochs per execution of step S162 listed above are only examples and cannot be construed as limiting the present application; it can be understood that the number of epochs completed in each execution of step S162 can be set according to actual circumstances and is not specifically limited here.
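A hedged sketch of how the alternation of step S162 and step S164 might be arranged in code; it assumes a controller object exposing sample() and update(rewards) methods and reuses the whole-graph module from the earlier sketches, so all names here are illustrative rather than prescribed by the application.

```python
import itertools
import torch.nn.functional as F

def search(graph, controller, train_loader, valid_loader, graph_optim,
           second_total_iterations, controller_updates_per_phase=50,
           rewards_per_update=20):
    for _ in range(second_total_iterations):
        # Step S162: train the shared whole graph for one epoch, controller fixed.
        for images, labels in train_loader:
            arch = controller.sample()                        # one op per layer
            loss = F.cross_entropy(graph(images, arch), labels)
            graph_optim.zero_grad()
            loss.backward()
            graph_optim.step()

        # Step S164: fix the whole graph, update the controller from feedback amounts.
        valid_iter = itertools.cycle(valid_loader)
        for _ in range(controller_updates_per_phase):
            rewards = []
            for _ in range(rewards_per_update):
                arch = controller.sample()                    # second network structure
                images, labels = next(valid_iter)
                acc = (graph(images, arch).argmax(1) == labels).float().mean()
                rewards.append(acc.item())                    # feedback amount (ACC)
            controller.update(rewards)                        # policy-gradient style update
```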

In this embodiment, the second total number of iterations is 310. It can be understood that, in other embodiments, the second total number of iterations may be 100, 200, or another value.

It should be noted that, in this embodiment, the first total number of iterations and the second total number of iterations are the same, while in other embodiments they may be different; this is not specifically limited here.

Referring to FIG. 8, each operation layer of the search space corresponds to a time step (timestep) of the long short-term memory network (LSTM), and for each time step, a cell of the long short-term memory network outputs a hidden state. Step S1622 includes:

mapping the hidden state into a feature vector, the dimension of the feature vector being the same as the number of operations on each operation layer; and

sampling one operation at each operation layer according to the feature vector, to obtain a subgraph of the whole graph.

Specifically, the processor 104 may be configured to map the hidden state into a feature vector, and to sample one operation at each operation layer according to the feature vector to obtain a subgraph of the whole graph. The dimension of the feature vector is the same as the number of operations on each operation layer.

In this way, one operation is sampled at each operation layer of the search space to obtain a subgraph of the whole graph. For example, if a 20-layer network is to be searched in total, 20 time steps are required if skip connections are not considered.

In the example of FIG. 8, the solid arrows represent the time steps: time 1 represents the first cell of the LSTM, time 2 represents the second cell of the LSTM, and so on. The square conv3*3 represents the operation of that layer in the model, and the circles represent the connection relationships between operation layers.

It can be understood that, since the computations of the network structure have a sequential order, mapping the logical relationship of this computation order onto the LSTM corresponds to the small squares in FIG. 8 from left to right, each corresponding to the state of the LSTM cell at one time.

Specifically, at time 1, the hidden state output by the cell is processed to obtain the convolution conv3x3; conv3x3 serves as an input of the cell at time 2, and the hidden state output by the cell at time 1 also serves as an input of the cell at time 2, from which circle 1 is computed.

Similarly, circle 1 serves as an input of the cell at time 3, and the hidden state output by the cell at time 2 also serves as an input at time 3, from which the convolution sep5x5 is computed, and so on.

Further, the step of sampling one operation at each operation layer according to the feature vector to obtain the network structure includes:

normalizing the feature vector (softmax) to obtain the probability of each operation of each operation layer; and

sampling one operation at each operation layer according to the probabilities, to obtain the network structure.

In this way, one operation is sampled at each operation layer according to the feature vector to obtain the network structure. Specifically, in the example shown in FIG. 8, an encoding operation is performed on the hidden state output by the cell of the LSTM to map it to a vector of dimension 6; this vector is passed through a normalized exponential function (softmax) to become a probability distribution, sampling is performed according to this probability distribution, and the operation of the current layer is obtained. Proceeding in this way finally yields a network structure. It can be understood that, in this example, there is only one input and six operations in total (3x3 convolution, 5x5 convolution, 3x3 depthwise-separable convolution, 5x5 depthwise-separable convolution, 3x3 max pooling, 3x3 average pooling); the dimension of the vector corresponds to the search space, and 6 means that the search space has 6 operations to choose from.
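The per-layer sampling just described might be sketched as follows; this is an illustrative simplification that assumes an LSTMCell-based controller whose hidden state is projected to the 6 candidate operations, with the embedding size, hidden size, and start token being arbitrary choices for the sketch.

```python
import torch
import torch.nn as nn

class ControllerSketch(nn.Module):
    """One LSTM time step per operation layer: hidden state -> 6-dim vector
    -> softmax -> sample one operation for that layer."""

    def __init__(self, num_layers=20, num_ops=6, hidden=64):
        super().__init__()
        self.num_layers, self.num_ops = num_layers, num_ops
        self.embed = nn.Embedding(num_ops + 1, hidden)   # +1 for a start token (assumption)
        self.cell = nn.LSTMCell(hidden, hidden)
        self.proj = nn.Linear(hidden, num_ops)           # maps hidden state to feature vector

    def sample(self):
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros(1, self.cell.hidden_size)
        inp = self.embed(torch.tensor([self.num_ops]))   # start token
        arch, log_probs = [], []
        for _ in range(self.num_layers):                 # one time step per operation layer
            h, c = self.cell(inp, (h, c))
            probs = torch.softmax(self.proj(h), dim=-1)  # probability of each operation
            op = torch.multinomial(probs, 1)             # sample the operation of this layer
            arch.append(op.item())
            log_probs.append(torch.log(probs[0, op]))
            inp = self.embed(op.view(1))                 # sampled op feeds the next time step
        return arch, torch.stack(log_probs).sum()
```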

Referring to FIG. 9, in step S164, the step of determining the feedback amount includes:

Step S1642: sampling one operation at each operation layer of the search space according to the first network structure, to obtain a second network structure; and

Step S1644: predicting with the second network structure on a batch of test data from the test set, to determine the feedback amount of the first network structure.

Correspondingly, step S1642 and step S1644 may be implemented by the processor 104, that is, the processor 104 may be configured to sample one operation at each operation layer of the search space according to the first network structure to obtain the second network structure, and to predict with the second network structure on a batch of test data from the test set to determine the feedback amount of the first network structure.

In this way, prediction can be performed on the test set according to the searched second network structure to obtain the feedback amount of the first network structure.

In this embodiment, after the second network structure is found, the searched second network structure can be used for prediction on the test set to obtain the feedback amount, which is used to update the first network structure according to the formula described herein. Note that the LSTM is not trained directly on the test set.

Optionally, the first network structure training step S164 is looped a preset number of times, and step S164 includes:

Step S1646: updating the first network structure with the feedback amounts of the first network structure determined in each loop, where the number of feedback amounts of the first network structure determined in each loop is a preset number.

Correspondingly, the processor 104 is configured to loop the training of the first network structure a preset number of times, and to update the first network structure with the feedback amounts determined in each loop, where the number of feedback amounts determined in each loop is a preset number.

In this way, training of the first network structure is realized. Specifically, in this embodiment, step S16 is looped 50 times; that is, during the iterative training of the whole graph training step S162 and the first network structure training step S164, each time the first network structure training step S164 is executed, the first network structure is updated 50 times so as to optimize the first network structure. After the first network structure training step S164 is completed, the whole graph training step S162 in the next iteration trains the whole graph with the most recently updated first network structure.

It can be understood that, in other examples, the preset number of times may be 10, 20, 30, or another value; the specific value of the preset number of times is not limited here.

Optionally, step S1642 and step S1644 are performed in a loop to obtain a preset number of feedback amounts of the first network structure.

It can be understood that, in each loop, a second network structure can be sampled, and after this network structure is tested on a batch of test data from the test set, one feedback amount for the first network structure can be obtained; after a sufficient number of loops, the preset number of feedback amounts of the first network structure can be obtained.

In an example, the preset number is 20, that is, there are 20 feedback amounts of the first network structure. It can be understood that, in other examples, the preset number may be 10, 15, 25, or another value; the specific value of the preset number is not limited here.

Optionally, the first network structure is constructed according to a long short-term memory network model, and step S164 is implemented by the following conditional expression:

\[
\nabla_{\theta_c} J(\theta_c) \approx \frac{1}{m}\sum_{k=1}^{m}\sum_{t=1}^{T}\nabla_{\theta_c}\log P\left(a_t \mid a_{(t-1):1};\theta_c\right) R_k
\]

where R_k is the k-th feedback amount, θ_c denotes the parameters of the long short-term memory network model, a_t is the operation sampled at the t-th operation layer, P(a_t | a_(t-1):1; θ_c) is the probability of sampling that operation, m is the total number of feedback amounts, and T is the number of hyperparameters predicted by the first network structure.

In this way, the first network structure is updated according to the average of multiple feedback amounts. In this embodiment, T includes the operation layers and the skip connections. In other embodiments, other hyperparameters to be optimized may also be included; the specific content of T is not limited here.
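As a non-authoritative sketch of how the above conditional expression could be applied in code, assuming the ControllerSketch above returns the summed log-probability of a sampled architecture and that the rewards are the accuracies measured on batches of test data; the function name and optimizer choice are assumptions for the sketch.

```python
import torch

def update_controller(rewards_and_logps, optimizer):
    """REINFORCE-style update: average of (log-probability * feedback amount)
    over the m sampled second network structures."""
    m = len(rewards_and_logps)
    loss = 0.0
    for reward, log_prob in rewards_and_logps:
        loss = loss - log_prob * reward        # maximizing E[R] via -log P * R
    loss = loss / m                            # average over the m feedback amounts
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Here rewards_and_logps would hold, for each of the m (e.g. 20) sampled second network structures, the accuracy measured on a batch of test data together with the log-probability returned by the controller's sample().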

In this embodiment, the training of the whole graph and the updating of the first network structure are performed over multiple iterations, and the second total number of iterations of alternate training of the whole graph and the first network structure is 310. In other words, step S14 and step S16 are iterated 310 times. In this way, a second network structure with a better effect can finally be obtained.

In each alternate training, the preset number of times that step S16 is looped is 50. That is, in each iteration of step S14 and step S16, the first network structure is updated 50 times, and the above conditional expression is executed 50 times. It can be understood that setting the number of loops of step S16 to 50 can reduce the randomness that sampling introduces into the optimization.

Each time step S16 is looped, the number of sampled second network structures is 20, so that 20 feedback amounts are obtained, and the 20 feedback amounts are substituted into the above conditional expression as R_k to update the first network structure. In other words, in the above conditional expression, the value of m is 20.

Note that one traversal of the training samples is one iteration. For example, if the number of training samples is 10 and 2 samples are taken each time as one batch of the training set to train a subgraph, then after 5 batches the training samples have been used up and one iteration is completed.

Optionally, referring to FIG. 10, in step S164, the step of determining the feedback amount includes:

Step S1646: adjusting the feedback amount of the first network structure according to a preset penalty model.

Correspondingly, step S1646 may be implemented by the processor 104, that is, the processor 104 may be configured to adjust the feedback amount of the first network structure according to the preset penalty model.

It can be understood that the pre-training process may not train the whole graph to full convergence. In this case, where the whole graph has not fully converged during pre-training, adjusting the feedback amount according to the preset penalty model can avoid the bias introduced by optimizing the first network structure while the whole graph has not fully converged, thereby further improving the credibility of the first network structure and ensuring that the searched model is globally optimal.

Optionally, referring to FIG. 11, step S1646 includes:

Step S1648: determining a penalty term according to preset information, the second network structure, the current number of iterations, and the second total number of iterations; and

Step S1649: adjusting the feedback amount according to the penalty term.

Correspondingly, step S1648 and step S1649 may be implemented by the processor 104, that is, the processor 104 is configured to determine the penalty term according to the preset information, the second network structure, the current number of iterations, and the total number of iterations, and to adjust the feedback amount according to the penalty term.

In this way, the feedback amount is adjusted according to the penalty model. It can be understood that this avoids introducing bias into the optimization of the first network structure while the whole graph has not fully converged during the search process: a penalty term is added for larger convolution kernels, and the feedback amount obtained at prediction time is processed accordingly.

Specifically, the preset information in step S1648 may be information input in advance by the user, or may be a calculation result input by another calculation model; the source of the preset information is not limited here.

Note that, in this embodiment, step S162 and step S164 are performed iteratively. The second total number of iterations refers to the total number of iterations of step S162 and step S164. The current number of iterations refers to which iteration the current one is during the iteration of step S162 and step S164. For example, if the second total number of iterations is 310, then in the first iteration the current number of iterations is 1, in the second iteration it is 2, in the third iteration it is 3, and so on; step S162 and step S164 stop after 310 iterations.

Therefore, the second total number of iterations and the current number of iterations in each iteration can also be determined. The second total number of iterations can be set by the user, and the current number of iterations can be accumulated as the iterations proceed.

Referring also to FIG. 12, as the iterations proceed, the whole graph gradually converges. It can be understood that the value of the penalty term increases with the convergence of the whole graph. Once the whole graph has converged in the later stage of training, the normal feedback amount should be given and the feedback amount should no longer be penalized; therefore, when the whole graph has converged, the value of the penalty term is 1. Note that, as the iterations of step S162 and step S164 proceed, the whole graph gradually converges, and when the second total number of iterations has been completed, the whole graph is in a converged state.

As can be seen from FIG. 12, the value of the penalty term may transition gradually from 0.76 to 1. Correspondingly, the adjusted feedback amount transitions gradually from ACC*0.76 to ACC.
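A minimal sketch of one possible penalty schedule consistent with FIG. 12, assuming the penalty factor rises linearly from 0.76 toward 1 over the second total number of iterations and is applied only to architectures containing larger-kernel operations; the linear form and the kernel test are assumptions, not the schedule claimed by this application.

```python
def penalty_factor(current_iteration: int, second_total_iterations: int,
                   start: float = 0.76) -> float:
    """Penalty term that grows from `start` toward 1 as the whole graph converges."""
    progress = min(current_iteration / second_total_iterations, 1.0)
    return start + (1.0 - start) * progress

def adjust_feedback(acc: float, arch, current_iteration: int,
                    second_total_iterations: int) -> float:
    """Step S1649: penalize the feedback amount while large-kernel ops are sampled."""
    has_large_kernel = any(op.endswith("5x5") for op in arch)   # assumed criterion
    if has_large_kernel:
        return acc * penalty_factor(current_iteration, second_total_iterations)
    return acc
```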

In some embodiments, the size of the penalty term may be related to the degree of convergence of the whole graph during pre-training; in this case, the preset information may include the first total number of iterations of pre-training, that is, step S1648 may determine the penalty term according to the first total number of iterations of pre-training, the second network structure, the current number of iterations, and the second total number of iterations.

An embodiment of the present application further provides a computer storage medium on which a computer program is stored; when the computer program is executed by a computer, the computer is caused to execute the method of any one of the above embodiments.

An embodiment of the present application further provides a computer program product containing instructions; when the instructions are executed by a computer, the computer is caused to execute the method of any one of the above embodiments.

With the computer storage medium and the computer program product of the embodiments of the present application, the whole graph and the first network structure are optimized alternately, and the feedback amount is adjusted according to the preset penalty model, which can avoid the bias introduced by optimizing the first network structure when the whole graph has not fully converged, thereby improving the credibility of the first network structure and ensuring that the searched model is globally optimal.

在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其他任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部 分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如数字视频光盘(digital video disc,DVD))、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware or any other combination. When implemented by software, it can be implemented in the form of a computer program product in whole or in part. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, all or part of the processes or functions described in the embodiments of the present application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center. Transmission to another website, computer, server or data center via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), etc. .

本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.

在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components can be combined or It can be integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, the functional units in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (18)

1. A method for network structure search, characterized in that it comprises:
a search-space definition step: determining a search space of a neural network model on which network structure search is to be performed, the search space defining a plurality of operations on the operation layer between every two nodes of the neural network model;
a pre-training step: training an overall graph of the search space according to a first network structure, using preset parameters of the first network structure, to obtain the overall graph with pre-training parameters, the overall graph being composed of the operations; and
a training step: training the overall graph with the pre-training parameters according to the first network structure, and updating the first network structure according to a feedback amount of the first network structure.

2. The method for network structure search according to claim 1, wherein the pre-training step comprises:
sampling, according to the first network structure with the preset parameters of the first network structure, one of the operations at each operation layer of the search space to obtain a subgraph of the overall graph; and
training the subgraph of the overall graph with a batch of training data from a training set to obtain the overall graph with the pre-training parameters.

3. The method for network structure search according to claim 1, wherein the training step comprises:
an overall-graph training step: training the overall graph with the pre-training parameters according to the first network structure; and
a first-network-structure training step: determining the feedback amount and updating the first network structure according to the feedback amount.

4. The method for network structure search according to claim 3, wherein the overall-graph training step and the first-network-structure training step are performed iteratively.

5. The method for network structure search according to claim 3, wherein the overall-graph training step comprises:
sampling, according to the first network structure, one of the operations at each operation layer of the search space to obtain a subgraph of the overall graph with the pre-training parameters; and
training the subgraph with a batch of training data from a training set.

6. The method for network structure search according to claim 3, wherein the first-network-structure training step is repeated for a preset number of cycles, and updating the first network structure according to the feedback amount comprises:
updating the first network structure with the feedback amounts determined in each cycle, wherein the number of feedback amounts determined in each cycle is a preset number.
7. The method for network structure search according to claim 6, wherein determining the feedback amount comprises:
sampling, according to the first network structure, one of the operations at each operation layer of the search space to obtain a second network structure; and
predicting with the second network structure on a batch of test data from a test set to determine the feedback amount.

8. The method for network structure search according to claim 6, wherein the first network structure is constructed according to a long short-term memory network model, and updating the first network structure according to the feedback amount is implemented by the following conditional expression:
$$\nabla_{\theta_c} J(\theta_c) \approx \frac{1}{m}\sum_{k=1}^{m}\sum_{t=1}^{T}\nabla_{\theta_c}\log P\left(a_t \mid a_{(t-1):1};\theta_c\right)R_k$$
where R_k is the k-th feedback amount, θ_c denotes the parameters of the long short-term memory network model, a_t is the operation sampled at the t-th operation layer, P(a_t | a_(t-1):1; θ_c) is the probability of sampling that operation, m is the total number of feedback amounts, and T is the number of hyperparameters predicted by the first network structure.
9. A device for network structure search, characterized in that it comprises a processor and a memory, the memory storing one or more programs which, when executed by the processor, cause the processor to perform:
a search-space definition step: determining a search space of a neural network model on which network structure search is to be performed, the search space defining a plurality of operations on the operation layer between every two nodes of the neural network model;
a pre-training step: training an overall graph of the search space according to a first network structure, using preset parameters of the first network structure, to obtain the overall graph with pre-training parameters, the overall graph being composed of the operations; and
a training step: training the overall graph with the pre-training parameters according to the first network structure, and updating the first network structure according to a feedback amount of the first network structure.

10. The device for network structure search according to claim 9, wherein the processor is configured to sample, according to the first network structure with the preset parameters of the first network structure, one of the operations at each operation layer of the search space to obtain a subgraph of the overall graph, and to train the subgraph of the overall graph with a batch of training data from a training set to obtain the overall graph with the pre-training parameters.

11. The device for network structure search according to claim 9, wherein the processor is configured to perform an overall-graph training step of training the overall graph with the pre-training parameters according to the first network structure, and a first-network-structure training step of determining the feedback amount and updating the first network structure according to the feedback amount.

12. The device for network structure search according to claim 11, wherein the processor is configured to perform the overall-graph training step and the first-network-structure training step iteratively.

13. The device for network structure search according to claim 11, wherein the processor is configured to sample, according to the first network structure, one of the operations at each operation layer of the search space to obtain a subgraph of the overall graph with the pre-training parameters, and to train the subgraph with a batch of training data from a training set.

14. The device for network structure search according to claim 11, wherein the processor is configured to repeat the first-network-structure training step for a preset number of cycles and to update the first network structure with the feedback amounts determined in each cycle, the number of feedback amounts determined in each cycle being a preset number.
15. The device for network structure search according to claim 14, wherein the processor is configured to sample, according to the first network structure, one of the operations at each operation layer of the search space to obtain a second network structure, and to predict with the second network structure on a batch of test data from a test set to determine the feedback amount.

16. The device for network structure search according to claim 14, wherein the first network structure is constructed according to a long short-term memory network model, and the processor is configured to update the first network structure with the feedback amount through the following conditional expression:
$$\nabla_{\theta_c} J(\theta_c) \approx \frac{1}{m}\sum_{k=1}^{m}\sum_{t=1}^{T}\nabla_{\theta_c}\log P\left(a_t \mid a_{(t-1):1};\theta_c\right)R_k$$
where R_k is the k-th feedback amount, θ_c denotes the parameters of the long short-term memory network model, a_t is the operation sampled at the t-th operation layer, P(a_t | a_(t-1):1; θ_c) is the probability of sampling that operation, m is the total number of feedback amounts, and T is the number of hyperparameters predicted by the first network structure.
17. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a computer, causes the computer to perform the method according to any one of claims 1 to 8.

18. A computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 8.
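Read together, claims 1 to 8 describe a weight-sharing search procedure: a controller (the first network structure) samples one operation per operation layer, the sampled subgraph of the shared overall graph is trained on batches of training data, and the controller is then updated from feedback amounts measured on test data. The following Python sketch illustrates only the alternating schedule of that procedure; every identifier and constant in it (CANDIDATE_OPS, N_LAYERS, the stubbed training and evaluation functions, the batch counts) is a hypothetical placeholder for illustration and is not defined in this application.

```python
import random

# Hypothetical toy search space: N_LAYERS operation layers, each offering
# the same candidate operations between two nodes (the "search space" of claim 1).
CANDIDATE_OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]
N_LAYERS = 6

class RandomController:
    """Stand-in for the first network structure with its preset parameters:
    before any training it samples operations uniformly at random."""
    def sample_op(self, layer):
        return random.choice(CANDIDATE_OPS)

def sample_subgraph(controller):
    """Sample one operation per operation layer (claims 2 and 5)."""
    return [controller.sample_op(layer) for layer in range(N_LAYERS)]

def train_subgraph_on_batch(subgraph, batch):
    """Placeholder for training the sampled subgraph of the shared
    overall graph on one batch of training data (claims 2 and 5)."""
    pass

def evaluate_on_test_batch(subgraph, batch):
    """Placeholder for predicting with a sampled second network structure
    on a batch of test data to obtain a feedback amount (claim 7)."""
    return random.random()  # toy reward

controller = RandomController()
train_batches = [None] * 100   # stand-ins for real training batches
test_batches = [None] * 10     # stand-ins for real test batches

# Pre-training step (claim 1): the overall graph is trained with subgraphs
# sampled by the still-untrained controller.
for batch in train_batches:
    train_subgraph_on_batch(sample_subgraph(controller), batch)

# Training step (claims 3 and 4): alternate the two phases.
for epoch in range(10):
    # Overall-graph training step (claim 5).
    for batch in train_batches:
        train_subgraph_on_batch(sample_subgraph(controller), batch)

    # First-network-structure training step (claims 6 and 7): collect a preset
    # number of feedback amounts in this cycle, then update the controller.
    rewards = [evaluate_on_test_batch(sample_subgraph(controller), b)
               for b in test_batches]
    # A trainable controller would be updated from `rewards` here
    # (see the policy-gradient sketch below).
```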
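Claims 8 and 16 state that the first network structure is built on a long short-term memory model and is updated from the feedback amounts through the conditional expression above. Below is a minimal PyTorch sketch of one plausible reading of that update, in which an autoregressive LSTM controller samples one operation per layer and is trained by a policy-gradient step weighted by the feedback amounts; the layer sizes, the start-token scheme, and the toy rewards are illustrative assumptions rather than details taken from the application.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

N_OPS, N_LAYERS, HIDDEN = 4, 6, 64  # assumed sizes, not from the application

class LSTMController(nn.Module):
    """Autoregressive controller: predicts one operation per operation layer,
    conditioning each choice on the previously sampled operations."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_OPS + 1, HIDDEN)   # +1 for a start token
        self.cell = nn.LSTMCell(HIDDEN, HIDDEN)
        self.head = nn.Linear(HIDDEN, N_OPS)

    def sample(self):
        h = torch.zeros(1, HIDDEN)
        c = torch.zeros(1, HIDDEN)
        token = torch.tensor([N_OPS])                  # start token
        actions, log_probs = [], []
        for _ in range(N_LAYERS):                      # T decisions
            h, c = self.cell(self.embed(token), (h, c))
            dist = Categorical(logits=self.head(h))
            a = dist.sample()
            actions.append(a.item())
            log_probs.append(dist.log_prob(a))         # log P(a_t | a_(t-1):1)
            token = a
        return actions, torch.stack(log_probs).sum()

controller = LSTMController()
optimizer = torch.optim.Adam(controller.parameters(), lr=3e-4)

def controller_update(rewards_and_logps):
    """One update from m sampled structures and their feedback amounts R_k:
    maximize (1/m) * sum_k sum_t log P(a_t | ...) * R_k by gradient ascent."""
    loss = -torch.stack([logp * r for logp, r in rewards_and_logps]).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Toy usage: pretend each sampled structure received a test-set reward.
samples = []
for _ in range(8):                                     # m = 8 feedback amounts
    _, logp = controller.sample()
    reward = torch.rand(())                            # placeholder for R_k
    samples.append((logp, reward))
controller_update(samples)
```

In this reading, maximizing the averaged log-probability-times-reward term by gradient ascent corresponds to minimizing its negative, which is why the sketch forms the loss as a negated mean.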
PCT/CN2019/089697 2019-05-31 2019-05-31 Method and device for searching network structure, computer storage medium and computer program product Ceased WO2020237688A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980009276.7A CN111684472A (en) 2019-05-31 2019-05-31 Method and apparatus for network structure search, computer storage medium and computer program product
PCT/CN2019/089697 WO2020237688A1 (en) 2019-05-31 2019-05-31 Method and device for searching network structure, computer storage medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/089697 WO2020237688A1 (en) 2019-05-31 2019-05-31 Method and device for searching network structure, computer storage medium and computer program product

Publications (1)

Publication Number Publication Date
WO2020237688A1 true WO2020237688A1 (en) 2020-12-03

Family

ID=72433307

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089697 Ceased WO2020237688A1 (en) 2019-05-31 2019-05-31 Method and device for searching network structure, computer storage medium and computer program product

Country Status (2)

Country Link
CN (1) CN111684472A (en)
WO (1) WO2020237688A1 (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018218481A1 (en) * 2017-05-31 2018-12-06 深圳市大疆创新科技有限公司 Neural network training method and device, computer system and mobile device
CN108229657A (en) * 2017-12-25 2018-06-29 杭州健培科技有限公司 A kind of deep neural network training and optimization algorithm based on evolution algorithmic
CN109344959A (en) * 2018-08-27 2019-02-15 联想(北京)有限公司 Neural network training method, nerve network system and computer system
CN109685204B (en) * 2018-12-24 2021-10-01 北京旷视科技有限公司 Image processing method and device, storage medium and electronic device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108780519A (en) * 2016-03-11 2018-11-09 奇跃公司 Structure Learning in Convolutional Neural Networks
WO2019018375A1 (en) * 2017-07-21 2019-01-24 Google Llc Neural architecture search for convolutional neural networks
CN108229647A (en) * 2017-08-18 2018-06-29 北京市商汤科技开发有限公司 The generation method and device of neural network structure, electronic equipment, storage medium
CN108921287A (en) * 2018-07-12 2018-11-30 开放智能机器(上海)有限公司 A kind of optimization method and system of neural network model
CN109284820A (en) * 2018-10-26 2019-01-29 北京图森未来科技有限公司 A kind of structure search method and device of deep neural network
CN109299142A (en) * 2018-11-14 2019-02-01 中山大学 A Convolutional Neural Network Structure Search Method and System Based on Evolutionary Algorithm

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119623154A (en) * 2024-11-11 2025-03-14 南京理工大学 Intelligent design method and system of MEMS sensors for general design scenarios

Also Published As

Publication number Publication date
CN111684472A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
CN111126574B (en) Method, device and storage medium for training machine learning model based on endoscopic image
CN115427968B (en) Robust AI Inference in Edge Computing Devices
JP7478145B2 (en) Automatic generation of machine learning models
US20190294975A1 (en) Predicting using digital twins
US10546066B2 (en) End-to-end learning of dialogue agents for information access
US20230196202A1 (en) System and method for automatic building of learning machines using learning machines
WO2017181866A1 (en) Making graph pattern queries bounded in big graphs
WO2020237689A1 (en) Network structure search method and apparatus, computer storage medium, and computer program product
WO2020107264A1 (en) Neural network architecture search method and apparatus
CN116070557A (en) Data path circuit design using reinforcement learning
CN111582452A (en) Method and apparatus for generating neural network model
WO2023173878A1 (en) Quantum neural network training method and apparatus
CN118071161B (en) Method and system for evaluating threat of air cluster target under small sample condition
CN116090536A (en) Neural network optimization method, device, computer equipment and storage medium
CN115631631A (en) A Traffic Flow Forecasting Method and Device Based on Bidirectional Distillation Network
CN115169433A (en) Knowledge graph classification method based on meta-learning and related equipment
WO2024051655A1 (en) Method and apparatus for processing histopathological whole-slide image, and medium and electronic device
WO2020237687A1 (en) Network architecture search method and apparatus, computer storage medium and computer program product
WO2020237688A1 (en) Method and device for searching network structure, computer storage medium and computer program product
CN115114836A (en) A kind of data classification prediction method and equipment based on PSO-GA
WO2021146977A1 (en) Neural architecture search method and apparatus
CN116258923A (en) Image recognition model training method, device, computer equipment and storage medium
CN115034379B (en) A method and related equipment for determining causality
CN117591724A (en) Model generation method and device based on neural network structure search
WO2022127603A1 (en) Model processing method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19930938

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19930938

Country of ref document: EP

Kind code of ref document: A1