Disclosure of Invention
The neural network structure searching method and the neural network structure searching apparatus provided by the present application can ensure that the search yields a neural network that meets the resource constraint condition of the target device, so that the search efficiency of the neural network structure can be improved.
In a first aspect, the present application provides a neural network structure searching method, including: determining a constraint condition on a resource that a target device can provide for running a neural network, where the neural network is used for processing images, text, or speech; obtaining a given search space of the neural network; training a sampling model according to the constraint condition and the given search space, so that, when a neural network obtained by sampling the given search space using the sampling model runs on the target device, its requirement for the resource meets the constraint condition; determining a candidate search space according to the sampling model and the given search space, where the candidate search space includes neural networks sampled from the given search space based on the sampling model; and searching for a target neural network according to the candidate search space.
In the method, because the sampling model is trained, neural networks in the given search space that meet the resource constraint condition of the target device can be sampled, so the candidate search space determined based on the sampling model and the given search space contains neural networks that meet the resource constraint condition of the target device. Therefore, searching the candidate search space for the target neural network yields a neural network structure that meets the resource constraint condition of the target device, and the search efficiency can be improved.
In addition, searching for a target neural network according to the candidate search space instead of searching for a given search space can avoid searching for a neural network that does not satisfy the resource constraint condition of the target device, and thus search efficiency can be improved.
Because the method has high search efficiency, the target neural network can be searched for using larger-scale training data, so a better neural network for running on the target device can be found.
With reference to the first aspect, in a first possible implementation manner, the searching for a target neural network according to the candidate search space includes: searching for a first network layer according to the candidate search space, where the first network layer is a network layer contained in any neural network in the given candidate search space; searching for a second network layer according to the candidate search space and the first network layer; and determining the target neural network according to the first network layer and the second network layer, where the second network layer includes the network layers, other than the first network layer, among the network layers included in any neural network in the given search space.
In this implementation, a key network layer (i.e., a first network layer) may be searched from the candidate search space, and then a non-key network layer (i.e., a second network layer) may be searched from the search space based on the searched network layers, so that a target neural network formed by the two network layers may be obtained.
The network layers that all the neural networks in the search space must include are usually the key layers of the neural networks, and the key layers usually play a more important role when the neural networks execute their tasks; that is, the performance of the key layers usually determines the performance of the neural networks. Therefore, in this implementation, the key layers (i.e., the first network layer) are searched for first, so that the performance influence of the non-key layers (i.e., the second network layer) can be removed, key layers with better performance can be found, and thus neural networks with better performance can be found.
With reference to the first aspect or any one of the foregoing possible implementation manners, in a second possible implementation manner, the sampling model samples the given search space based on a gumbel-softmax sampling method to obtain a neural network.
With reference to the first aspect or any one of the foregoing possible implementation manners, in a third possible implementation manner, the neural network for processing images may be used for classifying images, segmenting images, detecting images, recognizing images, or generating images; the neural network for processing text may be used for translating text, restating text, generating text, or the like; and the neural network for processing speech may be used for recognizing speech, translating speech, generating speech, or the like.
With reference to the first aspect or any one of the foregoing possible implementation manners, in a fourth possible implementation manner, the resource is a computing resource and/or a storage resource.
In a second aspect, the present application provides a neural network structure searching method, the method including: a server receives a first message from a target device, where the first message is used to request the server to perform a neural network structure search, and the neural network is used for processing images, text, or speech; the server searches for a first network layer according to a given search space, where the first network layer is a network layer contained in any neural network in the given candidate search space; the server searches for a second network layer according to the given search space and the first network layer, where the second network layer includes the network layers, other than the first network layer, among the network layers included in any neural network in the given search space; and the server sends a target neural network to the target device, the target neural network including the first network layer and the second network layer.
The network layers that all the neural networks in the search space must include are usually the key layers of the neural networks, and the key layers usually play a more important role when the neural networks execute their tasks; that is, the performance of the key layers usually determines the performance of the neural networks. Therefore, in this implementation, the server searches for the key layers (i.e., the first network layer) first, so that the performance influence of the non-key layers (i.e., the second network layer) can be removed, key layers with better performance can be found, and thus neural networks with better performance can be found.
In some possible implementations, the neural network for processing images may be used for classifying images, segmenting images, detecting images, recognizing images, or generating images; the neural network for processing text may be used for translating text, restating text, generating text, or the like; and the neural network for processing speech may be used for recognizing speech, translating speech, generating speech, or the like.
In a third aspect, the present application provides a neural network structure searching apparatus, including: the determining module is used for determining the constraint conditions of resources provided by the target equipment for operating the neural network, and the neural network is used for processing images, texts or voices; the acquisition module is used for acquiring a given search space of the neural network; a training module, configured to train a sampling model according to the constraint condition and the given search space, so that a requirement of a neural network obtained by sampling the given search space using the sampling model for the resource when the neural network runs in the target device meets the resource constraint condition; a determining module, configured to determine a candidate search space according to the sampling model and the given search space, where the candidate search space includes a neural network sampled from the given search space based on the sampling model; and the searching module is used for searching the target neural network according to the candidate searching space.
In the apparatus, because the sampling model is trained, neural networks in the given search space that meet the resource constraint condition of the target device can be sampled, so the candidate search space determined based on the sampling model and the given search space contains neural networks that meet the resource constraint condition of the target device. Therefore, searching the candidate search space for the target neural network yields a neural network structure that meets the resource constraint condition of the target device, and the search efficiency can be improved.
In addition, searching for a target neural network according to the candidate search space instead of searching for a given search space can avoid searching for a neural network that does not satisfy the resource constraint condition of the target device, and thus search efficiency can be improved.
Because the apparatus of the present application has higher search efficiency, the target neural network can be searched for using larger-scale training data, and thus a better neural network can be found.
With reference to the third aspect, in a first possible implementation manner, the search module is specifically configured to: search for a first network layer according to the candidate search space, where the first network layer is a network layer contained in any neural network in the given candidate search space; search for a second network layer according to the candidate search space and the first network layer, where the second network layer includes the network layers, other than the first network layer, among the network layers included in any neural network in the given search space; and determine the target neural network according to the first network layer and the second network layer.
The network layers that all the neural networks in the search space must include are usually the key layers of the neural networks, and the key layers usually play a more important role when the neural networks execute their tasks; that is, the performance of the key layers usually determines the performance of the neural networks. Therefore, in this implementation, the key layers (i.e., the first network layer) are searched for first, so that the performance influence of the non-key layers (i.e., the second network layer) can be removed, key layers with better performance can be found, and thus neural networks with better performance can be found.
With reference to the third aspect or any one of the foregoing possible implementation manners, in a second possible implementation manner, the sampling model samples the given search space based on a gumbel-softmax sampling method to obtain a neural network.
With reference to the third aspect or any one of the foregoing possible implementation manners, in a third possible implementation manner, the neural network for processing images may be used for classifying images, segmenting images, detecting images, recognizing images, or generating images; the neural network for processing text may be used for translating text, restating text, generating text, or the like; and the neural network for processing speech may be used for recognizing speech, translating speech, generating speech, or the like.
With reference to the third aspect or any one of the foregoing possible implementation manners, in a fourth possible implementation manner, the resource includes a computing resource and/or a storage resource.
In a fourth aspect, the present application provides a neural network structure searching apparatus, including: a receiving module, configured to receive a first message from a target device, where the first message is used to request the server to perform a neural network structure search; an acquisition module, configured to obtain a given search space of a neural network, where the neural network is used for processing images, text, or speech; a search module, configured to search for a first network layer according to the given search space, where the first network layer is a network layer contained in any neural network in the given candidate search space, the search module being further configured to search for a second network layer according to the given search space and the first network layer, where the second network layer includes the network layers, other than the first network layer, among the network layers included in any neural network in the given search space; and a sending module, configured to send a target neural network to the target device, where the target neural network includes the first network layer and the second network layer.
The network layers that all the neural networks in the search space must include are usually the key layers of the neural networks, and the key layers usually play a more important role when the neural networks execute their tasks; that is, the performance of the key layers usually determines the performance of the neural networks. Therefore, in this implementation, the key layers (i.e., the first network layer) are searched for first, so that the performance influence of the non-key layers (i.e., the second network layer) can be removed, key layers with better performance can be found, and thus neural networks with better performance can be found.
In some possible implementations, the neural network for processing images may be used for classifying images, segmenting images, detecting images, recognizing images, or generating images; the neural network for processing text may be used for translating text, restating text, generating text, or the like; and the neural network for processing speech may be used for recognizing speech, translating speech, generating speech, or the like.
In a fifth aspect, the present application provides a neural network structure searching apparatus, including: a memory to store instructions; a processor for executing the memory-stored instructions, the processor being configured to perform the method of the first aspect when the memory-stored instructions are executed.
In a sixth aspect, the present application provides a neural network structure searching apparatus, including: a memory to store instructions; a processor for executing the memory-stored instructions, the processor being configured to perform the method of the second aspect when the memory-stored instructions are executed.
In a seventh aspect, the present application provides a computer readable medium storing instructions for execution by a device to implement the method of the first aspect.
In an eighth aspect, the present application provides a computer readable medium storing instructions for execution by a device to implement the method of the second aspect.
In a ninth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect.
In a tenth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the second aspect.
In an eleventh aspect, the present application provides a chip, where the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to perform the method in the first aspect.
Optionally, as an implementation manner, the chip may further include a memory, where instructions are stored in the memory, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in the first aspect.
In a twelfth aspect, the present application provides a chip, where the chip includes a processor and a data interface, and the processor reads instructions stored on a memory through the data interface to execute the method in the second aspect.
Optionally, as an implementation manner, the chip may further include a memory, the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, and when the instructions are executed, the processor is configured to execute the method in the second aspect.
In a thirteenth aspect, the present application provides a computing device comprising a processor and a memory, wherein: the memory has stored therein computer instructions that are executed by the processor to implement the method of the first aspect.
In a fourteenth aspect, the present application provides a computing device comprising a processor and a memory, wherein: the memory has stored therein computer instructions that are executed by the processor to implement the method of the second aspect.
Detailed Description
Some terms used in the embodiments of the present application will be explained below.
The embodiments of the present application relate to related applications of neural networks, and in order to better understand the solution of the embodiments of the present application, the following first introduces related terms and other related concepts of neural networks that may be related to the embodiments of the present application.
(1) Neural network
Neural Networks (NN) are complex network systems formed by a large number of simple processing units (called neurons) widely connected to each other, reflect many basic features of human brain functions, and are highly complex nonlinear dynamical learning systems.
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the neural unit may be as shown in equation (1-1):

h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s x_s + b)    (1-1)

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining together a plurality of the above single neural units, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
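As an illustration only, the following minimal sketch evaluates equation (1-1) for a single neural unit; the input values, weights, and choice of sigmoid activation are arbitrary examples, not values from the embodiments.

```python
import numpy as np

def neuron_output(x, w, b, f):
    """Single neural unit: weighted sum of the inputs x_s with weights W_s,
    plus the bias b, passed through the activation function f, as in (1-1)."""
    return f(np.dot(w, x) + b)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # inputs x_1 ... x_n
w = np.array([0.1, 0.4, -0.3])   # weights W_1 ... W_n
print(neuron_output(x, w, b=0.2, f=sigmoid))
```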
(2) Deep neural network
Deep Neural Networks (DNNs), also called multi-layer neural networks, can be understood as neural networks with multiple hidden layers. According to the positions of the different layers, the layers inside a DNN can be divided into three categories: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. The layers are fully connected, that is, any neuron of the i-th layer is necessarily connected to any neuron of the (i+1)-th layer.
Although a DNN appears complex, the work of each layer is not complex; it is simply the following linear relational expression:

y = α(W·x + b)

where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the numbers of coefficients W and offset vectors b are also large. These parameters are defined in the DNN as follows, taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}. The superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer. In summary, the coefficient from the k-th neuron at layer L-1 to the j-th neuron at layer L is defined as W^L_{jk}.
Note that the input layer is without the W parameter. In deep neural networks, more hidden layers make the network more able to depict complex situations in the real world. Theoretically, the more parameters the higher the model complexity, the larger the "capacity", which means that it can accomplish more complex learning tasks. The final goal of the process of training the deep neural network, i.e., learning the weight matrix, is to obtain the weight matrix (the weight matrix formed by the vectors W of many layers) of all the layers of the deep neural network that is trained.
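For illustration only, the following sketch applies the per-layer expression y = α(W·x + b) to a toy three-layer DNN; the layer sizes and random weights are arbitrary assumptions used to show the indexing convention, not values from the embodiments.

```python
import numpy as np

def dnn_forward(x, weights, biases, act=np.tanh):
    """Forward pass of a fully connected DNN: each layer computes y = act(W·x + b).
    weights[l][j, k] is the coefficient from the k-th neuron of layer l+1 to the
    j-th neuron of layer l+2, i.e. the W^{l+2}_{jk} notation described above."""
    for W, b in zip(weights, biases):
        x = act(W @ x + b)
    return x

# Toy three-layer DNN: 3 inputs -> 4 hidden units -> 2 outputs
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(dnn_forward(np.array([1.0, 0.5, -0.2]), weights, biases))
```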
(3) Convolutional Neural Network (CNN)
A convolutional neural network is a deep neural network with a convolutional structure. The convolutional neural network includes a feature extractor consisting of convolutional layers and sub-sampling layers. The feature extractor may be viewed as a filter, and the convolution process may be viewed as convolving an input image or a convolved feature plane (feature map) with a trainable filter. The convolutional layer is a layer of neurons in a convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, one neuron may be connected to only some of the neighboring neurons. A convolutional layer usually contains several feature planes, and each feature plane may be composed of several neural units arranged in a rectangle. The neural units of the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights may be understood as meaning that the way image information is extracted is independent of location. The underlying principle is that the statistics of one part of an image are the same as those of other parts, which means that image information learned in one part can also be used in another part, so the same learned image information can be used for all locations on the image. In the same convolutional layer, multiple convolution kernels may be used to extract different image information; generally, the greater the number of convolution kernels, the richer the image information reflected by the convolution operation.
The convolution kernel can be initialized in the form of a matrix of random size, and can be learned to obtain reasonable weights in the training process of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(4) Back propagation algorithm
The convolutional neural network may use a back propagation (BP) algorithm during training to correct the values of the parameters in the initial super-resolution model, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, an error loss is produced when the input signal is propagated forward until it is output, and the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation process dominated by the error loss, and aims to obtain the optimal parameters of the super-resolution model, such as the weight matrix.
(5) Recurrent Neural Networks (RNN)
The purpose of an RNN is to process sequence data. In the traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. However, such an ordinary neural network is incapable of addressing many problems. For example, to predict the next word in a sentence, the previous words are usually needed, because the words in a sentence are not independent of one another. An RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the previous information and applies it to the computation of the current output; that is, the nodes between the hidden layers are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. Theoretically, an RNN can process sequence data of any length.
An RNN is trained in the same way as a conventional artificial neural network (ANN): the BP error back-propagation algorithm is also used, but with a small difference. If the RNN is unfolded into a network, the parameters W, U, and V are shared, whereas in a conventional neural network they are not. In addition, when the gradient descent algorithm is used, the output of each step depends not only on the network of the current step but also on the network states of the previous steps. For example, when t = 4, it is necessary to propagate back three additional steps, and the gradients of the three previous steps all need to be added. This learning algorithm is referred to as back propagation through time.
The technical scheme of the application can be applied to the fields of cloud service, picture retrieval, photo album management, safe cities, automatic driving and the like which need the convolutional neural network. In practical application, according to a specific application scenario and the limitation of computing resources and storage resources of application equipment (such as a mobile phone terminal), a neural network model meeting given resource constraints is obtained by searching according to the technical scheme of the application, and then corresponding tasks, such as image recognition, object detection, object segmentation and the like, are executed by applying the neural network model.
Several exemplary application scenarios of the solution of the present application are presented below.
Application scenario 1: photo album management system
The user stores a large number of pictures in the mobile phone album, and wants to be able to perform classified management on the pictures in the album. For example, the user may wish to have the phone automatically categorize all bird images together and all people photos together.
In such a scenario, the technical scheme provided by the application can be utilized to search out the image classification model structure matched with the mobile phone computing resource based on the mobile phone computing resource of the user. Therefore, the image classification model structure is operated on the mobile phone, and the pictures of different types in the mobile phone photo album can be classified and managed, so that the searching of a user is facilitated, the management time of the user is saved, and the photo album management efficiency is improved.
Application scenario 2: object detection and segmentation
In automatic driving, detecting and segmenting objects such as pedestrians and vehicles on the street is very important for making safe driving decisions for the vehicle. In this application scenario, the technical solution provided by the present application can be used to search out a target detection and segmentation model structure matched with the computing resources of the vehicle, based on those computing resources. By running the target detection and segmentation model with this model structure on the vehicle, targets in the images acquired by the vehicle can be accurately detected, located, and segmented.
Fig. 1 is an exemplary flowchart of a neural network structure search method of the present application. As shown in fig. 1, the method may include S110 to S150.
S110, obtaining a given search space of a neural network, wherein the neural network is used for processing images, texts or voice.
The neural network may be a neural network for classifying images, a neural network for segmenting images, a neural network for detecting images, a neural network for recognizing images, a neural network for generating a specific image, a neural network for translating text, a neural network for restating text, a neural network for generating specific text, a neural network for recognizing speech, a neural network for translating speech, a neural network for generating specific speech, or the like.
In another dimension, the neural network may be a convolutional neural network or a cyclic neural network, etc.
It is understood that, in the present embodiment, the two concepts of the neural network and the neural network structure are equivalent. For example, obtaining a given search space of a neural network may be understood as obtaining a given search space of a neural network structure, the neural network being used for processing images, text or speech, and may be understood as having a neural network structure for processing images, text or speech.
Many operations may be included in the given search space, and all or some of the operations may constitute different neural networks based on different connection manners, or the network structures of the neural networks constituted by all or some of the operations may differ based on different connection manners.
And S120, determining the constraint conditions of the resources provided by the target device for operating the neural network.
The resources include computing resources and/or storage resources of the target device. One example of a computing resource of the target device is the number of floating-point operations per second (FLOPS) the target device can perform, and one example of a storage resource constraint of the target device is the memory resource of the target device.
The computing resources may be used to constrain the computation amount of the searched target neural network, and the storage resources may be used to constrain the parameter amount of the searched target neural network.
For example, the memory that can be provided by the target device to operate the neural network is preset to be 5 mbytes, that is, when the target device operates the neural network, 5 mbytes of memory can be provided. In this case, the parameter amount of the neural network should be about 5 mbytes.
For another example, it is preset that the computing resource the target device can provide for running the neural network is 500M operations per second, and the target device is expected to complete the running of the neural network within a target duration; that is, the target device can perform 500M operations per second when running the neural network. In this case, the computation amount of the neural network should be around 500M multiplied by the target duration.
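As an illustration of how such constraints might be checked, the following sketch compares a candidate network's parameter amount and computation amount against a memory budget and a compute budget. The function name, the tolerance, and the concrete figures are assumptions for illustration only; they are not part of the embodiments.

```python
def meets_resource_constraints(param_bytes, ops_per_inference,
                               mem_budget_bytes, ops_per_second, target_seconds,
                               tolerance=0.1):
    """Hypothetical check: the storage resource bounds the parameter amount, and
    the computing resource bounds the computation amount that can be completed
    within the target duration (e.g. 5 Mbytes of memory, 500M operations/s)."""
    mem_ok = param_bytes <= mem_budget_bytes * (1 + tolerance)
    compute_ok = ops_per_inference <= ops_per_second * target_seconds * (1 + tolerance)
    return mem_ok and compute_ok

# Example with the figures used above: 5 Mbytes of memory, 500M operations per second
print(meets_resource_constraints(param_bytes=4.8e6, ops_per_inference=400e6,
                                 mem_budget_bytes=5e6, ops_per_second=500e6,
                                 target_seconds=1.0))
```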
The target device may be an intelligent terminal device, such as a smart phone, a tablet computer, an intelligent home device, a vehicle, a robot, an unmanned aerial vehicle, and the like.
S130, training a sampling model according to the constraint condition and the given search space, so that the requirement of the neural network obtained by sampling the given search space by using the sampling model on the resource when the neural network runs in the target equipment meets the resource constraint condition.
Wherein the sampling model is used for sampling the given search space to obtain the neural network in the given search space.
In this embodiment, training the sampling model according to the given search space may include: sampling from a given search space by using a sampling model to obtain a neural network; then calculating the resource demand of the neural network, such as calculating the parameter quantity and the calculated quantity of the neural network; then, judging whether the resource demand of the neural network meets a preset resource constraint condition; if not, adjusting the parameters of the sampling model; and continuously repeating the four steps by using the adjusted sampling model until the training stopping condition is met, for example, the preset training times are reached, or the resource requirement of any neural network sampled by using the sampling model meets the resource constraint condition.
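A minimal sketch of this training loop is given below. It is an illustrative assumption, not the exact formulation of the embodiments: the structural parameters theta are adjusted so that the (soft) resource demand of sampled architectures approaches a single resource constraint, with per-operation costs op_costs assumed to be precomputed and the loss being a simple normalized gap.

```python
import torch
import torch.nn.functional as F

def train_sampling_model(theta, op_costs, constraint, iters=1000, lr=0.01, tol=0.05):
    """theta: [num_layers, num_ops] structural parameters of the sampling model.
    op_costs: [num_layers, num_ops] resource cost of each candidate operation
    (e.g. its parameter amount).  constraint: target resource budget (tensor)."""
    optimizer = torch.optim.Adam([theta], lr=lr)
    for _ in range(iters):
        weights = F.gumbel_softmax(theta, tau=1.0, hard=False)  # sample soft op choices
        demand = (weights * op_costs).sum()                     # resource demand of the sample
        loss = torch.abs(demand - constraint) / constraint      # gap to the constraint
        if loss.item() <= tol:                                  # training stop condition
            break
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return theta

# Example: 7 layers, 4 candidate operations per layer, budget of 5e6 parameters
theta = torch.zeros(7, 4, requires_grad=True)
costs = torch.rand(7, 4) * 2e6
train_sampling_model(theta, costs, constraint=torch.tensor(5e6))
```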
For example, when the image classification model needs to be run on the mobile phone for album management, it may be determined in advance according to the size of the memory of the mobile phone that the memory allocated to the image classification model when the image classification model is run by the mobile phone is 5 mbytes, and the memory constraint condition of the mobile phone is set to be 5 mbytes. In this way, when searching for the neural network structure, the sampling model may be trained according to the constraint condition of the parameter number of 5 mbytes, so that the sampling model may search for the neural network with the parameter number of about 5 mbytes from a given search space.
For another example, when the target detection model needs to be run on the vehicle to detect the target object on the street, it may be determined in advance according to the size of the memory of the vehicle control system that the memory that the vehicle control system can allocate to the target detection model when running the target detection model is 10 mbytes, and the memory constraint condition of the vehicle control system is set to be 10 mbytes. In this way, when searching for a neural network structure, the sampling model may be trained according to the constraint condition of 10 mbytes, so that the sampling model may search for a neural network with a parameter of about 10 mbytes from a given search space.
For convenience of description, the parameters of the sampling model are referred to as structural parameters in this embodiment.
In some possible implementations, the sampling model may sample the given search space based on a gumbel-softmax sampling method to derive a neural network.
It is understood that the present embodiment does not limit the sampling method used by the sampling model for sampling. For example, the sampling model may also sample the given search space based on the Gumbel-max method or the ST-Gumbel-softmax method.
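For illustration, the sketch below contrasts the Gumbel-max trick (a hard, non-differentiable draw) with Gumbel-softmax (its differentiable relaxation); this is a generic sketch of these sampling methods, not the specific sampling model of the embodiments.

```python
import torch

def gumbel_max_sample(logits):
    """Gumbel-max trick: add Gumbel noise to the logits and take the argmax,
    which draws an operation index from the categorical distribution softmax(logits)."""
    u = torch.rand_like(logits).clamp_(1e-9, 1 - 1e-9)
    gumbel_noise = -torch.log(-torch.log(u))
    return torch.argmax(logits + gumbel_noise, dim=-1)

def gumbel_softmax_sample(logits, tau=1.0):
    """Gumbel-softmax: a differentiable relaxation of the same draw, so the
    structural parameters (the logits) can be trained by gradient descent."""
    u = torch.rand_like(logits).clamp_(1e-9, 1 - 1e-9)
    gumbel_noise = -torch.log(-torch.log(u))
    return torch.softmax((logits + gumbel_noise) / tau, dim=-1)

theta = torch.zeros(4)                 # structural parameters for 4 candidate operations
print(gumbel_max_sample(theta))        # hard index, e.g. tensor(2)
print(gumbel_softmax_sample(theta))    # soft weights, nearly one-hot for small tau
```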
It can be understood that, the term that the resource requirement of the sampled neural network satisfies the resource constraint condition in the present application does not mean that the resource requirement of the sampled neural network is completely the same as the resource requirement indicated in the constraint condition, but means that a difference between the resource requirement of the sampled neural network and the resource requirement indicated in the constraint condition is within a reasonable range, for example, the difference is less than or equal to a preset threshold.
And S140, determining a candidate search space according to the sampling model and the given search space, wherein the candidate search space comprises a neural network sampled from the given search space based on the sampling model.
For example, the candidate search space refers to a search space formed by a neural network that can be sampled from the given search space using the sampling model.
And S150, searching a target neural network according to the candidate search space.
In other words, a target neural network is searched from the candidate search space, and the network structure of the target neural network is the searched target neural network structure.
In some implementations, the target neural network structure can be searched from the candidate search space with reference to a neural network structure search method in the prior art.
In this embodiment, because the sampling model is trained, neural networks in the given search space that satisfy the resource constraint condition of the target device can be sampled, and therefore the candidate search space determined based on the sampling model and the given search space includes neural networks that satisfy the resource constraint condition of the target device. Therefore, searching the candidate search space for the target neural network yields a neural network structure that meets the resource constraint condition of the target device, and the search efficiency can be improved.
In addition, the target neural network is searched according to the candidate search space instead of searching the given search space, so that the neural network which does not meet the resource constraint condition of the target equipment can be prevented from being searched, and the search efficiency can be improved.
In other implementations, the searching for the target neural network according to the candidate search space may include: searching a first network layer according to the candidate search space; searching a second network layer according to the candidate search space and the first network layer; determining the target neural network according to the first network layer and the second network layer.
In other words, a part of the network layers may be searched in the candidate search space, and the network layer that has been searched may be used as the network layer of the target neural network, and then other network layers of the target neural network may be searched.
For convenience of description, in this embodiment, the network layer searched for first is referred to as the first network layer, and the network layer searched for based on the first network layer is referred to as the second network layer. There may be one or more first network layers; likewise, there may be one or more second network layers.
In this implementation, since the second network layer is not required to be searched when the first network layer is searched, the first network layer can be searched more efficiently; when the second network layer is searched, the first network layer is determined, so that the second network layer can be searched more efficiently, and the searching efficiency of the neural network is improved finally.
In some implementations, the first network layer may be a network layer included in any neural network in the candidate search space, or the first network layer may be a network layer included in any neural network in the given candidate search space.
The network layers that all neural networks must contain, whether in a candidate search space or a given search space, are typically key layers of the neural networks. That is, in this implementation, the key layer is searched first, and then the non-key layer is searched.
Because the role of the key layer in the task execution of the neural network is generally important, that is, the performance of the key layer generally determines the performance of the neural network, in the implementation manner, the key layer is searched first, so that the performance influence of the non-key layer can be removed, the key layer with better performance can be searched, and the neural network with better performance can be searched.
An exemplary flowchart of a search method for searching for a key layer first and then searching for a non-key layer in the present application is shown in fig. 2. The method includes S210 to S270.
S210, obtaining a given search space.
S220, obtaining given resource constraint conditions.
And S230, sampling according to the given search space and the given resource constraint condition to obtain a candidate search space.
And S240, determining a key layer.
And S250, searching the structure of the key layer from the candidate search space.
And S260, searching the structure of the non-key layer from the candidate search space.
And S270, searching to obtain the structure of the target neural network.
In this embodiment, because the candidate search space is obtained by sampling according to the given resource constraint condition, the target neural network structure obtained by searching in the candidate search space can satisfy the resource constraint condition, and the search range can be narrowed, thereby improving the search efficiency. In addition, the key layer is searched first, and then the non-key layer is searched, so that the importance of different layers in the neural network is fully considered, and the neural network structure with better performance can be searched.
An exemplary implementation of the present application to determine the key layer is described below.
Firstly, a super network (SuperNet) corresponding to a given search space is established, the super network comprises operations in the given search space and connection modes among the operations, and the super network is marked as N.
The super network N is then converted into a directed acyclic graph G = (V, E), where V is the set of nodes of the directed acyclic graph, corresponding to the feature maps output by the operations in the super network, and E is the set of edges of the directed acyclic graph, corresponding to the operations in the super network. Each subgraph of the directed acyclic graph is a path from the starting point to the end point, and each subgraph corresponds to one neural network or neural network structure.
Assuming that the whole super network comprises n different subgraphs G_i, where i is an integer taken from 1 to n, the key layers of the neural networks in the given search space can be obtained as follows:

Step one: S = G_1 (initialize the key layer set S), i = 1;

Step two: i = i + 1;

Step three: S = S ∩ G_i.

Steps two and three are repeated until i is larger than n. At this point, the layers in which the operations contained in S are located are the key layers.
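A minimal sketch of this intersection procedure is shown below, with each subgraph represented simply as the set of operations (edges) it contains; the operation names are arbitrary placeholders.

```python
def find_key_operations(subgraphs):
    """Sketch of the procedure above: the key layers are the layers whose
    operations appear in every subgraph, i.e. the intersection over all G_i."""
    s = set(subgraphs[0])        # step one: S = G_1
    for g in subgraphs[1:]:      # steps two and three, repeated until i > n
        s &= set(g)
    return s

# Example: operations shared by all three subgraphs belong to the key layers
print(find_key_operations([{"op1", "op4", "op7"},
                           {"op1", "op2", "op7"},
                           {"op1", "op3", "op7"}]))  # {'op1', 'op7'}
```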
In the embodiment of the application, when the sampling model samples the given search space based on a gumbel-softmax sampling method, the neural networks obtained by sampling all satisfy the following constraints:
A_θ = GumbelSoftmax(θ)

where θ represents the structural parameters of the sampling model, GumbelSoftmax(θ) represents Gumbel-softmax sampling, A_θ is the neural network obtained by the sampling, A_{θ,i} represents the i-th resource requirement of the sampled neural network, R_i represents the i-th resource requirement included in the resource constraint condition, α is a preset value, L_i is the constraint loss, which should be less than or equal to a predetermined threshold, and M_i represents the i-th resource requirement of the neural network, among all the neural networks in the given search space that can be sampled using the Gumbel-softmax sampling method, whose i-th resource requirement is the largest.
In various embodiments of the present application, optionally, the sampling model may be trained multiple times to obtain different structural parameters. Furthermore, different candidate search spaces can be obtained by sampling from a given search space based on different structural parameters, and the neural network model with better performance can be obtained by searching based on different search spaces.
One implementation of the sampling model for sampling the given search space based on the gumbel-softmax sampling method is described below.
In this implementation, a given search space may be divided into M sub-search spaces, each sub-search space including one or more operations, M being a positive integer. The M sub-search spaces are in one-to-one correspondence with M network layers of the neural network, that is, each sub-search space is used for searching an operation that one network layer of the neural network should include, or an operation that one network layer of the neural network should include can be searched from each sub-search space, the M sub-search spaces can search M operations, the M operations are respectively operations that each network layer of the M network layers should include, and the M operations constitute a target neural network or a target neural network structure.
It is to be appreciated that null operations can be included in any of the sub-search spaces. When a null operation is searched, it means that the target neural network may not contain the corresponding network layer.
The sampling model may include a plurality of structural parameters, and the plurality of structural parameters may be divided into M structural parameter sets, and the M structural parameter sets are in one-to-one correspondence with the M sub-search spaces. Each structural parameter set may include one or more structural parameters, and the structural parameters in each structural parameter set correspond to operations in the sub-search space corresponding to the structural parameter set in a one-to-one manner.
When the sampling model is used for sampling the given search space based on a gumbel-softmax sampling method, for each sub-search space, gumbel-softmax sampling is carried out on the structural parameters in the structural parameter set corresponding to the sub-search space, and the operation corresponding to the structural parameters obtained through sampling is the operation obtained through searching in the sub-search space.
And performing Gumbel-softmax sampling on each sub-search space in the M sub-search spaces to obtain M operations, wherein a neural network formed by the M operations is the neural network obtained by sampling.
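The following sketch illustrates this per-sub-search-space sampling: one set of structural parameters per sub-search space, one operation sampled from each, and the M sampled operations assembled into one architecture. The operation names (including the "none" operation mentioned above) and parameter shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sample_architecture(structure_param_sets, sub_search_spaces, tau=1.0):
    """Each sub-search space has its own set of structural parameters;
    Gumbel-softmax sampling over each set picks one operation per network layer,
    and the M picked operations form one sampled neural network structure."""
    sampled_ops = []
    for params, ops in zip(structure_param_sets, sub_search_spaces):
        one_hot = F.gumbel_softmax(params, tau=tau, hard=True)
        sampled_ops.append(ops[int(one_hot.argmax())])
    return sampled_ops

# Example: 3 sub-search spaces, each with 4 candidate operations ("none" = skip the layer)
spaces = [["conv3x3", "conv5x5", "sep_conv3x3", "none"]] * 3
params = [torch.zeros(4, requires_grad=True) for _ in spaces]
print(sample_architecture(params, spaces))
```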
In the embodiment, a neural network is obtained by sampling from a given search space by using a sampling model based on a gumbel-softmax sampling method; then calculating the resource demand of the neural network, such as calculating the parameter quantity and the calculated quantity of the neural network; then, judging whether the resource demand of the neural network meets a preset resource constraint condition; if not, adjusting the parameters of the sampling model; and continuously repeating the four steps by using the adjusted sampling model until the training stopping condition is met, for example, the preset training times are reached, or the resource requirement of any neural network sampled by using the sampling model meets the resource constraint condition.
After the training of the sampling model is completed, any neural network searched from the given search space based on the gumbel-softmax sampling method can meet the resource constraint condition according to the sampling model.
One implementation of searching for the key layer first and then searching for the non-key layer in the present application is described below with reference to fig. 3. The arrows in fig. 3 indicate the data flow direction.
As shown in fig. 3, a given search space may be divided into 7 sub-search spaces, where the 7 sub-search spaces correspond to 7 network layers of a neural network one-to-one, and each sub-search space includes 4 different operations.
As shown in fig. 3 (a), of the 7 network layers of the neural network in the given search space, the first, fourth, and seventh network layers are key layers, and the other network layers are non-key layers. Wherein the critical layer may be determined using the methods described previously.
Accordingly, the first sub-search space, the fourth sub-search space, and the seventh sub-search space in the given search space are key sub-search spaces and are respectively used for searching operations of the first network layer, the fourth network layer, and the seventh network layer.
As shown in (b) of fig. 3, an operation that the first network layer should contain may be searched for from four operations in the first sub-search space, an operation that the fourth network layer should contain may be searched for from four operations in the fourth sub-search space, and an operation that the seventh network layer should contain may be searched for from four operations in the seventh sub-search space.
One implementation of searching for operations from the first, fourth, and seventh sub-search spaces in the given search space comprises the following steps.
The same training data is input to the four operations in the first sub-search space; the output of each of the four operations is then weighted by the structural parameter corresponding to that operation, and the four weighted values are summed; for convenience of description, this sum is referred to as the first output data. The first output data is then input to the four operations in the fourth sub-search space, the outputs of the four operations are weighted by their corresponding structural parameters, and the four weighted values are summed; this sum is referred to as the second output data. Subsequently, the second output data is input to the four operations in the seventh sub-search space, the outputs of the four operations are weighted by their corresponding structural parameters, and the four weighted values are summed; this sum is referred to as the third output data. Finally, according to the third output data and the training data, the parameters of the operations in the first, fourth, and seventh sub-search spaces are adjusted, and the structural parameters corresponding to these operations are adjusted.
The above steps are repeated until the training is stopped, so that the loss value of the third output data relative to the training data is smaller and smaller. When the operations in the key sub-search space are trained, the outputs of the operations are weighted and summed by using the structural parameters corresponding to the operations to obtain the input of the operations in the next network layer, that is, the outputs of the operations in the key sub-search space are limited by the corresponding structural parameters to be used as the input of the operations in the next network layer, so that the search of the key network layer can be considered to be performed in the candidate search space.
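A minimal sketch of the weighted combination used within one key sub-search space is given below. The softmax weighting of the structural parameters, the candidate operations (four convolutions of different kernel sizes), and the tensor shapes are illustrative assumptions only.

```python
import torch

def mixed_layer_forward(x, ops, structure_params):
    """Weighted combination for one sub-search space: the outputs of the candidate
    operations are weighted by (a function of) their structural parameters and
    summed, and the sum is fed to the next network layer."""
    weights = torch.softmax(structure_params, dim=-1)  # assumption: softmax weighting
    return sum(w * op(x) for w, op in zip(weights, ops))

# Example: one key layer with 4 candidate convolutions applied to an image batch
ops = torch.nn.ModuleList([torch.nn.Conv2d(3, 16, k, padding=k // 2)
                           for k in (1, 3, 5, 7)])
theta = torch.zeros(4, requires_grad=True)
out = mixed_layer_forward(torch.randn(2, 3, 32, 32), ops, theta)
print(out.shape)  # torch.Size([2, 16, 32, 32])
```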
After the training is stopped, the target operation can be obtained by sampling from each key sub-search space according to the adjusted structure parameters. As shown in (c) of fig. 3, the third operation is searched from the first sub-search space, the first operation is searched from the fourth sub-search space, and the second operation is searched from the seventh sub-search space.
Next, it may be assumed that the first network layer, the fourth network layer, and the seventh network layer of the target neural network respectively contain the three operations searched in the foregoing, and then the operations of the other network layers of the target neural network are searched in the other sub-search spaces.
The method for searching the operation of other network layers of the target neural network in other sub-search spaces is similar to the method for searching the operation of the key network layer in the key sub-search space, and is not repeated here.
As shown in (d) of fig. 3, the third operation is searched from the second sub-search space, the second operation is searched from the third sub-search space, the fourth operation is searched from the fifth sub-search space, and the first operation is searched from the sixth sub-search space.
That is, the neural network formed by the seven operations shown in fig. 3(d) is the target neural network.
The information statistical table of the image classification model obtained by the neural network structure searching method and other neural network structure searching methods is shown in table 1. It is to be understood that the image classification model may be used alone for image classification, or may be migrated to other models to assist in performing other tasks, for example, the image classification model may be migrated to an image detection or image segmentation model.
TABLE 1 statistics of information results for models searched by various neural network structure search methods
In Table 1, the model in the first column is the name of the model obtained with the corresponding search method; the type in the second column indicates how the corresponding model was obtained, for example, whether it was manually designed (manual) or automatically searched (auto); the search dataset in the third column is the dataset used for searching for or training the corresponding model, such as the CIFAR-10 dataset or the ImageNet dataset; the search cost in the fourth column indicates how many GPU days were consumed to search for the corresponding model; the parameter number (params) in the fifth column is the parameter amount of the corresponding model, in units of M; the calculation amount in the sixth column is the FLOPs of the corresponding model, in units of M; Top-1 in the seventh column is the probability that the first-ranked result among the results obtained by executing the task with the corresponding model is the true result; Top-5 in the eighth column is the probability that the top five results among the results obtained by executing the task with the corresponding model contain the true result; and "-" indicates that the corresponding data item is not available.
HiNAS-A, HiNAS-B and HiNAS-C are three image classification models obtained by searching with the applied neural network structure searching method, and the three models are different in that the three models are obtained by searching based on different resource constraint conditions. The resource constraint condition corresponding to the HiNAS-A is that the parameter number is 4.8M, and the calculated amount is 3M; the resource constraint condition corresponding to the HiNAS-B is that the parameter number is 5.5, and the calculated amount is 4M; the resource constraint condition corresponding to HiNAS-C is that the calculated amount is 5M.
CIFAR-10 is a small dataset for recognizing common objects, compiled by Alex Krizhevsky and Ilya Sutskever, students of Hinton; ImageNet refers to the public dataset used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC); GPU is the abbreviation of graphics processing unit; and GPU days refers to the number of days required when one GPU is used, for example, 0.2 GPU days means that a task can be completed within 0.2 days using one GPU.
The method can complete searching in 0.2GPU days. Compared with FBNet-B, the method of the application requires shorter search time under the condition of the same accuracy.
In addition, the accuracy of top1 of the HiNAS-B model obtained by the method is 0.4% higher than that of FBNet-C under the condition of the same FLOPs and parameter quantity. Under the condition that the FLOPs are only restricted to be 500M, the accuracy of the obtained HiNAS-C model top1 is 75.6%, which is superior to the existing method.
The present application also provides a neural network structure searching method, which, as shown in fig. 4, may include S410 to S450.
S410, the server receives a first message from the target device, wherein the first message is used for requesting the server to search for the neural network structure.
The function of the neural network that the target device requests to search, such as requesting to search an image classification model, an image detection model, an image segmentation model, and the like, may also be included in the first message.
The target device may be an intelligent terminal device, such as a smart phone, a tablet computer, an intelligent home device, a vehicle, a robot, an unmanned aerial vehicle, and the like.
S420, the server acquires a given search space of a neural network, and the neural network is used for processing images, texts or voices.
S430, the server searches a first network layer according to the given search space, where the first network layer is a network layer contained in any neural network in the given search space.
Here, the server searching for the first network layer refers to searching for the operation included in the first network layer of the target neural network. That is, all neural networks in the given search space contain certain identical network layers, referred to as the first network layer; it can therefore be determined that the target neural network will also contain the first network layer, and the operation serving as the first network layer of the target neural network is searched from the portion of the given search space corresponding to the first network layer.
S440, the server searches a second network layer according to the given search space and the first network layer, where the second network layer includes network layers other than the first network layer in network layers included in any neural network in the given search space.
Here, the server searching for the second network layer refers to searching for the operation included in the second network layer of the target neural network. That is, the operation serving as the second network layer of the target neural network is searched from the portion of the given search space corresponding to the second network layer.
S450, the server sends a target neural network to the target device, where the target neural network includes the first network layer and the second network layer, and the target neural network is used for processing images, texts or voices.
After the server searches and obtains the first network layer and the second network layer, the server can obtain the neural network formed by the two network layers and send it to the target device.
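Purely as a minimal illustration of how the flow S410 to S450 might be chained on the server side, the sketch below uses hypothetical placeholder callables for the concrete search steps; it is not the concrete search algorithm of the present application.

```python
# Illustrative sketch of the flow S410-S450. The callables passed in are
# hypothetical placeholders standing in for the concrete steps described above.
def handle_first_message(first_message,
                         acquire_given_search_space,   # S420
                         search_first_layer,           # S430
                         search_second_layers,         # S440
                         send_to_target_device):       # S450
    given_search_space = acquire_given_search_space(first_message)   # S420

    # S430: search the operation of the first network layer, i.e. the layer
    # contained in every neural network of the given search space.
    first_layer = search_first_layer(given_search_space)

    # S440: search the second network layer(s) with the first network layer fixed.
    second_layers = search_second_layers(given_search_space, first_layer)

    # S450: the target neural network formed by the two network layers is sent back.
    target_neural_network = (first_layer, second_layers)
    send_to_target_device(target_neural_network)
    return target_neural_network
```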
For the implementation of each step in the method, reference may be made to the related description above, and details are not described herein again.
In the method, when the first network layer is searched, the second network layer does not need to be searched, so the first network layer can be searched more efficiently; when the second network layer is searched, the first network layer has already been determined, so the second network layer can also be searched more efficiently. The search efficiency of the neural network is thereby improved.
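A rough, hypothetical way to see this gain: if the first network layer has K1 candidate operations and the second network layer has K2 candidate configurations, a joint search would have to consider on the order of K1 × K2 combinations, whereas searching the first layer first and then the second layer with the first fixed considers on the order of K1 + K2 candidates, under the simplifying assumption that each stage evaluates its candidates independently. The numbers below are invented for illustration only.

```python
# Back-of-the-envelope comparison under the simplifying assumption that each
# stage evaluates its candidates independently (hypothetical candidate counts).
k1 = 8      # candidate operations for the first (key) network layer
k2 = 1000   # candidate configurations for the second (remaining) network layers

joint_search = k1 * k2       # search both together: 8000 combinations
two_stage_search = k1 + k2   # first layer first, then second layers: 1008 candidates
print(joint_search, two_stage_search)
```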
The network layers that all the neural networks in the search space must include are usually key layers of the neural networks, and the role of the key layers in the task execution of the neural networks is usually more important, that is, the performance of the key layers usually determines the performance of the neural networks, therefore, in this implementation, the key layers (i.e., the first network layer) are searched first, the performance influence of the non-key layers (i.e., the second network layer) can be removed, so that the key layers with better performance can be searched, and the neural networks with better performance can be searched.
For example, when an image classification model needs to be run on a mobile phone for album management, the mobile phone may send a first message to the server to request a search for an image classification model. After the server obtains the image classification model by searching, the image classification model can be retrained and then sent to the mobile phone.
For another example, when a vehicle needs to run an object detection model to detect objects on the street, the vehicle may send a first message to the server to request a search for an image detection model. After the server obtains the image detection model by searching, the image detection model can be retrained and then sent to the vehicle.
Fig. 5 is a diagram showing an exemplary configuration of a neural network structure searching apparatus according to the present application. The apparatus 500 includes an acquisition module 510, a training module 520, a determination module 530, and a search module 540. The apparatus 500 may implement the method described above with reference to any one of Figs. 1 to 3.
For example, the determining module 530 is configured to perform S110 and S140, the obtaining module 510 is configured to perform S120, the training module 520 is configured to perform S130, and the searching module 540 is configured to perform S150.
Optionally, the searching module 540 may be specifically configured to execute S240 to S270.
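Purely as a hypothetical skeleton (the class and attribute names are invented for illustration, and the step bodies of Figs. 1 to 3 are omitted), the module-to-step mapping above can be pictured as follows.

```python
# Hypothetical skeleton of the apparatus 500; attribute names are invented for
# illustration, and the step implementations (S110 to S150, S240 to S270) are omitted.
class Apparatus500:
    def __init__(self, acquisition_module, training_module, determination_module, search_module):
        self.acquisition_module = acquisition_module      # performs S120
        self.training_module = training_module            # performs S130
        self.determination_module = determination_module  # performs S110 and S140
        self.search_module = search_module                # performs S150 (optionally S240 to S270)
```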
In some implementations, the apparatus 500 may be deployed in a cloud environment, which is an entity that uses underlying resources to provide cloud services to users in a cloud computing mode. A cloud environment includes a cloud data center and a cloud service platform, where the cloud data center includes a large number of infrastructure resources (including computing resources, storage resources, and network resources) owned by a cloud service provider, and the computing resources may include a large number of computing devices (e.g., servers). The apparatus 500 may be a server in the cloud data center used for neural network structure search. The apparatus 500 may also be a virtual machine created in the cloud data center for neural network structure search. The apparatus 500 may also be a software apparatus deployed on a server or a virtual machine in the cloud data center for performing neural network structure search; the software apparatus may be deployed in a distributed manner on a plurality of servers, on a plurality of virtual machines, or on both virtual machines and servers. For example, the training module 520, the determination module 530, and the search module 540 in the apparatus 500 may be deployed in a distributed manner on a plurality of servers, on a plurality of virtual machines, or on both virtual machines and servers.
The apparatus 500 may also be abstracted by a cloud service provider into a cloud service for searching neural network structures on a cloud service platform and provided to a user. After the user purchases the cloud service on the cloud service platform, the cloud environment uses the apparatus 500 to provide the user with the cloud service of searching for a neural network structure. The user may upload a resource constraint condition to the cloud environment through an application program interface (API) or through a web interface provided by the cloud service platform; the apparatus 500 receives the resource constraint condition, performs the neural network structure search according to the resource constraint condition, and returns the finally searched neural network structure to the edge device where the user is located.
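The concrete interface through which the user uploads the resource constraint condition is not specified here; the following is only a hypothetical illustration of what such an API call might carry, and the endpoint URL, field names, and numeric budgets are invented for the sketch.

```python
import json
from urllib import request

# Hypothetical request to a cloud-service API; the URL and field names are
# invented for illustration and are not defined by the present application.
payload = {
    "task": "neural_network_structure_search",
    "resource_constraint": {
        "params_m": 5.0,    # parameter amount budget, in millions (illustrative value)
        "flops_m": 400.0,   # calculated amount budget, in millions of FLOPs (illustrative value)
    },
}
req = request.Request(
    "https://cloud.example.com/api/v1/nas/search",   # placeholder URL
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = request.urlopen(req)  # the searched neural network structure would be returned
```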
When the apparatus 500 is a software apparatus, the apparatus 500 may also be deployed alone on one computing device in any environment.
Fig. 6 is a diagram showing an exemplary configuration of a neural network structure searching apparatus according to the present application. The apparatus 600 includes a receiving module 610, an obtaining module 620, a searching module 630, and a sending module 640. The apparatus 600 may implement the method illustrated in Fig. 4 described above.
For example, the receiving module 610 may be configured to execute S410, the obtaining module 620 is configured to execute S420, the searching module 630 is configured to execute S430 to S440, and the sending module 640 is configured to execute S450.
In some implementations, the apparatus 600 may be deployed in a cloud environment, which is an entity that uses underlying resources to provide cloud services to users in a cloud computing mode. A cloud environment includes a cloud data center and a cloud service platform, where the cloud data center includes a large number of infrastructure resources (including computing resources, storage resources, and network resources) owned by a cloud service provider, and the computing resources may include a large number of computing devices (e.g., servers). The apparatus 600 may be a server in the cloud data center used for neural network structure search. The apparatus 600 may also be a virtual machine created in the cloud data center for neural network structure search. The apparatus 600 may also be a software apparatus deployed on a server or a virtual machine in the cloud data center for performing neural network structure search; the software apparatus may be deployed in a distributed manner on a plurality of servers, on a plurality of virtual machines, or on both virtual machines and servers. For example, the searching module 630 in the apparatus 600 may be deployed in a distributed manner on a plurality of servers, on a plurality of virtual machines, or on both virtual machines and servers.
The apparatus 600 may also be abstracted by a cloud service provider into a cloud service for searching neural network structures on a cloud service platform and provided to a user. After the user purchases the cloud service on the cloud service platform, the cloud environment uses the apparatus 600 to provide the user with the cloud service of searching for a neural network structure. The user may upload a model type to the cloud environment through an application program interface (API) or through a web interface provided by the cloud service platform; the apparatus 600 receives the model type (for example, image classification, image detection, or image segmentation), searches for a neural network structure according to the model type, and returns the finally searched neural network structure of that type to the edge device where the user is located.
When the apparatus 600 is a software apparatus, the apparatus 600 may also be deployed on a computing device in any environment.
The present application also provides an apparatus 700 as shown in fig. 7, the apparatus 700 comprising a processor 702, a communication interface 703 and a memory 704. One example of the apparatus 700 is a chip. Another example of an apparatus 700 is a computing device. Another example of the apparatus 700 is a server.
The processor 702, memory 704, and communication interface 703 may communicate over a bus. The memory 704 has executable code stored therein, and the processor 702 reads the executable code in the memory 704 to perform a corresponding method. The memory 704 may also include other software modules required to run a process, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, and the like.
For example, the executable code in the memory 704 is used to implement the method shown in any of fig. 1 to 4, and the processor 702 reads the executable code in the memory 704 to perform the method shown in any of fig. 1 to 4.
The processor 702 may be a central processing unit (CPU). The memory 704 may include a volatile memory (volatile memory), such as a random access memory (RAM). The memory 704 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.