Detailed Description
The embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
It will be appreciated that in the specific embodiments of the present application, related data such as images of items, user information, etc. are involved, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with the relevant laws and regulations and standards of the relevant countries and regions.
Fig. 1 shows a schematic diagram of a system 100 in which embodiments of the application may be applied. As shown in fig. 1, the system 100 may include a server 101 and a terminal 102.
The server 101 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms. In one implementation of the present example, the server 101 is a cloud server, and the server 101 may provide artificial intelligence cloud services, such as a large-scale image classification service.
The terminal 102 may be any device, including but not limited to a cell phone, a computer, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, a VR/AR device, a smart watch, and the like. In one embodiment, the server 101 or the terminal 102 may be a node device in a blockchain network or in an Internet-of-Vehicles platform.
In one implementation manner of the present example, the server 101 or the terminal 102 may obtain an initial network structure unit, where the initial network structure unit includes at least one node representing a feature map and connection edges exist between the nodes; obtain a candidate operation set, where the candidate operation set includes at least one candidate operation corresponding to each connection edge, the candidate operation being an operation for processing the feature map; calculate, according to the structural parameters of the candidate operations corresponding to each connection edge, the probability distribution with which each connection edge selects a corresponding candidate operation in the current forward propagation; according to the probability distribution corresponding to each connection edge, select only one candidate operation from the corresponding candidate operations for each connection edge and add it to the initial network structure unit, to obtain a current network to be optimized; and perform continuous processing on the discrete probability distribution, and perform gradient optimization on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result until a predetermined optimization condition is met, to obtain a target image processing network.
In one implementation manner of the present example, the server 101 or the terminal 102 may acquire an image to be classified, and perform classification processing on the image to be classified by using a target image processing network for performing image classification, to obtain a classification result corresponding to the image to be classified, where the target image processing network is generated by using the image processing network generation method according to any embodiment of the present application.
Fig. 2 schematically shows a flow chart of an image processing network generation method according to an embodiment of the application. The execution subject of the image processing network generation method may be any device, such as the server 101 or the terminal 102 shown in fig. 1.
As shown in fig. 2, the image processing network generation method may include steps S210 to S250.
Step S210, an initial network structure unit is obtained, wherein the initial network structure unit comprises at least one node representing a feature map, and connecting edges are arranged between the nodes;
Step S220, a candidate operation set is obtained, wherein the candidate operation set comprises at least one candidate operation corresponding to each connecting edge, and the candidate operation is used for processing the feature map;
Step S230, generating a probability distribution with which each connecting edge selects the corresponding candidate operation in the current forward propagation, according to the structure parameters of the candidate operations corresponding to each connecting edge;
Step S240, according to the probability distribution corresponding to each connecting edge, selecting only one candidate operation from the corresponding candidate operations for each connecting edge and adding it to the initial network structure unit, to obtain the current network to be optimized;
Step S250, carrying out continuous processing on the discrete probability distribution, and carrying out gradient optimization on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result until a predetermined optimization condition is met, to obtain the target image processing network.
The initial network structure unit is the neural network structure unit whose candidate operations are to be searched. Such neural network structure units may be called cell units, and the whole neural network can be formed by stacking cell units. The initial network structure unit may include at least one node, each node representing a feature map; the feature map is a feature matrix, for example one obtained through a convolution operation. The nodes may be connected by directed connecting edges; in the initial network structure, each connecting edge corresponds to an as-yet-unknown candidate operation, and the candidate operation at the connecting edge is to be searched. The initial network structure unit may be obtained from a predetermined location.
The candidate operation set includes at least one candidate operation corresponding to each connection edge, where the candidate operation is an operation for processing the feature map, and the candidate operation is an operation such as a convolution operation, a pooling operation, a jump connection, and the like. The candidate operation set may be a search space composed of candidate operations, and the candidate operation set may be acquired from a predetermined location.
Each initial network structure unit may be abstracted as a directed acyclic graph comprising N nodes {x(0), x(1), …, x(N-1)}. For example, fig. 3 illustrates an initial network structure unit including 7 nodes (e.g., 0 and 1), where each node x(i) represents a feature map in the network, and a candidate operation (such as a convolution operation) may connect the two nodes joined by a connecting edge. The purpose of generating the image processing network is to select, by searching the network structure, the candidate operation from the candidate operation set that is most suitable for each connecting edge. The feature map represented by the previous node is processed by the candidate operation to obtain the feature map represented by the next node.
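For illustration, the directed-acyclic-graph abstraction of a cell unit can be sketched in plain Python. This is a minimal sketch under assumptions of our own: the operation names in `CANDIDATE_OPS` are typical examples from the differentiable-search literature and the function names are hypothetical, not taken from the application.

```python
# A cell unit as a directed acyclic graph: nodes are feature maps, and each
# directed connecting edge (i, j) carries a set of candidate operations to
# be searched. Operation names here are illustrative only.
CANDIDATE_OPS = ["sep_conv_3x3", "max_pool_3x3", "skip_connect", "none"]

def build_initial_cell(num_nodes):
    """Every earlier node connects to every later node, and every
    connecting edge starts with the full candidate operation set."""
    return {(i, j): list(CANDIDATE_OPS)
            for j in range(num_nodes) for i in range(j)}

cell = build_initial_cell(4)   # 4 nodes -> 6 connecting edges
```

Searching the structure then amounts to keeping exactly one operation per edge of this graph.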
Each time forward propagation is performed (multiple forward propagations may occur before the gradient optimization meets the predetermined optimization condition; for example, after each round of gradient optimization on the current network to be optimized is completed, the next forward propagation may be performed), the probability distribution with which each connecting edge selects a corresponding candidate operation in the current forward propagation may be calculated according to the structural parameters of the candidate operations corresponding to the connecting edges; the probability distribution is formed by the probability of each candidate operation being selected at the connecting edge. For example, if connecting edge A corresponds to 9 predetermined candidate operations, the probability that the edge selects each of the 9 operations may be calculated, thereby generating the probability distribution corresponding to connecting edge A.
According to the probability distribution corresponding to each connecting edge, only one candidate operation is selected from the corresponding candidate operations for each connecting edge and added to the initial network structure unit. For example, if connecting edge A corresponds to 9 predetermined candidate operations, only one of them is selected according to the probability distribution corresponding to edge A, and that single candidate operation is added to the initial network structure unit at edge A. After one candidate operation has been searched for each connecting edge in the initial network structure unit, a complete network comprising nodes and candidate operations is obtained, namely the current network to be optimized.
The probability distribution generated based on the structural parameters of the candidate operations is discrete. By performing continuous processing on this discrete distribution, gradient optimization can be performed on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result, so that during back propagation the structural parameter of each candidate operation receives gradients from the deeper layers of the network. Structural parameters meeting the requirements can thus be optimized, and the target image processing network is obtained when the optimization meets the predetermined optimization condition.
The target image processing network is a network composed of searched network structure units, i.e., network structure units whose candidate operations have structural parameters meeting the requirements. For example, fig. 4 shows a searched network structure unit obtained after searching the candidate operations (e.g., skip-connect) between the 7 nodes (e.g., 0 and 1) of the initial network structure unit in fig. 3. An image to be processed can then be processed by the target image processing network, which may be a neural network performing functions such as image classification or image target detection.
In this way, based on steps S210 to S250, during image processing network generation the probability distribution with which each connecting edge selects a corresponding candidate operation is calculated, and then only one candidate operation is selected from the corresponding candidate operations for each connecting edge and added to the network structure unit. This implements image processing network generation based on single-path sampling network structure search, avoids having to evaluate all candidate operations on every connecting edge between nodes, effectively reduces the consumption of computing resources during search and generation, improves the efficiency of network structure search, saves GPU memory, and allows flexible deployment on devices with limited GPU memory.
Other specific alternative embodiments of the steps performed when the embodiment of fig. 2 performs image processing network generation are described below.
In one embodiment, step S230, generating a probability distribution of selecting a candidate operation corresponding to each connection edge in the current forward propagation according to the structure parameters of the candidate operation corresponding to each connection edge, includes:
The method comprises the following steps: an exponential operation is performed on the structural parameter of each candidate operation corresponding to each connecting edge, to obtain the parameter operation result corresponding to each candidate operation; the parameter operation results of all candidate operations corresponding to a connecting edge are summed, to obtain the summation result corresponding to that connecting edge; the parameter operation result of each candidate operation is divided by the summation result of its connecting edge, to obtain the probability that each candidate operation is selected by the corresponding connecting edge; a random sampling value is added to the logarithm of the probability of each candidate operation, to obtain the target probability value corresponding to each candidate operation; and based on the target probability values of the candidate operations corresponding to each connecting edge, the probability distribution with which each connecting edge selects the corresponding candidate operation in the current forward propagation is obtained.
Specifically, the probability distribution with which each connecting edge selects the corresponding candidate operation may be calculated based on the following Gumbel-Max method formulas:

$$p_o^{(i,j)} = \frac{\exp\left(\alpha_o^{(i,j)}\right)}{\sum_{o' \in \mathcal{O}} \exp\left(\alpha_{o'}^{(i,j)}\right)}, \qquad V_o^{(i,j)} = \log p_o^{(i,j)} + G_o$$

wherein $\alpha_o^{(i,j)}$ is the structural parameter of each candidate operation o corresponding to the connecting edge (i, j), i.e., the connecting edge between node i and node j; $p_o^{(i,j)}$ represents the probability that candidate operation o is selected by the connecting edge (i, j); candidate operation o belongs to the set $\mathcal{O}$ of at least one candidate operation corresponding to the connecting edge; $\exp(\alpha_o^{(i,j)})$ is the parameter operation result corresponding to candidate operation o; $G_o$ is a random sampling value from the Gumbel distribution, and due to this randomness the candidate operations sampled in different forward propagations can differ; $V_o^{(i,j)}$ is the target probability value corresponding to candidate operation o, and the set $V^{(i,j)}$ of target probability values is the probability distribution with which the connecting edge selects the corresponding candidate operation.
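The Gumbel-Max computation described above can be sketched in plain Python. This is a minimal illustration: the function and variable names (e.g., `struct_params`) are our own, and the Gumbel noise is drawn via the standard inverse-transform trick.

```python
import math
import random

def gumbel_max_distribution(struct_params, rng=random.random):
    """For one connecting edge, turn structural parameters into the
    target probability values V_o = log p_o + G_o of the Gumbel-Max method."""
    # Exponentiate each structural parameter (the "parameter operation result").
    exp_params = [math.exp(a) for a in struct_params]
    total = sum(exp_params)                  # summation result for the edge
    probs = [e / total for e in exp_params]  # softmax probabilities p_o
    # Gumbel(0, 1) noise: G = -log(-log(U)), U ~ Uniform(0, 1).
    gumbel = [-math.log(-math.log(rng())) for _ in probs]
    targets = [math.log(p) + g for p, g in zip(probs, gumbel)]
    return probs, targets

probs, targets = gumbel_max_distribution([0.5, 1.2, -0.3])
```

Because fresh noise is drawn on every call, the operation with the largest target value can differ between forward propagations, which is exactly the sampling behavior the method relies on.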
In one embodiment, selecting only one candidate operation from the corresponding candidate operations for each connecting edge and adding it to the initial network structure unit according to the probability distribution corresponding to each connecting edge includes: determining the candidate operation corresponding to the maximum target probability value in the probability distribution corresponding to each connecting edge, to obtain the target candidate operation corresponding to each connecting edge; and adding the target candidate operation corresponding to each connecting edge at the corresponding position of that connecting edge in the initial network structure unit.
The maximum target probability value in the probability distribution corresponding to each connecting edge (i, j) may be determined by the formula

$$a_{i,j} = \mathrm{one\_hot}\left(\operatorname*{arg\,max}_{o \in \mathcal{O}} V_o^{(i,j)}\right)$$

that is, among the target probability values $V_o^{(i,j)}$ of the candidate operations in the set $\mathcal{O}$ corresponding to the connecting edge, the largest value is set to 1 by one-hot encoding (one_hot) and the remaining values are set to 0, so that the candidate operation whose entry is 1 is the one with the largest target probability value.
The determined target candidate operations are added at the corresponding positions of the connecting edges in the initial network structure unit, to obtain the network structure unit to be optimized whose candidate operations have been preliminarily searched.
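The one-hot selection of a single operation per edge can be sketched as follows (a minimal illustration with hypothetical names, not the application's implementation):

```python
def select_single_operation(target_values):
    """One-hot selection: keep only the candidate operation with the
    largest target probability value V_o on a connecting edge."""
    best = max(range(len(target_values)), key=lambda k: target_values[k])
    return [1 if k == best else 0 for k in range(len(target_values))]

one_hot = select_single_operation([0.1, 2.3, -0.7])
# exactly one entry is 1, so exactly one operation is kept on the edge
```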
In one embodiment, in step S250, the continuous processing of the discrete probability distribution may be performed by the following Gumbel-Softmax method formula, which makes the discrete probability distribution continuous and yields the continuous processing result $B_{i,j}$:

$$B_{i,j}^{o} = \frac{\exp\left(V_o^{(i,j)} / \tau\right)}{\sum_{o' \in \mathcal{O}} \exp\left(V_{o'}^{(i,j)} / \tau\right)}$$

where τ is the temperature coefficient.
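The Gumbel-Softmax relaxation can be sketched in plain Python (an illustrative sketch; the function name is our own). Lower temperatures make the result closer to a one-hot vector, while higher temperatures smooth it:

```python
import math

def gumbel_softmax(target_values, tau=1.0):
    """Continuous relaxation B of the discrete distribution: a softmax
    of the target probability values divided by the temperature tau."""
    scaled = [v / tau for v in target_values]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

b = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5)
```

Unlike the hard one-hot selection, every entry of the relaxed vector is nonzero, so gradients can flow to every structural parameter during back propagation.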
In one embodiment, step S250 performs gradient optimization on the structure parameters of the candidate operation in the current network to be optimized based on the continuous processing result, and comprises directly performing gradient optimization on the structure parameters of the candidate operation in the current network to be optimized based on the continuous processing result.
In one embodiment, in step S250, performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result includes: processing sample images with the current network to be optimized and a preset image processing network respectively, to obtain image processing results; and performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result, according to the image processing results and the network middle layers of the preset image processing network.
The preset image processing network is a pre-designed and pre-trained image processing network. A search process that performs gradient optimization directly on the candidate operations is unstable, and the performance of the searched network degrades as the number of search rounds grows. This is because the inter-layer gradients in the current network to be optimized are imbalanced, and non-parametric candidate operations such as skip connections provide an additional path for gradient conduction, so that as the search proceeds, meaningless skip connections are increasingly likely to be selected as the candidate operations between nodes. By performing gradient optimization jointly with the preset image processing network, the current network to be optimized learns the inter-layer gradient distribution of the preset image processing network, which smooths its own gradient distribution and improves the stability of the search process. At the same time, the current network to be optimized receives information supervision from the preset image processing network, which further improves the performance of the target image processing network and its image processing effect.
In one embodiment, the image processing results include a first result corresponding to the current network to be optimized and a second result corresponding to the preset image processing network, and the sample image is calibrated with a predetermined result. Performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result, according to the image processing results and the network middle layers of the preset image processing network, includes: calculating a first loss according to the first result and the predetermined result; calculating a second loss according to the first result and the second result; calculating a third loss according to the network middle layers at the same level in the preset image processing network and the current network to be optimized; and performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result, according to the first loss, the second loss, and the third loss.
The first loss, the second loss, and the third loss guide the learning of the current network to be optimized relative to the preset image processing network from different angles. By performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result according to these three losses, the current network to be optimized can serve as the student network and the preset image processing network as the teacher network in a knowledge distillation scheme. The current network to be optimized thereby effectively learns the inter-layer gradient distribution of the preset image processing network, which smooths its own gradient distribution and further improves the stability of the search process; it also effectively receives information supervision from the preset image processing network, which further improves the performance of the target image processing network and the network stability.
Referring to fig. 5, the preset image processing network is divided into 3 network blocks, each network block being a network middle layer at one level; the current network to be optimized includes 3 cell units, each corresponding to an initial network structure unit and each being a network middle layer at one level. A third loss $\mathcal{L}_3$ can be calculated between the network block and the cell unit at the same level; a second loss $\mathcal{L}_2$ can be calculated between the second result output by the last network block (i.e., the teacher output) and the first result output by the last cell unit (i.e., the student output); and a first loss $\mathcal{L}_1$ can be calculated between the predetermined result calibrated for the sample image (i.e., the true label) and the first result output by the last cell unit (i.e., the student output).
In one embodiment, calculating the third loss according to the network middle layers at the same level in the preset image processing network and the current network to be optimized includes: performing a mean pooling operation on the two network feature maps corresponding to the same-level network middle layers of the preset image processing network and the current network to be optimized, to obtain two target feature maps with a uniform number of channels; converting the two target feature maps into a first weighted feature map and a second weighted feature map respectively; and calculating the third loss according to the first weighted feature map and the second weighted feature map.
Specifically, the two network feature maps corresponding to a same-level network middle layer (for example, cell unit 1 and network block 1) are subjected to a mean pooling operation, through which the channel numbers of the two network feature maps are uniformly reduced to the smaller of the two, yielding two target feature maps with a uniform number of channels.
The target feature map corresponding to the network feature map output by the cell unit at the ith level may be denoted $F_i$, and may be converted into the first weighted feature map $\tilde{F}_i$ by the formula

$$\tilde{F}_i = \frac{\sum_{j} \left(F_i^{(j)}\right)^2}{\left\| \sum_{j} \left(F_i^{(j)}\right)^2 \right\|_2}$$

where j indexes the jth feature (channel) of the target feature map $F_i$, and $I_{kd}$ denotes the total number of cell units (used when summing the per-level losses). The target feature map corresponding to the network feature map output by the network block at the ith level may be denoted $G_i$ and converted into the second weighted feature map $A_i$ by the analogous formula. The third loss calculated from the first weighted feature map and the second weighted feature map can then effectively guide the learning of the current network to be optimized.
In one embodiment, calculating the third loss from the first weighted feature map and the second weighted feature map includes calculating the first weighted feature map and the second weighted feature map using a mean square error loss function to obtain the third loss.
Specifically, the first weighted feature map and the second weighted feature map may be calculated based on the following mean square error loss function to obtain the third loss $\mathcal{L}_3$:

$$\mathcal{L}_3 = \frac{1}{I_{kd}} \sum_{i=1}^{I_{kd}} \left\| \tilde{F}_i - A_i \right\|_2^2$$

wherein $\tilde{F}_i$ is the first weighted feature map and $A_i$ is the second weighted feature map.
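The third loss can be sketched in plain Python on nested-list feature maps. This is a minimal sketch under our own assumptions: the weighted map is formed by summing squared channel activations and normalizing (an attention-transfer-style reading of the "weighted feature map"), and all names are hypothetical.

```python
import math

def weighted_map(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a normalized
    spatial map by summing squared activations over channels."""
    h, w = len(feature_map[0]), len(feature_map[0][0])
    summed = [[sum(ch[r][c] ** 2 for ch in feature_map) for c in range(w)]
              for r in range(h)]
    norm = math.sqrt(sum(v * v for row in summed for v in row)) or 1.0
    return [[v / norm for v in row] for row in summed]

def third_loss(student_maps, teacher_maps):
    """Mean squared error between weighted maps, averaged over levels."""
    total = 0.0
    for fs, ft in zip(student_maps, teacher_maps):
        ws, wt = weighted_map(fs), weighted_map(ft)
        total += sum((a - b) ** 2
                     for rs, rt in zip(ws, wt) for a, b in zip(rs, rt))
    return total / len(student_maps)
```

Identical student and teacher maps give a loss of zero, and the loss grows as their spatial activation patterns diverge.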
In one embodiment, calculating the first loss according to the first result and the predetermined result includes calculating the first result and the predetermined result with a cross-entropy loss function to obtain the first loss; and calculating the second loss according to the first result and the second result includes calculating the first result and the second result with a relative entropy loss function to obtain the second loss.
Specifically, the first result and the predetermined result may be calculated based on the following cross-entropy loss function to obtain the first loss $\mathcal{L}_1$:

$$\mathcal{L}_1 = -\sum_{k=1}^{N} y_k \log \hat{y}_k$$

where N is the total number of sub-results (e.g., classification probabilities) in the first result and the predetermined result, $y_k$ is the kth sub-result in the predetermined result, and $\hat{y}_k$ is the kth sub-result in the first result.
Specifically, the first result and the second result may be calculated based on the following relative entropy loss function (i.e., KL divergence loss function) to obtain the second loss $\mathcal{L}_2$:

$$\mathcal{L}_2 = \sum_{k=1}^{N} p_T^{(k)} \log \frac{p_T^{(k)}}{p_S^{(k)}}$$

where N is the total number of sub-results (e.g., classification probabilities) in the first result and the second result, $p_T^{(k)}$ is the kth sub-result in the second result, and $p_S^{(k)}$ is the kth sub-result in the first result.
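The first and second losses described above can be sketched in plain Python (function names are our own; inputs are assumed to be probability vectors with nonzero student entries):

```python
import math

def first_loss(student_probs, true_labels):
    """Cross-entropy between the student's first result and the
    predetermined (true-label) result."""
    return -sum(y * math.log(p) for y, p in zip(true_labels, student_probs))

def second_loss(student_probs, teacher_probs):
    """KL divergence from the teacher distribution to the student's;
    terms with zero teacher probability contribute nothing."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)
```

For a one-hot true label, the cross-entropy reduces to the negative log-probability the student assigns to the correct class, and the KL term vanishes exactly when student and teacher agree.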
In one embodiment, performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result includes: performing second-order approximate estimation processing on the network parameters in the current network to be optimized and the continuous processing result, to obtain a second-order approximate estimation result; and performing alternating gradient optimization on the network parameters and the structural parameters in the current network to be optimized based on the second-order approximate estimation result.
Second-order approximate estimation processing is performed on the network parameters in the current network to be optimized and the continuous processing result, to obtain a second-order approximate estimation result; based on this second-order approximate result, the loss function becomes differentiable with respect to both the network parameters ω and the structural parameters α. Alternating gradient optimization is then performed on the network parameters and the structural parameters in the current network to be optimized based on the second-order approximate estimation result until the predetermined optimization condition is met, yielding the target image processing network.
For example, the search process corresponding to the gradient optimization targets the following bilevel optimization problem:

$$\min_{\alpha} \; \mathcal{L}_{val}\left(\omega^{*}(\alpha), \alpha\right) \quad \text{s.t.} \quad \omega^{*}(\alpha) = \operatorname*{arg\,min}_{\omega} \; \mathcal{L}_{train}(\omega, \alpha)$$

However, the loss function under this objective is not directly differentiable with respect to the structural parameters α. A first-order approximation directly approximates the optimal network parameters ω*(α) by the current network parameters ω, whereas a second-order approximation approximates the optimal network parameters by the network parameters after one gradient descent step:

$$\omega^{*}(\alpha) \approx \omega - \xi \, \nabla_{\omega} \mathcal{L}_{train}(\omega, \alpha)$$

where ξ is the learning rate of the inner step. After the second-order approximate estimation, the loss function is differentiable with respect to both the network parameters ω and the structural parameters α; the current network to be optimized is then trained by alternating gradient optimization, taking one step on the structural parameters and one step on the network parameters, until the predetermined optimization condition is met, yielding the target image processing network. The alternating gradient optimization may proceed by fixing the values of the structural parameter matrix α and taking a gradient descent step on the network parameter matrix ω on the training set, and then taking a gradient descent step on the structural parameter matrix α on the validation set.
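The alternating scheme with a one-step look-ahead can be illustrated on a toy bilevel problem with analytic gradients. This is purely a sketch under assumed quadratic losses of our own choosing (not the application's losses): the training loss pulls ω toward α, the validation loss pulls the looked-ahead ω toward 1, and both variables converge to 1.

```python
def alternating_optimization(steps=200, lr=0.1, xi=0.1):
    """Toy bilevel problem: L_train(w, a) = (w - a)^2 and
    L_val(w) = (w - 1)^2.  Each iteration takes one gradient step on the
    structural parameter a (evaluated at the one-step look-ahead
    w - xi * dL_train/dw), then one gradient step on the network
    parameter w under the training loss."""
    w, a = 0.0, 0.0
    for _ in range(steps):
        # one-step look-ahead network parameter (second-order style)
        w_prime = w - xi * 2 * (w - a)
        # d L_val(w') / da via the chain rule: dL_val/dw' * dw'/da
        grad_a = 2 * (w_prime - 1) * (2 * xi)
        a -= lr * grad_a                # update structural parameter
        w -= lr * 2 * (w - a)           # update network parameter
    return w, a

w, a = alternating_optimization()
```

The look-ahead is what lets the structural-parameter step "see" how the network parameters would respond, which is the point of the second-order approximation.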
Fig. 6 schematically shows a flow chart of an image classification method according to an embodiment of the application. The execution subject of the image classification method may be any device, such as the server 101 or the terminal 102 shown in fig. 1.
As shown in fig. 6, the image classification method may include steps S310 to S320.
Step S310, an image to be classified is obtained;
Step S320, the image to be classified is classified by a target image processing network for performing image classification, to obtain a classification result corresponding to the image to be classified, wherein the target image processing network is generated by the image processing network generation method according to any one of the embodiments of the present application.
In this way, in some embodiments, the target image processing network for image classification consumes few computing resources during search and generation, is generated efficiently, occupies little GPU memory, and can be deployed flexibly on devices with limited GPU memory. The target image processing network for image classification can thus further reduce the overall resource consumption of image classification tasks and improve the deployment flexibility of image classification tasks.
The foregoing embodiments are further described below in connection with classifying an image to be classified in a scenario in which the image to be classified is classified by applying the foregoing embodiments of the present application.
In this scenario, classifying the image to be classified may include steps (1) to (2).
Step (1), a convolutional neural network for the image classification task is searched by using the network structure search method.
Specifically: an initial network structure unit is obtained, wherein the initial network structure unit includes at least one node representing a feature map and connecting edges exist between the nodes; a candidate operation set is obtained, including at least one candidate operation corresponding to each connecting edge for processing the feature map; the probability distribution with which each connecting edge selects a corresponding candidate operation in the current forward propagation is calculated according to the structural parameters of the candidate operations corresponding to each connecting edge; according to the probability distribution corresponding to each connecting edge, only one candidate operation is selected from the corresponding candidate operations for each connecting edge and added to the initial network structure unit, to obtain the current network to be optimized; and continuous processing is performed on the discrete probability distribution, and gradient optimization is performed on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result until the predetermined optimization condition is met, to obtain the target image processing network. The target image processing network is the convolutional neural network searched for image classification.
Further, generating the probability distribution with which each connecting edge selects the corresponding candidate operation in the current forward propagation according to the structure parameters of the candidate operations corresponding to each connecting edge includes: performing an exponential operation on the structural parameter of each candidate operation corresponding to each connecting edge, to obtain the parameter operation result corresponding to each candidate operation; summing the parameter operation results of the candidate operations corresponding to a connecting edge, to obtain the summation result corresponding to that connecting edge; dividing the parameter operation result of each candidate operation by the summation result of its connecting edge, to obtain the probability that each candidate operation is selected by the corresponding connecting edge; adding a random sampling value to the logarithm of the probability of each candidate operation, to obtain the target probability value corresponding to each candidate operation; and obtaining, based on the target probability values of the candidate operations corresponding to each connecting edge, the probability distribution with which each connecting edge selects the corresponding candidate operation in the current forward propagation.
Specifically, the probability distribution with which each connecting edge selects its corresponding candidate operations may be calculated based on the following Gumbel-Max method formula:

p_o^(i,j) = exp(α_o^(i,j)) / Σ_{o'∈O^(i,j)} exp(α_{o'}^(i,j)),   V_o^(i,j) = log p_o^(i,j) + G_o

wherein α_o^(i,j) is the structural parameter of each candidate operation o corresponding to the connecting edge (i, j), i.e., the connecting edge between node i and node j; p_o^(i,j) is the probability with which the connecting edge (i, j) selects each candidate operation o, i.e., the probability that candidate operation o is selected by the corresponding connecting edge; candidate operation o belongs to the set O^(i,j) of at least one candidate operation corresponding to the connecting edge; exp(α_o^(i,j)) is the parameter operation result corresponding to candidate operation o; G_o is a random sampling value drawn from the Gumbel distribution, and due to this randomness the candidate operations sampled in different forward propagations may differ; V_o^(i,j) is the target probability value corresponding to candidate operation o, and the set V of these values forms the probability distribution with which the connecting edge selects its corresponding candidate operations.
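As an illustrative sketch only, not code from the application, the Gumbel-Max sampling described above can be written as follows; the function name `gumbel_max_sample` and the use of log-probabilities follow the standard Gumbel-Max trick and are assumptions of this sketch:

```python
import math
import random

def gumbel_max_sample(alphas, rng=random.random):
    """Sample one candidate operation index for one connecting edge (i, j).

    alphas: structural parameters of the candidate operations on the edge.
    Returns the sampled operation index and the target probability values.
    """
    # Parameter operation results: exponential of each structural parameter.
    exps = [math.exp(a) for a in alphas]
    total = sum(exps)                      # summation result for the edge
    probs = [e / total for e in exps]      # probability of each operation
    # Random sampling values from the Gumbel distribution: -log(-log U).
    gumbels = [-math.log(-math.log(max(rng(), 1e-12))) for _ in alphas]
    # Target probability values; the randomness means different forward
    # propagations may sample different operations.
    targets = [math.log(p) + g for p, g in zip(probs, gumbels)]
    sampled = max(range(len(targets)), key=lambda o: targets[o])
    return sampled, targets
```

With a fixed noise source the sampled operation is simply the one with the largest structural parameter, which is a quick sanity check on the sketch.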
Further, selecting, according to the probability distribution corresponding to each connecting edge, only one candidate operation from the corresponding candidate operations for each connecting edge and adding it to the initial network structure unit comprises: determining the candidate operation corresponding to the maximum target probability value in the probability distribution corresponding to each connecting edge, to obtain the target candidate operation corresponding to each connecting edge; and adding the target candidate operation corresponding to each connecting edge at the position of that connecting edge in the initial network structure unit.
Wherein, the candidate operation with the maximum target probability value in the probability distribution corresponding to each connecting edge (i, j) may be determined according to the formula A_{i,j} = one_hot(arg max_{o∈O^(i,j)} V_o^(i,j)): the target probability values V_o^(i,j) of the candidate operations o in the set O^(i,j) corresponding to the connecting edge are first normalized to normalized values a_{i,j}; then the largest normalized value is set to 1 by one-hot encoding (one_hot) and the remaining normalized values are set to 0, so that the candidate operation whose normalized value corresponds to 1 has the largest target probability value.
The determined target candidate operation is then added at the position of each connecting edge in the initial network structure unit, yielding a network structure unit to be optimized in which the candidate operations have been preliminarily searched.
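A minimal sketch of the normalize-and-one-hot selection described above, with assumed names:

```python
import math

def one_hot_select(targets):
    """Normalize target probability values and one-hot encode the maximum.

    Softmax-normalizes the target values to normalized values, then sets
    the largest normalized value to 1 and the rest to 0, so that exactly
    one candidate operation is kept per connecting edge.
    """
    exps = [math.exp(t) for t in targets]
    total = sum(exps)
    normalized = [e / total for e in exps]
    best = max(range(len(normalized)), key=lambda o: normalized[o])
    return [1 if o == best else 0 for o in range(len(normalized))]
```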
Further, the discrete probability distribution formed by the target probability values may be relaxed into a continuous one according to the following Gumbel-Softmax method formula, giving the continuous processing result B_{i,j}:

B_{i,j}^o = exp(V_o^(i,j) / τ) / Σ_{o'∈O^(i,j)} exp(V_{o'}^(i,j) / τ)

where τ is the temperature coefficient.
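The Gumbel-Softmax relaxation may be sketched as follows (illustrative; `gumbel_softmax` is an assumed name):

```python
import math

def gumbel_softmax(targets, tau=1.0):
    """Continuous relaxation B_ij of the discrete one-hot selection.

    Dividing each target probability value by the temperature tau and
    applying a softmax yields a distribution that is differentiable in
    the structural parameters; as tau -> 0 it approaches the one-hot
    vector of the discrete selection.
    """
    exps = [math.exp(t / tau) for t in targets]
    total = sum(exps)
    return [e / total for e in exps]
```

Lowering τ sharpens the distribution toward the hard one-hot choice, which is why the search can sample discretely while still passing gradients through this relaxation.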
Further, the gradient optimization of the structure parameters of the candidate operation in the current network to be optimized is performed based on the continuous processing result, which comprises the step of directly performing gradient optimization on the structure parameters of the candidate operation in the current network to be optimized based on the continuous processing result.
Further, performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result comprises: processing sample images respectively with the current network to be optimized and a preset image processing network to obtain image processing results; and performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized, based on the continuous processing result, according to the image processing results and a network middle layer of the preset image processing network.
The preset image processing network is a pre-designed and trained image processing network. A search process that uses the candidate operations directly for gradient optimization is unstable, and the performance of the searched network degrades as the number of search rounds grows. This is because the inter-layer gradients in the current network to be optimized are imbalanced, and non-parametric candidate operations such as skip connections provide an additional path for gradient conduction, so that as the search proceeds, meaningless skip connections are increasingly likely to be selected as the candidate operations between nodes. By performing gradient optimization jointly with the preset image processing network, the current network to be optimized learns the inter-layer gradient distribution of the preset image processing network, which smooths the gradient distribution of the current network to be optimized and improves the stability of the search process. At the same time, the current network to be optimized receives information supervision from the preset image processing network, which further improves the performance of the target image processing network and its image processing effect.
The image processing results comprise a first result corresponding to the current network to be optimized and a second result corresponding to the preset image processing network, and the sample image is calibrated with a predetermined result. Performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized, based on the continuous processing result, according to the image processing results and the network middle layer of the preset image processing network comprises: calculating a first loss according to the first result and the predetermined result; calculating a second loss according to the first result and the second result; calculating a third loss according to the network middle layers corresponding to the same hierarchy in the preset image processing network and the current network to be optimized; and performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized, based on the continuous processing result, according to the first loss, the second loss and the third loss.
The first loss, the second loss and the third loss each guide the learning of the current network to be optimized relative to the preset image processing network from a different angle. By performing gradient optimization on the structural parameters of the candidate operations according to the first loss, the second loss and the third loss based on the continuous processing result, the current network to be optimized can act as a student network in a knowledge distillation scheme, with the preset image processing network as the teacher network. The current network to be optimized thereby effectively learns the inter-layer gradient distribution of the preset image processing network, which smooths its own gradient distribution and improves the stability of the search process, while the information supervision received from the preset image processing network further improves the performance of the target image processing network and the network stability.
Referring to fig. 5, the preset image processing network is divided into 3 network blocks, each network block being a network middle layer of one hierarchy, and the current network to be optimized comprises 3 cell units, each cell unit corresponding to an initial network structure unit and being a network middle layer of one hierarchy. A third loss can be calculated for the network blocks and cell units of the same hierarchy; a second loss can be calculated for the second result output by the last network block (i.e., the teacher output) and the first result output by the last cell unit (i.e., the student output); and a first loss can be calculated for the predetermined result calibrated for the sample image (i.e., the true label) and the first result output by the last cell unit (i.e., the student output).
Further, calculating the third loss according to the network middle layers corresponding to the same hierarchy in the preset image processing network and the current network to be optimized comprises: performing a mean pooling operation on the two network feature maps corresponding to the network middle layers of the same hierarchy in the preset image processing network and the current network to be optimized, to obtain two target feature maps with a uniform channel number; converting the two target feature maps into a first weighted feature map and a second weighted feature map respectively; and calculating the third loss according to the first weighted feature map and the second weighted feature map.
Specifically, the two network feature maps corresponding to the network middle layers of the same hierarchy (for example, cell unit 1 and network block 1) are subjected to a mean pooling operation, through which the channel numbers of the two network feature maps are uniformly reduced to the smaller of the two, obtaining two target feature maps with a uniform channel number.
The target feature map corresponding to the network feature map output by the cell unit of the i-th hierarchy may be denoted F_i and converted by a weighting formula into a first weighted feature map, where I_kd is the total number of cell units and j indexes the j-th feature in the target feature map F_i. Likewise, the target feature map corresponding to the network feature map output by the network block of the i-th hierarchy may be converted by the corresponding weighting formula into a second weighted feature map A_i. The third loss calculated from the first weighted feature map and the second weighted feature map can then effectively guide the learning of the current network to be optimized.
Further, calculating a third loss according to the first weighted feature map and the second weighted feature map includes calculating the first weighted feature map and the second weighted feature map with a mean square error loss function to obtain the third loss.
Specifically, the first weighted feature map and the second weighted feature map may be calculated based on a mean square error loss function to obtain the third loss, i.e., the mean of the squared element-wise differences between the first weighted feature map and the second weighted feature map A_i.
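A hedged sketch of the third-loss computation; since the exact weighting formula is not reproduced above, this sketch assumes an L2 normalization scaled by the number of units (the I_kd of the text), a common choice in feature distillation, and the function names are illustrative:

```python
import math

def weighted_feature_map(feature_map, num_units):
    """Convert a pooled target feature map into a weighted feature map.

    Assumption: L2-normalize the flattened map and scale by the number
    of cell units / network blocks; the application's own weighting
    formula may differ.
    """
    norm = math.sqrt(sum(f * f for f in feature_map)) or 1.0
    return [f / (num_units * norm) for f in feature_map]

def third_loss(first_map, second_map):
    """Mean squared error between the two weighted feature maps."""
    n = len(first_map)
    return sum((x - a) ** 2 for x, a in zip(first_map, second_map)) / n
```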
Further, calculating the first loss according to the first result and the predetermined result comprises calculating the first result and the predetermined result with a cross entropy loss function to obtain the first loss, and calculating the second loss according to the first result and the second result comprises calculating the first result and the second result with a relative entropy loss function to obtain the second loss.
Specifically, the first result and the predetermined result may be calculated based on the following cross entropy loss function to obtain the first loss:

first loss = − Σ_{k=1}^{N} y_k · log(p_{L,k})

where N is the total number of sub-results (e.g., classification probabilities) in the first result and the predetermined result, y_k is the kth sub-result in the predetermined result, and p_{L,k} is the kth sub-result in the first result.
Specifically, the first result and the second result may be calculated based on the following relative entropy loss function (i.e., KL divergence loss function) to obtain the second loss:

second loss = Σ_{k=1}^{N} p_{T,k} · log(p_{T,k} / p_{L,k})

where N is the total number of sub-results (e.g., classification probabilities) in the first result and the second result, p_{T,k} is the kth sub-result in the second result, and p_{L,k} is the kth sub-result in the first result.
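The first and second losses may be sketched as follows (illustrative helper names; probabilities are assumed strictly positive where logarithms are taken):

```python
import math

def first_loss(first_result, predetermined):
    """Cross entropy between the first (student) result and the
    predetermined result calibrated for the sample image."""
    return -sum(y * math.log(p)
                for y, p in zip(predetermined, first_result))

def second_loss(first_result, second_result):
    """Relative entropy (KL divergence) of the second (teacher) result
    from the first (student) result: sum of q_k * log(q_k / p_k)."""
    return sum(q * math.log(q / p)
               for q, p in zip(second_result, first_result) if q > 0)
```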
Further, performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result comprises: performing second-order approximate estimation processing on the network parameters in the current network to be optimized and the continuous processing result to obtain a second-order approximate estimation result; and performing alternating gradient optimization on the network parameters and the structural parameters in the current network to be optimized based on the second-order approximate estimation result.
Performing the second-order approximate estimation processing on the network parameters in the current network to be optimized and the continuous processing result yields a second-order approximate estimation result that makes the loss function differentiable with respect to both the network parameters ω and the structural parameters α. Alternating gradient optimization is then performed on the network parameters and the structural parameters in the current network to be optimized based on the second-order approximate estimation result until the preset optimization condition is met, so that the target image processing network is obtained.
For example, the search process corresponding to the gradient optimization targets the following bilevel optimization problem:

min_α L_val(ω*(α), α)   s.t.   ω*(α) = arg min_ω L_train(ω, α)
However, the loss function under this target is not directly differentiable with respect to the structural parameter α. A first-order approximation directly approximates the optimal network parameter ω*(α) by the current network parameter ω, while a second-order approximation approximates it by the network parameter after one gradient descent step, ω − ξ·∇_ω L_train(ω, α), with ξ the learning rate of that step. After the second-order approximate estimation, the loss function is differentiable with respect to both the network parameter ω and the structural parameter α. The current network to be optimized is then trained by alternating gradient optimization, training the structural parameters for one step and the network parameters for one step, until the preset optimization condition is met, whereupon the target image processing network is obtained. The alternating gradient optimization may fix the value of the structural parameter matrix α and take a gradient descent step on the network parameter matrix ω on the training set, and then take a gradient descent step on the structural parameter matrix α on the validation set.
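A toy sketch of the alternating gradient optimization on scalar parameters; the gradient callbacks stand in for backpropagation through the relaxed architecture and are assumptions of this sketch:

```python
def alternating_optimization(alpha, omega, grad_alpha, grad_omega,
                             steps=100, lr=0.1):
    """Alternate one gradient step on the network parameter omega (as on
    the training set) with one step on the structural parameter alpha
    (as on the validation set).

    grad_omega(alpha, omega) and grad_alpha(alpha, omega) are assumed
    callbacks returning the respective gradients.
    """
    for _ in range(steps):
        omega = omega - lr * grad_omega(alpha, omega)  # train one step
        alpha = alpha - lr * grad_alpha(alpha, omega)  # then update alpha
    return alpha, omega
```

On a toy quadratic loss both parameters converge toward their optimum, illustrating the alternating schedule rather than the full second-order estimate.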
In a second application scenario, the searched convolutional neural network is used to classify images.
The method comprises the steps of obtaining images to be classified, and classifying the images to be classified by adopting a searched convolutional neural network to obtain classification results corresponding to the images to be classified.
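A minimal usage sketch, assuming the searched network is a callable returning per-class probabilities (a hypothetical stand-in, not an API from the application):

```python
def classify(network, image):
    """Classify one image with a searched target image processing network.

    `network` is assumed to be a callable that maps an image to a list
    of class probabilities; the classification result is the index of
    the most probable class.
    """
    probs = network(image)
    return max(range(len(probs)), key=lambda k: probs[k])
```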
In this scenario, the convolutional neural network search process of the embodiments of the application consumes few computing resources, generates networks efficiently, occupies little video memory, and can be deployed flexibly on devices with limited video memory. Using the searched convolutional neural network for image classification therefore further reduces the overall resource consumption of the image classification task and improves its deployment flexibility, while the excellent stability of the convolutional neural network used for classification further improves the overall classification accuracy.
In order to facilitate better implementation of the image processing network generation method provided by the embodiment of the application, the embodiment of the application also provides an image processing network generation device based on the image processing network generation method. Where the meaning of the terms is the same as in the above-described image processing network generation method, specific implementation details may be referred to in the description of the method embodiment. Fig. 7 shows a block diagram of an image processing network generating apparatus according to an embodiment of the present application.
As shown in fig. 7, the image processing network generating apparatus 400 may include a structural unit acquiring module 410, an operation set acquiring module 420, a probability calculating module 430, a single operation sampling module 440, and an optimizing module 450.
The structural unit acquiring module is used for acquiring an initial network structure unit, wherein the initial network structure unit comprises at least one node representing a feature map and connecting edges are arranged between the nodes; the operation set acquiring module is used for acquiring a candidate operation set, wherein the candidate operation set comprises at least one candidate operation corresponding to each connecting edge and used for processing the feature map; the probability calculating module is used for generating, according to the structural parameters of the candidate operations corresponding to each connecting edge, the probability distribution with which each connecting edge selects its corresponding candidate operations in the current forward propagation; the single operation sampling module is used for selecting, according to the probability distribution corresponding to each connecting edge, only one candidate operation from the corresponding candidate operations to be added to the initial network structure unit, obtaining a current network to be optimized; and the optimizing module is used for relaxing the discrete probability distribution into a continuous one and performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized until the structural parameters meet a preset optimization condition, obtaining the target image processing network.
In some embodiments of the present application, the optimization module includes a sample input unit configured to process sample images respectively by using the current network to be optimized and a preset image processing network to obtain image processing results, and a network optimization unit configured to perform gradient optimization on structural parameters of candidate operations in the current network to be optimized based on a continuous processing result according to the image processing results and a network middle layer of the preset image processing network.
In some embodiments of the present application, the image processing result includes a first result corresponding to the current network to be optimized and a second result corresponding to the preset image processing network, the sample image is calibrated with a predetermined result, the network optimizing unit is configured to calculate a first loss according to the first result and the predetermined result, calculate a second loss according to the first result and the second result, calculate a third loss according to the preset image processing network and a network middle layer corresponding to the same level in the current network to be optimized, and perform gradient optimization on the structural parameters of candidate operations in the current network to be optimized based on the continuous processing result according to the first loss, the second loss and the third loss.
In some embodiments of the present application, the probability calculation module is configured to perform an exponential operation on the structural parameter of each candidate operation corresponding to each connection edge to obtain a parameter operation result for each candidate operation; sum the parameter operation results of the candidate operations corresponding to a connection edge to obtain a summation result for that connection edge; divide the parameter operation result of each candidate operation by the summation result of its connection edge to obtain the probability that each candidate operation is selected by the corresponding connection edge; sum the logarithm of the probability of each candidate operation with a random sampling value to obtain a target probability value for each candidate operation; and obtain, based on the target probability values of the candidate operations corresponding to each connection edge, the probability distribution with which each connection edge selects its corresponding candidate operations in current forward propagation.
In some embodiments of the present application, the single operation sampling module is configured to determine a candidate operation corresponding to a maximum target probability value in a probability distribution corresponding to each connection edge, to obtain a target candidate operation corresponding to each connection edge, and add the target candidate operation corresponding to each connection edge to a position corresponding to each connection edge in the initial network structure unit.
In some embodiments of the present application, the network optimization unit is configured to perform a mean pooling operation on two network feature graphs corresponding to a network middle layer of the same hierarchy in the preset image processing network and the current network to be optimized to obtain two target feature graphs with uniform channel numbers, convert the two target feature graphs into a first weighted feature graph and a second weighted feature graph, and calculate the third loss according to the first weighted feature graph and the second weighted feature graph.
In some embodiments of the present application, the network optimization unit is configured to calculate the first weighted feature map and the second weighted feature map by using a mean square error loss function, so as to obtain the third loss.
In some embodiments of the present application, the network optimization unit is configured to perform calculation processing on the first result and the predetermined result by using a cross entropy loss function to obtain the first loss, and perform calculation processing on the first result and the second result by using a relative entropy loss function to obtain the second loss.
In some embodiments of the present application, the network optimization unit is configured to perform a second-order approximate estimation process on the network parameter in the current network to be optimized and the continuous processing result to obtain a second-order approximate estimation result, and perform an alternating gradient optimization on the network parameter and the structure parameter in the current network to be optimized based on the second-order approximate estimation result.
The embodiment of the application also provides an image classification device based on the image classification method. Where the meaning of nouns is the same as in the image classification method described above, specific implementation details may be referred to in the description of the method embodiments. Fig. 8 shows a block diagram of an image classification apparatus according to an embodiment of the application.
As shown in fig. 8, the image classification apparatus 500 may include an image acquisition module 510 and a classification module 520.
The image acquisition module is used for acquiring an image to be classified, and the classification module is used for classifying the image to be classified by adopting a target image processing network for performing image classification, to obtain a classification result corresponding to the image to be classified, wherein the target image processing network is generated by the image processing network generation method according to any one embodiment of the application.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
In addition, the embodiment of the present application further provides an electronic device, which may be a terminal or a server, as shown in fig. 9, which shows a schematic structural diagram of the electronic device according to the embodiment of the present application, specifically:
The electronic device may include a processor 601 of one or more processing cores, a memory 602 of one or more computer-readable storage media, a power supply 603, an input unit 604, and other components. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 9 is not limiting of the electronic device, which may include more or fewer components than shown, combine certain components, or arrange components differently. Wherein:
Processor 601 is the control center of the electronic device and uses various interfaces and lines to connect the various parts of the overall computer device, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in memory 602 and invoking data stored in memory 602. Optionally, the processor 601 may include one or more processing cores; preferably, the processor 601 may integrate an application processor, which primarily handles the operating system, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 601.
The memory 602 may be used to store software programs and modules, and the processor 601 executes various functional applications and data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area, which may store an operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and a data storage area, which may store data created according to the use of the computer device. In addition, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller to provide the processor 601 with access to the memory 602.
The electronic device further comprises a power supply 603 for supplying power to the various components, preferably the power supply 603 may be logically connected to the processor 601 by a power management system, so that functions of managing charging, discharging, power consumption management and the like are achieved by the power management system. The power supply 603 may also include one or more of any components, such as a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 604, which input unit 604 may be used for receiving input digital or character information and for generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 601 in the electronic device loads executable files corresponding to the processes of one or more computer programs into the memory 602 according to the following instructions, and the processor 601 executes the computer programs stored in the memory 602, so as to implement the functions of the foregoing embodiments of the present application.
The processor 601 may perform, for example: obtaining an initial network structure unit, where the initial network structure unit includes at least one node representing a feature map and there are connecting edges between the nodes; obtaining a candidate operation set, where the candidate operation set includes at least one candidate operation corresponding to each connecting edge for processing the feature map; calculating, according to the structural parameters of the candidate operations corresponding to each connecting edge, a probability distribution with which each connecting edge selects its corresponding candidate operations in the current forward propagation; selecting, according to the probability distribution corresponding to each connecting edge, only one candidate operation from the corresponding candidate operations for each connecting edge and adding it to the network structure unit to obtain a current network to be optimized; and relaxing the discrete probability distribution into a continuous one and performing gradient optimization on the structural parameters of the candidate operations in the current network to be optimized based on the continuous processing result until a predetermined optimization condition is met, thereby obtaining a target image processing network.
As another example, the processor 601 may perform obtaining an image to be classified, and performing classification processing on the image to be classified by using a target image processing network for performing image classification, to obtain a classification result corresponding to the image to be classified, where the target image processing network is generated by using the image processing network generating method according to any embodiment of the present application.
It will be appreciated by those of ordinary skill in the art that all or part of the steps of the various methods of the above embodiments may be performed by a computer program, or by computer program control related hardware, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application also provide a computer readable storage medium having stored therein a computer program that can be loaded by a processor to perform the steps of any of the methods provided by the embodiments of the present application.
The computer readable storage medium may include, among others, read-only memory (ROM), random access memory (RAM), magnetic or optical disks, and the like.
Since the computer program stored in the computer readable storage medium may execute the steps of any one of the methods provided in the embodiments of the present application, the beneficial effects that can be achieved by the methods provided in the embodiments of the present application may be achieved, which are detailed in the previous embodiments and are not described herein.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to cause the computer device to perform the methods provided in the various alternative implementations of the application described above.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It will be understood that the application is not limited to the embodiments which have been described above and shown in the drawings, but that various modifications and changes can be made without departing from the scope thereof.