CN112686299A - Method and device for acquiring neural network model executed by computer
- Publication number
- CN112686299A (application CN202011593032.9A)
- Authority
- CN
- China
- Prior art keywords
- training
- network
- sub
- candidate
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The present disclosure provides a computer-implemented neural network model acquisition method, a model training method, a target classification method, a neural network model acquisition apparatus for a computer, a model training apparatus, a target classification apparatus, an electronic device, and a computer-readable storage medium. The computer-implemented neural network model acquisition method includes: acquiring a training set, where the training set includes a plurality of training data and classification labels corresponding to the training data; training a super network in an initial state based on the training set to obtain a trained super network; performing a model search based on the trained super network to obtain a plurality of sub-networks composed of candidate nodes, which form a candidate set; and determining a neural network model based on the degree of change of the parameters of the sub-networks in the candidate set. The method does not depend on the quality of the classification labels of the training data, reduces cost, and can obtain the network architecture of a high-quality target classification model.
Description
Technical Field
The present disclosure relates generally to the field of image processing, and more particularly to a computer-implemented neural network model acquisition method, a model training method, an object classification method, a neural network model acquisition apparatus for a computer, a model training apparatus, an object classification apparatus, an electronic device, and a computer-readable storage medium.
Background
At present, some classification tasks, for example the extensive development and application of classification and recognition on images, cannot be separated from deep learning. The Convolutional Neural Network (CNN) frameworks used in image classification have continuously evolved, from simple VGG (Visual Geometry Group Network) models to Residual Networks (ResNets) and so on, but manually designing network frameworks is not only time-consuming but also prone to errors. For this reason, automatic search of neural network architectures has been proposed. On the one hand, automatic Neural Architecture Search (NAS) can traverse candidate architectures to find the one with the best performance; on the other hand, it can break the limitations of human thinking and find architecture organizations that humans have not conceived. Increasing effort is devoted to reducing the number of Graphics Processing Units (GPUs) and the time consumed by neural architecture search through continuous improvement of the search space, the search strategy, and the evaluation strategy.
As described above, the parameters of a network model can be adjusted through training, and the architecture of the network model can be obtained through neural network architecture search, so that a better neural network architecture can be found.
In the neural network architecture search process, a large amount of training data is also needed, and the training data must carry classification labels. Acquiring a large amount of such training data is very expensive: manual labeling is time-consuming and labor-intensive, and its accuracy is difficult to guarantee.
Disclosure of Invention
In order to solve the above problems in the prior art, a first aspect of the present disclosure provides a computer-implemented neural network model acquisition method, where the neural network model acquisition method includes: acquiring a training set, where the training set includes a plurality of training data and classification labels corresponding to the training data; training a super network in an initial state based on the training set to obtain a trained super network, where the super network includes a plurality of candidate nodes; performing a model search based on the trained super network to obtain a plurality of sub-networks composed of the candidate nodes, which form a candidate set; and determining a neural network model based on the degree of change of the parameters of the sub-networks in the candidate set, where the degree of change of the parameters of a sub-network in the candidate set includes the degree of change of the parameters of the sub-network after training compared with before training.
In one embodiment, the degree of change in the parameters of the sub-network is determined by: determining a first vector based on parameters of candidate nodes constituting the sub-network before training; determining a second vector based on the trained parameters of the candidate nodes forming the sub-network; determining a degree of change in a parameter of the sub-network based on a distance between the first vector and the second vector.
In an embodiment, the distance between the first vector and the second vector comprises any one of: cosine distance, euclidean distance, or manhattan distance.
In an embodiment, the determining a neural network model based on a degree of change of the parameters of the sub-networks in the candidate set includes: a first step of determining, as a plurality of candidate networks, a number of sub-networks whose parameters have the largest degree of change in the candidate set and/or a number of sub-networks whose degree of parameter change in the candidate set is larger than a change threshold; a second step of generating one or more derivative networks based on the plurality of candidate networks; a third step of forming a new candidate set, with all the candidate networks and all the derivative networks serving as sub-networks in the new candidate set; repeatedly executing the first step, the second step, and the third step up to a preset number of times; and taking, as the neural network model, the sub-network with the largest degree of parameter change among all the candidate sets formed.
In one embodiment, the generating one or more derivative networks based on the plurality of candidate networks includes: and recombining the nodes based on any two or more than two candidate networks to obtain the derivative network.
In one embodiment, the generating one or more derivative networks based on the plurality of candidate networks further comprises: and replacing the corresponding nodes of the candidate network according to any one or more candidate nodes in the trained super network based on any one of the candidate networks to obtain a derivative network.
In one embodiment, the obtaining the training set includes: acquiring the plurality of training data; and randomly generating a classification label corresponding to each training data based on the category.
In an embodiment, the training of the initial state of the super network based on the training set to obtain a trained super network includes: constructing the initial state super network, wherein the initial state super network comprises a plurality of candidate nodes; determining a sub-network to be trained based on the candidate nodes; and adjusting the parameters of the sub-network to be trained based on the training set to obtain the trained sub-network.
In one embodiment, the following steps are repeatedly executed up to a specified number of times to obtain the trained super network: determining the sub-network to be trained based on the candidate nodes; and adjusting the parameters of the sub-network to be trained based on the training set to obtain the trained sub-network.
In an embodiment, adjusting parameters of the sub-network to be trained based on the training set to obtain the trained sub-network includes: determining batch data based on the training set, wherein the batch data comprises all or part of the training data in the training set; inputting the training data into the sub-network to be trained to obtain a prediction classification; and calculating loss according to the prediction classification and the classification label corresponding to the training data, and adjusting the parameters of the sub-network to be trained to obtain the trained sub-network.
In an embodiment, the determining a sub-network to be trained based on the candidate node includes: randomly selecting a plurality of candidate nodes as the nodes of the sub-network to be trained; all nodes of the sub-network to be trained are connected in a unidirectional manner.
In one embodiment, the neural network model is used for classifying and identifying images; the training data is image data.
A second aspect of the present disclosure provides a model training method, wherein the model training method includes: acquiring a neural network model to be trained, wherein the neural network model is obtained by the computer-implemented neural network model acquisition method according to the first aspect; training the neural network model based on a classification data set, wherein the classification data set comprises classification data and real classification labels corresponding to the classification data, and the real classification labels are consistent with target classes in the corresponding classification data.
A third aspect of the present disclosure provides an object classification method, wherein the object classification method includes: acquiring an image to be classified; obtaining the category information of the image to be classified through a neural network model, wherein the neural network model is obtained through the model training method according to the second aspect.
A fourth aspect of the present disclosure provides a neural network model acquisition apparatus for a computer, where the neural network model acquisition apparatus includes: a first acquisition module, configured to acquire a training set, where the training set includes a plurality of training data and classification labels corresponding to the training data; a super network training module, configured to train a super network in an initial state based on the training set to obtain a trained super network, where the super network includes a plurality of candidate nodes; a search module, configured to perform a model search based on the trained super network to obtain a plurality of sub-networks composed of the candidate nodes, which form a candidate set; and a determining module, configured to determine a neural network model based on the degree of change of the parameters of the sub-networks in the candidate set; where the degree of change of the parameters of a sub-network is determined by: determining a first vector based on the parameters, before training, of the candidate nodes constituting the sub-network; determining a second vector based on the parameters, after training, of the candidate nodes constituting the sub-network; and determining the degree of change of the parameters of the sub-network based on a distance between the first vector and the second vector.
A fifth aspect of the present disclosure provides a model training apparatus, wherein the model training apparatus includes: a second obtaining module, configured to obtain a neural network model to be trained, where the neural network model is obtained by using the neural network model obtaining method according to the first aspect; and the second training module is used for training the neural network model based on a classification data set, wherein the classification data set comprises classification data and real classification labels corresponding to the classification data, and the real classification labels are consistent with target classes in the corresponding classification data.
A sixth aspect of the present disclosure provides an object classification apparatus, wherein the object classification apparatus includes: the third acquisition module is used for acquiring the image to be classified; and the classification module is used for obtaining the class information of the image to be classified through a neural network model, wherein the neural network model is obtained through the model training method in the second aspect.
A seventh aspect of the present disclosure provides an electronic device, comprising: a memory to store instructions; and a processor for invoking the memory-stored instructions to perform the computer-implemented neural network model acquisition method of the first aspect, or the model training method of the second aspect, or the target classification method of the third aspect.
An eighth aspect of the present disclosure provides a computer-readable storage medium in which instructions are stored, which when executed by a processor, perform the neural network model acquisition method according to the first aspect, or the model training method according to the second aspect, or the target classification method according to the third aspect.
The computer-implemented neural network model acquisition method, the model training method, the target classification method, the neural network model acquisition apparatus for a computer, the model training apparatus, the target classification apparatus, the electronic device, and the computer-readable storage medium provided by the present disclosure can obtain the network architecture of a higher-quality neural network model without depending on the quality of the classification labels of the training data, thereby reducing cost.
Drawings
The above and other objects, features and advantages of the embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 shows a flow diagram of a computer-implemented neural network model acquisition method according to an embodiment of the present disclosure.
FIG. 2 shows a flow diagram of a computer-implemented neural network model acquisition method, according to another embodiment of the present disclosure.
FIG. 3 shows a flow diagram of a computer-implemented neural network model acquisition method, according to another embodiment of the present disclosure.
FIG. 4 shows a flow diagram of a computer-implemented neural network model acquisition method, according to another embodiment of the present disclosure.
FIG. 5 shows a flow diagram of a model training method according to an embodiment of the present disclosure.
FIG. 6 shows a flow diagram of a target classification method according to an embodiment of the present disclosure.
Fig. 7 shows a schematic diagram of a neural network model acquisition device for a computer according to an embodiment of the present disclosure.
FIG. 8 shows a schematic diagram of a model training apparatus according to an embodiment of the present disclosure.
FIG. 9 shows a schematic diagram of an object classification device according to an embodiment of the present disclosure.
Fig. 10 is a schematic diagram of an electronic device provided in an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It should be understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way.
It should be noted that, although the expressions "first", "second", etc. are used herein to describe different modules, steps, data, etc. of the embodiments of the present disclosure, the expressions "first", "second", etc. are merely used to distinguish between different modules, steps, data, etc. and do not indicate a particular order or degree of importance. Indeed, the terms "first," "second," and the like are fully interchangeable.
Currently, the NAS algorithm can be expressed as two related optimization problems: optimizing model parameters and searching a model structure. Mathematically, a unified expression can be defined as:
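The formula images referenced as equations (1) and (2) are not reproduced in this text; a standard form of this bilevel formulation, reconstructed from the definitions given in the next paragraph (and therefore an assumption rather than a verbatim copy of the patent's formulas), would be:

```latex
% Reconstruction from the surrounding definitions, not a verbatim copy of
% the patent's formula images.
W_a^{*} \;=\; \arg\min_{W_a}\; L\!\left(a,\; W_a;\; D_{\mathrm{train}}\right) \qquad (1)

a^{*} \;=\; \arg\max_{a \in \mathcal{A}}\; \mathrm{Score}\!\left(a,\; W_a^{*};\; D_{\mathrm{val}}\right) \qquad (2)
```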
where a is a candidate structure with weights W_a sampled from the search space A, and L represents the objective function. Score represents the performance evaluation of the candidate structure. According to the different optimization objectives in equation (1), Score can be instantiated using different performance evaluation metrics. For example, the prediction accuracy on the validation set is used as Score in nested NAS or one-shot NAS; in gradient-based NAS, the negative objective function is used as Score; in UnNAS (Unsupervised Neural Architecture Search), the prediction accuracy of an auxiliary task on the validation set is taken as Score; and so on. Because all of these cases incorporate prediction performance into the model evaluation, we collectively refer to the NAS paradigms represented by equation (1) and equation (2) as performance-based NAS.
In the neural network architecture search process, a large amount of training data is also needed, and the training data has classification labels. As described in equations (1) and (2), the class labels have two effects on the NAS optimization problem: firstly, the label is used as a supervision signal to guide the weight optimization to reach the optimum; second, the label is taken as the basis for the prediction results involved in the model evaluation Score.
However, it is difficult for performance-based NAS to obtain the expected weights W and candidate architecture a, because the training budget shared among the candidate architectures is insufficient. For example, in nested NAS, only hundreds of candidate architectures are trained from scratch, and the controller improves the search strategy using the validation accuracy of these candidate architectures as feedback. In practice, it is difficult for the controller to cover the entire search space with such a limited number of pre-trained architectures. For the DARTS (Differentiable Architecture Search) family of works, after the structural parameters are discretized, the operation associated with the largest architectural parameter typically does not yield the highest validation accuracy; even simple random search methods can outperform the original DARTS. For one-shot NAS, only a small number of network architectures are trained in the super-network training phase, and other architectures in the search space are never sampled.
As described above, under the current understanding, a large amount of training data labeled with classification labels is required in order to obtain an optimal neural network architecture in the NAS process. This is costly and time-consuming; if the amount of data is small, the quality of the finally obtained network architecture suffers.
In order to solve the above problem, an embodiment of the present disclosure provides a computer-implemented neural network model acquisition method 10, which is used for searching, by computer, for a high-quality network architecture that can serve as a target classification model. As shown in FIG. 1, the method may include steps S11 to S14, described in detail below:
step S11, a training set is obtained, where the training set includes a plurality of training data and classification labels corresponding to the training data.
In the embodiment of the present disclosure, the obtained training set may be obtained through a network, or may be obtained locally. The training set may include a plurality of training data and classification labels corresponding to the training data. In some embodiments, the neural network model may be for performing classification recognition on the image, and the training data is image data. For example, in an image classification task, a neural network model for image classification is obtained by the computer-implemented neural network model obtaining method 10, then the training data may be an image, and the classification labels may be categories of objects in the image, such as people, vehicles, trees, and the like.
It should be particularly noted that, in the embodiments of the present disclosure, "the classification label corresponding to the training data" means that each piece of training data has one classification label; the class represented by the classification label does not necessarily correspond to the actual class of the training data. In other words, the classification label corresponding to each piece of training data may be the same as, or different from, the class of that training data. For example, in image classification, the class of a first image is a person while the classification label corresponding to the first image is a vehicle, and the class of a second image is a tree while the classification label corresponding to the second image is also a tree. As another example, the classification labels may only identify class differences: the class of the first image is a person and the classification label corresponding to the first image is class A, while the class of the second image is a tree and the classification label corresponding to the second image is class B.
In the embodiments of the present disclosure, at the stage of searching for the network architecture (rather than the stage of training the network architecture), whether a classification label matches the actual class of the training data does not affect the result. The criterion for evaluating a network architecture is not its accuracy on a specific classification task, but whether the architecture is easy to train, i.e., whether a better classification model can be obtained after the architecture undergoes classification training.
The number of classes of the classification labels can be the same as the number of classes that actually need to be distinguished. For example, if three classes of people, vehicles, and trees need to be identified, three classes of labels can be set correspondingly, so that the searched neural network model can classify the corresponding number of classes.
The specific manner in which the network architecture is searched and evaluated is described in more detail below.
And step S12, training the super network in the initial state based on the training set to obtain the trained super network, wherein the super network comprises a plurality of candidate nodes.
A super network (SuperNet, also called a hyper-network) is not a specific network architecture but a set comprising a plurality of candidate nodes. By training the super network in its initial state, the parameters of each candidate node in the super network can be adjusted. The trained super network still comprises the same candidate nodes, except that the parameters of the candidate nodes have been adjusted.
In some embodiments, as shown in fig. 2, step S12 may further include steps S121 to S123. Specifically, in step S121, a super network in an initial state is constructed, where the super network in the initial state includes a plurality of candidate nodes. The super network can be constructed from different search spaces (Search Space). Different candidate nodes may be of the same type with different hyper-parameters, e.g., two convolution layers whose convolution kernels have different sizes. Different candidate nodes may also be nodes of different types, e.g., a pooling layer and a convolution layer. In short, the constructed super network provides candidate nodes of various types and with various hyper-parameters; the types and hyper-parameters can be made as rich as possible, so that network architectures of various forms can be combined, providing basic material for the subsequent excellent network architectures.
And step S122, determining a sub-network to be trained based on the candidate nodes. Based on the plurality of candidate nodes, a plurality of sub-networks may be determined; a sub-network that has just been determined is untrained. Based on the parameters of its candidate nodes, each sub-network also has corresponding parameters. When determining a sub-network, the candidate nodes in the super network can be selected randomly, or the selection can be performed according to certain basic rules, so that network structures that break through people's conventional cognition can be obtained.
In an embodiment of the present disclosure, step S122 may include: randomly selecting a plurality of candidate nodes as the nodes of the sub-network to be trained; all nodes of the sub-network to be trained are connected unidirectionally. Through random selection and connection, a plurality of sub-networks can be obtained conveniently without excessive manual participation, which on the one hand reduces labor cost and on the other hand avoids the inherent influence of human experience on the network architecture design, forming more new network architectures that may differ from conventional conceptions and providing material for the search for excellent network architectures.
In the process of random selection, the super network may be represented as a directed acyclic graph (DAG), denoted A(O, E), where O is the set of candidate nodes and E is the set of connections between nodes (each connection instantiated as a candidate operation). A candidate architecture, i.e., a sub-network, is uniformly randomly sampled from the super network and denoted a(O, Ê), where the node set O of the candidate architecture is the same as that of the super network and Ê ⊆ E is the sampled subset of connections. The sub-network model can be represented using a weight vector V(a, W), constructed by concatenating the model parameters of all possible paths from O_in to O_out.
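As an illustration of this sampling step, the sketch below assumes a layer-wise search space in which each layer of a sub-network picks one candidate operation; the names used (CANDIDATE_OPS, sample_subnetwork) are illustrative and not from the patent.

```python
import random

# A minimal sketch of uniform random sub-network sampling, assuming a
# layer-wise search space in which every layer chooses one candidate
# operation. All names are illustrative; the patent does not prescribe a
# concrete data structure.

# Candidate operations per layer of the super network (types and
# hyper-parameters can be enriched as needed).
CANDIDATE_OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool3x3", "skip"]
NUM_LAYERS = 8  # depth of each sampled sub-network

def sample_subnetwork(num_layers: int = NUM_LAYERS) -> list[str]:
    """Uniformly sample one candidate operation per layer.

    The resulting list encodes a sub-network whose nodes are connected
    unidirectionally, layer 0 -> layer 1 -> ... -> layer num_layers-1.
    """
    return [random.choice(CANDIDATE_OPS) for _ in range(num_layers)]

if __name__ == "__main__":
    subnet = sample_subnetwork()
    print("sampled sub-network:", subnet)
```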
And step S123, adjusting parameters of the sub-network to be trained based on the training set to obtain the trained sub-network.
In the embodiments of the present disclosure, a sub-network constructed from candidate nodes of the super network is trained: the training data is input into the sub-network to obtain predicted classification information, and the sub-network is then supervised based on the classification labels corresponding to the training data, so as to adjust the parameters of the sub-network and finally obtain the trained sub-network.
Here again, in step S123, the classification labels in the training set used for training the sub-network may be the same as, or different from, the actual classes of the corresponding training data. In an embodiment of the present disclosure, step S123 may include: determining batch data based on the training set, where the batch data includes all or part of the training data in the training set; inputting the training data into the sub-network to obtain a prediction classification; and calculating a loss according to the prediction classification and the classification labels corresponding to the training data, and adjusting the parameters of the sub-network to be trained to obtain the trained sub-network. In this embodiment, the data in the training set may be divided into batches, and a sub-network may be trained with only one batch of training data, which reduces the amount of computation and the computation cost. The parameters of the sub-network, such as the weight values of its nodes, are updated by means of a loss function, for example a cross-entropy objective function.
The training may adopt a one-shot (One Shot) mode, i.e., the corresponding training data is input into the sub-network only once and the parameters of the sub-network are adjusted only once, without repeated iterative training. On the one hand, this saves computation cost, so the training at this stage can be completed more quickly. On the other hand, since the classification labels may not correspond to the actual classes of the training data, multiple rounds of training to make the model converge are not required. Here, the parameters of the sub-network are adjusted once in this training mode, and based on this adjustment and the method described in detail later, whether the network architecture of the sub-network is excellent is evaluated. This saves both labeling cost and training cost.
In an embodiment of the present disclosure, steps S122 and S123 may be repeatedly executed up to a specified number of times to obtain the trained super network. Sub-networks are repeatedly and randomly constructed, and the constructed sub-networks are trained on the training set, so that the parameters of the candidate nodes in the super network are adjusted. Steps S122 and S123 are iterated continuously; the candidate nodes may thus be trained randomly within different sub-networks and their parameters adjusted continuously, and after the specified number of iterations is reached, the trained super network is obtained. In this way, the training effect of the candidate nodes in the super network can be guaranteed. In addition, because the present disclosure does not use real classification labels but may use random labels, the predictions of the model may not converge; with the method of this embodiment, training can be completed after the specified number of iterations. A minimal sketch of this super-network training loop is given below.
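The following is a minimal PyTorch sketch (illustrative only; the patent does not specify an implementation or framework) of this training loop: in each iteration a sub-network is sampled at random from the super network and its parameters are updated once on one batch of training data with randomly generated classification labels.

```python
import random
import torch
import torch.nn as nn

# Illustrative one-shot super-network training loop (steps S122/S123),
# assuming image inputs and a layer-wise search space. All class and
# variable names are assumptions, not taken from the patent.

NUM_CLASSES = 3          # e.g. "class A", "class B", "class C"
NUM_LAYERS = 4
CHANNELS = 16

def candidate_ops() -> nn.ModuleDict:
    """Candidate nodes for one layer: same type with different
    hyper-parameters, or different types altogether."""
    return nn.ModuleDict({
        "conv3x3": nn.Conv2d(CHANNELS, CHANNELS, 3, padding=1),
        "conv5x5": nn.Conv2d(CHANNELS, CHANNELS, 5, padding=2),
        "maxpool": nn.MaxPool2d(3, stride=1, padding=1),
    })

class SuperNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, CHANNELS, 3, padding=1)
        self.layers = nn.ModuleList(candidate_ops() for _ in range(NUM_LAYERS))
        self.head = nn.Linear(CHANNELS, NUM_CLASSES)

    def forward(self, x, arch):
        # arch: one chosen candidate-node name per layer (a sub-network)
        x = self.stem(x)
        for layer, op_name in zip(self.layers, arch):
            x = layer[op_name](x)
        x = x.mean(dim=(2, 3))            # global average pooling
        return self.head(x)

supernet = SuperNet()
optimizer = torch.optim.SGD(supernet.parameters(), lr=0.05, momentum=0.9)
criterion = nn.CrossEntropyLoss()

op_names = list(candidate_ops().keys())
for step in range(100):                                   # specified number of iterations
    arch = [random.choice(op_names) for _ in range(NUM_LAYERS)]   # S122: sample a sub-network
    images = torch.randn(8, 3, 32, 32)                    # one batch of training data (stubbed)
    labels = torch.randint(0, NUM_CLASSES, (8,))           # random classification labels
    loss = criterion(supernet(images, arch), labels)       # S123: one cross-entropy update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```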
And step S13, a model search is performed based on the trained super network to obtain a plurality of sub-networks composed of candidate nodes, forming a candidate set. In the embodiments of the present disclosure, a plurality of candidate nodes in the trained super network are searched to obtain a plurality of sub-networks, forming the candidate set used later for determining the neural network model; the candidate set includes the sub-networks obtained by the search, and in subsequent steps, the sub-networks in the candidate set may be updated.
And step S14, determining a neural network model based on the variation degree of the parameters of the sub-networks in the candidate set.
In the embodiment of the present disclosure, in the candidate set, whether the sub-networks are excellent or not is evaluated according to the degree of variation of the parameter of each sub-network, so that a neural network model suitable for performing an actual target classification task is searched. The parameters referred to herein may be parameters that are adjusted by the sub-network during the training process based on the supervision of the class labels, and do not include the hyper-parameters of the network architecture. For example, the parameter may be a weight, or the like.
Here, evaluating whether a sub-network is excellent essentially means determining whether the network architecture of the sub-network is excellent, that is, selecting the sub-network with the best network architecture from the plurality of sub-networks randomly assembled from the super network. As mentioned above, the quality of a model depends, on the one hand, on the network architecture of the model, i.e., the setting of each node and the connection relationships between them, and on the other hand, on the effect of the training that adjusts the parameters of the model. The computer-implemented neural network model acquisition method 10 of the embodiments of the present disclosure addresses the first aspect. In the process of searching for an excellent network architecture, the actual accuracy of the architecture on the target classification need not be a concern, because the classification accuracy on the actual target can be optimized through the subsequent actual training of the model.
The computer-implemented neural network model acquisition method 10 of the embodiments of the present disclosure makes this determination based on the degree of change of the parameters of a sub-network before and after training. Specifically, the larger the change of the parameters after training compared with before training, the more significant the training effect on the sub-network, the more easily the sub-network can be trained, and the better suited it is to the task type of the training data (for example, images). It also indicates that the sub-network can converge more quickly in the later training with real classification labels, so that a more accurate and reliable classification model is obtained. In some embodiments, the computer-implemented neural network model acquisition method 10 may take the one or more sub-networks with the largest parameter change before and after training as the determined neural network model, thereby saving a great deal of computation cost as well as the cost of labeling the training data or acquiring a large amount of labeled training data.
In an embodiment of the present disclosure, the degree of change of the parameters of the sub-network can be determined by: determining a first vector based on parameters of candidate nodes constituting the sub-network before training; determining a second vector based on the trained parameters of the candidate nodes composing the sub-network; based on the distance between the first vector and the second vector, a degree of change in a parameter of the subnetwork is determined.
In this embodiment, the degree of parameter change before and after sub-network training can be determined conveniently and accurately in vector form. The larger the distance between the parameter vectors before and after training, e.g., the cosine distance, Euclidean distance, or Manhattan distance, the larger the degree of change between the vectors. The cosine distance may also be converted into an angle.
As in the previous example, the sub-network model may be represented using the weight vector V(a, W); the weights before and after training may be denoted W_0 and W_t respectively, so the vectors before and after training can be expressed as V(a, W_0) and V(a, W_t). Then the cosine distance, expressed as an angle between the parameter vectors before and after training, can be written as:
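The original formula image is not reproduced in this text; a standard form of this angle metric, reconstructed from the surrounding definitions (and therefore an assumption rather than a verbatim copy), would be:

```latex
% Reconstruction from the surrounding definitions, not a verbatim copy of
% the patent's formula image.
\mathrm{angle}(a) \;=\; \arccos\!\left(
  \frac{\left\langle V(a, W_0),\, V(a, W_t) \right\rangle}
       {\left\lVert V(a, W_0) \right\rVert \cdot \left\lVert V(a, W_t) \right\rVert}
\right)
```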
where angle(a) represents the angle between the parameter vectors of sub-network a before and after training. The larger the value of angle(a), the larger the degree of parameter change of sub-network a before and after training, and hence the more easily sub-network a is trained on the classification task corresponding to the training data.
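A sketch of how such an angle could be computed in practice is given below; flatten_params and angle_between are assumed helper names, and PyTorch is used only as an example framework.

```python
import math
import torch

# Illustrative sketch (not the patent's exact procedure) of measuring the
# degree of parameter change of a sub-network: flatten the parameters of
# its candidate nodes before and after super-network training into vectors
# and compare them by angle.

def flatten_params(modules) -> torch.Tensor:
    """Concatenate the parameters of the candidate nodes that make up a
    sub-network into a single vector V(a, W)."""
    return torch.cat([p.detach().reshape(-1) for m in modules for p in m.parameters()])

def angle_between(v0: torch.Tensor, v1: torch.Tensor) -> float:
    """angle(a): arccos of the cosine similarity between the parameter
    vectors before (v0) and after (v1) training, in degrees."""
    cos = torch.dot(v0, v1) / (v0.norm() * v1.norm() + 1e-12)
    return math.degrees(math.acos(cos.clamp(-1.0, 1.0).item()))

# Usage sketch: snapshot the sub-network's candidate-node parameters before
# super-network training (v0), train, then flatten again (v1). A larger
# angle means the parameters changed more, so the architecture is
# considered easier to train.
```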
In an embodiment of the present disclosure, as shown in fig. 3, step S14 of the computer-implemented neural network model acquisition method 10 may include: step S141, a first step of determining, as a plurality of candidate networks, a number of sub-networks whose parameters have the largest degree of change in the candidate set and/or a number of sub-networks whose degree of parameter change in the candidate set is larger than a change threshold; step S142, a second step of generating one or more derivative networks based on the plurality of candidate networks; step S143, a third step of forming a new candidate set, with all the candidate networks and all the derivative networks serving as the sub-networks in the new candidate set; step S144, repeatedly executing steps S141 to S143 a preset number of times; and step S145, taking, as the neural network model, the sub-network with the largest degree of parameter change among all the candidate sets formed.
In this embodiment, a plurality of sub-networks may be selected as candidate networks in step S141, one or more network architectures may be further expanded from these candidate networks in step S142, and the final neural network model is then determined according to the degree of parameter change, based on the candidate networks and the expanded derivative networks.
Specifically, in step S141, a plurality of sub-networks may be determined as candidate networks according to the degree of change of their parameters before and after training. For example, a number N may be preset, and the N sub-networks with the largest degree of parameter change are taken as the candidate networks. As another example, a change threshold for the degree of parameter change may be preset, and the sub-networks whose degree of parameter change exceeds the threshold are the candidate networks; in an example where the degree of parameter change is expressed by the value of the vector angle, an angle threshold such as 90°, 120°, or 60° may be set, and the sub-networks exceeding the threshold are the candidate networks.
In step S142, after the candidate networks are selected, derivative networks may be formed by expanding the candidate networks. The plurality of candidate networks determined according to the degree of parameter change indicates that their network architectures, or the nodes within them, are well suited to the current classification task. In some cases, to save computation cost or because the computation load is limited, all possible network architectures cannot be exhaustively enumerated from the super network. Therefore, in this embodiment, the candidate networks selected as excellent are further expanded: changes such as local replacement, addition, and deletion may be applied to them, either randomly or according to rules. A derivative network retains most of the architecture and node settings of its candidate network but differs from it to some extent, and may therefore form a better network architecture. After the changed derivative networks are obtained, the candidate set is updated, and the best several networks are again selected according to the degree of parameter change and changed again. By iterating a preset number of times, a plurality of candidate sets, each containing a plurality of sub-networks, can be generated. Then, according to the method of step S145, the network with the largest degree of parameter change among all sub-networks in all the candidate sets formed is taken as the final neural network model. Here, the degree of parameter change of a candidate network is the degree of change of its parameters before and after sub-network training. The degree of parameter change of a derivative network is determined based on the parameters of its nodes in the super network before training and their parameters after the training in step S12. In particular, a derivative network is obtained from candidate networks (i.e., sub-networks) only by replacing, adding, or deleting some nodes; the parameters of the nodes themselves are not changed. In other words, each node in a derivative network has also been trained as a candidate node of the super network. Thus, the degree of parameter change of a derivative network also compares parameters before and after training: the parameters before training are the original parameters of its nodes when untrained, i.e., the parameters of the original candidate nodes in the super network, while the parameters after training are the parameters of those nodes after the sub-network training in step S12. The nodes of a derivative network need not belong to the same sub-network; when candidate nodes are randomly sampled from the super network to form sub-networks, they may fall into different sub-networks, and after the sub-networks are trained, different candidate nodes have been trained in different sub-networks and their parameters adjusted. Therefore, every candidate node in the super network has been trained, and its parameters have changed to some extent before and after training.
When calculating the degree of parameter change of a derivative network, the parameters of each node in the derivative network before training can be compared with its parameters after training, or the angle value can be compared.
By means of this embodiment, the screened high-quality candidate networks can be further expanded, and a higher-quality network architecture can be found while saving the computation of the initially sampled sub-networks. A sketch of this iterative expansion loop is given below.
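The sketch below illustrates one possible shape of this loop, assuming sub-networks are encoded as lists of per-layer operation names and that angle_of, crossover, and mutate are helpers as sketched elsewhere in this text; none of these names come from the patent.

```python
import random

# Illustrative sketch of the iterative search over candidate sets
# (steps S141-S145). angle_of(arch) is assumed to return the
# parameter-change angle of an architecture measured on the trained
# super network; crossover() and mutate() are sketched below.

def evolve_candidate_set(initial_set, angle_of, crossover, mutate,
                         top_n=10, rounds=5, n_cross=25, n_mut=25):
    candidate_set = list(initial_set)
    seen = list(initial_set)          # every sub-network in every candidate set
    for _ in range(rounds):
        # S141: keep the sub-networks with the largest parameter change.
        candidates = sorted(candidate_set, key=angle_of, reverse=True)[:top_n]
        # S142: expand the candidates into derivative networks.
        derived = [crossover(*random.sample(candidates, 2)) for _ in range(n_cross)]
        derived += [mutate(random.choice(candidates)) for _ in range(n_mut)]
        # S143: candidates plus derivative networks form the new candidate set.
        candidate_set = candidates + derived
        seen.extend(candidate_set)
    # S145: the sub-network with the largest parameter change among all
    # candidate sets formed during the search.
    return max(seen, key=angle_of)
```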
In an embodiment of the present disclosure, step S142 may include: recombining nodes based on any two or more candidate networks to obtain a derivative network. The approach of this embodiment may be referred to as crossover, i.e., a new network, the derivative network, is formed by cross-exchanging nodes between two or more candidate networks. In this way, more potentially better network architectures can be combined from the excellent candidate networks. The number of derivative networks generated in this way can be set, for example, to twenty-five crossovers on the basis of ten candidate networks.
For example, two candidate networks are randomly selected, and each node of the derivative network is obtained by randomly sampling, or sampling with weights, from the corresponding nodes of the two candidate networks. To illustrate with an example: ten candidate networks are obtained through screening, and two of them are randomly selected to generate a new network architecture (a derivative network), where the operation of each layer (i.e., node) in the new architecture is sampled with equal probability (i.e., 1/2) from the operations of the corresponding layer in the two candidate networks. This example may also be varied: equal probabilities need not be used in the sampling; instead, the sampling probabilities may be set according to the values of the parameter-change degree of the two candidate networks, and the operation of each layer in the new architecture is sampled from the operations of the corresponding layer in the two candidate networks according to these probabilities. For example, if the two selected candidate networks are a first candidate network with an angle value of 120° and a second candidate network with an angle value of 60°, the sampling probabilities of the first and second candidate networks can be 2:1, and in forming the derivative network, the operation of each layer is sampled from the operations of the corresponding layer in the two candidate networks with probability 2:1. Such a sampling scheme allows more nodes of the better candidate networks to be retained in the derivative network. A crossover sketch is given below.
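A possible crossover sketch, assuming the same list-of-operations encoding; the angle-weighted variant reflects the 2:1 example above, and all names are illustrative.

```python
import random

# Illustrative crossover: sub-networks are encoded as lists of per-layer
# operation names; each layer of the derivative network is sampled from
# the corresponding layers of two candidate networks.

def crossover(parent_a, parent_b, angle_a=None, angle_b=None):
    """Recombine two candidate networks layer by layer.

    With angles given, layer operations are sampled in proportion to the
    parents' parameter-change angles (e.g. 120 deg vs 60 deg -> 2:1);
    otherwise each parent is chosen with equal probability 1/2.
    """
    weights = (1.0, 1.0) if angle_a is None or angle_b is None else (angle_a, angle_b)
    return [random.choices([op_a, op_b], weights=weights)[0]
            for op_a, op_b in zip(parent_a, parent_b)]

# e.g. crossover(["conv3x3", "maxpool"], ["conv5x5", "conv3x3"], 120, 60)
```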
In an embodiment of the present disclosure, step S142 may include: based on any candidate network, replacing corresponding nodes of the candidate network with any one or more candidate nodes in the trained super network to obtain a derivative network. The approach of this embodiment may be referred to as mutation, i.e., some nodes of the network are replaced on the basis of any candidate network. The previous embodiment can be viewed as sampling the replacement of nodes in one candidate network from the nodes of other candidate networks to form a new architecture, whereas this embodiment can be viewed as sampling the replacement from all the candidate nodes in the super network. In this embodiment, a candidate node sampled from the super network is also a trained candidate node, whose parameters may have been obtained by training within other sub-networks. In this way, the possible network architectures can be further expanded on the basis of excellent candidate networks, increasing the search range. The number of derivative networks generated in this way can be set, for example, to twenty-five mutations on the basis of ten candidate networks.
To illustrate this embodiment with a specific example: one candidate network is randomly sampled from the ten candidate networks, and then each layer operation of its architecture is traversed. For each layer operation, with a probability P, e.g. 0.1, an operation is uniformly randomly sampled from the candidate nodes in the search space to replace the original operation of that layer, and with probability (1-P) the original operation of the layer is retained. Because the replacement is sampled from all candidate nodes, including candidate nodes not contained in the relatively excellent candidate networks, a relatively low probability P is set: this prevents too many nodes from being replaced, performs small variations while retaining most of the candidate network's content, and increases the possibility of forming a better network architecture. A mutation sketch is given below.
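A possible mutation sketch under the same encoding assumptions; the probability P and the candidate-operation list are illustrative values.

```python
import random

# Illustrative mutation: with a small probability P each layer of a
# candidate network is replaced by an operation sampled uniformly from the
# search space; otherwise the original operation is kept.

CANDIDATE_OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool3x3", "skip"]

def mutate(parent, ops=CANDIDATE_OPS, p=0.1):
    """Return a derivative network obtained by small random replacements."""
    return [random.choice(ops) if random.random() < p else op for op in parent]

# e.g. mutate(["conv3x3", "max_pool3x3", "conv5x5"]) keeps each layer with
# probability 0.9 and replaces it with a random candidate otherwise.
```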
In an embodiment of the present disclosure, as shown in fig. 4, step S11 may include: step S111, acquiring a plurality of training data; and step S112, randomly generating, based on the classes, a classification label corresponding to each piece of training data. The embodiments of the present disclosure determine a better network architecture according to the degree of parameter change, so the class represented by the classification label of a piece of training data may differ from its actual class. The training set needs to match the actual task; for example, to obtain a target classification model for image classification, the training data in the training set needs to be images. Meanwhile, each piece of training data needs a corresponding classification label, but the content of the classification label may differ from the actual class of the training data. For example, in image classification, the class of a first image is a person while the classification label corresponding to the first image is a vehicle, and the class of a second image is a tree while the classification label corresponding to the second image is also a tree. As another example, the classification labels may only identify class differences: the class of the first image is a person and its classification label is class A, while the class of the second image is a tree and its classification label is class B.
In this embodiment, the acquired training data may be data without classification labels, e.g., only images. By means of this embodiment, classification labels can be randomly generated for the training data. In step S112, the number of classes of the classification labels is determined based on the classes, i.e., according to the number of classes actually required. For example, in a task that needs to classify three categories of people, vehicles, and trees in images, three classes of classification labels may be set, which may be defined as person, vehicle, and tree, or may be defined simply as class A, class B, and class C. A classification label is then generated for each piece of training data. In this way, a large amount of data can be acquired conveniently and at low cost, without having to acquire already-labeled data or to label unlabeled data. In addition, this method can keep the numbers of the different label classes relatively even, avoiding the situation where some classes appear in the labeled training data much more often than others. A sketch of such random label generation is given below.
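A possible sketch of this random label generation; the balanced-pool trick keeps class counts even, as described above, and all names are illustrative.

```python
import random

# Illustrative sketch of step S112: randomly assigning a classification
# label to each (unlabeled) training image. Drawing labels by shuffling a
# balanced pool keeps the number of samples per class roughly even.

def random_labels(num_samples: int, num_classes: int) -> list[int]:
    """Return one randomly generated class label per training sample,
    with the classes kept approximately balanced."""
    pool = [i % num_classes for i in range(num_samples)]
    random.shuffle(pool)
    return pool

# e.g. 3 classes (person / vehicle / tree, or simply A / B / C):
labels = random_labels(num_samples=1000, num_classes=3)
```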
Based on the same inventive concept, the present disclosure also provides a model training method 20. As shown in fig. 5, the model training method 20 may include: step S21, acquiring a neural network model to be trained, where the neural network model is obtained by the computer-implemented neural network model acquisition method 10 of any of the foregoing embodiments; and step S22, training the neural network model based on a classification data set, where the classification data set includes classification data and real classification labels corresponding to the classification data, and the real classification labels are consistent with the target classes in the corresponding classification data. In the embodiments of the present disclosure, the computer-implemented neural network model acquisition method 10 can search out an excellent neural network model, and the neural network model must be further trained so that it can perform the target classification task. This training process differs from the training of the sub-networks within the computer-implemented neural network model acquisition method 10: in the model training method 20, the classification accuracy of the neural network model needs to be improved, so a classification data set with real classification labels must be used to train it. That is, when training the neural network model, the real classification labels used are consistent with the actual classes of the classification data. Since the network architecture obtained by the computer-implemented neural network model acquisition method 10 is excellent and easy to train, training it with the model training method 20 costs less, trains better, converges faster, and yields a neural network model with higher classification accuracy. A minimal training sketch is given below.
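A minimal sketch of this final training stage, reusing the SuperNet class and architecture encoding from the earlier sketch and assuming a standard labeled dataloader; all names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the model training method 20: the searched
# architecture (a fixed list of per-layer operations) is trained to
# convergence on data whose real classification labels match the actual
# classes. The model/arch interface follows the earlier SuperNet sketch.

def train_searched_model(model: nn.Module, arch, dataloader, epochs=50, lr=0.01):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):                 # multiple epochs, unlike one-shot training
        for images, labels in dataloader:   # labels are real, matching the actual classes
            loss = criterion(model(images, arch), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```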
Based on the same inventive concept, an embodiment of the present disclosure further provides an object classification method 30, as shown in fig. 6, the object classification method 30 may include: step S31, acquiring an image to be classified; step S32, obtaining the class information of the image to be classified through a neural network model, wherein the neural network model is obtained through the training of the model training method 20 according to any of the foregoing embodiments. In this embodiment, based on the neural network model obtained by training in the model training method 20, the images are classified and identified, and the categories of the images can be accurately detected.
Based on the same inventive concept, the present disclosure also provides a neural network model acquisition apparatus 100 for a computer, as shown in fig. 7, the neural network model acquisition apparatus 100 for a computer may include: a first obtaining module 110, configured to obtain a training set, where the training set includes a plurality of training data and classification labels corresponding to the training data; a super network training module 120, configured to train a super network in an initial state based on the training set, to obtain a trained super network, where the super network includes a plurality of candidate nodes; the searching module 130 is configured to perform model searching based on the trained super network to obtain a plurality of sub networks composed of candidate nodes, and form a candidate set; a determining module 140, configured to determine a neural network model based on a degree of change of a parameter of the sub-network in the candidate set, where the degree of change of the parameter of the sub-network in the candidate set includes a degree of change of the parameter of the sub-network after training compared with a degree of change of the parameter before training.
In one embodiment, the degree of change in the parameters of the sub-network is determined by: determining a first vector based on parameters of candidate nodes forming a sub-network before training; determining a second vector based on the trained parameters of the candidate nodes forming the sub-network; based on the distance between the first vector and the second vector, a degree of change in a parameter of the subnetwork is determined.
In an embodiment, the distance between the first vector and the second vector comprises any one of: cosine distance, euclidean distance, or manhattan distance.
In one embodiment, the determining module 140 is further configured to perform the following steps: a first step of determining, as a plurality of candidate networks, a number of sub-networks whose parameters have the largest degree of change in the candidate set and/or a number of sub-networks whose degree of parameter change in the candidate set is larger than a change threshold; a second step of generating one or more derivative networks based on the plurality of candidate networks; a third step of forming a new candidate set, with all the candidate networks and all the derivative networks serving as sub-networks in the new candidate set; repeatedly executing the first step, the second step, and the third step up to a preset number of times; and taking, as the neural network model, the sub-network with the largest degree of parameter change among all the candidate sets formed.
In an embodiment, the determining module 140 is further configured to: and recombining the nodes based on any two or more than two candidate networks to obtain the derivative network.
In an embodiment, the determining module 140 is further configured to: and replacing the corresponding nodes of the candidate network according to any one or more candidate nodes in the trained super network based on any candidate network to obtain the derivative network.
In an embodiment, the first obtaining module 110 is further configured to: acquiring a plurality of training data; and randomly generating a classification label corresponding to each training data based on the category.
In one embodiment, the super network training module 120 is further configured to: constructing a super network in an initial state, wherein the super network in the initial state comprises a plurality of candidate nodes; determining a sub-network to be trained based on the candidate nodes; and adjusting the parameters of the sub-network to be trained on the basis of the training set to obtain the trained sub-network.
In one embodiment, the super network training module 120 is further configured to repeatedly execute the following steps up to a specified number of times to obtain the trained super network: determining a sub-network to be trained based on the candidate nodes; and adjusting the parameters of the sub-network to be trained based on the training set to obtain the trained sub-network.
In one embodiment, the super network training module 120 is further configured to: determining batch data based on the training set, wherein the batch data comprises all or part of the training data in the training set; inputting training data into a sub-network to be trained to obtain prediction classification; and calculating loss according to the prediction classification and the classification label corresponding to the training data, and adjusting the parameters of the sub-network to be trained to obtain the trained sub-network.
In one embodiment, the super network training module 120 is further configured to: randomly select a plurality of candidate nodes as the nodes of the sub-network to be trained, wherein all nodes of the sub-network to be trained are connected unidirectionally.
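A possible sketch of this random selection, assuming the super network exposes one list of candidate nodes per layer (sample_subnetwork is an illustrative name):

```python
import random

def sample_subnetwork(candidate_nodes_per_layer):
    """Randomly pick one candidate node per layer; the chosen nodes are connected
    unidirectionally, each node feeding only the next one."""
    chosen = [random.choice(layer_nodes) for layer_nodes in candidate_nodes_per_layer]
    edges = [(i, i + 1) for i in range(len(chosen) - 1)]   # one-way connections
    return chosen, edges
```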
In one embodiment, the neural network model is used for classifying and identifying images, and the training data are image data.
With respect to the neural network model acquisition apparatus 100 for a computer in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
Based on the same inventive concept, the present disclosure further provides a model training apparatus 200, wherein the trained neural network model may be used for image classification. As shown in fig. 8, the model training apparatus 200 may include: a second obtaining module 210, configured to obtain a neural network model to be trained, wherein the neural network model is obtained by the computer-implemented neural network model acquisition method 10 according to any one of the foregoing embodiments; and a second training module 220, configured to train the neural network model based on a classification dataset, wherein the classification dataset comprises classification data and real classification labels corresponding to the classification data, and the real classification labels are consistent with the target classes in the corresponding classification data.
With respect to the model training apparatus 200 in the above-mentioned embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
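For illustration only, training the searched model on a classification data set with real labels could be sketched as follows, assuming a PyTorch model, data loader and optimizer (finetune is a hypothetical name):

```python
import torch.nn.functional as F

def finetune(model, loader, optimizer, epochs=5):
    """Train the searched neural network model on classification data whose labels
    are the real target classes (unlike the random labels used during the search)."""
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```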
Based on the same inventive concept, the present disclosure also provides an object classification apparatus 300. As shown in fig. 9, the object classification apparatus 300 includes: a third obtaining module 310, configured to obtain an image to be classified; and a classification module 320, configured to obtain the category information of the image to be classified through a neural network model, wherein the neural network model is obtained by training according to the model training method 20 of any of the foregoing embodiments.
With regard to the object classification apparatus 300 in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
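A minimal inference sketch for the object classification apparatus, assuming a trained PyTorch model and a list of class names (classify_image is a hypothetical name):

```python
import torch

def classify_image(model, image_tensor, class_names):
    """Obtain the class information of an image to be classified."""
    model.eval()
    with torch.no_grad():
        logits = model(image_tensor.unsqueeze(0))   # add a batch dimension
        predicted = int(logits.argmax(dim=1).item())
    return class_names[predicted]
```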
As shown in fig. 10, one embodiment of the present disclosure provides an electronic device 400. The electronic device 400 includes a memory 401, a processor 402, and an Input/Output (I/O) interface 403. The memory 401 is used for storing instructions. The processor 402 is configured to invoke the instructions stored in the memory 401 to execute the computer-implemented neural network model acquisition method, the model training method, or the object classification method of the embodiments of the present disclosure. The processor 402 is connected to the memory 401 and the I/O interface 403, respectively, for example, through a bus system and/or other connection mechanism (not shown). The memory 401 may be used to store programs and data, including the programs of the neural network model acquisition method, the model training method, or the object classification method involved in the embodiments of the present disclosure, and the processor 402 executes various functional applications and data processing of the electronic device 400 by running the programs stored in the memory 401.
The processor 402 in the embodiment of the present disclosure may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), and the processor 402 may be one Central Processing Unit (CPU) or a combination of several CPUs or other processing units with data processing capability and/or instruction execution capability.
In the embodiment of the present disclosure, the I/O interface 403 may be used to receive input (e.g., numeric or character information) and generate key signal inputs related to user settings and function control of the electronic device 400, and may also output various information (e.g., images or sounds) to the outside. The I/O interface 403 may include one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a mouse, a joystick, a trackball, a microphone, a speaker, a touch panel, and the like.
It is to be understood that although operations are depicted in the drawings in a particular order, this is not to be understood as requiring that such operations be performed in the particular order shown or in serial order, or that all illustrated operations be performed, to achieve desirable results. In certain environments, multitasking and parallel processing may be advantageous.
The methods and apparatus related to embodiments of the present disclosure can be accomplished with standard programming techniques using rule-based logic or other logic to accomplish the various method steps. It should also be noted that the words "means" and "module" as used herein and in the claims are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving input.
Any of the steps, operations, or procedures described herein may be performed or implemented using one or more hardware or software modules, alone or in combination with other devices. In one embodiment, the software modules are implemented using a computer program product comprising a computer readable medium containing computer program code, which is executable by a computer processor for performing any or all of the described steps, operations, or procedures.
The foregoing description of the implementations of the disclosure has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the disclosure. The embodiments were chosen and described in order to explain the principles of the disclosure and its practical application to enable one skilled in the art to utilize the disclosure in various embodiments and with various modifications as are suited to the particular use contemplated.
Claims (19)
1. A computer-implemented neural network model acquisition method, wherein the neural network model acquisition method comprises:
acquiring a training set, wherein the training set comprises a plurality of training data and classification labels corresponding to the training data;
training a super network in an initial state based on the training set to obtain a trained super network, wherein the super network comprises a plurality of candidate nodes;
performing model search based on the trained super network to obtain a plurality of sub-networks consisting of the candidate nodes to form a candidate set;
and determining a neural network model based on the degree of change of the parameters of the sub-networks in the candidate set, wherein the degree of change of the parameters of the sub-networks in the candidate set comprises the degree of change of the parameters of the sub-networks after training relative to before training.
2. The computer-implemented neural network model acquisition method of claim 1, wherein the degree of change in the parameters of the sub-network is determined by:
determining a first vector based on parameters of candidate nodes constituting the sub-network before training;
determining a second vector based on the trained parameters of the candidate nodes composing the sub-network;
determining a degree of change in a parameter of the sub-network based on a distance between the first vector and the second vector.
3. The computer-implemented neural network model acquisition method of claim 2, wherein the distance between the first vector and the second vector comprises any one of: cosine distance, Euclidean distance, or Manhattan distance.
4. The computer-implemented neural network model acquisition method of any one of claims 1-3, wherein the determining a neural network model based on the degree of change of the parameters of the sub-networks in the candidate set comprises:
a first step of determining, as a plurality of candidate networks, a plurality of sub-networks in the candidate set with the maximum degree of change of the parameters and/or a plurality of sub-networks in the candidate set whose degree of change of the parameters is larger than a change threshold;
a second step of generating one or more derivative networks based on the plurality of candidate networks;
a third step of forming a new candidate set by using all the candidate networks and all the derivative networks as sub-networks in the new candidate set;
repeatedly executing the first step, the second step and the third step for a preset number of times;
and determining, as the neural network model, the sub-network with the maximum degree of change of the parameters among all the candidate sets formed.
5. The computer-implemented neural network model acquisition method of claim 4, wherein the generating one or more derivative networks based on the plurality of candidate networks comprises:
recombining the nodes of any two or more of the candidate networks to obtain the derivative network.
6. The computer-implemented neural network model acquisition method of claim 4, wherein the generating one or more derivative networks based on the plurality of candidate networks further comprises:
for any one of the candidate networks, replacing one or more of its nodes with corresponding candidate nodes in the trained super network to obtain a derivative network.
7. The computer-implemented neural network model acquisition method of any one of claims 1-3, wherein the acquiring a training set comprises:
acquiring the plurality of training data;
and randomly generating a classification label corresponding to each training data based on the category.
8. The computer-implemented neural network model acquisition method of any one of claims 1-7, wherein the training a super network in an initial state based on the training set to obtain a trained super network comprises:
constructing the initial state super network, wherein the initial state super network comprises a plurality of candidate nodes;
determining a sub-network to be trained based on the candidate nodes;
and adjusting the parameters of the sub-network to be trained based on the training set to obtain the trained sub-network.
9. The computer-implemented neural network model acquisition method of claim 8, wherein the trained super network is obtained by repeatedly executing the following steps for a specified number of times: determining the sub-network to be trained based on the candidate nodes; and adjusting the parameters of the sub-network to be trained based on the training set to obtain the trained sub-network.
10. The computer-implemented neural network model acquisition method of claim 8, wherein the adjusting parameters of the sub-network to be trained based on the training set to obtain a trained sub-network comprises:
determining batch data based on the training set, wherein the batch data comprises all or part of the training data in the training set;
inputting the training data into the sub-network to be trained to obtain a prediction classification;
and calculating loss according to the prediction classification and the classification label corresponding to the training data, and adjusting the parameters of the sub-network to be trained to obtain the trained sub-network.
11. The computer-implemented neural network model acquisition method of claim 8, wherein the determining a sub-network to be trained based on the candidate nodes comprises:
randomly selecting a plurality of candidate nodes as the nodes of the sub-network to be trained;
all nodes of the sub-network to be trained are connected in a unidirectional manner.
12. The computer-implemented neural network model acquisition method of any one of claims 1-3, wherein the neural network model is used for performing classification recognition on an image;
the training data is image data.
13. A model training method, wherein the model training method comprises:
obtaining a neural network model to be trained, wherein the neural network model is obtained by the computer-implemented neural network model acquisition method according to any one of claims 1 to 12;
training the neural network model based on a classification data set, wherein the classification data set comprises classification data and real classification labels corresponding to the classification data, and the real classification labels are consistent with target classes in the corresponding classification data.
14. An object classification method, wherein the object classification method comprises:
acquiring an image to be classified;
obtaining the class information of the image to be classified through a neural network model, wherein the neural network model is obtained through training by the model training method according to claim 13.
15. A neural network model acquisition apparatus for a computer, wherein the neural network model acquisition apparatus comprises:
a first acquisition module, configured to acquire a training set, wherein the training set comprises a plurality of training data and classification labels corresponding to the training data;
a super network training module, configured to train a super network in an initial state based on the training set to obtain a trained super network, wherein the super network comprises a plurality of candidate nodes;
a search module, configured to perform model search based on the trained super network to obtain a plurality of sub-networks consisting of the candidate nodes, which form a candidate set;
and a determining module, configured to determine a neural network model based on the degree of change of the parameters of the sub-networks in the candidate set, wherein the degree of change of the parameters of the sub-networks in the candidate set comprises the degree of change of the parameters of the sub-networks after training relative to before training.
16. A model training apparatus, wherein the model training apparatus comprises:
a second obtaining module, configured to obtain a neural network model to be trained, wherein the neural network model is obtained by the computer-implemented neural network model acquisition method according to any one of claims 1 to 12;
and the second training module is used for training the neural network model based on a classification data set, wherein the classification data set comprises classification data and real classification labels corresponding to the classification data, and the real classification labels are consistent with target classes in the corresponding classification data.
17. An object classification apparatus, wherein the object classification apparatus comprises:
a third acquisition module, configured to acquire an image to be classified;
and a classification module, configured to obtain the class information of the image to be classified through a neural network model, wherein the neural network model is obtained by training according to the model training method of claim 13.
18. An electronic device, wherein the electronic device comprises:
a memory to store instructions; and
a processor for invoking the memory-stored instructions to perform the computer-implemented neural network model acquisition method of any one of claims 1-12, or the model training method of claim 13, or the object classification method of claim 14.
19. A computer-readable storage medium having stored therein instructions which, when executed by a processor, perform a computer-implemented neural network model acquisition method as claimed in any one of claims 1-12, or a model training method as claimed in claim 13, or an object classification method as claimed in claim 14.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011593032.9A CN112686299A (en) | 2020-12-29 | 2020-12-29 | Method and device for acquiring neural network model executed by computer |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN112686299A true CN112686299A (en) | 2021-04-20 |
Family
ID=75454978
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011593032.9A Pending CN112686299A (en) | 2020-12-29 | 2020-12-29 | Method and device for acquiring neural network model executed by computer |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112686299A (en) |
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200257961A1 (en) * | 2017-11-30 | 2020-08-13 | Google Llc | Neural architecture search using a performance prediction neural network |
| WO2019187298A1 (en) * | 2018-03-29 | 2019-10-03 | Mitsubishi Electric Corporation | Image processing system and image processing method |
| CN111553464A (en) * | 2020-04-26 | 2020-08-18 | 北京小米松果电子有限公司 | Image processing method, device and intelligent device based on super network |
| CN111563592A (en) * | 2020-05-08 | 2020-08-21 | 北京百度网讯科技有限公司 | Method and device for generating neural network model based on hypernetwork |
| CN112116090A (en) * | 2020-09-28 | 2020-12-22 | 腾讯科技(深圳)有限公司 | Neural network structure searching method and device, computer equipment and storage medium |
Non-Patent Citations (1)
| Title |
|---|
| Wang Jin; Jin Lixiong; Sun Kaiwei: "Chinese Text Classification Method Based on Evolutionary Hyper-Networks", Journal of Jiangsu University (Natural Science Edition), No. 02, 10 March 2013 (2013-03-10) * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113657465A (en) * | 2021-07-29 | 2021-11-16 | 北京百度网讯科技有限公司 | Pre-training model generation method and device, electronic equipment and storage medium |
| CN113657465B (en) * | 2021-07-29 | 2024-04-09 | 北京百度网讯科技有限公司 | Pre-training model generation method and device, electronic equipment and storage medium |
| CN113936173A (en) * | 2021-10-08 | 2022-01-14 | 上海交通大学 | Image classification method, device, medium and system for maximizing mutual information |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |