WO2021103675A1 - Neural network training and face detection method, apparatus, device and storage medium - Google Patents
Neural network training and face detection method, apparatus, device and storage medium
- Publication number
- WO2021103675A1 (PCT/CN2020/110160)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- learning rate
- neural network
- target value
- training
- optimization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0985—Hyperparameter optimisation; Meta-learning; Learning-to-learn
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/7747—Organisation of the process, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
Definitions
- The present disclosure relates to deep learning technology, for example, to a neural network training and face detection method, apparatus, device, and storage medium.
- Deep learning methods based on neural networks are widely applied in many fields such as computer vision, natural language processing, and text understanding, and these fields basically cover the many technologies required by Internet applications, such as image and video processing, speech processing, and text processing.
- Deep learning uses a neural network as a feature-extraction tool for data: the parameters in the network are trained on a large number of samples to fit the sample annotations, such as class labels, so that the network has predictive ability in scenarios similar to the sample distribution.
- During training, an objective function is defined to compute the error between the predicted values of the current neural network and the annotated ground-truth values; this error is also called the loss value. An optimization method is then used to update the parameters of the neural network.
- the goal of the optimization method is to update the parameters of the neural network to reduce the loss value, that is, to minimize the loss value of the objective function as much as possible.
- Training the neural network therefore moves the parameters in the direction of decreasing gradient, updating them according to a specific learning rate (also called the step size).
- Optimization methods for neural networks in the related art mostly use the learning rate and the output of the objective function to update parameters; the learning rate of the optimization method determines the magnitude of parameter updates and therefore has a large impact on training the neural network.
- Neural networks in the related art are usually trained with a single optimization method, which satisfies one requirement while easily neglecting others.
- For example, the learning rate affects both the speed of training the neural network and its generalization ability: if the learning rate is too small, training is slow, the training period becomes too long, and the training efficiency of the neural network suffers; if the learning rate is too large, the optimal parameters are likely to be skipped over, and the generalization ability of the neural network is poor.
- the present disclosure provides a neural network training and face detection method, device, equipment and storage medium to solve how to balance the training cycle and generalization ability of the neural network.
- a neural network training method including:
- a face detection method including:
- the image data is input into a preset neural network for processing to identify the area where the face data is located in the image data, wherein the neural network is trained by the neural network training method described in the first aspect.
- a neural network training device including:
- a neural network determination module, configured to determine a neural network;
- a first training module configured to train the neural network at a first learning rate according to a first optimization method, and the first learning rate is updated every time the neural network is trained;
- the learning rate mapping module is configured to map the first learning rate of the first optimization mode to the second learning rate of the second optimization mode in the same vector space;
- a switching determination module, configured to determine that the second learning rate satisfies a preset update condition;
- the second training module is configured to continue training the neural network at the second learning rate according to the second optimization method.
- a face detection device including:
- an image data receiving module, configured to receive image data;
- a face area recognition module, configured to input the image data into a preset neural network for processing to identify the area where the face data is located in the image data, wherein the neural network is trained by the neural network training apparatus described in the present disclosure.
- a computer device is also provided, and the computer device includes:
- one or more processors;
- a memory, configured to store one or more programs;
- wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the neural network training method or the face detection method described in the present disclosure.
- a computer-readable storage medium is also provided, on which a computer program is stored, and when the computer program is executed by a processor, the neural network training method or the face detection method as described in the present disclosure is realized.
- Fig. 1 is a schematic diagram of neural network training provided by an embodiment of the present invention.
- FIG. 2 is an example diagram of a saddle point provided by an embodiment of the present invention.
- FIG. 3 is a flowchart of a neural network training method provided by Embodiment 1 of the present invention.
- Fig. 4 is a flowchart of a neural network training method provided in the second embodiment of the present invention.
- FIG. 5 is a flowchart of a face detection method provided by Embodiment 3 of the present invention.
- FIG. 6 is a diagram of an example of face detection provided by Embodiment 3 of the present invention.
- FIG. 7 is a schematic structural diagram of a neural network training device provided by the fourth embodiment of the present invention.
- FIG. 8 is a schematic structural diagram of a face detection apparatus provided by Embodiment 5 of the present invention.
- FIG. 9 is a schematic structural diagram of a computer device according to Embodiment 6 of the present invention.
- A neural network in deep learning is usually composed of layers with different functions. Taking the Convolutional Neural Network (CNN) used in computer vision as an example, a CNN usually contains a large number of convolutional layers, activation layers, pooling layers, and so on.
- Each layer calculates the input data through the functional formula expressed by the parameters stored in the layer to obtain the output data, and the output data is used as the input data of the next layer.
- the neural network can be regarded as a kind of function mapping, and the training process of the neural network is a process of function optimization.
- The goal of the optimization is to continuously update the parameters contained in the neural network so that, with the labeled samples as input data, the loss between the predicted values output by the neural network and the labels is minimized.
- the process of neural network training is the process of parameter update: calculate the gradient of the objective function in the current parameter, and then calculate the update amplitude of the parameter according to the loss value and the learning rate, and update the parameter in the opposite direction of the gradient.
- Assuming that the parameters of the neural network are denoted w and the objective function is f, the parameter gradient of the objective function at the t-th time can be expressed as g_t = ∇f(w_t); the update magnitude of the parameters at the t-th time can be expressed as Δw_t = −a_t · g_t, where a is the learning rate; and the update at time t+1 can be expressed as w_{t+1} = w_t + Δw_t.
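- As an illustration of the update rule above, the following is a minimal NumPy sketch (not part of the original disclosure; the quadratic objective, learning rate, and variable names are illustrative assumptions) of computing the gradient g_t, the update magnitude Δw_t = −a·g_t, and the new parameters w_{t+1} = w_t + Δw_t.

```python
import numpy as np

def gradient_step(w, grad_fn, lr):
    """One plain gradient-descent step: w_{t+1} = w_t - lr * g_t."""
    g_t = grad_fn(w)         # parameter gradient at time t
    delta_w = -lr * g_t      # update magnitude: Delta_w_t = -a_t * g_t
    return w + delta_w       # w_{t+1} = w_t + Delta_w_t

# Illustrative objective f(w) = ||w||^2, whose gradient is 2w (an assumption for the demo).
grad_fn = lambda w: 2.0 * w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = gradient_step(w, grad_fn, lr=0.1)
print(w)  # approaches the minimizer [0, 0]
```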
- the solution process of the neural network mainly depends on the parameters at the current moment, the definition of the objective function, and the learning rate.
- The parameters of the neural network are generally initialized randomly and then learned from the samples, so the parameters at the current moment depend on the distribution of the samples and on the parameter updates at previous moments; the definition of the objective function is determined by the task, for example, the Softmax function can be used for classification tasks and the Smooth L1 function for position regression tasks.
- The learning rate determines the speed of parameter updates. Since it is not known whether the current update direction points toward the optimal solution, one hopes to update the parameters as quickly as possible in the direction of the optimal solution and as little as possible in other directions. However, setting the learning rate is rather difficult: if the learning rate is too small, the neural network is slow to converge, which greatly affects training efficiency; if the learning rate is too large, the parameters of the neural network linger within an oscillation interval, which affects the generalization ability of the neural network and is a problem that should be avoided as much as possible during training.
- Optimization methods for neural networks in the related art fall mainly into two categories: one is optimization methods with a manually set learning rate, represented by the stochastic gradient descent (SGD) algorithm; the other is optimization methods with an adaptively set learning rate, represented by Adaptive Moment Estimation (Adam).
- The SGD method is basically the same as the parameter update method described above. In actual use, however, out of consideration for training efficiency and hardware limitations, the data is generally trained in batches within one iteration; such a batch is called a mini-batch. Generally, the gradient is computed and the parameters are updated within one batch, so this iterative process is also called mini-batch gradient descent (MBGD). In both SGD and MBGD, the learning rate at a given time is set manually.
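- A minimal sketch of this mini-batch iteration follows (the synthetic data, linear model, batch size, and learning rate are illustrative assumptions, not values from the disclosure):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                                    # synthetic features (assumption)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=1000)

w = np.zeros(5)        # model parameters
lr = 0.05              # manually set learning rate, as in SGD/MBGD
batch_size = 32

for epoch in range(20):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]              # one mini-batch
        g = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # gradient of the mean squared error on the batch
        w -= lr * g                                    # gradient computed and parameters updated per batch (MBGD)
print(w)  # close to the true coefficients used to generate y
```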
- If the objective function is non-convex, SGD must also avoid getting trapped at local minima or at saddle points, because the gradients in all dimensions around a saddle point are close to 0, and SGD is easily trapped there. A saddle point is a stationary point of a smooth function at which the curve, surface, or hypersurface lies on different sides of the tangent at that point; for example, for the three-dimensional surface z = x² − y², which is shaped like a saddle, the saddle point is (0, 0).
- To suppress the oscillation of SGD, the momentum-based stochastic gradient descent (SGDM) method was proposed. SGDM adds inertia, that is, momentum, to the gradient descent process: when computing the update magnitude, not only the current gradient but also the gradient of the previous update is considered. SGDM introduces first-order momentum on the basis of SGD; the first-order momentum is a weighted value of the gradient directions at each time step.
- The first-order momentum of the parameters at the t-th time is computed as:
- m_t = β₁ · m_{t−1} + (1 − β₁) · g_t
- where m_t denotes the first-order momentum at time t (the descent direction), m_{t−1} denotes the first-order momentum at time t−1, g_t denotes the parameter gradient at time t (including both direction and amount of movement), and β₁ is the hyperparameter of the first-order momentum (generally set to an empirical value, such as 0.9).
- The first-order momentum is approximately equal to the average of the gradient vectors over the most recent 1/(1−β₁) time steps.
- m t is determined not only by the gradient direction of the current point, but also by the descending direction accumulated before.
- ⁇ 1 is generally set to 0.9, which means that the descending direction is mainly determined by the recorded historical descending direction, and is slightly biased to the current descending direction. In this way, the possibility of oscillation can be greatly reduced, and the convergence of the model can be accelerated to a certain extent.
- According to the momentum and the learning rate, the current parameter update can be calculated as Δw_t = −a_t · m_t.
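- A minimal sketch of one SGDM step under the two formulas above (β₁ = 0.9; the toy objective and variable names are illustrative assumptions):

```python
import numpy as np

def sgdm_step(w, m_prev, g_t, lr, beta1=0.9):
    """SGDM: m_t = beta1 * m_{t-1} + (1 - beta1) * g_t;  Delta_w_t = -lr * m_t."""
    m_t = beta1 * m_prev + (1.0 - beta1) * g_t   # first-order momentum
    return w - lr * m_t, m_t                     # w_{t+1} = w_t + Delta_w_t

w = np.array([1.0, -2.0])
m = np.zeros_like(w)
for _ in range(200):
    g = 2.0 * w                                  # gradient of the illustrative objective ||w||^2
    w, m = sgdm_step(w, m, g, lr=0.05)
print(w)  # approaches [0, 0] with reduced oscillation
```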
- the SGDM introduces first-order momentum, and some subsequent optimization methods introduce second-order momentum on the basis of it, such as Adam.
- The second-order momentum is the accumulated sum of the squared gradient values at each time step, which can be expressed as V_t = Σ g_τ² (summed over τ = 1, …, t). Taking the inertial definition of momentum into account, the calculation can be adjusted to:
- V_t = β₂ · V_{t−1} + (1 − β₂) · g_t²
- where β₂ is the hyperparameter of the second-order momentum (generally set to an empirical value, such as 0.999). Combined with the first-order momentum m_t = β₁ · m_{t−1} + (1 − β₁) · g_t, the parameter update that introduces second-order momentum can be expressed as:
- Δw_t = −a_t · m_t / (√V_t + ε)
- where ε is a very small value added to avoid the denominator being 0.
- At initialization, m_0 and V_0 are both 0; since β₁ and β₂ are both relatively large, the early values of m_t and V_t will be close to 0.
- To correct this error of the adaptive algorithm, m_t and V_t are often bias-corrected according to the following equations: m̂_t = m_t / (1 − β₁^t) and V̂_t = V_t / (1 − β₂^t), where m̂_t is the corrected m_t, V̂_t is the corrected V_t, β₁ is the hyperparameter used to control how much of the first-order momentum obtained at time t is determined by the previous time, and β₂ is the hyperparameter used to control how much of the second-order momentum obtained at time t is determined by the previous time.
- When β₁ and β₂ are close to 1, m_t and V_t approximate m_{t−1} and V_{t−1}, that is, they are determined almost entirely by the first-order momentum and second-order momentum of the previous moment; when β₁ and β₂ are 0, they have nothing to do with the previous first-order and second-order momentum, that is, they are determined entirely by the current g_t and g_t², respectively.
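- A minimal sketch of the Adam update with the bias correction above (β₁ = 0.9, β₂ = 0.999, ε = 1e-8; the toy objective and the learning rate are illustrative assumptions):

```python
import numpy as np

def adam_step(w, m, V, g, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step with bias-corrected first- and second-order momentum."""
    m = beta1 * m + (1.0 - beta1) * g         # first-order momentum
    V = beta2 * V + (1.0 - beta2) * g * g     # second-order momentum
    m_hat = m / (1.0 - beta1 ** t)            # corrected m_t
    V_hat = V / (1.0 - beta2 ** t)            # corrected V_t
    w = w - lr * m_hat / (np.sqrt(V_hat) + eps)
    return w, m, V

w = np.array([1.0, -2.0])
m = np.zeros_like(w)
V = np.zeros_like(w)
for t in range(1, 2001):
    g = 2.0 * w                               # gradient of the illustrative objective ||w||^2
    w, m, V = adam_step(w, m, V, g, t)
print(w)
```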
- However, Adam also has some problems. First, the second-order momentum V_t is accumulated over a fixed window period, and as time passes the training data of the neural network may change dramatically, which causes V_t to fluctuate between large and small values; in the later part of training the resulting turbulence of the learning rate leads to poor convergence, and the generalization ability is affected.
- Second, when a neural network trained with Adam approaches convergence, the learning rates are basically composed of either very small or very large values; such extreme learning rates have a potential adverse effect on the performance of the neural network.
- FIG. 3 is a flowchart of a neural network training method provided in the first embodiment of the present invention. This embodiment can be applied to the case of using two or more optimization methods to train the neural network.
- The method may be executed by a neural network training apparatus, which can be implemented by software and/or hardware and configured in a computer device, such as a server, a workstation, or a personal computer. The method includes the following steps:
- A neural network, also called an artificial neural network (ANN), is an algorithmic mathematical model that mimics the behavioral characteristics of animal neural networks and performs distributed, parallel information processing.
- By performance, neural networks can be divided into continuous networks and discrete networks, or deterministic networks and random networks; by topology, into feedforward networks and feedback networks; by learning method, into supervised learning networks and unsupervised learning networks; and by the nature of synaptic connections, into first-order linear association networks and higher-order nonlinear association networks.
- the neural network may include but is not limited to at least one of the following:
- 1. Deep Neural Networks (DNN): a neural network is an extension of the perceptron, and a DNN can be understood as a neural network with many hidden layers. A multi-layer neural network and a deep neural network (DNN) are essentially the same; a DNN is sometimes also called a multi-layer perceptron (MLP).
- DNN has the following limitations: parameter inflation, because the fully connected structure brings an enormous number of weight parameters, which easily leads to overfitting and to falling into local optima; local optima, because as the network deepens the optimization is more likely to fall into a local optimum that deviates from the true global optimum; vanishing gradients, because with the sigmoid activation function the gradient decays during back propagation and is essentially 0 by the bottom layers; and an inability to model changes over a time series, which matters for natural language processing, speech recognition, and handwriting recognition.
- 2. Convolutional Neural Networks (CNN): mainly addressing the parameter-inflation problem of DNN, in a CNN not all neurons in adjacent layers are directly connected; instead they are connected through convolution kernels, and the same kernel is shared across the image, so the convolution operation preserves the original spatial relationships. Because CNN limits the number of parameters and mines the local structure, CNN is suitable for image recognition.
- 3. Recurrent Neural Networks (RNN): to overcome the limitation that a CNN cannot model changes over a time series, and to adapt to the processing of sequential data, the RNN appeared. In an ordinary fully connected network or CNN, the signals of each layer of neurons only propagate to the next layer, and samples are processed independently at each moment (this is a feedforward neural network). In an RNN, the output of a neuron can act directly on the neuron itself at the next time step.
- The final result O(t+1) of the network at time (t+1) is the joint result of the input at that moment and all of the history, which achieves the purpose of modeling time series.
- However, an RNN can be seen as a neural network unrolled over time, whose depth is the length of the sequence, and the vanishing-gradient phenomenon appears along the time axis.
- In this embodiment, those skilled in the art can select a suitable neural network as the model to be trained according to actual needs, for example, object detection in the image field (such as faces, garbage, characters, license plates, traffic lights, etc.), disease identification in the medical field, or predictive analysis in the financial field (such as sales, financial allocation between products, or capacity utilization).
- For example, for recognizing handwritten digit characters, LeNet-5 (a CNN) can be selected; for face detection and alignment, a multi-task convolutional neural network (MTCNN) can be selected; and for natural language processing tasks such as machine translation, speech recognition, or sentiment analysis, a Long Short-Term Memory (LSTM) network (an RNN) can be selected.
- In this embodiment, in two adjacent stages, for the selected neural network, the first optimization method can be used to train the neural network at the first learning rate in the earlier stage; in the later stage, training can switch from the first optimization method and the first learning rate to the second optimization method and the second learning rate to continue training the neural network.
- Both the first optimization method and the second optimization method are optimization methods, also called optimization algorithms or optimization solving methods; the terms refer to the different optimization methods used at different stages of training the neural network. Likewise, the first learning rate and the second learning rate are both learning rates, and the terms refer to the different learning rates used at different stages of training the neural network.
- the first optimization method and the second optimization method have differences in two or more dimensions for training neural networks.
- the dimension includes the speed of training the neural network and the generalization ability of the neural network.
- the speed of training the neural network using the first optimization method is greater than the speed of training the neural network using the second optimization method, and the generalization ability of the neural network trained using the first optimization method is lower than the generalization ability of the neural network trained using the second optimization method .
- Generalization ability refers to the ability of a neural network to adapt to fresh samples. The purpose of learning is to learn the laws hidden behind the samples, so that for data outside the samples that obey the same laws, the trained network can still give appropriate outputs.
- the first optimization method includes an optimization method for adaptively setting the learning rate such as adaptive moment estimation Adam
- the second optimization method includes an optimization method for manually setting the learning rate such as stochastic gradient descent SGD.
- Using adaptive moment estimation (Adam) to train the neural network in the earlier stage ensures the speed of training the neural network and achieves fast descent and convergence, while training the neural network with stochastic gradient descent (SGD) in the later stage ensures the generalization ability of the neural network. This not only solves the problem of insufficient generalization ability when training the neural network with Adam, but also solves the problem of slow training when using SGD.
- The above first optimization method and second optimization method are just examples; when implementing this embodiment, other first and second optimization methods can be set according to the actual dimensions of interest. For example, the resources occupied by training the neural network with the first optimization method are fewer than the resources occupied by training it with the second optimization method, while the generalization ability of the neural network trained with the first optimization method is lower than that of the neural network trained with the second optimization method; or, the resources occupied by training with the first optimization method are fewer than those occupied by training with the second optimization method, while the speed of training the neural network with the first optimization method is greater than the speed of training it with the second optimization method.
- those skilled in the art can also adopt other first optimization methods and second optimization methods according to actual needs, and this embodiment does not limit this.
- Training the neural network with the first optimization method and training it with the second optimization method take place in the same vector space, so the first learning rate of the first optimization method can be mapped to the second learning rate of the second optimization method.
- Training the neural network using the first optimization method is iterative.
- The first learning rate of the first optimization method is updated every time the neural network is trained, and every time the first learning rate is updated, it is mapped to the second learning rate of the second optimization method.
- The update condition can be preset, for example, the value falls within a preset range, the value converges, or the number of updates exceeds a preset threshold; if the second learning rate meets the update condition, training can switch to the second optimization method.
- At that point, the second learning rate of the second optimization method is generally relatively small; therefore, after switching away from the first optimization method, the value of the second learning rate can be kept unchanged, and the neural network can be trained directly at that second learning rate.
- this embodiment can also update the value of the second learning rate each time the neural network is trained, which is not limited in this embodiment.
- In one example, training the neural network includes two stages. In the first stage, the first optimization method is used to train the neural network at the first learning rate, and the first learning rate of the first optimization method is mapped to the second learning rate of the second optimization method; when the second learning rate converges, training switches from the first stage to the second stage, in which the second optimization method is used to continue training the neural network at the second learning rate.
- In another example, training the neural network includes more than two stages. In one of two adjacent stages, the first optimization method is used to train the neural network at the first learning rate, the first learning rate of the first optimization method is mapped to the second learning rate of the second optimization method, and when the second learning rate converges, training switches from the earlier stage to the later stage, in which the second optimization method is used. Before the two adjacent stages, other optimization methods may be used to train the neural network at other learning rates, that is, training switches from those optimization methods and learning rates to the first optimization method and the first learning rate; after the two adjacent stages, other optimization methods and learning rates may likewise be used, that is, training switches from the second optimization method and the second learning rate to those other optimization methods and learning rates, which is not limited in this embodiment.
- In this embodiment, a neural network is determined; the neural network is trained at a first learning rate according to a first optimization method, and the first learning rate is updated every time the neural network is trained; in the same vector space, the first learning rate of the first optimization method is mapped to a second learning rate of a second optimization method; when the convergence of the second learning rate is determined, training of the neural network continues at the second learning rate according to the second optimization method. Because the learning rates are mapped within the same vector space, a suitable optimization method can be switched in at each stage of training; this brings the advantages of the suitable optimization method into play at different stages, reduces or avoids the problems caused by other optimization methods, and satisfies requirements in two or more aspects of training the neural network at the same time.
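- The two-stage idea can be illustrated with the following minimal sketch (not the patented implementation: the toy objective is an assumption, and for brevity the switch is triggered here after a fixed number of steps as a stand-in for the learning-rate convergence condition refined in the second embodiment below):

```python
import numpy as np

grad = lambda w: 2.0 * (w - np.array([3.0, -1.0]))   # gradient of a toy quadratic objective (assumption)

w = np.zeros(2)
m = np.zeros(2)
V = np.zeros(2)
a1, b1, b2, eps = 0.01, 0.9, 0.999, 1e-8             # first learning rate and Adam hyperparameters
switch_at, second_lr = 500, 0.05                      # stand-ins for the real update condition and mapped rate

for t in range(1, 2001):
    g = grad(w)
    if t <= switch_at:
        # Stage 1: first optimization method (adaptive learning rate, Adam-style).
        m = b1 * m + (1 - b1) * g
        V = b2 * V + (1 - b2) * g * g
        w = w - a1 * m / (np.sqrt(V) + eps)
    else:
        # Stage 2: second optimization method (manually set learning rate, SGD).
        w = w - second_lr * g
print(w)  # approaches [3, -1]
```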
- Fig. 4 is a flowchart of a neural network training method provided by the second embodiment of the present invention. On the basis of the foregoing embodiment, this embodiment refines operations such as mapping the first learning rate to the second learning rate and determining the convergence of the second learning rate. The method includes the following steps:
- the first learning rate is updated every time the neural network is trained.
- An update amplitude is determined; the update amplitude indicates the magnitude by which the first network parameters are updated in the case of training the neural network at the first learning rate according to the first optimization method.
- The first network parameters represent the parameters of the neural network in the case of training the neural network at the first learning rate according to the first optimization method.
- the first-order momentum and the second-order momentum can be determined.
- the product between the first learning rate and the first-order momentum of the first optimization method is calculated, and the product between the first learning rate and the first-order momentum of the first optimization method is used as the first target value.
- The arithmetic square root of the sum of the second-order momentum and a preset first value is calculated, and this arithmetic square root is used as the second target value; the ratio between the first target value and the second target value is then taken as the third target value, that is, the update amplitude.
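- The first, second, and third target values described above can be sketched as follows (the variable names and example numbers are illustrative assumptions):

```python
import numpy as np

def update_amplitude(first_lr, m_t, V_t, eps=1e-8):
    """Third target value: (first_lr * m_t) / sqrt(V_t + eps)."""
    first_target = first_lr * m_t        # product of the first learning rate and the first-order momentum
    second_target = np.sqrt(V_t + eps)   # arithmetic square root of (second-order momentum + first value)
    return first_target / second_target  # update amplitude of the first network parameters

m_t = np.array([0.3, -0.1])              # example first-order momentum
V_t = np.array([0.09, 0.04])             # example second-order momentum
print(update_amplitude(0.001, m_t, V_t))
```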
- the second network parameters represent the parameters of the neural network in the case of training the neural network at the second learning rate according to the second optimization method.
- the adaptive moment estimation Adam is used as the first optimization method and the stochastic gradient descent SGD is used as the second optimization method as an example for description.
- When the first optimization method is Adam, the parameter update in the optimization solution process of the neural network can be expressed as w_{t+1} = w_t − a_t · m_t / √(V_t + ε), where:
- w t+1 is the parameter of the neural network at time t+1 (that is, the first network parameter)
- w t is the parameter of the neural network at time t (that is, the first network parameter)
- m t is the first-order momentum at time t
- V t is the second-order momentum at time t
- ⁇ is the first value
- ⁇ is generally the value
- a small constant prevents the denominator from being zero.
- the parameter update can be expressed as:
- w t+1 is the parameter of the neural network at time t+1 (that is, the second network parameter)
- w t is the parameter of the neural network at time t (that is, the second network parameter)
- g t is the parameter gradient of the second network parameter at time t.
- Although the parameters are denoted by the same symbol, their values under the two optimization methods differ; therefore, the first network parameters and the second network parameters are distinguished in the representation.
- ⁇ 1 is The weight of ⁇ 2 is the weight of.
- the update amplitude can be transposed to obtain the target vector.
- a fourth target value and a fifth target value are determined, the fourth target value is the product of the target vector and the update amplitude, and the fifth target value is the product of the target vector and the parameter gradient.
- the ratio between the fourth target value and the fifth target value is calculated, and the ratio between the fourth target value and the fifth target value is used as the second learning rate of the second optimization mode.
- In other words, the second learning rate of the second optimization method can be expressed as the projection of the update amplitude onto the parameter gradient, that is, (Δw_tᵀ · Δw_t) / (Δw_tᵀ · g_t), where Δw_t denotes the update amplitude.
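- Under that reading, mapping the update amplitude to the second learning rate can be sketched as follows (the example vectors are illustrative assumptions):

```python
import numpy as np

def second_learning_rate(update_amp, grad):
    """Fourth target value / fifth target value:
    (update_amp^T . update_amp) / (update_amp^T . grad)."""
    fourth_target = update_amp @ update_amp  # transposed update amplitude times the update amplitude
    fifth_target = update_amp @ grad         # transposed update amplitude times the parameter gradient
    return fourth_target / fifth_target

update_amp = np.array([0.02, -0.01, 0.005])  # update amplitude from the first optimization method
grad = np.array([0.4, -0.2, 0.1])            # parameter gradient of the second network parameters
print(second_learning_rate(update_amp, grad))
```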
- the calculation of the second learning rate will inevitably result in jitter noise, and the second learning rate can be smoothed to reduce jitter noise.
- A first weight and a second weight may be determined, where the first weight and the second weight sum to one.
- The smoothed second learning rate for the current training of the neural network is the sum of a sixth target value and a seventh target value, where the sixth target value is the product of the first weight and the smoothed second learning rate from the previous training of the neural network, and the seventh target value is the product of the second weight and the second learning rate, before smoothing, of the current training of the neural network; that is, λ_t = β₃ · λ_{t−1} + (1 − β₃) · γ_t, where γ_t denotes the unsmoothed second learning rate.
- ⁇ 3 is the first weight
- (1- ⁇ 3 ) is the second weight
- ⁇ t is the second learning rate after smoothing at the t-th time (that is, the t-th training neural network)
- ⁇ t-1 is the t-th The second learning rate after smoothing at time 1 (that is, the t-1th training neural network).
- the first weight is a parameter.
- the first-order momentum and the second-order momentum can be determined.
- The eighth target value is the difference between a preset second value and the hyperparameter of the first-order momentum, and the ninth target value is the difference between a preset third value and the hyperparameter of the second-order momentum.
- The ratio between the arithmetic square root of the eighth target value and the arithmetic square root of the ninth target value is determined, and this ratio is used as the first weight.
- That is, the first weight can be expressed as β₃ = √(second value − β₁) / √(third value − β₂), where:
- ⁇ 1 is the hyperparameter of the first-order momentum
- ⁇ 2 is the hyper-parameter of the second-order momentum
- the update condition set for the second learning rate is numerical convergence.
- Since the second learning rate of the second optimization method is also updated every time the neural network is trained, in this embodiment a series of values of the second learning rate can be compared to determine whether the second learning rate has converged; if the second learning rate is stable, it can be determined that the second learning rate has converged.
- an error can be introduced into the second learning rate as the learning rate error.
- the second learning rate after the smoothing process is determined, and the target hyperparameter is determined, and the target hyperparameter is used to control the second learning rate of the training neural network this time.
- The learning rate error at time t is the ratio between the smoothed second learning rate λ_t and a tenth target value, where the tenth target value is the difference between a preset fourth value (for example, 1) and the target hyperparameter; the target hyperparameter is used to control how much of the second learning rate obtained at time t is determined by time t−1.
- S407 Determine the deviation of the second learning rate after the smoothing process from the learning rate error, and use the deviation as the learning rate deviation.
- The deviation between the second learning rate and the learning rate error is calculated and regarded as the learning rate deviation. If the learning rate deviation is less than the preset threshold, the value of the second learning rate can be considered to have converged and to satisfy the update condition, and training can switch to the second optimization method to continue training the neural network at the second learning rate. If the learning rate deviation is greater than or equal to the preset threshold, it is confirmed that the value of the second learning rate has not converged and the update condition is not met, and the first optimization method continues to be used for the next round of training at the first learning rate.
- the difference between the learning rate error and the second learning rate may be determined as the eleventh target value, the absolute value of the eleventh target value is determined, and the absolute value is used as the learning rate deviation.
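- A sketch of the update-condition check in S407 (a hedged reading: the target hyperparameter is taken here as β raised to the power t, so the learning rate error is a bias-corrected version of the smoothed rate, and the deviation compares the error with the smoothed rate; the threshold and β are illustrative values):

```python
def should_switch(lam_t, t, beta_target=0.999, threshold=1e-4):
    """Preset update condition for the smoothed second learning rate lam_t.

    Learning rate error:     err_t = lam_t / (1 - beta_target**t)   (fourth value taken as 1)
    Learning rate deviation: |err_t - lam_t|                        (absolute eleventh target value)
    The condition is met when the deviation is below the threshold.
    """
    err_t = lam_t / (1.0 - beta_target ** t)
    deviation = abs(err_t - lam_t)
    return deviation < threshold

# Early in training the deviation is large; much later it shrinks below the threshold.
print(should_switch(0.05, t=10))      # False -> keep training with the first optimization method
print(should_switch(0.05, t=20000))   # True  -> switch to the second optimization method
```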
- FIG. 5 is a flowchart of a face detection method provided by Embodiment 3 of the present invention. This embodiment may be applicable to a situation where a neural network trained by two or more optimization methods is used for face detection.
- The method may be executed by a face detection apparatus, which can be implemented by software and/or hardware and configured in computer equipment, for example, personal computers, mobile terminals (such as mobile phones and tablet computers), and wearable devices (such as smart watches and smart glasses).
- The method includes the following steps:
- The operating system of the computer device may include Android, iOS, Windows, and so on.
- applications that can perform image processing are supported, such as short video applications, live broadcast applications, image editing applications, camera applications, instant messaging tools, gallery applications, and so on.
- In one case, the user interface of the application can provide an import control; the user can operate the import control through a peripheral such as a touch screen or mouse and select locally stored image data (represented by thumbnails or paths) or image data stored on the network (represented by Uniform Resource Locators (URLs)), so that the application obtains the image data.
- In another case, the user interface can provide controls for taking photos and videos; the user can operate these controls through a peripheral such as a touch screen or mouse to notify the application to call the camera to collect image data.
- a neural network can be pre-configured, and the neural network can be used to detect the location of face data.
- For example, as shown in FIG. 6, a user starts a short video application and shoots a short video at a sports meeting; image data 601 from the short video is input to the neural network, and the neural network can output the area 602 where the athlete's face is located in the image data 601.
- After identifying the area where the face data is located, the application can perform other processing such as beautification, for example, detecting key points of the face in that area so as to use the key points for processing such as stretching and zooming, or adding decorations at the key points of the face.
- For training, image data annotated with the region where the face data is located is provided as samples, and the neural network is trained by the neural network training method provided in the first and second embodiments.
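- As an illustration of the detection step described above, a minimal inference sketch follows (purely hypothetical: the stub model and the (x, y, width, height) box format stand in for whatever trained network and post-processing an implementation actually uses):

```python
import numpy as np

def detect_faces(image, model):
    """Input image data into the preset neural network and return the face regions.

    `model` is assumed to be a callable mapping an HxWx3 array to a list of
    (x, y, width, height) boxes; it stands in for the network trained by the
    method of the first and second embodiments.
    """
    return model(image)

# Stub model returning a fixed box, only so the sketch runs end to end.
stub_model = lambda img: [(40, 30, 64, 64)]
frame = np.zeros((240, 320, 3), dtype=np.uint8)   # e.g. one frame of a short video (image data 601)
print(detect_faces(frame, stub_model))            # area where the face is located (602)
```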
- The neural network training method includes: determining a neural network; training the neural network at a first learning rate according to a first optimization method, where the first learning rate is updated every time the neural network is trained; mapping, in the same vector space, the first learning rate of the first optimization method to a second learning rate of a second optimization method; determining that the second learning rate meets a preset update condition; and continuing to train the neural network at the second learning rate according to the second optimization method.
- The mapping of the first learning rate of the first optimization method to the second learning rate of the second optimization method in the same vector space includes: determining an update amplitude, where the update amplitude indicates the magnitude of updating the first network parameters in the case of training the neural network at the first learning rate according to the first optimization method, and the first network parameters represent the parameters of the neural network in the case of training the neural network at the first learning rate according to the first optimization method; determining the parameter gradient of the second network parameters, where the second network parameters represent the parameters of the neural network in the case of training the neural network at the second learning rate according to the second optimization method; and determining, in the same vector space, the projection of the update amplitude on the parameter gradient, and using this projection as the second learning rate of the second optimization method.
- The determining of the update amplitude includes: determining the first-order momentum and the second-order momentum; and determining the ratio between a first target value and a second target value, and taking this ratio as a third target value, where the first target value is the product of the first learning rate of the first optimization method and the first-order momentum, and the second target value is the arithmetic square root of the sum of the second-order momentum and the preset first value.
- The determining of the projection of the update amplitude on the parameter gradient and using the projection as the second learning rate of the second optimization method includes: transposing the update amplitude to obtain a target vector; determining a fourth target value and a fifth target value, where the fourth target value is the product of the target vector and the update amplitude, and the fifth target value is the product of the target vector and the parameter gradient; and calculating the ratio between the fourth target value and the fifth target value, and using this ratio as the second learning rate of the second optimization method.
- the mapping the first learning rate of the first optimization mode to the second learning rate of the second optimization mode in the same vector space further includes: smoothing the second learning rate.
- The smoothing of the second learning rate includes: determining the first weight; determining the second weight; determining the smoothed second learning rate from the previous training of the neural network; and determining that the smoothed second learning rate for the current training is the sum of the sixth target value and the seventh target value, where the sixth target value is the product of the first weight and the smoothed second learning rate from the previous training of the neural network, and the seventh target value is the product of the second weight and the second learning rate, before smoothing, of the current training of the neural network.
- The determining of the first weight includes: determining the first-order momentum and the second-order momentum; determining an eighth target value and a ninth target value, where the eighth target value is the difference between a preset second value and the hyperparameter of the first-order momentum, and the ninth target value is the difference between a preset third value and the hyperparameter of the second-order momentum; and determining the ratio between the arithmetic square root of the eighth target value and the arithmetic square root of the ninth target value, and using this ratio as the first weight.
- the determining that the second learning rate satisfies a preset update condition includes: determining a learning rate error; determining the deviation of the second learning rate from the learning rate error after smoothing processing, and using the deviation as the learning rate deviation; In a case where the learning rate deviation is less than a preset threshold, it is determined that the second learning rate satisfies a preset update condition.
- The determining of the learning rate error includes: determining the smoothed second learning rate; determining a target hyperparameter, where the target hyperparameter is used to control the second learning rate for the current training of the neural network; and determining the ratio between the smoothed second learning rate and a tenth target value, and using this ratio as the learning rate error, where the tenth target value is the difference between a preset fourth value and the target hyperparameter.
- the determining the deviation of the second learning rate from the learning rate error and using the deviation as the learning rate deviation includes: determining the difference between the learning rate error and the second learning rate, and The difference between the learning rate error and the second learning rate is used as the eleventh target value; the absolute value of the eleventh target value is determined, and the absolute value is used as the learning rate deviation.
- the neural network includes a convolutional neural network CNN
- the first optimization method includes adaptive moment estimation Adam
- the second optimization method includes stochastic gradient descent SGD.
- the neural network can be trained offline on other computer equipment, and after the neural network training is completed, the neural network can be distributed to the current computer equipment.
- the neural network can be directly trained on the current computer equipment, which is not limited in this embodiment.
- In this embodiment, image data is received and input into a preset neural network for processing to identify the area where the face data is located in the image data. Because the learning rates are mapped within the same vector space, it is possible to switch to a suitable optimization method for training the neural network at different stages, which brings the advantages of the suitable optimization method into play at each stage, reduces or avoids the problems caused by other optimization methods, and satisfies requirements in two or more aspects of training the neural network; therefore, the performance of the neural network is improved and the effect of face detection is guaranteed.
- Using adaptive moment estimation (Adam) to train the neural network in the earlier stage ensures the speed of training the neural network and achieves fast descent and convergence, while training the neural network with stochastic gradient descent (SGD) in the later stage ensures the generalization ability of the network. The improvement in training speed can further increase the update speed of the neural network so that it adapts to different samples, which can improve the accuracy of the neural network for face detection; the improved generalization ability ensures the accuracy of face detection by the neural network under the same sample conditions.
- FIG. 7 is a schematic structural diagram of a neural network training device provided by the fourth embodiment of the present invention.
- The device may include the following modules: a neural network determination module 701, configured to determine a neural network; a first training module 702, configured to train the neural network at a first learning rate according to a first optimization method, where the first learning rate is updated each time the neural network is trained; a learning rate mapping module 703, configured to map, in the same vector space, the first learning rate of the first optimization method to a second learning rate of a second optimization method; a switching determination module 704, configured to determine that the second learning rate meets a preset update condition; and a second training module 705, configured to continue training the neural network at the second learning rate according to the second optimization method.
- the neural network training device provided by the embodiment of the present invention can execute the neural network training method provided by any embodiment of the present invention, and has corresponding functional modules and effects for the execution method.
- FIG. 8 is a schematic structural diagram of a face detection apparatus provided by Embodiment 5 of the present invention.
- The device may include the following modules: an image data receiving module 801, configured to receive image data; and a face area recognition module 802, configured to input the image data into a preset neural network for processing to identify the area where the face data is located in the image data, where the neural network is trained by the neural network training device provided in the fourth embodiment.
- the face detection device provided by the embodiment of the present invention can execute the face detection method provided by any embodiment of the present invention, and has the functional modules and effects corresponding to the execution method.
- FIG. 9 is a schematic structural diagram of a computer device according to Embodiment 6 of the present invention.
- The computer device includes a processor 900, a memory 901, a communication module 902, an input device 903, and an output device 904; the number of processors 900 in the computer device can be one or more, and one processor 900 is taken as an example in FIG. 9.
- The processor 900, the memory 901, the communication module 902, the input device 903, and the output device 904 in the computer device may be connected by a bus or in other ways; in FIG. 9, connection by a bus is taken as an example.
- the computer device provided in this embodiment can execute the neural network training method or the face detection method provided in any embodiment of the present invention, and has corresponding functions and effects.
- the seventh embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored.
- When the computer program is executed by a processor, a neural network training method is realized.
- The method includes: determining a neural network; training the neural network at a first learning rate according to a first optimization method, where the first learning rate is updated every time the neural network is trained; mapping, in the same vector space, the first learning rate of the first optimization method to a second learning rate of a second optimization method; determining that the second learning rate satisfies a preset update condition; and continuing to train the neural network at the second learning rate according to the second optimization method.
- Alternatively, when the computer program is executed by a processor, a face detection method is implemented, which includes: receiving image data; and inputting the image data into a preset neural network for processing to identify the area where the face data is located in the image data, where the neural network is trained by the following neural network training method: determining a neural network; training the neural network at a first learning rate according to a first optimization method, where the first learning rate is updated each time the neural network is trained; mapping, in the same vector space, the first learning rate of the first optimization method to a second learning rate of a second optimization method; determining that the second learning rate meets a preset update condition; and continuing to train the neural network at the second learning rate according to the second optimization method.
- The computer program is not limited to the method operations described above, and can also perform the related operations in the neural network training method or the face detection method provided in any embodiment of the present invention.
- the present disclosure can be implemented by software and necessary general-purpose hardware, or can be implemented by hardware.
- the present disclosure can be embodied in the form of a software product.
- The software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes multiple instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in multiple embodiments of the present disclosure.
- the multiple units and modules included are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized;
- the names of the multiple functional units are only used to distinguish them from each other and are not used to limit the protection scope of the present disclosure.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Human Computer Interaction (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Image Analysis (AREA)
Abstract
Disclosed herein are a neural network training and face detection method, apparatus, device and storage medium. The training method includes: determining a neural network; training the neural network at a first learning rate according to a first optimization method, the first learning rate being updated each time the neural network is trained; mapping, in the same vector space, the first learning rate of the first optimization method to a second learning rate of a second optimization method; determining that the second learning rate satisfies a preset update condition; and continuing to train the neural network at the second learning rate according to the second optimization method.
Description
This application claims priority to the Chinese patent application No. 201911205613.8 filed with the Chinese Patent Office on November 29, 2019, the entire contents of which are incorporated herein by reference.
The present disclosure relates to deep learning technology, for example, to a neural network training and face detection method, apparatus, device and storage medium.
Deep learning methods based on neural networks are widely applied in many fields such as computer vision, natural language processing, and text understanding, and these fields basically cover the many technologies required by Internet applications, such as image and video processing, speech processing, and text processing.
Deep learning uses a neural network as a feature-extraction tool for data: the parameters in the neural network are trained on a large number of samples to fit the sample annotations, such as class labels, so that the network has predictive ability in scenarios similar to the sample distribution.
In general, the user sets a learning target, such as labels for classification or the position and size of annotation boxes for object detection. During training, an objective function is defined to compute the error between the predicted values of the current neural network and the annotated ground-truth values; this error is also called the loss value. An optimization method is then used to update the parameters of the neural network.
The goal of the optimization method is to update the parameters of the neural network so as to reduce the loss value, that is, to minimize the loss value of the objective function as much as possible.
Therefore, training the neural network moves the parameters in the direction of decreasing gradient, updating them according to a specific learning rate (also called the step size).
Optimization methods for neural networks in the related art mostly use the learning rate and the output of the objective function to update parameters. The learning rate of the optimization method determines the magnitude of parameter updates and has a large impact on training the neural network. Neural networks in the related art are usually trained with a single optimization method, which satisfies one requirement while easily neglecting others.
For example, the learning rate affects both the speed of training the neural network and its generalization ability: if the learning rate is too small, training is slow, the training period becomes too long, and the training efficiency of the neural network suffers; if the learning rate is too large, the optimal parameters are likely to be skipped over and the generalization ability of the neural network is poor.
SUMMARY
The present disclosure provides a neural network training and face detection method, apparatus, device and storage medium, so as to balance the training period and the generalization ability of a neural network.
A neural network training method is provided, including:
determining a neural network;
training the neural network at a first learning rate according to a first optimization method, the first learning rate being updated each time the neural network is trained;
mapping, in the same vector space, the first learning rate of the first optimization method to a second learning rate of a second optimization method;
determining that the second learning rate satisfies a preset update condition; and
continuing to train the neural network at the second learning rate according to the second optimization method.
A face detection method is also provided, including:
receiving image data; and
inputting the image data into a preset neural network for processing to identify the area where face data is located in the image data, wherein the neural network is trained by the neural network training method described in the first aspect.
A neural network training apparatus is also provided, including:
a neural network determination module, configured to determine a neural network;
a first training module, configured to train the neural network at a first learning rate according to a first optimization method, the first learning rate being updated each time the neural network is trained;
a learning rate mapping module, configured to map, in the same vector space, the first learning rate of the first optimization method to a second learning rate of a second optimization method;
a switching determination module, configured to determine that the second learning rate satisfies a preset update condition; and
a second training module, configured to continue training the neural network at the second learning rate according to the second optimization method.
A face detection apparatus is also provided, including:
an image data receiving module, configured to receive image data; and
a face area recognition module, configured to input the image data into a preset neural network for processing to identify the area where face data is located in the image data, wherein the neural network is trained by the neural network training apparatus described in the present disclosure.
A computer device is also provided, the computer device including:
one or more processors; and
a memory, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the neural network training method or the face detection method described in the present disclosure.
A computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed by a processor, the neural network training method or the face detection method described in the present disclosure is realized.
Fig. 1 is a schematic diagram of neural network training provided by an embodiment of the present invention;
Fig. 2 is an example diagram of a saddle point provided by an embodiment of the present invention;
Fig. 3 is a flowchart of a neural network training method provided by Embodiment 1 of the present invention;
Fig. 4 is a flowchart of a neural network training method provided by Embodiment 2 of the present invention;
Fig. 5 is a flowchart of a face detection method provided by Embodiment 3 of the present invention;
Fig. 6 is an example diagram of face detection provided by Embodiment 3 of the present invention;
Fig. 7 is a schematic structural diagram of a neural network training apparatus provided by Embodiment 4 of the present invention;
Fig. 8 is a schematic structural diagram of a face detection apparatus provided by Embodiment 5 of the present invention;
Fig. 9 is a schematic structural diagram of a computer device provided by Embodiment 6 of the present invention.
The present disclosure is described below with reference to the accompanying drawings and embodiments. For ease of description, only the parts related to the present disclosure, rather than the complete structure, are shown in the drawings.
A neural network in deep learning is usually composed of layers with different functions. Taking the Convolutional Neural Network (CNN) used in computer vision as an example, a CNN usually contains a large number of convolutional layers, activation layers, pooling layers, and so on.
Each layer computes the input data through the function expressed by the parameters stored in that layer to obtain output data, and the output data serves as the input data of the next layer.
Therefore, as shown in Fig. 1, a neural network can be regarded as a kind of function mapping, and the training process of the neural network is a process of solving a function optimization problem. The goal of the optimization is to continuously update the parameters contained in the neural network so that, with the labeled samples as input data, the loss between the predicted values output by the neural network and the labels is minimized.
The process of neural network training is the process of parameter updating: compute the gradient of the objective function at the current parameters, then compute the update magnitude of the parameters according to the loss value and the learning rate, and update the parameters in the direction opposite to the gradient.
Assuming that the parameters of the neural network are denoted w and the objective function is f, the parameter gradient of the objective function at the t-th time can be expressed as g_t = ∇f(w_t). Therefore, when the learning rate is a, the update magnitude of the parameters at the t-th time can be expressed as Δw_t = −a_t · g_t, and the update at time t+1 can be expressed as w_{t+1} = w_t + Δw_t.
从上述参数的更新方法中可以看出,神经网络的求解过程主要依赖于当前时刻的参数、目标函数的定义、学习率三个方面。
由于神经网络的参数一般是随机初始化,然后根据样本进行学习,因此,当前时刻的参数,取决于样本的分布,以及先前时刻参数的更新情况;目标函数定义则根据不同的任务来决定,比如对于分类任务,可使用Softmax函数,对于位置回归任务,可使用Smooth L1函数等等。
学习率决定了参数更新的速度,由于并不知道当前时刻参数更新的方向是否是朝着最优解的方向,因此,希望在朝着最优解的方向尽可能快的更新参数,在其他方向尽可能不作更新。但是,学习率的设定又是较为困难的,如果学习率过小,神经网络迟迟无法得到收敛,大大影响了训练的效率;如果学习率过大,则会导致神经网络的参数徘徊在一个震荡区间内,影响神经网络的泛化能力,这也是训练过程中应该尽量避免的问题。
另一方面,随着神经网络的参数更新,学习率还需要进行适当的变化。因此,合理的设置学习策略和学习速率对于神经网络的效率和泛化能力是十分重要的。
根据学习率的设定情况,相关技术中神经网络的优化方式主要分为两大类:一类是以随机梯度下降(stochastic gradient descent,SGD)算法为代表的手动设定学习率的优化方式;另一类是以自适应矩估计(Adaptive Moment Estimation, Adam)为代表的自适应设定学习率的优化方式。
SGD方法和上述参数的更新方法基本一致。不过,在实际的使用中,出于训练效率和硬件限制的考虑,一般选择在一次迭代过程中,对数据进行分批量训练,该批量称为mini-batch。一般情况下,选择在一个batch内计算梯度和更新参数,因此,该迭代过程也称小批量梯度下降法(mini-batch gradient descent,MBGD)。无论是SGD还是MBGD,均需手动设定每个时刻的学习速率。
SGD作为代表性的优化方式,存在明显的不足:
1、手动设定的学习率如果太小,收敛速度会很慢,如果太大,目标函数就会在极小值处不停地震荡甚至偏离。
2、对所有参数更新时应用同样的学习率,灵活性较差;如果数据是稀疏的,则更希望对出现频率低的特征进行幅度更大的更新。
3、如果目标函数是非凸函数,还要避免陷于局部极小值处,或者鞍点处,因为鞍点周围的所有维度的梯度都接近于0,SGD作为代表性的优化方式很容易被困在鞍点处。
所谓鞍点,是指光滑函数上的一个驻点:在该点邻域内,曲线、曲面或超曲面位于该点切线(切平面)的不同侧,因此该点既不是局部极大值点,也不是局部极小值点。
如图2所示,对于一个三维模型 z = x^2 - y^2,其形状类似于马鞍,在横轴方向往上曲,在竖轴方向往下曲,鞍点是(0,0)。
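下面用一个简单的数值示例验证该点处梯度为零、但并非极小值(仅为示意):

```python
def z(x, y):
    """z = x^2 - y^2,在 (0, 0) 处梯度 (2x, -2y) = (0, 0),但该点是鞍点而非极小值。"""
    return x ** 2 - y ** 2

print(z(0.0, 0.0))                 # 0.0
print(z(0.1, 0.0), z(-0.1, 0.0))   # 0.01 0.01:沿横轴方向,两侧的取值更大
print(z(0.0, 0.1), z(0.0, -0.1))   # -0.01 -0.01:沿竖轴方向,两侧的取值更小
```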
为了抑制SGD的震荡,出现了基于动量的随机梯度下降(SGD with Momentum,SGDM)方法。SGDM认为梯度下降过程可以加入惯性,也就是动量,就是在计算更新幅度时,不仅仅考虑当前时刻的情况,还需要考虑上一次更新时的梯度情况。SGDM在SGD的基础上引入一阶动量,一阶动量是每个时刻梯度移动方向的加权值,第t个时刻参数的一阶动量 m_t 的计算方法为:

m_t = β_1 · m_{t-1} + (1 - β_1) · g_t

m_t 表示第t时刻的一阶动量(下降方向),m_{t-1} 表示第t-1时刻的一阶动量,即利用此前的梯度求解当前参数更新方向,g_t 表示第t时刻的参数梯度(包含方向与移动的量),β_1 为一阶动量的超参数(一般设置为经验值,如0.9)。
从上述公式可以看出,一阶动量约等于最近 1/(1-β_1) 个时刻梯度向量和的平均值。另外,m_t 不仅由当前点的梯度方向决定,还由此前累积的下降方向决定。β_1 一般设置为0.9,这就意味着下降方向主要是由记录的历史下降方向决定,并略微偏向当前时刻的下降方向。如此一来,能够大大减少震荡的可能性,在一定程度上加速模型的收敛。根据动量和学习率可以计算当前参数的更新幅度:

Δw_t = -a_t · m_t
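上述带一阶动量的更新过程可用如下简化代码示意(超参数与示例目标函数均为示例性假设):

```python
import numpy as np

def sgdm_step(w, g, m_prev, lr=0.1, beta1=0.9):
    """SGDM 的一次更新(示意):m_t = β1·m_{t-1} + (1-β1)·g_t,Δw_t = -a_t·m_t。"""
    m = beta1 * m_prev + (1.0 - beta1) * g  # 一阶动量:历史下降方向与当前梯度的加权
    return w - lr * m, m                    # 按动量方向更新参数,并返回新的一阶动量

# 示例:目标函数 f(w) = ||w||^2 / 2,其梯度为 w
w, m = np.array([1.0, -2.0]), np.zeros(2)
for t in range(200):
    w, m = sgdm_step(w, g=w, m_prev=m)
print(w)  # 参数逐步趋近最优解 [0, 0]
```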
SGDM引入了一阶动量,后续的一些优化方式在其基础上又引入了二阶动量,如Adam。二阶动量是每个时刻梯度值的平方和,记为 V_t,计算方法为:

V_t = Σ_{τ=1..t} (g_τ)^2

考虑到动量的惯性定义,则计算方式可以调整为:

V_t = β_2 · V_{t-1} + (1 - β_2) · (g_t)^2

β_2 是二阶动量的超参数(一般设置为经验值,如0.999),结合一阶动量 m_t:

m_t = β_1 · m_{t-1} + (1 - β_1) · g_t

引入二阶动量的参数更新可以表示为:

Δw_t = -a_t · m_t / sqrt(V_t + ε)

ε是为了避免分母为0加上的一个极小值。初始化时,m_0 和 V_0 都是0,由于 β_1 和 β_2 都比较大,因此,初期的 m_t 和 V_t 都会接近于0。为了修正这个自适应算法的误差,常常根据下面的式子对 m_t 和 V_t 进行误差修正:

m̂_t = m_t / (1 - β_1^t),V̂_t = V_t / (1 - β_2^t)

当 β_1 与 β_2 接近1时,m_t 与 V_t 近似 m_{t-1} 与 V_{t-1},即完全由上一时刻的一阶动量、二阶动量决定;当 β_1 与 β_2 为0时,则跟上一时刻的一阶动量、二阶动量没有任何关系,即完全分别由当前时刻的 g_t、(g_t)^2 决定。
由二阶动量的参数更新公式可以看出,此时学习率实质上变为 a_t / sqrt(V_t + ε),而且参数更新的越频繁(V_t 越大),学习率就越小。于是,设置初始的学习率 a_0 即可(也可以认为 a_0 = a_1 = … = a_t)。Adam不需要手动变化学习率,设定一个初始学习率即可,并且每个参数都可以计算一个自适应的学习率,因此,对于稀疏的特征参数表现较好。但是,Adam同时也存在一些问题:
1、二阶动量 V_t 是在一个固定窗口期内累计的,而随着时间的变化,神经网络的训练数据可能会发生巨大的变化,因此导致 V_t 时大时小,在训练的后期影响学习率的震荡,导致收敛效果较差,泛化能力受到影响。
2、当用Adam训练的神经网络接近收敛的时候,学习率基本由很小或者很大的学习率组成,这种极端的学习率对神经网络的性能存在潜在的不良影响。
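为便于理解,上述引入一阶动量、二阶动量与误差修正的Adam更新过程可用如下简化代码示意(超参数取常用经验值,示例目标函数为假设,并非对本申请实施方式的限定):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam 的一次更新(示意),t 从 1 开始计数。"""
    m = beta1 * m + (1 - beta1) * g        # 一阶动量
    v = beta2 * v + (1 - beta2) * (g * g)  # 二阶动量
    m_hat = m / (1 - beta1 ** t)           # 初期误差修正
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / np.sqrt(v_hat + eps)  # 分母为 sqrt(V̂_t + ε),ε 即文中的第一数值
    return w, m, v

# 示例:目标函数 f(w) = ||w||^2 / 2,其梯度为 w
w, m, v = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
for t in range(1, 501):
    w, m, v = adam_step(w, g=w, m=m, v=v, t=t)
print(w)  # 参数逐步向最优解 [0, 0] 的方向移动
```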
实施例一
图3为本发明实施例一提供的一种神经网络的训练方法的流程图,本实施例可适用于使用两个或两个以上优化方式训练神经网络的情况,该方法可以由神经网络的训练装置来执行,该神经网络的训练装置可以由软件和/或硬件实现,可配置在计算机设备中,例如,服务器、工作站、个人电脑,等等,该方法包括如下步骤:
S301、确定神经网络。
神经网络,又称人工神经网络(Artificial Neural Networks,ANN),是一种模仿动物神经网络行为特征,进行分布式并行信息处理的算法数学模型。
按性能,神经网络可划分为连续型网络和离散型网络,或确定型网络和随机型网络。
按拓扑结构,神经网络可划分为前向网络和反馈网络。
按学习方法,神经网络可划分为有监督的学习网络和无监督的学习网络。
按连接突触性质,神经网络可划分为一阶线性关联网络和高阶非线性关联网络。
在本实施例中,该神经网络可以包括但不限于如下至少一种:
1、深度神经网络(Deep Neural Networks,DNN)
神经网络是基于感知机的扩展,而DNN可以理解为有很多隐藏层的神经网络。多层神经网络和深度神经网络DNN实质相同,DNN有时也叫做多层感知机(Multi-Layer perceptron,MLP)。
DNN存在如下局限:
1.1、参数数量膨胀。由于DNN采用的是全连接的形式,结构中的连接带来了数量级的权值参数,这不仅容易导致过拟合,也容易造成陷入局部最优。
1.2、局部最优。随着神经网络的加深,优化函数更容易陷入局部最优,且偏离真正的全局最优,对于有限的训练数据,性能甚至不如浅层网络。
1.3、梯度消失。使用sigmoid激活函数(传递函数),在反向传播(Back Propagation,BP)梯度时,梯度会衰减,随着神经网络层数的增加,衰减累积下,到底层时梯度基本为0。
1.4、无法对时间序列上的变化进行建模。然而,样本的时间顺序对于自然语言处理、语音识别、手写体识别等应用非常重要。
2、CNN
主要针对DNN存在的参数数量膨胀问题,对于CNN,并不是所有的上下层神经元都能直接相连,而是通过卷积核作为中介。同一个卷积核在多个图像内是共享的,图像通过卷积操作仍能保留原先的位置关系。
因为CNN限制参数个数并挖掘局部结构的这个特点,使得CNN适合图像识别。
3、循环神经网络(Recurrent Neural Network,RNN)
针对CNN中无法对时间序列上的变化进行建模的局限,为了适应对时序数据的处理,出现了RNN。
在普通的全连接网络或者CNN中,每层神经元的信号只能向上一层传播,样本的处理在每个时刻独立(这种就是前馈神经网络)。而在RNN中,神经元的输出可以在下一个时间戳直接作用到自身。
(t+1)时刻网络的最终结果O(t+1)是该时刻输入和所有历史共同作用的结果,这就达到了对时间序列建模的目的。
但是,RNN可以看成一个在时间上传递的神经网络,它的深度是时间的长度,而梯度消失的现象出现在时间轴上。
在本实施例中,本领域技术人员可以根据实际的需求情况,例如,在图像领域中的目标(如人脸、垃圾、字符、车牌、红绿灯等)检测,在医疗领域中的疾病识别,在金融领域中的预测分析(如销售、产品之间的财务分配,产能利用率),等等,选择适合的神经网络作为模型,等待训练。
例如,若需求识别手写体数字字符,则可以选择CNN中的LeNet-5。
又例如,若需求人脸检测和对齐,则可以选择CNN中的多任务卷积神经网络(Multi-task convolutional neural networks,MTCNN)。
又例如,若需求自然语言处理,如机器翻译、语音识别、情感分析等,则可以选择RNN中的长短期记忆网络(Long Short-Term Memory,LSTM)。
除了相关技术中的网络结构之外,本领域技术人员还可以根据实际情况对神经网络的网络结构进行调整,本实施例对此不加以限制。
S302、根据第一优化方式、以第一学习率训练所述神经网络。
在本实施例中,在相邻的两个阶段中,对于选定的神经网络,在前一个阶段,可使用第一优化方式、以第一学习率训练该神经网络,在后一个阶段,可从第一优化方式、第一学习率切换至使用第二优化方式、以第二学习率继续训练该神经网络。
第一优化方式、第二优化方式均属于优化方式,又称优化算法、优化求解方法等等,是针对训练神经网络的不同阶段的不同优化方式而言的。
第一学习率、第二学习率均属于学习率,是针对训练神经网络的不同阶段的不同学习率而言的。
第一优化方式与第二优化方式对于训练神经网络、在两个或两个以上维度上存在差异。
在一个示例中,该维度包括训练神经网络的速度、神经网络的泛化能力。
使用第一优化方式训练神经网络的速度大于使用第二优化方式训练神经网络的速度,使用第一优化方式训练的神经网络的泛化能力低于使用第二优化方式训练的神经网络的泛化能力。
泛化能力(generalization ability),是指神经网络对新鲜样本的适应能力,学习的目的是学到隐含在样本背后的规律,对具有同一规律的样本以外的数据,经过训练的网络也能给出合适的输出。
在本示例中,第一优化方式包括自适应矩估计Adam等自适应设定学习率的优化方式,第二优化方式包括随机梯度下降SGD等手动设定学习率的优化方式。
因此,在前一阶段使用自适应矩估计Adam训练神经网络,可保证训练神经网络的速度,实现快速下降收敛;在后一阶段使用随机梯度下降SGD继续训练神经网络,可保证神经网络的泛化能力,既解决了自适应矩估计Adam训练神经网络泛化能力不足的问题,也解决了随机梯度下降SGD训练神经网络速度缓慢的问题。
上述第一优化方式、第二优化方式只是作为示例,在实施本实施例时,可以根据实际维度的情况设置其他第一优化方式、第二优化方式,例如,使用第一优化方式训练神经网络占用的资源小于使用第二优化方式训练神经网络占用的资源,使用第一优化方式训练的神经网络的泛化能力低于使用第二优化方式训练的神经网络的泛化能力,或者,使用第一优化方式训练神经网络占用的资源小于使用第二优化方式训练神经网络占用的资源,使用第一优化方式训练神经网络的速度大于使用第二优化方式训练神经网络的速度,等等,本实施例对此不加以限制。另外,除了上述第一优化方式、第二优化方式外,本领域技术人员还可以根据实际需要采用其它第一优化方式、第二优化方式,本实施例对此也不加以限制。
S303、在同一向量空间中,将所述第一优化方式的第一学习率映射为第二优化方式的第二学习率。
在本实施例中,使用第一优化方式训练神经网络与使用第二优化方式训练神经网络处于同一向量空间中,使得可以将第一优化方式的第一学习率映射为第二优化方式的第二学习率。
使用第一优化方式训练神经网络是迭代的,第一优化方式的第一学习率在每次训练神经网络时更新数值,在每次更新第一优化方式的第一学习率时,均将第一优化方式的第一学习率映射为第二优化方式的第二学习率。
S304、确定所述第二学习率满足预设的更新条件。
在本实施例中,可以预先设置更新条件,如在预设的数值范围内、数值收敛、更新的次数超过预设的阈值,等等,如果该第二学习率满足该更新条件,则可以切换第二优化方式。
S305、根据第二优化方式、以所述第二学习率继续训练所述神经网络。
如果第二学习率收敛,此时,可从第一优化方式、第一学习率切换至使用第二优化方式、以第二学习率继续训练神经网络。
在收敛时,第二优化方式的第二学习率一般会比较小。因此,在切换至第二优化方式后,可保持该第二学习率的数值不变,直接以该第二学习率继续训练神经网络。
在使用第二优化方式训练神经网络时,由于训练神经网络是迭代的,本实施例也可以在每次训练该神经网络时更新第二学习率的数值,本实施例对此不加以限制。
在一种情况中,训练神经网络包括两个阶段,在第一个阶段中,使用第一优化方式、以第一学习率训练神经网络,与此同时,将第一优化方式的第一学习率映射为第二优化方式的第二学习率,在该第二学习率收敛时,从第一阶段切换至第二阶段,在第二阶段中,使用第二优化方式、以第二学习率继续训练神经网络,直至该神经网络训练完成。
在另一些情况中,训练神经网络包括两个以上的阶段,在其中的两个阶段中,在前一阶段中,使用第一优化方式、以第一学习率训练神经网络,与此同时,将第一优化方式的第一学习率映射为第二优化方式的第二学习率,在该第二学习率收敛时,从前一阶段切换至后一阶段,在后一阶段中,使用第二优化方式、以第二学习率继续训练神经网络,在该两个阶段之前,可以使用其他优化方式、以其他学习率训练神经网络,即从其他优化方式、其他学习率切换至使用第一优化方式、以第一学习率继续训练神经网络,在该两个阶段之后,也可以使用其他优化方式、以其他学习率训练神经网络,即从第二优化方式、第二学习率切换至使用其他优化方式、以其他学习率训练神经网络,等等,本实施例对此不加以限制。
在本实施例中,根据第一优化方式、以第一学习率训练神经网络,第一学习率在每次训练神经网络时更新,在同一向量空间中,将第一优化方式的第一学习率映射为第二优化方式的第二学习率,确定第二学习率收敛,根据第二优化方式、以第二学习率继续训练神经网络,通过学习率在同一向量空间中的映射,使得在不同的阶段可切换适合的优化方式训练神经网络,可以在不同阶段发挥适合的优化方式的优势,降低或避免其他优化方式产生的问题,同时满足两个或两个以上方面对于训练神经网络的需求。
实施例二
图4为本发明实施例二提供的一种神经网络的训练方法的流程图,本实施例以前述实施例为基础,细化第一学习率与第二学习率之间的映射、第二学习率的收敛等操作,该方法包括如下步骤:
S401、确定神经网络。
S402、根据第一优化方式、以第一学习率训练所述神经网络。
第一学习率在每次训练神经网络时更新。
S403、确定更新幅度。
更新幅度表示在根据第一优化方式、以第一学习率训练神经网络的情况下,对第一网络参数进行更新的幅度,而第一网络参数表示在根据该第一优化方式、以该第一学习率训练神经网络的情况下神经网络的参数。
可确定一阶动量、二阶动量。
一方面,计算第一优化方式的第一学习率与一阶动量之间的乘积,并将第一优化方式的第一学习率与一阶动量之间的乘积作为第一目标值。
另一方面,计算二阶动量与预设的第一数值之和的算术平方根,并将二阶动量与预设的第一数值之和的算术平方根作为第二目标值。
确定第一目标值与第二目标值之间的比值,并将第一目标值与第二目标值之间的比值作为第三目标值,从而确定第三目标值的相反数,并将该相反数作为更新幅度。
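上述各目标值与更新幅度之间的关系可用如下简化代码示意(其中 ε 与各输入数值均为示例性假设):

```python
import numpy as np

def update_amplitude(lr_adam, m_t, v_t, eps=1e-8):
    """按 S403 的描述计算更新幅度(示意):-(第一学习率·一阶动量) / sqrt(二阶动量 + 第一数值)。"""
    first_target = lr_adam * m_t                 # 第一目标值:第一学习率与一阶动量的乘积
    second_target = np.sqrt(v_t + eps)           # 第二目标值:二阶动量与第一数值之和的算术平方根
    third_target = first_target / second_target  # 第三目标值:第一目标值与第二目标值的比值
    return -third_target                         # 更新幅度为第三目标值的相反数

# 用法示意(数值均为假设)
print(update_amplitude(lr_adam=1e-3, m_t=np.array([0.2, -0.1]), v_t=np.array([0.04, 0.01])))
```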
S404、确定第二网络参数的参数梯度。
第二网络参数表示在根据第二优化方式、以第二学习率训练神经网络的情况下所述神经网络的参数。
S405、在同一向量空间中,确定所述更新幅度在所述参数梯度上的投影,并将该投影作为所述第二优化方式的第二学习率。
在本实施例中,以自适应矩估计Adam作为第一优化方式、随机梯度下降SGD作为第二优化方式为一种示例进行说明。
第一优化方式(如Adam)在神经网络的优化求解过程中,参数更新可以表示为:

w_{t+1} = w_t + Δw_t^Adam,其中 Δw_t^Adam = -a_t^Adam · m_t / sqrt(V_t + ε)

w_{t+1} 为神经网络在第t+1时刻的参数(即第一网络参数),w_t 为神经网络在第t时刻的参数(即第一网络参数),Δw_t^Adam 为神经网络在第t时刻使用第一优化方式(如Adam)训练时的更新幅度,a_t^Adam 为第t时刻第一优化方式(如Adam)的第一学习率,m_t 为第t时刻的一阶动量,V_t 为第t时刻的二阶动量,ε为第一数值,ε一般为值很小的常数,防止分母为0。
第二优化方式(如SGD)在神经网络的优化求解过程中,参数更新可以表示为:

w_{t+1} = w_t + Δw_t^SGD,其中 Δw_t^SGD = -a_t^SGD · g_t

w_{t+1} 为神经网络在第t+1时刻的参数(即第二网络参数),w_t 为神经网络在第t时刻的参数(即第二网络参数),Δw_t^SGD 为神经网络在第t时刻使用第二优化方式(如SGD)训练时的更新幅度,a_t^SGD 为第t时刻第二优化方式(如SGD)的第二学习率,g_t 为第t时刻第二网络参数的参数梯度。
对于神经网络中的同一参数 w_t,使用第一优化方式(如Adam)训练时与使用第二优化方式(如SGD)训练时,其数值有所不同,因此,区分第一网络参数、第二网络参数表示。
基于正交投影,可对更新幅度进行转置,获得目标向量。
确定第四目标值、第五目标值,第四目标值为目标向量与更新幅度之间的乘积,第五目标值为目标向量与所述参数梯度之间的乘积。
计算第四目标值与第五目标值之间的比值,并将第四目标值与第五目标值之间的比值作为第二优化方式的第二学习率。
因此,该第二优化方式的第二学习率可表示为:

a_t^SGD = ((Δw_t^Adam)^T · Δw_t^Adam) / ((Δw_t^Adam)^T · g_t)

其中,(Δw_t^Adam)^T 为对更新幅度进行转置得到的目标向量。
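下面给出该投影计算的一个简化示意(以一维NumPy向量为例;由于更新幅度通常与梯度方向大致相反,示例按上述文字描述直接取比值,实际实现中的符号约定可能有所不同,属于示例性假设):

```python
import numpy as np

def project_to_sgd_lr(delta_w_adam, grad):
    """将更新幅度在参数梯度上做正交投影,得到第二优化方式的第二学习率(示意)。"""
    target_vec = delta_w_adam.T                # 目标向量:更新幅度的转置(一维向量转置后数值不变)
    fourth_target = target_vec @ delta_w_adam  # 第四目标值:目标向量与更新幅度的乘积
    fifth_target = target_vec @ grad           # 第五目标值:目标向量与参数梯度的乘积
    return fourth_target / fifth_target        # 第二学习率为第四目标值与第五目标值的比值
```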
在本实施例中,因为每次训练过程中样本的分布不一定是相同的,因此,第二学习率的计算难免会有抖动噪声,可以对第二学习率进行平滑处理,从而减少抖动噪声。
在一实施例中,可确定第一权重,以及,确定第二权重,第一权重与第二权重相加为1。
确定上一次训练神经网络时、平滑处理之后的第二学习率。
确定本次训练神经网络时、平滑处理之后的第二学习率为第六目标值与第七目标值之和,第六目标值为第一权重与上一次训练神经网络时、平滑处理之后的第二学习率之间的乘积,第七目标值为第二权重与本次训练神经网络时、平滑处理之前的第二学习率之间的乘积。
因此,该第二学习率的平滑处理可表示为:

λ_t = β_3 · λ_{t-1} + (1 - β_3) · a_t^SGD

β_3 为第一权重,(1-β_3) 为第二权重,λ_t 为第t时刻(即第t次训练神经网络)平滑处理之后的第二学习率,λ_{t-1} 为第t-1时刻(即第t-1次训练神经网络)平滑处理之后的第二学习率,a_t^SGD 为本次训练时、平滑处理之前的第二学习率。
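该平滑处理可用如下简化代码示意(数值均为示例性假设):

```python
def smooth_lr(lam_prev, lr_raw, beta3):
    """对第二学习率做平滑处理(示意):λ_t = β3·λ_{t-1} + (1-β3)·a_t。"""
    return beta3 * lam_prev + (1.0 - beta3) * lr_raw

# 用法示意:上一次平滑后的第二学习率为0.05,本次平滑前的第二学习率为0.08
print(smooth_lr(lam_prev=0.05, lr_raw=0.08, beta3=0.9))  # ≈ 0.053
```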
第一权重为参数,为了不引入更多的参数,可确定一阶动量、二阶动量。
确定第八目标值与第九目标值,第八目标值为预设的第二数值与一阶动量的超参数之间的差值,第九目标值为预设的第三数值与二阶动量的超参数之间的差值。
确定第八目标值的算术平方根与第九目标值的算术平方根之间的比值,并将第八目标值的算术平方根与第九目标值的算术平方根之间的比值作为第一权重。
因此,该第一权重可以表示为:

β_3 = sqrt(1 - β_1) / sqrt(1 - β_2)

β_1 为一阶动量的超参数,β_2 为二阶动量的超参数。
S406、确定学习率误差。
在本实施例中,对第二学习率设置的更新条件为数值收敛。
由于第二优化方式的第二学习率每次训练神经网络时也会更新数值,在本实施例中,可对该第二学习率的一系列数值进行比较,从而确定该第二学习率是否收敛。
如果该第二学习率稳定,则可以确定该第二学习率收敛。
可在每次训练神经网络时,对第二学习率引入误差,作为学习率误差。
在一实施例中,确定平滑处理之后的第二学习率,确定目标超参数,该目标超参数用于控制本次训练神经网络的第二学习率。
确定平滑处理之后的第二学习率与第十目标值之间的比值,并将平滑处理之后的第二学习率与第十目标值之间的比值作为学习率误差,第十目标值为预设的第四数值与目标超参数之间的差值。
因此,该学习率误差可以表示为:

err_t = λ_t / (c - γ)

其中,λ_t 为平滑处理之后的第二学习率,γ 为目标超参数,c 为预设的第四数值,(c - γ) 即第十目标值。
S407、确定平滑处理之后的第二学习率偏离所述学习率误差的偏差,并将该偏差作为学习率偏差。
S408、在所述学习率偏差小于预设的阈值的情况下,确定所述第二学习率满足预设的更新条件。
在本实施例中,可计算第二学习率与学习率误差之间的偏差,并将该偏差作为学习率偏差,如果该学习率偏差小于预设的阈值,则可认为第二学习率的数值收敛,符合更新条件,可切换至使用第二优化方式、以该第二学习率继续训练神经网络,如果该学习率偏差大于或等于预设的阈值,则确认第二学习率的数值未收敛,不符合更新条件,继续使用第一优化方式、以该第一学习率进行下一次训练。
在一实施例中,可确定学习率误差与第二学习率之间的差值,作为第十一目标值,确定该第十一目标值的绝对值,并将该绝对值作为学习率偏差。
因此,收敛的条件可以表示为:

|err_t - a_t^SGD| < 预设的阈值

其中,err_t 为学习率误差,a_t^SGD 为第二学习率,左侧的绝对值即学习率偏差。
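该更新条件的判断可用如下简化代码示意(其中将第四数值取为1、目标超参数取为β_3的t次方,均为示例性假设,并非对本申请实施方式的限定):

```python
def meets_update_condition(lam_t, lr_raw, beta3, t, threshold):
    """判断第二学习率是否满足预设的更新条件(示意)。"""
    lr_error = lam_t / (1.0 - beta3 ** t)  # 学习率误差(假设第四数值为1,目标超参数为β3^t)
    lr_deviation = abs(lr_error - lr_raw)  # 学习率偏差:学习率误差与第二学习率之差的绝对值
    return lr_deviation < threshold        # 偏差小于预设阈值时,满足更新条件,可切换优化方式

# 用法示意(数值均为假设)
print(meets_update_condition(lam_t=0.039, lr_raw=0.04, beta3=0.9, t=30, threshold=1e-3))
```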
S409、根据第二优化方式、以所述第二学习率继续训练所述神经网络。
实施例三
图5为本发明实施例三提供的一种人脸检测方法的流程图,本实施例可适用于使用两个或两个以上优化方式训练的神经网络进行人脸检测的情况,该方法可以由人脸检测装置来执行,该人脸检测装置可以由软件和/或硬件实现,可配置在计算机设备中,例如,个人电脑、移动终端(如手机、平板电脑等)、可穿戴设备(如智能手表、智能眼镜等),等等,该方法包括如下步骤:
S501、接收图像数据。
在一实施例中,计算机设备的操作系统可以包括Android(安卓)、IOS、Windows等等。
在这些操作系统中支持运行可进行图像处理的应用,如短视频应用、直播应用、图像编辑应用、相机应用、即时通讯工具、图库应用,等等。
诸如图像编辑应用、即时通讯工具、图库应用等应用,其用户界面(User Interface,UI)可提供导入的控件,用户可通过触控或鼠标等外设操作该导入的控件,选择本地存储的图像数据(以缩略图或路径表示),也可以选择网络存储的图像数据(以统一资源定位器(Uniform Resource Locators,URL)表示),使得应用获取该图像数据。
诸如短视频应用、直播应用、图像编辑应用、相机应用、即时通讯工具等应用,其UI可提供拍照、录像的控件,用户可通过触控或鼠标等外设操作该拍照、录像的控件,通知应用调用摄像头采集图像数据。
S502、将所述图像数据输入至预设的神经网络中进行处理,以识别人脸数据在所述图像数据中所处的区域。
在计算机设备中,可预先配置神经网络,该神经网络可用于检测人脸数据的位置。
将接收到的图像数据输入至该神经网络中,该神经网络按照自身的逻辑进行处理,输出人脸数据在图像数据中所处的区域。
例如,图6中所示,用户启动短视频应用,在运动会拍摄短视频,将短视频中的图像数据601输入至神经网络中,神经网络可输出该图像数据601中运动员的人脸所在的区域602。
对于人脸数据在图像数据中所处的区域,应用可进行美颜等其他处理,例如,在该区域中检测人脸关键点,从而使用该人脸关键点进行拉伸、缩放等处理,或者,在该人脸关键点上添加装饰物。
在本实施例中,提供标注人脸数据所在区域的图像数据作为样本,通过实施例一、实施例二中提供的神经网络的训练方法训练该神经网络。
在一实施例中,该神经网络的训练方法包括:确定神经网络;根据第一优化方式、以第一学习率训练所述神经网络,所述第一学习率在每次训练所述神经网络时更新;在同一向量空间中,将所述第一优化方式的第一学习率映射为第二优化方式的第二学习率;确定所述第二学习率满足预设的更新条件;根据第二优化方式、以所述第二学习率继续训练所述神经网络。
所述在同一向量空间中,将所述第一优化方式的第一学习率映射为第二优化方式的第二学习率,包括:确定更新幅度,所述更新幅度表示在根据所述第一优化方式、以所述第一学习率训练所述神经网络的情况下,对第一网络参数进行更新的幅度,所述第一网络参数表示在根据所述第一优化方式、以所述第一学习率训练所述神经网络的情况下所述神经网络的参数;确定第二网络参数的参数梯度,所述第二网络参数表示在根据第二优化方式、以第二学习率训练所述神经网络的情况下所述神经网络的参数;在同一向量空间中,确定所述更新幅度在所述参数梯度上的投影,并将该投影作为所述第二优化方式的第二学习率。
所述确定更新幅度,包括:确定一阶动量、二阶动量;确定第一目标值与第二目标值之间的比值,并将第一目标值与第二目标值之间的比值作为第三目标值,所述第一目标值为所述第一优化方式的第一学习率与所述一阶动量之间的乘积,所述第二目标值为所述二阶动量与预设的第一数值之和的算术平方根;确定所述第三目标值的相反数,并将该相反数作为更新幅度。
所述在同一向量空间中,确定所述更新幅度在所述参数梯度上的投影,并将该投影作为所述第二优化方式的第二学习率,包括:对所述更新幅度进行转置,获得目标向量;确定第四目标值、第五目标值,所述第四目标值为所述目标向量与所述更新幅度之间的乘积,所述第五目标值为所述目标向量与所述参数梯度之间的乘积;计算所述第四目标值与所述第五目标值之间的比值,并将所述第四目标值与所述第五目标值之间的比值作为所述第二优化方式的第二学习率。
所述在同一向量空间中,将所述第一优化方式的第一学习率映射为第二优化方式的第二学习率,还包括:对所述第二学习率进行平滑处理。
所述对所述第二学习率进行平滑处理,包括:确定第一权重;确定第二权重;确定上一次训练所述神经网络时、平滑处理之后的第二学习率;确定本次训练所述神经网络时、平滑处理之后的第二学习率为第六目标值与第七目标值之和,所述第六目标值为所述第一权重与上一次训练所述神经网络时、平滑处理之后的第二学习率之间的乘积,所述第七目标值为所述第二权重与本次训练所述神经网络时、平滑处理之前的第二学习率之间的乘积。
所述确定第一权重,包括:确定一阶动量、二阶动量;确定第八目标值与第九目标值,所述第八目标值为预设的第二数值与所述一阶动量的超参数之间的差值,所述第九目标值为预设的第三数值与所述二阶动量的超参数之间的差值;确定所述第八目标值的算术平方根与所述第九目标值的算术平方根之间的比值,并将所述第八目标值的算术平方根与所述第九目标值的算术平方根之间的比值作为第一权重。
所述确定所述第二学习率满足预设的更新条件,包括:确定学习率误差;确定平滑处理之后的第二学习率偏离所述学习率误差的偏差,并将该偏差作为学习率偏差;在所述学习率偏差小于预设的阈值的情况下,确定所述第二学习率满足预设的更新条件。
所述确定学习率误差,包括:确定平滑处理之后的第二学习率;确定目标超参数,所述目标超参数用于控制本次训练所述神经网络的第二学习率;确定所述平滑处理之后的第二学习率与第十目标值之间的比值,并将所述平滑处理之后的第二学习率与第十目标值之间的比值作为学习率误差,所述第十目标值为预设的第四数值与所述目标超参数之间的差值。
所述确定所述第二学习率偏离所述学习率误差的偏差,并将该偏差作为学习率偏差,包括:确定所述学习率误差与所述第二学习率之间的差值,并将所述学习率误差与所述第二学习率之间的差值作为第十一目标值;确定所述第十一目标值的绝对值,并将该绝对值作为学习率偏差。
示例性地,所述神经网络包括卷积神经网络CNN,所述第一优化方式包括自适应矩估计Adam,所述第二优化方式包括随机梯度下降SGD。
在本实施例中,由于神经网络的训练方式与实施例一、实施例二中相似,所以描述的比较简单,相关之处参见实施例一、实施例二的说明即可,本实施例在此不加以详述。
由于神经网络的训练较为复杂,因此,该神经网络可以离线在其他计算机设备训练,在神经网络训练完成之后,将该神经网络分发至当前计算机设备。
若当前计算机设备的性能较高,或者,如服务器等为其他计算机设备提供人脸检测服务,则可以直接在当前计算机设备训练该神经网络,本实施例对此不加以限制。
在本实施例中,接收图像数据,将图像数据输入至预设的神经网络中进行处理,以识别人脸数据在图像数据中所处的区域,由于通过学习率在同一向量空间中的映射,使得在不同的阶段可切换适合的优化方式训练神经网络,可以在不同阶段发挥适合的优化方式的优势,降低或避免其他优化方式产生的问题,同时满足两个或两个以上方面对于训练神经网络的需求,从而提高了该神经网络的性能,保证人脸检测的效果。
例如,训练网络时,在前一阶段使用自适应矩估计Adam训练神经网络,可保证训练神经网络的速度,实现快速下降收敛,在后一阶段使用随机梯度下降SGD继续训练神经网络,可保证神经网络的泛化能力;神经网络的训练速度提高,可进而提高神经网络的更新速度,使神经网络适应不同样本,可提高神经网络进行人脸检测的精确度,并且,保证神经网络的泛化能力,可保证在相同样本情况下神经网络进行人脸检测的精确度。
实施例四
图7为本发明实施例四提供的一种神经网络的训练装置的结构示意图,该装置可以包括如下模块:神经网络确定模块701,设置为确定神经网络;第一训练模块702,设置为根据第一优化方式、以第一学习率训练所述神经网络,所述第一学习率在每次训练所述神经网络时更新;学习率映射模块703,设置为在同一向量空间中,将所述第一优化方式的第一学习率映射为第二优化方式的第二学习率;切换确定模块704,设置为确定所述第二学习率满足预设的更新条件;第二训练模块705,设置为根据第二优化方式、以所述第二学习率继续训练所述神经网络。
本发明实施例所提供的神经网络的训练装置可执行本发明任意实施例所提供的神经网络的训练方法,具备执行方法相应的功能模块和效果。
实施例五
图8为本发明实施例五提供的一种人脸检测装置的结构示意图,该装置可以包括如下模块:图像数据接收模块801,设置为接收图像数据;人脸区域识别模块802,设置为将所述图像数据输入至预设的神经网络中进行处理,以识别人脸数据在所述图像数据中所处的区域,所述神经网络通过实施例四提供的神经网络的训练装置训练。
本发明实施例所提供的人脸检测装置可执行本发明任意实施例所提供的人脸检测方法,具备执行方法相应的功能模块和效果。
实施例六
图9为本发明实施例六提供的一种计算机设备的结构示意图。如图9所示,该计算机设备包括处理器900、存储器901、通信模块902、输入装置903和输出装置904;计算机设备中处理器900的数量可以是一个或多个,图9中以一个处理器900为例;计算机设备中的处理器900、存储器901、通信模块902、输入装置903和输出装置904可以通过总线或其他方式连接,图9中以通过总线连接为例。
本实施例提供的计算机设备,可执行本发明任一实施例提供的神经网络的训练方法或人脸检测方法,具有相应的功能和效果。
实施例七
本发明实施例七还提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现一种神经网络的训练方法,该方法包括:确定神经网络;根据第一优化方式、以第一学习率训练所述神经网络,所述第一学习率在每次训练所述神经网络时更新;在同一向量空间中,将所述第一优化方式的第一学习率映射为第二优化方式的第二学习率;确定所述第二学习率满足预设的更新条件;根据第二优化方式、以所述第二学习率继续训练所述神经网络。
或者,该计算机程序被处理器执行时实现一种人脸检测方法,该方法包括:接收图像数据;将所述图像数据输入至预设的神经网络中进行处理,以识别人脸数据在所述图像数据中所处的区域,所述神经网络通过如下神经网络的训练方法训练:确定神经网络;根据第一优化方式、以第一学习率训练所述神经网络,所述第一学习率在每次训练所述神经网络时更新;在同一向量空间中,将所述第一优化方式的第一学习率映射为第二优化方式的第二学习率;确定所述第二学习率满足预设的更新条件;根据第二优化方式、以所述第二学习率继续训练所述神经网络。
本发明实施例所提供的计算机可读存储介质,其计算机程序不限于如上所述的方法操作,还可以执行本发明任意实施例所提供的神经网络的训练方法或人脸检测方法中的相关操作。
通过以上关于实施方式的描述,所属领域的技术人员可以清楚地了解到,本公开可借助软件及必需的通用硬件来实现,也可以通过硬件实现。本公开可以以软件产品的形式体现出来,该计算机软件产品可以存储在计算机可读存储介质中,如计算机的软盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、闪存(FLASH)、硬盘或光盘等,包括多条指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本公开多个实施例所述的方法。
上述神经网络的训练装置或人脸检测装置的实施例中,所包括的多个单元和模块只是按照功能逻辑进行划分的,但并不局限于上述的划分,只要能够实现相应的功能即可;另外,多功能单元的名称也只是为了便于相互区分,并不用于限制本公开的保护范围。
Claims (16)
- 一种神经网络的训练方法,包括:确定神经网络;根据第一优化方式、以第一学习率训练所述神经网络,所述第一学习率在每次训练所述神经网络时更新;在同一向量空间中,将所述第一优化方式的第一学习率映射为第二优化方式的第二学习率;确定所述第二学习率满足预设的更新条件;根据第二优化方式、以所述第二学习率继续训练所述神经网络。
- 根据权利要求1所述的方法,其中,所述在同一向量空间中,将所述第一优化方式的第一学习率映射为第二优化方式的第二学习率,包括:确定更新幅度,所述更新幅度表示在根据所述第一优化方式、以所述第一学习率训练所述神经网络的情况下,对第一网络参数进行更新的幅度,所述第一网络参数表示在根据所述第一优化方式、以所述第一学习率训练所述神经网络的情况下,所述神经网络的参数;确定第二网络参数的参数梯度,所述第二网络参数表示在根据第二优化方式、以第二学习率训练所述神经网络的情况下,所述神经网络的参数;在同一向量空间中,确定所述更新幅度在所述参数梯度上的投影,并将所述投影作为所述第二优化方式的第二学习率。
- 根据权利要求2所述的方法,其中,所述确定更新幅度,包括:确定一阶动量、二阶动量;确定第一目标值与第二目标值之间的比值,并将所述第一目标值与第二目标值之间的比值作为第三目标值,所述第一目标值为所述第一优化方式的第一学习率与所述一阶动量之间的乘积,所述第二目标值为所述二阶动量与预设的第一数值之和的算术平方根;确定所述第三目标值的相反数,并将所述第三目标值的相反数作为更新幅度。
- 根据权利要求2所述的方法,其中,所述在同一向量空间中,确定所述更新幅度在所述参数梯度上的投影,作为所述第二优化方式的第二学习率,包括:对所述更新幅度进行转置,获得目标向量;确定第四目标值、第五目标值,所述第四目标值为所述目标向量与所述更新幅度之间的乘积,所述第五目标值为所述目标向量与所述参数梯度之间的乘积;计算所述第四目标值与所述第五目标值之间的比值,并将所述第四目标值与所述第五目标值之间的比值作为所述第二优化方式的第二学习率。
- 根据权利要求2所述的方法,其中,所述在同一向量空间中,将所述第一优化方式的第一学习率映射为第二优化方式的第二学习率,还包括:对所述第二学习率进行平滑处理。
- 根据权利要求5所述的方法,其中,所述对所述第二学习率进行平滑处理,包括:确定第一权重;确定第二权重;确定上一次训练所述神经网络时、平滑处理之后的第二学习率;确定本次训练所述神经网络时、平滑处理之后的第二学习率为第六目标值与第七目标值之和,所述第六目标值为所述第一权重与上一次训练所述神经网络时、平滑处理之后的第二学习率之间的乘积,所述第七目标值为所述第二权重与本次训练所述神经网络时、平滑处理之前的第二学习率之间的乘积。
- 根据权利要求6所述的方法,其中,所述确定第一权重,包括:确定一阶动量、二阶动量;确定第八目标值与第九目标值,所述第八目标值为预设的第二数值与所述一阶动量的超参数之间的差值,所述第九目标值为预设的第三数值与所述二阶动量的超参数之间的差值;确定所述第八目标值的算术平方根与所述第九目标值的算术平方根之间的比值,并将所述第八目标值的算术平方根与所述第九目标值的算术平方根之间的比值作为第一权重。
- 根据权利要求1-7任一所述的方法,其中,所述确定所述第二学习率满足预设的更新条件,包括:确定学习率误差;确定平滑处理之后的第二学习率偏离所述学习率误差的偏差,并将所述偏差作为学习率偏差;在所述学习率偏差小于预设的阈值的情况下,确定所述第二学习率满足预设的更新条件。
- 根据权利要求8所述的方法,其中,所述确定学习率误差,包括:确定平滑处理之后的第二学习率;确定目标超参数,所述目标超参数用于控制本次训练所述神经网络的第二学习率;确定所述平滑处理之后的第二学习率与第十目标值之间的比值,并将所述平滑处理之后的第二学习率与所述第十目标值之间的比值作为学习率误差,所述第十目标值为预设的第四数值与所述目标超参数之间的差值。
- 根据权利要求8所述的方法,其中,所述确定所述第二学习率偏离所述学习率误差的偏差,并将所述偏差作为学习率偏差,包括:确定所述学习率误差与所述第二学习率之间的差值,并将所述学习率误差与所述第二学习率之间的差值作为第十一目标值;确定所述第十一目标值的绝对值,并将所述绝对值作为学习率偏差。
- 根据权利要求1-10任一所述的方法,其中,所述神经网络包括卷积神经网络CNN,所述第一优化方式包括自适应矩估计Adam,所述第二优化方式包括随机梯度下降SGD。
- 一种人脸检测方法,包括:接收图像数据;将所述图像数据输入至预设的神经网络中进行处理,以识别人脸数据在所述图像数据中所处的区域,其中,所述神经网络通过权利要求1-11任一所述的神经网络的训练方法训练。
- 一种神经网络的训练装置,包括:神经网络确定模块,设置为确定神经网络;第一训练模块,设置为根据第一优化方式、以第一学习率训练所述神经网络,所述第一学习率在每次训练所述神经网络时更新;学习率映射模块,设置为在同一向量空间中,将所述第一优化方式的第一学习率映射为第二优化方式的第二学习率;切换确定模块,设置为确定所述第二学习率满足预设的更新条件;第二训练模块,设置为根据第二优化方式、以所述第二学习率继续训练所述神经网络。
- 一种人脸检测装置,包括:图像数据接收模块,设置为接收图像数据;人脸区域识别模块,设置为将所述图像数据输入至预设的神经网络中进行处理,以识别人脸数据在所述图像数据中所处的区域,其中,所述神经网络通过如权利要求13所述的神经网络的训练装置训练。
- 一种计算机设备,包括:至少一个处理器;存储器,设置为存储至少一个程序;当所述至少一个程序被所述至少一个处理器执行,使得所述至少一个处理器实现如权利要求1-11中任一所述的神经网络的训练方法或如权利要求12所述的人脸检测方法。
- 一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1-11中任一所述的神经网络的训练方法或如权利要求12所述的人脸检测方法。
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/780,840 US12136259B2 (en) | 2019-11-29 | 2020-08-20 | Method and apparatus for detecting face, computer device and computer-readable storage medium |
| EP20892477.9A EP4068160A4 (en) | 2019-11-29 | 2020-08-20 | Neural network training and face detection method and apparatus, and device and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911205613.8 | 2019-11-29 | ||
| CN201911205613.8A CN110942142B (zh) | 2019-11-29 | 2019-11-29 | 神经网络的训练及人脸检测方法、装置、设备和存储介质 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021103675A1 true WO2021103675A1 (zh) | 2021-06-03 |
Family
ID=69909096
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2020/110160 Ceased WO2021103675A1 (zh) | 2019-11-29 | 2020-08-20 | 神经网络的训练及人脸检测方法、装置、设备和存储介质 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US12136259B2 (zh) |
| EP (1) | EP4068160A4 (zh) |
| CN (1) | CN110942142B (zh) |
| WO (1) | WO2021103675A1 (zh) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110942142B (zh) * | 2019-11-29 | 2021-09-17 | 广州市百果园信息技术有限公司 | 神经网络的训练及人脸检测方法、装置、设备和存储介质 |
| CN111580962A (zh) * | 2020-04-29 | 2020-08-25 | 安徽理工大学 | 一种具有权值衰减的分布式自适应在线学习方法 |
| CN111769603B (zh) * | 2020-07-13 | 2022-04-08 | 国网天津市电力公司 | 一种基于电-气耦合系统安全裕度的机组优化调度方法 |
| CN112183750A (zh) * | 2020-11-05 | 2021-01-05 | 平安科技(深圳)有限公司 | 神经网络模型训练方法、装置、计算机设备及存储介质 |
| CN113808197A (zh) * | 2021-09-17 | 2021-12-17 | 山西大学 | 一种基于机器学习的工件自动抓取系统及方法 |
| CN117830708B (zh) * | 2023-12-20 | 2024-07-30 | 北京斯年智驾科技有限公司 | 一种目标检测模型的训练方法、系统、装置及存储介质 |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109190458A (zh) * | 2018-07-20 | 2019-01-11 | 华南理工大学 | 一种基于深度学习的小人头检测方法 |
| CN109272046A (zh) * | 2018-09-26 | 2019-01-25 | 北京科技大学 | 基于L2重新正则化Adam切换模拟回火SGD的深度学习方法 |
| US20190147305A1 (en) * | 2017-11-14 | 2019-05-16 | Adobe Inc. | Automatically selecting images using multicontext aware ratings |
| CN109978079A (zh) * | 2019-04-10 | 2019-07-05 | 东北电力大学 | 一种改进的堆栈降噪自编码器的数据清洗方法 |
| CN110414349A (zh) * | 2019-06-26 | 2019-11-05 | 长安大学 | 引入感知模型的孪生卷积神经网络人脸识别算法 |
| CN110942142A (zh) * | 2019-11-29 | 2020-03-31 | 广州市百果园信息技术有限公司 | 神经网络的训练及人脸检测方法、装置、设备和存储介质 |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7069214B2 (en) * | 2001-02-26 | 2006-06-27 | Matsushita Electric Industrial Co., Ltd. | Factorization for generating a library of mouth shapes |
| US20200064444A1 (en) * | 2015-07-17 | 2020-02-27 | Origin Wireless, Inc. | Method, apparatus, and system for human identification based on human radio biometric information |
| US11003992B2 (en) | 2017-10-16 | 2021-05-11 | Facebook, Inc. | Distributed training and prediction using elastic resources |
| CN107909142A (zh) * | 2017-11-14 | 2018-04-13 | 深圳先进技术研究院 | 一种神经网络的参数优化方法、系统及电子设备 |
| CN109508678B (zh) * | 2018-11-16 | 2021-03-30 | 广州市百果园信息技术有限公司 | 人脸检测模型的训练方法、人脸关键点的检测方法和装置 |
| CN109947940B (zh) * | 2019-02-15 | 2023-09-05 | 平安科技(深圳)有限公司 | 文本分类方法、装置、终端及存储介质 |
- 2019-11-29 CN CN201911205613.8A patent/CN110942142B/zh active Active
- 2020-08-20 WO PCT/CN2020/110160 patent/WO2021103675A1/zh not_active Ceased
- 2020-08-20 EP EP20892477.9A patent/EP4068160A4/en active Pending
- 2020-08-20 US US17/780,840 patent/US12136259B2/en active Active
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190147305A1 (en) * | 2017-11-14 | 2019-05-16 | Adobe Inc. | Automatically selecting images using multicontext aware ratings |
| CN109190458A (zh) * | 2018-07-20 | 2019-01-11 | 华南理工大学 | 一种基于深度学习的小人头检测方法 |
| CN109272046A (zh) * | 2018-09-26 | 2019-01-25 | 北京科技大学 | 基于L2重新正则化Adam切换模拟回火SGD的深度学习方法 |
| CN109978079A (zh) * | 2019-04-10 | 2019-07-05 | 东北电力大学 | 一种改进的堆栈降噪自编码器的数据清洗方法 |
| CN110414349A (zh) * | 2019-06-26 | 2019-11-05 | 长安大学 | 引入感知模型的孪生卷积神经网络人脸识别算法 |
| CN110942142A (zh) * | 2019-11-29 | 2020-03-31 | 广州市百果园信息技术有限公司 | 神经网络的训练及人脸检测方法、装置、设备和存储介质 |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4068160A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN110942142A (zh) | 2020-03-31 |
| US20230023271A1 (en) | 2023-01-26 |
| CN110942142B (zh) | 2021-09-17 |
| US12136259B2 (en) | 2024-11-05 |
| EP4068160A4 (en) | 2023-06-28 |
| EP4068160A1 (en) | 2022-10-05 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2021103675A1 (zh) | 神经网络的训练及人脸检测方法、装置、设备和存储介质 | |
| US20210012198A1 (en) | Method for training deep neural network and apparatus | |
| Stewart et al. | Meta-learning spiking neural networks with surrogate gradient descent | |
| US10977549B2 (en) | Object animation using generative neural networks | |
| CN110622178A (zh) | 学习神经网络结构 | |
| Deo et al. | Predicting waves in fluids with deep neural network | |
| CN117669700B (zh) | 深度学习模型训练方法和深度学习模型训练系统 | |
| CN106407889A (zh) | 基于光流图深度学习模型在视频中人体交互动作识别方法 | |
| CN116976461A (zh) | 联邦学习方法、装置、设备及介质 | |
| CN113196303A (zh) | 不适当神经网络输入检测和处理 | |
| CN112561028B (zh) | 训练神经网络模型的方法、数据处理的方法及装置 | |
| US20250013877A1 (en) | Data Processing Method and Apparatus | |
| CN111104831B (zh) | 一种视觉追踪方法、装置、计算机设备以及介质 | |
| JP2022543245A (ja) | 学習を転移させるための学習のためのフレームワーク | |
| US20250225398A1 (en) | Data processing method and related apparatus | |
| CN115423016A (zh) | 多任务预测模型的训练方法、多任务预测方法及装置 | |
| WO2021042857A1 (zh) | 图像分割模型的处理方法和处理装置 | |
| WO2022156475A1 (zh) | 神经网络模型的训练方法、数据处理方法及装置 | |
| US20130148879A1 (en) | Information processing apparatus, information processing method, and program | |
| CN113971733A (zh) | 一种基于超图结构的模型训练方法、分类方法及装置 | |
| WO2021142904A1 (zh) | 视频分析方法及其相关的模型训练方法、设备、装置 | |
| CN114169393A (zh) | 一种图像分类方法及其相关设备 | |
| CN115169548A (zh) | 基于张量的持续学习方法和装置 | |
| CN114186097A (zh) | 用于训练模型的方法和装置 | |
| CN115239760B (zh) | 一种目标跟踪方法、系统、设备及存储介质 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20892477 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| ENP | Entry into the national phase |
Ref document number: 2020892477 Country of ref document: EP Effective date: 20220629 |