CN117636438A - Emotion recognition and model training method and device, electronic equipment and storage medium - Google Patents
Emotion recognition and model training method and device, electronic equipment and storage medium
- Publication number
- CN117636438A (Application No. CN202311685296.0A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- emotion
- whale
- emotion recognition
- optimization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an emotion recognition and model training method and device, an electronic device and a storage medium. The method mainly comprises the following steps: acquiring a plurality of sample face images corresponding to a plurality of emotions, and inputting the sample face images into an emotion recognition model to be trained, wherein the emotion recognition model comprises a feature extraction neural network and a classifier model; based on a whale optimization algorithm, performing repeated iterative optimization on the parameters of the feature extraction neural network by using a surrounding-stage whale position update formula that is updated at each iteration according to the real-time information entropy of the whale individuals, to obtain an optimized feature extraction neural network; and outputting feature data of the sample face images from the sample face images by using the optimized feature extraction neural network, and performing optimization training on the classifier model according to the feature data to obtain an optimized classifier model. With the present application, an emotion recognition model with high recognition accuracy can be obtained, so that emotions can be accurately recognized.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for emotion recognition and model training, an electronic device, and a storage medium.
Background
In the prior art, emotion recognition from facial images is performed with deep learning techniques. The basic principle of deep learning is to use a network model that stacks multiple nonlinear transformations and various neural network structures to extract deep features, so that the whole training process can be completed end to end. However, in the prior art a gradient descent algorithm is mostly adopted to optimize the network parameters when training the network model. With gradient descent, gradient vanishing and gradient explosion may occur, and the parameter optimization may fall into a local optimum, so the recognition accuracy is poor when the trained model is used for emotion recognition.
Disclosure of Invention
The embodiments of the invention provide an emotion recognition and model training method and device, an electronic device and a storage medium, which can avoid gradient vanishing, gradient explosion, and falling into a local optimum during training of the emotion recognition model, so that an emotion recognition model with higher recognition accuracy is obtained and the emotion corresponding to a face image to be recognized can be recognized accurately.
In a first aspect, an embodiment of the present invention provides a training method for emotion recognition models, including:
Obtaining a plurality of standard face images corresponding to each emotion in a plurality of emotions to obtain a plurality of sample face images, and inputting the plurality of sample face images into an emotion recognition model to be trained, wherein the emotion recognition model comprises a feature extraction neural network and a classifier model;
based on a whale optimization algorithm, performing repeated iterative optimization on the parameters of the feature extraction neural network by using the surrounding-stage whale position update formula updated at each iteration according to the real-time information entropy of the whale individuals, the plurality of sample face images and the corresponding emotion categories, to obtain whale-algorithm-optimized neural network parameters, and obtaining an optimized feature extraction neural network from the whale-algorithm-optimized neural network parameters;
outputting a plurality of pieces of sample face image feature data from the plurality of sample face images by using the optimized feature extraction neural network, and performing optimization training on the classifier model according to the plurality of pieces of sample face image feature data and the corresponding emotion categories to obtain an optimized classifier model; and
combining the optimized feature extraction neural network with the optimized classifier model to obtain the emotion recognition model.
In a second aspect, an embodiment of the present invention provides an emotion recognition method, including:
acquiring a face image to be identified;
inputting the face image to be recognized into the emotion recognition model obtained by training by the emotion recognition model training method in any one of the embodiments of the invention;
and identifying the emotion corresponding to the face image to be identified by using the emotion identification model.
In a third aspect, an embodiment of the present invention provides an emotion recognition model training device, including:
a sample face image acquisition and input module, configured to acquire a plurality of standard face images corresponding to each of a plurality of emotions to obtain a plurality of sample face images, and to input the plurality of sample face images into the emotion recognition model to be trained, wherein the emotion recognition model comprises a feature extraction neural network and a classifier model;
an optimized feature extraction neural network acquisition module, configured to perform, based on a whale optimization algorithm, repeated iterative optimization on the parameters of the feature extraction neural network by using the surrounding-stage whale position update formula updated at each iteration according to the real-time information entropy of the whale individuals, the plurality of sample face images and the corresponding emotion categories, to obtain whale-algorithm-optimized neural network parameters, and to obtain the optimized feature extraction neural network from the whale-algorithm-optimized neural network parameters;
an optimized classifier model acquisition module, configured to output a plurality of pieces of sample face image feature data from the plurality of sample face images by using the optimized feature extraction neural network, and to perform optimization training on the classifier model according to the plurality of pieces of sample face image feature data and the corresponding emotion categories to obtain an optimized classifier model; and
an emotion recognition model acquisition module, configured to combine the optimized feature extraction neural network with the optimized classifier model to obtain the emotion recognition model.
In a fourth aspect, an embodiment of the present invention provides an emotion recognition device, including:
the face image acquisition module is used for acquiring a face image to be identified;
the input module is used for inputting the face image to be recognized into the emotion recognition model obtained by training the emotion recognition model training method in any one of the embodiments of the invention;
and the identification module is used for identifying the emotion corresponding to the face image to be identified by utilizing the emotion identification model.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the emotion recognition model training method or the emotion recognition method according to any one of the embodiments of the present invention when executing the program.
In a sixth aspect, an embodiment of the present invention further provides a computer readable storage medium, where a computer program is stored, where the program when executed by a processor implements an emotion recognition model training method or an emotion recognition method according to any one of the embodiments of the present invention.
With the emotion recognition and model training method and device, electronic device and storage medium according to the embodiments, the parameters of the feature extraction neural network of the emotion recognition model to be trained are optimized by a whale optimization algorithm whose surrounding-stage position update formula is updated at each iteration according to the real-time information entropy. Gradient vanishing, gradient explosion, and falling into a local optimum during model parameter optimization can thus be avoided, and the recognition accuracy of the emotion recognition model can be improved, so that when emotion recognition is performed with the model the result is not easily affected by the quality of the face image to be recognized and the emotion corresponding to the face image can be recognized accurately.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and should not be considered as limiting the scope, and that other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an emotion recognition model training method according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart of the emotion recognition model training method according to the embodiment of the present invention;
FIG. 3 is another schematic flow chart of the emotion recognition model training method according to the embodiment of the present invention;
FIG. 4 is a schematic flow chart of an emotion recognition method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a training device for emotion recognition models according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an emotion recognition device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Fig. 1 is a schematic flow chart of an emotion recognition model training method provided by an embodiment of the present invention, where the method may be performed by an emotion recognition model training device provided by an embodiment of the present invention, and the device may be implemented in a software and/or hardware manner. In a specific embodiment, the apparatus may be integrated in an electronic device, such as a computer, a server, etc. The following embodiments will be described taking the integration of the device in an electronic apparatus as an example. Referring to fig. 1, the method may specifically include the steps of:
Step 101, obtaining a plurality of standard facial images corresponding to each emotion in a plurality of emotions to obtain a plurality of sample facial images, and inputting the plurality of sample facial images into an emotion recognition model to be trained, wherein the emotion recognition model comprises a feature extraction neural network and a classifier model. The method can be used for training the feature extraction neural network and the classifier model of the emotion recognition model according to the plurality of sample face images.
Specifically, the above-mentioned emotions include categories such as "happy", "sad" and "panic".
In an optional embodiment of the present invention, the process of obtaining a plurality of standard face images corresponding to each of a plurality of emotions to obtain a plurality of sample face images includes: and obtaining a plurality of facial expression images corresponding to each emotion, and sequentially carrying out emotion type labeling, noise reduction treatment, illumination compensation, face feature detection, effective face region segmentation and standardization treatment on each facial expression image to obtain a plurality of standard facial images corresponding to each emotion.
Specifically, labeling the facial expression images facilitates training the model according to the labeled categories; preprocessing the facial expression images with noise reduction and illumination compensation avoids non-standard raw facial expression images caused by factors such as region size, illumination conditions, occlusion, blur and rotation angle; and processing the facial expression images into face image regions through face feature detection, effective face region segmentation and normalization avoids the problems that the position of the face in a complex background scene is uncertain and the image sizes are inconsistent, which facilitates subsequent model training.
Specifically, the plurality of facial expression images may be acquired by using an imaging apparatus, or may be acquired from a network or the like.
Optionally, the noise reduction process may include: and carrying out noise reduction treatment on the picture by adopting a self-adaptive wiener filtering method.
In adaptive Wiener filtering, the filtering strength is governed by the local variance of the input image: where the local variance is large, the smoothing effect of the filter is weak; where the local variance is small, the smoothing effect is strong. Adaptive Wiener filtering is therefore more selective about what it treats as noise and better protects pixel information such as image edges. For the M×N neighborhood η of a pixel a(x, y) (where x, y are the coordinates of the pixel), the adaptive Wiener filter estimates the mean μ and the variance σ² of the local block of pixels as follows:

μ = (1/(M·N)) · Σ_{(x,y)∈η} a(x, y)

σ² = (1/(M·N)) · Σ_{(x,y)∈η} a²(x, y) − μ²

where η is the M×N neighborhood range. The filter then estimates a new gray value for each pixel as follows:

b(x, y) = μ + ((σ² − v²) / σ²) · (a(x, y) − μ)

where v² is the variance of the image noise.
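A minimal sketch of the adaptive Wiener noise-reduction step described above is shown below, assuming a grayscale image stored as a NumPy array; the window size and the way the noise variance is estimated are illustrative assumptions, not values given in the patent.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_wiener(image, size=5, noise_var=None):
    image = image.astype(np.float64)
    # Local mean and local variance over an M x N (here size x size) neighborhood.
    local_mean = uniform_filter(image, size)
    local_sq_mean = uniform_filter(image * image, size)
    local_var = local_sq_mean - local_mean ** 2
    # If the noise variance v^2 is unknown, estimate it as the mean local variance (assumption).
    if noise_var is None:
        noise_var = local_var.mean()
    # b(x, y) = mu + (sigma^2 - v^2) / sigma^2 * (a(x, y) - mu), with the gain
    # clipped to zero where the local variance falls below the noise level.
    gain = np.maximum(local_var - noise_var, 0) / np.maximum(local_var, 1e-12)
    return local_mean + gain * (image - local_mean)
```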
Optionally, the illumination compensation process includes: and carrying out illumination compensation on the picture by adopting a histogram equalization method.
Specifically, the illumination of the facial expression images is unevenly distributed, which violates the assumptions of the face detection algorithm; images under even illumination are of higher quality, and increasing the contrast and sharpness of the images improves the accuracy of the features extracted later.
Specifically, in an image the numbers of pixels with different gray values differ, and the histogram is a graph showing the distribution of the gray values over all pixels: its abscissa represents the gray value and its ordinate the probability of that gray value occurring. Although the histogram cannot describe the positions of pixels or the content of the image in detail, it describes the gray-level distribution of the image well, from which the overall contrast (the ratio of the maximum to the minimum brightness), the brightness and the image quality can be assessed. The brightness range of an imaging system is wide, and an image often has a poor visual effect because of insufficient contrast, so the pixel gray levels are transformed before histogram equalization. In the invention a nonlinear gray-level transformation is used: assuming the gray value of the input image at coordinates (x, y) is f(x, y), the transformed pixel gray value g(x, y) is expressed as:

g(x, y) = c · (f(x, y) + e)^γ

where c is a variable parameter, e is a compensation coefficient that can be used to emphasize a selected range, and γ is the gamma coefficient: when γ is smaller than 1 the contrast of low-gray regions is increased, when γ is larger than 1 the contrast of high-gray regions is increased, and when γ is equal to 1 the effect of a logarithmic transformation is achieved. The histogram of the transformed image is denoted p(r_k).
The transformed image is then histogram-equalized. Assuming that the maximum number of gray levels of the image M is L, the gray values contained in M are denoted r_k (k = 0, 1, …, L−1). If n(r_k) is the number of occurrences of gray level r_k in the gray image, the histogram of the image M is given by:

p(r_k) = n(r_k) / N

where N is the total number of pixels in the image M and Σ_k p(r_k) = 1. The cumulative probability of the pixel distribution of the image M is then expressed as:

T(r_k) = Σ_{j=0}^{k} p(r_j)

The cumulative probability is used as the gray-level transformation function T(r_k): a pixel with gray level r_k is mapped onto gray level S_k, and the output pixel gray value S_k of the image M is determined by:

S_k = (L − 1) · T(r_k) = (L − 1) · Σ_{j=0}^{k} p(r_j)
optionally, the face feature detection process includes: and carrying out face feature detection based on a feature space method.
Optionally, the process of face feature detection by the method based on feature space includes: face features are detected based on features of the human eye.
Specifically, the core of the feature-space-based method is to search for invariant features of the face to determine whether a face exists; these invariant features are mainly the positions of the facial organs. In the invention, face detection is performed with an eye-detection method: in a face, the positions and shapes of the eyes are basically stable, the positions of the other organs can be predicted from the approximate positions of the eyes, and the size of the face can be roughly estimated from the size of the eyes. The eyes are located first, and the face region is then delimited according to the center coordinates of the two eyes and prior knowledge of facial proportions.
Because the projections of the gray levels of the face region in the horizontal and vertical directions are regular, the gray-enhanced face image is projected in the horizontal and vertical directions. In the horizontal direction a minimum appears on each side of the middle of the image; in the vertical direction the eye positions still correspond to minima for the left and right eyes, while the nose-bridge region shows a maximum. The size and position of the rectangular face region are then judged and adjusted by using prior knowledge of the distribution of the eyes on the face.
Optionally, the process of detecting the face feature based on the feature of the human eye includes: and carrying out binarization processing on each facial expression image to obtain a binarized image corresponding to each facial expression image, and then detecting facial features in the binarized image according to the binarized image corresponding to each facial expression image.
Optionally, a maximum inter-class variance threshold segmentation method is adopted to carry out binarization processing on each facial expression image.
Specifically, a pixel value R is selected and used to divide the pixel values into two groups, one on each side of R; the threshold R is determined when the variance between the two groups of pixel values is maximal. The threshold R divides the pixels into two groups S_0 = (1 ~ R) and S_1 = (R+1 ~ L), whose probabilities are η_0 and η_1 and whose mean gray values are λ_0 and λ_1, respectively:

η_0 = Σ_{i=1}^{R} p_i,   η_1 = Σ_{i=R+1}^{L} p_i

λ_0 = Σ_{i=1}^{R} i·p_i / η_0,   λ_1 = Σ_{i=R+1}^{L} i·p_i / η_1

where p_i is the proportion of pixels with gray value i, λ = η_0·λ_0 + η_1·λ_1 is the average gray level of the whole image, and λ(R) = Σ_{i=1}^{R} i·p_i is the average gray level up to the threshold R. The variance between S_0 and S_1 is then expressed as:

σ²(S_0, S_1) = η_0·(λ_0 − λ)² + η_1·(λ_1 − λ)²

The threshold is obtained by finding the R in 1 ~ L that maximizes the above expression.
Optionally, the process of detecting the facial features according to the binarized image corresponding to each facial expression image includes: the eyes are positioned according to the binarized image, and then the positions of the face areas are determined according to the positions of the eyes and the proportions of facial organs.
Specifically, the eyes can be located accurately by integral projection of the binarized image. Let G(x, y) denote the gray value at (x, y); the horizontal integral projection H(x) over the region [y_1, y_2] and the vertical integral projection V(y) over the region [x_1, x_2] can be expressed as:

H(x) = Σ_{y=y_1}^{y_2} G(x, y)

V(y) = Σ_{x=x_1}^{x_2} G(x, y)

When determining the positions of the other facial organs from the positions of the eyes and the proportions of the facial organs, the width of the face region can be determined first. Assume the left-eye center coordinates are (x_1, y_1) and the right-eye center coordinates are (x_2, y_2); the distance d_1 between the center points of the two eyes is:

d_1 = x_2 − x_1

Since the length d_1 covers half of the left eye, half of the right eye and the gap between the eyes, d_1/2 is taken as the length of one eye, and the left and right boundaries X_1, X_2 of the face region are expressed by the corresponding formula. Substituting the center coordinates of the left eye and the right eye into that formula gives the specific values of the left and right boundaries.

When determining the length of the face region, assume the hairline is located at (x_3, y_3) and the eyebrows at (x_4, y_4), and let their distance be d_2:

d_2 = y_4 − y_3

The upper and lower boundaries Y_1, Y_2 of the face region can then be expressed as:

Y_1 = y_3

Y_2 = y_3 + 3·d_2

Substituting the numerical values into the above formulas gives the specific values of the upper and lower boundaries, so that the position of the face region can be determined.
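A minimal sketch of eye localization by integral projection and of the face-region boundary computation described above. The left/right boundary rule (extending half an eye width on each side) is an assumption for illustration; the patent only states the vertical boundaries Y_1 = y_3 and Y_2 = y_3 + 3·d_2 explicitly.

```python
import numpy as np

def integral_projections(binary):
    # H(x): sum over rows (y direction); V(y): sum over columns (x direction).
    h = binary.sum(axis=0)   # one value per column x
    v = binary.sum(axis=1)   # one value per row y
    return h, v

def face_region(left_eye, right_eye, hairline_y, eyebrow_y):
    (x1, _y1), (x2, _y2) = left_eye, right_eye
    d1 = x2 - x1                  # distance between eye centers
    d2 = eyebrow_y - hairline_y   # d2 = y4 - y3
    X1 = x1 - d1 / 2              # assumed: extend half an eye width to the left
    X2 = x2 + d1 / 2              # assumed: extend half an eye width to the right
    Y1 = hairline_y               # Y1 = y3
    Y2 = hairline_y + 3 * d2      # Y2 = y3 + 3 * d2
    return X1, X2, Y1, Y2
```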
Optionally, the foregoing process of dividing the effective face area includes: and dividing the face image area from the corresponding facial expression image according to the position of the face area to obtain a plurality of divided face area images.
Optionally, the process of the normalization includes: and carrying out scale normalization on the plurality of segmented face region images to obtain a plurality of standardized face images with consistent sizes, namely a plurality of sample face images.
Specifically, 18 feature points may be manually calibrated on one of the segmented face region images and that image scaled to a size of 160×160 pixels; the remaining face region images are then aligned to this standard face according to the positions of the corresponding feature points and scaled to the same size.
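A minimal sketch of the scale-normalization step, assuming OpenCV is available; the alignment by the 18 manually calibrated feature points is omitted here, and only the resize to the common 160×160 size is shown.

```python
import cv2

def normalize_face(face_region_image):
    # Scale the segmented face region to the standard 160 x 160 size.
    return cv2.resize(face_region_image, (160, 160), interpolation=cv2.INTER_LINEAR)
```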
Step 102: based on a whale optimization algorithm, performing repeated iterative optimization on the parameters of the feature extraction neural network by using the surrounding-stage whale position update formula updated at each iteration according to the real-time information entropy of the whale individuals, the plurality of sample face images and the corresponding emotion categories, to obtain whale-algorithm-optimized neural network parameters, and obtaining the optimized feature extraction neural network from those parameters. In this way the parameters of the feature extraction neural network can be optimized with the information-entropy-based whale optimization algorithm, the gradient-vanishing or gradient-explosion phenomena that can occur when parameters are optimized with a gradient descent algorithm can be avoided, and falling into a local optimum is avoided at the same time, so that optimal neural network parameters are obtained; a more accurate optimized feature extraction neural network, and hence a more accurate emotion recognition model, can thus be obtained.
Specifically, the number of layers of the feature extraction neural network may be 5 layers.
Specifically, the parameters of the feature extraction neural network may include weights and node thresholds of nodes of the feature extraction neural network.
Alternatively, the real-time information entropy of the whale individual may be calculated according to an information entropy calculation method in the prior art.
Optionally, the process of obtaining the optimized feature extraction neural network by using the whale algorithm to optimize the neural network parameters includes: and directly determining the parameters of the whale algorithm optimized neural network as parameters of the feature extraction neural network to obtain the optimized feature extraction neural network.
Optionally, based on the whale optimization algorithm, each parameter of the feature extraction neural network is iteratively optimized, with the goal of reducing the difference between the predicted values and the true values of the sample face image feature data output by the feature extraction neural network for each sample face image, by using the surrounding-stage whale position update formula updated at each iteration according to the real-time information entropy of the whale individuals, the plurality of sample face images and the corresponding emotion categories.
Specifically, the whale optimization algorithm is a novel meta-heuristic search optimization algorithm derived from the hunting pattern of whales; mathematical models of three stages are built from the three whale behaviors of encircling prey, bubble-net attacking, and searching for prey. The stages are as follows:
The first stage: surrounding stage
Within the search space, the location of each whale represents one potential solution to the optimization problem sought. Assuming that whales identify and surround prey and that the optimal solution is unknown, the current best candidate solution generated in the iterative process is the solution closest to the global optimal solution. Other whale individuals can update their own positions according to the positions of the current best candidate solutions, and the position update formula is as follows:
X(t+1) = X*(t) − A·D

D = |C·X*(t) − X(t)|

C = 2r

A = 2a·r − a

where t denotes the number of iterations, A and C are coefficient vectors, X(t) is the position vector, X*(t) is the position of the current best candidate solution at iteration t, a is a vector that decreases linearly from 2 to 0 during the iterations, and r is a random vector in [0, 1].
The second stage: bubble-net attack stage
The bubble-net attack stage has two mathematical modelling schemes: a shrinking-encircling mechanism and a spiral position-updating mechanism.
(1) Shrinking-encircling mechanism. This is achieved by decreasing the value of a, which also decreases A; a whale (search agent) can thereby be placed at any position between its original position and the position of the current best solution.
(2) The position is updated helically. A spiral action mimicking whale predation, creating a spiral equation between the prey and whale:
X(t+1) = D′ · e^{b·l} · cos(2πl) + X*(t)

D′ = |X*(t) − X(t)|

where D′ represents the distance from the prey (the current best solution) to the i-th whale, b is a constant that determines the shape of the spiral, and l is a random number in [−1, 1].

Assuming the two models are chosen with equal probability when a whale updates its position:

X(t+1) = X*(t) − A·D,  if p < 0.5
X(t+1) = D′ · e^{b·l} · cos(2πl) + X*(t),  if p ≥ 0.5

where p is a random number in [0, 1].
The third stage: search-for-prey stage
In the search-for-prey stage, each whale individual is first defined by the corresponding formula, where k is the spatial dimension, rand(k, 1) denotes a randomly generated k-dimensional random vector, and the distance used is the Euclidean distance.
The search direction is then determined. Each whale is provided with a left and a right direction sensor, whose update formulas involve d(t), the distance between the left and right direction sensors at the t-th iteration, the fitness value f(·), and R = 2. Then, by setting |A| ≥ 1, the positions of the other whale individuals are updated with a randomly selected whale individual as reference, forcing them away from that reference whale to find other, more suitable positions, so that the algorithm can search globally. The model can be expressed as:
D = |C·X_rand − X|

X(t+1) = X_rand − A·D

where X_rand is a random position vector, i.e. the position of a randomly selected whale.
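A minimal sketch of a standard whale optimization algorithm loop, to make the three stages above concrete. The fitness function, bounds and population size are placeholder assumptions, and the information-entropy modification of the surrounding stage described later in this document is not included here.

```python
import numpy as np

def woa(fitness, dim, bounds, n_whales=30, max_iter=200, b=1.0):
    lo, hi = bounds
    X = np.random.uniform(lo, hi, (n_whales, dim))     # whale positions
    best = min(X, key=fitness).copy()                  # current best candidate solution
    for t in range(max_iter):
        a = 2 - 2 * t / max_iter                       # decreases linearly from 2 to 0
        for i in range(n_whales):
            r = np.random.rand(dim)
            A, C = 2 * a * r - a, 2 * np.random.rand(dim)
            p, l = np.random.rand(), np.random.uniform(-1, 1)
            if p < 0.5:
                if np.all(np.abs(A) < 1):              # surrounding (encircling) stage
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D
                else:                                  # search-for-prey stage, |A| >= 1
                    X_rand = X[np.random.randint(n_whales)]
                    D = np.abs(C * X_rand - X[i])
                    X[i] = X_rand - A * D
            else:                                      # spiral (bubble-net) stage
                D_prime = np.abs(best - X[i])
                X[i] = D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lo, hi)
            if fitness(X[i]) < fitness(best):
                best = X[i].copy()
    return best
```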
Step 103: outputting a plurality of pieces of sample face image feature data from the plurality of sample face images by using the optimized feature extraction neural network, and performing optimization training on the classifier model according to the plurality of pieces of sample face image feature data and the corresponding emotion categories to obtain an optimized classifier model. This is conducive to obtaining the emotion recognition model by combining with the optimized classifier model.
Specifically, the classifier model may classify the feature-extracted samples by using the Softmax classification function, given by:

S_i = e^{Y_i} / Σ_j e^{Y_j}

where Y_i denotes the i-th element of the feature vector. The Softmax function maps the elements of the input vector into the interval (0, 1) to obtain a probability vector, and the output category of emotion recognition is the category corresponding to the largest probability value in the probability vector obtained by the Softmax mapping.
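A minimal, numerically stable Softmax sketch matching the formula above:

```python
import numpy as np

def softmax(y):
    z = y - y.max()          # shift for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

# predicted_emotion = int(np.argmax(softmax(feature_vector)))
```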
Step 104: combining the optimized feature extraction neural network with the optimized classifier model to obtain the emotion recognition model. In this method, the parameters of the feature extraction neural network of the emotion recognition model to be trained are optimized with a whale optimization algorithm whose surrounding-stage position update formula is updated at each iteration according to the real-time information entropy; gradient vanishing, gradient explosion, and falling into a local optimum during model parameter optimization can thus be avoided, and the recognition accuracy of the emotion recognition model can be improved, so that when the emotion recognition model is used for emotion recognition it is not easily affected by the quality of the face image to be recognized, and the emotion corresponding to the face image can be recognized accurately.
The emotion recognition model training method is further described below, as shown in fig. 2, that is, step 102 in fig. 1 may include the following steps:
and 1021, determining a predictive value and true value root mean square error equation of the characteristic data of the sample face image output by the computing characteristic extraction neural network according to each sample face image as a fitness function of a whale algorithm.
Specifically, the error between the predicted values and the true values of the sample face image feature data output by the feature extraction neural network for each sample face image may be expressed as:

e = Σ_{i=1}^{n} (P_i − O_i)²

E = sqrt( (1/n) · Σ_{i=1}^{n} (P_i − O_i)² )

where e represents the sum of the squared system errors, E represents the system root mean square error, n represents the number of hidden-layer nodes, P_i represents the predicted value and O_i represents the true value.
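A minimal sketch of the RMSE fitness evaluation used by the whale algorithm, under the assumption that a candidate parameter vector has already been decoded into network weights and thresholds before the forward pass (the decoding itself is omitted here):

```python
import numpy as np

def rmse_fitness(predicted, true):
    # E = sqrt( (1/n) * sum_i (P_i - O_i)^2 )
    return float(np.sqrt(np.mean((np.asarray(predicted) - np.asarray(true)) ** 2)))
```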
Step 1022, calculating the real-time information entropy of the next iteration optimization according to the fitness value of the fitness function obtained by each iteration optimization, and updating the whale position updating formula of the surrounding stage corresponding to each iteration optimization according to the real-time information entropy of the next iteration optimization to obtain the whale position updating formula of the surrounding stage corresponding to the next iteration optimization.
Optionally, the real-time information entropy of the next iteration is calculated from the fitness values of the fitness function obtained in each iteration by using the following formula:

p_i = fit_i / Σ_{j=1}^{n} fit_j

where fit_i is the fitness value of the i-th whale and n is the number of whales. The entropy value of each whale individual can then be expressed as:

E(fit_i) = − p_i · ln p_i

and the real-time information entropy is obtained by accumulating these values over all whale individuals.
in an optional embodiment of the present invention, the process of updating the surrounding stage whale position update formula corresponding to each iteration optimization according to the real-time information entropy of the next iteration optimization to obtain the surrounding stage whale position update formula corresponding to the next iteration optimization includes:
and calculating to obtain the next iteration adjustment parameters according to the real-time information entropy optimized by the next iteration and the maximum possible value of the real-time information entropy.
Optionally, the next-iteration adjustment parameter may be calculated from the real-time information entropy of the next iteration and the maximum possible value of that entropy, where H_max denotes the maximum possible value of the real-time information entropy, reached when the p_i of every individual are equal, and H denotes the real-time information entropy of the next iteration.
In an optional embodiment of the present invention, the process of updating the surrounding stage whale position update formula corresponding to each iteration optimization according to the real-time information entropy of the next iteration optimization to obtain the surrounding stage whale position update formula corresponding to the next iteration optimization includes: and adjusting the whale position updating formula of the surrounding stage corresponding to each iteration optimization by using the next iteration adjustment parameters to obtain the whale position updating formula of the surrounding stage corresponding to the next iteration optimization.
Optionally, the next-iteration adjustment parameter is substituted into the surrounding-stage whale position update formula of the current iteration to adjust it. For example, substituting the adjustment parameter into the formula X(t+1) = X_rand − A·D yields the surrounding-stage whale position update formula corresponding to the next iteration, where δ is a manually set step factor and η is the adjustment parameter. η changes with H: when H is larger the population distribution is denser, η is larger, and the step toward random individuals is increased, so that the population can be scattered, population diversity is increased, and getting trapped in a local optimum is avoided; when H is smaller the population distribution is more dispersed, η is smaller, and the population can search for the optimal solution within the solution space. Introducing η into the whole search process changes the convergence behavior of the algorithm through the change of the information entropy, realizes adaptive adjustment, and avoids premature convergence of the algorithm.
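A minimal sketch of the information-entropy-based adjustment described above. The exact form of the adjustment parameter and of the modified position update is not spelled out in the text, so the scaling used here (η = δ·H/H_max, applied to the step toward the random whale) is an illustrative assumption.

```python
import numpy as np

def entropy_adjustment(fitness_values, delta=1.0):
    fitness_values = np.asarray(fitness_values, dtype=np.float64)  # assumed positive
    p = fitness_values / fitness_values.sum()        # p_i from the whales' fitness values
    H = float(-(p * np.log(p + 1e-12)).sum())        # real-time information entropy
    H_max = np.log(len(fitness_values))              # maximum entropy (all p_i equal)
    return delta * H / H_max                         # assumed adjustment parameter eta

def adjusted_surround_update(X_i, X_rand, A, C, eta):
    D = np.abs(C * X_rand - X_i)
    return X_rand - eta * A * D                      # assumed use of eta in the update
```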
Step 1023, performing next iteration optimization on each parameter of the feature extraction neural network by using a whale position updating formula, a plurality of sample face images and corresponding emotion types in a surrounding stage corresponding to next iteration optimization until the fitness value of the fitness function is not greater than a preset fitness value threshold value, so as to obtain whale algorithm optimization neural network parameters.
According to the embodiment of the invention, the root mean square error equation of the predicted value and the true value of the feature extraction neural network is set as the fitness function, the real-time information entropy of the next iteration optimization is calculated according to the fitness value of the fitness function obtained by each iteration, and then the next iteration adjustment parameter is calculated according to the real-time information entropy so as to update the whale position updating formula of the surrounding stage of each iteration, so that the whale position updating formula of the surrounding stage corresponding to the next iteration optimization is obtained, and the whole iteration optimization process can be accurately and effectively carried out.
Further describing the emotion recognition model training method, in the embodiment of the present invention, as shown in fig. 3, that is, step 102 in fig. 1 may include the following steps:
and step 1024, determining parameters of the feature extraction neural network as parameters of the whale algorithm optimization neural network to obtain the preliminary optimization feature extraction neural network.
Specifically, in this embodiment of the invention, the root mean square error corresponding to the preliminarily optimized feature extraction neural network, whose parameters are the whale-algorithm-optimized neural network parameters, is still larger than the target root mean square error corresponding to the final optimized feature extraction neural network. For example, the target root mean square error corresponding to the optimized feature extraction neural network may be E = 1×10^-4, while the root mean square error corresponding to the preliminarily optimized feature extraction neural network is 0.01.
Step 1025, further optimizing parameters of the preliminary optimized feature extraction neural network based on the gradient descent algorithm to obtain the optimized feature extraction neural network.
Optionally, the process of further optimizing the parameters of the preliminarily optimized feature extraction neural network based on the gradient descent algorithm to obtain the optimized feature extraction neural network includes: continuing to iteratively optimize the parameters of the preliminarily optimized feature extraction neural network with a gradient descent algorithm until the corresponding root mean square error reaches the target root mean square error.
Specifically, a conventional neural network selects its initial weights and thresholds randomly and is prone to converging to a local minimum, which degrades the fitting performance, while the whale optimization algorithm is slower than a conventional gradient descent algorithm in parameter optimization. To balance optimization speed and accuracy, the information-entropy-based multi-directional whale algorithm is therefore used to obtain the initial weights and initial thresholds, and a conventional gradient descent algorithm is then used for the subsequent concentrated training and optimization. This solves the local-minimum problem and improves the prediction accuracy of the neural network algorithm while keeping the optimization speed from becoming too slow.
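A minimal sketch of the two-stage strategy described above: use the whale algorithm to find initial weights and thresholds, then fine-tune them with gradient descent. The names `woa`, `decode_params`, `rmse_fitness` and the network methods are assumed helpers for illustration, not functions defined by the patent.

```python
def train_feature_extractor(network, train_images, train_targets, target_rmse=1e-4):
    def fitness(param_vector):
        network.set_params(decode_params(param_vector))              # assumed helper
        return rmse_fitness(network.forward(train_images), train_targets)

    # Stage 1: whale-algorithm search for the initial weights and node thresholds.
    init_params = woa(fitness, dim=network.num_params(), bounds=(-1.0, 1.0))
    network.set_params(decode_params(init_params))

    # Stage 2: conventional gradient descent until the target RMSE is reached.
    while rmse_fitness(network.forward(train_images), train_targets) > target_rmse:
        network.gradient_descent_step(train_images, train_targets)   # assumed helper
    return network
```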
Fig. 4 is a schematic flow chart of an emotion recognition method according to an embodiment of the present invention, where the method may be performed by an emotion recognition device according to an embodiment of the present invention, and the device may be implemented in software and/or hardware. In a specific embodiment, the apparatus may be integrated in an electronic device, such as a computer, a server, etc. The following embodiments will be described taking the integration of the device in an electronic apparatus as an example. Referring to fig. 4, the method may specifically include the steps of:
step 401, acquiring a face image to be recognized.
Optionally, the process of acquiring the face image to be identified includes:
and shooting by using an image pickup device to obtain a facial expression image to be identified, and sequentially performing emotion category labeling, noise reduction treatment, illumination compensation, face feature detection, effective face region segmentation and standardization treatment on the facial expression image to be identified to obtain the face image to be identified.
Step 402, inputting the face image to be recognized into the emotion recognition model obtained by training by the emotion recognition model training method according to any one of the embodiments of the present invention.
And step 403, recognizing emotion corresponding to the face image to be recognized by using the emotion recognition model.
According to the emotion recognition model training method, the emotion recognition model is trained to recognize the emotion type corresponding to the face image to be recognized, the recognition process is not easily affected by the quality of the face image to be recognized, and the emotion corresponding to the face image can be accurately recognized.
Fig. 5 is a block diagram of an emotion recognition model training device according to an embodiment of the present invention, where the device is adapted to execute the emotion recognition model training method according to the embodiment of the present invention. As shown in fig. 5, the apparatus may specifically include:
the sample face image obtaining and inputting module 501 is configured to obtain a plurality of standard face images corresponding to each emotion in a plurality of emotions to obtain a plurality of sample face images, and input the plurality of sample face images into an emotion recognition model to be trained, where the emotion recognition model includes a feature extraction neural network and a classifier model. The feature extraction neural network and the classifier model of the emotion recognition model can be trained according to a plurality of sample face images.
The optimized feature extraction neural network obtaining module 502 is configured to obtain parameters of a whale algorithm optimized neural network by performing multiple iterative optimization on each parameter of the feature extraction neural network based on a whale optimization algorithm by using a whale position update formula, a plurality of sample face images and corresponding emotion types in a surrounding stage updated each time according to real-time information entropy of a whale individual, and obtain the optimized feature extraction neural network by using the parameters of the whale algorithm optimized neural network. The information entropy-based whale optimization algorithm can be used for optimizing parameters of the feature extraction neural network, the phenomenon of gradient descent or gradient explosion during parameter optimization by using the gradient descent algorithm can be avoided, and meanwhile, the situation of sinking into local optimization is avoided, so that the optimal neural network parameters are obtained, and further, the optimized feature extraction neural network with higher precision is obtained, and the emotion recognition model with higher precision is obtained.
The optimized classifier model obtaining module 503 is configured to output a plurality of pieces of sample face image feature data from the plurality of sample face images by using the optimized feature extraction neural network, and to perform optimization training on the classifier model according to the plurality of pieces of sample face image feature data and the corresponding emotion categories to obtain an optimized classifier model. This is conducive to obtaining the emotion recognition model by combining with the optimized classifier model.
The emotion recognition model obtaining module 504 is configured to combine the optimized feature extraction neural network with the optimized classifier model to obtain the emotion recognition model. In combination with modules 501-503, the parameters of the feature extraction neural network of the emotion recognition model to be trained are optimized with the whale optimization algorithm whose surrounding-stage position update formula is updated at each iteration according to the real-time information entropy; gradient vanishing, gradient explosion, and falling into a local optimum during model parameter optimization can thus be avoided and the recognition accuracy of the emotion recognition model improved, so that when the model is used for emotion recognition it is not easily affected by the quality of the face image to be recognized and the emotion corresponding to the face image can be recognized accurately.
Optionally, the sample facial image acquiring and inputting module 501 may be specifically configured to acquire a plurality of facial expression images corresponding to each emotion; and sequentially carrying out emotion type labeling, noise reduction processing, illumination compensation, face feature detection, effective face region segmentation and standardization processing on each facial expression image to obtain a plurality of standard facial images corresponding to each emotion.
Optionally, the above-mentioned optimized feature extraction neural network obtaining module 502 may be specifically configured to determine, as the fitness function of the whale algorithm, the root-mean-square-error equation between the predicted values and the true values of the sample face image feature data output by the feature extraction neural network for each sample face image;
calculating real-time information entropy of next iteration optimization according to the fitness value of the fitness function obtained by each iteration optimization, and updating a surrounding stage whale position updating formula corresponding to each iteration optimization according to the real-time information entropy of next iteration optimization to obtain a surrounding stage whale position updating formula corresponding to the next iteration optimization; and
and performing next iteration optimization on each parameter of the feature extraction neural network by using a whale position updating formula in a surrounding stage corresponding to next iteration optimization, a plurality of sample face images and corresponding emotion types until the fitness value of the fitness function is not larger than a preset fitness value threshold value, so as to obtain whale algorithm optimization neural network parameters.
Optionally, the above-mentioned optimization feature extraction neural network obtaining module 502 may be specifically configured to calculate, according to the real-time information entropy optimized by the next iteration and the maximum possible value of the real-time information entropy, obtain the next iteration adjustment parameter; and
and adjusting the whale position updating formula of the surrounding stage corresponding to each iteration optimization by using the next iteration adjustment parameters to obtain the whale position updating formula of the surrounding stage corresponding to the next iteration optimization.
Optionally, the above-mentioned optimized feature extraction neural network obtaining module 502 may be specifically configured to set the parameters of the feature extraction neural network to the whale-algorithm-optimized neural network parameters to obtain a preliminarily optimized feature extraction neural network; and
and further optimizing parameters of the preliminary optimized feature extraction neural network based on a gradient descent algorithm to obtain the optimized feature extraction neural network.
Fig. 6 is a block diagram of an emotion recognition device according to an embodiment of the present invention, where the device is adapted to execute the emotion recognition method according to the embodiment of the present invention. As shown in fig. 6, the apparatus may specifically include:
the face image to be identified acquisition module 601 is configured to acquire a face image to be identified.
The input module 602 is configured to input a face image to be recognized into an emotion recognition model obtained by training by the emotion recognition model training method according to any one of the embodiments of the present invention.
The recognition module 603 is configured to recognize emotion corresponding to the face image to be recognized by using the emotion recognition model.
According to the emotion recognition model training method, the emotion recognition model is trained to recognize the emotion type corresponding to the face image to be recognized, the recognition process is not easily affected by the quality of the face image to be recognized, and the emotion corresponding to the face image can be accurately recognized.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above. The specific working process of the functional module described above may refer to the corresponding process in the foregoing method embodiment, and will not be described herein.
The embodiment of the invention also provides electronic equipment, which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor realizes the emotion recognition model training method or the emotion recognition method provided by any one of the embodiments when executing the program.
The embodiment of the invention also provides a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the emotion recognition model training method or emotion recognition method provided in any of the above embodiments.
Referring now to FIG. 7, there is illustrated a schematic diagram of a computer system 700 suitable for use in implementing an electronic device of an embodiment of the present invention. The electronic device shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the system 700 are also stored. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output portion 707 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. The drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read therefrom is mounted into the storage section 708 as necessary.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 701.
The computer-readable medium described in the present invention may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, optical fiber cable, RF, and the like, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and/or units involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules and/or units may also be provided in a processor, which may, for example, be described as: a processor comprising a sample face image acquisition and input module, an optimized feature extraction neural network acquisition module, an optimized classifier model acquisition module, and an emotion recognition model acquisition module; alternatively, it may be described as: a processor comprising a face image acquisition module, an input module, and a recognition module. In some cases, the names of these modules do not constitute a limitation on the modules themselves.
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to implement: acquiring a plurality of standard face images corresponding to each of a plurality of emotions to obtain a plurality of sample face images, and inputting the plurality of sample face images into an emotion recognition model to be trained, wherein the emotion recognition model comprises a feature extraction neural network and a classifier model; based on a whale optimization algorithm, performing repeated iterative optimization on each parameter of the feature extraction neural network by using a surrounding-stage whale position update formula updated each time according to real-time information entropy of whale individuals, the plurality of sample face images, and the corresponding emotion types, to obtain whale algorithm optimized neural network parameters, and obtaining an optimized feature extraction neural network by using the whale algorithm optimized neural network parameters; outputting a plurality of pieces of sample face image feature data from the plurality of sample face images by using the optimized feature extraction neural network, and performing optimization training on the classifier model according to the plurality of pieces of sample face image feature data and the corresponding emotion types to obtain an optimized classifier model; and combining the optimized feature extraction neural network and the optimized classifier model to obtain the emotion recognition model.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. An emotion recognition model training method, comprising:
acquiring a plurality of standard face images corresponding to each of a plurality of emotions to obtain a plurality of sample face images, and inputting the plurality of sample face images into an emotion recognition model to be trained, wherein the emotion recognition model comprises a feature extraction neural network and a classifier model;
based on a whale optimization algorithm, performing repeated iterative optimization on each parameter of the feature extraction neural network by using a surrounding-stage whale position update formula updated each time according to real-time information entropy of whale individuals, the plurality of sample face images, and the corresponding emotion types, to obtain whale algorithm optimized neural network parameters, and obtaining an optimized feature extraction neural network by using the whale algorithm optimized neural network parameters;
outputting a plurality of pieces of sample face image feature data from the plurality of sample face images by using the optimized feature extraction neural network, and performing optimization training on the classifier model according to the plurality of pieces of sample face image feature data and the corresponding emotion types to obtain an optimized classifier model; and
combining the optimized feature extraction neural network and the optimized classifier model to obtain the emotion recognition model.
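By way of illustration only, the following Python sketch shows how the four steps of claim 1 could be wired together. The helper callables (whale_optimize, build_feature_extractor, train_classifier) and the NumPy data layout are assumptions introduced for this sketch rather than the disclosed implementation; more detailed sketches of the individual steps follow the later claims.

```python
# Minimal illustrative sketch of the claim-1 training flow (assumptions noted above).
import numpy as np

def train_emotion_model(sample_images, emotion_labels,
                        whale_optimize, build_feature_extractor, train_classifier):
    """sample_images: array of standard face images; emotion_labels: class indices."""
    # Whale optimization searches the feature extraction network's parameters
    # (see the sketch after claim 2 for one possible loop).
    best_params = whale_optimize(sample_images, emotion_labels)
    feature_net = build_feature_extractor(best_params)

    # Extract feature data for every sample image and train the classifier on it.
    features = np.stack([feature_net(img) for img in sample_images])
    classifier = train_classifier(features, emotion_labels)

    # The emotion recognition model is the combination of both optimized parts.
    def emotion_model(image):
        return classifier(feature_net(image))
    return emotion_model
```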
2. The emotion recognition model training method of claim 1, wherein the performing, based on the whale optimization algorithm, repeated iterative optimization on each parameter of the feature extraction neural network by using the surrounding-stage whale position update formula updated each time according to the real-time information entropy of the whale individuals, the plurality of sample face images, and the corresponding emotion types comprises:
determining, as a fitness function of the whale algorithm, a root mean square error equation between predicted values and true values of the sample face image feature data output by the feature extraction neural network for each sample face image;
calculating the real-time information entropy of the next iterative optimization according to the fitness value of the fitness function obtained in each iterative optimization, and updating the surrounding-stage whale position update formula corresponding to each iterative optimization according to the real-time information entropy of the next iterative optimization, to obtain the surrounding-stage whale position update formula corresponding to the next iterative optimization; and
performing the next iterative optimization on each parameter of the feature extraction neural network by using the surrounding-stage whale position update formula corresponding to the next iterative optimization, the plurality of sample face images, and the corresponding emotion types, until the fitness value of the fitness function is not greater than a preset fitness value threshold, to obtain the whale algorithm optimized neural network parameters.
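One possible concrete form of the loop described in claim 2, sketched in Python: each whale is a candidate parameter vector, the fitness is the root mean square error between predicted and true feature values, and the encircling-phase update is rescaled each iteration by a factor derived from the population's real-time information entropy. The entropy estimate (Shannon entropy of a histogram over fitness values) and the exact placement of the rescaling factor are assumptions, since the claims do not fix them; for brevity only the encircling phase is shown, and the spiral and random-search phases of the standard whale optimization algorithm are omitted.

```python
import numpy as np

def rmse_fitness(params, predict, images, targets):
    # Fitness function of claim 2: RMSE between predicted and true feature data.
    preds = np.array([predict(params, img) for img in images])
    return float(np.sqrt(np.mean((preds - targets) ** 2)))

def population_entropy(fitness_values, bins=10):
    # Assumed real-time information entropy: Shannon entropy of the normalized
    # histogram of the whale population's fitness values.
    hist, _ = np.histogram(fitness_values, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def whale_optimize_parameters(predict, images, targets, dim, n_whales=20,
                              max_iter=100, fitness_threshold=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    whales = rng.uniform(-1.0, 1.0, size=(n_whales, dim))
    h_max = np.log2(10)  # maximum possible entropy for a 10-bin histogram
    for t in range(max_iter):
        fit = np.array([rmse_fitness(w, predict, images, targets) for w in whales])
        best = whales[fit.argmin()].copy()
        if fit.min() <= fitness_threshold:      # stop condition of claim 2
            break
        lam = population_entropy(fit) / h_max   # entropy-derived factor (claim 3)
        a = 2.0 * (1.0 - t / max_iter)          # standard WOA control parameter
        for i in range(n_whales):
            A = a * (2.0 * rng.random(dim) - 1.0)
            C = 2.0 * rng.random(dim)
            D = np.abs(C * best - whales[i])
            # Encircling-phase position update, rescaled by the entropy factor.
            whales[i] = best - lam * A * D
    return best
```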
3. The emotion recognition model training method of claim 2, wherein the updating the surrounding-stage whale position update formula corresponding to each iterative optimization according to the real-time information entropy of the next iterative optimization to obtain the surrounding-stage whale position update formula corresponding to the next iterative optimization comprises:
calculating a next-iteration adjustment parameter according to the real-time information entropy of the next iterative optimization and the maximum possible value of the real-time information entropy; and
adjusting the surrounding-stage whale position update formula corresponding to each iterative optimization by using the next-iteration adjustment parameter, to obtain the surrounding-stage whale position update formula corresponding to the next iterative optimization.
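Stated symbolically, one consistent reading of claims 2 and 3 is the following; the placement of the factor in the update is an assumption, since the claims only state that the encircling-phase formula is adjusted by it:

$$\lambda_{t+1} = \frac{H_{t+1}}{H_{\max}}, \qquad D = \left| C \cdot X^{*}(t) - X(t) \right|, \qquad X(t+1) = X^{*}(t) - \lambda_{t+1}\, A \cdot D,$$

where $H_{t+1}$ is the real-time information entropy calculated for the next iterative optimization, $H_{\max}$ is its maximum possible value, $X^{*}(t)$ is the position of the current best whale, and $A$ and $C$ are the usual whale optimization algorithm coefficient vectors.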
4. The emotion recognition model training method of claim 1, wherein the obtaining an optimized feature extraction neural network by using the whale algorithm optimized neural network parameters comprises:
determining the whale algorithm optimized neural network parameters as the parameters of the feature extraction neural network, to obtain a preliminarily optimized feature extraction neural network; and
further optimizing the parameters of the preliminarily optimized feature extraction neural network based on a gradient descent algorithm, to obtain the optimized feature extraction neural network.
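A sketch of the two-stage scheme of claim 4, assuming PyTorch, a small fully connected feature extractor, and a mean-squared-error objective; all three are illustrative assumptions, since the claim specifies only that the whale-optimized parameters initialize the network and that gradient descent refines them. Starting gradient descent from the whale-found parameters lets the global search pick the region and the gradient steps do the local refinement.

```python
import torch
import torch.nn as nn

def refine_with_gradient_descent(whale_params, in_dim, feature_dim,
                                 images, targets, lr=1e-3, epochs=50):
    """whale_params: 1-D array whose length matches the network's parameter count;
    images: float tensor (N, in_dim); targets: float tensor (N, feature_dim)."""
    # Preliminarily optimized network: weights loaded from the whale-optimized
    # parameter vector (the architecture here is an assumption).
    net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                        nn.Linear(128, feature_dim))
    flat = torch.as_tensor(whale_params, dtype=torch.float32)
    offset = 0
    with torch.no_grad():
        for p in net.parameters():
            n = p.numel()
            p.copy_(flat[offset:offset + n].reshape(p.shape))
            offset += n

    # Further optimization of claim 4: plain gradient descent on an assumed
    # regression objective between extracted features and target feature data.
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(images), targets)
        loss.backward()
        opt.step()
    return net
```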
5. The emotion recognition model training method of claim 1, wherein the acquiring a plurality of standard face images corresponding to each of the plurality of emotions to obtain a plurality of sample face images comprises:
acquiring a plurality of facial expression images corresponding to each emotion; and
performing emotion type labeling, noise reduction processing, illumination compensation, face feature detection, effective face region segmentation, and standardization processing on each facial expression image in sequence, to obtain the plurality of standard face images corresponding to each emotion.
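The preprocessing chain of claim 5 could be realized roughly as in the OpenCV-based sketch below. The specific operators (non-local means denoising, histogram equalization for illumination compensation, a Haar cascade shipped with OpenCV for face detection, and resizing plus scaling to [0, 1] for standardization) are assumptions chosen for illustration; the claim does not fix them.

```python
import cv2
import numpy as np

def standardize_face_image(bgr_image, emotion_label, size=(48, 48)):
    """Returns (standard_face, emotion_label), or None if no face is detected."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    # Noise reduction processing (assumed operator: non-local means, h=10).
    denoised = cv2.fastNlMeansDenoising(gray, None, 10)
    # Illumination compensation (assumed operator: histogram equalization).
    compensated = cv2.equalizeHist(denoised)
    # Face feature detection and effective face-region segmentation.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(compensated, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
    face = compensated[y:y + h, x:x + w]
    # Standardization: fixed size, pixel values scaled to [0, 1].
    standard = cv2.resize(face, size).astype(np.float32) / 255.0
    return standard, emotion_label
```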
6. An emotion recognition method, comprising:
acquiring a face image to be recognized;
inputting the face image to be recognized into an emotion recognition model obtained by training with the emotion recognition model training method according to any one of claims 1 to 5; and
recognizing the emotion corresponding to the face image to be recognized by using the emotion recognition model.
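For claim 6, recognition at inference time is just the same preprocessing followed by a forward pass through the trained model. The sketch below reuses the hypothetical standardize_face_image helper from the claim-5 sketch and an assumed emotion label set; the trained emotion_model is taken to return an integer class index.

```python
# Assumed emotion label set for illustration; the patent does not enumerate one.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

def recognize_emotion(bgr_image, emotion_model):
    # Preprocess the face image to be recognized exactly like the training samples.
    result = standardize_face_image(bgr_image, emotion_label=None)
    if result is None:
        return None
    standard_face, _ = result
    # The trained emotion recognition model maps the face to an emotion class.
    class_index = int(emotion_model(standard_face))
    return EMOTIONS[class_index]
```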
7. An emotion recognition model training device, comprising:
a sample face image acquisition and input module, configured to acquire a plurality of standard face images corresponding to each of a plurality of emotions to obtain a plurality of sample face images, and to input the plurality of sample face images into an emotion recognition model to be trained, wherein the emotion recognition model comprises a feature extraction neural network and a classifier model;
an optimized feature extraction neural network acquisition module, configured to perform, based on a whale optimization algorithm, repeated iterative optimization on each parameter of the feature extraction neural network by using a surrounding-stage whale position update formula updated each time according to real-time information entropy of whale individuals, the plurality of sample face images, and the corresponding emotion types, to obtain whale algorithm optimized neural network parameters, and to obtain the optimized feature extraction neural network by using the whale algorithm optimized neural network parameters;
an optimized classifier model acquisition module, configured to output a plurality of pieces of sample face image feature data from the plurality of sample face images by using the optimized feature extraction neural network, and to perform optimization training on the classifier model according to the plurality of pieces of sample face image feature data and the corresponding emotion types to obtain an optimized classifier model; and
an emotion recognition model acquisition module, configured to combine the optimized feature extraction neural network and the optimized classifier model to obtain the emotion recognition model.
8. An emotion recognition device, comprising:
a face image acquisition module, configured to acquire a face image to be recognized;
an input module, configured to input the face image to be recognized into an emotion recognition model obtained by training with the emotion recognition model training method according to any one of claims 1 to 5; and
a recognition module, configured to recognize the emotion corresponding to the face image to be recognized by using the emotion recognition model.
9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the emotion recognition model training method according to any one of claims 1 to 5, or the emotion recognition method according to claim 6.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the emotion recognition model training method according to any one of claims 1 to 5, or the emotion recognition method according to claim 6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311685296.0A CN117636438A (en) | 2023-12-08 | 2023-12-08 | Emotion recognition and model training method and device, electronic equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117636438A true CN117636438A (en) | 2024-03-01 |
Family
ID=90018132
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311685296.0A Pending CN117636438A (en) | 2023-12-08 | 2023-12-08 | Emotion recognition and model training method and device, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117636438A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |