Disclosure of Invention
In view of the defects of existing motion posture correction methods, the invention aims to provide a single-person motion posture correction method based on a neural network, so as to improve the accuracy and efficiency of motion posture correction.
The idea of the invention is to build a human body joint point detection network based on spatial domain conversion, construct a standard motion data set and a common motion data set, set a scoring threshold of 50, and determine the action points that need to be corrected. The method comprises the following implementation steps:
(1) collecting a training data set:
(1a) downloading an image data set containing human body joint points and storing the image data set into a training image folder A;
(1b) downloading a label file corresponding to the data set, and storing the label file into a training label folder B;
(1c) putting the image folder and the label folder into the same folder to form a training data set;
(2) constructing a human body joint point detection network based on spatial domain conversion, which is formed by cascading an image spatial domain conversion sub-network and a human body joint point detection sub-network, wherein:
the image spatial domain conversion sub-network consists of 3 convolutional layers in sequence;
the human body joint point detection sub-network comprises 9 convolution layers and 4 deconvolution layers, in which the 4 deconvolution layers are connected in sequence between the first 8 sequentially cascaded convolution layers and the last convolution layer;
(3) training a human body joint point detection network based on spatial domain conversion:
(3a) reading a training data set image from training image folder A and inputting it into the human body joint point detection network based on spatial domain conversion constructed in step (2); the image spatial conversion sub-network in this network generates a spatially converted image, which then passes through the human body joint point detection sub-network to output the predicted coordinate values of the human body joint points;
(3b) reading the labeled coordinate values corresponding to the training data set images from training label folder B, calculating the loss value L of the human body joint point network, and training the network constructed in step (2) with this loss value using a stochastic gradient descent algorithm to obtain a trained human body joint point detection network based on spatial domain conversion;
(4) constructing a standard motion data set:
(4a) shooting a standard action video demonstrated by a standard athlete;
(4b) extracting each frame of the shot standard action video as a picture, and storing the pictures into standard picture folder C;
(4c) respectively inputting the collected pictures into a trained human body joint point detection network based on spatial domain conversion to obtain coordinate information of each human body joint point, and storing the obtained coordinate information into a standard labeling folder D;
(5) constructing a common motion data set:
(5a) shooting a non-standard motion video demonstrated by a common athlete;
(5b) extracting each frame of the shot non-standard action video as a picture, and storing the pictures into test picture folder E;
(5c) respectively inputting the collected pictures into a trained human body joint point detection network based on spatial domain conversion to obtain coordinate information of each human body joint point, and storing the obtained coordinate information into a test labeling folder F;
(6) setting a scoring threshold of 50 and determining the action points needing correction:
(6a) reading coordinate information corresponding to the test picture from the test labeling folder F;
(6b) reading coordinate information corresponding to the standard picture from the standard labeling folder D;
(6c) sequentially calculating the Euclidean distance sum of the coordinates of the joint points of the test picture and the standard picture, and taking the standard picture with the minimum Euclidean distance sum as a standard matching picture of the test picture;
(6d) calculating the Euclidean distance between each joint point in the test picture and the corresponding joint point in the standard matching picture, and counting the joint points whose distance is greater than the set scoring threshold, namely the joint points to be corrected.
Compared with the prior art, the invention has the following advantages:
1. The identification accuracy is high
The existing posture correction method depends heavily on the exercise experience and skill level of the teacher; when the teacher's experience deviates or the teacher is not proficient in a certain exercise, the exercise and training of students are often misled. The invention establishes a human body joint point detection network based on spatial domain conversion, collects standard motion videos, and defines standard actions strictly and uniformly, so that the accuracy of guidance is greatly improved.
2. The training efficiency is high
In the existing posture correction method, because the number of teachers is far smaller than the number of students, students often cannot receive effective guidance at any time. By establishing a universal motion posture detection method, the invention enables students to receive training at any time, thereby greatly improving training efficiency.
Detailed Description
Embodiments of the present invention are further described below with reference to the accompanying drawings.
Referring to fig. 1, the specific implementation steps for this example are as follows.
Step 1, a training data set is collected.
(1.1) downloading an image data set containing human body joint points from an open website and storing the image data set into a training image folder A;
(1.2) downloading a label file corresponding to the data set from the public website, and storing the label file into a training label folder B;
the label file contains coordinate information of 18 joint points in the human body, and the 18 joint points are respectively as follows: nose, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, left ankle, right eye, left eye, right ear, and left ear;
(1.3) putting the image folder and the label folder into the same folder to form a training data set.
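By way of illustration only, the training data set of step 1 might be loaded as in the following minimal Python sketch; the file layout, the JSON label format, and the helper name load_training_pairs are illustrative assumptions, not part of the invention.

```python
# Hypothetical loader for the step 1 training data set. Folder names "A" and "B"
# follow the text; the per-image JSON label format is an assumption.
import json
import os
import cv2  # OpenCV, assumed available

JOINT_NAMES = [
    "nose", "neck", "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist", "right_hip", "right_knee",
    "right_ankle", "left_hip", "left_knee", "left_ankle",
    "right_eye", "left_eye", "right_ear", "left_ear",
]  # the 18 joint points listed in step (1.2)

def load_training_pairs(image_dir="A", label_dir="B"):
    """Yield (image, joints) pairs; joints is a list of 18 (x, y) tuples."""
    for name in sorted(os.listdir(image_dir)):
        stem, _ = os.path.splitext(name)
        image = cv2.imread(os.path.join(image_dir, name))
        with open(os.path.join(label_dir, stem + ".json")) as f:
            joints = [tuple(pt) for pt in json.load(f)["joints"]]
        assert len(joints) == len(JOINT_NAMES)
        yield image, joints
```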
And 2, building a human body joint point detection network based on spatial domain conversion.
(2.1) constructing an image spatial domain conversion sub-network:
the sub-network is composed of 3 convolutional layers in sequence, wherein:
the convolution kernel size of the 1st convolution layer is 1 × 1, the number of convolution kernels is 3, and the step size is 1;
the convolution kernel size of the 2nd convolution layer is 1 × 1, the number of convolution kernels is 64, and the step size is 1;
the convolution kernel size of the 3rd convolution layer is 1 × 1, the number of convolution kernels is 3, and the step size is 1.
(2.2) constructing a human joint point detection sub-network:
the sub-network comprises 9 convolution layers and 4 deconvolution layers, with the following structural relationship: first convolution layer → second convolution layer → third convolution layer → fourth convolution layer → fifth convolution layer → sixth convolution layer → seventh convolution layer → eighth convolution layer → first deconvolution layer → second deconvolution layer → third deconvolution layer → fourth deconvolution layer → ninth convolution layer, wherein:
the convolution kernel size of the first convolution layer is 3 × 3, the number of convolution kernels is 128, and the step size is 1;
the convolution kernel size of the second convolution layer is 1 × 1, the number of convolution kernels is 256, and the step size is 2;
the convolution kernel size of the third convolution layer is 3 × 3, the number of convolution kernels is 256, and the step size is 1;
the convolution kernel size of the fourth convolution layer is 1 × 1, the number of convolution kernels is 256, and the step size is 2;
the convolution kernel size of the fifth convolution layer is 3 × 3, the number of convolution kernels is 256, and the step size is 1;
the convolution kernel size of the sixth convolution layer is 1 × 1, the number of convolution kernels is 256, and the step size is 2;
the convolution kernel size of the seventh convolution layer is 3 × 3, the number of convolution kernels is 256, and the step size is 1;
the convolution kernel size of the eighth convolution layer is 1 × 1, the number of convolution kernels is 256, and the step size is 1;
the convolution kernel size of the first deconvolution layer is 3 × 3, the number of convolution kernels is 256, and the step size is 2;
the convolution kernel size of the second deconvolution layer is 3 × 3, the number of convolution kernels is 128, and the step size is 2;
the convolution kernel size of the third deconvolution layer is 3 × 3, the number of convolution kernels is 128, and the step size is 2;
the convolution kernel size of the fourth deconvolution layer is 3 × 3, the number of convolution kernels is 128, and the step size is 1;
the convolution kernel size of the ninth convolution layer is 1 × 1, the number of convolution kernels is 18, and the step size is 1;
(2.3) cascading the constructed image spatial domain conversion sub-network with the human body joint point detection sub-network to form the human body joint point detection network based on spatial domain conversion.
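By way of illustration only, the architecture specified in step 2 can be sketched in PyTorch as follows; the ReLU activations and the padding and output-padding choices are assumptions, since the text fixes only kernel sizes, kernel counts, and step sizes.

```python
# Hypothetical PyTorch realization of the step 2 architecture. The ReLU
# activations and "same"-style padding are assumptions not given in the text.
import torch.nn as nn

def conv(cin, cout, k, s):
    # convolution layer with kernel size k and step size (stride) s
    return nn.Sequential(nn.Conv2d(cin, cout, k, stride=s, padding=k // 2), nn.ReLU())

def deconv(cin, cout, s):
    # 3x3 deconvolution (transposed convolution) layer with step size s
    out_pad = 1 if s == 2 else 0
    return nn.Sequential(
        nn.ConvTranspose2d(cin, cout, 3, stride=s, padding=1, output_padding=out_pad),
        nn.ReLU(),
    )

class JointDetectionNetwork(nn.Module):
    """Image spatial domain conversion sub-network cascaded with the
    human body joint point detection sub-network, as in step (2.3)."""
    def __init__(self):
        super().__init__()
        # (2.1) three 1x1 convolution layers: 3 -> 64 -> 3 kernels
        self.spatial = nn.Sequential(
            conv(3, 3, 1, 1), conv(3, 64, 1, 1), conv(64, 3, 1, 1))
        # (2.2) eight convolution layers, four deconvolution layers, and a
        # final 1x1 convolution producing one output map per joint point
        self.detect = nn.Sequential(
            conv(3, 128, 3, 1), conv(128, 256, 1, 2),
            conv(256, 256, 3, 1), conv(256, 256, 1, 2),
            conv(256, 256, 3, 1), conv(256, 256, 1, 2),
            conv(256, 256, 3, 1), conv(256, 256, 1, 1),
            deconv(256, 256, 2), deconv(256, 128, 2),
            deconv(128, 128, 2), deconv(128, 128, 1),
            nn.Conv2d(128, 18, kernel_size=1, stride=1),
        )

    def forward(self, x):
        return self.detect(self.spatial(x))
```

Under these assumptions the three stride-2 convolutions downsample the input by a factor of 8 and the three stride-2 deconvolutions restore it, so the 18 output maps share the input resolution.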
And 3, training a human body joint point detection network based on spatial domain conversion.
(3.1) reading a training data set image from training image folder A and inputting it into the human body joint point detection network based on spatial domain conversion constructed in step (2); the image spatial conversion sub-network in this network generates a spatially converted image, which then passes through the human body joint point detection sub-network to output the predicted coordinate values of the human body joint points;
(3.2) reading the labeled coordinate values corresponding to the images of the training data set from training label folder B, and calculating the loss value L of the human body joint point detection network based on spatial domain conversion:

L = \sum_{i=1}^{18} \left[ (x'_i - x_i)^2 + (y'_i - y_i)^2 \right]

wherein i represents the serial number of the human body joint point, x'_i and y'_i respectively represent the labeled abscissa and ordinate values of the joint point with the corresponding serial number, and x_i and y_i respectively represent the abscissa and ordinate of the predicted coordinate value output by the human body joint point detection network based on spatial domain conversion;
(3.3) using the loss value L of the human body joint point detection network based on spatial domain conversion, training the network constructed in step (2) by a stochastic gradient descent algorithm:
(3.3.1) taking the derivative of the loss value of the human body joint point detection network based on spatial domain conversion with respect to the network parameters:

F = \frac{\partial L}{\partial \theta}

wherein F represents the derivative of the loss value L of the human body joint point detection network based on spatial domain conversion with respect to its network parameter θ, and θ represents the parameters of the human body joint point detection network based on spatial domain conversion;
(3.3.2) calculating an updated value of the human body joint point detection network parameter based on the spatial domain conversion:
\theta_2 = \theta - \alpha F

wherein θ₂ represents the updated value of the parameters of the human body joint point detection network based on spatial domain conversion, and α is the learning rate of the human body joint point detection network based on spatial domain conversion, with a value of 0.00025;
(3.3.3) replacing the parameter θ of the original network with the updated parameter value θ₂;
(3.4) iterating steps (3.3.1) to (3.3.3) 150000 times to obtain the trained human body joint point detection network based on spatial domain conversion.
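By way of illustration only, the training procedure of step 3 might look like the following sketch, reusing the JointDetectionNetwork class from the step 2 sketch; the soft-argmax decoding of the 18 output maps into coordinates and the batches() data iterator are illustrative assumptions, since the text states only that predicted coordinate values are output.

```python
# Hypothetical training loop for step 3. soft_argmax (decoding output maps into
# coordinates) and batches() (an iterator of image/label tensors) are assumed.
import torch
import torch.nn.functional as F_

def soft_argmax(heatmaps):
    """Differentiably decode (B, 18, H, W) maps into (B, 18, 2) coordinates."""
    b, k, h, w = heatmaps.shape
    probs = F_.softmax(heatmaps.reshape(b, k, -1), dim=-1).reshape(b, k, h, w)
    xs = torch.arange(w, dtype=probs.dtype)
    ys = torch.arange(h, dtype=probs.dtype)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # expected column index per joint
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # expected row index per joint
    return torch.stack([x, y], dim=-1)

model = JointDetectionNetwork()
optimizer = torch.optim.SGD(model.parameters(), lr=0.00025)  # α of step (3.3.2)

for step, (images, labels) in enumerate(batches()):  # labels: (B, 18, 2)
    if step >= 150000:  # iteration count of step (3.4)
        break
    pred = soft_argmax(model(images))                     # predicted (x_i, y_i)
    loss = ((labels - pred) ** 2).sum(dim=(1, 2)).mean()  # loss L of step (3.2)
    optimizer.zero_grad()
    loss.backward()    # F = ∂L/∂θ, step (3.3.1)
    optimizer.step()   # θ₂ = θ - αF, steps (3.3.2)-(3.3.3)
```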
And 4, constructing a standard motion data set:
(4.1) shooting a standard motion video demonstrated by a standard athlete, wherein the shooting equipment is Canon EOS 5D Mark IV, and the video frame rate is 60 frames/second;
(4.2) extracting each frame of the shot standard action video as a picture, as shown in fig. 2, and storing the pictures into standard picture folder C;
(4.3) inputting each collected picture into the trained human body joint point detection network based on spatial domain conversion to obtain the coordinate information of each human body joint point, and storing the obtained coordinate information into standard labeling folder D.
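By way of illustration only, steps (4.2) and (4.3) might be implemented as follows with OpenCV; the same procedure applies to steps (5.2) and (5.3) with folders E and F. The file naming, the JSON output format, and the absence of input normalization are simplifying assumptions.

```python
# Hypothetical sketch of steps (4.2)-(4.3): extract video frames and run the
# trained network on each frame, saving the 18 joint coordinates as JSON.
import json
import os
import cv2
import torch

def video_to_joint_labels(video_path, picture_dir="C", label_dir="D", model=None):
    cap = cv2.VideoCapture(video_path)  # 60 frames/second footage, step (4.1)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # store the frame as a picture in the picture folder
        cv2.imwrite(os.path.join(picture_dir, f"{idx:06d}.jpg"), frame)
        # run the trained network to obtain the 18 joint coordinates
        tensor = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0)
        with torch.no_grad():
            coords = soft_argmax(model(tensor))[0].tolist()  # from step 3 sketch
        # store the coordinate information in the labeling folder
        with open(os.path.join(label_dir, f"{idx:06d}.json"), "w") as f:
            json.dump({"joints": coords}, f)
        idx += 1
    cap.release()
```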
And 5, constructing a common motion data set.
(5.1) shooting a non-standard action video demonstrated by a common athlete, wherein the shooting equipment is a Canon EOS 5D Mark IV, and the video frame rate is 60 frames/second;
(5.2) extracting each frame of the shot non-standard action video as a picture, as shown in fig. 3, and storing the pictures into test picture folder E;
(5.3) inputting each acquired picture into the trained human body joint point detection network based on spatial domain conversion to obtain the coordinate information of each human body joint point, and storing the obtained coordinate information into test labeling folder F.
And 6, determining action points needing to be corrected.
(6.1) reading the coordinate information corresponding to the test picture from the test labeling folder F;
(6.2) reading coordinate information corresponding to the standard picture from the standard labeling folder D;
(6.3) sequentially calculating the sum of the Euclidean distances between the joint point coordinates of the test picture and those of each standard picture:

P = \sum_{i=1}^{18} \left[ (a'_i - a_i)^2 + (b'_i - b_i)^2 \right]

wherein P represents the sum of the Euclidean distances between the joint point coordinates of the test picture and the standard picture, i represents the serial number of the human body joint point, a'_i and b'_i respectively represent the abscissa and ordinate values of the joint point with the corresponding serial number in the test picture, and a_i and b_i respectively represent the abscissa and ordinate values of the joint point with the corresponding serial number in the standard picture.
(6.4) taking the standard picture with the minimum sum of Euclidean distances as the standard matching picture of the test picture;
(6.5) calculating the Euclidean distance between each joint point in the test picture and the corresponding joint point in the standard matching picture:

Q_j = (c'_j - c_j)^2 + (d'_j - d_j)^2, \quad j = 1, 2, \ldots, 18

wherein Q_j represents the Euclidean distance between the coordinates of the j-th joint point of the test picture and that of the standard matching picture, j represents the serial number of the human body joint point, c'_j and d'_j respectively represent the abscissa and ordinate values of the joint point with the corresponding serial number in the test picture, and c_j and d_j respectively represent the abscissa and ordinate values of the joint point with the corresponding serial number in the standard matching picture.
(6.6) setting the scoring threshold to 50, and counting the joint points whose Euclidean distance Q_j between the test picture and the standard matching picture is greater than the scoring threshold; these joint points are the joint points to be corrected.
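By way of illustration only, steps (6.1) to (6.6) reduce to the following minimal sketch; the in-memory representation of the coordinate files as lists of 18 (x, y) pairs is an assumption.

```python
# Hypothetical sketch of step 6: match each test frame to its closest standard
# frame by the distance sum P, then flag joints whose distance Q_j exceeds 50.
def distance_sum(test_joints, std_joints):
    """P of step (6.3): summed squared differences over all 18 joints."""
    return sum((a2 - a) ** 2 + (b2 - b) ** 2
               for (a2, b2), (a, b) in zip(test_joints, std_joints))

def joints_to_correct(test_joints, standard_frames, threshold=50):
    # step (6.4): the standard picture minimizing P is the standard matching picture
    match = min(standard_frames, key=lambda s: distance_sum(test_joints, s))
    # steps (6.5)-(6.6): flag joints whose distance Q_j exceeds the threshold
    return [j for j, ((c2, d2), (c, d)) in enumerate(zip(test_joints, match))
            if (c2 - c) ** 2 + (d2 - d) ** 2 > threshold]
```

The returned indices can be mapped back to joint names through the JOINT_NAMES list of the step 1 sketch.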
The foregoing description is only an example of the present invention and is not intended to limit the invention, so that it will be apparent to those skilled in the art that various changes and modifications in form and detail may be made therein without departing from the spirit and scope of the invention.