CN108830139A

CN108830139A - Depth context prediction method, device, medium and equipment for human body key points

Info

Publication number: CN108830139A
Application number: CN201810395949.4A
Authority: CN
Inventors: 汪旻; 刘文韬; 钱晨
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2018-04-27
Filing date: 2018-04-27
Publication date: 2018-11-16

Abstract

The embodiments of the present application disclose a depth context prediction method for human body key points, a neural network training method, devices, an electronic device, a computer-readable storage medium and a computer program. The depth context prediction method for human body key points includes: obtaining an image to be processed; and providing the image to be processed to a neural network, which performs depth context prediction processing for human body key points to obtain the depth context of the human body key points, where the depth context of the human body key points indicates the relative depth positions (front-back ordering) of the human body key points. The technical solution provided by the present application helps improve the accuracy of 3D human pose prediction, and thereby helps avoid the adverse effects that erroneous 3D human pose prediction has on interactive entertainment, behavior analysis and the like.

Description

Depth context prediction method, device, medium and equipment for human body key points

Technical field

The present application relates to computer vision technology, and in particular to a depth context prediction method for human body key points, a depth context prediction device for human body key points, a neural network training method, a neural network training device, an electronic device, a computer-readable storage medium and a computer program.

Background art

3D human pose prediction plays a role in technical fields such as interactive entertainment and behavior analysis.

During 3D human pose prediction, errors in predicting the depth of human body key points often lead to an erroneous 3D human pose. For example, an arm that should be in front of the body may end up predicted as being behind the body because the depth of the corresponding key points was predicted incorrectly. Erroneous 3D human pose prediction adversely affects interactive entertainment, behavior analysis and the like. How to improve the accuracy of 3D human pose prediction is a technical problem worth attention.

Summary of the invention

The embodiments of the present application provide technical solutions for predicting the depth context of human body key points and for training a neural network.

According to one aspect of the embodiments of the present application, a depth context prediction method for human body key points is provided. The method includes: obtaining an image to be processed; and providing the image to be processed to a neural network, which performs depth context prediction processing for human body key points to obtain the depth context of the human body key points, where the depth context of the human body key points indicates the relative depth positions of the human body key points.

In one embodiment of the present application, obtaining the image to be processed includes: obtaining the image to be processed and feature maps of at least two human body key points of the image to be processed; and providing the image to be processed to the neural network includes: providing the image to be processed and the feature maps of the human body key points to the neural network.

In another embodiment of the present application, the feature map of a human body key point includes: a heat map of the human body key point.

In a further embodiment of the present application, performing the depth context prediction processing for human body key points via the neural network includes: forming, via the neural network, feature values of at least two human body key points according to the image to be processed and the feature maps of the human body key points, obtaining the differences between the feature values, and forming the depth context of the human body key points based on the differences.

In another embodiment of the present application, the depth context of the human body key points includes: information characterizing whether one human body key point is in front of or behind another human body key point.

In another embodiment of the present application, the information characterizing whether one human body key point is in front of or behind another human body key point includes: a probability value that one human body key point is in front of or behind another human body key point.

In another embodiment of the present application, the depth context of the human body key points includes: a depth context matrix of the human body key points, where the number of rows and the number of columns of the matrix both equal the number of human body key points, the n-th row of the matrix represents the n-th human body key point, the m-th column of the matrix represents the m-th human body key point, and the value in row n, column m of the matrix represents the probability value that the n-th human body key point is in front of or behind the m-th human body key point.

In another embodiment of the present application, the neural network is trained in advance using a plurality of image samples provided with depth context annotation information of human body key points, where the depth context annotation information of the human body key points indicates the relative depth positions of the human body key points.

According to another aspect of the embodiments of the present application, a training method for a neural network is provided. The method includes: obtaining an image sample; providing the image sample to a neural network to be trained, which performs depth context prediction processing for human body key points to obtain the depth context of the human body key points; and supervising the depth context of the human body key points with the depth context annotation information of the human body key points of the image sample, so that the neural network to be trained undergoes supervised learning.

In one embodiment of the present application, obtaining the image sample includes: obtaining the image sample and feature maps of at least two human body key points of the image sample; and providing the image sample to the neural network to be trained includes: providing the image sample and the feature maps of the human body key points to the neural network to be trained.

In another embodiment of the present application, the depth context annotation information of the human body key points of the image sample is formed from the annotated coordinates of the human body key points of the image sample in three-dimensional space; alternatively, the depth context annotation information of the human body key points of the image sample is formed by manual annotation.

In a further embodiment of the present application, the depth context annotation information of the human body key points of the image sample includes: annotation information characterizing whether one human body key point is in front of or behind another human body key point.

In a further embodiment of the present application, the annotation information characterizing whether one human body key point is in front of or behind another human body key point includes: a probability annotation value that one human body key point is in front of or behind another human body key point.

In a further embodiment of the present application, the depth context annotation information of the human body key points includes: a depth context annotation matrix of the human body key points, where the number of rows and the number of columns of the annotation matrix both equal the number of human body key points, the n-th row of the annotation matrix represents the n-th human body key point, the m-th column of the annotation matrix represents the m-th human body key point, and the annotation value in row n, column m of the annotation matrix represents the probability annotation value that the n-th human body key point is in front of or behind the m-th human body key point.

In a further embodiment of the present application, the probability annotation value is: a first annotation value, indicating that the annotated depth coordinate of one human body key point in three-dimensional space is greater than the sum of the annotated depth coordinate of another human body key point in three-dimensional space and a predetermined value; or a second annotation value, indicating that the annotated depth coordinate of one human body key point in three-dimensional space is less than the annotated depth coordinate of another human body key point in three-dimensional space minus the predetermined value; or a third annotation value, indicating that the absolute value of the difference between the annotated depth coordinates of the two human body key points in three-dimensional space does not exceed the predetermined value.
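The three-way annotation rule above can be illustrated with a short sketch. It is not taken from the application: the concrete label values 1.0, 0.0 and 0.5, the function name `depth_context_labels` and the parameter name `margin` (standing in for the predetermined value) are assumptions made purely for illustration, and the text does not state which sign of the depth axis corresponds to "in front".

```python
import numpy as np

# Assumed numeric values for the "first", "second" and "third" annotation
# values; the text names them but does not fix these numbers here.
FIRST, SECOND, THIRD = 1.0, 0.0, 0.5

def depth_context_labels(z, margin):
    """Build a K x K annotation matrix from K annotated depth coordinates.

    z      : sequence of K depth coordinates, one per human body key point
    margin : the predetermined value used as a tolerance (hypothetical name)
    """
    z = np.asarray(z, dtype=float)
    diff = z[:, None] - z[None, :]        # diff[n, m] = z[n] - z[m]
    labels = np.full(diff.shape, THIRD)   # |z[n] - z[m]| <= margin
    labels[diff > margin] = FIRST         # z[n] > z[m] + margin
    labels[diff < -margin] = SECOND       # z[n] < z[m] - margin
    return labels

labels = depth_context_labels([0.0, 0.3, 1.0], margin=0.5)
```

With these assumed values the diagonal always receives the third annotation value, since a key point's depth never differs from its own.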

According to another aspect of the embodiments of the present application, a depth context prediction device for human body key points is provided. The device includes: a first obtaining module, configured to obtain an image to be processed; and a first depth context module that includes a neural network, configured to provide the image to be processed to the neural network, which performs depth context prediction processing for human body key points to obtain the depth context of the human body key points, where the depth context of the human body key points indicates the relative depth positions of the human body key points.

In one embodiment of the present application, the first obtaining module is further configured to obtain the image to be processed and feature maps of at least two human body key points of the image to be processed; and the first depth context module is further configured to provide the image to be processed and the feature maps of the human body key points to the neural network.

In a further embodiment of the present application, the neural network includes: a first unit, configured to form feature values of at least two human body key points according to the image to be processed and the feature maps of the human body key points; a second unit, configured to obtain the differences between the feature values; and a third unit, configured to form the depth context of the human body key points based on the differences.

In a further embodiment of the present application, the second unit includes: a vector difference computing unit, configured to compute the feature value difference for each pair of feature values among the feature values of multiple human body key points, so as to obtain the difference between each pair of feature values.

In a further embodiment of the present application, the third unit includes: a context forming unit, configured to form the depth context of the human body key points according to at least one difference.

According to another aspect of the embodiments of the present application, a training device for a neural network is provided. The device includes: a second obtaining module, configured to obtain an image sample; a second depth context module that includes a neural network to be trained, configured to provide the image sample to the neural network to be trained, which performs depth context prediction processing for human body key points to obtain the depth context of the human body key points; and a supervision module, configured to supervise the depth context of the human body key points with the depth context annotation information of the human body key points of the image sample, so that the neural network to be trained undergoes supervised learning.

In one embodiment of the present application, the second obtaining module is further configured to obtain the image sample and feature maps of at least two human body key points of the image sample; and the second depth context module is further configured to provide the image sample and the feature maps of the human body key points to the neural network to be trained.

In another embodiment of the present application, the device further includes: a first annotation module, configured to form the depth context annotation information of the human body key points of the image sample from the annotated coordinates of the human body key points of the image sample in three-dimensional space; or a second annotation module, configured to provide a manual annotation interface and to form the depth context annotation information of the human body key points of the image sample according to information received through the manual annotation interface.

According to another aspect of the embodiments of the present application, an electronic device is provided, including: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, where the computer program, when executed, implements any method embodiment of the present application.

According to another aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements any method embodiment of the present application.

According to another aspect of the embodiments of the present application, a computer program is provided, including computer instructions that, when run in a processor of a device, implement any method embodiment of the present application.

Based on the depth context prediction method for human body key points, the depth context prediction device for human body key points, the neural network training method, the neural network training device, the electronic device, the computer-readable storage medium and the computer program provided by the present application, the depth context of human body key points can be predicted using a neural network. Because the depth context of human body key points indicates the relative depth positions of the human body key points, it can provide reference information for 3D human pose prediction, which helps avoid prediction errors during 3D human pose prediction. It follows that the technical solution provided by the present application helps improve the accuracy of 3D human pose prediction, and thereby helps avoid the adverse effects that erroneous 3D human pose prediction has on interactive entertainment, behavior analysis and the like.

The technical solutions of the present application are described in further detail below with reference to the drawings and embodiments.

Brief description of the drawings

The drawings, which constitute a part of the specification, illustrate embodiments of the present application and, together with the description, serve to explain the principles of the present application.

The present application can be understood more clearly from the following detailed description with reference to the drawings, in which:

Fig. 1 is a flow chart of one embodiment of the depth context prediction method for human body key points of the present application;

Fig. 2 is a schematic diagram of an image to be processed of the present application;

Fig. 3 is a schematic diagram of one embodiment of the heat maps of 16 human body key points of the image to be processed shown in Fig. 2;

Fig. 4 is a schematic diagram of one embodiment of the human body key points of the image to be processed shown in Fig. 2;

Fig. 5 is a schematic diagram of one embodiment of the depth context matrix of the human body key points of the present application;

Fig. 6 is a schematic diagram of one embodiment of the depth context prediction processing for human body key points performed on the image to be processed shown in Fig. 2;

Fig. 7 is a flow chart of one embodiment of the training method for the neural network of the present application;

Fig. 8 is a structural schematic diagram of one embodiment of the depth context prediction device for human body key points of the present application;

Fig. 9 is a structural schematic diagram of one embodiment of the training device for the neural network of the present application;

Fig. 10 is a block diagram of an exemplary device for implementing the embodiments of the present application.

Detailed description of the embodiments

Various exemplary embodiments of the present application are now described in detail with reference to the drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present application.

It should also be understood that, for ease of description, the sizes of the parts shown in the drawings are not drawn to actual scale.

The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present application or its application or use.

Techniques, methods and devices known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the specification.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be discussed further in subsequent drawings.

The embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems and servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments and/or configurations suitable for use with electronic devices such as terminal devices, computer systems and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.

Electronic devices such as terminal devices, computer systems and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. In general, program modules may include routines, programs, object programs, components, logic, data structures and the like, which perform specific tasks or implement specific abstract data types. A computer system/server can be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media that include storage devices.

Exemplary embodiment

Fig. 1 is a flow chart of one embodiment of the depth context prediction method for human body key points of the present application.

As shown in Fig. 1, the method of this embodiment mainly includes step S100 and step S110, which are as follows:

S100: Obtain an image to be processed.

S110: Provide the image to be processed to a neural network, which performs depth context prediction processing for human body key points to obtain the depth context of the human body key points.

In an optional example, the image to be processed in the present application may be an original image to be processed, or a local image that contains a human body within the original image to be processed. The image to be processed may also be an image obtained by processing a local image that contains a human body within the original image to be processed. In addition, the image to be processed may be a still image such as a picture or a photo, or a video frame of a video. A specific example of the image to be processed in the present application is shown in Fig. 2.

In an optional example, the human body contained in the image to be processed may be a complete human body (for example, the image to be processed shown in Fig. 2 contains a complete human body). The human body contained in the image to be processed may also be a partial human body, for example because of occlusion or the shooting angle. The present application does not limit the specific form of the human body in the image to be processed.

In an optional example, the present application may obtain not only the image to be processed but also the feature maps of at least two human body key points of the image to be processed. The image to be processed has at least two human body key points. In general, it has multiple human body key points, for example 12, 14 or 16. The number of human body key points may be chosen on the principle that the key points can roughly describe the human pose. The present application does not limit the specific number of human body key points. Correspondingly, at least two feature maps of human body key points are obtained for the image to be processed. In general, multiple feature maps are obtained, usually one for every human body key point; for example, when the image to be processed has 12, 14 or 16 human body key points, the present application may obtain 12, 14 or 16 feature maps of human body key points for that image. The present application does not limit the specific number of feature maps of human body key points obtained for the image to be processed.

In an optional example, the feature map of a human body key point in the present application is generally used to represent the features of that human body key point. For example, the feature map of a human body key point may specifically be a heat map of the human body key point. The present application does not limit the specific form of the feature map of a human body key point. In the following description, the technical solution of the present application is sometimes explained using heat maps of human body key points as an example; this does not mean that the present application must use heat maps of human body key points.

In an optional example, the present application may obtain the heat maps of the human body key points of the image to be processed using existing two-dimensional human body key point feature extraction techniques. For example, the present application may use a neural network for extracting human body key point features (hereinafter referred to as the feature extraction neural network) to obtain the heat map of each human body key point of the image to be processed. Specifically, the image to be processed may be provided to the feature extraction neural network, which performs human body key point feature extraction; from the information output by the feature extraction neural network, the present application can obtain the heat maps of at least two (for example, all) human body key points in the image to be processed.

In an optional example, when the number of human body key points is 16, the present application can obtain the heat maps of 16 human body key points for the image to be processed shown in Fig. 2. A specific example of the heat maps of these 16 human body key points is shown in Fig. 3. In Fig. 3, each of the 16 heat maps corresponds to one human body key point, and different heat maps correspond to different human body key points. That is, the heat maps of the 16 human body key points shown in Fig. 3 correspond to 16 human body key points, which may specifically be the 16 human body key points numbered 0-15 shown in Fig. 4. Each heat map shown in Fig. 3 contains one highlighted spot, which can generally be regarded as the hotspot location of the human body key point in that heat map. For example, the heat map in the upper-left corner of Fig. 3 corresponds to the human body key point numbered 5 in Fig. 4. As another example, the heat map in the lower-left corner of Fig. 3 corresponds to the human body key point numbered 13 in Fig. 4.
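As a small illustration of the highlighted spot in each heat map, the following sketch locates the brightest pixel of every heat map. It assumes the heat maps are stacked into one array of shape (K, H, W); that layout, the 64 x 64 resolution, the synthetic data and the function name `hotspot_locations` are illustrative assumptions and are not specified in the application.

```python
import numpy as np

def hotspot_locations(heatmaps):
    """Return the (row, col) of the brightest pixel in each heat map.

    heatmaps : array of shape (K, H, W), one heat map per key point
               (assumed layout, chosen for this sketch only)
    """
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)   # index in flattened map
    rows, cols = np.unravel_index(flat_idx, (h, w))     # back to 2D coordinates
    return np.stack([rows, cols], axis=1)               # shape (K, 2)

# Synthetic stand-in for 16 heat maps, each with a single bright pixel.
heatmaps = np.zeros((16, 64, 64))
for i in range(16):
    heatmaps[i, 4 * i, 63 - 4 * i] = 1.0

locs = hotspot_locations(heatmaps)
```

Real heat maps would typically contain a smooth blob rather than a single bright pixel; taking the maximum is only one simple way to read off the hotspot location.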

In an optional example, the feature extraction neural network in the present application may include, but is not limited to, convolutional layers, non-linear ReLU layers, pooling layers, fully connected layers and the like; the more layers the feature extraction neural network contains, the deeper the network. As another example, the network structure of the feature extraction neural network may be, but is not limited to, a network structure used by neural networks such as AlexNet, the deep residual network (Deep Residual Network, ResNet) or VGGNet (Visual Geometry Group Network). The present application does not limit the specific way in which the heat maps of human body key points are obtained, nor the network structure of the feature extraction neural network.

In an optional example, the depth context of the human body key points in the present application can be used to predict the 3D human pose; for example, the depth context of the human body key points is provided as input to a corresponding neural network, so that this neural network performs the corresponding 3D human pose prediction processing. The depth context of the human body key points can of course also be used for other purposes, such as predicting the depth values of human body key points. The present application does not limit the specific application scenarios of the depth context of human body key points.

In an optional example, the depth context of the human body key points in the present application generally represents the relative depth positions of the human body key points; for example, for two human body key points, the depth context represents the front-back positional relationship between these two human body key points. The two human body key points may be any two human body key points among all the human body key points, or two pre-specified human body key points.

In an optional example, the depth context of the human body key points in the present application may include: for any two human body key points, information characterizing whether one of them is in front of or behind the other. For example, the depth context may include, for every pair of key points among all the human body key points, information characterizing whether one key point of the pair is in front of or behind the other. This information may specifically be a probability value that one human body key point is in front of or behind the other human body key point.

In an optional example, the depth context of the human body key points in the present application may be a depth context matrix of the human body key points, where the number of rows and the number of columns of the matrix both equal the number of human body key points of the image to be processed; for example, when the number of human body key points is an integer A, the matrix may be an A × A matrix. The n-th row of the matrix represents the n-th human body key point (for example, the human body key point numbered n), and the m-th column of the matrix represents the m-th human body key point (for example, the human body key point numbered m). The value in row n, column m of the matrix represents the probability value that the n-th human body key point is in front of or behind the m-th human body key point. The probability value typically ranges from 0 to 1. The depth context of the human body key points may also be represented in other forms, such as an array; the present application does not limit the specific form of the depth context of human body key points.

A specific example of the depth context matrix of the human body key points in the present application is shown in Fig. 5. Fig. 5 shows a matrix of 16 rows × 16 columns; that is, the number of human body key points is 16. The matrix shown in Fig. 5 contains 16 × 16 values (256 values in total). Each value in the matrix is a probability value between 0 and 1. For example, the value in row 0, column 2 of the matrix shown in Fig. 5 is 0.2, indicating that the probability that the human body key point numbered 0 in Fig. 4 is in front of the human body key point numbered 2 in Fig. 4 is 0.2. As another example, the value in row 1, column 15 of the matrix shown in Fig. 5 is 0.3, indicating that the probability that the human body key point numbered 1 in Fig. 4 is in front of the human body key point numbered 15 in Fig. 4 is 0.3. It should be noted that the value in row n, column n of the matrix shown in Fig. 5 is always set to 0.5, indicating that the probability value that the human body key point numbered n in Fig. 4 is in front of itself is 0.5. A probability value of 0.5 in the present application can be understood as: front and back are hard to distinguish. In addition, "in front of" in this paragraph may also be replaced with "behind".
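To make the reading of such a matrix concrete, the sketch below builds a 16 × 16 matrix containing only the entries quoted in the text (0.2 at row 0, column 2; 0.3 at row 1, column 15; 0.5 on the diagonal) and turns an entry into a verbal description. All other entries are filled with 0.5 as placeholders rather than taken from Fig. 5, and the function name `describe_pair` and the tolerance `tol` are hypothetical.

```python
import numpy as np

K = 16
# Placeholder matrix: 0.5 everywhere except the two entries quoted in the text.
matrix = np.full((K, K), 0.5)
matrix[0, 2] = 0.2    # P(key point 0 is in front of key point 2)
matrix[1, 15] = 0.3   # P(key point 1 is in front of key point 15)

def describe_pair(matrix, n, m, tol=0.1):
    """Describe the front-back relation of key point n relative to key point m.

    tol is an assumed tolerance around 0.5 within which the order is treated
    as hard to distinguish; the application does not define such a threshold.
    """
    p = matrix[n, m]
    if abs(p - 0.5) <= tol:
        return "undetermined"
    return "in front" if p > 0.5 else "behind"

relation_0_2 = describe_pair(matrix, 0, 2)
relation_3_3 = describe_pair(matrix, 3, 3)
```

Under this reading, a low probability of being in front is interpreted as the key point most likely being behind the other one.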

Representing the depth context of every pair of human body key points among all the human body key points as a matrix of probability values expresses the depth context of the human body key points in a clear and orderly way, which facilitates subsequent 3D human pose prediction processing based on the depth context of the human body key points.

In an optional example, the neural network used in the present application to perform depth context processing for human body key points (hereinafter referred to as the depth context neural network) may first form feature values of at least two human body key points according to the image to be processed; next, the depth context neural network computes the difference between two feature values from the feature values it has formed (for example, it computes the difference between every pair of feature values among all the feature values); finally, the depth context neural network forms the depth context of the human body key points based on the computed differences.

In an optional example, in addition to supplying the image to be processed to the depth context neural network, the present application may also supply the feature maps of the human body key points of the image to be processed to the depth context neural network. For example, the present application merges (i.e., concatenates) the image to be processed with the feature maps of the human body key points to form a tensor of the image to be processed (which may also be referred to as an input tensor), and supplies this tensor to the depth context neural network, so that the depth context prediction processing of the human body key points is performed via the depth context neural network and the present application can obtain the depth context of the human body key points from the information output by the depth context neural network. By supplying the feature maps of the human body key points to the depth context neural network, the present application helps improve the accuracy of the depth context prediction processing performed by the depth context neural network, and thus helps improve the prediction accuracy of the depth context of the human body key points.
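The merging step can be sketched as a concatenation along the channel axis. The text only states that the image and the feature maps are merged (concatenated); the channel-first layout, the three-channel image, the shared resolution, and the name `concat_channels` are assumptions made for illustration.

```python
def concat_channels(image, heatmaps):
    """Stack image channels and key point feature maps along the channel axis.

    image:    list of C channels, each an H x W list of lists
    heatmaps: list of K channels, each an H x W list of lists
    returns:  list of C + K channels (the "input tensor")
    """
    h, w = len(image[0]), len(image[0][0])
    for channel in list(image) + list(heatmaps):
        if len(channel) != h or any(len(row) != w for row in channel):
            raise ValueError("all channels must share the same height and width")
    return list(image) + list(heatmaps)

H, W, K = 4, 4, 16
image = [[[0.0] * W for _ in range(H)] for _ in range(3)]      # 3 channels, assumed
heatmaps = [[[0.0] * W for _ in range(H)] for _ in range(K)]   # one map per key point
input_tensor = concat_channels(image, heatmaps)
print(len(input_tensor), H, W)  # channels, height, width
```

With 16 key points and a three-channel image, the resulting tensor has 19 channels.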

In an optional example, the depth context neural network in the present application may first form feature values of at least two human body key points (for example, the feature values of all human body key points) according to the image to be processed and the feature maps of the human body key points. Then, the depth context neural network calculates, from the feature values it has formed, the difference between two feature values (for example, the differences between all pairs of feature values among all the feature values). Finally, the depth context neural network forms the depth context of the human body key points based on the calculated differences.

In an optional example, the depth context neural network in the present application may include three parts: a residual network unit, a vector difference calculation unit, and a context forming unit. The depth context prediction of human body key points is implemented by these three parts.

In an optional example, the residual network unit in the present application is used to form feature values of at least two human body key points according to the image to be processed, or according to the image to be processed and the heat maps of the human body key points. The present application may merge the image to be processed with the feature map of at least one human body key point (for example, the feature maps of all human body key points) to form an input tensor, and supply the input tensor to the residual network unit in the depth context neural network; the residual network unit then forms a plurality of feature values (which may also be referred to as a feature vector) for the input tensor. For example, for the image to be processed shown in Fig. 2 and the heat maps of the 16 human body key points shown in Fig. 3, the residual network unit forms 16 feature values (which may also be referred to as a 16-dimensional feature vector). Each feature value corresponds to one human body key point and to the feature map of that human body key point.

In an optional example, the residual network unit in the present application may include, but is not limited to, convolutional layers, nonlinear ReLU layers, pooling layers, fully connected layers, and the like. The more layers the residual network unit contains, the deeper the network. The residual network unit in the present application may specifically be ResNet-18, ResNet-34, ResNet-50, or the like. The present application does not limit the network structure of the residual network unit.

In an optional example, the vector difference calculation unit in the present application is used to perform a feature value difference calculation on two feature values (for example, all pairs of feature values among all the feature values) of the feature values output by the residual network unit (for example, all the feature values, such as the 16-dimensional feature vector mentioned above), and to output the calculated differences, for example, the differences between all pairs of feature values among all the feature values.

In an optional example, the feature value difference calculation performed by the vector difference calculation unit in the present application can be expressed in the form of the following formula (1):

$$F_{ij} = F_i - F_j \qquad \text{Formula (1)}$$

In formula (1) above, $F_{ij}$ denotes the difference between feature value $F_i$ and feature value $F_j$; $F_i$ denotes the feature value of the i-th human body key point; and $F_j$ denotes the feature value of the j-th human body key point.
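Formula (1), applied to every pair of key points, can be sketched as follows. The sketch assumes one scalar feature value per key point, as in the 16-value example above; the name `pairwise_differences` and the sample values are hypothetical.

```python
def pairwise_differences(features):
    """Return the K x K matrix whose entry [i][j] is features[i] - features[j]."""
    return [[f_i - f_j for f_j in features] for f_i in features]

features = [0.5, -1.0, 2.0]   # illustrative feature values for 3 key points
diff = pairwise_differences(features)
for row in diff:
    print(row)
```

By construction the diagonal entries are zero and the matrix is antisymmetric, since F_ji = -F_ij.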

In an optional example, the context forming unit in the present application is used to convert the differences output by the vector difference calculation unit into the depth context of the human body key points. For example, the context forming unit converts each of the differences output by the vector difference calculation unit into a probability value between 0 and 1, so that the present application can obtain the depth context matrix of the human body key points, such as the 16 × 16 matrix shown in Fig. 5.

In an optional example, the conversion operation performed by the context forming unit in the present application can be expressed in the form of the following formula (2):

In formula (2) above, $P_{ij}$ denotes the probability value; $F_{ij}$ denotes the difference between the feature value $F_i$ of the i-th human body key point and the feature value $F_j$ of the j-th human body key point.
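The expression of formula (2) is not reproduced in this text. Purely as an assumption, the sketch below uses the logistic (sigmoid) function, which is one mapping consistent with the description: it converts any real difference into a value between 0 and 1 and maps a zero difference to 0.5, matching the diagonal value of the matrix in Fig. 5. The name `to_probability` is hypothetical.

```python
import math

def to_probability(diff):
    """Logistic function, used here as an ASSUMED stand-in for formula (2).

    Written in a numerically stable form so large |diff| does not overflow.
    """
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)

for d in (-2.0, 0.0, 2.0):
    print(d, to_probability(d))
```

Under this assumption, the two probabilities for a pair of key points sum to 1, because the corresponding differences have opposite signs.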

In an optional example, one embodiment in which the present application obtains the depth context prediction result of human body key points by using a depth context neural network that includes a residual network unit, a vector difference calculation unit and a context forming unit is shown in Fig. 6. Specifically, the leftmost part of Fig. 6 is the image to be processed. By supplying the image to be processed to a feature extraction neural network (such as the neural network for predicting two-dimensional human pose in Fig. 6), the heat maps of all human body key points of the image to be processed can be obtained (such as the multiple heat maps in the middle of Fig. 6). The present application may merge (i.e., concatenate) the image to be processed with the heat maps of all human body key points to form an input tensor, and supply the input tensor to the residual network unit in the depth context neural network (such as the ResNet in the middle of Fig. 6). The ResNet calculates the feature value of each human body key point from the input tensor, so that the present application can obtain the feature value of each human body key point from the information output by the ResNet. The feature values of all human body key points output by the ResNet are supplied to the vector difference calculation unit (such as the Pairwise Layer to the right of the middle of Fig. 6), which calculates the differences between the feature values of all pairs of human body key points. All differences calculated by the vector difference calculation unit are supplied to the context forming unit (such as the Rank Transfer to the right of the middle of Fig. 6), which converts each difference into a probability value between 0 and 1, thereby obtaining the depth context matrix of the human body key points (such as the probability matrix at the far right of Fig. 6).
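The data flow of Fig. 6 can be sketched end to end as below. This is a structural illustration only: `stub_residual_unit` is a hypothetical placeholder with no learned meaning that stands in for the ResNet, and the sigmoid in `rank_transfer` is an assumed conversion, since formula (2) is not reproduced in this text.

```python
import math

def stub_residual_unit(input_tensor, num_keypoints):
    """Hypothetical stand-in for the ResNet: one scalar per key point.

    It simply averages each key point's heat map channel; a trained
    residual network would replace this.
    """
    heatmaps = input_tensor[-num_keypoints:]
    return [sum(map(sum, hm)) / (len(hm) * len(hm[0])) for hm in heatmaps]

def pairwise_layer(features):
    """Difference F_i - F_j for every pair of key points."""
    return [[f_i - f_j for f_j in features] for f_i in features]

def rank_transfer(diffs):
    """Map each difference to (0, 1); the sigmoid is an assumption."""
    return [[1.0 / (1.0 + math.exp(-d)) for d in row] for row in diffs]

def predict_depth_context(image, heatmaps):
    input_tensor = list(image) + list(heatmaps)      # channel concatenation
    features = stub_residual_unit(input_tensor, len(heatmaps))
    return rank_transfer(pairwise_layer(features))

H, W, K = 2, 2, 3
image = [[[0.0] * W for _ in range(H)] for _ in range(3)]
heatmaps = [[[float(k)] * W for _ in range(H)] for k in range(K)]
matrix = predict_depth_context(image, heatmaps)
print(len(matrix), len(matrix[0]))
```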

In an optional example, the depth context neural network in the present application is trained using a plurality of image samples, and each image sample is usually provided with depth context annotation information of human body key points. The depth context annotation information of human body key points in the present application can be used to indicate the relative depth positions of the human body key points in an image sample. The depth context annotation information is used to perform supervised learning on the neural network to be trained, and may specifically be a depth context matrix of the human body key points.

Fig. 7 is a flowchart of one embodiment of the neural network training method of the present application. As shown in Fig. 7, the method of this embodiment includes step S700, step S710 and step S720, which are specifically as follows:

S700: Obtain an image sample.

S710: Supply the image sample to the neural network to be trained, and perform depth context prediction processing of human body key points via the neural network to be trained, so as to obtain the depth context of the human body key points.

S720: Supervise the depth context of the human body key points by using the depth context annotation information of the human body key points of the image sample, so that supervised learning is performed on the neural network to be trained.

In an optional example, the image sample in the present application may be an original image sample, or a partial image of an original image sample that contains a human body. In addition, the image sample may also be an image obtained by processing a partial image, containing a human body, of an original image sample. Furthermore, the image sample may be a static sample such as a picture sample or a photo sample, or may be a video frame sample from a dynamic video sample.

In an optional example, the human body contained in an image sample in the present application may be a complete human body. The human body contained in an image sample may also be a partial human body, resulting from occlusion, the shooting angle, or other causes. The present application does not limit the specific form of the human body in the image sample.

In an optional example, the present application may read image samples from a training data set in order to supply them to the depth context neural network to be trained. The training data set in the present application contains a plurality of image samples for training the neural network; in general, each image sample is provided with depth context annotation information of human body key points. The present application may read one or more image samples from the training data set at a time, either randomly or sequentially in the order in which the image samples are arranged.
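The two reading modes described above (random, or sequential in stored order, one or more samples at a time) can be sketched with a small generator. The function name `read_batches`, its parameters, and the placeholder sample names are hypothetical.

```python
import random

def read_batches(dataset, batch_size=1, shuffle=False, seed=None):
    """Yield lists of samples, in stored order or in random order."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

# Each entry pairs an image sample with its annotation (placeholder strings).
dataset = [(f"image_{n}", f"annotation_{n}") for n in range(5)]

sequential = list(read_batches(dataset, batch_size=2))
randomized = list(read_batches(dataset, batch_size=2, shuffle=True, seed=0))
print(sequential[0])
```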

In an optional example, the depth context annotation information of human body key points in the present application usually indicates the relative depth positions of the human body key points. For example, for two human body key points, the depth context annotation information can indicate the front-back positional relationship between the two key points. The two human body key points may be any two of all the human body key points, or may be two pre-designated human body key points.

In an optional example, the depth context annotation information of human body key points in the present application may include: for two human body key points, annotation information characterizing whether one of the key points is in front of or behind the other. For example, the depth context annotation information may include: for each pair of key points among all the human body key points, annotation information characterizing whether one key point of the pair is in front of or behind the other. This annotation information may specifically be an annotated probability value characterizing that one key point is in front of or behind the other.

In an optional example, the depth context annotation information of human body key points in the present application may specifically be a depth context annotation matrix of the human body key points, in which both the number of rows and the number of columns equal the number of human body key points of the image sample. For example, if the number of human body key points is an integer A, the annotation matrix may be an A × A matrix. The n-th row of the annotation matrix represents the n-th human body key point (for example, the key point numbered n), and the m-th column represents the m-th human body key point (for example, the key point numbered m). The value in row n, column m of the annotation matrix represents the annotated probability that the n-th human body key point is in front of (or behind) the m-th human body key point. The annotated probability typically ranges from 0 to 1. The depth context annotation information may also be represented in other forms, such as an array; the present application does not limit its specific form.

In an optional example, the depth context annotation information of the human body key points of an image sample in the present application may be formed from the annotated coordinates of the human body key points of the image sample in three-dimensional space; it may also be formed by manual annotation.

In an optional example, an annotation value in the depth context annotation matrix of human body key points in the present application may be a first annotation value, a second annotation value, or a third annotation value. The first annotation value indicates that the annotated depth coordinate of one human body key point in three-dimensional space is greater than the sum of the annotated depth coordinate of another human body key point in three-dimensional space and a predetermined value. The second annotation value indicates that the annotated depth coordinate of one human body key point in three-dimensional space is less than the difference between the annotated depth coordinate of another human body key point in three-dimensional space and the predetermined value. The third annotation value indicates that the absolute value of the difference between the annotated depth coordinates of the two human body key points in three-dimensional space does not exceed the predetermined value. The present application may use the following formula (3) to express an annotation value in the depth context annotation matrix of human body key points:

$$M_{ij} = \begin{cases} 1, & z_i' > z_j' + \varepsilon \\ 0, & z_i' < z_j' - \varepsilon \\ 0.5, & \lvert z_i' - z_j' \rvert \le \varepsilon \end{cases} \qquad \text{Formula (3)}$$

In formula (3) above, $M_{ij}$ denotes the annotation value (e.g., annotated probability value) in row i, column j of the depth context annotation matrix; 1 is the first annotation value (which may also be referred to as the first annotated probability value); 0 is the second annotation value (which may also be referred to as the second annotated probability value); 0.5 is the third annotation value (which may also be referred to as the third annotated probability value); $z_i'$ denotes the annotated z coordinate in the depth coordinate annotation information of the i-th human body key point in three-dimensional space; $z_j'$ denotes the annotated z coordinate in the depth coordinate annotation information of the j-th human body key point in three-dimensional space; $\varepsilon$ denotes the predetermined value, i.e., a constant, whose size can be determined according to the actual application.
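The three-valued annotation rule described above (1 when one key point's annotated depth exceeds the other's by more than the predetermined value, 0 in the opposite case, 0.5 otherwise) can be sketched as follows. The function name `annotation_matrix` and the sample coordinates are hypothetical; the value of the predetermined constant is application dependent.

```python
def annotation_matrix(z, eps):
    """Build the K x K annotation matrix from annotated depth coordinates.

    z:   annotated z coordinates, one per key point
    eps: the predetermined constant
    """
    k = len(z)
    m = [[0.5] * k for _ in range(k)]          # third annotation value by default
    for i in range(k):
        for j in range(k):
            if z[i] > z[j] + eps:
                m[i][j] = 1.0                  # first annotation value
            elif z[i] < z[j] - eps:
                m[i][j] = 0.0                  # second annotation value
    return m

z = [1.0, 1.05, 2.0]          # illustrative depth annotations
M = annotation_matrix(z, eps=0.1)
for row in M:
    print(row)
```

Key points 0 and 1 differ in depth by less than the constant, so their mutual entries stay at 0.5, as do all diagonal entries.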

In an optional example, the present application may obtain not only an image sample but also the feature maps of at least two human body key points of the image sample. The number of human body key points of an image sample in the present application is at least two. In general, an image sample has multiple human body key points, for example, 12, 14 or 16. Correspondingly, the number of feature maps of human body key points obtained for an image sample is at least two. In general, the present application obtains multiple feature maps for an image sample, and usually obtains the feature maps of all the human body key points; for example, when the number of human body key points of the image sample is 12, 14 or 16, the present application can obtain the feature maps of the 12, 14 or 16 human body key points for that image sample. The present application does not limit the specific number of feature maps of human body key points obtained for an image sample.

In an optional example, the feature map of a human body key point in the present application is generally used to represent the features of the human body key point. For example, the feature map of a human body key point may specifically be a heat map of the human body key point. The present application does not limit the specific form of the feature map of a human body key point.

In an optional example, the present application may obtain the heat maps of the human body key points of an image sample by means of existing two-dimensional human body key point feature extraction techniques. For example, the present application may use a neural network for extracting human body key point features (hereinafter referred to as the feature extraction neural network) to obtain the heat map of each human body key point of the image sample. Specifically, the present application may supply the image sample to the feature extraction neural network, which performs human body key point feature extraction, so that the present application can obtain the heat maps of at least two (for example, all) human body key points in the image sample from the information output by the feature extraction neural network.

In an optional example, the depth context neural network to be trained in the present application may first form feature values of at least two human body key points according to the image sample. Next, the depth context neural network to be trained calculates, from the feature values it has formed, the difference between two feature values (for example, the differences between all pairs of feature values among all the feature values). Finally, the depth context neural network to be trained forms the depth context of the human body key points based on the calculated differences.

In an optional example, in addition to supplying the image sample to the depth context neural network to be trained, the present application may also supply the feature maps of the human body key points of the image sample to the depth context neural network to be trained. For example, the present application merges (i.e., concatenates) the image sample with the feature maps of the human body key points to form a tensor of the image sample (which may also be referred to as an input tensor), and supplies this tensor to the depth context neural network to be trained, so that the depth context prediction processing of the human body key points is performed via the depth context neural network to be trained and the present application can obtain the depth context of the human body key points from the information output by that network. By supplying the feature maps of the human body key points to the depth context neural network to be trained, the present application helps improve the accuracy of the depth context prediction processing performed by that network, and thus helps improve the performance of the depth context neural network.

In an optional example, the depth context neural network to be trained in the present application may first form feature values of at least two human body key points (for example, the feature values of all human body key points) according to the image sample and the feature maps of the human body key points. Then, the depth context neural network to be trained calculates, from the feature values it has formed, the difference between two feature values (for example, the differences between all pairs of feature values among all the feature values). Finally, the depth context neural network to be trained forms the depth context of the human body key points based on the calculated differences.

In an optional example, in the case where the depth context neural network to be trained in the present application includes a residual network unit, a vector difference calculation unit and a context forming unit, the processing is as follows. First, the present application may merge the image sample with the feature map of at least one human body key point (for example, the feature maps of all human body key points) to form an input tensor, and supply the input tensor to the residual network unit to be trained, which forms a plurality of feature values (which may also be referred to as a feature vector) for the input tensor. Next, the vector difference calculation unit to be trained performs a feature value difference calculation on two feature values (for example, all pairs of feature values among all the feature values) of the feature values output by the residual network unit to be trained, and outputs the calculated differences; for example, it outputs the differences between all pairs of feature values among all the feature values. Finally, the context forming unit to be trained converts the differences output by the vector difference calculation unit to be trained into the depth context of the human body key points. For example, the context forming unit to be trained converts each of the differences output by the vector difference calculation unit to be trained into a probability value between 0 and 1, so that the present application can obtain the depth context matrix of the human body key points.

In an optional example, the present application may take, as guidance information, the difference between the depth context of the human body key points output by the depth context neural network to be trained and the depth context annotation information of the human body key points of the image sample, and, with the aim of reducing this difference, use a corresponding loss function to perform supervised learning on the depth context neural network to be trained.

In an optional example, the loss function of the present application can be expressed in the form of the following formula (4):

In formula (4) above, $C_{ij} \equiv C(F_{ij})$ denotes the loss function based on the i-th key point and the j-th key point; $M_{ij}$ denotes the annotation value (e.g., annotated probability value) in row i, column j of the depth context annotation matrix; $P_{ij}$ denotes the probability value, output by the depth context neural network to be trained, that corresponds to the i-th key point and the j-th key point; $F_{ij}$ denotes the difference, calculated by the depth context neural network to be trained, between the feature value $F_i$ of the i-th human body key point and the feature value $F_j$ of the j-th human body key point.
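The expression of formula (4) is not reproduced in this text. As an assumption only, the sketch below uses the cross-entropy between the annotation value M_ij and a sigmoid of the difference F_ij, which is one common loss that depends on exactly the symbols listed above. The names `pair_loss` and `total_loss` are hypothetical.

```python
import math

def pair_loss(m_ij, f_ij):
    """ASSUMED loss for one pair: cross-entropy of m_ij against sigmoid(f_ij).

    Equals -m*log(p) - (1 - m)*log(1 - p) with p = 1 / (1 + exp(-f)),
    rewritten as softplus(f) - m*f so that log(0) never occurs.
    """
    softplus = max(f_ij, 0.0) + math.log1p(math.exp(-abs(f_ij)))
    return softplus - m_ij * f_ij

def total_loss(labels, features):
    """Sum the pair losses over all K x K entries (256 terms for K = 16)."""
    k = len(features)
    return sum(pair_loss(labels[i][j], features[i] - features[j])
               for i in range(k) for j in range(k))

labels = [[0.5, 0.0], [1.0, 0.5]]         # key point 1 annotated in front of 0
print(total_loss(labels, [0.0, 3.0]))     # features agree with the labels
print(total_loss(labels, [3.0, 0.0]))     # features contradict the labels
```

Under this assumed loss, feature values that order the key points as annotated give a smaller total than feature values that order them the other way round.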

In the case where the number of human body key points of an image sample is 16, the number of $C_{ij}$ terms may be 256, and the present application may use these 256 $C_{ij}$ terms to perform supervised learning on the depth context neural network to be trained.

In an optional example, when the training of the depth context neural network to be trained reaches a predetermined iteration condition, the current training process ends. The predetermined iteration condition in the present application may include: the difference between the depth context of the human body key points output by the depth context neural network and the depth context annotation information of the human body key points of the image sample meets a predetermined difference requirement. If the difference meets the predetermined difference requirement, the training of the depth context neural network to be trained has been completed successfully. The predetermined iteration condition may also include: the number of image samples used to train the depth context neural network to be trained reaches a predetermined quantity requirement, and so on. If the number of image samples used reaches the predetermined quantity requirement but the difference does not meet the predetermined difference requirement, the depth context neural network to be trained has not been trained successfully. A successfully trained depth context neural network can be used to perform depth context prediction processing of human body key points on an image to be processed.
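The two stopping conditions can be sketched with a toy training loop. Everything below is a simplification made for illustration: the "network" is just one directly learnable scalar feature per key point, a single annotation matrix is reused at every step (so the step count stands in for the number of image samples used), and the sigmoid conversion and cross-entropy gradient are assumptions because formulas (2) and (4) are not reproduced in this text. The name `train` and its parameters are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(labels, lr=0.5, tol=0.05, max_steps=1000):
    """Fit one scalar feature per key point to a K x K annotation matrix.

    Ends successfully once the mean |P_ij - M_ij| is at most `tol`
    (difference requirement met), or unsuccessfully after `max_steps`
    updates (quantity requirement reached first).
    Returns (features, converged, steps_used).
    """
    k = len(labels)
    feats = [0.0] * k
    for step in range(max_steps):
        probs = [[sigmoid(feats[i] - feats[j]) for j in range(k)] for i in range(k)]
        gap = sum(abs(probs[i][j] - labels[i][j])
                  for i in range(k) for j in range(k)) / (k * k)
        if gap <= tol:
            return feats, True, step
        grad = [0.0] * k
        for i in range(k):
            for j in range(k):
                g = probs[i][j] - labels[i][j]   # dC_ij/dF_ij under the assumed loss
                grad[i] += g                     # F_ij = F_i - F_j, so +g for F_i
                grad[j] -= g                     # and -g for F_j
        feats = [f - lr * g for f, g in zip(feats, grad)]
    return feats, False, max_steps

labels = [[0.5, 0.0, 0.0],
          [1.0, 0.5, 0.0],
          [1.0, 1.0, 0.5]]
feats, converged, steps = train(labels)
_, converged_one_step, _ = train(labels, max_steps=1)
print(converged, converged_one_step)
```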

Fig. 8 is a schematic structural diagram of one embodiment of the device for predicting the depth context of human body key points of the present application. As shown in Fig. 8, the device of this embodiment mainly includes a first acquisition module 800 and a first depth context module 810. Optionally, the device may further include a second acquisition module 900, a second depth context module 910, a supervision module 920, a first annotation module 930 and a second annotation module 940.

The first acquisition module 800 is used to obtain an image to be processed.

The first depth context module 810 includes a neural network. The first depth context module 810 is used to supply the image to be processed to the neural network and to perform depth context prediction processing of human body key points via the neural network, so as to obtain the depth context of the human body key points. The depth context of the human body key points in the present application is used to indicate the relative depth positions of the human body key points.

In an optional example, the first acquisition module 800 may obtain the image to be processed and the feature maps of at least two human body key points of the image to be processed. In this case, the first depth context module 810 may supply both the image to be processed and the feature maps of the human body key points to the neural network. The feature map of a human body key point in the present application may include a heat map of the human body key point.

In an optional example, the neural network of the present application may include a first unit, a second unit and a third unit. The first unit is used to form feature values of at least two human body key points according to the image to be processed and the feature maps of the human body key points. The second unit is used to obtain the differences between the feature values. The third unit is used to form the depth context of the human body key points based on the differences.

In an optional example, the first unit may specifically be a residual network unit. The second unit may be a vector difference calculation unit, which is used to perform a feature value difference calculation on pairs of feature values among the feature values of multiple human body key points, so as to obtain the differences between the pairs of feature values. The third unit may be a context forming unit, which is used to form the depth context of the human body key points according to at least one difference.

For the specific operations performed by the first acquisition module 800 and the first depth context module 810, reference may be made to the description of the steps in Fig. 1 in the above method embodiment. For the second acquisition module 900, the second depth context module 910, the supervision module 920, the first annotation module 930 and the second annotation module 940, reference may be made to the description of Fig. 9 in the following device embodiment. Details are not repeated here.

Fig. 9 is a schematic structural diagram of one embodiment of the neural network training device of the present application. The training device shown in Fig. 9 mainly includes a second acquisition module 900, a second depth context module 910 and a supervision module 920. Optionally, the device may further include a first annotation module 930 and a second annotation module 940.

The second acquisition module 900 is used to obtain an image sample.

The second depth context module 910 includes a neural network to be trained. The second depth context module 910 is used to supply the image sample to the neural network to be trained and to perform depth context prediction processing of human body key points via the neural network to be trained, so as to obtain the depth context of the human body key points.

The supervision module 920 is used to supervise the depth context of the human body key points by using the depth context annotation information of the human body key points of the image sample, so that supervised learning is performed on the neural network to be trained.

In an optional example, the second acquisition module 900 may obtain the image sample and the feature maps of at least two human body key points of the image sample. In this case, the second depth context module 910 may supply the image sample and the feature maps of the human body key points to the neural network to be trained.

The first annotation module 930 is used to form the depth context annotation information of the human body key points of the image sample by using the annotated coordinates of the human body key points of the image sample in three-dimensional space.

The second annotation module 940 is used to provide a manual annotation interface and to form the depth context annotation information of the human body key points of the image sample according to the information received through the manual annotation interface.

In an optional example, the depth context annotation information of the human body key points of an image sample in the present application may include annotation information characterizing whether one human body key point is in front of or behind another human body key point. Optionally, this annotation information may include an annotated probability value characterizing that one human body key point is in front of or behind another human body key point. Optionally, the depth context annotation information of the human body key points may include a depth context annotation matrix of the human body key points. Both the number of rows and the number of columns of the annotation matrix equal the number of human body key points; the n-th row of the annotation matrix represents the n-th human body key point, the m-th column represents the m-th human body key point, and the annotation value in row n, column m represents the annotated probability that the n-th human body key point is in front of or behind the m-th human body key point.

In an optional example, the annotated probability value in the present application may be a first annotation value, a second annotation value or a third annotation value. The first annotation value indicates that the annotated depth coordinate of one human body key point in three-dimensional space is greater than the sum of the annotated depth coordinate of another human body key point in three-dimensional space and a predetermined value. The second annotation value indicates that the annotated depth coordinate of one human body key point in three-dimensional space is less than the difference between the annotated depth coordinate of another human body key point in three-dimensional space and the predetermined value. The third annotation value indicates that the absolute value of the difference between the annotated depth coordinates of the two human body key points in three-dimensional space does not exceed the predetermined value.

For the specific operations performed by the second acquisition module 900, the second depth context module 910, the supervision module 920, the first labeling module 930, and the second labeling module 940, refer to the description of each step in Fig. 7 in the above method embodiment. They are not described in detail here.

Exemplary device

Figure 10 shows an exemplary device 1000 suitable for implementing the present application. The device 1000 may be a control system/electronic system configured in an automobile, a mobile terminal (for example, a smartphone), a personal computer (PC, for example, a desktop computer or a notebook computer), a tablet computer, a server, or the like.

In Figure 10, the device 1000 includes one or more processors, a communication unit, and the like. The one or more processors may be one or more central processing units (CPU) 1001 and/or one or more graphics processing units (GPU) 1013 that use a neural network to perform depth context prediction of human body key points. The processor may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 1002 or executable instructions loaded from a storage section 1008 into a random access memory (RAM) 1003. The communication unit 1012 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an IB (InfiniBand) network card. The processor may communicate with the read-only memory 1002 and/or the random access memory 1003 to execute the executable instructions, is connected to the communication unit 1012 through a bus 1004, and communicates with other target devices via the communication unit 1012, thereby completing the corresponding steps of the present application.

For the operations performed by each of the above instructions, refer to the related description in the above method embodiment. They are not described in detail here. In addition, the RAM 1003 may also store various programs and data required for the operation of the device. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to one another through the bus 1004.

Where the RAM 1003 is present, the ROM 1002 is an optional module. The RAM 1003 stores executable instructions, or executable instructions are written into the ROM 1002 at runtime, and the executable instructions cause the central processing unit 1001 to execute the steps included in the above-described method. An input/output (I/O) interface 1005 is also connected to the bus 1004. The communication unit 1012 may be provided as an integrated unit, or may be provided as multiple sub-modules (for example, multiple IB network cards), each connected to the bus.

The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card or a modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read from it can be installed into the storage section 1008 as needed.

It should be specifically noted that the architecture shown in Figure 10 is only one optional implementation. In practice, the number and types of the components in Figure 10 may be selected, deleted, added, or replaced according to actual needs. Components with different functions may also be provided separately or in an integrated manner. For example, the GPU and the CPU may be provided separately, or the GPU may be integrated on the CPU; likewise, the communication unit may be provided separately, or may be integrated on the CPU or the GPU. These alternative embodiments all fall within the scope of protection of the present application.

In particular, according to embodiments of the present application, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present application includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium. The computer program contains program code for executing the steps shown in the flowcharts, and the program code may include instructions corresponding to the steps of the methods provided by the present application.

In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009, and/or installed from the removable medium 1011. When the computer program is executed by the central processing unit (CPU) 1001, the instructions described in the present application for implementing the above corresponding steps are executed.

In one or more optional embodiments, an embodiment of the present disclosure further provides a computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the depth context prediction method of human body key points or the neural network training method described in any of the above embodiments.

The computer program product may be implemented by hardware, software, or a combination thereof. In one optional example, the computer program product is embodied as a computer storage medium. In another optional example, the computer program product is embodied as a software product, such as a software development kit (SDK).

In one or more optional embodiments, an embodiment of the present disclosure further provides another depth context prediction method of human body key points and another neural network training method, together with the corresponding apparatuses, electronic devices, computer storage media, computer programs, and computer program products. The method includes: a first device sends, to a second device, an instruction to predict the depth context of human body key points or an instruction to train a neural network, the instruction causing the second device to perform the depth context prediction method of human body key points or the neural network training method in any of the above possible embodiments; and the first device receives the depth context prediction result of the human body key points or the neural network training result sent by the second device.

In some embodiments, the instruction to predict the depth context of human body key points or the instruction to train a neural network may specifically be a call instruction. The first device may, by way of a call, instruct the second device to perform the depth context prediction operation of human body key points or the neural network training operation. Accordingly, in response to receiving the call instruction, the second device may perform the steps and/or processes of any embodiment of the above depth context prediction method of human body key points or neural network training method.
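The call-based interaction between the first device and the second device can be sketched as follows. This is only an illustrative in-process model under stated assumptions: the class names `FirstDevice` and `SecondDevice`, the instruction strings, and the placeholder `fake_predict` are all hypothetical, and the source does not specify any transport or message format.

```python
from typing import Callable, Dict, List

# Hypothetical instruction names; the source does not define a wire format.
PREDICT = "predict_depth_context"
TRAIN = "train_neural_network"


class SecondDevice:
    """Executes whichever operation the received call instruction names."""

    def __init__(self, handlers: Dict[str, Callable[[object], object]]):
        self._handlers = handlers

    def handle(self, instruction: str, payload: object) -> object:
        if instruction not in self._handlers:
            raise ValueError(f"unsupported instruction: {instruction}")
        return self._handlers[instruction](payload)


class FirstDevice:
    """Sends an instruction to the second device and receives the result."""

    def __init__(self, second_device: SecondDevice):
        self._second = second_device

    def request(self, instruction: str, payload: object) -> object:
        # In this sketch, "sending" is a direct in-process call.
        return self._second.handle(instruction, payload)


def fake_predict(image: object) -> List[List[float]]:
    # Placeholder standing in for the neural network; returns a fixed 2 x 2 result.
    return [[0.5, 1.0], [0.0, 0.5]]


# Usage: the first device asks the second device for a prediction.
first = FirstDevice(SecondDevice({PREDICT: fake_predict}))
result = first.request(PREDICT, "image-to-be-processed")
```

A real deployment would replace the direct call with whatever remote call mechanism the two devices share; that choice is outside what the text describes.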

It should be understood that terms such as "first" and "second" in the embodiments of the present disclosure are used only for distinction and should not be construed as limiting the embodiments of the present disclosure. It should also be understood that, in the present disclosure, "multiple" may refer to two or more, and "at least one" may refer to one, two, or more. It should also be understood that any component, data, or structure mentioned in the present disclosure may generally be understood as one or more, unless explicitly limited or unless the context indicates otherwise. It should also be understood that the description of each embodiment in the present disclosure emphasizes the differences between the embodiments; for the same or similar parts, the embodiments may be referred to one another, and for brevity these parts are not repeated.

The methods, apparatuses, electronic devices, and computer-readable storage media of the present application may be implemented in many ways. For example, they may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the method steps is for illustration only, and the steps of the methods of the present application are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present application may also be embodied as programs recorded on a recording medium, these programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers a recording medium storing a program for executing the methods according to the present application.

The description of the present application is given for the purposes of illustration and description, and is not intended to be exhaustive or to limit the present application to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical applications of the present application, and to enable those of ordinary skill in the art to understand the present application and thereby design various embodiments with various modifications suited to particular uses.

Claims

1. A depth context prediction method of human body key points, characterized by comprising:

obtaining an image to be processed;

providing the image to be processed to a neural network, and performing depth context prediction processing of human body key points via the neural network, to obtain the depth context of the human body key points;

wherein the depth context of the human body key points is used to indicate the relative depth position relationship between human body key points.

2. The method according to claim 1, characterized in that obtaining the image to be processed comprises:

obtaining the image to be processed and feature maps of at least two human body key points of the image to be processed;

and providing the image to be processed to the neural network comprises:

providing the image to be processed and the feature maps of the human body key points to the neural network.

3. The method according to claim 2, characterized in that the feature maps of the human body key points comprise heatmaps of the human body key points.

4. The method according to any one of claims 2 to 3, characterized in that performing depth context prediction processing of human body key points via the neural network comprises:

forming, via the neural network, feature values of at least two human body key points according to the image to be processed and the feature maps of the human body key points, obtaining the difference between the feature values, and forming the depth context of the human body key points based on the difference.

5. The method according to any one of claims 1 to 4, characterized in that the depth context of the human body key points comprises:

information characterizing whether one human body key point is in front of or behind another human body key point.

6. A training method of a neural network, characterized by comprising:

obtaining an image sample;

providing the image sample to a neural network to be trained, and performing depth context prediction processing of human body key points via the neural network to be trained, to obtain the depth context of the human body key points;

supervising the depth context of the human body key points using the depth context annotation information of the human body key points of the image sample, so that the neural network to be trained performs supervised learning.

7. A depth context prediction apparatus of human body key points, characterized by comprising:

a first acquisition module, configured to obtain an image to be processed;

a first depth context module including a neural network, configured to provide the image to be processed to the neural network and to perform depth context prediction processing of human body key points via the neural network, to obtain the depth context of the human body key points;

8. A training apparatus of a neural network, characterized by comprising:

a second acquisition module, configured to obtain an image sample;

a second depth context module including a neural network to be trained, configured to provide the image sample to the neural network to be trained and to perform depth context prediction processing of human body key points via the neural network to be trained, to obtain the depth context of the human body key points;

a supervision module, configured to supervise the depth context of the human body key points using the depth context annotation information of the human body key points of the image sample, so that the neural network to be trained performs supervised learning.

9. An electronic device, comprising:

a memory, configured to store a computer program;

a processor, configured to execute the computer program stored in the memory, wherein, when the computer program is executed, the method according to any one of claims 1 to 6 is implemented.

10. A computer-readable storage medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the method according to any one of claims 1 to 6 is implemented.