CN113610818A - A Position-Controllable Human Head Segmentation Method - Google Patents
- Publication number
- CN113610818A (application CN202110917750.5A; granted publication CN113610818B)
- Authority
- CN
- China
- Prior art keywords
- head
- feature
- human head
- module
- key point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a position-controllable human head segmentation method. The method comprises a human head key point detection module, a position correction module and a human head segmentation module: the position correction module corrects the user's click position to match it to a detected head key point, and the key point information together with the head segmentation module then produces the final head segmentation result. The beneficial effects of the invention are: the method accurately segments a single head in a multi-person scene, is more flexible, improves runtime efficiency, and can be deployed on mobile phones.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a human head segmentation method based on position control.
Background
Human head segmentation is a common feature of current short-video software. It provides the basis for special-effect play such as making facial expression packs, head stickers, and swapping a user's head onto an animated body.
At present, most head segmentation is built on deep learning, typically using a semantic segmentation model to extract the head regions in an image. However, when there are many people in the picture, such a model cannot extract a single, targeted head, which greatly reduces playability. Although instance segmentation techniques can segment individual heads in a multi-person scene, current instance segmentation methods have low runtime efficiency and require complex post-processing, making them unsuitable for on-device deployment.
Disclosure of Invention
To overcome these defects of the prior art, the invention provides a position-controllable human head segmentation method that is both flexible and computationally efficient.
In order to achieve the purpose, the invention adopts the following technical scheme:
a human head segmentation method based on position control specifically comprises the following steps:
(1) preprocessing the input picture: scaling its resolution to 256×256, then normalizing it so that pixel values lie in the range [−1, 1];
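The preprocessing in step (1) can be sketched as follows. This is an illustrative reconstruction (the patent gives no code): it assumes 8-bit input pixels and uses nearest-neighbour resizing for brevity, where a production system would likely use bilinear interpolation.

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 256) -> np.ndarray:
    """Resize an H x W x C uint8 image to size x size (nearest neighbour)
    and normalize pixel values to [-1, 1]."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size          # source row for each output row
    cols = np.arange(size) * w // size          # source column for each output column
    resized = image[rows[:, None], cols[None, :]]
    # Map uint8 values [0, 255] -> [-1.0, 1.0]
    return resized.astype(np.float32) / 127.5 - 1.0
```
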
(2) building a human head key point detection module and feeding the picture from step (1) into it to obtain a 1×256×256 key point feature map; position analysis of this feature map yields the center point coordinates of each head. Let N = {N1, N2, N3, …} denote the set of detected heads, where Ni = (xi, yi) gives the specific position of each head in the picture;
(3) letting the user's click position be I = (x, y); since the click may not land exactly on a head, the position correction module iterates over the N head positions, computes the distance from I to each, and compares the N distances to find the head Nj closest to the click. This head is taken by default as the one the user wants to extract from the picture;
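The nearest-distance matching of step (3) can be sketched as below; the function and variable names are our own, as the patent specifies only the behaviour.

```python
import math

def match_click_to_head(click, heads):
    """Return the index of the detected head center closest to the click.

    click: the user's click position (x, y)
    heads: list of detected head centers [(x1, y1), (x2, y2), ...]
    """
    distances = [math.hypot(click[0] - hx, click[1] - hy) for hx, hy in heads]
    # The closest head is taken, by default, as the one the user wants.
    return min(range(len(heads)), key=distances.__getitem__)
```

For example, with head centers at (10, 10) and (200, 50), a click at (190, 60) selects the second head (index 1).
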
(4) applying Gaussian blur to the coordinates of head Nj to obtain the position condition map. The Gaussian is computed as G(x, y) = exp(−((x − xj)² + (y − yj)²) / (2σ²)), where σ = 2, and the Gaussian kernel size is set to 10 to enlarge the information range of the position condition, so that the head at the specified position is segmented more accurately;
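The position condition of step (4) can be sketched as an unnormalized 2D Gaussian heatmap centred on the chosen head. This is an assumed concrete form consistent with the stated σ = 2; the patent's exact normalization is not specified.

```python
import numpy as np

def position_condition(center, size=256, sigma=2.0):
    """Build a size x size heatmap: an unnormalized 2D Gaussian
    with peak value 1 at the selected head center (x_j, y_j)."""
    coords = np.arange(size, dtype=np.float64)
    gx = np.exp(-((coords - center[0]) ** 2) / (2 * sigma ** 2))  # 1-D Gaussian along x
    gy = np.exp(-((coords - center[1]) ** 2) / (2 * sigma ** 2))  # 1-D Gaussian along y
    # Outer product gives the separable 2-D Gaussian; rows index y, columns index x
    return gy[:, None] * gx[None, :]
```
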
(5) constructing a human head segmentation module: a fully convolutional neural network composed of an encoding module and a decoding module. The encoding module consists of 4 feature extraction units, each made of two convolution layers and one downsampling layer with factor 2, so the encoder downsamples by 16× in total. The decoding module restores the feature size with combinations of one convolution layer and one 2× upsampling, each time fusing the result with the same-sized encoder feature; after 4 such operations the output feature has the same size as the original image. Finally, the output feature is passed through a sigmoid activation;
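The resolution bookkeeping of the described encoder/decoder can be traced with a small sketch; this is a shape trace only, not a trained network.

```python
def segmentation_shapes(input_size=256, units=4):
    """Trace feature-map resolutions through the step (5) network:
    each encoder unit (two convs + downsample) halves the resolution;
    each decoder step (conv + 2x upsample, fused with the same-sized
    encoder feature) doubles it back."""
    encoder = [input_size]
    for _ in range(units):
        encoder.append(encoder[-1] // 2)
    decoder = [encoder[-1]]
    for _ in range(units):
        decoder.append(decoder[-1] * 2)
    return encoder, decoder
```

With units=4 the encoder path is 256 → 128 → 64 → 32 → 16 (16× total downsampling) and the decoder restores 16 → 32 → 64 → 128 → 256; the key point network of step (2) would be the same trace with units=5 (32× downsampling).
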
(6) concatenating the input picture from step (1) with the head position condition map from step (4) and feeding the result into the head segmentation network of step (5) to obtain a mask for the single head at the chosen position, completing the head segmentation the user requested.
In this method, the head key point detection network and the position correction module determine the position of the head the user wants segmented; the segmentation network then uses this position to segment that head accurately while ignoring the other people in the picture. The method can therefore accurately segment a single head in a multi-person scene, is more flexible, improves runtime efficiency, and can be deployed on mobile phones.
Preferably, in step (2), the human head key point detection module is a fully convolutional neural network composed of an encoding module and a decoding module. The encoding module consists of 5 feature extraction units, each made of two convolution layers and one downsampling layer with factor 2, so the encoder downsamples by 32× in total. The decoding module restores the feature size with combinations of two convolution layers and one 2× upsampling, each time fusing the result with the same-sized encoder feature; after 5 such operations the output feature map has the same size as the original image. The key point positions in the output feature map are then analyzed, and the specific coordinates of each key point in the original image are obtained by taking an expectation over the map.
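The "expectation" decoding of key point coordinates can be sketched as a soft-argmax over the normalized heatmap; this is our interpretation of the patent's brief description, with assumed function names.

```python
import numpy as np

def expected_coordinates(heatmap):
    """Decode one key point as the expectation of the pixel coordinates
    under the heatmap treated as a probability distribution (soft-argmax),
    yielding sub-pixel precision."""
    h, w = heatmap.shape
    p = heatmap / heatmap.sum()          # normalize to a distribution
    ys, xs = np.mgrid[0:h, 0:w]          # coordinate grids (row = y, col = x)
    return float((p * xs).sum()), float((p * ys).sum())
```
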
Preferably, in step (3), the closest distance dst between position I and the N head positions is calculated as dst = min over i = 1, …, N of √((x − xi)² + (y − yi)²).
The beneficial effects of the invention are: the method accurately segments a single head in a multi-person scene, is more flexible, improves runtime efficiency, and can be deployed on mobile phones.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
In the embodiment shown in fig. 1, a method for segmenting a human head based on position control specifically includes the following steps:
(1) preprocessing the input picture: scaling its resolution to 256×256, then normalizing it so that pixel values lie in the range [−1, 1];
(2) building a human head key point detection module and feeding the picture from step (1) into it to obtain a 1×256×256 key point feature map; position analysis of this feature map yields the center point coordinates of each head. Let N = {N1, N2, N3, …} denote the set of detected heads, where Ni = (xi, yi) gives the specific position of each head in the picture. The human head key point detection module is a fully convolutional neural network composed of an encoding module and a decoding module. The encoding module consists of 5 feature extraction units, each made of two convolution layers and one downsampling layer with factor 2, so the encoder downsamples by 32× in total. The decoding module restores the feature size with combinations of two convolution layers and one 2× upsampling, each time fusing the result with the same-sized encoder feature; after 5 such operations the output feature map has the same size as the original image. The key point positions in the output feature map are then analyzed, and the specific coordinates of each key point in the original image are obtained by taking an expectation over the map;
(3) letting the user's click position be I = (x, y); since the click may not land exactly on a head, the position correction module iterates over the N head positions, computes the distance from I to each, and compares the N distances to find the head Nj closest to the click. This head is taken by default as the one the user wants to extract from the picture. The closest distance dst between position I and the N head positions is calculated as dst = min over i = 1, …, N of √((x − xi)² + (y − yi)²);
(4) applying Gaussian blur to the coordinates of head Nj to obtain the position condition map. The Gaussian is computed as G(x, y) = exp(−((x − xj)² + (y − yj)²) / (2σ²)), where σ = 2, and the Gaussian kernel size is set to 10 to enlarge the information range of the position condition, so that the head at the specified position is segmented more accurately;
(5) constructing a human head segmentation module: a fully convolutional neural network composed of an encoding module and a decoding module. The encoding module consists of 4 feature extraction units, each made of two convolution layers and one downsampling layer with factor 2, so the encoder downsamples by 16× in total. The decoding module restores the feature size with combinations of one convolution layer and one 2× upsampling, each time fusing the result with the same-sized encoder feature; after 4 such operations the output feature has the same size as the original image. Finally, the output feature is passed through a sigmoid activation;
(6) concatenating the input picture from step (1) with the head position condition map from step (4) and feeding the result into the head segmentation network of step (5) to obtain a mask for the single head at the chosen position, completing the head segmentation the user requested.
The whole method comprises a human head key point detection module, a position correction module and a human head segmentation module. The position correction module corrects the user's click position to match it to a head key point, and the key point information together with the head segmentation module produces the final head segmentation result. The whole system adopts a lightweight model design, so runtime efficiency is high. The head key point detection network and the position correction module determine the position of the head the user wants segmented; the segmentation network then uses this position to segment that head accurately while ignoring the other people in the picture. The method can therefore accurately segment a single head in a multi-person scene, is more flexible, improves runtime efficiency, and can be deployed on mobile phones.
Claims (3)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110917750.5A CN113610818B (en) | 2021-08-11 | 2021-08-11 | A position-controllable head segmentation method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110917750.5A CN113610818B (en) | 2021-08-11 | 2021-08-11 | A position-controllable head segmentation method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113610818A true CN113610818A (en) | 2021-11-05 |
| CN113610818B CN113610818B (en) | 2024-12-13 |
Family
ID=78340207
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110917750.5A Active CN113610818B (en) | A position-controllable head segmentation method | 2021-08-11 | 2021-08-11 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113610818B (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106960195A (en) * | 2017-03-27 | 2017-07-18 | 深圳市丰巨泰科电子有限公司 | A kind of people counting method and device based on deep learning |
| CN108304820A (en) * | 2018-02-12 | 2018-07-20 | 腾讯科技(深圳)有限公司 | A kind of method for detecting human face, device and terminal device |
| US20190057507A1 (en) * | 2017-08-18 | 2019-02-21 | Samsung Electronics Co., Ltd. | System and method for semantic segmentation of images |
| CN111339395A (en) * | 2020-02-11 | 2020-06-26 | 山东经贸职业学院 | Data information matching method and system for electronic commerce system |
| CN111670457A (en) * | 2017-12-03 | 2020-09-15 | 脸谱公司 | Optimization of dynamic object instance detection, segmentation and structure mapping |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113610818B (en) | 2024-12-13 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110969124B (en) | Two-dimensional human body posture estimation method and system based on lightweight multi-branch network | |
| CN112330574B (en) | Portrait restoration method and device, electronic equipment and computer storage medium | |
| CN113112416B (en) | A semantically guided face image restoration method | |
| CN116664397B (en) | TransSR-Net structured image super-resolution reconstruction method | |
| Hu et al. | Face restoration via plug-and-play 3D facial priors | |
| CN113808005B (en) | A method and device for face posture migration based on video drive | |
| CN108537754B (en) | Face image restoration system based on deformation guide picture | |
| CN112906706A (en) | Improved image semantic segmentation method based on coder-decoder | |
| CN108596833A (en) | Super-resolution image reconstruction method, device, equipment and readable storage medium storing program for executing | |
| CN112966574A (en) | Human body three-dimensional key point prediction method and device and electronic equipment | |
| CN110059698A (en) | The semantic segmentation method and system based on the dense reconstruction in edge understood for streetscape | |
| CN111784623A (en) | Image processing method, image processing device, computer equipment and storage medium | |
| CN106960202A (en) | A kind of smiling face's recognition methods merged based on visible ray with infrared image | |
| CN105960657A (en) | Face hallucination using convolutional neural networks | |
| CN109325915A (en) | A super-resolution reconstruction method for low-resolution surveillance video | |
| CN111899169B (en) | Method for segmenting network of face image based on semantic segmentation | |
| CN107871103B (en) | A face authentication method and device | |
| CN113935435A (en) | Multi-modal emotion recognition method based on space-time feature fusion | |
| Shiri et al. | Identity-preserving face recovery from stylized portraits | |
| CN115731597A (en) | A face mask mask image automatic segmentation and restoration management platform and method | |
| CN116258627A (en) | A system and method for super-resolution restoration of extremely degraded face images | |
| CN110163156A (en) | It is a kind of based on convolution from the lip feature extracting method of encoding model | |
| CN113516604B (en) | Image restoration method | |
| CN118015142B (en) | Face image processing method, device, computer equipment and storage medium | |
| CN112200816A (en) | Method, device and equipment for segmenting region of video image and replacing hair |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |