
CN116703707A - Method for training skin color migration model, method for generating skin care image and related device - Google Patents

Method for training skin color migration model, method for generating skin care image and related device

Info

Publication number
CN116703707A
Authority
CN
China
Prior art keywords
image
feature map
skin
feature
skin color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310619988.9A
Other languages
Chinese (zh)
Inventor
陈仿雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Original Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shuliantianxia Intelligent Technology Co Ltd filed Critical Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority to CN202310619988.9A priority Critical patent/CN116703707A/en
Publication of CN116703707A publication Critical patent/CN116703707A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/90 - Determination of colour characteristics
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30004 - Biomedical image processing
    • G06T2207/30088 - Skin; Dermal
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to a method for training a skin color migration model, a method for generating a skin care image and a related device. In the method, the skin regions of a source image and a reference image are encoded through an encoding network to obtain first feature maps of a plurality of sizes and a second feature map, respectively. The feature map with the smallest size among the plurality of first feature maps and the second feature map are fused through a fusion network to obtain a third feature map. The third feature map is input into a first-stage decoding layer of the decoding network, and an up-sampling operation is performed to obtain an output feature map. A fusion operation is performed on the output feature map and a target feature map to obtain a first-stage skin color migration feature map. The first-stage skin color migration feature map is input into the next-stage decoding layer, and the up-sampling operation and the fusion operation are performed layer by layer until an output image with the same size as the source image is obtained, the output image being a skin color migration image. By this method, a more accurate, natural and stable skin color migration effect can be achieved.

Description

Method for training skin color migration model, method for generating skin care image and related device
Technical Field
The application relates to the technical field of image processing, in particular to a method for training a skin color migration model, a method for generating a skin care image and a related device.
Background
Skin tone migration is an important research topic in the field of image processing. Its purpose is to change the skin tone (typically that of a human face or body) in one image to the skin tone of another image. In other words, skin tone migration applies the skin tone style of the source image to the target image, so that the target image presents skin tone features similar to those of the source image, thereby beautifying or decorating the portrait. The technology is widely applied in fields such as beautification, face editing and virtual makeup.
Existing methods generally use histogram matching for skin tone migration. However, histogram matching has some disadvantages for this task, such as sensitivity to extreme illumination conditions: when an image is shot under insufficient or uneven illumination, the skin regions of the image may be dark or have an uneven brightness distribution, and histogram matching cannot handle such dark or uneven regions well, so the skin tone migration effect is not ideal and lacks stability.
Disclosure of Invention
The embodiment of the application provides a method for training a skin color migration model, a method for generating a skin care image and a related device, which can improve the skin color migration effect and stability.
In a first aspect, an embodiment of the present application provides a method for training a skin color migration model, where the skin color migration model includes an encoding network, a fusion network and a decoding network, and the method includes:
acquiring a training set, wherein the training set comprises a plurality of training data, the training data comprise a source image and a reference image, the source image is an original human body image shot by a shooting device in different illumination environments, and the reference image is a beautified human body image;
encoding the skin region of the source image through the encoding network to obtain a first feature map with a plurality of sizes, and encoding the skin region of the reference image through the encoding network to obtain a second feature map, wherein the size of the second feature map is the same as the size of the feature map with the smallest size in the plurality of first feature maps;
Fusing the feature map with the smallest size in the plurality of first feature maps and the second feature map through the fusion network to obtain a third feature map;
inputting the third feature map into a first-stage decoding layer of the decoding network to perform up-sampling operation to obtain an output feature map, performing fusion operation on the output feature map and a target feature map to obtain a first-stage skin color migration feature map, inputting the first-stage skin color migration feature map into a next-stage decoding layer, performing up-sampling operation and fusion operation layer by layer until an output image with the same size as the source image is obtained, wherein the target feature map is a feature map with the same size as the output feature map in the first feature maps, and the output image is a skin color migration image;
and calculating the loss of the skin color migration image by adopting a loss function, and carrying out iterative training on the skin color migration model according to the loss until the skin color migration model converges to obtain the skin color migration model.
In some embodiments, the fusing, by the fusing network, the feature map with the smallest size from the plurality of first feature maps and the second feature map to obtain a third feature map includes:
Performing convolution operation on a fourth feature map, extracting mean features and variance features of the fourth feature map, and performing convolution operation on the second feature map, extracting mean features and variance features of the second feature map; the fourth feature map is the feature map with the smallest size in the plurality of first feature maps;
fusing the mean characteristic of the fourth characteristic diagram, the variance characteristic of the fourth characteristic diagram, the mean characteristic of the second characteristic diagram and the variance characteristic of the second characteristic diagram to obtain a fifth characteristic diagram;
and carrying out convolution operation on the fifth characteristic diagram to obtain the third characteristic diagram.
In some embodiments, a fusion formula that fuses the mean feature of the fourth feature map, the variance feature of the fourth feature map, the mean feature of the second feature map, and the variance feature of the second feature map is:
wherein SV is the fourth feature map, TV is the second feature map, μ (SV) is the mean feature of the fourth feature map, σ (SV) is the variance feature of the fourth feature map, μ (TV) is the mean feature of the second feature map, σ (TV) is the variance feature of the second feature map, and IN (SV, TV) is the fifth feature map.
In some embodiments, the method further comprises:
respectively carrying out human body analysis on the source image and the reference image to obtain a human body analysis image of the source image and a human body analysis image of the reference image;
and respectively extracting skin areas in the human body analysis chart of the source image and the human body analysis chart of the reference image to obtain the skin area of the source image and the skin area of the reference image.
In some embodiments, the loss function is:
L = λ1·L_percept + λ2·L_skin
wherein λ1 and λ2 are hyperparameters, L_percept is the skin color perception loss, which reflects the difference in skin color similarity between the source image and the skin color migration image, and L_skin is the skin color histogram loss, which reflects the difference in skin color distribution among the source image, the reference image and the skin color migration image.
In some embodiments, the skin color perception loss L_percept is:
R_i = C_i·W_i·H_i
wherein V is the number of layers of the convolutional neural network, R_i is the number of elements in the i-th layer of the convolutional neural network, C_i, W_i and H_i are respectively the number, width and height of the i-th layer feature maps of the convolutional neural network, F_j is a feature map of size j, S is the source image, Y is the skin color migration image, F_j(S) is the feature map of the source image of size j, and F_j(Y) is the feature map of the skin color migration image of size j.
In some embodiments, the skin color histogram loss L_skin is:
wherein T is the reference image, P_T is the skin region of the reference image, Y is the skin color migration image, P_s is the skin region of the source image, p(T*P_T) is the probability distribution of the skin color of the reference image, and p(Y*P_s) is the probability distribution of the skin color of the skin color migration image.
In a second aspect, the present application provides a method of generating a skin care image, the method comprising:
acquiring a user image and a skin-beautifying reference image, wherein the user image is an original human body image obtained by shooting under different illumination environments by shooting equipment, and the skin-beautifying reference image is a beautified human body image;
inputting a skin area of a user image and a skin area of a skin-care reference image into a skin color migration model, encoding the skin area of the user image through the encoding network to obtain user image feature images with a plurality of sizes, and encoding the skin area of the skin-care reference image through the encoding network to obtain skin-care reference image feature images, wherein the size of the skin-care reference image feature images is the same as the size of a feature image with the smallest size in the plurality of user image feature images;
Fusing the feature images with the smallest size in the plurality of user image feature images and the skin-beautifying reference image feature images through the fusion network to obtain a fused feature image;
inputting the fusion feature map into a first-stage decoding layer of the decoding network to perform an up-sampling operation to obtain an intermediate feature map, performing a fusion operation on the intermediate feature map and a target image feature map to obtain a first-stage skin-beautifying feature map, inputting the first-stage skin-beautifying feature map into a next-stage decoding layer, and performing the up-sampling operation and the fusion operation layer by layer until a skin-beautifying image with the same size as the user image is obtained, wherein the target image feature map is the feature map, among the plurality of user image feature maps, with the same size as the intermediate feature map;
wherein the skin color migration model is obtained through training by the method for training a skin color migration model according to any one of the first aspect.
In a third aspect, the present application also provides a computer device comprising:
at least one processor, and
a memory communicatively coupled to the at least one processor, wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as in the first aspect above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon computer-executable instructions for causing a computer device to perform the method of the first aspect above.
The embodiment of the application has the beneficial effects that: the application provides a method for training a skin color migration model, which can realize a high-quality skin color migration effect by using the structures of a coding network, a fusion network and a decoding network. In the method, a skin region of a source image and a reference image is encoded by an encoding network, and a plurality of sizes of first feature images and second feature images are obtained, respectively. And fusing the feature map with the smallest size in the plurality of first feature maps and the second feature map through a fusion network to obtain a third feature map. And inputting the third feature map into a first-stage decoding layer of the decoding network, and performing up-sampling operation to obtain an output feature map. And carrying out fusion operation on the output feature map and the target feature map to obtain a first-stage skin color migration feature map. And inputting the first-stage skin color migration feature image to a next-stage decoding layer, and performing layer-by-layer up-sampling operation and fusion operation until an output image with the same size as the source image is obtained, wherein the output image is a skin color migration image. And calculating the loss of the skin color migration image by adopting a loss function, and carrying out iterative training on the skin color migration model according to the loss until the model converges to obtain the skin color migration model with high-quality skin color migration effect.
The application can more accurately capture and migrate the skin color information between the source image and the reference image, has stronger illumination adaptability, has higher stability and reliability when processing the condition of uneven illumination or insufficient illumination, and realizes the high-quality skin color migration effect.
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures are not drawn to scale unless otherwise indicated.
FIG. 1 is a flowchart of a method for training a skin tone migration model according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of step S104 in the method shown in FIG. 1;
fig. 3 is a flowchart of a method for generating a skin care image according to an embodiment of the present application;
fig. 4 is a flowchart of a method for generating a skin care image according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a skin color migration model according to an embodiment of the present application.
Detailed Description
The present application will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present application, but are not intended to limit the application in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present application.
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that, if not in conflict, the features of the embodiments of the present application may be combined with each other, which is within the protection scope of the present application. In addition, while functional block division is performed in a device diagram and logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. Moreover, the words "first," "second," "third," and the like as used herein do not limit the data and order of execution, but merely distinguish between identical or similar items that have substantially the same function and effect.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. The term "and/or" as used in this specification includes any and all combinations of one or more of the associated listed items.
In addition, the technical features of the embodiments of the present application described below may be combined with each other as long as they do not collide with each other.
Before explaining the present application in detail, terms and terminology involved in the embodiments of the present application will be explained, and the terms and terminology involved in the embodiments of the present application are applicable to the following explanation:
(1) Neural networks, also known as Neural Networks (NNs) or connection models, are algorithmic mathematical models that mimic the behavioral characteristics of animal neural networks and perform distributed parallel information processing. Depending on the complexity of the system, such a network achieves the purpose of processing information by adjusting the interconnection relationships among a large number of internal nodes. Specifically, a neural network is composed of neural units and can be understood as a network having an input layer, hidden layers and an output layer; in general, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. A neural network with many hidden layers is called a deep neural network (DNN). The operation of each layer in a neural network can be described physically by the mathematical expression y = a(W·x + b): it completes a transformation from the input space (the set of input vectors) to the output space (that is, from the row space to the column space of the matrix) through five operations on the input space, namely 1. raising/lowering the dimension; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2 and 3 are performed by W·x, operation 4 by +b, and operation 5 by a(). The word "space" is used here because the objects being classified are not single things but a class of things, and space refers to the collection of all individuals of such things. W is the weight matrix of a layer of the neural network, and each value in the matrix represents the weight value of one neuron of that layer. The matrix W determines the spatial transformation from the input space to the output space described above, i.e. W at each layer of the neural network controls how the space is transformed. The purpose of training a neural network is ultimately to obtain the weight matrices of all layers of the trained neural network. Therefore, the training process of a neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices.
It should be noted that in the embodiments of the present application, the neural network is essentially based on the model employed by the machine learning task.
Common components in the neural network comprise a convolution layer, an up-sampling layer and the like, a model is designed by assembling the common components in the neural network, and when model parameters (weight matrixes of all layers) are determined so that model errors meet preset conditions or the number of adjusted model parameters reaches a preset threshold value, the model converges.
The convolution layer is configured with a plurality of convolution kernels, and each convolution kernel has a corresponding stride for performing the convolution operation on the image. The purpose of the convolution operation is to extract different features of the input image while effectively preserving its spatial information, reducing the number of parameters, and improving computational efficiency. The first convolution layer may only extract some low-level features such as edges, lines and corners, while deeper convolution layers can iteratively extract more complex features from these low-level features.
The main purpose of the upsampling layer is to enlarge the size of the input feature map so that larger feature maps can be processed in subsequent layers of the neural network. The upsampling layer plays an important role in some computer vision tasks, such as image segmentation, super resolution, and generative adversarial networks (GANs). The upsampling layer typically appears in the decoding part of an encoding-decoding structure and is used to recover a higher-resolution output image from lower-resolution feature maps.
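As an illustration (not part of the original disclosure), the upsample-then-convolve pattern described above can be sketched in PyTorch as follows; the channel count and input size are arbitrary example values.

    # Illustrative sketch only (not from the application): one "upsampling + convolution"
    # step as described above; channel count and input size are arbitrary example values.
    import torch
    import torch.nn as nn

    feat = torch.randn(1, 64, 7, 7)                      # a low-resolution feature map
    upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
    conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)   # refine features after enlarging

    out = conv(upsample(feat))
    print(out.shape)                                     # torch.Size([1, 64, 14, 14])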
(2) A loss function is a function that maps the value of a random event, or of its related random variables, to a non-negative real number in order to represent the "risk" or "loss" of that random event. It is a non-negative real-valued function used to quantify the difference between the model's predictions and the true labels. In applications, the loss function is usually associated with an optimization problem as a learning criterion, i.e. the model is solved and evaluated by minimizing the loss function; for example, in statistics and machine learning it is used for parameter estimation of models. In the process of training a neural network, because the output of the neural network is expected to be as close as possible to the actually desired value, the weight matrix of each layer can be updated according to the difference between the current network's predicted value and the actually desired target value (of course, an initialization process is usually performed before the first update, i.e. parameters are preconfigured for each layer of the neural network). For example, if the network's predicted value is too high, the weight matrices are adjusted to lower the prediction, and the adjustment continues until the neural network can predict the actually desired target value. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function (loss function) or objective function (objective function), which is an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher its output value (the loss), the larger the difference, and training the neural network becomes the process of reducing this loss as much as possible.
Before the embodiments of the present application are described, a skin color migration method known to the present inventors is first briefly described, so that the embodiments of the present application are convenient to understand later.
With the popularity of image beautification, more and more users want to beautify the people in their pictures with beautification tools. In particular, when users browse skin tone styles shared by others on the Internet, they often hope to apply the skin tone effect of others to their own images so as to achieve a similar beautification effect.
With the development of neural networks, skin tone migration methods have also improved greatly. In the disclosed technical solutions, the commonly used skin tone migration method at present is histogram matching. The method calculates the cumulative histograms of an input image and a reference image, where the cumulative histogram represents the proportion of pixels whose values are less than or equal to a given value relative to the total number of pixels; comparing the cumulative histograms reveals the difference in pixel value distribution between the two images. By matching the cumulative histogram of the input image to that of the reference image, a mapping from input pixel values to reference pixel values can be obtained, and the pixel values of the input image are adjusted according to this mapping: each pixel value of the input image is replaced with the corresponding reference image pixel value in the mapping, thereby realizing histogram matching. However, in practical applications, once the ambient light during shooting is insufficient or uneven, the skin regions of the photographed person appear dark or unevenly bright. Because histogram matching is a global method based on statistical information, it performs poorly when handling local brightness, so the processing effect is not ideal and lacks stability for images shot in extreme illumination environments.
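For illustration, a single-channel cumulative-histogram matching step of the kind described above can be sketched as follows; this is a generic NumPy sketch of the prior-art technique, not code from the disclosed scheme.

    # Generic NumPy sketch (assumption, not the application's code) of single-channel
    # cumulative-histogram matching as described above.
    import numpy as np

    def match_histogram(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
        """Map source pixel values so their cumulative histogram follows the reference."""
        src_vals, src_idx, src_counts = np.unique(
            source.ravel(), return_inverse=True, return_counts=True)
        ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)

        src_cdf = np.cumsum(src_counts) / source.size        # proportion of pixels <= value
        ref_cdf = np.cumsum(ref_counts) / reference.size

        # For each source CDF value, find the reference pixel value with the closest CDF.
        mapped_vals = np.interp(src_cdf, ref_cdf, ref_vals)
        return mapped_vals[src_idx].reshape(source.shape)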
In view of this, referring to fig. 6, the skin color migration model provided by an embodiment of the present application includes an encoding network, a fusion network and a decoding network. The encoding network encodes the skin regions of a source image and a reference image to obtain first feature maps of a plurality of sizes and a second feature map, respectively. The feature map with the smallest size among the plurality of first feature maps and the second feature map are fused through the fusion network to obtain a third feature map. The third feature map is input into a first-stage decoding layer of the decoding network, and an up-sampling operation is performed to obtain an output feature map. A fusion operation is performed on the output feature map and a target feature map to obtain a first-stage skin color migration feature map. The first-stage skin color migration feature map is input into the next-stage decoding layer, and the up-sampling operation and the fusion operation are performed layer by layer until an output image with the same size as the source image is obtained, the output image being a skin color migration image. This method can achieve a more accurate, natural and stable skin color migration effect.
The technical scheme of the application is specifically described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for training a skin color migration model according to an embodiment of the present application. The method S100 may specifically include the following steps:
S101: a training set is obtained.
The training set comprises a plurality of training data, the training data comprise a source image and a reference image, the source image is an original human body image shot by a shooting device in different illumination environments, and the reference image is a beautified human body image. The source images may include original human body images captured by various terminal shooting devices under different illumination environments; the reference image may be a standard human body image that has been beautified, with attributes such as skin color, brightness and contrast optimized, so that the target effect of skin color migration can be achieved.
The method adopts an unsupervised training method, so that paired source image and reference image data are not needed, the workload of data collection and labeling is simplified, and the method can effectively meet the diversified demands in practical application. The unsupervised training method enables the skin color migration model to autonomously learn the mapping relation from the source image to the reference image effect under various illumination conditions, so that the skin color migration model has stronger generalization capability and practicability, and the skin color migration model obtained through training has higher stability.
It is understood that training data may be collected in advance by those skilled in the art. In some embodiments, the number of training data is tens of thousands, for example 20000, which is beneficial for training to get an accurate generic model. The number of training data can be determined by a person skilled in the art according to the actual situation.
S102: the skin areas of the source image and the reference image are extracted.
Specifically, in some embodiments, the source image and the reference image may be respectively subjected to human body analysis, a human body analysis image of the source image and a human body analysis image of the reference image are obtained, skin areas in the human body analysis image of the source image and the human body analysis image of the reference image are respectively extracted, and skin areas of the source image and skin areas of the reference image are obtained.
Human body analysis refers to dividing a person captured in an image into a plurality of semantically uniform regions, for example, a body part and clothing, or a subdivision class of a body part and a subdivision class of clothing, or the like. I.e. the input image is identified at the pixel level and each pixel point in the image is annotated with the object class to which it belongs.
In some embodiments, the human body analysis map may be obtained by using an existing human parsing algorithm. Optionally, in this embodiment, the Graphonomy algorithm is used; it can divide the image into 20 categories and mark the different parts with different colors. In some embodiments, the above 20 categories may also be denoted by the labels 0-19, for example 0 for background, 1 for hat, 2 for hair, 3 for glove, 4 for sunglasses, 5 for upper clothes, 6 for dress, 7 for coat, 8 for sock, 9 for trousers, 10 for torso skin, 11 for scarf, 12 for skirt, 13 for face, 14 for left arm, 15 for right arm, 16 for left leg, 17 for right leg, 18 for left shoe, and 19 for right shoe. From the human body analysis map, the category to which each part of the image belongs can be determined, where the skin region includes 10 (torso skin), 13 (face), 14 (left arm), 15 (right arm), 16 (left leg) and 17 (right leg).
In some embodiments, the parsing categories may be simplified. For example, according to analysis needs, the categories corresponding to skin, namely 10 (torso skin), 13 (face), 14 (left arm), 15 (right arm), 16 (left leg) and 17 (right leg), are set to 1, and the rest are set to 0.
It can be understood that simplifying the parsing categories in this way can reduce parsing errors, keep the skin color information unchanged during each round of model training, and improve the convergence speed and accuracy of the skin color migration model.
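A minimal sketch of this category simplification, assuming the 20-class labeling listed above (the helper name is illustrative, not from the application):

    # Illustrative sketch: turn a human body analysis map (labels 0-19) into a binary skin mask,
    # keeping only the skin-related categories listed above.
    import numpy as np

    SKIN_LABELS = [10, 13, 14, 15, 16, 17]   # torso skin, face, left/right arm, left/right leg

    def skin_mask(parse_map: np.ndarray) -> np.ndarray:
        """parse_map: H x W integer label map -> H x W float mask (1 = skin, 0 = other)."""
        return np.isin(parse_map, SKIN_LABELS).astype(np.float32)

    # The skin region of an H x W x 3 image can then be taken as image * skin_mask(parse_map)[..., None].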
S103: the method comprises encoding a skin region of a source image through an encoding network to obtain a first feature map of a plurality of sizes, and encoding a skin region of a reference image through the encoding network to obtain a second feature map. The second feature map has the same size as the smallest feature map of the plurality of first feature maps.
In this embodiment, a pre-trained network model is used as the initial model, so as to reduce the complexity of training the skin color migration model, reduce training time and improve model performance. For example, a pretrained VGG neural network is used as the encoding network to perform an encoding operation on the skin region of the source image to extract six first feature maps with different sizes, namely SV_224*224, SV_112*112, SV_56*56, SV_28*28, SV_14*14 and SV_7*7; they have different spatial resolutions in order to capture feature information of the source image at different sizes. At the same time, the same pretrained VGG neural network is used to encode the skin region of the reference image to extract the second feature map TV_7*7 with the smallest size.
It can be understood that the VGG network may be a VGG19 network. The VGG19 network is a convolutional neural network with 19 weight layers (16 convolutional layers and 3 fully-connected layers). Using the VGG19 network to extract image features can reflect the difference, i.e. the perceptual loss, between a predicted sample and a real sample under the same VGG19 feature extraction, and, while keeping the same receptive field, the greater depth of the network can improve the effect of the neural network to a certain extent.
In other embodiments, to extract features from the skin region, common network structures such as a VGG neural network, ResNet (residual network) or DenseNet (densely connected network) may be adopted; an appropriate network depth and width is selected according to the task requirements and computational resource limitations, and a deep learning framework (such as TensorFlow or PyTorch) is then used to build, train and optimize the encoding network model. Compared with using a pre-trained network model, training an encoding network built in this way together with the skin color migration model increases the complexity of training the skin color migration model, is not conducive to its convergence, and reduces its accuracy.
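One possible, hypothetical realization of the pretrained VGG encoder described above is sketched below: it collects a feature map before each pooling stage plus the final pooled map, yielding six scales for a 224*224 input. The exact VGG19 layers tapped are an assumption, as the application does not specify them.

    # Hypothetical sketch of the encoding network: a frozen, pretrained VGG19 that returns
    # feature maps at six spatial sizes (224, 112, 56, 28, 14 and 7 for a 224*224 input).
    import torch
    import torch.nn as nn
    from torchvision.models import vgg19

    class VGGEncoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = vgg19(weights="IMAGENET1K_V1").features.eval()
            for p in self.features.parameters():
                p.requires_grad_(False)          # the pretrained encoder stays fixed

        def forward(self, x):                    # x: B x 3 x 224 x 224 skin region
            feats, out = [], x
            for layer in self.features:
                if isinstance(layer, nn.MaxPool2d):
                    feats.append(out)            # map captured just before each of the 5 pools
                out = layer(out)
            feats.append(out)                    # final 7*7 map after the last pool
            return feats                         # six maps: 224, 112, 56, 28, 14, 7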
S104: and fusing the feature images with the smallest size in the plurality of first feature images and the second feature images through a fusion network to obtain a third feature image.
The structure of the fusion network can have a plurality of different forms, including a parallel network (Parallel Networks), a serial network (Sequential Networks), a residual network (Residual Networks) and an attention network (Attention Networks), and can be selected according to specific application scenarios and task requirements of the skin color migration model. In this embodiment, the fusion network is configured to perform feature fusion on the feature map with the smallest size and the second feature map in the plurality of first feature maps to obtain a third feature map, and this process may be generally implemented through operations such as convolution, stitching, adding, multiplying, and the like. The third feature map obtained after fusion has better skin color feature representation capability and richer skin color feature information.
Specifically, in some embodiments, as illustrated in fig. 2, step S104 includes the steps of:
s1041: performing convolution operation on the fourth feature map, extracting the mean feature of the fourth feature map and the variance feature of the fourth feature map, and performing convolution operation on the second feature map, extracting the mean feature of the second feature map and the variance feature of the second feature map; the fourth feature map is the feature map with the smallest size in the first feature maps, and the size of the fourth feature map is the same as that of the second feature map.
Extracting the mean and variance features of the feature map is a common method in deep learning, which can provide statistical information about local areas of the image for the neural network. The mean feature describes the average level of pixel values in the feature map, while the variance feature describes the degree of dispersion of the pixel value distribution. These features may help the neural network to better understand texture, color, and local structure in the image.
It will be appreciated that in training the skin tone migration model, extracting the mean and variance features of the feature map may help to improve the problem of uneven brightness distribution, and by extracting these statistical features, the model may better capture the brightness variations of the source and reference images in the local area. This helps to preserve the natural feel of the image during skin tone migration.
In the present embodiment, the fourth feature map is SV_7*7 and the second feature map is TV_7*7, and the mean feature and the variance feature are extracted from SV_7*7 and TV_7*7 respectively.
S1042: and fusing the mean characteristic of the fourth characteristic diagram, the variance characteristic of the fourth characteristic diagram, the mean characteristic of the second characteristic diagram and the variance characteristic of the second characteristic diagram to obtain a fifth characteristic diagram.
Specifically, the following fusion formula is adopted for fusion:
wherein SV is the fourth feature map SV_7*7, TV is the second feature map TV_7*7, μ(SV) is the mean feature of the fourth feature map, σ(SV) is the variance feature of the fourth feature map, μ(TV) is the mean feature of the second feature map, σ(TV) is the variance feature of the second feature map, and IN(SV, TV) is the fifth feature map.
It can be understood that by fusing the mean feature of the fourth feature map and the variance feature of the fourth feature map, the mean feature of the second feature map and the variance feature of the second feature map, the skin color migration model can better adjust the brightness distribution between the source image and the reference image, thereby being beneficial to keeping the natural sense of the image in the skin color migration process and reducing the phenomenon of uneven brightness.
S1043: and carrying out convolution operation on the fifth characteristic diagram to obtain the third characteristic diagram.
It can be understood that the convolution operation is further performed on the fifth feature map, so as to deepen the fusion degree of the network, extract the feature information of a higher level, and this step can reduce the complexity of the model, help to reduce the over-fitting phenomenon, and improve the generalization capability of the model in practical application.
In other embodiments, when the fifth feature map is convolved, a nonlinear activation function (such as ReLU, tanh, etc.) may be introduced to help the model capture more complex image features and relationships, thereby improving the accuracy and effect of skin color migration.
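An illustrative sketch of steps S1041 to S1043 follows, under the assumption that the statistics-based mixing takes the standard adaptive instance normalization (AdaIN) form implied by the variables μ(SV), σ(SV), μ(TV) and σ(TV); the application's exact fusion formula is an expression not reproduced above, so this is only one plausible realization, and the channel count is a placeholder.

    # Illustrative sketch of the fusion network (steps S1041-S1043), assuming an
    # AdaIN-style mixing of the mean/variance statistics.
    import torch
    import torch.nn as nn

    def channel_stats(x, eps=1e-5):
        """Per-channel mean/standard deviation over the spatial dims of a B x C x H x W map."""
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = (x.var(dim=(2, 3), keepdim=True) + eps).sqrt()
        return mu, sigma

    class FusionNetwork(nn.Module):
        def __init__(self, channels=512):
            super().__init__()
            self.conv_src = nn.Conv2d(channels, channels, 3, padding=1)   # S1041: conv on SV_7*7
            self.conv_ref = nn.Conv2d(channels, channels, 3, padding=1)   # S1041: conv on TV_7*7
            self.conv_out = nn.Conv2d(channels, channels, 3, padding=1)   # S1043: conv on fused map

        def forward(self, sv, tv):
            sv, tv = self.conv_src(sv), self.conv_ref(tv)
            mu_s, sig_s = channel_stats(sv)
            mu_t, sig_t = channel_stats(tv)
            fused = sig_t * (sv - mu_s) / sig_s + mu_t     # S1042: assumed AdaIN-style mixing
            return self.conv_out(fused)                    # third feature map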
S105: and inputting a third feature map into a first-stage decoding layer of a decoding network to perform up-sampling operation to obtain an output feature map, performing fusion operation on the output feature map and a target feature map to obtain a first-stage skin color migration feature map, inputting the first-stage skin color migration feature map into a next-stage decoding layer, performing the up-sampling operation and the fusion operation layer by layer until an output image with the same size as that of a source image is obtained, wherein the target feature map is a feature map with the same size as that of the output feature map in the first feature map, and the output image is a skin color migration image.
Common decoding network structures include transposed convolution (Transposed Convolution) decoding networks, deconvolution (Deconvolution) decoding networks, and upsampling + convolution (Upsampling + Convolution) decoding networks. The decoding network structure can be selected and adjusted according to the specific task and requirements; its main purpose is to gradually restore the spatial resolution of the input feature map so as to generate a skin color migration image with the same size as the source image. Those skilled in the art may try different configurations and parameter settings to obtain the best skin color migration effect.
In this embodiment, the decoding network adopts an upsampling + convolution (Upsampling + Convolution) structure. The decoding network enlarges the input feature map through an up-sampling layer (e.g. bilinear interpolation or nearest neighbor interpolation), and then uses a convolution layer to extract or fuse features. The decoding network constructed in this embodiment includes a plurality of cascaded decoding layers; each decoding layer includes an up-sampling layer and a convolution layer, the up-sampling layer performs up-sampling on the input feature map, and the feature map sizes obtained by up-sampling correspond one-to-one with the plurality of sizes of the first feature maps.
Specifically, the third feature map is input into the first-stage decoding layer of the decoding network, an up-sampling operation is performed through the up-sampling layer of the first-stage decoding layer to obtain an output feature map with the size of 7*7, and the output feature map with the size of 7*7 and the first feature map with the size of 7*7 are fused through the convolution layer of the first-stage decoding layer to obtain the first-stage skin color migration feature map. The first-stage skin color migration feature map is input into the second-stage decoding layer, an up-sampling operation is performed through the up-sampling layer of the second-stage decoding layer to obtain an output feature map with the size of 14*14, and the output feature map with the size of 14*14 and the first feature map with the size of 14*14 are fused through the convolution layer of the second-stage decoding layer to obtain the second-stage skin color migration feature map. The second-stage skin color migration feature map is input into the third-stage decoding layer, an up-sampling operation is performed through the up-sampling layer of the third-stage decoding layer to obtain an output feature map with the size of 28*28, and the output feature map with the size of 28*28 and the first feature map with the size of 28*28 are fused through the convolution layer of the third-stage decoding layer to obtain the third-stage skin color migration feature map. The third-stage skin color migration feature map is input into the fourth-stage decoding layer, an up-sampling operation is performed through the up-sampling layer of the fourth-stage decoding layer to obtain an output feature map with the size of 56*56, and the output feature map with the size of 56*56 and the first feature map with the size of 56*56 are fused through the convolution layer of the fourth-stage decoding layer to obtain the fourth-stage skin color migration feature map. The fourth-stage skin color migration feature map is input into the fifth-stage decoding layer, an up-sampling operation is performed through the up-sampling layer of the fifth-stage decoding layer to obtain an output feature map with the size of 112*112, and the output feature map with the size of 112*112 and the first feature map with the size of 112*112 are fused through the convolution layer of the fifth-stage decoding layer to obtain the fifth-stage skin color migration feature map. The fifth-stage skin color migration feature map is input into the sixth-stage decoding layer, an up-sampling operation is performed through the up-sampling layer of the sixth-stage decoding layer to obtain an output feature map with the size of 224*224, and the output feature map with the size of 224*224 and the first feature map with the size of 224*224 are fused through the convolution layer of the sixth-stage decoding layer to obtain the sixth-stage skin color migration feature map. Finally, an up-sampling operation is performed on the sixth-stage skin color migration feature map to obtain an output image with the same size as the source image, the output image being the skin color migration image.
It will be appreciated that the convolution layer itself is not dedicated to feature fusion, but feature fusion may be achieved by a specific design and structure. For example, the 1x1 convolution kernel (Pointwise Convolution) may be used to integrate information of different channels, because the 1x1 convolution integrates the input multiple channel information into one output channel, reducing the number of channels, reducing the number of parameters and the amount of computation, and achieving feature fusion to some extent. In addition, when a decoding network is constructed, an up-sampling layer and a convolution layer can be added layer by layer, and a batch normalization layer and an activation function are added between each layer to realize feature fusion.
In other embodiments, more advanced feature fusion may be performed using specific network structures, such as skip connection (skip connection) or fusion modules, which may fuse different levels of feature maps together to achieve a richer multi-dimensional feature expression. In short, the role of the convolution layer in feature fusion depends on the design of the network structure by those skilled in the art, and can be adjusted according to the effect target of the skin color migration model training.
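One hypothetical way to realize a decoding layer of the kind described above (upsampling followed by convolutional fusion with the same-size first feature map) is sketched below; channel concatenation is an assumed fusion operator, and the channel numbers are placeholders, since the application does not fix these details.

    # Hypothetical sketch of one decoding layer: upsample, then fuse with the same-size
    # encoder feature map via channel concatenation + convolution + batch norm + activation.
    import torch
    import torch.nn as nn

    class DecodeLayer(nn.Module):
        def __init__(self, in_ch, skip_ch, out_ch, scale=2):
            super().__init__()
            self.up = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
            self.fuse = nn.Sequential(
                nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, x, skip):
            x = self.up(x)                          # up-sampling operation
            x = torch.cat([x, skip], dim=1)         # combine with the same-size first feature map
            return self.fuse(x)                     # skin color migration feature map of this stage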
S106: and calculating the loss of the skin color migration image by adopting a loss function, and carrying out iterative training on the skin color migration model according to the loss until the skin color migration model converges to obtain the skin color migration model.
Loss functions are widely used in machine learning model training to quantify the difference between model predictions and true values. In this embodiment, the loss function includes a skin tone perception loss for reflecting a difference in skin tone similarity between the source image and the skin tone migration image, and a skin tone histogram loss for reflecting a difference in skin tone distribution between the source image, the reference image, and the skin tone migration image.
In some embodiments, the loss function is:
L = λ1·L_percept + λ2·L_skin
wherein λ1 and λ2 are hyperparameters, L_percept is the skin color perception loss, and L_skin is the skin color histogram loss.
In some embodiments, the skin color perception loss L_percept is:
R_i = C_i·W_i·H_i
wherein V is the number of layers of the convolutional neural network, R_i is the number of elements in the i-th layer of the convolutional neural network, C_i, W_i and H_i are respectively the number, width and height of the i-th layer feature maps of the convolutional neural network, F_j is a feature map of size j, S is the source image, Y is the skin color migration image, F_j(S) is the feature map of the source image of size j, F_j(Y) is the feature map of the skin color migration image of size j, and the norm ||·||_1 takes the absolute value of the difference between the two.
It will be appreciated that the values of the size j are determined by the sizes of the feature maps output when the encoding network performs the downsampling operations; in different encoding networks, those skilled in the art can change these sizes according to the actual requirements of model training. In this embodiment, the encoding network extracts six feature maps of the source image with different sizes, namely 224*224, 112*112, 56*56, 28*28, 14*14 and 7*7, so j takes the values (7*7, 14*14, 28*28, 56*56, 112*112, 224*224).
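An illustrative sketch of a multi-scale perception loss consistent with the quantities defined above follows: an L1 distance between same-size encoder feature maps of the source image and the skin color migration image, normalized by R_i = C_i·W_i·H_i. Since the application's formula itself is not reproduced above, the exact weighting is an assumption.

    # Illustrative sketch of the skin color perception loss described above.
    import torch

    def perceptual_loss(feats_src, feats_out):
        """feats_src, feats_out: lists of same-size B x C x H x W feature maps from one encoder."""
        loss = feats_src[0].new_zeros(())
        for fs, fy in zip(feats_src, feats_out):
            r = fs.shape[1] * fs.shape[2] * fs.shape[3]            # R_i = C_i * W_i * H_i
            loss = loss + (fs - fy).abs().sum() / (r * fs.shape[0])
        return loss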
In some embodiments, the skin color histogram loss L_skin is:
wherein T is the reference image, P_T is the skin region of the reference image, Y is the skin color migration image, P_s is the skin region of the source image, T*P_T is the skin color of the reference image, Y*P_s is the skin color of the skin color migration image, p(T*P_T) is the probability distribution of the skin color of the reference image, and p(Y*P_s) is the probability distribution of the skin color of the skin color migration image.
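Since the histogram-loss formula itself is not reproduced above, the following is only an illustrative, differentiable approximation of the idea (comparing the skin color distributions of the reference image and the skin color migration image); the soft-histogram construction and the L1 comparison are assumptions, not the application's exact definition.

    # Illustrative approximation of a skin color histogram loss (assumption).
    import torch

    def soft_histogram(values, bins=64, sigma=0.02):
        """values: 1-D tensor in [0, 1] -> normalized soft histogram of length `bins`."""
        centers = torch.linspace(0.0, 1.0, bins, device=values.device)
        weights = torch.exp(-0.5 * ((values[:, None] - centers[None, :]) / sigma) ** 2)
        hist = weights.sum(dim=0)
        return hist / hist.sum().clamp(min=1e-8)

    def skin_histogram_loss(ref_img, ref_mask, out_img, out_mask, bins=64):
        """Images: B x 3 x H x W in [0, 1]; masks: B x 1 x H x W binary skin masks."""
        loss = ref_img.new_zeros(())
        for c in range(3):                                    # per colour channel
            ref_vals = ref_img[:, c][ref_mask[:, 0] > 0.5]    # T * P_T: reference skin pixels
            out_vals = out_img[:, c][out_mask[:, 0] > 0.5]    # Y * P_s: migrated skin pixels
            loss = loss + torch.abs(soft_histogram(ref_vals, bins)
                                    - soft_histogram(out_vals, bins)).sum()
        return loss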
It can be appreciated that the smaller the skin color difference between each skin color migration image and the reference image in the training set, the more similar the skin color migration image and the reference image are, which means that the skin color migration can accurately reproduce the skin color effect of the reference image. Therefore, the parameters of the skin color migration model can be adjusted according to the difference between each skin color migration image and the reference image in the training set, and the skin color migration model is trained iteratively until it converges, so as to obtain the skin color migration model. In some embodiments, the trainable part of the skin color migration model includes the fusion network and the decoding network, and the model parameters include the model parameters of the fusion network and the model parameters of the decoding network.
In some embodiments, the Adam algorithm (Adaptive Moment Estimation) is used to optimize the model parameters. For example, the number of iterations is set to 100,000, the initial learning rate is set to 0.005, and the weight decay of the learning rate is set to 0.0005; every 500 iterations, the adjusted model parameters output by the Adam algorithm are obtained, and these adjusted model parameters are used for the next round of training until the fusion network and the decoding network converge. The converged model parameters are output, and the skin color migration model is thereby obtained.
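A schematic training loop matching the settings mentioned above (Adam, learning rate 0.005, weight decay 0.0005, 100,000 iterations) is sketched below. Names such as fusion_net, decoder, skin_tone_model and training_batches, as well as the loss weights lambda1/lambda2, are hypothetical placeholders standing for the components and losses described earlier, not identifiers from the application.

    # Schematic sketch of the optimization set-up described above; the model, data iterator,
    # masks and loss helpers are assumed to exist (see the earlier sketches).
    import torch

    lambda1, lambda2 = 1.0, 1.0                               # placeholder hyperparameter values
    optimizer = torch.optim.Adam(
        list(fusion_net.parameters()) + list(decoder.parameters()),   # encoder stays frozen
        lr=0.005, weight_decay=0.0005)

    for step in range(100_000):
        source, reference, src_mask, ref_mask = next(training_batches)
        out_img, feats_src, feats_out = skin_tone_model(source * src_mask, reference * ref_mask)
        loss = (lambda1 * perceptual_loss(feats_src, feats_out)
                + lambda2 * skin_histogram_loss(reference, ref_mask, out_img, src_mask))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()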
In summary, the present application provides a method for training a skin color migration model, which can achieve a high-quality skin color migration effect by using the structure of an encoding network, a fusion network and a decoding network. In the method, the skin regions of a source image and a reference image are encoded by the encoding network to obtain first feature maps of a plurality of sizes and a second feature map, respectively. The feature map with the smallest size among the plurality of first feature maps and the second feature map are fused through the fusion network to obtain a third feature map. The third feature map is input into a first-stage decoding layer of the decoding network, and an up-sampling operation is performed to obtain an output feature map. A fusion operation is performed on the output feature map and the target feature map to obtain a first-stage skin color migration feature map. The first-stage skin color migration feature map is input into the next-stage decoding layer, and the up-sampling operation and the fusion operation are performed layer by layer until an output image with the same size as the source image is obtained, the output image being a skin color migration image. The loss of the skin color migration image is calculated with a loss function, and the skin color migration model is trained iteratively according to the loss until the model converges, so as to obtain a skin color migration model with a high-quality skin color migration effect. The method can capture and migrate the skin color information between the source image and the reference image more accurately; at the same time, because the model training process takes image data under various illumination conditions into account, the model adapts better to uneven or insufficient illumination, which improves the stability and reliability of skin color migration in practical applications.
After the skin color migration model is obtained through training by the method for training the skin color migration model, the skin color migration model can be utilized to carry out skin color migration, and a skin beautifying image is generated. Referring to fig. 3, fig. 3 is a flowchart of a method for generating a skin care image according to an embodiment of the present application, as shown in fig. 3, the method S200 includes the following steps:
s201: and acquiring a user image and a skin-care reference image.
Specifically, the user image and the skin-beautifying reference image are selected by the user. The user image refers to the original human body image, provided by the user, that needs skin-beautifying processing; it can be a photo of a human body taken by different shooting devices (such as a mobile phone or a camera) under various illumination conditions. The skin-beautifying reference image is a beautified human body image with ideal skin color and illumination conditions; such images usually come from works taken and post-processed by professional photographers or processed with high-quality beautification software, and the user can obtain a skin-beautifying reference image through means such as the Internet.
S202: the skin areas of the user image and the skin-beautifying reference image are extracted.
Specifically, human body analysis is performed on the user image and the skin-beautifying reference image respectively to obtain a human body analysis image of the user image and a human body analysis image of the skin-beautifying reference image; the skin areas in these two analysis images are then extracted respectively, yielding the skin area of the user image and the skin area of the skin-beautifying reference image.
The method for acquiring the skin area has been described in detail in step S102 and, in some embodiments, may be implemented with reference to step S102; it is not described again here.
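Purely as a sketch of this extraction step, a skin region can be cut out of a human body analysis map with a binary mask as shown below; the specific label values treated as skin are assumptions for illustration and are not fixed by this embodiment.

```python
import numpy as np

def extract_skin_region(image: np.ndarray, parsing_map: np.ndarray,
                        skin_labels=(1,)) -> np.ndarray:
    """Keep only the pixels whose human-parsing label is treated as skin.

    image:       H x W x 3 image array
    parsing_map: H x W integer label map produced by a human parsing model
    skin_labels: label values assumed to denote skin (illustrative values)
    """
    mask = np.isin(parsing_map, skin_labels)      # H x W boolean skin mask
    skin = np.zeros_like(image)
    skin[mask] = image[mask]                      # non-skin pixels remain zero
    return skin

# Example with random stand-in data.
img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
labels = np.random.randint(0, 5, (224, 224))
skin_only = extract_skin_region(img, labels)
```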
S203: inputting the skin area of the user image and the skin area of the skin-beautifying reference image into a skin color migration model, coding the skin area of the user image through a coding network to obtain user image feature images with multiple sizes, and coding the skin area of the skin-beautifying reference image through the coding network to obtain the skin-beautifying reference image feature images.
Specifically, in some embodiments, a pre-trained VGG neural network is used as the encoding network to encode the skin region of the user image and extract six user image feature maps of different sizes, namely 224×224, 112×112, 56×56, 28×28, 14×14, and 7×7; these have different spatial resolutions and therefore capture feature information of the user image at different scales. Meanwhile, the same pre-trained VGG neural network is used to encode the skin area of the skin-beautifying reference image and extract a skin-beautifying reference image feature map with a size of 7×7.
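A minimal sketch of this encoding step using a pre-trained VGG-16 from torchvision is shown below; the choice of VGG variant and of the exact layers at which the six feature maps are taken are assumptions made for illustration rather than the layers prescribed by this embodiment.

```python
import torch
import torchvision

# Pre-trained VGG-16 feature extractor (assumes a recent torchvision).
vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()

def encode_multiscale(x: torch.Tensor):
    """Collect one feature map at each spatial resolution the VGG trunk
    produces for a 224x224 input: 224, 112, 56, 28, 14 and 7."""
    feats, seen = [], set()
    with torch.no_grad():
        for layer in vgg:
            x = layer(x)
            size = x.shape[-1]
            if size not in seen:   # first feature map at each new resolution
                feats.append(x)
                seen.add(size)
    return feats

skin_region = torch.randn(1, 3, 224, 224)   # stand-in for the user's skin region
user_feats = encode_multiscale(skin_region)
print([f.shape[-1] for f in user_feats])    # -> [224, 112, 56, 28, 14, 7]
```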
The skin color migration model is obtained by the method for training a skin color migration model in any one of the above embodiments, and is not described again here.
S204: and fusing the feature images with the minimum size in the plurality of user image feature images and the skin-beautifying reference image feature images through the fusion network to obtain a fused feature image.
Specifically, a convolution operation is performed on the 7×7 user image feature map and on the skin-beautifying reference image feature map to extract their mean features and variance features respectively; the mean feature and variance feature of the user image feature map are then fused with the mean feature and variance feature of the skin-beautifying reference image feature map, and a further convolution operation is performed on the result to extract higher-level feature information, yielding the fused feature map.
The network structure and the formula used for the fusion are described in detail in steps S1041 to S1043, and the description thereof will not be repeated here.
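The mean/variance fusion described above behaves much like adaptive instance normalization; the sketch below is written under that assumption, with the convolutions before and after the fusion reduced to single placeholder layers and the channel count assumed to be 512.

```python
import torch
import torch.nn as nn

def mean_std(feat: torch.Tensor, eps: float = 1e-5):
    """Per-channel mean and standard deviation of an N x C x H x W feature map."""
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = feat.var(dim=(2, 3), keepdim=True).add(eps).sqrt()
    return mean, std

def fuse(user_feat: torch.Tensor, ref_feat: torch.Tensor) -> torch.Tensor:
    """Normalize the user feature map and re-scale it with the reference
    statistics, carrying the reference skin tone statistics over."""
    mu_s, sigma_s = mean_std(user_feat)
    mu_t, sigma_t = mean_std(ref_feat)
    return sigma_t * (user_feat - mu_s) / sigma_s + mu_t

# Assumed channel count; the convolutions stand in for the pre/post fusion convs.
pre_conv = nn.Conv2d(512, 512, 3, padding=1)
post_conv = nn.Conv2d(512, 512, 3, padding=1)

user_7x7 = torch.randn(1, 512, 7, 7)   # smallest user image feature map
ref_7x7 = torch.randn(1, 512, 7, 7)    # skin-beautifying reference feature map
fused = post_conv(fuse(pre_conv(user_7x7), pre_conv(ref_7x7)))
print(fused.shape)                      # torch.Size([1, 512, 7, 7])
```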
S205: and inputting the fused feature map into a first-stage decoding layer of the decoding network to perform up-sampling operation to obtain the intermediate feature map, performing fusion operation on the intermediate feature map and feature maps with the same size as that in the user image feature map to obtain a first-stage skin-beautifying feature map, inputting the first-stage skin-beautifying feature map into a next-stage decoding layer, and performing up-sampling operation and fusion operation layer by layer until a skin-beautifying image with the same size as that of the user image is obtained.
Specifically, in some embodiments, the fused feature map is input into the first-stage decoding layer of the decoding network, an up-sampling operation is performed through the up-sampling layer of the first-stage decoding layer to obtain an intermediate feature map with a size of 7×7, and the 7×7 intermediate feature map and the 7×7 user image feature map are fused through the convolution layer of the first-stage decoding layer to obtain the first-stage skin-beautifying feature map; the first-stage skin-beautifying feature map is input into the second-stage decoding layer, an up-sampling operation is performed through the up-sampling layer of the second-stage decoding layer to obtain an intermediate feature map with a size of 14×14, and the 14×14 intermediate feature map and the 14×14 user image feature map are fused through the convolution layer of the second-stage decoding layer to obtain the second-stage skin-beautifying feature map; the second-stage skin-beautifying feature map is input into the third-stage decoding layer, an up-sampling operation is performed through the up-sampling layer of the third-stage decoding layer to obtain an intermediate feature map with a size of 28×28, and the 28×28 intermediate feature map and the 28×28 user image feature map are fused through the convolution layer of the third-stage decoding layer to obtain the third-stage skin-beautifying feature map; the third-stage skin-beautifying feature map is input into the fourth-stage decoding layer, an up-sampling operation is performed through the up-sampling layer of the fourth-stage decoding layer to obtain an intermediate feature map with a size of 56×56, and the 56×56 intermediate feature map and the 56×56 user image feature map are fused through the convolution layer of the fourth-stage decoding layer to obtain the fourth-stage skin-beautifying feature map; the fourth-stage skin-beautifying feature map is input into the fifth-stage decoding layer, an up-sampling operation is performed through the up-sampling layer of the fifth-stage decoding layer to obtain an intermediate feature map with a size of 112×112, and the 112×112 intermediate feature map and the 112×112 user image feature map are fused through the convolution layer of the fifth-stage decoding layer to obtain the fifth-stage skin-beautifying feature map; the fifth-stage skin-beautifying feature map is input into the sixth-stage decoding layer, an up-sampling operation is performed through the up-sampling layer of the sixth-stage decoding layer to obtain an intermediate feature map with a size of 224×224, and the 224×224 intermediate feature map and the 224×224 user image feature map are fused through the convolution layer of the sixth-stage decoding layer to obtain the sixth-stage skin-beautifying feature map; finally, an up-sampling operation is performed on the sixth-stage skin-beautifying feature map to obtain an output image with the same size as the user image, and this output image is the skin-beautifying image.
The structural design of the decoding network is already described in detail in step S105, and the description thereof will not be repeated here.
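One possible shape for a single decoding stage, which up-samples the incoming feature map and fuses it with the user image feature map of the matching size through a convolution layer, is sketched below; the 2× upsampling factor, the concatenation-based fusion, and the channel counts are assumptions for illustration rather than requirements of this embodiment.

```python
import torch
import torch.nn as nn

class DecodeStage(nn.Module):
    """One decoding layer: upsample the incoming map, then fuse it with the
    user image feature map of the matching size via a convolution."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                    # intermediate feature map
        x = torch.cat([x, skip], dim=1)   # combine with same-size user features
        return self.fuse(x)               # skin-beautifying feature map of this stage

stage = DecodeStage(in_ch=512, skip_ch=512, out_ch=256)
fused_7x7 = torch.randn(1, 512, 7, 7)       # output of the fusion network
user_14x14 = torch.randn(1, 512, 14, 14)    # same-size user image feature map
out = stage(fused_7x7, user_14x14)
print(out.shape)                            # torch.Size([1, 256, 14, 14])
```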
In short, by designing the skin color migration model and its training method, a converged skin color migration model is obtained through training. Based on the trained skin color migration model, the user only needs to input a user image and a skin-beautifying reference image, and the model outputs a skin-beautifying image to the user. The trained skin color migration model provided by the embodiment of the present application adapts well to the illumination environment of the image and can still stably achieve a high-quality skin color migration effect even for user images shot in extreme illumination environments.
In some embodiments, after the skin color migration model is trained by the method for training a skin color migration model provided by the embodiment of the present application, the model can be used to perform skin color migration and generate a skin care image. The method for generating a skin care image provided by the embodiment of the present application can be implemented by various types of electronic devices with computing capability, such as an intelligent terminal or a server.
The method for generating the skin care image provided by the embodiment of the application is described below in connection with exemplary application and implementation of the terminal provided by the embodiment of the application. Referring to fig. 4, fig. 4 is a flowchart of a method for generating a skin care image according to an embodiment of the present application. The method S300 comprises the steps of:
S301: training a skin color migration model to obtain a trained skin color migration model.
In some embodiments, the skin color migration model is trained using one or more servers, which may form a server cluster, for example a cluster comprising a first server, a second server, …, an nth server, or a cloud computing service center including a plurality of servers. The server in the embodiment of the present application includes, but is not limited to, a tower server, a rack server, a blade server, or a cloud server. Optionally, the server is a cloud server (Elastic Compute Service, ECS).
It can be understood that the skin color migration model is obtained by training based on the method for training a skin color migration model in any one of the above embodiments; after training, the model is stored on the server, and the user can establish a communication connection with the server through the intelligent terminal so as to use the skin color migration model.
S302: and acquiring a user image and a skin-care reference image.
Specifically, the intelligent terminal is in communication connection with the server, and the server acquires the user image and the skin-beautifying reference image through the intelligent terminal. For example, after the user inputs the user image and the skin-beautifying reference image through the input interface of the intelligent terminal, the server automatically acquires them; or the intelligent terminal is provided with a camera, and the user image and the skin-beautifying reference image are captured by the camera and uploaded to the server; or the user selects the user image and the skin-beautifying reference image from an image library stored on the intelligent terminal and uploads them to the server; or the intelligent terminal uploads images that the user has downloaded over the network to the server.
S303: and inputting the user image and the skin-beautifying reference image into a skin color migration model to obtain a skin-beautifying image.
Specifically, after obtaining the user image and the skin-beautifying reference image transmitted from the user's intelligent terminal, the server inputs them into the trained skin color migration model. The model migrates the skin color style of the skin-beautifying reference image onto the user image and outputs the skin-beautifying image once migration is complete, and the server transmits the skin-beautifying image back to the user's intelligent terminal.
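To make the request/response flow concrete, a hedged sketch of the server-side call is given below; SkinToneMigrationModel and handle_request are placeholder names, and the stand-in model simply returns its input where the deployed model would return the skin-beautifying image.

```python
import torch
import torch.nn as nn

class SkinToneMigrationModel(nn.Module):
    """Placeholder for the trained encoder + fusion + decoder pipeline."""
    def forward(self, user_skin: torch.Tensor, ref_skin: torch.Tensor) -> torch.Tensor:
        # The deployed model would return the skin-beautifying image here.
        return user_skin

model = SkinToneMigrationModel().eval()

def handle_request(user_image: torch.Tensor, ref_image: torch.Tensor) -> torch.Tensor:
    """Server-side handling of one request: run the model and return the image
    to be sent back to the user's intelligent terminal."""
    with torch.no_grad():
        return model(user_image, ref_image)

result = handle_request(torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224))
print(result.shape)   # torch.Size([1, 3, 224, 224])
```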
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device includes at least one processor 11 and a memory 12 that are communicatively connected (via a bus; one processor is taken as an example in fig. 5). It will be appreciated by those skilled in the art that the structure shown in fig. 5 is merely illustrative and does not limit the structure of the electronic device described above. For example, the electronic device may also include more or fewer components than shown in fig. 5, or have a different configuration than shown in fig. 5.
Wherein the processor 11 is configured to provide computing and control capabilities for controlling the electronic device 10 to perform corresponding tasks, e.g. for controlling the electronic device 10 to perform the method provided by any of the embodiments of the present application.
It is understood that the processor 11 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The memory 12, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for training a skin color migration model or to the method for generating a skin care image in the embodiments of the present application. By running the non-transitory software programs, instructions, and modules stored in the memory 12, the processor 11 implements the method for training a skin color migration model or the method for generating a skin care image in any of the method embodiments described above. The memory 12 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 12 may also include memory located remotely from the processor, which may be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It will be appreciated that the electronic device 10 further comprises transmission means 13, the transmission means 13 being arranged to receive or transmit data via a network, e.g. the transmission means 13 being arranged to transmit user images and/or skin care reference images uploaded by the user or skin care images returned to the user. Specific examples of the network described above may include a wireless network provided by a communication provider of the electronic device. In one example, the transmission means 13 comprises a network adapter (Network Interface Controller, NIC) which can be connected to other network devices via a base station so as to communicate with the internet. In one example, the transmission device 13 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program comprising program instructions that, when executed by a computer, cause the computer to perform a method of training a skin tone migration model or a method of generating a skin care image as in the previous embodiments.
Embodiments of the present application also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform a method of training a skin tone migration model or a method of generating a skin care image as in the previous embodiments.
It should be noted that the above-described apparatus embodiments are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a general purpose hardware platform, or may be implemented by hardware. Those skilled in the art will appreciate that all or part of the processes implementing the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and where the program may include processes implementing the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A method of training a skin tone migration model, the skin tone migration model comprising an encoding network, a blending network, and a decoding network, the method comprising:
acquiring a training set, wherein the training set comprises a plurality of training data, the training data comprises a source image and a reference image, the source image is an original human body image obtained by shooting under different illumination environments by shooting equipment, and the reference image is a beautified human body image;
Encoding the skin region of the source image through the encoding network to obtain a first feature map with a plurality of sizes, and encoding the skin region of the reference image through the encoding network to obtain a second feature map, wherein the size of the second feature map is the same as the size of the feature map with the smallest size in the plurality of first feature maps;
fusing the feature map with the smallest size in the plurality of first feature maps and the second feature map through the fusion network to obtain a third feature map;
inputting the third feature map into a first-stage decoding layer of the decoding network to perform up-sampling operation to obtain an output feature map, performing fusion operation on the output feature map and a target feature map to obtain a first-stage skin color migration feature map, inputting the first-stage skin color migration feature map into a next-stage decoding layer, performing up-sampling operation and fusion operation layer by layer until an output image with the same size as the source image is obtained, wherein the target feature map is a feature map with the same size as the output feature map in the first feature maps, and the output image is a skin color migration image;
And calculating the loss of the skin color migration image by adopting a loss function, and carrying out iterative training on the skin color migration model according to the loss until the skin color migration model converges to obtain the skin color migration model.
2. The method according to claim 1, wherein the fusing, by the fusing network, the feature map with the smallest size from the plurality of first feature maps and the second feature map to obtain a third feature map includes:
performing convolution operation on a fourth feature map, extracting mean features and variance features of the fourth feature map, and performing convolution operation on the second feature map, extracting mean features and variance features of the second feature map; the fourth feature map is the feature map with the smallest size in the first feature maps;
fusing the mean characteristic of the fourth characteristic diagram, the variance characteristic of the fourth characteristic diagram, the mean characteristic of the second characteristic diagram and the variance characteristic of the second characteristic diagram to obtain a fifth characteristic diagram;
and carrying out convolution operation on the fifth characteristic diagram to obtain the third characteristic diagram.
3. The method of claim 2, wherein a fusion formula fusing the mean feature of the fourth feature map, the variance feature of the fourth feature map, the mean feature of the second feature map, and the variance feature of the second feature map is:
IN(SV, TV) = σ(TV) · (SV − μ(SV)) / σ(SV) + μ(TV)
wherein SV is the fourth feature map, TV is the second feature map, μ(SV) is the mean feature of the fourth feature map, σ(SV) is the variance feature of the fourth feature map, μ(TV) is the mean feature of the second feature map, σ(TV) is the variance feature of the second feature map, and IN(SV, TV) is the fifth feature map.
4. The method according to claim 1, wherein the method further comprises:
respectively carrying out human body analysis on the source image and the reference image to obtain a human body analysis image of the source image and a human body analysis image of the reference image;
and respectively extracting skin areas in the human body analysis chart of the source image and the human body analysis chart of the reference image to obtain the skin area of the source image and the skin area of the reference image.
5. The method of any one of claims 1-4, wherein the loss function is:
L = λ1·L_percept + λ2·L_skin
wherein λ1 and λ2 are hyperparameters, L_percept is a skin color perceptual loss reflecting the difference in skin color similarity between the source image and the skin color migration image, and L_skin is a skin color histogram loss reflecting the difference in skin color distribution among the source image, the reference image, and the skin color migration image.
6. The method of claim 5, wherein the skin color perceptual loss L_percept is:
R_i = C_i · W_i · H_i
wherein V is the number of layers of the convolutional neural network, R_i is the number of elements in the i-th layer of the convolutional neural network, C_i, W_i, and H_i are respectively the number, width, and length of the feature maps of the i-th layer of the convolutional neural network, F_j is a feature map of size j, S is the source image, Y is the skin color migration image, F_j(S) is the feature map of the source image of size j, and F_j(Y) is the feature map of the skin color migration image of size j.
7. The method of claim 5, wherein the skin color histogram loss L_skin is:
wherein T is the reference image, P_T is the skin area of the reference image, Y is the skin color migration image, P_s is the skin area of the source image, p⁻(T*P_T) is the probability distribution of the skin color of the reference image, and p⁺(Y*P_s) is the probability distribution of the skin color of the skin color migration image.
8. A method of generating a skin care image, the method comprising:
acquiring a user image and a skin-beautifying reference image, wherein the user image is an original human body image obtained by shooting under different illumination environments by shooting equipment, and the skin-beautifying reference image is a beautified human body image;
Inputting a skin area of a user image and a skin area of a skin-care reference image into a skin color migration model, encoding the skin area of the user image through the encoding network to obtain user image feature images with a plurality of sizes, and encoding the skin area of the skin-care reference image through the encoding network to obtain skin-care reference image feature images, wherein the size of the skin-care reference image feature images is the same as the size of a feature image with the smallest size in the plurality of user image feature images;
fusing the feature images with the smallest size in the plurality of user image feature images and the skin-beautifying reference image feature images through the fusion network to obtain a fused feature image;
inputting the fusion feature map into a first-stage decoding layer of the decoding network to perform up-sampling operation to obtain an intermediate feature map, performing fusion operation on the intermediate feature map and a target image feature map to obtain a first-stage skin-beautifying feature map, inputting the first-stage skin-beautifying feature map into a next-stage decoding layer, and performing up-sampling operation and fusion operation layer by layer until a skin-beautifying image with the same size as the user image is obtained, wherein the target image feature map is a feature map with the smallest size in the plurality of user image feature maps;
The skin color migration model is trained by the method for training a skin color migration model according to any one of claims 1-7.
9. A computer device, comprising:
at least one processor, and
a memory communicatively coupled to the at least one processor, wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
10. A computer readable storage medium storing computer executable instructions for causing a computer device to perform the method of any one of claims 1-8.
CN202310619988.9A 2023-05-29 2023-05-29 Method for training skin color migration model, method for generating skin care image and related device Pending CN116703707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310619988.9A CN116703707A (en) 2023-05-29 2023-05-29 Method for training skin color migration model, method for generating skin care image and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310619988.9A CN116703707A (en) 2023-05-29 2023-05-29 Method for training skin color migration model, method for generating skin care image and related device

Publications (1)

Publication Number Publication Date
CN116703707A true CN116703707A (en) 2023-09-05

Family

ID=87833330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310619988.9A Pending CN116703707A (en) 2023-05-29 2023-05-29 Method for training skin color migration model, method for generating skin care image and related device

Country Status (1)

Country Link
CN (1) CN116703707A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025087392A1 (en) * 2023-10-27 2025-05-01 北京字跳网络技术有限公司 Multimedia data processing method and apparatus, and electronic device and storage medium
CN118967536A (en) * 2024-07-10 2024-11-15 黑龙江大学 A color dithering method for images based on color transfer model
CN118967536B (en) * 2024-07-10 2025-10-03 黑龙江大学 A color dithering method for images based on color transfer model
CN120163976A (en) * 2025-02-26 2025-06-17 中南大学湘雅医院 A method and system for image recognition of urinary stones

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination