
CN116758611B - A method for simulating facial aging - Google Patents

A method for simulating facial aging

Info

Publication number
CN116758611B
CN116758611B (application CN202310689267.5A; published as CN116758611A)
Authority
CN
China
Prior art keywords
age
face
layer
aging
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310689267.5A
Other languages
Chinese (zh)
Other versions
CN116758611A (en)
Inventor
张刚 (Zhang Gang)
潘利嘉 (Pan Lijia)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang University of Technology
Original Assignee
Shenyang University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang University of Technology filed Critical Shenyang University of Technology
Priority to CN202310689267.5A
Publication of CN116758611A
Application granted
Publication of CN116758611B
Legal status: Active

Links

Classifications

    • G — PHYSICS
      • G06 — COMPUTING OR CALCULATING; COUNTING
        • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 40/178 — Human faces, e.g. facial parts, sketches or expressions: estimating age from face image; using age information for improving recognition
          • G06V 10/26 — Image preprocessing: segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region
          • G06V 10/82 — Recognition using pattern recognition or machine learning: using neural networks
        • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS (neural networks)
          • G06N 3/0455 — Auto-encoder networks; encoder-decoder networks
          • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
          • G06N 3/0475 — Generative networks
          • G06N 3/048 — Activation functions
          • G06N 3/084 — Backpropagation, e.g. using gradient descent
          • G06N 3/094 — Adversarial learning
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS
      • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
        • Y02T 10/40 — Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Processing (AREA)

Abstract


This invention provides a method for simulating facial aging, belonging to the field of artificial intelligence technology. The method uses a facial age dataset and a semantic segmentation model for image preprocessing to obtain the facial image to be age-simulated. It designs the original-to-target age difference encoding required to age from the original age group to the target age group; constructs a generative adversarial network (GAN) to generate age difference information; updates the network parameters through backpropagation to complete the training of the GAN; and uses the facial image to be aged as input to the generator to obtain the age-simulated image. This invention, based on age difference information, enables the generation of clear age-simulated faces with shorter training time and better consistency in the identity of the generated age-simulated faces, making the obtained age-simulated face closer to the real appearance of the input face at the target age group.

Description

Face aging simulation method
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a face aging simulation method.
Background
The face aging simulation methods predominantly used today are based on generative adversarial networks: the identity information and the age information of the input face are decoupled, and the identity information is then fused with the age information of the target age group to generate the aged face image. However, the face aging methods widely applied to date remain weak at preserving identity consistency.
Prior-art schemes often focus on improving the decoupling of facial identity information from age information, but the two are difficult to decouple sufficiently. Insufficient decoupling weakens the identity characteristics of the person, so that the generated face aging image and the input face image are difficult to recognize as the same face; that is, existing face aging methods preserve identity consistency poorly. The larger the span between the age of the input face and the target age group, the more pronounced this phenomenon becomes, until the aging result turns one person into another.
To solve the above technical problems, a face aging simulation method needs to be provided.
Disclosure of Invention
The invention aims to provide a face aging simulation method that solves the following technical problems: existing face aging methods adopt a scheme of decoupling identity information from age information, but because full decoupling is difficult, the identity characteristics of the person are weakened; identity consistency is therefore poorly preserved, the aging result eventually turns one person into another, and this phenomenon grows more obvious as the span between the age of the input face and the target age group increases.
The invention provides a face aging simulation method, which comprises the following steps:
S1, performing image preprocessing on the adopted face age dataset with a semantic segmentation model, removing the background and retaining only the clean face region, to obtain the face images to be age-simulated;
S2, designing the original-target age difference code required when aging from the original age group to the target age group, and the target-original age difference code required when going from the target age group back to the original age group;
S3, constructing a generative adversarial network comprising a generator, a discriminator, an age difference encoder and a mapping network, and feeding the original-target age difference code, the target-original age difference code from the target age group back to the original age group, and the face image to be age-simulated into the generator and the discriminator, thereby constructing the age difference information;
S4, computing the loss functions of the generator and of the discriminator, the age loss function, the identity consistency loss function, and the loss function of the generative adversarial network, the loss function of the discriminator including the adversarial loss of the generative adversarial network, and updating the network parameters by backpropagation to complete the training of the generative adversarial network;
S5, taking the face image to be aged as the input of the generator to obtain the face aging image.
Optionally, in step S1, the specific steps include:
S11, collecting a face aging dataset comprising face images of different age groups and the age label corresponding to each face image;
S12, adopting a pre-trained semantic segmentation network capable of distinguishing the semantic information in an input image; each picture in the dataset is fed to the network, which outputs a semantic map for it;
S13, with the semantic map of each face picture as a reference, retaining only the semantic regions of the face, the facial features, the hair and the neck, and randomly rotating the resulting face pictures, thereby completing the construction of the dataset.
Optionally, in step S11, the different age groups include 0-2 years old, 3-6 years old, 7-9 years old, 15-19 years old, 30-39 years old, and 50-69 years old.
Optionally, in step S2, assuming there are n age groups in total, the age difference code I has 50×2n bits, where each 50-bit segment represents the age difference information for converting from one age group to an adjacent age group;
first, the age difference code I is added to a noise vector of the same length drawn from a Gaussian distribution; then, taking bit 50×n as the reference bit, the age difference code for aging from the original age group to target age group j is constructed by adding 1 to bits 50×n through 50×(n+j)-1, leaving the remaining bits unchanged;
the age difference code for going from target age group j back to the original age group is constructed by adding 1 to bits 50×(n-j) through 50×n-1, leaving the remaining bits unchanged.
Optionally, in step S3, the specific steps include:
S31, constructing the encoder structure in the generator: the encoder first adopts a 7×7 convolutional layer with stride 1, followed by a ReLU activation function and a pixel normalization layer; next come 3×3 convolutional layers, each with stride 2 and each followed by a ReLU activation function and a pixel normalization layer; then 4 residual blocks, each with stride 1, where each of the first three residual blocks is followed by a ReLU activation function and a pixel normalization layer while no pixel normalization layer follows the last residual block; this completes the construction of the encoder;
S32, constructing the decoder structure in the generator: the body of the decoder is built from age difference injection modules, each of which is a residual structure composed of a convolutional layer and a StyleGAN-style style convolutional layer; the decoder contains 6 age difference injection modules in total plus a convolutional layer with a 1×1 kernel; an up-sampling layer follows each of the 5th and 6th age difference injection modules, restoring the feature map to the size of the input image; the last layer is the 1×1 convolutional layer, which reduces the number of channels of the feature map to 3 and is followed by a Tanh activation function; this completes the construction of the decoder;
S33, constructing the age difference encoder, a convolutional neural network whose first layer is a convolutional layer of 7×7 kernels with stride 1, followed by five convolutional layers of 3×3 kernels, each with stride 2, and finally a convolutional layer of 1×1 kernels; each of the first 5 convolutional layers is followed by an LReLU activation function, and a global average pooling layer after the final convolutional layer reduces the feature map to a vector;
S34, constructing the mapping network, which consists of 8 linear layers; each of the first 7 linear layers is followed by a ReLU activation function and a pixel normalization layer, and only a pixel normalization layer follows the last layer;
S35, constructing the discriminator part, which adopts the discriminator structure proposed in StyleGAN;
S36, feeding the original-target age difference code and the face image to be age-simulated into the generator to generate the aged face image of the target age group and the reconstructed face image of the original age group; then feeding the aged face image of the target age group together with the target-original age difference code into the generator as input, generating the reconstructed image of the target age group and the regenerated face image of the original age group; feeding the aged images and the real faces into the discriminator; and feeding the real face of the target age group, the aged face image of the target age group and the face image to be age-simulated into the age difference encoder as input, thereby constructing the age difference information.
Compared with the prior art, the invention provides a face aging simulation method in which the adopted face age dataset is preprocessed with a semantic segmentation model to obtain the face images to be age-simulated; the original-target age difference code required to age from the original age group to the target age group is designed; a generative adversarial network is constructed and the age difference information is built; the network parameters are updated by backpropagation to complete the training of the generative adversarial network; and the face image to be aged is taken as the input of the generator to obtain the face aging image. The advantages of the invention are that (1) the training time required to generate a clear aged-face effect is shorter, (2) the identity consistency of the generated aged face is better, and (3) the generated aged face is closer to the real appearance of the input face at the target age group.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. In the drawings, wherein like or corresponding reference numerals indicate like or corresponding parts, there are shown by way of illustration, and not limitation, several embodiments of the invention, in which:
FIG. 1 is a flow chart of a face aging simulation method of the present invention;
FIG. 2 is a schematic diagram of an age difference injection module according to the present invention;
FIG. 3 is a block diagram of an age difference encoder of the present invention;
FIG. 4 is a network structure diagram of the face aging method based on age difference of the present invention;
FIG. 5 is an age effect diagram of the face aging method based on age difference according to the present invention;
FIG. 6 is a graph showing the face aging effects obtained by different methods in the prior art.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. The technical means used in the examples are conventional means well known to those skilled in the art unless otherwise indicated.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs. Relational terms such as "first" and "second", and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "coupled," "connected," and the like are to be construed broadly and may be, for example, fixedly coupled, detachably coupled, or integrally formed, mechanically coupled, electrically coupled, or indirectly coupled via an intervening medium. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising" does not exclude the presence of additional identical elements in a process, method, article, or apparatus that comprises the element.
As shown in fig. 1, the present embodiment provides a face aging simulation method, which is performed according to the following steps:
S1, performing image preprocessing on the adopted face age dataset with a semantic segmentation model, removing the background and retaining only the clean face region, to obtain the face images to be age-simulated;
In step S1, the specific steps include:
S11, collecting a face aging dataset comprising face images of different age groups and the age label corresponding to each face image;
In step S11, the different age groups include 0-2 years old, 3-6 years old, 7-9 years old, 15-19 years old, 30-39 years old, and 50-69 years old.
S12, adopting a pre-trained semantic segmentation network able to distinguish 19 semantic categories in the input image, such as background, facial skin, facial features, clothing and neck; each picture in the dataset is fed to the network, which outputs a semantic map for it;
S13, with the semantic map of each face picture as a reference, retaining only the semantic regions of the face, the facial features, the hair and the neck, and randomly rotating the resulting face pictures, thereby completing the construction of the dataset.
S2, designing the original-target age difference code required when aging from the original age group to the target age group, and the target-original age difference code required when going from the target age group back to the original age group;
In step S2, assuming there are n age groups in total, the age difference code I has 50×2n bits, where each 50-bit segment represents the age difference information for converting from one age group to an adjacent age group;
first, the age difference code I is added to a noise vector of the same length drawn from a Gaussian distribution; then, taking bit 50×n as the reference bit, the age difference code for aging from the original age group to target age group j is constructed by adding 1 to bits 50×n through 50×(n+j)-1, leaving the remaining bits unchanged;
the age difference code for going from target age group j back to the original age group is constructed by adding 1 to bits 50×(n-j) through 50×n-1, leaving the remaining bits unchanged.
S3, constructing a generative adversarial network comprising a generator, a discriminator, an age difference encoder and a mapping network, and feeding the original-target age difference code, the target-original age difference code from the target age group back to the original age group, and the face image to be age-simulated into the generator and the discriminator, thereby constructing the age difference information, as shown in fig. 2 and fig. 4;
in step S3, the specific steps include:
S31, constructing the encoder structure in the generator: the encoder first adopts a 7×7 convolutional layer with stride 1, followed by a ReLU activation function and a pixel normalization layer; next come 3×3 convolutional layers, each with stride 2 and each followed by a ReLU activation function and a pixel normalization layer; then 4 residual blocks, each with stride 1, where each of the first three residual blocks is followed by a ReLU activation function and a pixel normalization layer while no pixel normalization layer follows the last residual block; this completes the construction of the encoder;
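For illustration, the encoder just described can be sketched in PyTorch as follows; the channel widths, the number of stride-2 convolutions (two are assumed here, matching the two up-sampling layers in the decoder of S32) and the PixelNorm implementation are assumptions rather than values fixed by the patent:

```python
import torch
import torch.nn as nn

class PixelNorm(nn.Module):
    """Pixel-wise feature normalization (StyleGAN-style)."""
    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(dim=1, keepdim=True) + 1e-8)

class ResBlock(nn.Module):
    """Stride-1 residual block used at the end of the encoder."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1))
    def forward(self, x):
        return x + self.body(x)

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        layers = [nn.Conv2d(3, 64, 7, 1, 3), nn.ReLU(True), PixelNorm()]   # 7x7 conv, stride 1
        for cin, cout in [(64, 128), (128, 256)]:                          # 3x3 convs, stride 2
            layers += [nn.Conv2d(cin, cout, 3, 2, 1), nn.ReLU(True), PixelNorm()]
        for i in range(4):                                                 # 4 residual blocks, stride 1
            layers += [ResBlock(256), nn.ReLU(True)]
            if i < 3:                                                      # no PixelNorm after the last block
                layers.append(PixelNorm())
        self.net = nn.Sequential(*layers)
    def forward(self, x):
        return self.net(x)
```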
S32, constructing the decoder structure in the generator: the body of the decoder is built from age difference injection modules, each of which is a residual structure composed of a convolutional layer and a StyleGAN-style style convolutional layer; the decoder contains 6 age difference injection modules in total plus a convolutional layer with a 1×1 kernel; an up-sampling layer follows each of the 5th and 6th age difference injection modules, restoring the feature map to the size of the input image; the last layer is the 1×1 convolutional layer, which reduces the number of channels of the feature map to 3 and is followed by a Tanh activation function; this completes the construction of the decoder;
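A minimal sketch of one age difference injection module, under the reading (made explicit in the training description below) that each module carries two streams: a plain convolution branch that skips age injection and a modulated branch driven by an age style vector w. The dual-stream signature, the affine-modulation stand-in for a full StyleGAN modulated convolution, and all sizes are assumptions:

```python
import torch
import torch.nn as nn

class AgeDiffInjection(nn.Module):
    def __init__(self, ch: int, w_dim: int):
        super().__init__()
        self.skip = nn.Conv2d(ch, ch, 3, 1, 1)        # plain conv: age injection skipped
        self.affine = nn.Linear(w_dim, ch)            # w -> per-channel style scales
        self.style_conv = nn.Conv2d(ch, ch, 3, 1, 1)  # stand-in for a modulated conv

    def forward(self, f_rec: torch.Tensor, f_age: torch.Tensor, w: torch.Tensor):
        # reconstruction stream: convolution only, no age information
        rec = self.skip(f_rec)
        # aging stream: residual structure of conv + style conv, modulated by w
        s = self.affine(w).unsqueeze(-1).unsqueeze(-1)
        age = self.skip(f_age) + self.style_conv(f_age * s)
        return rec, age
```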
S33, constructing the age difference encoder, a convolutional neural network whose first layer is a convolutional layer of 7×7 kernels with stride 1, followed by five convolutional layers of 3×3 kernels, each with stride 2, and finally a convolutional layer of 1×1 kernels; each of the first 5 convolutional layers is followed by an LReLU activation function, and a global average pooling layer after the final convolutional layer reduces the feature map to a vector, as shown in fig. 3;
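A sketch of the age difference encoder following this description and fig. 3; the channel widths, the reading of five stride-2 3×3 layers, and the output dimension (600, matching the age code length used below) are assumptions:

```python
import torch.nn as nn

class AgeDifferenceEncoder(nn.Module):
    """Maps a face image to its age information code A(.)."""
    def __init__(self, out_dim: int = 600):
        super().__init__()
        layers = [nn.Conv2d(3, 64, 7, 1, 3), nn.LeakyReLU(0.2, True)]  # 7x7 conv, stride 1
        ch = 64
        for i in range(5):                                             # 3x3 convs, stride 2
            nxt = min(ch * 2, 512)
            layers.append(nn.Conv2d(ch, nxt, 3, 2, 1))
            if i < 4:                                                  # LReLU after the first 5 conv layers
                layers.append(nn.LeakyReLU(0.2, True))
            ch = nxt
        layers.append(nn.Conv2d(ch, out_dim, 1))                       # final 1x1 conv
        layers.append(nn.AdaptiveAvgPool2d(1))                         # global average pooling
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).flatten(1)                                  # feature map -> vector
```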
S34, constructing the mapping network, which consists of 8 linear layers; each of the first 7 linear layers is followed by a ReLU activation function and a pixel normalization layer, and only a pixel normalization layer follows the last layer;
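A sketch of the mapping network; the input and hidden widths are assumptions:

```python
import torch
import torch.nn as nn

def pixel_norm(x: torch.Tensor) -> torch.Tensor:
    return x * torch.rsqrt(x.pow(2).mean(dim=1, keepdim=True) + 1e-8)

class MappingNetwork(nn.Module):
    """8 linear layers: ReLU + pixel norm after the first 7, pixel norm alone after the last."""
    def __init__(self, in_dim: int = 600, dim: int = 512):
        super().__init__()
        dims = [in_dim] + [dim] * 8
        self.linears = nn.ModuleList(nn.Linear(dims[i], dims[i + 1]) for i in range(8))

    def forward(self, code: torch.Tensor) -> torch.Tensor:
        x = code
        for i, lin in enumerate(self.linears):
            x = lin(x)
            x = pixel_norm(torch.relu(x)) if i < 7 else pixel_norm(x)
        return x
```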
S35, constructing the discriminator part, which adopts the discriminator structure proposed in StyleGAN;
S36, feeding the original-target age difference code and the face image to be age-simulated into the generator to generate the aged face image of the target age group and the reconstructed face image of the original age group; then feeding the aged face image of the target age group together with the target-original age difference code into the generator as input, generating the reconstructed image of the target age group and the regenerated face image of the original age group; feeding the aged images and the real faces into the discriminator; and feeding the real face of the target age group, the aged face image of the target age group and the face image to be age-simulated into the age difference encoder as input, thereby constructing the age difference information.
S4, computing the loss functions of the generator and of the discriminator, the age loss function, the identity consistency loss function, and the loss function of the generative adversarial network, the loss function of the discriminator including the adversarial loss of the generative adversarial network, and updating the network parameters by backpropagation to complete the training of the generative adversarial network;
S5, taking the face image to be aged as the input of the generator to obtain the face aging image, as shown in fig. 5.
Illustratively, in this embodiment the face age dataset is divided into 6 groups: 0-2 (group 1), 3-6 (group 2), 7-9 (group 3), 15-19 (group 4), 30-39 (group 5) and 50-69 (group 6). The face dataset is preprocessed as follows: a pre-trained deeplabv3+ network produces the semantic map corresponding to each face picture in the dataset; on the basis of the semantic map, only the semantic regions of the face, the facial features, the hair and the neck of each picture are retained; the dataset is then randomly rotated, completing the preprocessing of the dataset.
Age difference codes are constructed on the basis of the 6 groups; the total length of the code is 50×(6×2) = 600 bits, each bit initialized to 0. The age difference code is added to a 600-bit-long noise vector with mean 0 and variance 0.04. With bit 300 as the reference, every 50 bits in the forward direction represent one set of age difference information between adjacent age groups for which the target age group is older than that of the input picture, and those 50 bits are each increased by 1; if the aging process spans two age groups, 1 is added to the 100 bits forward of the reference position. Likewise, every 50 bits in the negative direction represent one set of age difference information between adjacent age groups for which the target age group is younger than that of the input picture.
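To make the indexing concrete, the encoding for this 6-group embodiment can be sketched as follows; the function name and tensor layout are illustrative, not part of the patent:

```python
import torch

N_GROUPS, SEG = 6, 50
BASE = SEG * N_GROUPS                 # reference bit 300

def age_difference_code(orig_group: int, target_group: int) -> torch.Tensor:
    """600-bit code: zeros plus Gaussian noise (mean 0, variance 0.04, i.e. std 0.2),
    then +1 on 50 bits per group step: forward of bit 300 for aging,
    backward of bit 300 for rejuvenation."""
    code = torch.randn(2 * BASE) * 0.2
    k = target_group - orig_group     # signed number of group steps
    if k > 0:
        code[BASE : BASE + SEG * k] += 1      # e.g. k=2 offsets bits 300..399
    elif k < 0:
        code[BASE + SEG * k : BASE] += 1      # backward slice for younger targets
    return code

# group 4 (15-19) -> group 5 (30-39): one step older, bits 300..349 offset by 1
code_fwd = age_difference_code(4, 5)
# matching target-original code for the cycle pass
code_back = age_difference_code(5, 4)
```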
The network is constructed as follows, with the layer configurations summarized in the tables below.
Table 1: encoder network structure
Table 2: decoder network structure
Table 3: mapping network structure
Table 4: age difference encoder structure
The following describes part of the loss functions used during training. First, to constrain identity information, the network involves the input image x_orig, the aged image y_trans, the reconstructed image x_rec, and the regenerated image x_cyc of the original age group obtained by feeding y_trans back through the generator. The reconstruction loss is built from x_orig and x_rec, the expectation being that the face image obtained after reconstruction is sufficiently close to the original face image. The cycle consistency loss is built from x_orig and x_cyc, the expectation being that when the aged image is aged again back to the face of the initial age group, the generated result is close to the original face image. Through these two losses, the generator as a whole is driven to strengthen identity consistency when generating aged images. The two loss functions are shown in equations (1) and (2):

L_rec = ||x_orig - x_rec||_1   (1)

L_cyc = ||x_orig - x_cyc||_1   (2)
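As a sketch, both identity constraints are plain L1 distances, directly mirroring equations (1) and (2); the function names are illustrative:

```python
import torch.nn.functional as F

def reconstruction_loss(x_orig, x_rec):
    return F.l1_loss(x_rec, x_orig)   # L_rec = ||x_orig - x_rec||_1, eq. (1)

def cycle_consistency_loss(x_orig, x_cyc):
    return F.l1_loss(x_cyc, x_orig)   # L_cyc = ||x_orig - x_cyc||_1, eq. (2)
```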
Next, the age information is constrained, first using the values obtained by the age difference encoder, which is responsible for the difference-learning work. Its inputs are the aged image y_trans, the original face image y_orig of age group y, and the original face image x_orig of age group x. Passing these three face pictures through the age difference encoder A yields the age information codes A(y_trans), A(y_orig) and A(x_orig) contained in them, where A(y_trans), the age information code of the aged face picture, is extracted from non-real data, while A(y_orig) and A(x_orig) are extracted from real data. When age group x is the original age group and the face is aged to age group y, A(x_orig) serves as the reference: A(x_orig) is subtracted from A(y_trans) and from A(y_orig) respectively, yielding two codes containing age difference information, dif_{y_trans-x_orig} and dif_{y_orig-x_orig}. Both age codes participate in the loss computation as the values constraining l_age, so that the content of the age difference information carried by l_age is obtained more reliably. The whole process can be represented by equation (3):

dif_{y_trans-x_orig} = A(y_trans) - A(x_orig),   dif_{y_orig-x_orig} = A(y_orig) - A(x_orig)   (3)
The codes dif_{y_trans-x_orig} and dif_{y_orig-x_orig} computed by the age difference encoder serve as the true values of the age constraint. dif_{y_orig-x_orig}, obtained by differencing two pieces of real data, provides real information for the age code I_age, so that the hidden vector l_age obtained through I_age acquires a more realistic age representation. Meanwhile, dif_{y_trans-x_orig}, the age difference information obtained from one piece of real data and one piece of generated data, does not contain a fully real age representation, but it does contain the age information of the generated aged picture; through the constraint, this information is fed back to the age code I_age to improve the age authenticity of the aged picture. The two loss functions are shown in equations (4) and (5):

L_dif_t = ||I_age - dif_{y_orig-x_orig}||_1   (4)

L_dif_f = ||I_age - dif_{y_trans-x_orig}||_1   (5)
The final age difference loss function is the sum of these two loss functions, as shown in equation (6):

L_age = L_dif_t + L_dif_f   (6)
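Equations (3) through (6) can be sketched as follows, with A the age difference encoder and I_age the age code being constrained; the names and the mean-over-batch reduction are assumptions:

```python
def age_difference_loss(A, I_age, x_orig, y_orig, y_trans):
    a_x = A(x_orig)                             # reference code from the real source face
    dif_true = A(y_orig) - a_x                  # two real images, eq. (3)
    dif_fake = A(y_trans) - a_x                 # one real + one generated image, eq. (3)
    L_dif_t = (I_age - dif_true).abs().mean()   # eq. (4)
    L_dif_f = (I_age - dif_fake).abs().mean()   # eq. (5)
    return L_dif_t + L_dif_f                    # L_age, eq. (6)
```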
Meanwhile, the generated aged face image and the real face image are separately taken as the input of the discriminator, which produces a vector as its prediction of the age group to which the provided picture belongs; the discriminator is thereby also responsible for constraining the age information of the generated picture, as shown in equation (7):

L_adv(G, D) = E_{x,a}[log D_a(x)] + E_{x,b}[log(1 - D_b(y_trans))]   (7)

where a denotes the age group of the real picture and b the age group of the generated picture. The output of the discriminator is a vector in which each bit corresponds to a specific age-group category and holds the probability that the input picture belongs to that age group; when computing the loss, however, only the prediction bit of the correct age group is used. For example, if the input real picture belongs to age group a, only the prediction bit of age group a is provided when the loss is computed, serving as a constraint term of the loss function; if the input is a generated picture whose age group is b, only the prediction bit of age group b is provided and enters the computation as a constraint term of the loss function. The overall loss function of the final network is shown in equation (8):
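The bit selection described above can be sketched with a gather over the discriminator's per-age-group output; the sigmoid outputs and function names are assumptions:

```python
import torch

def adversarial_d_loss(D, x_real, y_fake, a, b):
    """Eq. (7): only the bit of the expected age group enters the loss.
    a, b are LongTensors holding the age-group index of each sample."""
    p_real = torch.sigmoid(D(x_real)).gather(1, a.view(-1, 1))   # D_a(x)
    p_fake = torch.sigmoid(D(y_fake)).gather(1, b.view(-1, 1))   # D_b(y_trans)
    return -(torch.log(p_real + 1e-8) + torch.log1p(-p_fake + 1e-8)).mean()
```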
L = min_G max_D L_adv(G, D) + λ_rec L_rec(G) + λ_cyc L_cyc(G) + λ_age L_age(G)   (8)

where λ_rec, λ_cyc and λ_age denote the hyperparameters of the reconstruction loss function, the cycle consistency loss function and the age difference loss function, respectively; the reconstruction loss, the cycle consistency loss and the age difference loss participate in the constraint only during the training of the generator. This completes the definition of the loss function of the age-difference-based face aging network; based on these losses, the age information and the identity information of the face can be optimized simultaneously. As for the hyperparameters, λ_rec and λ_cyc are set to 10 and λ_age to 0.5. The optimizer is Adam, with momentum set to 0.9 and learning rate 0.001.
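Putting equation (8) and the stated hyperparameters together; reading Adam's "momentum 0.9" as beta1 = 0.9 is an assumption:

```python
import torch

lambda_rec, lambda_cyc, lambda_age = 10.0, 10.0, 0.5

def generator_objective(L_adv_g, L_rec, L_cyc, L_age):
    # reconstruction, cycle-consistency and age-difference terms constrain
    # the generator only, per the text above
    return L_adv_g + lambda_rec * L_rec + lambda_cyc * L_cyc + lambda_age * L_age

def make_optimizers(G, D, lr=0.001):
    opt_G = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.9, 0.999))
    opt_D = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.9, 0.999))
    return opt_G, opt_D
```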
The details of the training process are as follows. Overall, each forward propagation in an epoch involves at least two face pictures, one from age group x and one from age group y, so that a single forward pass can perform the mutual conversion between the two age groups on the basis of real samples. The gradient computed from the loss during backpropagation therefore covers the face aging process in both directions at once, improving the accuracy of bidirectional aging. The overall training of the network is divided into the training process of the generator and the training process of the discriminator, described below.
The training process of the generator involves the generator and the discriminator simultaneously. In one forward propagation, the generator produces a reconstructed picture and an aged picture from a single input picture. It should be emphasized that in this method the two pictures are not generated separately during one forward pass but simultaneously, owing to the age difference injection structure: at each layer, the feature map yields both a feature map into which the age difference information has been injected through the age difference injection structure, and a feature map that skips the age-injection part and passes only through the convolutional layer. After the multi-layer age difference injection structure, the final output is therefore the reconstructed face together with the aged face.
After the reconstructed face and the aged face are obtained, cycle-consistency training follows: the aged face is taken as the input of the generator and the aging operation is performed again, so as to obtain a face picture sufficiently close to the face picture before aging. The generator used here is the same one that generated the reconstructed face and the aged face from the original face picture; that is, the network has only one generator, responsible for the bidirectional simulation of aging and rejuvenation, with the direction of aging controlled mainly by the age information provided. The generated re-aged face participates only in the computation of the cycle consistency loss function and not in the training process of the discriminator.
In the discriminator part, the aged face is taken as input to obtain the vector predicted by the discriminator, whose content comprises the age-group probabilities predicted by the discriminator for the input picture. When the loss function is computed, only the value in the bit representing the expected age group of the picture is selected for the calculation and the rest are discarded. The above is the training process of the whole generator.
The training process of the discriminator requires the generator to provide generated pictures. The generated picture stores no gradient information; it serves only for the training of the discriminator and does not update the weights of the generator. The aged face and the real face are taken as inputs respectively, yielding two vectors predicted by the discriminator. Likewise, when the two vectors participate in the loss computation, only the value in the bit representing the expected age group of each picture enters the loss function calculation and the remaining bits are discarded. This completes the training of the discriminator.
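One training step under the scheme above might look like the following sketch; make_code stands for the age difference code construction of step S2, the generator is assumed to return the (reconstruction, aged) pair per the dual-stream description, and adversarial_d_loss and age_difference_loss are the sketches given earlier, so all names are illustrative rather than the patent's literal procedure:

```python
import torch
import torch.nn.functional as F

def adversarial_g_loss(D, y_trans, b):
    # generator-side counterpart of eq. (7): maximize D_b(y_trans)
    p = torch.sigmoid(D(y_trans)).gather(1, b.view(-1, 1))
    return -torch.log(p + 1e-8).mean()

def train_step(G, D, A, opt_G, opt_D, x, y, a, b, I_age):
    code_ab = make_code(a, b)                  # original -> target age difference code
    code_ba = make_code(b, a)                  # target -> original, for the cycle pass

    # ---- generator step: reconstruction, aging and cycle in the same pass ----
    x_rec, y_trans = G(x, code_ab)
    _, x_cyc = G(y_trans, code_ba)             # age the aged face back to group a
    loss_G = (adversarial_g_loss(D, y_trans, b)
              + 10.0 * F.l1_loss(x_rec, x)               # L_rec
              + 10.0 * F.l1_loss(x_cyc, x)               # L_cyc
              + 0.5 * age_difference_loss(A, I_age, x, y, y_trans))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

    # ---- discriminator step: the generated image is detached, so only D updates ----
    loss_D = adversarial_d_loss(D, x, y_trans.detach(), a, b)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
```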
After the network training is completed, a test face to be aged can be input into the network to complete the face aging simulation. As shown in fig. 5, the left image is the input image and the right image is the output image; in the prior-art results of fig. 6, the first image on the left is the input image and the rest are output images.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (4)

1. A face aging simulation method is characterized by comprising the following steps:
S1, performing image preprocessing on the adopted face age dataset with a semantic segmentation model, removing the background and retaining only the clean face region, to obtain the face images to be age-simulated;
S2, designing the original-target age difference code required when aging from the original age group to the target age group, and the target-original age difference code required when going from the target age group back to the original age group, the specific steps comprising: assuming there are n age groups in total, the age difference code I has 50×2n bits, where each 50-bit segment represents the age difference information for converting from one age group to an adjacent age group;
first, the age difference code I is added to a noise vector of the same length drawn from a Gaussian distribution; then, taking bit 50×n as the reference bit, the age difference code for aging from the original age group to target age group j is constructed by adding 1 to bits 50×n through 50×(n+j)-1, leaving the remaining bits unchanged;
the age difference code for going from target age group j back to the original age group is constructed by adding 1 to bits 50×(n-j) through 50×n-1, leaving the remaining bits unchanged;
The training process of the generator comprises: in one forward propagation, inputting the face image to be age-simulated and the original-target age difference code into the generator to correspondingly generate the aged face image of the target age group and the reconstructed face image of the original age group; after the aged face image of the target age group and the reconstructed face image of the original age group are obtained, continuing with cycle-consistency training, namely taking the aged face image of the target age group and the target-original age difference code as the input of the generator and performing the aging operation again, obtaining the reconstructed image of the target age group and the regenerated face image of the original age group;
The training process of the discriminator comprises: taking the real face image and the aged face image of the target age group as input to obtain the vector predicted by the discriminator, the content of which comprises the age-group probabilities predicted by the discriminator for the input images;
S4, computing the loss functions of the generator and of the discriminator, the age loss function, the identity consistency loss function, and the loss function of the generative adversarial network, the loss function of the discriminator including the adversarial loss of the generative adversarial network, and updating the network parameters by backpropagation to complete the training of the generative adversarial network;
S5, taking the face image to be aged as the input of the generator to obtain the face aging image.
2. The method for facial aging simulation according to claim 1, wherein in step S1, the specific steps include:
S11, collecting a face aging dataset comprising face images of different age groups and the age label corresponding to each face image;
S12, adopting a pre-trained semantic segmentation network used to distinguish the semantic information in the input images, each image in the dataset being taken as the input of the network, which outputs the semantic map of each image;
S13, with the semantic map of each face image as a reference, retaining only the semantic regions of the face, the facial features, the hair and the neck, and randomly rotating the resulting face images, thereby completing the construction of the dataset.
3. The method for face aging simulation according to claim 2, wherein in step S11, the different age groups include 0-2 years old, 3-6 years old, 7-9 years old, 15-19 years old, 30-39 years old, and 50-69 years old.
4. The method for facial aging simulation according to claim 1, further comprising, in step S3:
S31, constructing the encoder structure in the generator: the encoder first adopts a 7×7 convolutional layer with stride 1, followed by a ReLU activation function and a pixel normalization layer; next come 3×3 convolutional layers, each with stride 2 and each followed by a ReLU activation function and a pixel normalization layer; then 4 residual blocks, each with stride 1, where each of the first three residual blocks is followed by a ReLU activation function and a pixel normalization layer while no pixel normalization layer follows the last residual block; this completes the construction of the encoder;
S32, constructing the decoder structure in the generator: the body of the decoder is built from age difference injection modules, each of which is a residual structure composed of a convolutional layer and a StyleGAN-style style convolutional layer; the decoder contains 6 age difference injection modules in total plus a convolutional layer with a 1×1 kernel; an up-sampling layer follows each of the 5th and 6th age difference injection modules, restoring the feature map to the size of the input image; the last layer is the 1×1 convolutional layer, which reduces the number of channels of the feature map to 3 and is followed by a Tanh activation function; this completes the construction of the decoder;
S33, constructing the age difference encoder, a convolutional neural network whose first layer is a convolutional layer of 7×7 kernels with stride 1, followed by five convolutional layers of 3×3 kernels, each with stride 2, and finally a convolutional layer of 1×1 kernels; each of the first 5 convolutional layers is followed by an LReLU activation function, and a global average pooling layer after the final convolutional layer reduces the feature map to a vector;
S34, constructing the mapping network, which consists of 8 linear layers; each of the first 7 linear layers is followed by a ReLU activation function and a pixel normalization layer, and only a pixel normalization layer follows the last layer;
and S35, constructing the discriminator part, which adopts the discriminator structure proposed in StyleGAN.
CN202310689267.5A 2023-06-12 2023-06-12 A method for simulating facial aging Active CN116758611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310689267.5A CN116758611B (en) 2023-06-12 2023-06-12 A method for simulating facial aging

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310689267.5A CN116758611B (en) 2023-06-12 2023-06-12 A method for simulating facial aging

Publications (2)

Publication Number Publication Date
CN116758611A CN116758611A (en) 2023-09-15
CN116758611B true CN116758611B (en) 2025-11-04

Family

ID=87960182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310689267.5A Active CN116758611B (en) 2023-06-12 2023-06-12 A method for simulating facial aging

Country Status (1)

Country Link
CN (1) CN116758611B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308450A (en) * 2018-08-08 2019-02-05 Jiechuang Intelligent Technology Co., Ltd. A face change prediction method based on a generative adversarial network
CN109523463B (en) * 2018-11-20 2023-04-07 Sun Yat-sen University A face aging method based on a conditional generative adversarial network
CN110852935A (en) * 2019-09-26 2020-02-28 Xi'an Jiaotong University An image processing method for changing face images with age
FR3112633B1 (en) * 2020-06-30 2023-03-03 Oreal High-resolution controllable facial aging with spatially sensitive conditional GANs
EP4150514A1 (en) * 2020-06-30 2023-03-22 L'Oréal High-resolution controllable face aging with spatially-aware conditional gans

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on face aging simulation methods based on generative adversarial networks; Pan Lijia; China Master's Theses Full-text Database, Information Science and Technology Series; 2024-07-15 (No. 7); full text *

Also Published As

Publication number Publication date
CN116758611A (en) 2023-09-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant