WO2022068451A1 - Style image generation method and apparatus, model training method and apparatus, device and medium
- Publication number: WO2022068451A1
- Application number: PCT/CN2021/113225 (CN2021113225W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- style
- face
- target
- generation model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/02—Affine transformations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4023—Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Definitions
- the present disclosure relates to the technical field of image processing, and in particular, to a style image generation method, a model training method, an apparatus, a device and a medium.
- Image style conversion refers to converting the style of one or more images to generate a style image that meets user needs.
- Training a model with the function of generating style images is currently the main way to realize image style transfer.
- however, the model training method in the existing scheme is single, which cannot meet users' need to generate style images in real time.
- the embodiments of the present disclosure provide a style image generation method, a model training method, an apparatus, a device and a medium.
- an embodiment of the present disclosure provides a method for generating a style image, including:
- the target style image real-time generation model is obtained by training at least one style image real-time generation model, and the at least one style image real-time generation model is obtained by performing at least one cropping operation based on the initial style image generation model according to at least one set of cropping parameters after the initial style image generation model has been trained. Both the initial style image generation model and the target style image real-time generation model are obtained by training based on multiple original face sample images and the target style face sample image corresponding to each original face sample image, wherein the style image real-time generation model changes with the change of the cropping parameters.
- an embodiment of the present disclosure also provides a method for training a style image generation model, including:
- an initial style image generation model is obtained by training
- the at least one style image real-time generation model is trained to obtain a trained target style image real-time generation model.
- an embodiment of the present disclosure further provides an apparatus for generating a style image, including:
- the original face image acquisition module is used to obtain the original face image
- a target style face image generation module, used to obtain a target style face image corresponding to the original face image by using the pre-trained target style image real-time generation model;
- the target style image real-time generation model is obtained by training at least one style image real-time generation model, and the at least one style image real-time generation model is obtained by performing at least one cropping operation based on the initial style image generation model according to at least one set of cropping parameters after the initial style image generation model has been trained. Both the initial style image generation model and the target style image real-time generation model are obtained by training based on multiple original face sample images and the target style face sample image corresponding to each original face sample image, wherein the style image real-time generation model changes with the change of the cropping parameters.
- an embodiment of the present disclosure further provides a training device for a style image generation model, including:
- a sample acquisition module used for acquiring a plurality of original face sample images and a target style face sample image corresponding to each original face sample image
- a first training module used for training to obtain an initial style image generation model based on the plurality of original face sample images and the target style face sample images corresponding to each original face sample image;
- a model cropping module configured to perform at least one cropping operation according to at least one set of cropping parameters based on the initial style image generation model to obtain at least one style image real-time generation model, wherein the style image real-time generation model changes with the cropping parameters;
- a second training module configured to train the at least one style image real-time generation model based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image, to obtain the trained target style image real-time generation model.
- an embodiment of the present disclosure further provides an electronic device, including: a processing device, and a memory for storing instructions executable by the processing device, wherein the processing device reads the executable instructions from the memory and executes them to implement any style image generation method provided by the embodiments of the present disclosure, or to implement any style image generation model training method provided by the embodiments of the present disclosure.
- an embodiment of the present disclosure further provides a computer-readable storage medium, where the storage medium stores a computer program, and when the computer program is executed by a processing device, it implements any style image generation method provided by the embodiments of the present disclosure, or implements any style image generation model training method provided by the embodiments of the present disclosure.
- after the initial style image generation model is trained based on the multiple original face sample images and the target style face sample image corresponding to each original face sample image, the initial style image generation model is cropped based on at least one set of cropping parameters, and the cropped initial style image generation model continues to be trained to obtain a style image real-time generation model.
- the space occupation and computational complexity of the style image real-time generation model are smaller than those of the initial style image generation model, and it has the ability to generate style images in real time. Therefore, in the application stage of the style image real-time generation model, a style image that meets the user's needs can be generated in real time by using the style image real-time generation model on the user's terminal device.
- the style image real-time generation model changes with the cropping parameters; that is, the user can train the style image real-time generation model in different ways.
- the problems that the existing model has a single training method and cannot meet users' need to generate style images in real time on the terminal device are thus solved, the effect of generating style images for users in real time is realized, and the user experience of the image style conversion function is improved.
- different real-time generation models of style images can be compatible with terminal devices with different performances, so that the style image generation method in the embodiment of the present disclosure can be widely applied to terminal devices with different performances.
- FIG. 1 is a flowchart of a method for generating a style image according to an embodiment of the present disclosure
- FIG. 2 is a flowchart of another style image generation method provided by an embodiment of the present disclosure
- FIG. 3 is a schematic diagram of the position of a face area bounding box on a first original face image according to an embodiment of the present disclosure
- FIG. 4 is a flowchart of another style image generation method provided by an embodiment of the present disclosure.
- FIG. 5 is a flowchart of another style image generation method provided by an embodiment of the present disclosure.
- FIG. 6 is a flowchart of a method for training a style image generation model according to an embodiment of the present disclosure
- FIG. 7 is a flowchart of another method for training a style image generation model according to an embodiment of the present disclosure.
- FIG. 8 is a flowchart of another method for training a style image generation model according to an embodiment of the present disclosure.
- FIG. 9 is a flowchart of another method for training a style image generation model according to an embodiment of the present disclosure.
- FIG. 10 is a schematic diagram of a mouth material provided by an embodiment of the present disclosure.
- FIG. 11 is a schematic structural diagram of a style image generating apparatus according to an embodiment of the present disclosure.
- FIG. 12 is a schematic structural diagram of a training device for a style image generation model according to an embodiment of the present disclosure
- FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
- FIG. 1 is a flowchart of a method for generating a style image provided by an embodiment of the present disclosure.
- the embodiment of the present disclosure can be applied to a situation in which a style image of any style is generated based on an original face image.
- the styles mentioned in the embodiments of the present disclosure may refer to image effects, such as Japanese comic style, European and American comic style, oil painting style, sketch style, or cartoon style, etc., which may be determined according to the classification of image styles in the field of image processing.
- the original face image may refer to any image including a face region.
- the original face image may be an image captured by a device with a capturing function, or an image drawn by a drawing technology.
- the style image generation method provided by the embodiments of the present disclosure may be executed by a style image generation apparatus, which may be implemented by software and/or hardware, and may be integrated on any electronic device with computing capabilities, such as a terminal, etc.
- the terminal may include, but is not limited to, smart mobile terminals, tablet computers, personal computers, etc.
- the style image generating apparatus can be implemented as an independent application program or as an applet integrated on a public platform, and can also be implemented as a functional module integrated in an application program or applet that has a style image generating function.
- the programs may include, but are not limited to, video interactive applications or video interactive applets.
- the style image generation method provided by the embodiment of the present disclosure may include:
- an image stored in the terminal may be uploaded or an image or video may be captured in real time by an image capturing device of the terminal.
- the terminal may acquire the original face image to be processed according to the user's image selection operation, image capture operation or image upload operation in the terminal.
- the target style image real-time generation model is obtained by training at least one style image real-time generation model, where the at least one style image real-time generation model is obtained by performing at least one cropping operation based on the initial style image generation model according to at least one set of cropping parameters after the initial style image generation model has been trained.
- the initial style image generation model and the target style image real-time generation model are both trained based on multiple original face sample images and target style face sample images corresponding to each original face sample image.
- the real-time generation model of the style image changes with the change of the cropping parameter.
- the first cropping parameters in the initial style image generation model may be obtained, and based on the first cropping parameters, at least one cropping operation is performed on the initial style image generation model.
- taking the case where the first cropping parameter is the first important factor of the activation layer as an example, the first important factor of the activation layer in the initial style image generation model can be obtained; according to the first important factor, the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer are cropped to obtain at least one style image real-time generation model, and the at least one style image real-time generation model then continues to be trained to obtain the trained target style image real-time generation model.
- a real-time generation model of style images is obtained.
- at least two cropping operations are performed on the initial style image generation model to obtain the first style image real-time generation model and the second style image real-time generation model.
- an initial style image generation model is obtained by training based on multiple original face sample images and a target style face sample image corresponding to each original face sample image, and the first important factor of the activation layer in the initial style image generation model is obtained.
- based on the first important factor, the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer are cropped to obtain the first style image real-time generation model; based on the multiple original face sample images and the target style face sample images corresponding to the original face sample images, the first style image real-time generation model is trained to obtain the trained first style image real-time generation model. The second important factor of the activation layer in the trained first style image real-time generation model is then obtained; based on the second important factor, the activation layer of the trained first style image real-time generation model and the convolution layer corresponding to the activation layer are cropped to obtain the second style image real-time generation model.
- the second style image real-time generation model is trained, and the trained second style image real-time generation model is obtained.
- Both the trained first style image real-time generation model and the trained second style image real-time generation model can be used as target style image real-time generation models, and have the function of real-time generation of style images.
- At least two cropping operations are performed based on the initial style image generation model to correspondingly obtain at least two style image real-time generation models, and at least two target style image real-time generation models are obtained by training the at least two style image real-time generation models, where the at least two target style image real-time generation models correspond to different device performance information respectively. Correspondingly, before the pre-trained target style image real-time generation model is used to obtain the target style face image corresponding to the original face image, the method further includes: based on the current device performance information, acquiring the target style image real-time generation model adapted to the current device performance information.
- after the server receives a model acquisition request or a model issuing request from the terminal device, it can match a target style image real-time generation model according to the current device performance information of the terminal device carried in the request, and send the matched target style image real-time generation model to the terminal device.
- the current device performance information of the terminal device may include, but is not limited to, storage space usage information of the terminal device, processor running indicators, and other information that can be used to measure the current running performance of the terminal device.
- if the current device performance information indicates that the terminal device can run the larger model, the initial style image generation model can be sent to the terminal device; otherwise, the target style image real-time generation model is sent to the terminal device, as in the hypothetical dispatch sketch below.
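- a minimal sketch of this server-side dispatch logic follows; the variant names and performance thresholds are invented for illustration and are not specified in the disclosure:

```python
# Hypothetical server-side model dispatch: choose which model variant to
# send based on the device performance information carried in the request.
# Variant names and thresholds below are illustrative assumptions only.
def pick_model_variant(free_storage_mb: int, cpu_load: float) -> str:
    if free_storage_mb > 2048 and cpu_load < 0.3:
        return "initial_style_image_model"       # full model for capable devices
    if free_storage_mb > 512:
        return "realtime_model_lightly_cropped"  # moderately pruned variant
    return "realtime_model_heavily_cropped"      # smallest pruned variant
```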
- the initial style image generation model or the target style image real-time generation model may include a conditional generative adversarial network (CGAN, Conditional Generative Adversarial Networks) model, a cycle-consistent generative adversarial network (CycleGAN, Cycle Consistent Adversarial Networks) model, or any other network model that supports non-aligned training, which is not specifically limited in this embodiment of the present disclosure.
- the target style image real-time generation model is obtained based on the initial style image generation model; its space occupation and computational complexity are smaller than those of the initial style image generation model, and it has the function of generating style images in real time. Therefore, in the application stage of the target style image real-time generation model, the target style image real-time generation model can be used to generate style images that meet user needs in real time.
- the embodiment of the present disclosure solves the problems that the existing model has a single training method and cannot meet users' need to generate style images in real time, realizes the effect of generating style images for users in real time, and improves the user experience of using the image style conversion function. Moreover, different target style image real-time generation models can be compatible with terminal devices of different performance levels, so that the style image generation method in the embodiment of the present disclosure can be widely applied to terminal devices of different performance levels.
- multiple original face sample images and a target-style face sample image corresponding to each original face sample image are the input and output of the pre-trained target image model, respectively.
- the target image model has the function of generating style images and is used to generate style image samples in the process of training the initial style image generation model and the target style image real-time generation model, so that the sample data used for training the initial style image generation model and the target style image real-time generation model is consistent, which reduces the training difficulty of the target style image real-time generation model.
- the target image model may include any network model that supports non-aligned training, such as a conditional generative adversarial network CGAN model, a cycle-consistent generative adversarial network CycleGAN model, etc., which is not specifically limited in the embodiment of the present disclosure.
- FIG. 2 is a flowchart of another style image generation method provided by an embodiment of the present disclosure, which is further optimized and expanded based on the above-mentioned technical solution, and can be combined with each of the above-mentioned optional embodiments.
- the style image generation method may include:
- any available face recognition technology can be used to identify the face area of the original face image and output the parameter information of the bounding box surrounding the face area on the original face image, that is, the parameter information of the face area bounding box.
- the key point detection technology is used to determine the key point of the face area, and then the rotation angle of the face area is determined based on the key point.
- the parameter information of the bounding box of the face region includes the position of the bounding box on the original face image. Further, the parameter information of the face region bounding box may also include the size and shape of the face region bounding box.
- the size of the bounding box of the face area can be determined according to the parameters set in the adopted face recognition technology, or can be customized.
- the face area bounding box can be any regular geometric figure; the rotation angle of the face area refers to the angle by which the face area should be rotated on the original face image in order to obtain an image that meets the preset face position requirements.
- by using the key point detection technology to obtain the rotation angle of the face area at the same time as the face area is recognized, the rotation angle can be used directly in the face alignment adjustment. This avoids the complex operation of determining the affine transformation matrix for face region position adjustment by the least squares method or the singular value decomposition (SVD) method, improves the efficiency of face position adjustment, and thus enables real-time face position adjustment. An illustrative computation of the rotation angle from key points is sketched below.
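```python
import math

# Illustrative only: derive the face rotation angle (Roll) from two eye
# key points returned by a key point detector. The specific detector and
# key point layout are not specified in the disclosure.
def roll_from_eyes(left_eye, right_eye):
    """Angle in degrees by which the face must be rotated back so the
    eyes become level; inputs are (x, y) tuples in image coordinates."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

# e.g. roll_from_eyes((120, 150), (180, 162)) -> about 11.3 degrees
```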
- the parameter information of the bounding box of the face region may include the position of the bounding box on the original face image.
- the position of the bounding box on the original face image can be represented by the position coordinates of each vertex of the face region bounding box on the original face image, or the distance of each edge from the image boundary on the original face image.
- an affine transformation matrix for adjusting the position of the face region can be constructed based on the parameter information of the bounding box of the face region and the rotation angle of the face region, with reference to existing affine transformation principles; the affine transformation matrix is then used to adjust the position of the face area to obtain an image that meets the preset face position requirements, that is, the first face image.
- the preset face position requirement may be: after the face region position is adjusted, the face region is located in the central region of the entire image; or, after the face region position is adjusted, the facial features of the face region are located at specific positions in the entire image; or, after the face region position is adjusted, the face region and the background region (the remaining image region after the face region is removed from the entire image) occupy preset proportions of the entire image.
- the position adjustment operation of the face region may include, but is not limited to, rotation, translation, reduction, enlargement, and cropping. According to the actual position of the face area bounding box on the original face image and the preset face position requirements, at least one position adjustment operation can be flexibly selected to adjust the position of the face area until an image that meets the preset face position requirements is obtained. In the process of adjusting the position of the face area on the original face image, the position of the original face image can be adjusted as a whole, or a matting technique can be used to cut out the bounding box enclosing the face area or a sub-region including the face area, so that the position of the bounding box or sub-region is adjusted independently, which is not specifically limited in the embodiment of the present disclosure.
- in this way, normalization preprocessing of the original face image is realized, which ensures the generation effect of subsequent style images.
- the target style image can be further processed flexibly according to the style image processing requirements, such as image background fusion requirements, face position recovery requirements, etc.
- the style image generation method provided by the embodiment of the present disclosure further includes:
- the position of the target face area in the target style face image is adjusted to obtain the first style face image whose face region position corresponds to that in the original face image; that is, the position of the target face area in the target style face image is restored to a position consistent with the position of the face region in the original face image, thereby reducing the difference in face region position between the target style face image and the original face image.
- the inverse matrix M′ of the affine transformation matrix M can be obtained, and using the inverse matrix M′, the position of the target face region in the target style face image is adjusted to obtain the first style face image, as sketched below.
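- a minimal sketch of this restoration step, assuming OpenCV and the 2x3 alignment matrix M from the alignment step:

```python
import cv2

# Sketch of face-position restoration: invert the 2x3 affine alignment
# matrix M (giving M' in the text) and warp the target style face back
# to the face position in the original image.
def restore_face_position(style_face, M, orig_width, orig_height):
    M_inv = cv2.invertAffineTransform(M)  # the inverse matrix M'
    return cv2.warpAffine(style_face, M_inv, (orig_width, orig_height))
```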
- the style image generation method provided by the embodiment of the present disclosure further includes:
- the target face area in the first style face image is fused with the target background area to obtain the second style face image.
- the target background area (that is, the remaining image area other than the face area) may be the background area of the original face image, a background area processed by a background processing algorithm, the background area on the target style face image, etc.; on the basis of ensuring that a style image with a good display effect is provided for the user, the embodiment of the present disclosure does not impose a specific limitation. By blending with the target background area, the display effect of the final style image can be optimized.
- any available image fusion technology may be used to fuse the target face region of the first style face image with the target background region; one simple option is sketched below.
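- a minimal sketch of one such option, a feathered alpha blend, assuming a face mask (1 inside the face region, 0 outside) is available from the matting step:

```python
import cv2
import numpy as np

# One possible fusion step (feathered alpha blend); the disclosure allows
# any available image fusion technology, so this is illustrative only.
def fuse_with_background(style_face, background, face_mask):
    """style_face/background: HxWx3 uint8 images; face_mask: HxW floats,
    1 inside the face region and 0 outside."""
    alpha = cv2.GaussianBlur(face_mask.astype(np.float32), (21, 21), 0)
    alpha = alpha[..., None]  # broadcast the mask over the colour channels
    fused = alpha * style_face + (1.0 - alpha) * background
    return fused.astype(np.uint8)
```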
- taking fusing the target face area in the first style face image with the background area of the original face image to obtain the second style face image as an example, apart from the change in image style, the other image features and image details of the second style face image remain consistent with the original face image; finally, the second style face image can be displayed to the user.
- in the embodiment of the present disclosure, the position of the face region is adjusted on the original face image to be processed, and then the pre-trained target style image real-time generation model is used to obtain the corresponding target style face image in real time, which improves the generation effect of the style image and solves the problem of poor image quality after image style conversion in existing solutions. Moreover, in the embodiment of the present disclosure, the rotation angle of the face area can be obtained while the face area is recognized and used directly in face position adjustment (also called face alignment), which improves the efficiency of face position adjustment and thus enables real-time face position adjustment.
- the position of the face region is adjusted based on the parameter information of the bounding box of the face region and the rotation angle of the face region to obtain the first face image, including:
- the position of the face region is adjusted to obtain the first face image.
- an affine transformation matrix may be constructed based on the acquired parameters, and then the position of the face region may be adjusted based on the affine transformation matrix.
- the face position correction parameter value is used to correct the position of the face region on the position-adjusted image, which may include correction of the up-down position of the face or correction of the left-right position of the face, so as to improve the accuracy of determining the actual position of the face region on the original face image, thereby ensuring the accuracy of the face region position adjustment. For example, if the position of the face region determined based on the parameter information of the face region bounding box deviates from its actual position on the original face image, the preset face position correction parameter value can be used to accurately determine the actual position of the face region.
- the preset image size refers to the predetermined size of images input to the style image generation model; that is, if the original face image does not meet the preset image size, image cropping also needs to be performed on the original face image.
- the rotation angle of the face region determined by the key point detection technology can be expressed as Roll
- the value of the face position correction parameter can be expressed as ymeanScale
- the value range of ymeanScale can be set to [0, 1]
- the preset image size can be expressed as targetSize
- the parameter information of the bounding box of the face area includes the distance between each edge of the bounding box and the boundary of the original face image. Taking Figure 3 as an example, it is assumed that the lower left corner of the original face image is used as the origin of the image coordinate system.
- the distance between the two sides of the face area bounding box in the horizontal direction from the x-axis can be expressed as the first distance b and the second distance t
- the distances between the two sides of the face region bounding box in the vertical direction from the y-axis can be expressed as a third distance l and a fourth distance r.
- the affine transformation matrix used to adjust the position of the face region can be expressed as a 2x3 matrix M, as shown below:
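- the matrix M was published as an image and did not survive text extraction. As a hedged reconstruction rather than the verbatim patented formula, a standard rotate-scale-translate matrix consistent with the parameters above is, with scale s = targetSize / edgeLength and face centre (xMean, yMean) as defined in the next embodiment:

$$
M=\begin{bmatrix}
s\cos(\mathrm{Roll}) & -s\sin(\mathrm{Roll}) & \dfrac{\mathrm{targetSize}}{2}-s\left(x_{\mathrm{Mean}}\cos(\mathrm{Roll})-y_{\mathrm{Mean}}\sin(\mathrm{Roll})\right)\\
s\sin(\mathrm{Roll}) & s\cos(\mathrm{Roll}) & \dfrac{\mathrm{targetSize}}{2}-s\left(x_{\mathrm{Mean}}\sin(\mathrm{Roll})+y_{\mathrm{Mean}}\cos(\mathrm{Roll})\right)
\end{bmatrix}
$$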
- FIG. 4 is a flowchart of another style image generation method provided by an embodiment of the present disclosure, which is further optimized and expanded based on the foregoing technical solution, and may be combined with the foregoing optional implementation manners.
- the same operations exist in FIG. 4 and FIG. 2 , which will not be repeated below, and reference may be made to the descriptions of the foregoing embodiments.
- the style image generation method may include:
- the four sides of the face area bounding box are parallel to the four sides of the original face image, and the parameter information of the face area bounding box includes the position parameters of the four sides in the original face image;
- the face area bounding box can be any regular geometric figure, for example, a square.
- the position representation of the face region bounding box on the original face image can be simplified.
- S303 Acquire a preset face position correction parameter value and a preset image size.
- the value of the face position correction parameter is used to correct the position of the face region on the position-adjusted image.
- S304 Calculate the abscissa value of the center of the face region based on the position parameters in the horizontal direction corresponding to the four sides of the bounding box of the face region.
- S305 Calculate the ordinate value of the center of the face region based on the position parameters in the vertical direction corresponding to the four sides of the bounding box of the face region and the value of the face position correction parameter.
- the position parameters in the horizontal direction corresponding to the four sides of the face region bounding box may include the third distance l and the fourth distance r, and the position parameters in the vertical direction may include the first distance b and the second distance t.
- yMean = ymeanScale × t + (1 − ymeanScale) × b.
- the face cropping ratio edgeScale is used to indicate the cropping multiple of the bounding box of the face area on the original face image.
- a face cropping ratio of 2 means that, on the original face image, the image area including the face area is cropped at 2 times the size of the face area bounding box.
- the side length of the face area bounding box can be expressed as the difference (r − l) between the fourth distance r and the third distance l, or the difference (t − b) between the second distance t and the first distance b.
- the edge length value edgeLength of the face area can be expressed as:
- edgeLength = edgeScale × (r − l).
- the affine transformation matrix M can be expressed as follows:
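- the matrix itself appears as an image in the original publication and was lost in extraction; the runnable sketch below reconstructs the construction under stated assumptions: OpenCV is used, xMean is taken as the box midpoint (l + r) / 2 (implied but not stated in the text), and the rotate-scale-translate layout follows the reconstruction given earlier rather than the verbatim patented formula.

```python
import cv2

def build_alignment_matrix(l, r, t, b, roll_deg, ymean_scale,
                           edge_scale, target_size):
    # Face-centre coordinates: xMean is assumed to be the box midpoint;
    # yMean follows the correction formula given above.
    x_mean = (l + r) / 2.0
    y_mean = ymean_scale * t + (1.0 - ymean_scale) * b
    # Crop-window side length: edgeScale times the box width, as above.
    edge_length = edge_scale * (r - l)
    scale = target_size / edge_length
    # Rotate by Roll about the face centre while scaling ...
    M = cv2.getRotationMatrix2D((x_mean, y_mean), roll_deg, scale)
    # ... then translate so the face centre lands at the output centre.
    M[0, 2] += target_size / 2.0 - x_mean
    M[1, 2] += target_size / 2.0 - y_mean
    return M

def align_face(image, M, target_size):
    # Applying M yields the position-adjusted first face image.
    return cv2.warpAffine(image, M, (target_size, target_size))
```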
- in the embodiment of the present disclosure, the affine transformation matrix required for adjusting the position of the face region is constructed according to the cropping, scaling and other requirements for the original face image, so that the affine transformation matrix can adjust the position of the face area on the original face image and ensure the accuracy of the adjustment; the pre-trained target style image real-time generation model is then used to obtain the corresponding target style face image in real time, which improves the style image generation effect and solves the problem of poor image quality after image style conversion in existing schemes.
- a pre-trained target style image is used to generate a model in real time to obtain a target style face image corresponding to the first face image, including:
- the maximum pixel value on the second face image can be determined, and then all pixel values on the second face image are normalized by the currently determined maximum pixel value;
- the target style face image corresponding to the third face image is obtained.
- gamma correction can also be called gamma nonlinearization or gamma coding, which is used to perform nonlinear operations or inverse operations on the luminance or tristimulus values of light in a film or imaging system.
- Gamma-correcting images can compensate for the characteristics of human vision, thereby maximizing the use of data bits or bandwidth representing black and white based on human perception of light or black and white.
- the preset gamma value may be preset, which is not specifically limited in the embodiment of the present disclosure. For example, the pixel values of the three RGB channels on the first face image are simultaneously corrected with a gamma value of 1/1.5.
- the specific implementation of gamma correction can be implemented with reference to the principles of the prior art.
- after gamma correction, a second face image with a more balanced brightness distribution can be obtained, which can reduce facial defects, avoid unsatisfactory style image generation caused by an unbalanced image brightness distribution, and ensure that the presentation of the obtained target style image is more stable. Both preprocessing steps are sketched below.
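- a plain NumPy sketch of these two preprocessing steps (gamma correction with the example value 1/1.5, then normalization by the maximum pixel value); this is an illustration, not the disclosure's exact implementation:

```python
import numpy as np

def gamma_correct(image, gamma=1.0 / 1.5):
    """Second face image: apply the same gamma to all three RGB channels
    (1/1.5 is the example value given in the text)."""
    return np.power(image.astype(np.float32) / 255.0, gamma)

def normalize_by_max(image):
    """Third face image: divide every pixel by the image's current
    maximum pixel value before feeding the generation model."""
    return image / max(float(image.max()), 1e-6)
```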
- FIG. 5 is a flowchart of another style image generation method provided by an embodiment of the present disclosure, which is used to exemplarily illustrate an embodiment of the present disclosure.
- as shown in FIG. 5, a user image is obtained first, and a matting technique can be used to extract the face area on the user image. Then, based on the affine transformation matrix determination method in the above embodiment, the affine transformation matrix used to adjust the position of the face area on the user image is determined, and the affine transformation matrix is used to adjust the position of the face area, that is, the face alignment processing in FIG. 5. After the target style image is generated, the inverse transformation matrix is used to adjust the position of the face area on the target style image and restore it, the restored face area is fused with the background area of the user image, and the background-fused style image is finally fed back to the user.
- FIG. 6 is a flowchart of a training method for a style image generation model provided by an embodiment of the present disclosure.
- the embodiment of the present disclosure can be applied to training a style image generation model that meets style conversion requirements; the style image generation model is used to generate style images corresponding to original face images.
- the image style mentioned in the embodiments of the present disclosure may refer to image effects, such as Japanese comic style, European and American comic style, oil painting style, sketch style, or cartoon style, etc., which may be determined according to the classification of image styles in the field of image processing.
- the training apparatus for the style image generation model provided by the embodiments of the present disclosure may be implemented by software and/or hardware, and may be integrated on any electronic device with computing capabilities, such as a server, etc.
- the training method of the style image generation model may include:
- the sample images in the model training process can be obtained from an open image database. Using multiple original face sample images and a target style face sample image corresponding to each original face sample image in the model training of the embodiment of the present disclosure can ensure the consistency of the sample data, thereby laying the foundation for a better model training effect.
- obtaining multiple original face sample images and the target style face sample image corresponding to each original face sample image includes: using a pre-trained target image model, obtaining the target style face sample image corresponding to each original face sample image respectively.
- the target image model has the function of generating style images and is used to generate style image samples in the process of training the initial style image generation model and the style image real-time generation model, so that the sample data used for subsequent training of the initial style image generation model and the style image real-time generation model is consistent, which reduces the training difficulty of the style image real-time generation model.
- the target image model may include any network model that supports non-aligned training, such as a conditional generative adversarial network CGAN model, a cycle-consistent generative adversarial network CycleGAN model, etc., which is not specifically limited in the embodiment of the present disclosure.
- the target image model is trained based on the style face sample images obtained by using the image generation model.
- the image generation model may include a Generative Adversarial Networks (GAN, Generative Adversarial Networks) model, and the specific implementation principle may refer to the prior art.
- the training process of the target image model may include: acquiring multiple standard style face sample images, and training to obtain a standard image generation model based on the multiple standard style face sample images; using the standard image generation model to generate multiple style face sample images for training the target image model; and training to obtain the target image model based on the style face sample images used for training the target image model.
- the aforesaid standard style face sample images may be obtained by professional painters drawing style images for a preset number (values may be determined according to training requirements) of original face sample images according to current image style requirements.
- an initial style image generation model is obtained by training based on a plurality of original face sample images and a target style face sample image corresponding to each original face sample image.
- the initial style image generation model has the function of style image generation.
- the initial style image generation model may include a conditional generative adversarial network CGAN model, a cycle-consistent generative adversarial network CycleGAN model, or any other network model that supports non-aligned training, which is not specifically limited in the embodiment of the present disclosure.
- the first cropping parameters of the initial style image generation model may be acquired, and based on the first cropping parameters, at least one cropping operation is performed on the initial style image generation model to obtain at least one style image real-time generation model.
- the first cropping parameter is used to measure the importance of functional modules or neural network layers in the initial style image generation model.
- the function module or neural network layer corresponding to the first cropping parameter that is smaller than the preset parameter threshold can be cropped to obtain a real-time generation model of style images.
- the first cropping parameter may include, but is not limited to, the first important factor of the activation layer in the initial style image generation model. According to the first important factor, the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer are cropped; for example, the activation layer whose first important factor is smaller than the preset parameter threshold, together with the convolution layer corresponding to that activation layer, can be cropped to obtain a style image real-time generation model.
- the style image real-time generation model is obtained by cropping the initial style image generation model; compared with the initial style image generation model, its storage space occupation and computational complexity are reduced, which lowers the performance requirements on the terminal device during model running and enables the function of generating style images in real time.
- the style image real-time generation model is of the same type as the initial style image generation model, and may also include any network model that supports non-aligned training, such as a conditional generative adversarial network CGAN model or a cycle-consistent generative adversarial network CycleGAN model, which is not specifically limited in the embodiment of the present disclosure.
- a style image real-time generation model that meets the style image generation requirements can be obtained.
- the training process of the initial style image generation model (large model) and the style image real-time generation model (small model) amounts to a large-model/small-model training strategy: because the style image real-time generation model is realized on the basis of the trained initial style image generation model and the sample data used is consistent, the training difficulty of the real-time model can be greatly reduced; using the features of the large model to supervise the features of the real-time model further accelerates the training of the style image real-time generation model.
- the style image real-time generation model is obtained based on the initial style image generation model.
- the embodiment of the present disclosure solves the problems that the existing model has a single training method and cannot meet the needs of users to generate style images in real time, realizes the effect of real-time generation of style images for users, and improves the user experience of using the image style conversion function.
- the real-time generation model of the style image can be obtained by performing the training and cropping operations of the model one or more times.
- performing at least one cropping operation based on the initial style image generation model includes: performing at least two cropping operations based on the initial style image generation model to obtain a first style image real-time generation model and a second style image real-time generation model.
- the first style image real-time generation model and the second style image real-time generation model are trained based on the multiple original face sample images and the target style face sample image corresponding to each original face sample image, so as to obtain the first target style image real-time generation model and the second target style image real-time generation model, where the first target style image real-time generation model and the second target style image real-time generation model respectively correspond to different device performance information.
- At least two cropping operations are performed based on the initial style image generation model to obtain the first style image real-time generation model and the second style image real-time generation model, including:
- obtain the first cropping parameters of the initial style image generation model; based on the first cropping parameters, crop the initial style image generation model to obtain the first style image real-time generation model;
- the second cropping parameter of the trained first style image real-time generation model is used to measure the importance of the functional module or the neural network layer in the first style image real-time generation model
- the trained first style image real-time generation model is cropped to obtain the second style image real-time generation model.
- the number of times of cyclic execution of the model cropping operation may be determined according to the model training requirements, which is not specifically limited in the embodiment of the present disclosure.
- the trained first style image real-time generation model and the trained second style image real-time generation model, etc. can both be used as style image real-time generation models, and have the function of real-time generation of style images.
- the first style image real-time generation model, the second style image real-time generation model, and other style image real-time generation models can correspond to different device performance information respectively, so that a style image real-time generation model adapted to a terminal device's performance information can be sent to that terminal device. That is, different style image real-time generation models can be compatible with terminal devices of different performance levels, so that the style image generation method in the embodiment of the present disclosure can be widely applied to terminal devices of different performance levels.
- obtaining the first cropping parameters of the initial style image generation model including:
- the initial style image generation model is cropped to obtain the first style image real-time generation model, including:
- the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer are cropped to obtain the first style image real-time generation model
- obtain the second cropping parameters of the real-time generation model of the first style image after training including:
- the trained first style image real-time generation model is cropped to obtain the second style image real-time generation model, including:
- the activation layer of the trained first style image real-time generation model and the convolution layer corresponding to the activation layer are cropped to obtain the second style image real-time generation model.
- when different original face sample images are used as model training inputs, the multiple important factors of the activation layer in the initial style image generation model obtained after model training can differ; the average value of the multiple important factors can be used as the first important factor of the activation layer in the initial style image generation model. Similarly, when different original face sample images are used as model training inputs, the multiple important factors of the activation layer in the trained first style image real-time generation model can also differ, and their average value can be used as the second important factor of the activation layer in the trained first style image real-time generation model.
- obtaining the first important factor of the activation layer in the initial style image generation model includes: performing a Taylor expansion calculation on the output value of the activation layer in the initial style image generation model, and using the calculation result as the first important factor.
- obtaining the second important factor of the activation layer in the trained first style image real-time generation model includes:
- the Taylor expansion calculation is performed on the output value of the activation layer in the real-time generation model of the first style image after training, and the calculation result is used as the second important factor.
- in practice, an initial style image generation model (large model) can be trained first; at the end of training, the first-order Taylor expansion of each activation layer's output value is calculated to estimate the importance of each activation layer. According to the calculated first important factor, the unimportant activation layers and the corresponding convolution layers are cut out, and training is then continued to obtain the first style image real-time generation model; similarly, the first style image real-time generation model is cropped and training is continued to obtain the second style image real-time generation model (see the sketch below).
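- a minimal PyTorch sketch of this Taylor-expansion importance estimate follows; the patent gives no code, so the hook mechanism, the tiny stand-in network, and the mean-based threshold are all illustrative assumptions built on the common first-order Taylor criterion |activation × gradient|:

```python
import torch
import torch.nn as nn

# Stash each watched activation so its gradient can be read after backward().
activations = {}

def save_output(name):
    def hook(module, inputs, output):
        output.retain_grad()          # keep dL/d(output) on this non-leaf tensor
        activations[name] = output
    return hook

def taylor_scores(name):
    out = activations[name]
    # First-order Taylor criterion: |activation * gradient|, averaged over
    # batch and spatial dimensions -> one importance score per channel.
    return (out * out.grad).abs().mean(dim=(0, 2, 3))

# Tiny stand-in generator; the real model would be a GAN generator.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
model[1].register_forward_hook(save_output("act1"))

x = torch.randn(4, 3, 64, 64)
loss = model(x).mean()                # stand-in for the training loss
loss.backward()

scores = taylor_scores("act1")        # shape: (16,)
to_prune = scores < scores.mean()     # channels below an example threshold
```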
- FIG. 7 is a flowchart of another training method for a style image generation model provided by an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above-mentioned optional embodiments.
- except for the different image processing objects, the processing of the original face sample image is the same as the processing of the original face image described above; both belong to the same inventive concept, and details are not repeated in the following embodiments.
- FIG. 7 and FIG. 6 have the same operations, which will not be repeated hereafter, and reference may be made to the descriptions of the foregoing embodiments.
- the training method of the style image generation model may include:
- S702 Identify the face region of the original face sample image, and determine parameter information of the bounding box of the face region and the rotation angle of the face region.
- any available face recognition technology can be used to identify the face region of the original face sample image and output the bounding box surrounding the face region on the original face sample image; at the same time, the key point detection technology is used to determine the key points of the face area and the rotation angle of the face area.
- the rotation angle of the face region refers to the angle at which the face region should be rotated on the original face sample image in order to obtain an image that meets the preset face position requirements;
- the parameter information of the face region bounding box is used to represent the position of the bounding box on the original face sample image, and the size of the bounding box of the face area can be determined according to the parameters set in the adopted face recognition technology, or can be customized.
- the bounding box of the face region can be any regular geometric figure.
- by using the key point detection technology to obtain the rotation angle of the face area at the same time as the face area is recognized, the rotation angle can be used directly in the face alignment adjustment. This avoids the complex operation of determining the affine transformation matrix for face region position adjustment by the least squares method or the singular value decomposition (SVD) method, improves the efficiency of face position adjustment, and thus enables real-time face position adjustment.
- the parameter information of the face area bounding box may include, but is not limited to, the position coordinates of each vertex of the face area bounding box on the original face sample image, or the distance of each edge from the image boundary on the original face sample image, etc. .
- an affine transformation matrix for adjusting the position of the face region can be constructed, and the position of the face area on the original face sample image is adjusted to obtain an image that meets the preset face position requirements, that is, the first face sample image.
- the position adjustment operation of the face region may include, but is not limited to, rotation, translation, reduction, enlargement, and cropping.
- in the process of adjusting the position of the face area, the position of the original face sample image can be adjusted as a whole, or a matting technique can be used to cut out the bounding box enclosing the face region or a sub-region including the face region, so that the position of the bounding box or sub-region is adjusted individually, which is not specifically limited in this embodiment of the present disclosure.
- S704 Obtain a target-style face sample image corresponding to each first face sample image.
- a target-style face sample image corresponding to each original face sample image may be obtained by using a pre-trained target image model based on a plurality of first face sample images.
- using the target style face sample image corresponding to the first face sample image as a training sample improves the training effect of the initial style image generation model and the style image real-time generation model, solves the problems that existing models have a single training method and cannot meet users' need for real-time generation of style images, improves the generation effect of style images in the model application stage, and solves the problem of poor image quality after image style conversion in existing schemes.
- moreover, the rotation angle of the face area can be obtained while the face area is recognized and used directly in the face alignment adjustment, which improves the efficiency of face position adjustment, realizes real-time face position adjustment, and improves the efficiency of model training.
- the position of the face region is adjusted based on the parameter information of the bounding box of the face region and the rotation angle of the face region to obtain a first face sample image, including:
- the position of the face region is adjusted to obtain a first face sample image.
- an affine transformation matrix may be constructed based on the acquired parameters, and then the position of the face region may be adjusted based on the affine transformation matrix.
- the face position correction parameter value is used to correct the position of the face region on the position-adjusted image, which may include correction of the up-down position of the face or correction of the left-right position of the face, so as to improve the accuracy of determining the actual position of the face region on the original face sample image, thereby ensuring the accuracy of the face region position adjustment. For example, if the vertical position of the face region determined based on the parameter information of the face region bounding box is higher than its actual position on the original face sample image, the preset face position correction parameter value can be used to accurately determine the actual location of the face region.
- the preset image size refers to the input image size pre-determined for the model training process; that is, if the original face sample image does not meet the preset image size, it needs to be cropped to ensure that the sample images finally used in model training have a uniform size.
- the rotation angle of the face region determined by the key point detection technology can be expressed as Roll
- the value of the face position correction parameter can be expressed as ymeanScale
- the value range of ymeanScale can be set to [0, 1]
- the preset image size can be expressed as targetSize
- the parameter information of the bounding box of the face area includes the distance between each edge of the bounding box and the boundary of the original face sample image.
- the distance between the two sides of the face area bounding box in the horizontal direction from the x-axis can be expressed as the first distance b and the second distance t
- the distance between the two sides of the face area bounding box in the vertical direction from the y-axis can be expressed as a third distance l and a fourth distance r.
- yMean = ymeanScale × t + (1 − ymeanScale) × b;
- the affine transformation matrix used to adjust the position of the face region can be expressed as a 2×3 matrix M (given as a figure in the original publication; a standard construction with the same inputs is sketched in the code example below).
- FIG. 8 is a flowchart of another method for training a style image generation model according to an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above-mentioned optional embodiments.
- operations in FIG. 8 that are the same as those in FIG. 6 or FIG. 7 are not repeated below; reference may be made to the descriptions of the above embodiments.
- the training method of the style image generation model may include:
- the four sides of the face area bounding box are parallel to the four sides of the original face sample image, and the parameter information of the face area bounding box includes position parameters of the four sides in the original face sample image.
- the value of the face position correction parameter is used to correct the position of the face region on the position-adjusted image.
- S804 Calculate the abscissa value of the center of the face region based on the position parameters in the horizontal direction corresponding to the four sides of the bounding box of the face region.
- S805 Calculate the ordinate value of the center of the face region based on the position parameters in the vertical direction corresponding to the four sides of the bounding box of the face region and the value of the face position correction parameter.
- the face cropping ratio edgeScale is used to indicate the cropping multiple of the bounding box of the face region on the original face sample image.
- a face cropping ratio of 2 means that, on the original face sample image, an image area containing the face region is cropped at twice the size of the face region bounding box.
- the side length value of the face region bounding box can be expressed as the difference (r − l) between the fourth distance r and the third distance l, or as the difference (t − b) between the second distance t and the first distance b.
- the edge length value edgeLength of the face area can be expressed as:
- edgeLength = edgeScale × (r − l).
- the affine transformation matrix M can then be expressed as follows (the matrix is given as a figure in the original publication), where:
- Roll represents the rotation angle of the face area determined by the key point detection technology
- targetSize represents the preset image size
- (xMean, yMean) represents the coordinates of the center of the face area.
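- as a minimal sketch of this construction (assuming OpenCV's rotate-and-scale-about-a-point matrix followed by re-centering; the exact matrix of the embodiment is shown only as a figure, and the default parameter values below are illustrative):

```python
import cv2
import numpy as np

def build_alignment_matrix(l, r, t, b, roll_deg,
                           ymean_scale=0.5, edge_scale=2.0, target_size=256):
    # Face centre: abscissa from the two vertical edges, ordinate blended
    # between the two horizontal edges by the correction parameter.
    x_mean = 0.5 * (l + r)
    y_mean = ymean_scale * t + (1.0 - ymean_scale) * b
    # Side length of the cropped region and the scale mapping it onto
    # the preset image size.
    edge_length = edge_scale * (r - l)
    scale = target_size / edge_length
    # 2x3 affine matrix: rotate by Roll and scale about the face centre...
    M = cv2.getRotationMatrix2D((x_mean, y_mean), roll_deg, scale)
    # ...then translate the face centre to the centre of the output image.
    M[0, 2] += target_size / 2.0 - x_mean
    M[1, 2] += target_size / 2.0 - y_mean
    return M

# Usage: warp the original image so the face is upright, centred and scaled.
# img = cv2.imread('face.jpg')
# M = build_alignment_matrix(l=120, r=360, t=80, b=400, roll_deg=12.0)
# aligned = cv2.warpAffine(img, M, (256, 256))
```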
- a target-style face sample image corresponding to each original face sample image may be obtained by using a pre-trained target image model based on a plurality of first face sample images.
- in the embodiments of the present disclosure, the affine transformation matrix required for adjusting the position of the face region is constructed according to the cropping, scaling and other requirements on the original face sample image, and the position of the face region on the sample image is adjusted accordingly, ensuring the accuracy of the face region position adjustment. The multiple first face sample images and the target-style face sample image corresponding to each first face sample image are used as training samples, which improves the training effect of the models, solves the problem that existing model training methods are limited and cannot meet users' need for real-time generation of style images, and at the same time improves the generation effect of style images in the model application stage, overcoming the poor image quality after style conversion in existing schemes.
- FIG. 9 is a flowchart of another method for training a style image generation model provided by an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above-mentioned optional embodiments.
- operations in FIG. 9 that are the same as those in FIG. 6 are not described in detail below; reference may be made to the descriptions of the above embodiments.
- the training method of the style image generation model may include:
- Face adjustment refers to adjusting the face of the person on the target-style face sample image according to the display requirements for the face shape of the person.
- the face adjustment includes at least one of the following: face shape adjustment and mouth adjustment.
- face shape adjustment refers to adjusting the face shape on the target-style face sample image according to the display requirements for the character's face, such as face slimming; mouth adjustment refers to adjusting the character's mouth on the image, such as adjusting the mouth shape or keeping the thickness of the mouth outline consistent. That is, this embodiment of the present disclosure supports fine-tuning the face on the target-style face sample image so that its presentation is more attractive, thereby ensuring that the initial style image generation model and the style image real-time generation model obtained by training are more accurate and can output a high-quality style image for any input image.
- the display effect of the facial features is optimized and high-quality sample data is constructed, which improves the training effect of the initial style image generation model and the style image real-time generation model, thereby ensuring the generation effect of style images in the model application stage.
- face shape adjustment is performed on the face region on the target style face sample image, including:
- the face contour of the face region on the target style face sample image is adjusted to obtain the first style face sample image.
- the key points of the initial face contour can be obtained by using the key point detection technology to perform key point detection on the face region on the target style face sample image.
- the key points of the target face contour are determined according to the face shape adjustment requirements. According to the translation transformation between the initial face contour key point and the target face contour key point, the initial face contour key point is moved to the target face contour key point, so as to realize face adjustment.
- the face contour of the face region on the target-style face sample image is adjusted, including:
- the initial face contour key points are moved to the target face contour key points, the face region on the target-style face sample image is deformed using the thin-plate spline interpolation function, and the deformed face region is rendered using the face texture of the target-style face sample image to obtain the first-style face sample image.
- the thin-plate spline (TPS) interpolation function is a two-dimensional deformation processing algorithm; its specific principle can be implemented with reference to the prior art.
- Using the thin-plate spline interpolation function to deform the face region can ensure the smoothness of the face contour after face adjustment.
- Using the face texture of the target-style face sample image to render the deformed face region can ensure the consistency of the face texture after face adjustment.
- using the thin-plate spline interpolation function to deform the face region may specifically include: triangulating the face region, and translating the vertices of the triangular mesh using the thin-plate spline interpolation function.
- the entire style image area can also be triangulated; triangulation has the advantage of convenient calculation and processing, and other image meshing methods can also be adopted in practical applications.
- specifically, the face region on the target-style face sample image, or the entire target-style face sample image, may be triangulated, and the initial face contour key points of the face region on the target-style face sample image are determined. The thin-plate spline interpolation function is used to interpolate the translation from the initial face contour key points L1 to the target face contour key points L2 onto each triangular mesh vertex, the vertices are translated accordingly, and finally the face texture of the target-style face sample image is used as the current texture to render the new triangular mesh, yielding the target-style face sample image after face slimming.
- the risk of face deformation can be reduced to a certain extent, and the overall presentation effect of the face can be maintained.
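- a compact sketch of this deformation step (assuming SciPy's radial-basis-function interpolator with a thin-plate-spline kernel as the backend; the names and shapes are illustrative):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def deform_mesh_vertices(vertices: np.ndarray,
                         src_kpts: np.ndarray,
                         dst_kpts: np.ndarray) -> np.ndarray:
    # vertices: (V, 2) vertices of the triangulated face region;
    # src_kpts / dst_kpts: (K, 2) initial and target face contour key
    # points (L1 and L2 in the text above). The thin-plate spline
    # spreads the key-point translations smoothly over every vertex.
    tps = RBFInterpolator(src_kpts, dst_kpts - src_kpts,
                          kernel='thin_plate_spline')
    return vertices + tps(vertices)

# The moved vertices define a new triangular mesh, which is then
# rendered with the original face texture (rendering omitted here).
```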
- performing mouth adjustment on the face region on the target-style face sample image including:
- the mouth determined based on the mouth key points is removed from the face region of the target-style face sample image to obtain an incomplete-style face sample image; for example, when the mouth state is determined to be open, the mouth determined by the key points is removed from the face region of the target-style face sample image to obtain the incomplete-style face sample image;
- the pre-generated mouth material is fused with the incomplete style face sample image to obtain the first style face sample image.
- the key points of the mouth can also be obtained by using the key point detection technology to perform key point detection on the face area on the target-style face sample image.
- the mouth state can be determined according to the distances between key points belonging to the upper and lower lips: for example, if, among the upper- and lower-lip key points, the number of vertically corresponding key point pairs whose distance exceeds a distance threshold itself exceeds a number threshold, the mouth state is considered open; otherwise it is considered closed (a minimal sketch of this test is given below). Both the distance threshold and the number threshold can be set adaptively. If the mouth is determined to be open, a pre-designed mouth material is used to replace the mouth on the target-style face sample image so as to ensure the display effect of the mouth.
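- for illustration (the fixed thresholds below stand in for the adaptive thresholds described above):

```python
import numpy as np

def mouth_is_open(upper_lip: np.ndarray, lower_lip: np.ndarray,
                  dist_thresh: float = 4.0, count_thresh: int = 3) -> bool:
    # upper_lip / lower_lip: (K, 2) arrays of vertically corresponding
    # lip key points from the key point detection step.
    gaps = np.linalg.norm(upper_lip - lower_lip, axis=1)
    return int((gaps > dist_thresh).sum()) > count_thresh
```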
- the mouth determined based on the key points of the mouth is removed from the face region of the target-style face sample image to obtain the incomplete-style face sample image, including:
- a sub-region surrounding the mouth is determined in the face region of the target-style face sample image; wherein, the size of the sub-region can be determined adaptively, which is not specifically limited in the embodiment of the present disclosure;
- the fixed-boundary solution algorithm refers to an algorithm used in the field of image processing to determine the boundary of a target figure (such as a mouth), for example an edge detection algorithm based on the Laplace operator, and can be implemented with reference to the prior art. The boundary conditions in the calculation are determined according to the key points on the boundary of the sub-region, that is, according to the facial-skin key points on the boundary of the sub-region;
- the mouth is removed from the face region of the target style face sample image to obtain the incomplete style image.
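- as a simple illustrative stand-in for this step (OpenCV inpainting of a key-point mask rather than the Laplace-operator fixed-boundary solve of the embodiment):

```python
import cv2
import numpy as np

def remove_mouth(image: np.ndarray, mouth_kpts: np.ndarray,
                 pad: int = 6) -> np.ndarray:
    # Build a mask from the convex hull of the mouth key points,
    # dilate it slightly, and fill the region from the surrounding
    # skin to obtain an 'incomplete' face image.
    mask = np.zeros(image.shape[:2], np.uint8)
    hull = cv2.convexHull(mouth_kpts.astype(np.int32))
    cv2.fillConvexPoly(mask, hull, 255)
    mask = cv2.dilate(mask, np.ones((pad, pad), np.uint8))
    return cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)
```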
- the pre-generated mouth material is fused with the incomplete-style face sample image, including:
- the key points marked on the mouth material are aligned with the mouth key points in the face region of the target-style face sample image, the mouth material is deformed based on the thin-plate spline interpolation function, and the deformed mouth material is rendered using the mouth texture of the target-style face sample image.
- the key points marked on the mouth material correspond to the mouth key points in the face region of the target-style face sample image, for example with their coordinates determined in the same image coordinate system. Aligning the key points marked on the mouth material with the mouth key points in the face region establishes the key point mapping between the mouth material and the mouth, so that the mouth material can be pasted back onto the mouth area of the incomplete-style sample image.
- Using the thin-plate spline interpolation function to deform the mouth material can ensure the smoothness of the border of the mouth material and ensure the display effect of the mouth.
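- a minimal sketch of the alignment step (a similarity transform estimated from the corresponding key points is used here as a simple stand-in for the full thin-plate-spline deformation):

```python
import cv2
import numpy as np

def align_mouth_material(material: np.ndarray,
                         material_kpts: np.ndarray,
                         face_mouth_kpts: np.ndarray,
                         out_size: tuple) -> np.ndarray:
    # Estimate a rotation + scale + translation that maps the key points
    # marked on the mouth material onto the mouth key points of the
    # stylised face, then warp the material into the face image frame.
    M, _ = cv2.estimateAffinePartial2D(material_kpts.astype(np.float32),
                                       face_mouth_kpts.astype(np.float32))
    return cv2.warpAffine(material, M, out_size)  # out_size = (width, height)
```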
- optionally, fusing the pre-generated mouth material with the incomplete-style face sample image includes: determining the inner boundary line and the outer boundary line of the mouth contour on the mouth material;
- deforming the mouth material based on the thin-plate spline interpolation function includes: deforming the area within the inner boundary line of the mouth material using the thin-plate spline interpolation function, and deforming the area between the inner boundary line and the outer boundary line using the solution optimization algorithm.
- FIG. 10 is a schematic diagram of a mouth material provided by an embodiment of the present disclosure, and specifically shows the inner boundary line and the outer boundary line of the mouth edge contour; the inner boundary line and the outer boundary line can be appropriately color-filled according to requirements.
- the inner mesh refers to the mesh obtained by meshing the area within the inner boundary line of the mouth material; the outer mesh refers to the mesh obtained by meshing the area between the inner boundary line and the outer boundary line. Both the inner mesh and the outer mesh may be triangulated meshes.
- the deformation control of the outer mesh can be implemented based on as-rigid-as-possible (ARAP) deformation without rotation, while the deformation control of the inner mesh can still be implemented based on the thin-plate spline interpolation function.
- specifically, the inner mesh can first be deformed using the thin-plate spline interpolation function, and the vertices of the outer mesh are then obtained by solving the optimization problem.
- the area of the outer mesh can also be determined, so as to realize the fusion of the mouth material and the incomplete style image by controlling the area between the inner boundary line and the outer boundary line.
- through this deformation control, the hook line on the edge of the mouth remains unchanged in thickness. In the optimization, u represents an unknown vertex of the outer mesh and I is a 2×2 identity matrix; the full expression is given as a figure in the original publication, and one plausible form is written out below.
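- assuming a standard as-rigid-as-possible energy with the local transformation pinned to the identity (u the unknown outer-mesh vertices, v their undeformed positions, E the set of outer-mesh edges), the optimization can be written as:

$$\min_{u}\ \sum_{(i,j)\in E}\left\lVert (u_i-u_j)-I\,(v_i-v_j)\right\rVert^2$$

with the vertices on the inner boundary line fixed to their thin-plate-spline positions; pinning the per-edge transformation to I rather than to a free rotation preserves edge vectors, and hence the thickness of the hook line.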
- in practical applications, the embodiment of the present disclosure also needs to control the thickness of the hook line on the edge of the mouth when the mouth is in the closed state, which can likewise be accomplished using the above method.
- the above mouth adjustment operations are all implemented based on the target style face sample image, to optimize the display effect of the mouth on the target style face sample image, and then optimize the training of the initial style image generation model and the style image real-time generation model. Effect.
- FIG. 11 is a schematic structural diagram of an apparatus for generating a style image provided by an embodiment of the present disclosure.
- the embodiment of the present disclosure may be applicable to a situation in which a style image of any style is generated based on an original face image.
- the apparatus can be implemented by software and/or hardware, and can be integrated on any electronic device with computing capabilities, such as a terminal, which may include, but is not limited to, a smart mobile terminal, a tablet computer, a personal computer, and the like.
- the style image generation apparatus 1100 may include an original face image acquisition module 1101 and a target style face image generation module 1102, wherein: the original face image acquisition module 1101 is configured to acquire an original face image;
- the target style face image generation module 1102 is configured to obtain the target-style face image corresponding to the original face image by using a pre-trained target style image real-time generation model;
- the target style image real-time generation model is obtained by training at least one style image real-time generation model, which is obtained by performing at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters after the initial style image generation model has been trained; and
- both the initial style image generation model and the target style image real-time generation model are obtained by training based on multiple original face sample images and the target-style face sample image corresponding to each original face sample image, wherein the style image real-time generation model varies with the cropping parameters.
- optionally, the multiple original face sample images and the target-style face sample image corresponding to each original face sample image are respectively the input and output of a pre-trained target image model; the target image model is used to generate the target-style face sample image corresponding to each original face sample image, and provides training samples for the initial style image generation model and the target style image real-time generation model.
- optionally, at least two cropping operations are performed on the initial style image generation model according to at least two sets of cropping parameters to obtain at least two style image real-time generation models, which are trained to obtain at least two target style image real-time generation models respectively corresponding to different device performance information;
- the style image generating apparatus 1100 provided by the embodiment of the present disclosure further includes:
- the model obtaining module is used for obtaining the real-time generation model of the target style image adapted to the current equipment performance information based on the current equipment performance information.
- the style image generating apparatus 1100 provided by the embodiment of the present disclosure further includes:
- the face recognition module is used to identify the face area of the original face image, and determine the parameter information of the bounding box of the face area and the rotation angle of the face area;
- the face position adjustment module is used to adjust the position of the face region based on the parameter information of the bounding box of the face region and the rotation angle of the face region to obtain the first face image, so as to obtain the target-style face image based on the first face image.
- the face position adjustment module includes:
- the first parameter obtaining unit is used to obtain the preset face position correction parameter value and the preset image size; wherein, the face position correction parameter value is used to correct the position of the face region on the position-adjusted image;
- the first face image determination unit is used to adjust the position of the face region based on the parameter information of the bounding box of the face region, the rotation angle of the face region, the face position correction parameter value and the preset image size, to obtain the first face image.
- the four sides of the face area bounding box are parallel to the four sides of the original face image, and the parameter information of the face area bounding box includes the position parameters of the four sides in the original face image;
- the first face image determination unit includes:
- the first coordinate calculation subunit is used to calculate the abscissa value of the center of the face area based on the position parameters in the horizontal direction corresponding to the four sides of the bounding box of the face area;
- the second coordinate calculation subunit is used to calculate the ordinate value of the center of the face area based on the position parameter in the vertical direction and the face position correction parameter value corresponding to the four sides of the bounding box of the face area;
- the affine transformation matrix construction subunit is used to construct an affine transformation matrix based on the abscissa value of the center of the face area, the ordinate value of the center of the face area, the rotation angle of the face area and the preset image size;
- the first face image determination subunit is used to adjust the position of the face region based on the affine transformation matrix to obtain the first face image.
- the face position adjustment module further includes:
- the face cropping ratio acquisition unit is used to obtain the preset face cropping ratio
- the side length value determination unit of the face region is used to calculate the side length value of the face region based on the face cropping ratio and the side length value of the bounding box of the face region;
- the scaling size value determining unit is used for calculating the scaling size value based on the side length value of the face area and the preset image size.
- the affine transformation matrix construction subunit is specifically used for:
- An affine transformation matrix is constructed based on the abscissa value of the center of the face area, the ordinate value of the center of the face area, the rotation angle of the face area, the preset image size and the scaling size value.
- the style image generating apparatus 1100 provided by the embodiment of the present disclosure further includes:
- the target face area acquisition module is used to obtain the target face area in the target style face image
- the first style face image determination module is used to adjust the position of the target face area in the target style face image, and obtain the first style face image corresponding to the position of the face area in the original face image.
- the style image generating apparatus 1100 provided by the embodiment of the present disclosure further includes:
- the second style face image determination module is configured to perform fusion processing on the target face area and the target background area in the first style face image to obtain the second style face image.
- the style image generating apparatus provided by the embodiment of the present disclosure can execute any style image generating method provided by the embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method.
- FIG. 12 is a schematic structural diagram of a training device for a style image generation model provided by an embodiment of the present disclosure.
- the embodiment of the present disclosure can be applied to the situation of how to train a style image generation model that meets the needs of style conversion; the style image generation model is used to generate style images corresponding to original face images.
- the training device can be implemented by software and/or hardware, and can be integrated on any electronic device with computing capability, such as a server.
- the training apparatus 1200 for the style image generation model may include a sample acquisition module 1201, a first training module 1202, a model cropping module 1203, and a second training module 1204, wherein:
- a sample acquisition module 1201 configured to acquire a plurality of original face sample images and a target style face sample image corresponding to each original face sample image;
- the first training module 1202 is used for training to obtain an initial style image generation model based on a plurality of original face sample images and a target style face sample image corresponding to each original face sample image;
- the model cropping module 1203 is configured to perform at least one cropping operation according to at least one set of cropping parameters based on the initial style image generation model to obtain at least one style image real-time generation model, where the style image real-time generation model changes with the change of the cropping parameters;
- the second training module 1204 is configured to train the at least one style image real-time generation model based on the multiple original face sample images and the target-style face sample image corresponding to each original face sample image, to obtain a trained target style image real-time generation model.
- optionally, the model cropping module 1203 includes: a first cropping parameter acquisition unit, configured to acquire the first cropping parameter of the initial style image generation model;
- a first cropping unit configured to crop the initial style image generation model based on the first cropping parameter to obtain the first style image real-time generation model
- the second cropping parameter obtaining unit is used to obtain the second cropping parameter of the real-time generation model of the first style image after training;
- the second cropping unit is configured to crop the trained first style image real-time generation model based on the second cropping parameter to obtain the second style image real-time generation model.
- the first cropping parameter obtaining unit is specifically configured to: obtain the first importance factor of the activation layer in the initial style image generation model;
- the first cropping unit is specifically used for:
- the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer are cropped to obtain the first style image real-time generation model
- the second cropping parameter obtaining unit is specifically used for:
- the second cropping unit is specifically used for:
- the activation layer of the trained first style image real-time generation model and the convolution layer corresponding to the activation layer are cropped to obtain the second style image real-time generation model.
- optionally, the first cropping parameter obtaining unit is specifically used for: performing a Taylor expansion calculation on the output value of the activation layer in the initial style image generation model, and using the calculation result as the first importance factor;
- the second cropping parameter obtaining unit is specifically used for: performing a Taylor expansion calculation on the output value of the activation layer in the trained first style image real-time generation model, and using the calculation result as the second importance factor.
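- as an illustrative sketch of such an importance factor (assuming the common first-order Taylor formulation |activation × gradient|, which this disclosure does not spell out; PyTorch, with illustrative names):

```python
import torch

def taylor_channel_importance(activation: torch.Tensor,
                              grad: torch.Tensor) -> torch.Tensor:
    # First-order Taylor estimate of the loss change caused by zeroing a
    # channel: |a * dL/da|, averaged over batch and spatial positions,
    # giving one importance score per channel.
    # activation, grad: (N, C, H, W) tensors captured for one activation layer.
    return (activation * grad).abs().mean(dim=(0, 2, 3))

# Channels with the smallest scores, together with the convolution
# filters that produce them, are natural candidates for the cropping
# (pruning) operation described above.
```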
- optionally, the first style image real-time generation model and the second style image real-time generation model are trained based on the multiple original face sample images and the target-style face sample image corresponding to each original face sample image, so as to obtain a first target style image real-time generation model and a second target style image real-time generation model respectively, where the two models correspond to different device performance information.
- the sample acquisition module 1201 includes:
- an original face sample image acquisition unit used for acquiring multiple original face sample images
- the target-style face sample image acquisition unit is used for obtaining target-style face sample images corresponding to each original face sample image by using the pre-trained target image model.
- the target image model is obtained by training based on the style face sample images generated by the standard image generation model.
- the standard image generation model is trained on multiple standard-style face sample images.
- the training apparatus 1200 for the style image generation model provided by the embodiment of the present disclosure further includes:
- the face adjustment module is used to perform face adjustment on the face region of the target-style face sample image to obtain a first-style face sample image, so that the multiple original face sample images and the obtained multiple first-style face sample images are used for training to obtain the initial style image generation model and the trained style image real-time generation model.
- the face adjustment includes face shape adjustment and/or mouth adjustment.
- the face adjustment module includes a face shape adjustment unit for performing face shape adjustment on the face region on the target style face sample image;
- the face shape adjustment unit includes:
- the key point determination subunit is used to determine the initial face contour key points of the face region on the target-style face sample image, and the target face contour key points corresponding to the initial face contour key points; wherein, the target face contour The key points are determined according to the needs of face adjustment;
- the face shape adjustment subunit is used to adjust the face contour of the face region on the target style face sample image based on the initial face contour key points and the target face contour key points to obtain the first style face sample image.
- the face shape adjustment subunit includes:
- the key point moving subunit is used to move the initial face contour key point to the target face contour key point, and use the thin plate spline interpolation function to deform the face area on the target style face sample image;
- the image rendering subunit is used for rendering the deformed face region by using the face texture of the target style face sample image, so as to obtain the first style face sample image.
- the face adjustment module includes a mouth adjustment unit for performing mouth adjustment on the face region on the target-style face sample image;
- the mouth adjustment unit includes:
- the mouth key point determination subunit is used to determine the mouth key points of the face area on the target style face sample image
- the incomplete-style face sample image determination subunit is used to remove the mouth determined based on the key points of the mouth from the face area of the target-style face sample image to obtain the incomplete-style face sample image;
- the first style face sample image determination subunit is used for fusing the pre-generated mouth material with the incomplete style face sample image to obtain the first style face sample image.
- the subunit for determining the incomplete-style face sample image includes:
- the sub-region determination sub-unit is used to determine the sub-region surrounding the mouth in the face region of the target-style face sample image based on the key points of the mouth;
- the mouth boundary determination subunit is used to determine the mouth boundary line in the subregion by using the fixed boundary solution algorithm
- the mouth removal subunit is used to remove the mouth from the face area of the target style face sample image based on the mouth boundary line to obtain the incomplete style face sample image.
- the first style face sample image determination subunit includes:
- the key point alignment and deformation subunit is used to align the key points marked on the mouth material with the mouth key points in the face region of the target-style face sample image, and to perform deformation processing on the mouth material based on the thin-plate spline interpolation function;
- the image rendering subunit is used to render the deformed mouth material using the mouth texture of the target-style face sample image.
- the first style face sample image determination subunit further includes:
- the inner and outer boundary determination subunits are used to determine the inner and outer boundary lines of the mouth contour on the mouth material
- the key point alignment and deformation subunit includes:
- the key point alignment sub-unit is used to align the key points of the mouth in the face region of the target-style face sample image based on the key points marked on the mouth material;
- the first deformation subunit is used to deform the area within the inner boundary line of the mouth material by using the thin-plate spline interpolation function
- the second deformation subunit is used to perform deformation processing on the area between the inner boundary line and the outer boundary line by using the solution optimization algorithm.
- the training apparatus 1200 for the style image generation model provided by the embodiment of the present disclosure further includes:
- the face recognition module is used to identify the face area of the original face sample image, and to determine the parameter information of the bounding box of the face area and the rotation angle of the face area;
- the first face sample image determination module is used to adjust the position of the face area based on the parameter information of the bounding box of the face area and the rotation angle of the face area to obtain the first face sample image.
- the target-style face sample image acquisition unit is specifically configured to obtain a target-style face sample image corresponding to each original face sample image by using a pre-trained target image model based on a plurality of first face sample images.
- the first face sample image determination module includes:
- the first parameter obtaining unit is used to obtain the preset face position correction parameter value and the preset image size; wherein, the face position correction parameter value is used to correct the position of the face region on the position-adjusted image;
- the first face sample image determination unit is used to adjust the position of the face region based on the parameter information of the bounding box of the face region, the rotation angle of the face region, the face position correction parameter value and the preset image size, to obtain the first face sample image.
- the four sides of the face area bounding box are parallel to the four sides of the original face sample image, and the parameter information of the face area bounding box includes the position parameters of the four sides in the original face sample image;
- the first face sample image determination unit includes:
- the first coordinate calculation subunit is used to calculate the abscissa value of the center of the face area based on the position parameters in the horizontal direction corresponding to the four sides of the bounding box of the face area;
- the second coordinate calculation subunit is used to calculate the ordinate value of the center of the face area based on the position parameter in the vertical direction and the face position correction parameter value corresponding to the four sides of the bounding box of the face area;
- the affine transformation matrix construction subunit is used to construct an affine transformation matrix based on the abscissa value of the center of the face area, the ordinate value of the center of the face area, the rotation angle of the face area and the preset image size;
- the position adjustment subunit is used to adjust the position of the face region based on the affine transformation matrix to obtain the first face sample image.
- the first face sample image determination module further includes:
- the face cropping ratio acquisition unit is used to obtain the preset face cropping ratio
- the side length value determination unit of the face region is used to calculate the side length value of the face region based on the face cropping ratio and the side length value of the bounding box of the face region;
- a scaling size value determination unit used for calculating the scaling size value based on the side length value of the face area and the preset image size
- the affine transformation matrix construction subunit is specifically used for:
- An affine transformation matrix is constructed based on the abscissa value of the center of the face area, the ordinate value of the center of the face area, the rotation angle of the face area, the preset image size and the scaling size value.
- the apparatus for training a style image generation model provided by the embodiment of the present disclosure can execute the training method for an arbitrary style image generation model provided by the embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method.
- FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, which is used to exemplarily illustrate the electronic device for executing a style image generation method or a training method for a style image generation model in an example of the present disclosure.
- the electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (e.g., in-vehicle navigation terminals), and stationary terminals such as digital TVs and desktop computers.
- the electronic device shown in FIG. 13 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
- the electronic device 1300 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 1301, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1302 or a program loaded from a storage device 1308 into a random access memory (RAM) 1303.
- in the RAM 1303, various programs and data necessary for the operation of the electronic device 1300 are also stored.
- the processing device 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304.
- An input/output (I/O) interface 1305 is also connected to the bus 1304.
- the ROM 1302, RAM 1303 and storage device 1308 shown in FIG. 13 may be collectively referred to as a memory for storing executable instructions or programs of the processing device 1301.
- generally, the following devices can be connected to the I/O interface 1305: an input device 1306 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 1307 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 1308 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 1309, which may allow the electronic device 1300 to communicate wirelessly or by wire with other devices to exchange data.
- although FIG. 13 shows an electronic device 1300 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
- the computer program may be downloaded and installed from the network via the communication device 1309, or from the storage device 1308, or from the ROM 1302.
- when the computer program is executed by the processing device 1301, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
- the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
- the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal in baseband or propagated as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
- in some embodiments, the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
- Examples of communication networks include local area networks (LANs), wide area networks (WANs), the Internet (eg, the Internet), and peer-to-peer networks (eg, ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
- the computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire an original face image; and obtain a target-style face image corresponding to the original face image by using a pre-trained target style image real-time generation model; wherein the target style image real-time generation model is obtained by training at least one style image real-time generation model, which is obtained by performing at least one cropping operation according to at least one set of cropping parameters after the initial style image generation model has been trained, and both the initial style image generation model and the target style image real-time generation model are obtained by training based on multiple original face sample images and the target-style face sample image corresponding to each original face sample image, wherein the style image real-time generation model changes with the change of the cropping parameters.
- alternatively, the computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire a plurality of original face sample images and a target-style face sample image corresponding to each original face sample image; train an initial style image generation model based on the multiple original face sample images and the target-style face sample image corresponding to each original face sample image; perform at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters to obtain a style image real-time generation model, where the style image real-time generation model changes with the change of the cropping parameters; and train the style image real-time generation model based on the multiple original face sample images and the target-style face sample image corresponding to each original face sample image to obtain a trained target style image real-time generation model.
- the electronic device can also be caused to execute other style image generation methods or other training methods for style image generation models provided by the examples of the present disclosure.
- computer program code for performing operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages, such as Java, Smalltalk, C++, and also conventional procedural programming languages, such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (eg, using an Internet service provider to connect).
- each block in the flowchart or block diagrams may represent a module, a segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions.
- the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or can be implemented in a combination of dedicated hardware and computer instructions.
- the modules or units involved in the embodiments of the present disclosure may be implemented in software or hardware.
- the name of the module or unit does not constitute a limitation of the module or unit itself in some cases; for example, the original face image acquisition module can also be described as "a module for acquiring original face images".
- exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
- machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
Abstract
Embodiments of the present invention relate to a style image generation method and apparatus, a model training method and apparatus, a device and a medium. The style image generation method comprises: acquiring an original face image; and using a pre-trained target style image real-time generation model to obtain a target-style face image corresponding to the original face image, the target style image real-time generation model being obtained by training, after an initial style image generation model has been trained, at least one style image real-time generation model obtained by performing at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters. The style image real-time generation model changes with changes in the cropping parameters. The embodiments of the present invention can solve the problems that existing model training methods are limited and cannot meet users' needs for generating style images in real time, and achieve the effect of generating style images in real time for users.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011066405.7A CN112991358A (zh) | 2020-09-30 | 2020-09-30 | Style image generation method, model training method, apparatus, device and medium |
| CN202011066405.7 | 2020-09-30 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022068451A1 true WO2022068451A1 (fr) | 2022-04-07 |
Family
ID=76344350
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2021/113225 Ceased WO2022068451A1 (fr) | 2021-08-18 | Style image generation method and apparatus, model training method and apparatus, device and medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN112991358A (fr) |
| WO (1) | WO2022068451A1 (fr) |
- 2020
  - 2020-09-30: Chinese application CN202011066405.7A filed; published as CN112991358A (legal status: active, Pending)
- 2021
  - 2021-08-18: International application PCT/CN2021/113225 filed; published as WO2022068451A1 (legal status: not active, Ceased)
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109325988A (zh) * | 2017-07-31 | 2019-02-12 | Tencent Technology (Shenzhen) Co., Ltd. | Facial expression synthesis method and apparatus, and electronic device |
| US20190370936A1 (en) * | 2018-06-04 | 2019-12-05 | Adobe Inc. | High Resolution Style Transfer |
| CN110414378A (zh) * | 2019-07-10 | 2019-11-05 | Nanjing University of Information Science and Technology | Face recognition method based on fused features of heterogeneous face images |
| CN111062382A (zh) * | 2019-10-30 | 2020-04-24 | Beijing Jiaotong University | Channel pruning method for object detection networks |
| CN111243050A (zh) * | 2020-01-08 | 2020-06-05 | Advanced Institute of Information Technology, Peking University (Zhejiang) | Portrait sketch generation method, system and drawing robot |
| CN111563455A (zh) * | 2020-05-08 | 2020-08-21 | Nanchang Institute of Technology | Damage identification method based on time-series signals and a compressed convolutional neural network |
| CN112991358A (zh) * | 2020-09-30 | 2021-06-18 | Beijing ByteDance Network Technology Co., Ltd. | Style image generation method, model training method, apparatus, device and medium |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114897672A (zh) * | 2022-05-31 | 2022-08-12 | Beijing Foreign Studies University | Image cartoon style transfer method based on equal-deformation constraints |
| CN117253267A (zh) * | 2022-06-10 | 2023-12-19 | Beijing Zitiao Network Technology Co., Ltd. | Image processing method, apparatus, device and medium |
| CN115187450A (zh) * | 2022-06-27 | 2022-10-14 | Beijing QIYI Century Science & Technology Co., Ltd. | Image generation method, image generation apparatus and related devices |
| CN115222578A (zh) * | 2022-06-30 | 2022-10-21 | Beijing Megvii Technology Co., Ltd. | Image style transfer method, program product, storage medium and electronic device |
| CN115953339A (zh) * | 2022-12-07 | 2023-04-11 | Beijing Xiaomi Mobile Software Co., Ltd. | Image fusion processing method, apparatus, device, storage medium and chip |
| CN116862757A (zh) * | 2023-05-19 | 2023-10-10 | Shanghai Renyimen Technology Co., Ltd. | Method, apparatus, electronic device and medium for controlling the degree of face stylization |
| CN117112826A (zh) * | 2023-08-24 | 2023-11-24 | Beijing Volcano Engine Technology Co., Ltd. | Image generation method and apparatus, computer device and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112991358A (zh) | 2021-06-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2022068451A1 (fr) | | Style image generation method and apparatus, model training method and apparatus, device and medium |
| US11410284B2 (en) | | Face beautification method and apparatus, computer device, and storage medium |
| CN112989904B (zh) | | Style image generation method, model training method, apparatus, device and medium |
| WO2022012085A1 (fr) | | Face image processing method and apparatus, storage medium and electronic device |
| CN113506305B (zh) | | Image enhancement method for three-dimensional point cloud data, semantic segmentation method and apparatus |
| CN114049417B (zh) | | Virtual character image generation method and apparatus, readable medium and electronic device |
| WO2022042290A1 (fr) | | Virtual model processing method and apparatus, electronic device and storage medium |
| WO2023284401A1 (fr) | | Image beautification processing method and apparatus, storage medium and electronic device |
| CN114913061A (zh) | | Image processing method and apparatus, storage medium and electronic device |
| CN113902636A (zh) | | Image deblurring method and apparatus, computer-readable medium and electronic device |
| CN113630549A (zh) | | Zoom control method and apparatus, electronic device and computer-readable storage medium |
| CN114078083A (zh) | | Hair transformation model generation method and apparatus, and hair transformation method and apparatus |
| WO2022132032A1 (fr) | | Portrait image processing method and device |
| US20240355036A1 (en) | | Video processing method and apparatus, device and storage medium |
| WO2023207379A1 (fr) | | Image processing method and apparatus, device and storage medium |
| WO2025077567A1 (fr) | | Three-dimensional model output method, apparatus and device, and computer-readable storage medium |
| CN110211017B (zh) | | Image processing method and apparatus, and electronic device |
| CN113240599B (zh) | | Image color-toning method and apparatus, computer-readable storage medium and electronic device |
| CN114596383A (zh) | | Line special-effect processing method and apparatus, electronic device, storage medium and product |
| CN114445301A (zh) | | Image processing method and apparatus, electronic device and storage medium |
| CN114723600A (zh) | | Makeup special-effect generation method, apparatus, device, storage medium and program product |
| CN115937010A (zh) | | Image processing method, apparatus, device and medium |
| WO2023040813A1 (fr) | | Facial image processing method and apparatus, device and medium |
| CN114332590A (zh) | | Joint perception model training method, joint perception method, apparatus, device and medium |
| WO2025087392A1 (fr) | | Multimedia data processing method and apparatus, electronic device and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21874113; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the EP bulletin as the address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.07.2023) |
| | 122 | Ep: PCT application non-entry in the European phase | Ref document number: 21874113; Country of ref document: EP; Kind code of ref document: A1 |