CN111814566A

CN111814566A - Image editing method, device, electronic device, and storage medium

Info

Publication number: CN111814566A
Application number: CN202010529860.XA
Authority: CN
Inventors: 李琦; 柴振华; 赖申其; 孙哲南; 邓琪瑶
Original assignee: Institute of Automation of Chinese Academy of Science; Beijing Sankuai Online Technology Co Ltd
Current assignee: Institute of Automation of Chinese Academy of Science; Beijing Sankuai Online Technology Co Ltd
Priority date: 2020-06-11
Filing date: 2020-06-11
Publication date: 2020-10-23

Abstract

The embodiments of the present application disclose an image editing method, device, electronic device, and storage medium. The method includes: acquiring a face image to be edited and a mask map corresponding to a target attribute region in the face image to be edited, and acquiring The reference image; according to the mask map, the face image to be edited is processed as a missing image missing the target attribute area; the missing image is encoded by the first encoder to obtain the missing feature corresponding to the missing image; Perform image coding on the reference image to obtain the reference feature corresponding to the reference image; according to the mask map, the missing feature and the reference feature are fused by the attention model to obtain the fusion feature; the image fusion feature is processed by the decoder. Decoding is performed to obtain the target image corresponding to the face image to be edited and the reference image. The embodiment of the present application improves the diversity of face attribute editing, and avoids the influence on irrelevant areas outside the target attribute area.

Description

Image editing method, device, electronic device, and storage medium

技术领域technical field

本申请实施例涉及图像处理技术领域，特别是涉及一种图像编辑方法、装置、电子设备及存储介质。The embodiments of the present application relate to the technical field of image processing, and in particular, to an image editing method, apparatus, electronic device, and storage medium.

背景技术Background technique

人脸编辑由于其在影视制作、照片处理和交互式娱乐等方面的潜在应用，在计算机视觉界引起了极大的关注。近年来随着生成对抗网络的发展，人脸编辑已取得了巨大进步。目前主流的人脸编辑方法大致分为三大类：基于标签条件的方法、基于几何指导的方法和基于参考引导的方法。基于标签条件的方法以预定义的属性标签为条件编辑人脸属性，该方法以二值属性标签为条件，只适用于编辑外观纹理变化的显著属性(例如发色、年老化和去除胡子等)，难以实现抽象形状变化(例如鹰钩鼻、丹凤眼等)，缺乏控制高级语义面部组件(例如眼睛、鼻子和嘴等)形状的灵活性。为了能灵活编辑人脸属性的形状，基于几何指导的方法提出利用精确的中间表示(例如关键点，分割图和轮廓草图等)实现具有明显拓扑形变的人脸属性编辑，然而，这种方式费时费力并且要求具有绘画技能。与基于几何指导的方法不同，基于参考引导的方法直接从参考图像中学习相应的人脸信息进行人脸编辑，不需要精确的辅助表示，避免了人脸编辑对精确的轮廓草图、颜色图和分割图的依赖。Face editing has attracted great attention in the computer vision community due to its potential applications in film and television production, photo manipulation, and interactive entertainment. Face editing has made great progress in recent years with the development of generative adversarial networks. The current mainstream face editing methods are roughly divided into three categories: methods based on label conditions, methods based on geometric guidance, and methods based on reference guidance. Label-condition-based methods edit face attributes conditioned on predefined attribute labels. This method is conditioned on binary attribute labels and is only suitable for editing notable attributes with changes in appearance and texture (such as hair color, ageing, and beard removal, etc.) , it is difficult to achieve abstract shape changes (such as hooked nose, phoenix eyes, etc.), and it lacks the flexibility to control the shape of high-level semantic facial components (such as eyes, nose, and mouth, etc.). In order to flexibly edit the shape of face attributes, geometric guidance-based methods propose to use accurate intermediate representations (such as keypoints, segmentation maps and contour sketches, etc.) to achieve face attribute editing with obvious topological deformation, however, this method is time-consuming It is laborious and requires drawing skills. Different from the method based on geometric guidance, the method based on reference guidance directly learns the corresponding face information from the reference image for face editing, does not require accurate auxiliary representation, and avoids the need for accurate contour sketches, color maps and face editing. Dependency of the segmentation map.

现有技术中，基于参考引导的方法有如下两种：ExGANs和ELEGANT。In the prior art, there are two methods based on reference guidance: ExGANs and ELEGANT.

ExGANs是一种基于参考图像进行人脸补全的方法。ExGANs是条件生成对抗网络的一种扩展，它以具有目标补全内容的参考图像或感知编码作为条件。ExGANs的训练流程可以概括为：对输入图像标记眼睛，即去除眼睛区域；以具有目标补全内容的参考图像或感知编码为指导对图像进行补全；通过输入图像和补全图像之间的内容重构损失，计算生成器参数的梯度；通过补全图像、原始图和参考图像或感知代码计算判别器参数的梯度；通过生成器反向传播判别器的误差。ExGANs is a method for face completion based on reference images. ExGANs are an extension of Conditional Generative Adversarial Networks conditioned on a reference image or perceptual encoding with target completion content. The training process of ExGANs can be summarized as: label the eyes of the input image, that is, remove the eye area; complete the image with the reference image or perceptual coding with the target completion content as a guide; pass the content between the input image and the completed image Reconstruction loss, calculate the gradient of the generator parameters; calculate the gradient of the discriminator parameters through the completed image, original image and reference image or perceptual code; backpropagate the error of the discriminator through the generator.

ELEGANT是一种基于参考图像实现人脸属性编辑的方法。ELEGANT以具有相反属性的两张图片A和B作为输入，两张图片不要求相同身份。在潜在空间中以解耦的方式对图片的所有属性进行编码，即假设所有属性互不关联，可以分开表示。通过交换两张图片同一类型属性的潜在编码将属性从一幅图像转移到另一幅图像。两张图片的原始属性编码和交换后的属性编码两两结合，可以得到四种编码结果。为了缓解对目标属性无关区域的影响，以残差图的形式表示编辑的属性区域。最后，残差图与相应原始图相加得到四种生成结果：A图片的重建，交换属性后的A图片，B图片的重建，交换属性后的B图片。ELEGANT is a method for editing face attributes based on reference images. ELEGANT takes as input two images A and B with opposite properties, the two images do not require the same identity. All attributes of a picture are encoded in a decoupled manner in the latent space, that is, all attributes are assumed to be independent of each other and can be represented separately. Attributes are transferred from one image to another by exchanging latent encodings of attributes of the same type of two images. Combining the original attribute encoding and the exchanged attribute encoding of the two pictures, four encoding results can be obtained. To alleviate the impact on the target attribute-independent region, the edited attribute region is represented in the form of a residual map. Finally, the residual map is added to the corresponding original image to obtain four generation results: the reconstruction of the A picture, the A picture after exchanging attributes, the reconstruction of B picture, and the B picture after exchanging attributes.

由于ExGANs要求参考图像必须是同一身份，而同一身份的人脸五官形状不会发生变化，因此该模型只能应用于睁眼-闭眼的人眼补全任务，无法扩展到其他人脸属性编辑。而ELEGANT局限于编辑外观纹理变化的属性，不能编辑抽象语义形状的属性，而且由于ELEGANT以解耦的方式编码所有属性，但是，属性标注集的属性彼此关联(例如：发色和年龄)，无法完全分开独立编码，因此，解耦的前提假设并不符合实际情况，导致对与目标属性无关的区域有显著影响。Since ExGANs requires that the reference image must be of the same identity, and the facial features of the same identity will not change, the model can only be applied to the eye-opening-eye-closed eye completion task, and cannot be extended to other face attribute editing . However, ELEGANT is limited to editing the properties of appearance and texture changes, and cannot edit the properties of abstract semantic shapes, and because ELEGANT encodes all properties in a decoupled manner, however, the properties of the attribute annotation set are related to each other (for example: hair color and age), it is impossible to edit the properties of abstract semantic shapes. Encodings are completely separate and independent, therefore, the premise of decoupling does not correspond to reality, resulting in significant effects on regions unrelated to the target attribute.

发明内容SUMMARY OF THE INVENTION

本申请实施例提供一种图像编辑方法、装置、电子设备及存储介质，有助于提高人脸属性编辑的多样性且避免对无关区域的影响。Embodiments of the present application provide an image editing method, apparatus, electronic device, and storage medium, which help to improve the diversity of face attribute editing and avoid influence on irrelevant areas.

为了解决上述问题，第一方面，本申请实施例提供了一种图像编辑方法，包括：In order to solve the above problems, in the first aspect, an embodiment of the present application provides an image editing method, including:

获取待编辑人脸图像和与待编辑人脸图像中的目标属性区域对应的掩码图，并获取参考图像；Obtain the face image to be edited and the mask map corresponding to the target attribute area in the face image to be edited, and obtain the reference image;

根据所述掩码图，将所述待编辑人脸图像处理为缺失所述目标属性区域的缺失图像；According to the mask map, the face image to be edited is processed as a missing image missing the target attribute area;

通过第一编码器对所述缺失图像进行图像编码，得到所述缺失图像对应的缺失特征；Image encoding is performed on the missing image by the first encoder to obtain the missing feature corresponding to the missing image;

通过第二编码器对所述参考图像进行图像编码，得到所述参考图像对应的参考特征；Image coding is performed on the reference image by the second encoder to obtain a reference feature corresponding to the reference image;

根据所述掩码图，通过注意力模型对所述缺失特征和参考特征进行融合，得到融合特征；According to the mask map, the missing feature and the reference feature are fused through an attention model to obtain a fused feature;

通过解码器对所述融合特征进行图像解码，得到所述待编辑人脸图像和参考图像对应的目标图像。The decoder performs image decoding on the fusion feature to obtain the target image corresponding to the face image to be edited and the reference image.

第二方面，本申请实施例提供了一种图像编辑装置，包括：In a second aspect, an embodiment of the present application provides an image editing apparatus, including:

图像获取模块，用于获取待编辑人脸图像和与待编辑人脸图像中的目标属性区域对应的掩码图，并获取参考图像；an image acquisition module, used for acquiring the face image to be edited and the mask map corresponding to the target attribute region in the face image to be edited, and acquiring the reference image;

图像缺失处理模块，用于根据所述掩码图，将所述待编辑人脸图像处理为缺失所述目标属性区域的缺失图像；an image missing processing module, configured to process the to-be-edited face image as a missing image missing the target attribute area according to the mask map;

第一编码模块，用于通过第一编码器对所述缺失图像进行图像编码，得到所述缺失图像对应的缺失特征；a first encoding module, configured to perform image encoding on the missing image by the first encoder to obtain the missing feature corresponding to the missing image;

第二编码模块，用于通过第二编码器对所述参考图像进行图像编码，得到所述参考图像对应的参考特征；a second encoding module, configured to perform image encoding on the reference image through a second encoder to obtain a reference feature corresponding to the reference image;

特征融合模块，用于根据所述掩码图，通过注意力模型对所述缺失特征和参考特征进行融合，得到融合特征；a feature fusion module, configured to fuse the missing feature and the reference feature through an attention model according to the mask map to obtain a fused feature;

解码模块，用于通过解码器对所述融合特征进行图像解码，得到所述待编辑人脸图像和参考图像对应的目标图像。The decoding module is used for performing image decoding on the fusion feature through a decoder to obtain the target image corresponding to the face image to be edited and the reference image.

第三方面，本申请实施例还提供了一种电子设备，包括存储器、处理器及存储在所述存储器上并可在处理器上运行的计算机程序，所述处理器执行所述计算机程序时实现本申请实施例所述的图像编辑方法。In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and running on the processor, and the processor implements the computer program when the processor executes the computer program. The image editing method described in the embodiments of the present application.

第四方面，本申请实施例提供了一种计算机可读存储介质，其上存储有计算机程序，该程序被处理器执行时本申请实施例公开的图像编辑方法的步骤。In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the image editing method disclosed in the embodiment of the present application are performed.

本申请实施例提供的图像编辑方法、装置、电子设备及存储介质，通过根据待编辑人脸图像中的目标属性区域对应的掩码图，将待编辑人脸图像处理为缺失目标属性区域的缺失图像，通过第一编码器对缺失图像进行图像编码得到缺失图像对应的缺失特征，通过第二编码器对参考图像进行图像编码得到参考图像对应的参考特征，通过注意力模型将缺失特征和参考特征融合为融合特征，并通过解码器对融合特征进行图像解码，得到目标图像，由于只需要获取到待编辑人脸图像和目标属性区域对应的掩码图，即可将参考图像中的目标属性区域对应的特征迁移到待编辑人脸图像的目标属性区域中，从而可以迁移各个区域的特征，解决了现有技术中的ExGANs只能编辑眼睛区域的缺陷，而且参考图像可以是与待编辑人脸图像不同身份的图像，从而提高了人脸属性编辑的多样性，而且不需要将目标属性区域与其他区域进行解耦，避免了对目标属性区域外无关区域的影响。The image editing method, device, electronic device, and storage medium provided by the embodiments of the present application process the face image to be edited as a missing target attribute area according to the mask map corresponding to the target attribute area in the face image to be edited Image, the missing image is encoded by the first encoder to obtain the missing feature corresponding to the missing image, the image encoding is performed on the reference image by the second encoder to obtain the reference feature corresponding to the reference image, and the missing feature and the reference feature are combined by the attention model. The fusion features are fused into fusion features, and the fused features are decoded by the decoder to obtain the target image. Since only the mask map corresponding to the face image to be edited and the target attribute area needs to be obtained, the target attribute area in the reference image can be converted into the target attribute area. The corresponding features are migrated to the target attribute area of the face image to be edited, so that the features of each area can be migrated, which solves the defect that ExGANs in the prior art can only edit the eye area, and the reference image can be the same as the face to be edited. Image images with different identities, thus improving the diversity of face attribute editing, and there is no need to decouple the target attribute area from other areas, avoiding the impact on irrelevant areas outside the target attribute area.

附图说明Description of drawings

为了更清楚地说明本申请实施例的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本申请的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。In order to illustrate the technical solutions of the embodiments of the present application more clearly, the following briefly introduces the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only some of the drawings in the present application. In the embodiments, for those of ordinary skill in the art, other drawings can also be obtained according to these drawings without creative labor.

图1是本申请实施例一的图像编辑方法的流程图；1 is a flowchart of an image editing method according to Embodiment 1 of the present application;

图2是本申请实施例中的通过注意力模型融合缺失特征和参考特征的计算结构图；2 is a computational structure diagram of fusion of missing features and reference features through an attention model in an embodiment of the present application;

图3是本申请实施例二的图像编辑装置的结构示意图；3 is a schematic structural diagram of an image editing apparatus according to Embodiment 2 of the present application;

图4是本申请实施例三的电子设备的结构示意图。FIG. 4 is a schematic structural diagram of an electronic device according to Embodiment 3 of the present application.

具体实施方式Detailed ways

下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

实施例一Example 1

本实施例提供的一种图像编辑方法，如图1所示，该方法包括：步骤110至步骤140。An image editing method provided by this embodiment, as shown in FIG. 1 , includes steps 110 to 140 .

步骤110，获取待编辑人脸图像和与待编辑人脸图像中的目标属性区域对应的掩码图，并获取参考图像。Step 110: Obtain a face image to be edited and a mask map corresponding to a target attribute region in the face image to be edited, and obtain a reference image.

其中，目标属性区域包括眼睛区域、鼻子区域、嘴巴区域或其他区域(如头发、胡子等)中的至少一种。待编辑人脸图像是要进行人脸属性编辑的图像，参考图像用于提供目标属性区域对应的参考特征。The target attribute area includes at least one of an eye area, a nose area, a mouth area, or other areas (such as hair, beard, etc.). The face image to be edited is an image to be edited with face attributes, and the reference image is used to provide reference features corresponding to the target attribute area.

可以根据用户指定的待编辑人脸图像的存储路径获取待编辑人脸图像，并获取用户指定的要编辑的目标属性区域，并可以根据用户指定的参考图像的存储路径获取参考图像；或者，还可以获取用户上传的待编辑人脸图像和参考图像，并获取用户指定的要编辑的目标属性区域。获取用户指定的要编辑的目标属性区域可以是获取用户指定的目标属性区域的名称，并基于该名称进行人脸识别，确定目标属性区域；或者，在待编辑人脸图像上供用户操作选定目标属性区域。The face image to be edited can be obtained according to the storage path of the face image to be edited specified by the user, and the target attribute region to be edited specified by the user can be obtained, and the reference image can be obtained according to the storage path of the reference image specified by the user; The face image to be edited and the reference image uploaded by the user can be obtained, and the target attribute area to be edited specified by the user can be obtained. Obtaining the target attribute area to be edited specified by the user may be obtaining the name of the target attribute area specified by the user, and performing face recognition based on the name to determine the target attribute area; or, on the face image to be edited for the user to operate and select Target property area.

在待编辑人脸图像中的目标属性区域后，生成所述待编辑人脸图像中的目标属性区域对应的掩码图，从而得到待编辑人脸图像中的目标属性区域对应的掩码图。在所述掩码图中，所述目标属性区域对应的位置的值是0，而其他区域是有值的，其值可以为1。After the target attribute region in the face image to be edited, a mask map corresponding to the target attribute region in the face image to be edited is generated, thereby obtaining a mask map corresponding to the target attribute region in the face image to be edited. In the mask map, the value of the position corresponding to the target attribute area is 0, while other areas have values, and the value may be 1.

步骤120，根据所述掩码图，将所述待编辑人脸图像处理为缺失所述目标属性区域的缺失图像。Step 120: According to the mask map, the face image to be edited is processed as a missing image missing the target attribute area.

根据待编辑人脸图像中的目标属性区域对应的掩码图，将待编辑人脸图像中的目标属性区域抠掉，从而将待编辑人脸图像处理成了缺失目标属性区域的缺失图像。According to the mask map corresponding to the target attribute area in the face image to be edited, the target attribute area in the face image to be edited is cut out, so that the face image to be edited is processed into a missing image missing the target attribute area.

在本申请的一个实施例中，所述根据所述掩码图，将所述待编辑人脸图像处理为缺失所述目标属性区域的缺失图像，包括：对所述掩码图进行腐蚀和膨胀处理，得到处理后的掩码图；根据所述处理后的掩码图，将所述待编辑人脸图像中的目标属性区域去除，得到缺失图像。In an embodiment of the present application, processing the face image to be edited into a missing image missing the target attribute area according to the mask map includes: eroding and dilating the mask map processing to obtain a processed mask map; and according to the processed mask map, remove the target attribute area in the face image to be edited to obtain a missing image.

其值，膨胀是求局部最大值的操作，即将图像与核进行卷积，计算核覆盖区域的像素点的最大值，并把这个最大值赋值给参考点指定的元素。膨胀操作会使图像中的高亮区域逐渐增长。腐蚀和膨胀是相反的操作，腐蚀是求局部最小值的操作，腐蚀操作会使图像中的高亮区逐渐减小。Its value, expansion is the operation of finding the local maximum, that is, convolving the image with the kernel, calculating the maximum value of the pixels in the area covered by the kernel, and assigning this maximum value to the element specified by the reference point. Dilation causes the highlighted areas in the image to grow gradually. Erosion and dilation are opposite operations. Erosion is an operation to find a local minimum. Erosion will gradually reduce the highlight area in the image.

首先对待编辑人脸图像中的目标属性区域对应的掩码图进行腐蚀处理，之后再对腐蚀结果进行膨胀处理，得到处理后的掩码图。通过对原始掩码图进行腐蚀和膨胀处理，可以消除掩码图中的噪声。使用处理后的掩码图，来对待编辑人脸图像进行抠图操作，以将待编辑人脸图像处理为缺失目标属性区域的缺失图像，由于处理后的掩码图消除了噪声，从而得到的缺失图像更加准确，可以提高后续图像编辑的效果。First, the mask image corresponding to the target attribute area in the face image to be edited is subjected to corrosion processing, and then the corrosion result is expanded to obtain the processed mask image. Noise in the mask image can be removed by eroding and dilating the original mask image. Use the processed mask image to perform a matting operation on the face image to be edited, so as to process the face image to be edited as a missing image of the missing target attribute area. Since the processed mask image eliminates noise, the obtained Missing images are more accurate and can improve the effect of subsequent image editing.

步骤130，通过第一编码器对所述缺失图像进行图像编码，得到所述缺失图像对应的缺失特征。Step 130: Perform image encoding on the missing image by the first encoder to obtain the missing feature corresponding to the missing image.

其中，第一编码器是针对缺失某个区域的图像使用的编码器。第一编码器可以是基于神经网络的图像编码器，经过训练，网络参数已确定，可以直接对缺失某区域的图像进行编码。The first encoder is an encoder used for images missing a certain area. The first encoder may be an image encoder based on a neural network. After training, the network parameters have been determined, and the image with a missing area can be directly encoded.

通过第一编码器对缺失图像进行图像编码，以对缺失图像进行压缩，在满足一定质量的前提下以较少的比特数表示缺失图像中所包含的信息，得到缺失图像对应的缺失特征。Image coding is performed on the missing image by the first encoder to compress the missing image, and the information contained in the missing image is represented by a smaller number of bits on the premise of satisfying a certain quality, so as to obtain the missing feature corresponding to the missing image.

步骤140，通过第二编码器对所述参考图像进行图像编码，得到所述参考图像对应的参考特征。Step 140: Perform image coding on the reference image by using the second encoder to obtain a reference feature corresponding to the reference image.

其中，第二编码器是针对完整的图像使用的编码器。第二编码器可以是基于神经网络的图像编码器，经过训练，网络参数已确定，可以直接对完整的图像进行编码。Wherein, the second encoder is the encoder used for the complete image. The second encoder can be an image encoder based on a neural network. After training, the network parameters have been determined, and the complete image can be directly encoded.

通过第二编码器对参考图像进行图像编码，以对参考图像进行压缩，得到参考图像对应的参考特征。Image coding is performed on the reference image by the second encoder to compress the reference image to obtain the reference feature corresponding to the reference image.

步骤150，根据所述掩码图，通过注意力模型对所述缺失特征和参考特征进行融合，得到融合特征。Step 150: According to the mask map, the missing feature and the reference feature are fused through an attention model to obtain a fused feature.

通过注意力模型，首先对参考特征进行自注意力处理，得到自注意特征，并根据待编辑人脸图像中目标属性区域对应的掩码图或者经过腐蚀和膨胀处理后的掩码图，强化参考特征中与目标属性区域对应区域的特征，并弱化其他区域的特征，之后将处理后的缺失特征和参考特征进行融合，以将参考特征中的与目标属性区域对应区域的特征迁移到缺失特征中的目标属性区域对应的特征中，得到的融合特征。Through the attention model, first self-attention processing is performed on the reference features to obtain self-attention features, and the reference is strengthened according to the mask map corresponding to the target attribute area in the face image to be edited or the mask map after erosion and expansion processing. In the feature, the feature of the area corresponding to the target attribute area is weakened, and then the processed missing feature and the reference feature are fused to transfer the feature of the reference feature corresponding to the target attribute area to the missing feature. Among the features corresponding to the target attribute region of , the fusion features are obtained.

在本申请的一个实施例中，所述根据所述掩码图，通过注意力模型对所述缺失特征和参考特征进行融合，得到融合特征，包括：生成所述缺失特征对应的注意力图；融合所述缺失特征和所述注意力图，得到所述缺失特征对应的自注意特征；根据所述掩码图、注意力图和参考特征，生成所述参考特征对所述缺失特征的实例指导特征；融合所述自注意特征和所述实例指导特征，得到所述缺失特征和参考特征的融合特征。In an embodiment of the present application, according to the mask map, the missing feature and the reference feature are fused through an attention model to obtain a fused feature, including: generating an attention map corresponding to the missing feature; fusing The missing feature and the attention map are used to obtain the self-attention feature corresponding to the missing feature; according to the mask map, the attention map and the reference feature, an instance guiding feature of the reference feature to the missing feature is generated; fusion The self-attention feature and the instance guidance feature are used to obtain a fusion feature of the missing feature and the reference feature.

图2是本申请实施例中的通过注意力模型融合缺失特征和参考特征的计算结构图。如图2所示，首先对所述缺失特征进行1×1卷积处理，得到所述缺失特征对应的询问特征，之后将询问特征的转置与询问特征进行相乘，并对矩阵乘积进行softmax计算，得到缺失特征对应的注意力图，融合缺失特征和注意力图，得到缺失特征对应的自注意特征，通过缺失特征得到自注意特征，以公式表示如下：FIG. 2 is a computational structure diagram of fusing missing features and reference features through an attention model in an embodiment of the present application. As shown in Figure 2, the missing feature is first subjected to 1×1 convolution processing to obtain the query feature corresponding to the missing feature, then the transpose of the query feature is multiplied by the query feature, and the matrix product is softmaxed Calculate, get the attention map corresponding to the missing feature, fuse the missing feature and the attention map, get the self-attention feature corresponding to the missing feature, and get the self-attention feature through the missing feature, which is expressed in the following formula:

其中，F_a表示自注意特征，F_s表示输入注意力模型的原始特征，即缺失特征，F_m表示注意力图，

F_q＝Conv(F_s)，表示询问特征，Conv表示1×1卷积。where F _a represents the self-attention feature, F _s represents the original feature input to the attention model, that is, the missing feature, and F _m represents the attention map,

F _q = Conv(F _s ), representing the query feature, and Conv representing a 1×1 convolution.

根据掩码图、注意力图对参考特征进行处理，以强化参考特征中与目标属性区域对应区域的特征，并弱化其他区域的特征，从而生成参考特征对缺失特征的实例指导特征。对自注意特征和实例指导特征进行融合，得到缺失特征和参考特征的融合特征，即实现将参考图像中的与目标属性区域对应区域的特征迁移到缺失图像中的目标属性区域，得到的融合特征。The reference features are processed according to the mask map and attention map to strengthen the features of the regions corresponding to the target attribute region in the reference features, and weaken the features of other regions, so as to generate the instance-guided features of the missing features from the reference features. The self-attention feature and instance-guided feature are fused to obtain the fusion feature of the missing feature and the reference feature, that is, the feature of the region corresponding to the target attribute area in the reference image is transferred to the target attribute area in the missing image, and the obtained fusion feature .

在本申请的一个实施例中，所述根据所述掩码图、注意力图和参考特征，生成所述参考特征对所述缺失特征的实例指导特征，包括：融合所述参考特征和所述自注意力图，得到参考注意特征；将所述掩码图和所述参考注意特征进行相乘运算，得到所述目标属性区域以外的区域对应的实例指导特征，将所述目标属性区域以外的区域对应的实例指导特征作为第一实例指导特征；计算1矩阵与所述掩码图之差，得到所述掩码图的反掩码图，将所述反掩码图和所述参考特征进行相乘运算，得到所述目标属性区域对应的实例指导特征，将所述目标属性区域对应的实例指导特征作为第二实例指导特征；将所述第一实例指导特征和所述第二实例指导特征进行相加运算，得到所述参考特征对所述缺失特征的实例指导特征。In an embodiment of the present application, generating an instance guidance feature of the reference feature for the missing feature according to the mask map, the attention map and the reference feature includes: fusing the reference feature and the self- attention map to obtain the reference attention feature; multiply the mask map and the reference attention feature to obtain the instance guidance feature corresponding to the area outside the target attribute area, and the area outside the target attribute area corresponds to The instance guiding feature of is as the first instance guiding feature; calculate the difference between the 1 matrix and the mask map, obtain the inverse mask map of the mask map, and multiply the inverse mask map and the reference feature operation to obtain the instance guidance feature corresponding to the target attribute area, and use the instance guidance feature corresponding to the target attribute area as the second instance guidance feature; compare the first instance guidance feature with the second instance guidance feature An addition operation is performed to obtain an instance guiding feature of the reference feature to the missing feature.

可以通过如下公式得到参考特征对缺失特征的实例指导特征：The instance-guided feature of the reference feature to the missing feature can be obtained by the following formula:

其中，

是待编辑人脸图像中目标属性区域对应的掩码图，或者进行腐蚀和膨胀处理后的掩码图，F_r表示参考特征，

表示对参考特征F_r和注意力图F_m进行融合得到的参考注意特征。上述公式中的前一部分，即

表示第一实例指导特征，上述公式中的后一部分即

表示第二实例指导特征，计算这两部分之和，得到参考特征对缺失特征的实例指导特征。in,

is the mask map corresponding to the target attribute area in the face image to be edited, or the mask map after corrosion and expansion processing, F _r represents the reference feature,

Represents the reference attention feature obtained by fusing the reference feature _Fr and the attention map _Fm . The first part of the above formula, i.e.

represents the guiding feature of the first instance, the latter part of the above formula is

Represents the second instance guidance feature, calculates the sum of these two parts, and obtains the instance guidance feature of the reference feature to the missing feature.

步骤160，通过解码器对所述融合特征进行图像解码，得到所述待编辑人脸图像和参考图像对应的目标图像。Step 160: Perform image decoding on the fusion feature through a decoder to obtain a target image corresponding to the face image to be edited and the reference image.

其中，解码器可以是基于神经网络的图像解码器，经过训练，网络参数已确定，可以直接将图像特征解码为完整的图像。The decoder can be an image decoder based on a neural network. After training, the network parameters have been determined, and the image features can be directly decoded into a complete image.

通过解码器对融合特征进行图像解码，得到目标图像，即得到将参考图像中与目标属性区域对应区域的特征迁移到缺失图像中的图像。The decoder performs image decoding on the fusion features to obtain the target image, that is, the image in which the features of the region corresponding to the target attribute region in the reference image are migrated to the missing image.

本申请实施例提供的图像编辑方法，通过根据待编辑人脸图像中的目标属性区域对应的掩码图，将待编辑人脸图像处理为缺失目标属性区域的缺失图像，通过第一编码器对缺失图像进行图像编码得到缺失图像对应的缺失特征，通过第二编码器对参考图像进行图像编码得到参考图像对应的参考特征，通过注意力模型将缺失特征和参考特征融合为融合特征，并通过解码器对融合特征进行图像解码，得到目标图像，由于只需要获取到待编辑人脸图像和目标属性区域对应的掩码图，即可将参考图像中的目标属性区域对应的特征迁移到待编辑人脸图像的目标属性区域中，从而可以迁移各个区域的特征，解决了现有技术中的ExGANs只能编辑眼睛区域的缺陷，而且参考图像可以是与待编辑人脸图像不同身份的图像，从而提高了人脸属性编辑的多样性，而且不需要将目标属性区域与其他区域进行解耦，避免了对目标属性区域外无关区域的影响。In the image editing method provided by the embodiment of the present application, the face image to be edited is processed into a missing image missing the target attribute area according to the mask map corresponding to the target attribute area in the face image to be edited, and the The missing image is encoded by the image to obtain the missing feature corresponding to the missing image, and the reference image is encoded by the second encoder to obtain the reference feature corresponding to the reference image. The image decoder decodes the fusion features to obtain the target image. Since only the mask image corresponding to the face image to be edited and the target attribute area needs to be obtained, the features corresponding to the target attribute area in the reference image can be migrated to the person to be edited. In the target attribute area of the face image, the features of each area can be migrated, which solves the defect that ExGANs in the prior art can only edit the eye area, and the reference image can be an image with a different identity from the face image to be edited, thereby improving The diversity of face attribute editing is improved, and the target attribute area does not need to be decoupled from other areas, avoiding the impact on irrelevant areas outside the target attribute area.

在上述技术方案的基础上，所述方法还包括：在对所述第一编码器、第二编码器、注意力模型和解码器进行训练时，分别确定所述待编辑人脸图像与目标图像的感知损失函数的值和风格损失函数的值，并确定所述目标图像与所述参考图像的Contextual损失函数的值；根据所述感知损失函数的值、风格损失函数的值和Contextual损失函数的值，对所述第一编码器、第二编码器、注意力模型和解码器的网络参数进行调整，以使得所述感知损失函数的值、风格损失函数的值和Contextual损失函数的值收敛。Based on the above technical solution, the method further includes: when training the first encoder, the second encoder, the attention model and the decoder, respectively determining the face image to be edited and the target image The value of the perceptual loss function and the value of the style loss function, and determine the value of the contextual loss function of the target image and the reference image; according to the value of the perceptual loss function, the value of the style loss function and the value of the contextual loss function value, the network parameters of the first encoder, the second encoder, the attention model and the decoder are adjusted so that the values of the perceptual loss function, the style loss function and the contextual loss function converge.

感知损失函数衡量两张图像之间的高维特征(例如整体空间结构)的相似性，而风格损失函数衡量两张图像之间的风格特征(例如颜色)的相似性。在对第一编码器、第二编码器、注意力模型和解码器进行训练的过程中，不断减小感知损失函数的值和风格损失函数的值，可以使得目标图像具有与待编辑人脸图像一致的风格特征，看起来自然真实。The perceptual loss function measures the similarity of high-dimensional features (such as overall spatial structure) between two images, while the style loss function measures the similarity of style features (such as color) between two images. In the process of training the first encoder, the second encoder, the attention model and the decoder, continuously reducing the value of the perceptual loss function and the value of the style loss function can make the target image have the same characteristics as the face image to be edited. Consistent style features that look natural and authentic.

Contextual损失函数用于衡量非对称图像之间的相似性，该损失在本方案中保证生成的目标图像中的目标属性与参考图像的目标属性形状一致。The contextual loss function is used to measure the similarity between asymmetric images. In this scheme, the loss ensures that the target attribute in the generated target image is consistent with the target attribute shape of the reference image.

在本申请的一个实施例中，所述感知损失函数和风格损失函数分别表示如下：In an embodiment of the present application, the perceptual loss function and the style loss function are respectively expressed as follows:

其中，l_perc为感知损失函数，l_style为风格损失函数，I_g表示目标图像，I_s表示待编辑人脸图像，I_c表示缺失图像，

表示待编辑人脸图像中目标属性区域对应的掩码图，C_l、H_l和W_l分别表示通道数、高度和宽度，‖·‖₁表示1范数，

表示预设网络第l层的特征图，G_l(·)＝Φ_l(·)^TΦ_l(·)表示Gram矩阵。Among them, l _perc is the perceptual loss function, l _style is the style loss function, I _g represents the target image, I _s represents the face image to be edited, I _c represents the missing image,

represents the mask map corresponding to the target attribute area in the face image to be edited, C _l , H _l and W _l represent the number of channels, height and width respectively, ‖·‖ ₁ represents the 1 norm,

Represents the feature map of the first layer of the preset network, G _l (·)=Φ _l (·) ^T Φ _l (·) represents the Gram matrix.

表示待编辑人脸图像中目标属性区域对应的掩码图，或者是对待编辑人脸图像中目标属性区域对应的掩码图进行腐蚀和膨胀处理后的掩码图。在进行训练时，待编辑人脸图像为训练样本。预设网络可以是VGG-19网络，当然，也可以是其他网络。VGG-19网络是在ImageNet上预训练的卷积神经网络。

Indicates the mask map corresponding to the target attribute region in the face image to be edited, or the mask map corresponding to the target attribute region in the face image to be edited after etching and dilation processing. During training, the face images to be edited are training samples. The preset network can be the VGG-19 network, of course, can also be other networks. The VGG-19 network is a convolutional neural network pretrained on ImageNet.

按照上述公式在计算感知损失函数的值时，将针对每个样本生成的目标图像I_g输入VGG-19网络，获取第l层的特征图Φ_l(I_g)，将样本的原图像即待编辑人脸图像I_s输入VGG-19网络，获取第l层的特征图Φ_l(I_s)，之后再计算目标图像的特征图Φ_l(I_g)与待编辑人脸图像的特征图Φ_l(I_s)之差的1范数，将计算结果除以通道数、高度和宽度的乘积，得到第l层特征图对应的感知损失函数的值，对VGG-19网络每层特征图对应的感知损失函数的值进行累加，得到一次训练对应的感知损失函数的值。When calculating the value of the perceptual loss function according to the above formula, the target image I _g generated for each sample is input into the VGG-19 network, the feature map Φ _l (I _g ) of the lth layer is obtained, and the original image of the sample is to be Edit the face image I _s and input it into the VGG-19 network to obtain the feature map Φ _l (I _s ) of the lth layer, and then calculate the feature map Φ _l (I _g ) of the target image and the feature map Φ of the face image to be edited. _l (I _s ) is the 1 norm of the difference, divide the calculation result by the product of the number of channels, height and width to obtain the value of the perceptual loss function corresponding to the feature map of the lth layer, which corresponds to the feature map of each layer of the VGG-19 network. The value of the perceptual loss function is accumulated to obtain the value of the perceptual loss function corresponding to one training.

按照上述公式在计算感知损失函数的值时，将样本生成的目标图像I_g与待编辑人脸图像中目标属性区域对应的掩码图

的乘积输入VGG-19网络，获取第一层的特征图

并计算该特征图的Gram矩阵

将缺失图像I_c输入VGG-19网络，获取第一层的特征图Φ_l(I_c)，计算该特征图的Gram矩阵G_l(I_c)，计算两个Gram矩阵之差即

除以通道数、高度和宽度的乘积的结果，求取该结果的1范数，并除以通道数与通道数的乘积，得到第l层特征图对应的风格损失函数的值，对VGG-19网络每层特征图对应的风格损失函数的值进行累加，得到一次训练对应的风格损失函数的值。When calculating the value of the perceptual loss function according to the above formula, the target image I _g generated by the sample and the mask map corresponding to the target attribute area in the face image to be edited

The product of is input into the VGG-19 network to obtain the feature map of the first layer

and calculate the Gram matrix of this feature map

Input the missing image I _c into the VGG-19 network, obtain the feature map Φ _l (I _c ) of the first layer, calculate the Gram matrix G _l (I _c ) of the feature map, and calculate the difference between the two Gram matrices namely

Divide by the result of the product of the number of channels, height and width, obtain the 1 norm of the result, and divide by the product of the number of channels and the number of channels to obtain the value of the style loss function corresponding to the feature map of the lth layer. For VGG- 19 The value of the style loss function corresponding to the feature map of each layer of the network is accumulated to obtain the value of the style loss function corresponding to one training.

通过上述感知损失函数可以使得训练完成的网络得到的目标图像与待编辑人脸图像具有高维特征的相似性，通过上述风格损失函数可以使得训练完成的网络得到的目标图像与待编辑人脸图像具有风格特征之间的相似性，即可以使得目标图像可以保留待编辑人脸图像中的姿势或肤色等风格特征。The above perceptual loss function can make the target image obtained by the trained network and the face image to be edited have high-dimensional similarity, and the above style loss function can make the target image obtained by the trained network and the face image to be edited. Having similarity between style features means that the target image can retain style features such as pose or skin color in the face image to be edited.

在本申请的一个实施例中，所述Contextual损失函数表示如下：In an embodiment of the present application, the contextual loss function is expressed as follows:

其中，l_cx表示Contextual损失函数，I_g表示目标图像，

表示待编辑人脸图像中目标属性区域对应的掩码图，I_r表示参考图像，

表示参考图像中与所述目标属性区域对应区域的掩码图，

表示预设网络第l层的特征图，CX(x,y)表示输入图像x和结果图像y之间的Contextual相似度，计算公式如下：Among them, l _cx represents the contextual loss function, I _g represents the target image,

represents the mask map corresponding to the target attribute area in the face image to be edited, I _r represents the reference image,

represents the mask map of the region corresponding to the target attribute region in the reference image,

Represents the feature map of the first layer of the preset network, CX(x, y) represents the contextual similarity between the input image x and the result image y, and the calculation formula is as follows:

输入图像x和结果图像y可以分别表示为特征的集合：

和

N表示特征的个数，max_i CX_ij表示对于每个特征y_j在集合X中找到与y_j相似度最高的特征x_i并计算y_j与x_i之间的相似度。The input image x and the result image y can be represented as collections of features, respectively:

and

N represents the number of features, and max _i CX _ij represents that for each feature y _{j, find the feature xi with the highest similarity with y j} _in _the set X and calculate the similarity between y _j and _xi .

其中，预设网络可以是VGG-19网络，当然，也可以是其他网络。Wherein, the preset network may be the VGG-19 network, and of course, may also be other networks.

在计算Contextual损失函数的值时，将目标图像与待编辑人脸图像中目标属性区域对应的掩码图的乘积

输入VGG-19网络，获取第l层的特征图

并将参考图像与参考图像中与目标属性区域对应区域的掩码图的乘积

输入VGG-19网络，获取第l层的特征图

计算特征图

与

之间的Contextual相似度，在上述计算Contextual相似度的公式中，

并对

与

之间的Contextual相似度求对数，得到Contextual损失函数的值。通过Contextual损失函数可以使得训练完成的网络得到的目标图像中的目标属性与参考图像中的目标属性的形状一致，可以保证将参考图像中的目标属性较好的迁移到待编辑人脸图像的目标属性区域中。When calculating the value of the contextual loss function, the product of the target image and the mask map corresponding to the target attribute area in the face image to be edited

Enter the VGG-19 network to obtain the feature map of the lth layer

and multiply the reference image with the mask map of the region corresponding to the target attribute region in the reference image

Enter the VGG-19 network to obtain the feature map of the lth layer

Calculate the feature map

and

The Contextual similarity between , in the above formula for calculating the Contextual similarity,

and to

and

Take the logarithm of the Contextual similarity between them to get the value of the Contextual loss function. Through the contextual loss function, the shape of the target attribute in the target image obtained by the trained network can be consistent with the shape of the target attribute in the reference image, which can ensure that the target attribute in the reference image is better migrated to the target of the face image to be edited. in the properties area.

实施例二Embodiment 2

本实施例提供的一种图像编辑装置，如图3所示，所述图像编辑装置300包括：An image editing apparatus provided by this embodiment, as shown in FIG. 3 , the image editing apparatus 300 includes:

图像获取模块310，用于获取待编辑人脸图像和与待编辑人脸图像中的目标属性区域对应的掩码图，并获取参考图像；An image acquisition module 310, configured to acquire a face image to be edited and a mask map corresponding to a target attribute region in the face image to be edited, and acquire a reference image;

图像缺失处理模块320，用于根据所述掩码图，将所述待编辑人脸图像处理为缺失所述目标属性区域的缺失图像；An image missing processing module 320, configured to process the to-be-edited face image as a missing image missing the target attribute area according to the mask map;

第一编码模块330，用于通过第一编码器对所述缺失图像进行图像编码，得到所述缺失图像对应的缺失特征；A first encoding module 330, configured to perform image encoding on the missing image by the first encoder to obtain the missing feature corresponding to the missing image;

第二编码模块340，用于通过第二编码器对所述参考图像进行图像编码，得到所述参考图像对应的参考特征；A second encoding module 340, configured to perform image encoding on the reference image through a second encoder to obtain a reference feature corresponding to the reference image;

特征融合模块350，用于根据所述掩码图，通过注意力模型对所述缺失特征和参考特征进行融合，得到融合特征；A feature fusion module 350, configured to fuse the missing feature and the reference feature through an attention model according to the mask map to obtain a fused feature;

解码模块360，用于通过解码器对所述融合特征进行图像解码，得到所述待编辑人脸图像和参考图像对应的目标图像。The decoding module 360 is configured to perform image decoding on the fusion feature through a decoder to obtain a target image corresponding to the face image to be edited and the reference image.

可选的，所述特征融合模块包括：Optionally, the feature fusion module includes:

注意力图生成单元，用于生成所述缺失特征对应的注意力图；an attention map generation unit, used to generate an attention map corresponding to the missing feature;

第一特征融合单元，用于融合所述缺失特征和所述注意力图，得到所述缺失特征对应的自注意特征；a first feature fusion unit, configured to fuse the missing feature and the attention map to obtain a self-attention feature corresponding to the missing feature;

特征生成单元，用于根据所述掩码图、注意力图和参考特征，生成所述参考特征对所述缺失特征的实例指导特征；a feature generation unit, configured to generate an instance guidance feature of the reference feature to the missing feature according to the mask map, the attention map and the reference feature;

第二特征融合单元，用于融合所述自注意特征和所述实例指导特征，得到所述缺失特征和参考特征的融合特征。A second feature fusion unit, configured to fuse the self-attention feature and the instance guidance feature to obtain a fusion feature of the missing feature and the reference feature.

可选的，所述特征生成单元具体用于：Optionally, the feature generation unit is specifically used for:

融合所述参考特征和所述注意力图，得到参考注意特征；fusing the reference feature and the attention map to obtain the reference attention feature;

将所述掩码图和所述参考注意特征进行相乘运算，得到所述目标属性区域以外的区域对应的实例指导特征，将所述目标属性区域以外的区域对应的实例指导特征作为第一实例指导特征；Multiply the mask map and the reference attention feature to obtain the instance guidance feature corresponding to the area outside the target attribute area, and use the instance guidance feature corresponding to the area outside the target attribute area as the first instance guiding features;

计算1矩阵与所述掩码图之差，得到所述掩码图的反掩码图，将所述反掩码图和所述参考特征进行相乘运算，得到所述目标属性区域对应的实例指导特征，将所述目标属性区域对应的实例指导特征作为第二实例指导特征；Calculate the difference between the 1 matrix and the mask map, obtain the inverse mask map of the mask map, and multiply the inverse mask map and the reference feature to obtain the instance corresponding to the target attribute area Guidance feature, taking the instance guidance feature corresponding to the target attribute region as the second instance guidance feature;

将所述第一实例指导特征和所述第二实例指导特征进行相加运算，得到所述参考特征对所述缺失特征的实例指导特征。The first instance guidance feature and the second instance guidance feature are added to obtain an instance guidance feature of the reference feature for the missing feature.

可选的，所述装置还包括：Optionally, the device further includes:

损失值确定模块，用于在对所述第一编码器、第二编码器、注意力模型和解码器进行训练时，分别确定所述待编辑人脸图像与目标图像的感知损失函数的值和风格损失函数的值，并确定所述目标图像与所述参考图像的Contextual损失函数的值；The loss value determination module is used for respectively determining the value of the perceptual loss function of the face image to be edited and the target image and the the value of the style loss function, and determine the value of the contextual loss function of the target image and the reference image;

网络参数调整模块，用于根据所述感知损失函数的值、风格损失函数的值和Contextual损失函数的值，对所述第一编码器、第二编码器、注意力模型和解码器的网络参数进行调整，以使得所述感知损失函数的值、风格损失函数的值和Contextual损失函数的值收敛。A network parameter adjustment module, configured to adjust the network parameters of the first encoder, the second encoder, the attention model and the decoder according to the value of the perceptual loss function, the value of the style loss function and the value of the contextual loss function Adjustments are made such that the values of the perceptual loss function, the style loss function, and the contextual loss function converge.

可选的，所述感知损失函数和风格损失函数分别表示如下：Optionally, the perceptual loss function and the style loss function are respectively expressed as follows:

可选的，所述Contextual损失函数表示如下：Optionally, the contextual loss function is expressed as follows:

其中，l_cx表示Contextual损失函数，I_g表示目标图像，

表示参考图像中与所述目标属性区域对应区域的掩码图，

输入图像x和结果图像y可以分别表示为特征的集合：

和

and

本申请实施例提供的图像编辑装置，用于实现本申请实施例一中所述的图像编辑方法的各步骤，装置的各模块的具体实施方式参见相应步骤，此处不再赘述。The image editing apparatus provided in this embodiment of the present application is used to implement the steps of the image editing method described in the first embodiment of the present application. For the specific implementation of each module of the apparatus, refer to the corresponding steps, which will not be repeated here.

本申请实施例提供的图像编辑装置，通过图像获取模块获取待编辑人脸图像和待编辑人脸图像中的目标属性区域对应的掩码图，并获取参考图像，图像抠图模块根据掩码图将待编辑人脸图像处理为缺失目标属性区域的缺失图像，第一编码模块通过第一编码器对缺失图像进行图像编码得到缺失图像对应的缺失特征，第二编码模块通过第二编码器对参考图像进行图像编码得到参考图像对应的参考特征，特征融合模块根据掩码图，通过注意力模型将缺失特征和参考特征融合为融合特征，解码模块通过解码器对融合特征进行图像解码，得到目标图像，由于只需要获取到待编辑人脸图像和目标属性区域对应的掩码图，即可将参考图像中的目标属性区域对应的特征迁移到待编辑人脸图像的目标属性区域中，从而可以迁移各个区域的特征，解决了现有技术中的ExGANs只能编辑眼睛区域的缺陷，而且参考图像可以是与待编辑人脸图像不同身份的图像，从而提高了人脸属性编辑的多样性，而且不需要将目标属性区域与其他区域进行解耦，避免了对目标属性区域外无关区域的影响。In the image editing device provided by the embodiment of the present application, the image acquisition module acquires the mask image corresponding to the face image to be edited and the target attribute region in the face image to be edited, and acquires the reference image, and the image matting module obtains the mask image according to the mask image. The face image to be edited is processed as a missing image missing the target attribute area, the first encoding module performs image encoding on the missing image through the first encoder to obtain the missing feature corresponding to the missing image, and the second encoding module uses the second encoder to refer to the reference. The image is encoded by the image to obtain the reference feature corresponding to the reference image. The feature fusion module fuses the missing feature and the reference feature into the fusion feature through the attention model according to the mask map. The decoding module decodes the fusion feature through the decoder to obtain the target image. , because only the mask map corresponding to the face image to be edited and the target attribute area needs to be obtained, the features corresponding to the target attribute area in the reference image can be migrated to the target attribute area of the face image to be edited, so that the migration can be performed. The features of each area solve the defect that ExGANs in the prior art can only edit the eye area, and the reference image can be an image with a different identity from the face image to be edited, thereby improving the diversity of face attribute editing, and it is not The target attribute area needs to be decoupled from other areas to avoid the influence on irrelevant areas outside the target attribute area.

实施例三Embodiment 3

本申请实施例还提供了一种电子设备，如图4所示，该电子设备400可以包括一个或多个处理器410以及与处理器410连接的一个或多个存储器420。电子设备400还可以包括输入接口430和输出接口440，用于与另一装置或系统进行通信。被处理器410执行的程序代码可存储在存储器420中。This embodiment of the present application further provides an electronic device. As shown in FIG. 4 , the electronic device 400 may include one or more processors 410 and one or more memories 420 connected to the processors 410 . The electronic device 400 may also include an input interface 430 and an output interface 440 for communicating with another device or system. Program code executed by processor 410 may be stored in memory 420 .

电子设备400中的处理器410调用存储在存储器420的程序代码，以执行上述实施例中的图像编辑方法。The processor 410 in the electronic device 400 invokes the program code stored in the memory 420 to execute the image editing method in the above-mentioned embodiment.

上述电子设备中的上述元件可通过总线彼此连接，总线例如数据总线、地址总线、控制总线、扩展总线和局部总线之一或其任意组合。The above-mentioned elements in the above-mentioned electronic device may be connected to each other by a bus such as one or any combination of a data bus, an address bus, a control bus, an expansion bus and a local bus.

本申请实施例还提供了一种计算机可读存储介质，其上存储有计算机程序，该程序被处理器执行时实现如本申请实施例一所述的图像编辑方法的步骤。The embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, implements the steps of the image editing method described in the first embodiment of the present application.

本说明书中的各个实施例均采用递进的方式描述，每个实施例重点说明的都是与其他实施例的不同之处，各个实施例之间相同相似的部分互相参见即可。对于装置实施例而言，由于其与方法实施例基本相似，所以描述的比较简单，相关之处参见方法实施例的部分说明即可。The various embodiments in this specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments, and the same and similar parts between the various embodiments may be referred to each other. As for the apparatus embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for related parts.

以上对本申请实施例提供的一种图像编辑方法、装置、电子设备及存储介质进行了详细介绍，本文中应用了具体个例对本申请的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本申请的方法及其核心思想；同时，对于本领域的一般技术人员，依据本申请的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本申请的限制。The image editing method, device, electronic device, and storage medium provided by the embodiments of the present application have been described in detail above. The principles and implementations of the present application are described with specific examples. The descriptions of the above embodiments are only used for In order to help understand the method of the present application and its core idea; at the same time, for those of ordinary skill in the art, according to the idea of the present application, there will be changes in the specific implementation and application scope. In summary, this specification The content should not be construed as a limitation on this application.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到各实施方式可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件实现。基于这样的理解，上述技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品可以存储在计算机可读存储介质中，如ROM/RAM、磁碟、光盘等，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)执行各个实施例或者实施例的某些部分所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above-mentioned technical solutions can be embodied in the form of software products in essence or the parts that make contributions to the prior art, and the computer software products can be stored in computer-readable storage media, such as ROM/RAM, magnetic A disc, an optical disc, etc., includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in various embodiments or some parts of the embodiments.

Claims

1. An image editing method, comprising:

acquiring a face image to be edited and a mask image corresponding to a target attribute region in the face image to be edited, and acquiring a reference image;

processing the face image to be edited into a missing image missing the target attribute region according to the mask image;

carrying out image coding on the missing image through a first encoder to obtain the missing characteristics corresponding to the missing image;

carrying out image coding on the reference image through a second encoder to obtain reference characteristics corresponding to the reference image;

fusing the missing features and the reference features through an attention model according to the mask image to obtain fused features;

and decoding the image of the fusion characteristic through a decoder to obtain the target image corresponding to the face image to be edited and the reference image.

2. The method according to claim 1, wherein the fusing the missing feature and the reference feature by the attention model according to the mask map to obtain a fused feature comprises:

generating an attention map corresponding to the missing feature;

fusing the missing feature and the attention map to obtain a self-attention feature corresponding to the missing feature;

generating an example guide feature of the reference feature to the missing feature according to the mask map, the attention map and the reference feature;

and fusing the self-attention feature and the example guidance feature to obtain a fused feature of the missing feature and the reference feature.

3. The method of claim 2, wherein the generating an instance guideline feature of the reference feature for the missing feature from the mask map, an attention map, and a reference feature comprises:

fusing the reference feature and the attention map to obtain a reference attention feature;

multiplying the mask image and the reference attention feature to obtain an example guidance feature corresponding to a region outside the target attribute region, and taking the example guidance feature corresponding to the region outside the target attribute region as a first example guidance feature;

calculating the difference between the matrix 1 and the mask map to obtain a mask map of the mask map, multiplying the mask map and the reference feature to obtain an example guidance feature corresponding to the target attribute region, and taking the example guidance feature corresponding to the target attribute region as a second example guidance feature;

and adding the first example guide characteristic and the second example guide characteristic to obtain the example guide characteristic of the reference characteristic to the missing characteristic.

4. The method according to any one of claims 1-3, further comprising:

when the first encoder, the second encoder, the attention model and the decoder are trained, respectively determining a value of a perception loss function and a value of a style loss function of the face image to be edited and the target image, and determining a value of a context loss function of the target image and the reference image;

and adjusting the network parameters of the first encoder, the second encoder, the attention model and the decoder according to the value of the perceptual loss function, the value of the style loss function and the value of the context loss function, so that the value of the perceptual loss function, the value of the style loss function and the value of the context loss function are converged.

5. The method of claim 4, wherein the perceptual and stylistic loss functions are each expressed as follows:

wherein l_percAs a function of perceptual loss,/_styleAs a style loss function, I_gRepresenting the target image, I_sRepresenting a face image to be edited, I_cA missing image is represented by a number of images,

representing a mask map corresponding to the target attribute region in the face image to be edited, C_l、H_lAnd W_lRepresenting the number, height and width of the channels, |₁The expression is given in the 1 norm,

characteristic diagram, G, representing the l-th layer of the default network_l(·)＝Φ_l(·)^TΦ_l(. cndot.) denotes a Gram matrix.

6. The method of claim 4, wherein the context loss function is represented as follows:

wherein l_cxRepresenting a Contextual loss function, I_gA representation of the target image is shown,

representing a mask map corresponding to the target attribute region in the face image to be edited, I_rWhich represents a reference image, is shown,

a mask map representing a region in the reference image corresponding to the target attribute region,

the characteristic diagram of the l-th layer of the preset network is represented, CX (x, y) represents the context similarity between the input image x and the result image y, and the calculation formula is as follows:

the input image x and the resulting image y may be represented as a set of features, respectively:

and

n denotes the number of features, max_iCX_ijRepresentation for each feature y_jFind y in the set X_jFeature x with highest similarity_iAnd calculate y_jAnd x_iThe similarity between them.

7. The method according to claim 1, wherein the processing the face image to be edited into a missing image missing the target attribute region according to the mask map comprises:

carrying out corrosion and expansion treatment on the mask image to obtain a treated mask image;

and removing the target attribute region in the face image to be edited according to the processed mask image to obtain a missing image.

8. An image editing apparatus characterized by comprising:

the image acquisition module is used for acquiring a face image to be edited and a mask image corresponding to a target attribute region in the face image to be edited and acquiring a reference image;

the image missing processing module is used for processing the face image to be edited into a missing image missing the target attribute region according to the mask image;

the first coding module is used for carrying out image coding on the missing image through a first coder to obtain the missing characteristic corresponding to the missing image;

the second coding module is used for carrying out image coding on the reference image through a second coder to obtain the reference characteristics corresponding to the reference image;

the feature fusion module is used for fusing the missing features and the reference features through an attention model according to the mask image to obtain fusion features;

and the decoding module is used for carrying out image decoding on the fusion characteristics through a decoder to obtain the face image to be edited and a target image corresponding to the reference image.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the image editing method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the steps of the image editing method as claimed in any one of claims 1 to 7.