CN117036185A - Image processing method, device, electronic equipment and storage medium - Google Patents
Image processing method, device, electronic equipment and storage medium
- Publication number: CN117036185A
- Application number: CN202310861882.XA
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- preset
- processed
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Computer Interaction (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
Technical field
The present application relates to the field of computer technology, and more specifically to an image processing method, an image processing device, an electronic device, and a storage medium.
Background
When illumination in an image is unbalanced, the image contains shadows, highlights, and other uneven lighting that make it display abnormally (for example, a face that is half lit and half in shadow). The main current approaches to the uneven-illumination problem are:
Manual processing with tools such as Photoshop. This approach can handle different images effectively, but it requires manual work and is inefficient.
Traditional image processing pipelines, which first perform background modeling based on filtering algorithms and then operate at the pixel gray level to equalize illumination. These techniques cannot adjust an image's brightness distribution, contrast, and saturation in a targeted manner: it is difficult both to effectively brighten the dark regions and to keep the originally bright regions from becoming overexposed, which ultimately causes information loss and an unnatural appearance in the processed image.
Therefore, how to perform illumination equalization on images efficiently and accurately is a technical problem that remains to be solved.
It should be noted that the information disclosed in the background section above is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.
Summary of the invention
Embodiments of the present application provide an image processing method, device, electronic device, and storage medium. By introducing a channel attention mechanism and a spatial attention mechanism, the image to be processed is encoded into multiple groups of feature latent codes that conform to a feature pyramid structure, so that a StyleGAN generator can complete image reconstruction directly from the feature latent codes, achieving efficient and accurate illumination equalization of the image.
In a first aspect, an image processing method is provided. The method includes: acquiring an image to be processed with unbalanced illumination; encoding the image to be processed with a preset encoder neural network to obtain multiple groups of feature latent codes belonging to different scales, where the preset encoder neural network adopts a feature pyramid structure based on a channel attention mechanism and a spatial attention mechanism, and each scale corresponds to a different level of the feature pyramid structure; and inputting each group of feature latent codes into a pre-trained StyleGAN generator and obtaining an illumination-balanced target image from the output of the StyleGAN generator.
In a second aspect, an image processing device is provided. The device includes: an acquisition module, for acquiring an image to be processed with unbalanced illumination; an encoding module, for encoding the image to be processed with a preset encoder neural network to obtain multiple groups of feature latent codes belonging to different scales, where the preset encoder neural network adopts a feature pyramid structure based on a channel attention mechanism and a spatial attention mechanism, and each scale corresponds to a different level of the feature pyramid structure; and a generation module, for inputting each group of feature latent codes into a pre-trained StyleGAN generator and obtaining an illumination-balanced target image from the output of the StyleGAN generator.
In a third aspect, an electronic device is provided, including a processor and a memory for storing executable instructions of the processor, where the processor is configured to execute the executable instructions so as to perform the image processing method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the image processing method of the first aspect is implemented.
By applying the above technical solution, an image to be processed with unbalanced illumination is first acquired; the image is then encoded by a preset encoder neural network to obtain multiple groups of feature latent codes belonging to different scales; finally, each group of feature latent codes is input into a pre-trained StyleGAN generator, and an illumination-balanced target image is obtained from the generator's output. The preset encoder neural network adopts a feature pyramid structure based on a channel attention mechanism and a spatial attention mechanism, with each scale corresponding to a different level of the structure. By introducing these attention mechanisms, the image to be processed is encoded into multiple groups of feature latent codes conforming to the feature pyramid structure, so the StyleGAN generator can complete image reconstruction directly from the latent codes, improving both the efficiency and the accuracy of illumination equalization.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is a schematic flow chart of an image processing method according to an embodiment of the present invention;
Figure 2 is a schematic flow chart of training a preset encoder neural network in an embodiment of the present invention;
Figure 3 is a schematic flow chart of encoding an image to be processed in an embodiment of the present invention;
Figure 4 is a schematic diagram of the principle of an image processing method according to another embodiment of the present invention;
Figure 5 is a schematic structural diagram of an image processing device according to an embodiment of the present invention;
Figure 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of this application.
It should be noted that those skilled in the art will easily conceive of other embodiments of the present application after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations that follow its general principles and include common knowledge or customary technical means in the technical field that are not disclosed herein. The specification and examples are to be considered exemplary only, with the true scope and spirit of the application indicated by the claims.
It should be understood that the present application is not limited to the precise structures described below and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.
The present application may be used in numerous general-purpose or special-purpose computing environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multi-processor systems, and distributed computing environments including any of the above devices.
The application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and so on that perform specific tasks or implement specific abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communication network; in such environments, program modules may be located in both local and remote computer storage media, including storage devices.
An embodiment of the present application provides an image processing method, as shown in Figure 1. The method includes the following steps:
Step S101: acquire an image to be processed with unbalanced illumination.
Illumination balance means that the brightness differences among the parts of an image fall within a preset range; whether an image is illumination-balanced can be determined by parameters set for the actual application scenario, such as the mean or standard deviation of the pixels in the image. Correspondingly, an image to be processed with unbalanced illumination is one whose brightness differences exceed the preset range. The image to be processed may be captured in real time, read from a local storage path, or retrieved from cloud storage or another server. Its subject may be a human face, an animal, a landscape, a virtual avatar, a building, and so on, and it may be in any image format; the embodiments of the present application do not limit these.
It can be understood that, in some embodiments, when the image to be processed with uneven illumination is one uploaded or selected by the user, the image is still treated as an image with unbalanced illumination after acquisition even if it has no such problem; the embodiments of the present application do not judge whether the image to be processed truly suffers from uneven illumination.
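The balance criterion described above (a brightness statistic checked against a preset range) can be sketched as follows. The luminance weights and the threshold value are illustrative assumptions: the patent leaves the concrete parameters to the application scenario.

```python
import numpy as np

def is_illumination_balanced(image, threshold=0.18):
    """Decide whether an image is illumination-balanced.

    `image` is an H x W x 3 RGB array with values in [0, 1]. Using the
    standard deviation of per-pixel luminance against a preset threshold
    is one possible realization of the criterion; the threshold 0.18 is
    an arbitrary illustrative value, not taken from the patent.
    """
    # Rec. 601 luma weights for an RGB image.
    luminance = image @ np.array([0.299, 0.587, 0.114])
    return float(luminance.std()) <= threshold

# A flat gray image is trivially balanced; a half-dark/half-bright image is not.
flat = np.full((8, 8, 3), 0.5)
split = np.concatenate([np.zeros((8, 4, 3)), np.ones((8, 4, 3))], axis=1)
print(is_illumination_balanced(flat))   # True
print(is_illumination_balanced(split))  # False
```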
In some embodiments of the present application, the image to be processed is a face image, and acquiring the image to be processed with unbalanced illumination includes:
acquiring an original image that contains a face and has unbalanced illumination;
performing face detection on the original image to obtain a face image to be processed; and
performing face alignment on the face image to be processed and normalizing it to a preset size, to obtain the image to be processed.
In this embodiment, the image to be processed is a face image with unbalanced illumination, obtained by preprocessing an original image. Specifically, an original image containing a face with uneven lighting is first acquired, either by photographing the face or from a user-specified storage path; face detection is then performed on it with a preset face detection algorithm to obtain the face image to be processed; face alignment is applied to that image; and finally the aligned face image is normalized to a preset size, for example 256*256, yielding the image to be processed.
The preset face detection algorithm may be any of Haar cascade+opencv, HOG+Dlib, CNN+Dlib, SSD, MTCNN, and similar algorithms. Face alignment may be feature-point based or template-matching based. Feature-point-based alignment extracts facial feature points from different images to determine the position, angle, size, and other parameters of the face in each image, and aligns accordingly. Template-matching-based alignment extracts face templates from different images to determine those same parameters and then performs the alignment.
Performing face detection on the original image to obtain the face image to be processed, followed by face alignment and normalization, enables the preset encoder neural network to encode the image more accurately and efficiently, and thus makes the illumination equalization more efficient.
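As a sketch of feature-point-based alignment to a preset 256*256 size, the function below computes a similarity transform that maps two detected eye centers onto canonical positions in the output image. The canonical eye positions (35%/65% of the width, 40% of the height) and the helper name `alignment_transform` are illustrative assumptions, not the patent's algorithm; the resulting 2x3 matrix could then be applied with, for example, OpenCV's `warpAffine`.

```python
import numpy as np

def alignment_transform(left_eye, right_eye, size=256):
    """Similarity transform aligning a face by its two eye centers.

    A minimal landmark-based alignment sketch. The canonical landmark
    layout and output size are illustrative conventions only.
    """
    src = np.array([left_eye, right_eye], dtype=float)
    dst = np.array([[0.35 * size, 0.40 * size],
                    [0.65 * size, 0.40 * size]])
    # Scale and rotation from the eye-to-eye vectors.
    v_src, v_dst = src[1] - src[0], dst[1] - dst[0]
    scale = np.linalg.norm(v_dst) / np.linalg.norm(v_src)
    angle = np.arctan2(v_dst[1], v_dst[0]) - np.arctan2(v_src[1], v_src[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    t = dst[0] - rot @ src[0]
    return np.hstack([rot, t[:, None]])  # 2x3 affine matrix

M = alignment_transform((80.0, 120.0), (160.0, 120.0))
# Applying the transform maps the left eye exactly to its canonical position.
mapped = M[:, :2] @ np.array([80.0, 120.0]) + M[:, 2]
```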
Step S102: encode the image to be processed with the preset encoder neural network to obtain multiple groups of feature latent codes belonging to different scales, where the preset encoder neural network adopts a feature pyramid structure based on a channel attention mechanism and a spatial attention mechanism, and each scale corresponds to a different level of the feature pyramid structure.
StyleGAN is a style-based generative adversarial network. In this embodiment, the preset encoder neural network and a StyleGAN generator form a pixel2style2pixel (pSp) framework for image processing: the first "pixel" refers to the input image to be processed, "style" refers to the feature latent codes, and the second "pixel" refers to the output target image. That is, the image to be processed is first converted into feature latent codes, and the target image is then generated from those codes. Since the StyleGAN generator is later used to generate the target image and its input must be the image's feature latent codes, the image to be processed is first converted into feature latent codes by the preset encoder neural network.
A feature latent code is a feature vector, which may also be called a feature map. Specifically, it can be a multi-dimensional vector whose every value lies in the range [-1, 1]; for example, an 18*512 vector in which every value is within [-1, 1]. A feature latent code can also be understood as image features extracted from the image by a neural network; it represents the image, so once the latent code is determined, the image generated from it is also determined. From another perspective, a feature latent code can be understood as the vector an image produces after passing through the convolutional layers of a neural network.
The preset encoder neural network adopts a feature pyramid structure with multiple levels; during encoding, a channel attention mechanism and a spatial attention mechanism are introduced between the levels. The channel attention mechanism captures the correlations among the feature maps of different channels during network modeling, automatically learns the importance of each feature channel, and assigns each channel its own weight coefficient. The spatial attention mechanism transforms the spatial information of the image into another space through a spatial transformation module while retaining the key information, generates a weight mask for each image region, and produces a weighted output, thereby enhancing the specific target regions of interest while suppressing irrelevant background regions.
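The two attention mechanisms described above can be sketched in simplified numpy form. The `channel_attention` and `spatial_attention` functions below are minimal stand-ins (a single matrix `w` replaces the small learned gating sub-network), intended only to show per-channel reweighting and per-location masking; they are not the patent's actual modules.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w):
    """Channel attention: assign each channel a learned importance weight.

    `x` is a C x H x W feature map; `w` is a C x C matrix standing in for
    the gating sub-network (an illustrative simplification).
    """
    squeeze = x.mean(axis=(1, 2))        # global average pool -> (C,)
    gate = _sigmoid(w @ squeeze)         # per-channel weights in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x):
    """Spatial attention: weight each location by a mask from channel stats."""
    avg = x.mean(axis=0)                 # (H, W) average over channels
    mx = x.max(axis=0)                   # (H, W) max over channels
    mask = _sigmoid(avg + mx)            # per-location weights in (0, 1)
    return x * mask[None, :, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
out = spatial_attention(channel_attention(feat, np.eye(8)))
print(out.shape)  # (8, 4, 4)
```

Both gates lie in (0, 1), so the combined module only rescales the feature map, never changes its shape.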
The image to be processed is input into the preset encoder neural network, which encodes it based on the channel attention mechanism and the spatial attention mechanism and maps it into multiple groups of feature latent codes belonging to different scales, each scale corresponding to a different level of the feature pyramid structure. Different scales can represent image details at different granularities. For example, if the image to be processed is a face image, large, medium, and small scales may be output, corresponding to detail ranging from facial contour and pose down to finer attributes such as hair color; the feature pyramid structure then likewise has three levels.
A group of feature latent codes belonging to the same scale may include multiple layers of latent codes. For example, with 18 layers of 512-dimensional latent codes, the small scale may correspond to layers 0-2, the medium scale to layers 3-6, and the large scale to layers 7-18.
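The layer-to-scale grouping in the example above can be expressed as a small lookup. The boundaries follow the description's example split of an 18*512 latent (layers 0-2, 3-6, and the remaining layers); they are the application's example, not fixed values, and the function name `scale_of_layer` is assumed for illustration.

```python
def scale_of_layer(layer, splits=((0, 2), (3, 6), (7, 17))):
    """Map a StyleGAN latent layer index to its scale group.

    Follows the example in the description: layers 0-2 small (coarse),
    3-6 medium, and the remaining layers of the 18-layer latent large
    (fine). The boundaries are illustrative, not fixed by the patent.
    """
    names = ("small", "medium", "large")
    for name, (lo, hi) in zip(names, splits):
        if lo <= layer <= hi:
            return name
    raise ValueError(f"layer {layer} outside the 18-layer latent")

print(scale_of_layer(0))   # small
print(scale_of_layer(5))   # medium
print(scale_of_layer(17))  # large
```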
Step S103: input each group of feature latent codes into the pre-trained StyleGAN generator, and obtain an illumination-balanced target image from the output of the StyleGAN generator.
The StyleGAN generator is pre-trained. After the multiple groups of feature latent codes are obtained, each group is input into the StyleGAN generator, which completes image reconstruction directly from the latent codes and outputs an illumination-balanced target image.
In the image processing method provided by the embodiments of the present application, an image to be processed with unbalanced illumination is first acquired; the image is then encoded by the preset encoder neural network to obtain multiple groups of feature latent codes belonging to different scales; finally, each group of latent codes is input into the pre-trained StyleGAN generator, and an illumination-balanced target image is obtained from its output. The preset encoder neural network adopts a feature pyramid structure based on a channel attention mechanism and a spatial attention mechanism, with each scale corresponding to a different level of the structure. By introducing these attention mechanisms, the image to be processed is encoded into multiple groups of feature latent codes conforming to the feature pyramid structure, so the StyleGAN generator can complete image reconstruction directly from the latent codes, improving both the efficiency and the accuracy of illumination equalization.
On the basis of any embodiment of the present application, before the image to be processed is encoded by the preset encoder neural network to obtain the multiple groups of feature latent codes belonging to different scales, the method further includes the following steps, as shown in Figure 2:
Step S21: acquire multiple groups of sample images, each group including a first image of a sample object under unbalanced illumination and a second image of the same object under balanced illumination.
In this embodiment, a preset initial encoder neural network is trained; once training is complete, the preset encoder neural network is obtained. Specifically, multiple groups of sample images are first acquired; those skilled in the art may use different numbers of sample image groups according to actual needs, for example 10,000 groups. Each group corresponds to one sample object and includes a first image of that object under unbalanced illumination and a second image under balanced illumination. The first image serves as input to the preset initial encoder neural network, and the second image is compared with the network's output to determine the training effect.
In some embodiments of the present application, the sample object is a human face, and acquiring the multiple groups of sample images includes:
acquiring multiple original sample images;
performing face detection on the original sample images to obtain sample face images; and
performing face alignment on each sample face image and normalizing it to a preset size, to obtain the multiple groups of sample images.
In this embodiment, the sample object is a human face, and the multiple groups of sample images are obtained by processing original sample images. Specifically, face detection is first performed on the original sample images with a preset face detection algorithm to obtain sample face images; face alignment is then applied to each sample face image, and each is normalized to a preset size, for example 256*256, to obtain the multiple groups of sample images.
The preset face detection algorithm may be any of Haar cascade+opencv, HOG+Dlib, CNN+Dlib, SSD, MTCNN, and similar algorithms. Face alignment may be feature-point based or template-matching based: feature-point-based alignment extracts facial feature points from different images to determine the position, angle, size, and other parameters of the face and aligns accordingly, while template-matching-based alignment extracts face templates from different images to determine those same parameters and then performs the alignment.
After face detection is performed on the original sample images to obtain the sample face images, face alignment and normalization allow the preset initial encoder neural network to be trained more efficiently and improve the accuracy of the resulting preset encoder neural network.
It can be understood that, to ensure accuracy, if the sample object is a human face, the image to be processed is likewise a face image that has undergone face detection, face alignment, and normalization.
Step S22: train the preset initial encoder neural network, which adopts the feature pyramid structure, on the sample images, and determine the loss function used during training.
The preset initial encoder neural network also adopts the feature pyramid structure based on the channel attention mechanism and the spatial attention mechanism. It is trained on the multiple groups of sample images, and the loss function used during training is determined; specifically, the loss function can be determined from the difference between the output of the preset initial encoder neural network and the second image of the sample group.
In some embodiments of the present application, training the preset initial encoder neural network adopting the feature pyramid structure on the sample images and determining the loss function during training includes:
inputting the first image into the preset initial encoder neural network, encoding the first image with the preset initial encoder neural network, and obtaining multiple groups of sample feature latent codes, one group per scale;
inputting each group of sample feature latent codes into the StyleGAN generator, and obtaining a third image from the output of the StyleGAN generator;
determining the loss function from the difference between the third image and the second image.
In this embodiment, the preset initial encoder neural network is connected in advance to a pre-trained StyleGAN generator. The unevenly illuminated first image is first fed into the preset initial encoder neural network, which encodes it into multiple groups of sample feature latent codes, one group per scale. The groups of sample feature latent codes are then fed into the StyleGAN generator, which generates the corresponding third image. Finally, the third image is compared with the second image and the loss function is determined from their difference, so that the loss function reflects the training effect of the preset initial encoder neural network more faithfully and its accuracy is improved.
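A minimal sketch of this forward pass, with toy stand-ins for the encoder, the StyleGAN generator and the loss (the stubs below are illustrative only and do not reflect the real network internals):

```python
def training_step(first_image, second_image, encoder, generator, loss_fn):
    """One forward pass of the scheme above: encode the unevenly lit
    first image into per-scale latent codes, decode them with the
    generator into a third image, and score it against the evenly lit
    second image."""
    latent_codes = encoder(first_image)    # multiple groups of latent codes
    third_image = generator(latent_codes)  # reconstructed (third) image
    return loss_fn(third_image, second_image)

# Toy stand-ins: "images" are flat lists of floats.
encoder = lambda img: [v * 0.5 for v in img]
generator = lambda codes: [v * 2.0 for v in codes]
mse = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

loss = training_step([1.0, 2.0], [1.0, 2.0], encoder, generator, mse)  # → 0.0
```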
In some embodiments of the present application, determining the loss function from the difference between the third image and the second image includes:
determining a pixel loss, a perceptual loss, an identity information loss, a saturation loss, a contrast loss and a brightness loss from the difference;
computing a weighted sum of the pixel loss, the perceptual loss, the identity information loss, the saturation loss, the contrast loss and the brightness loss under a preset weight coefficient combination to generate the loss function.
In this embodiment, the loss function is a composite loss determined from several losses. Specifically, the pixel loss, perceptual loss, identity information loss, saturation loss, contrast loss and brightness loss are first determined from the difference between the third image and the second image. The pixel loss is a pixel-level loss, which may specifically be the mean squared error between the pixel vectors of the third image and the second image. The perceptual loss is a loss aligned with the human perception process. The identity information loss specifically uses a pre-trained face recognition model to compute the cosine similarity between the third image and the second image on their identity vectors. The saturation, contrast and brightness losses may be the mean squared errors between the third image and the second image on their saturation, contrast and brightness vectors, respectively.
Weight coefficients are preset for the respective losses, forming the preset weight coefficient combination, and the losses are weighted and summed according to it to obtain the loss function. Combining multiple losses during training in this way further guarantees the accuracy of the trained preset encoder neural network; moreover, because the saturation, contrast and brightness losses are considered, an extra evaluation dimension focused on consistency with the target illumination-equalization effect is added, so the model converges to the target illumination-equalization effect faster during training.
It should be noted that the above embodiment is only one specific implementation proposed by this application; those skilled in the art may flexibly construct the loss function from other types of losses, and this does not affect the protection scope of this application.
In a specific application scenario of this application, if the pixel loss is L_2(x, x′), the pixel vector of the second image is x′, and the pixel vector of the third image is pSp(x), then L_2(x, x′) = ||x′ - pSp(x)||_2, where pSp denotes the aforementioned pSp framework.
If the perceptual loss is L_LPIPS(x, x′), the perceptual feature vector of the second image is F(x′), and that of the third image is F(pSp(x)), then L_LPIPS(x, x′) = ||F(x′) - F(pSp(x))||_2.
If the identity information loss is L_ID(x, x′), the identity vector of the second image is R(x′), and that of the third image is R(pSp(x)), then L_ID(x, x′) = 1 - ⟨R(x′), R(pSp(x))⟩.
If the saturation loss is L_SAT(x, x′), the saturation vector of the second image is S(x′), and that of the third image is S(pSp(x)), then L_SAT(x, x′) = ||S(x′) - S(pSp(x))||_2.
If the contrast loss is L_CON(x, x′), the contrast vector of the second image is C(x′), and that of the third image is C(pSp(x)), then L_CON(x, x′) = ||C(x′) - C(pSp(x))||_2.
If the brightness loss is L_LUM(x, x′), the brightness vector of the second image is L(x′), and that of the third image is L(pSp(x)), then L_LUM(x, x′) = ||L(x′) - L(pSp(x))||_2.
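Written out in code, the mean-squared-error terms and the identity term look as follows. This is a sketch under the assumption that all vectors are plain Python lists and that the identity embeddings are compared by normalized inner product (cosine similarity); the function names are introduced here for illustration.

```python
import math

def mse(u, v):
    """Mean squared error between two equal-length vectors; the form
    shared by the pixel, perceptual, saturation, contrast and
    brightness terms above."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def identity_loss(r_ref, r_gen):
    """L_ID = 1 - cosine similarity between the identity embeddings
    of the second (reference) and third (generated) images."""
    dot = sum(a * b for a, b in zip(r_ref, r_gen))
    norm = math.sqrt(sum(a * a for a in r_ref)) * math.sqrt(sum(b * b for b in r_gen))
    return 1.0 - dot / norm

loss_sat = mse([0.3, 0.7], [0.3, 0.7])           # identical saturation vectors
loss_id = identity_loss([0.6, 0.8], [0.6, 0.8])  # identical embeddings, ≈ 0
```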
If the loss function is L(x, x′) and the weight coefficients for the pixel loss, perceptual loss, identity information loss, saturation loss, contrast loss and brightness loss are λ1, λ2, λ3, λ4, λ5 and λ6 respectively, then:
L(x, x′) = λ1·L_2(x, x′) + λ2·L_LPIPS(x, x′) + λ3·L_ID(x, x′) + λ4·L_SAT(x, x′) + λ5·L_CON(x, x′) + λ6·L_LUM(x, x′).
Optionally, λ1 = 1, λ2 = 0.85, λ3 = 0.15, λ4 = 1, λ5 = 1, λ6 = 1.
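With the optional weights above, the composite loss is a plain weighted sum. The dictionary keys and the example component values below are shorthand introduced here for illustration, not names or numbers from the framework itself.

```python
# λ1..λ6 from the text, keyed by shorthand labels for the six terms.
WEIGHTS = {"pix": 1.0, "lpips": 0.85, "id": 0.15, "sat": 1.0, "con": 1.0, "lum": 1.0}

def composite_loss(terms, weights=WEIGHTS):
    """L(x, x') = sum of lambda_i * L_i over the six component losses."""
    return sum(weights[k] * terms[k] for k in weights)

# Example component values (arbitrary, for illustration only).
terms = {"pix": 0.2, "lpips": 0.1, "id": 0.4, "sat": 0.0, "con": 0.0, "lum": 0.05}
total = composite_loss(terms)  # 1*0.2 + 0.85*0.1 + 0.15*0.4 + 0 + 0 + 1*0.05
```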
Step S23: adjust the parameters of the preset initial encoder neural network according to the loss function, and generate the preset encoder neural network when the loss function satisfies a preset condition.
During training, the loss function is fed back into the preset initial encoder neural network and its parameters are adjusted accordingly, so that the network weights are updated. When the loss function satisfies the preset condition, the preset encoder neural network is generated; the preset condition may be that the loss value converges into an allowed range or that a preset number of iterations is reached.
Training the preset initial encoder neural network on the sample images and updating its parameters with the loss function further improves the accuracy of the preset encoder neural network.
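The parameter-adjustment loop of step S23, with its two preset stopping conditions (loss converged into an allowed range, or a preset iteration count reached), can be organized as below; step_fn and update_fn are placeholders standing in for the real forward pass and weight update:

```python
def train(step_fn, update_fn, tol=1e-3, max_iters=1000):
    """Run training steps until the loss falls within the allowed
    range (tol) or the preset number of iterations is reached."""
    for i in range(1, max_iters + 1):
        loss = step_fn()
        if loss <= tol:
            return i, loss       # converged into the allowed range
        update_fn(loss)          # feed the loss back, adjust weights
    return max_iters, loss       # iteration budget exhausted

# Toy run: each "update" halves the loss seen by the next step.
state = {"loss": 1.0}
iters, final = train(lambda: state["loss"],
                     lambda l: state.update(loss=l * 0.5))
```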
On the basis of any embodiment of the present application, the preset encoder neural network includes a preset residual network, multiple groups of first fully connected layers corresponding to the respective scales, and multiple second fully connected layers, the first fully connected layers and the second fully connected layers corresponding one to one. Encoding the image to be processed with the preset encoder neural network to obtain multiple groups of feature latent codes at different scales, as shown in Figure 3, includes the following steps:
Step S301: input the image to be processed into the preset residual network, perform feature extraction on it using the channel attention mechanism and the spatial attention mechanism, and obtain multiple first feature vectors, each corresponding to one of the scales.
In this embodiment, the preset encoder neural network includes a preset residual network, multiple groups of first fully connected layers corresponding to the respective scales, and multiple second fully connected layers, the first and second fully connected layers corresponding one to one; each group of first fully connected layers may include several first fully connected layers, and each second fully connected layer outputs one layer of the feature latent code. The image to be processed is fed into the preset residual network, which performs feature extraction based on the channel attention mechanism and the spatial attention mechanism and extracts multiple first feature vectors corresponding to the respective scales; each scale may correspond to several first feature vectors.
Step S302: input each first feature vector into the first fully connected layer for the scale to which it belongs, obtaining multiple second feature vectors.
Each first feature vector is fed into a first fully connected layer at the same scale, which outputs one second feature vector, yielding multiple second feature vectors.
Step S303: input each second feature vector into the corresponding second fully connected layer, obtaining multiple groups of feature latent codes.
The second feature vectors are fed into the second fully connected layers, which map them into the multiple groups of feature latent codes.
Processing the image to be processed successively through the preset residual network, the first fully connected layers and the second fully connected layers yields the encoded feature latent codes and further improves their accuracy.
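One path through this head can be sketched with plain matrix-vector products; the 2-dimensional weights below are toy values for illustration, whereas the real map2style and mapping layers operate on high-dimensional feature vectors:

```python
def fc(weights, vec):
    """A plain fully connected layer as a matrix-vector product (bias omitted)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def encode_feature(vec, first_fc, second_fc):
    """First feature vector -> first (map2style) fully connected layer
    -> second fully connected layer -> one layer of the latent code."""
    return fc(second_fc, fc(first_fc, vec))

code = encode_feature([1.0, 2.0],
                      first_fc=[[1.0, 0.0], [0.0, 1.0]],  # identity layer
                      second_fc=[[1.0, 1.0]])             # summing layer → [3.0]
```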
It should be noted that the above embodiment is only one specific implementation proposed by this application; those skilled in the art may flexibly use encoder neural networks of other structures for the encoding, and this does not affect the protection scope of this application.
To further illustrate the technical idea of the present invention, its technical solution is now described with reference to a specific application scenario.
An embodiment of the present application provides an image processing method, as shown in Figure 4, applied in a pSp framework composed of a preset encoder neural network and a StyleGAN generator. The preset encoder neural network adopts a feature pyramid structure based on the channel attention mechanism and the spatial attention mechanism; the feature pyramid structure has three levels corresponding to three scales. The method includes the following steps:
Step S1: obtain 10000 groups of sample images, each group including a first image of a face under uneven illumination and a second image of the same face under even illumination.
Each group of sample images has undergone face detection and face alignment and has been normalized to 256*256.
Step S2: input any first image into the preset encoder neural network as the input image to obtain sample feature latent codes at three scales.
Specifically, three levels of feature vectors are first extracted by the preset residual network, into which a convolutional block attention module using the channel attention mechanism and the spatial attention mechanism is introduced. Each first feature vector at each level passes through a map2style first fully connected layer to become a second feature vector, and each second feature vector then passes through a second fully connected layer A to yield the sample feature latent codes at the three scales. These comprise 18 layers of 512-dimensional sample feature latent codes, where the small scale corresponds to layers 0-2, the medium scale to layers 3-6, and the large scale to layers 7-18.
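Assuming 0-based indexing over the 18 layers (so that the large-scale group is read as layers 7 through 17 and the three groups together cover all 18 layers, an interpretation introduced here), the split of such a latent code could look like:

```python
# Hypothetical slice boundaries inferred from the text above.
SCALE_SLICES = {"small": slice(0, 3), "medium": slice(3, 7), "large": slice(7, 18)}

def split_by_scale(latent):
    """Split an 18-layer latent code (a list of 18 vectors of
    dimension 512) into the three scale groups."""
    return {name: latent[s] for name, s in SCALE_SLICES.items()}

latent = [[0.0] * 512 for _ in range(18)]  # placeholder 18 x 512 code
groups = split_by_scale(latent)
```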
Step S3: input the sample feature latent codes at the three scales into the pre-trained StyleGAN generator to obtain the third image as the output image.
Step S4: determine the loss function from the difference between the third image and the second image.
Specifically, the pixel loss, perceptual loss, identity information loss, saturation loss, contrast loss and brightness loss are first determined from the difference, and the loss function is then generated as their weighted sum under the preset weight coefficient combination.
Step S5: feed back the loss function and adjust the parameters.
The loss function is fed back into the network, the network weights are adjusted, and the process returns to step S2 until the loss value of the loss function converges into the allowed range, at which point training ends and the preset encoder neural network is obtained.
Step S6: obtain an original image that contains a face and is unevenly illuminated, perform face detection and face alignment on it, and normalize it to 256*256 to obtain the image to be processed.
Step S7: input the image to be processed into the preset encoder neural network as the input image; after the preset encoder neural network has encoded it, input the resulting feature latent codes at the three scales into the StyleGAN generator, which outputs the target image as the output image.
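End to end, the inference path of steps S6 and S7 reduces to composing the two trained networks after preprocessing; the lambdas below are mere stand-ins for the trained encoder and StyleGAN generator:

```python
def relight(image, encoder, generator):
    """Steps S6-S7 after preprocessing: encode the image to be
    processed into the three scales of latent codes, then let the
    generator output the illumination-balanced target image."""
    return generator(encoder(image))

# Stand-ins: encoding halves pixel values, generation doubles them back.
target = relight([0.2, 0.4],
                 lambda im: [v / 2 for v in im],
                 lambda codes: [v * 2 for v in codes])  # → [0.2, 0.4]
```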
By applying the above technical solution, the channel attention mechanism and the spatial attention mechanism are introduced to encode the image to be processed into multiple groups of feature latent codes conforming to the feature pyramid structure, so that the StyleGAN generator completes image reconstruction directly from the feature latent codes, which improves the efficiency and accuracy of illumination equalization. Combining multiple losses during training further guarantees the accuracy of the trained preset encoder neural network, and because the saturation, contrast and brightness losses are considered, an extra evaluation dimension focused on consistency with the target illumination-equalization effect is added, so the model converges to that effect faster during training.
An embodiment of the present application further provides an image processing device, as shown in Figure 5, the device including:
an acquisition module 501 configured to acquire an unevenly illuminated image to be processed; an encoding module 502 configured to encode the image to be processed with a preset encoder neural network and obtain multiple groups of feature latent codes at different scales, the preset encoder neural network adopting a feature pyramid structure based on a channel attention mechanism and a spatial attention mechanism, each scale corresponding to a different level of the feature pyramid structure; and a generation module 503 configured to input each group of feature latent codes into a pre-trained StyleGAN generator and obtain an illumination-balanced target image from its output.
In a specific application scenario, the device further includes a training module configured to: acquire multiple groups of sample images, each group including a first image of a sample object under uneven illumination and a second image under even illumination; train a preset initial encoder neural network adopting the feature pyramid structure on the sample images and determine the loss function during training; and adjust the parameters of the preset initial encoder neural network according to the loss function, generating the preset encoder neural network when the loss function satisfies a preset condition.
In a specific application scenario, the training module is specifically configured to: input the first image into the preset initial encoder neural network and encode it to obtain multiple groups of sample feature latent codes, one group per scale; input each group of sample feature latent codes into the StyleGAN generator and obtain a third image from its output; and determine the loss function from the difference between the third image and the second image.
In a specific application scenario, the training module is further specifically configured to: determine a pixel loss, a perceptual loss, an identity information loss, a saturation loss, a contrast loss and a brightness loss from the difference; and compute their weighted sum under a preset weight coefficient combination to generate the loss function.
In a specific application scenario where the sample object is a human face, the training module is further specifically configured to: acquire multiple original sample images; perform face detection on the original sample images to obtain sample face images; and perform face alignment on each sample face image and normalize it to a preset size to obtain the multiple groups of sample images.
In a specific application scenario, the preset encoder neural network includes a preset residual network, multiple groups of first fully connected layers corresponding to the respective scales, and multiple second fully connected layers, the first and second fully connected layers corresponding one to one; the encoding module 502 is specifically configured to: input the image to be processed into the preset residual network and perform feature extraction on it with the channel attention mechanism and the spatial attention mechanism, obtaining multiple first feature vectors each corresponding to one of the scales; input each first feature vector into the first fully connected layer for the scale to which it belongs, obtaining multiple second feature vectors; and input each second feature vector into the second fully connected layer, obtaining multiple groups of feature latent codes.
In a specific application scenario where the image to be processed is a face image, the acquisition module 501 is specifically configured to: acquire an original image that contains a face and is unevenly illuminated; perform face detection on the original image to obtain a face image to be processed; and perform face alignment on the face image to be processed and normalize it to a preset size to obtain the image to be processed.
The image processing device in the embodiment of the present application thus includes: an acquisition module for acquiring an unevenly illuminated image to be processed; an encoding module for encoding the image to be processed with a preset encoder neural network to obtain multiple groups of feature latent codes at different scales, the preset encoder neural network adopting a feature pyramid structure based on a channel attention mechanism and a spatial attention mechanism, each scale corresponding to a different level of the structure; and a generation module for inputting each group of feature latent codes into a pre-trained StyleGAN generator and obtaining an illumination-balanced target image from its output. By introducing the channel attention mechanism and the spatial attention mechanism, the image to be processed is encoded into multiple groups of feature latent codes conforming to the feature pyramid structure, so that the StyleGAN generator completes image reconstruction directly from the latent codes, improving the efficiency and accuracy of illumination equalization.
An embodiment of the present invention further provides an electronic device, as shown in Figure 6, including a processor 601, a communication interface 602, a memory 603 and a communication bus 604, where the processor 601, the communication interface 602 and the memory 603 communicate with one another via the communication bus 604;
the memory 603 is configured to store instructions executable by the processor;
the processor 601 is configured, by executing the executable instructions, to perform:
acquiring an unevenly illuminated image to be processed; encoding the image to be processed with a preset encoder neural network to obtain multiple groups of feature latent codes at different scales, the preset encoder neural network adopting a feature pyramid structure based on a channel attention mechanism and a spatial attention mechanism, each scale corresponding to a different level of the feature pyramid structure; and inputting each group of feature latent codes into a pre-trained StyleGAN generator and obtaining an illumination-balanced target image from its output.
The communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, only one thick line is drawn in the figure, but this does not mean there is only one bus or one type of bus.
The communication interface is used for communication between the above terminal and other devices.
The memory may include RAM (Random Access Memory) or non-volatile memory, for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor such as a CPU (Central Processing Unit) or an NP (Network Processor); it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the image processing method described above.
In yet another embodiment of the present invention, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the image processing method described above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized in whole or in part as a computer program product comprising one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another by wired (e.g. coaxial cable, optical fiber, digital subscriber line) or wireless (e.g. infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (e.g. floppy disks, hard disks, magnetic tapes), optical media (e.g. DVDs), or semiconductor media (e.g. solid-state drives).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Furthermore, the terms "comprises," "comprising," or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprises a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes that element.
The embodiments in this specification are described in a related manner; identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the other embodiments.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310861882.XA CN117036185A (en) | 2023-07-13 | 2023-07-13 | Image processing method, device, electronic equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117036185A (en) | 2023-11-10 |
Family
ID=88621618
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310861882.XA Pending CN117036185A (en) | 2023-07-13 | 2023-07-13 | Image processing method, device, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117036185A (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20220277431A1 (en) * | 2021-02-26 | 2022-09-01 | Adobe Inc. | Initializing a learned latent vector for neural-network projections of diverse images |
| CN115049556A (en) * | 2022-06-27 | 2022-09-13 | 安徽大学 | StyleGAN-based face image restoration method |
| KR20230048727A (en) * | 2021-10-05 | 2023-04-12 | 인하대학교 산학협력단 | Adversarial Super-Resolved Multi-Scale Feature Learning and Object Detector |
| CN116071258A (en) * | 2023-01-16 | 2023-05-05 | 华东师范大学 | Image reconstruction method corrected by using pre-training StyleGAN model dynamic convolution kernel |
Similar Documents
| Publication | Title |
|---|---|
| US12131436B2 | Target image generation method and apparatus, server, and storage medium |
| RU2770752C1 | Method and device for training a face recognition model and a device for determining the key point of the face |
| JP6994588B2 | Face feature extraction model training method, face feature extraction method, equipment, equipment and storage medium |
| CN107679466B | Information output method and device |
| CN107507153B | Image denoising method and device |
| WO2021037174A1 | Neural network model training method and apparatus |
| CN114926876B | Image key point detection method, device, computer equipment and storage medium |
| CN112837236B | Repair neural network training method, device, computer equipment and storage medium for image completion |
| JP2023526899A | Methods, devices, media and program products for generating image inpainting models |
| CN114638375B | Video generation model training method, video generation method and device |
| KR20220065209A | Method and apparatus for recognizing image of various quality |
| WO2025161863A1 | Training method and apparatus for image processing model, image processing method and apparatus, device, storage medium and program product |
| WO2024188171A1 | Image processing method and related device thereof |
| CN112530027A | Three-dimensional point cloud repairing method and device and electronic equipment |
| CN117427339A | Virtual object generation method, device, computer equipment and storage medium |
| CN117173269A | Facial image generation method, device, electronic device and storage medium |
| WO2023185541A1 | Model training method and related device |
| WO2021095213A1 | Learning method, learning program, and learning device |
| CN117422797A | Expression feature extraction method, device, equipment and computer readable storage medium |
| CN117036185A | Image processing method, device, electronic equipment and storage medium |
| CN110163339B | Network representation generation, encoding method and device in neural network |
| JP6961527B2 | Information processing equipment, learning methods, and programs |
| CN114445345B | Screen image quality assessment method and related device |
| CN113298731B | Image color migration method and device, computer readable medium and electronic equipment |
| CN112530004A | Three-dimensional point cloud reconstruction method and device and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |