CN113284055A - Image processing method and device - Google Patents
- Publication number: CN113284055A (application number CN202110293366.2A)
- Authority: CN (China)
- Prior art keywords: image, input image, guide, feature, features
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T5/73—Deblurring; Sharpening
- G06N3/04—Neural network architecture, e.g. interconnection topology
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T5/70—Denoising; Smoothing
Abstract
This application provides an image processing method and device in the field of artificial intelligence. Features are extracted separately from multiple channels of an input image and used to guide the upsampling of bilateral grid data, so that a better image enhancement effect, such as dehazing, can be achieved in every channel of the image; the method also enables lightweight image dehazing and improves the user experience. The method includes: acquiring an input image, where the input image includes information of multiple channels; extracting features separately from the information of the multiple channels of the input image to obtain multiple guide maps; acquiring bilateral grid data corresponding to the input image, where the bilateral grid data is formed from luminance-dimension information arranged in the spatial dimensions and its resolution is lower than that of the input image; upsampling the bilateral grid data with each of the multiple guide maps as a guide condition to obtain multiple feature maps; and fusing the multiple feature maps to obtain an output image.
Description
Technical Field
The present application relates to the field of artificial intelligence, and in particular to an image processing method and apparatus.
Background
Dehazing methods have gone through roughly three stages of development. Early mainstream methods were traditional ones that modeled and estimated light propagation, but hand-designed models and image priors do not apply accurately to the wide variety of complex real-world images. After convolutional neural networks achieved major breakthroughs across many vision tasks, learning-based methods became mainstream, replacing some modules of the traditional methods with learnable network layers. The most recent methods discard hand-designed modules such as light estimation altogether and solve the dehazing task with an end-to-end network: the entire image transformation is learned by the network, and the algorithm is driven entirely by data. However, end-to-end networks are computationally expensive and cannot process images in real time. How to achieve a lightweight yet better image enhancement effect has therefore become an urgent problem.
Summary of the Invention
The present application provides an image processing method and device. Features are extracted separately from multiple channels of an input image and used to guide the upsampling of bilateral grid data, so that a better image enhancement effect, such as dehazing, can be achieved in every channel of the image, lightweight image dehazing can be realized, and the user experience is improved.
In view of this, in a first aspect, the present application provides an image processing method, including: acquiring an input image, where the input image includes information of multiple channels; extracting features separately from the information of the multiple channels of the input image to obtain multiple guide maps, where the multiple guide maps correspond one-to-one to the multiple channels, that is, each channel has a corresponding guide map; acquiring bilateral grid data corresponding to the input image, where the bilateral grid data is formed from luminance-dimension information arranged in the spatial dimensions, the luminance-dimension information is obtained from features extracted from the input image, the resolution of the bilateral grid data is lower than that of the input image, and the spatial dimensions are preset or determined from the input image; upsampling the bilateral grid data with each of the multiple guide maps as a guide condition to obtain multiple feature maps, where each guide map may be used to guide the selection, from the luminance dimension of the bilateral grid data, of the information corresponding to its channel for upsampling; and fusing the multiple feature maps to obtain an output image.
Therefore, in the embodiments of the present application, features can be extracted per channel of the input image as guide maps and used to upsample the bilateral grid data of the input image, thereby smoothing noise in the input image and improving image clarity, while avoiding the loss of information in each channel and further improving clarity. Moreover, because image enhancement is achieved by upsampling low-resolution bilateral grid data, the lower resolution of the grid data means less computation, which enables lightweight image enhancement. For example, when an image needs to be dehazed, the method provided by the embodiments of the present application can extract features from each channel of the image, so that the final output image retains more of the per-channel detail of the input image and achieves a lightweight dehazing effect, improving the user experience.
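The guided upsampling step above can be illustrated with a small sketch that "slices" a low-resolution bilateral grid with one full-resolution guide map. The function name, the nearest-bin lookup (a coarse stand-in for the interpolated slicing a real implementation would use), and the single-value grid cells are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def slice_bilateral_grid(grid, guide):
    """Upsample a low-resolution bilateral grid to full resolution,
    using a full-resolution guide map to pick the luminance bin.

    grid  : (gh, gw, gd) array, low-res spatial dims plus luminance depth
    guide : (H, W) array in [0, 1], the guide map for one channel
    returns a (H, W) array of sliced values (nearest-bin lookup)
    """
    gh, gw, gd = grid.shape
    H, W = guide.shape
    # map each full-resolution pixel to low-resolution grid coordinates
    ys = (np.arange(H) * gh / H).astype(int).clip(0, gh - 1)
    xs = (np.arange(W) * gw / W).astype(int).clip(0, gw - 1)
    # the guide value at each pixel selects the luminance bin
    zs = (guide * (gd - 1)).round().astype(int).clip(0, gd - 1)
    return grid[ys[:, None], xs[None, :], zs]
```

With one guide map per channel, calling this once per channel yields the multiple feature maps that are later fused.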
In a possible implementation, the aforementioned upsampling of the bilateral grid data with each of the multiple guide maps as a guide condition may include: upsampling the bilateral grid data with a first guide map as the guide condition to obtain an upsampled feature, where the first guide map is any one of the multiple guide maps; and fusing the upsampled feature with the input image to obtain a first feature map, where the first feature map is one of the multiple feature maps.
In the embodiments of the present application, the bilateral grid data can be upsampled with a guide map as the guide condition, so that the upsampling can refer to the features of each channel of the input image. Guided by these per-channel features, a better upsampling result is achieved: the upsampled features describe the details of the input image more accurately while smoothing its noise, achieving a denoising effect.
In a possible implementation, fusing the upsampled feature with the input image to obtain the first feature map may include: compressing the upsampled feature to obtain a compressed feature, where the compressed feature has fewer channels than the upsampled feature; and taking the element-wise product of the compressed feature and the input image, that is, multiplying the value of each pixel in the compressed feature by the value of the corresponding pixel in the input image, to obtain the first feature map.
In the embodiments of the present application, the upsampled feature can be compressed into a feature with fewer channels, which reduces the computation of the subsequent fusion of the input image and the feature, helps achieve lightweight image enhancement, and improves generalization so the method can be applied on a variety of devices.
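A minimal sketch of this compress-then-multiply fusion, assuming the channel compression is a 1x1-convolution-style weighted sum (the weight vector `w` and the single-channel image are hypothetical stand-ins for the learned layers):

```python
import numpy as np

def fuse_with_input(upsampled, image, w):
    """Compress the upsampled feature along its channel axis, then take
    the element-wise product with the input image.

    upsampled : (H, W, C) upsampled feature
    image     : (H, W) one channel of the input image
    w         : (C,) compression weights (a 1x1-conv stand-in)
    """
    compressed = upsampled @ w   # (H, W): fewer channels than `upsampled`
    return compressed * image    # element-wise (Hadamard) product
```

The element-wise product keeps the full-resolution structure of the input image while modulating it by the compressed feature.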
In a possible implementation, acquiring the bilateral grid data corresponding to the input image may include: downsampling the input image to obtain a downsampled image; extracting features from the downsampled image to obtain downsampled features; and obtaining the bilateral grid data from the downsampled features.
Therefore, in the embodiments of the present application, the input image can be downsampled to obtain a low-resolution image, and features are extracted from this low-resolution image, so that when the bilateral grid data is subsequently upsampled under the guidance of the guide maps, a noise-smoothing effect is achieved.
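The downsample-then-build-grid step can be sketched as follows. The block-average downsampling and histogram-style luminance binning are illustrative assumptions: the patent derives the luminance dimension from learned features, which this simple binning merely stands in for.

```python
import numpy as np

def make_bilateral_grid(image, factor=4, depth=8):
    """Generate low-resolution bilateral grid data from a full-resolution
    image: downsample spatially, then spread each low-res pixel into a
    luminance bin, giving a (H//factor, W//factor, depth) grid.

    image : (H, W) grayscale image with values in [0, 1]
    """
    H, W = image.shape
    gh, gw = H // factor, W // factor
    # spatial downsampling by block averaging
    low = image[:gh * factor, :gw * factor] \
        .reshape(gh, factor, gw, factor).mean(axis=(1, 3))
    grid = np.zeros((gh, gw, depth))
    bins = (low * (depth - 1)).round().astype(int).clip(0, depth - 1)
    # deposit each low-res pixel's value into its luminance bin
    for i in range(gh):
        for j in range(gw):
            grid[i, j, bins[i, j]] = low[i, j]
    return grid
```

The resulting grid has far fewer spatial cells than the input image, which is what keeps the later per-channel upsampling cheap.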
In a possible implementation, fusing the multiple feature maps may include: concatenating the multiple feature maps to obtain a concatenated image; performing feature extraction on the concatenated image at least once to obtain at least one first feature; and fusing the at least one first feature with the input image to obtain the output image.
Therefore, in the embodiments of the present application, concatenation along the channel dimension can be used to obtain an output image with multiple channels.
In a possible implementation, concatenating the multiple feature maps may include: concatenating the multiple feature maps with the input image to obtain the concatenated image.
In the embodiments of the present application, the input image can be incorporated when the feature maps are concatenated, so that the information contained in the input image supplements the detail of the concatenated image, avoiding loss of information from the input image and improving image clarity.
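The concatenate, extract, and fuse sequence above can be sketched as one function. The 1x1-style mixing weights and the residual addition are illustrative assumptions standing in for the network's learned feature extraction and fusion layers.

```python
import numpy as np

def reconstruct_output(feature_maps, image, mix):
    """Concatenate the per-channel feature maps with the input image,
    apply one feature-extraction step, and fuse the result back into
    the input image.

    feature_maps : list of (H, W) arrays, one per image channel
    image        : (H, W, C) input image
    mix          : (len(feature_maps) + C,) mixing weights
    """
    # channel-wise concatenation of the feature maps and the input image
    stacked = np.concatenate([np.stack(feature_maps, axis=-1), image], axis=-1)
    extracted = stacked @ mix             # one "first feature", shape (H, W)
    return image + extracted[..., None]   # fuse with the input image
```

Concatenating the input image here is what lets the output recover detail that the low-resolution grid path discarded.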
In a second aspect, the present application provides a neural network training method, including: acquiring a training set that includes multiple image samples and a ground-truth image corresponding to each image sample, where each image sample includes information of multiple channels; and performing at least one iteration of training on a neural network with the training set to obtain a trained neural network. In any training iteration, the neural network extracts features separately from the information of the multiple channels of the input image to obtain multiple guide maps, where the multiple guide maps correspond one-to-one to the multiple channels, that is, each channel corresponds to one guide map; acquires the bilateral grid data corresponding to the input image; upsamples the bilateral grid data with each of the multiple guide maps as a guide condition to obtain multiple feature maps; fuses the multiple feature maps to obtain an output image; and is updated according to the output image and the ground-truth image corresponding to the input image, yielding the neural network after the current update. The bilateral grid data is formed from luminance-dimension information arranged in a preset space, the luminance-dimension information is obtained from features extracted from the input image, and the resolution of the bilateral grid data is lower than that of the input image.
Therefore, in the method provided by the present application, when the neural network is trained, features can be extracted per channel of the input image as guide maps and used to upsample the bilateral grid data of the input image, thereby smoothing noise in the input image, improving image clarity, and avoiding loss of information in each channel, which further improves the clarity of the images output by the network. Moreover, because image enhancement is achieved by upsampling low-resolution bilateral grid data, less computation is consumed, enabling lightweight image enhancement. For example, when an image needs to be dehazed, a neural network trained with the method provided by the embodiments of the present application can extract features from each channel of the image, so that the final output image retains more of the per-channel detail of the input image, achieves a dehazing effect, and improves the user experience.
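The iterative update described above, comparing the network output with a ground-truth image and updating the parameters, can be shown in miniature. The single-parameter linear "network", mean-squared-error loss, and learning rate are all toy assumptions; only the output-versus-ground-truth update pattern is illustrated.

```python
import numpy as np

def train_step(w, x, target, lr=0.1):
    """One miniature training iteration: forward pass, compare with the
    ground-truth image, update the parameter by gradient descent on MSE."""
    out = w * x                                # toy "network" forward pass
    grad = 2.0 * np.mean((out - target) * x)   # d(MSE)/dw
    return w - lr * grad

# toy "image sample" and its ground-truth image (assumed values)
x = np.full((4, 4), 1.0)
target = np.full((4, 4), 2.0)
w = 0.0
for _ in range(200):                           # iterative training
    w = train_step(w, x, target)
```

After enough iterations the parameter converges so the toy network reproduces the ground truth; a real implementation would backpropagate through all sub-networks at once.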
In a possible implementation, the aforementioned upsampling of the bilateral grid data with each of the multiple guide maps as a guide condition may include: upsampling the bilateral grid data with a first guide map as the guide condition to obtain an upsampled feature, where the first guide map is any one of the multiple guide maps; and fusing the upsampled feature with the input image to obtain a first feature map, where the first feature map is one of the multiple feature maps.
In the embodiments of the present application, the bilateral grid data can be upsampled with a guide map as the guide condition, so that the upsampling can refer to the features of each channel of the input image. Guided by these per-channel features, a better upsampling result is achieved: the upsampled features describe the details of the input image more accurately while smoothing its noise, achieving a denoising effect.
In a possible implementation, fusing the upsampled feature with the input image may include: compressing the upsampled feature to obtain a compressed feature, where the compressed feature has fewer channels than the upsampled feature; and taking the element-wise product of the compressed feature and the input image to obtain the first feature map.
In the embodiments of the present application, the upsampled feature can be compressed into a feature with fewer channels, which reduces the computation of the subsequent fusion of the input image and the feature, helps achieve lightweight image enhancement, and improves generalization so the method can be applied on a variety of devices.
In a possible implementation, acquiring the bilateral grid data corresponding to the input image may include: downsampling the input image to obtain a downsampled image; and extracting features from the downsampled image to obtain downsampled features, where the bilateral grid data includes the downsampled features.
Therefore, in the embodiments of the present application, the input image can be downsampled to obtain a low-resolution image, and features are extracted from this low-resolution image, so that when the bilateral grid data is subsequently upsampled under the guidance of the guide maps, a noise-smoothing effect is achieved.
In a possible implementation, fusing the multiple feature maps includes: concatenating the multiple feature maps to obtain a concatenated image; performing feature extraction on the concatenated image at least once to obtain at least one first feature; and fusing the at least one first feature with the input image to obtain the output image.
Therefore, in the embodiments of the present application, concatenation along the channel dimension can be used to obtain an output image with multiple channels.
In a possible implementation, concatenating the multiple feature maps may include: concatenating the multiple feature maps with the input image to obtain the concatenated image.
In the embodiments of the present application, the input image can be incorporated when the feature maps are concatenated, so that the information contained in the input image supplements the detail of the concatenated image, avoiding loss of information from the input image and improving image clarity.
In a third aspect, the present application provides a neural network, which may include a bilateral grid generation network, a guide map generation network, a feature reconstruction network, an image reconstruction network, and so on.
The bilateral grid generation network may be used to downsample the full-resolution input image to obtain a low-resolution image, and then generate bilateral grid data from the downsampled image, where the bilateral grid data is at least three-dimensional data formed from the spatial dimensions and luminance-related information.
The guide map generation network is used to extract features from each channel of the full-resolution input image to obtain the guide map corresponding to each channel, that is, one guide map per channel.
The feature reconstruction network is used to upsample the bilateral grid, for each channel, under the guidance of the corresponding guide map, to obtain the feature map corresponding to that channel.
The image reconstruction network is used to fuse the feature maps corresponding to the channels and then fuse the fused features with the input image to obtain the output image.
In addition, it should be understood that this neural network may be used to perform the method steps of the aforementioned first aspect or any optional implementation of the first aspect.
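How the four sub-networks named above fit together can be shown with trivial stand-ins. Every function body here is an assumed placeholder (averaging, identity guides, nearest-bin slicing, residual fusion); only the data flow between the sub-networks mirrors the description.

```python
import numpy as np

def bilateral_grid_net(img):
    """Stand-in grid generator: downsample by block averaging, then
    tile across an assumed luminance depth of 8."""
    gray = img.mean(axis=-1)
    gh, gw = gray.shape[0] // 4, gray.shape[1] // 4
    low = gray[:gh * 4, :gw * 4].reshape(gh, 4, gw, 4).mean(axis=(1, 3))
    return np.repeat(low[..., None], 8, axis=-1)      # (gh, gw, 8)

def guide_net(channel):
    """Stand-in guide map generator: identity, clamped to [0, 1]."""
    return channel.clip(0.0, 1.0)

def feature_reconstruct(grid, guide):
    """Stand-in guided upsampling: nearest-bin slice of the grid."""
    gh, gw, gd = grid.shape
    H, W = guide.shape
    ys = (np.arange(H) * gh // H).clip(0, gh - 1)
    xs = (np.arange(W) * gw // W).clip(0, gw - 1)
    zs = (guide * (gd - 1)).round().astype(int).clip(0, gd - 1)
    return grid[ys[:, None], xs[None, :], zs]

def image_reconstruct(feats, img):
    """Stand-in fusion: average the per-channel features, add residually."""
    return img + np.stack(feats, axis=-1).mean(axis=-1, keepdims=True)

def forward(img):
    """Compose the four sub-networks in the order the text describes."""
    grid = bilateral_grid_net(img)
    guides = [guide_net(img[..., c]) for c in range(img.shape[-1])]
    feats = [feature_reconstruct(grid, g) for g in guides]
    return image_reconstruct(feats, img)
```

The point of the sketch is the wiring: one grid, one guide map per channel, one sliced feature map per guide, then a single fusion against the full-resolution input.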
In a fourth aspect, an embodiment of the present application provides an image processing apparatus that has the function of implementing the image processing method of the first aspect. This function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.
In a fifth aspect, an embodiment of the present application provides a training apparatus that has the function of implementing the neural network training method of the second aspect. This function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.
In a sixth aspect, an embodiment of the present application provides an image processing apparatus, including a processor and a memory interconnected by a line, where the processor invokes program code in the memory to perform the processing-related functions of the image processing method of any implementation of the first aspect. Optionally, the image processing apparatus may be a chip.
In a seventh aspect, an embodiment of the present application provides a training apparatus, including a processor and a memory interconnected by a line, where the processor invokes program code in the memory to perform the processing-related functions of the neural network training method of any implementation of the second aspect. Optionally, the training apparatus may be a chip.
In an eighth aspect, an embodiment of the present application provides an image processing apparatus, which may also be called a digital processing chip, or simply a chip. The chip includes a processing unit and a communication interface; the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the processing-related functions of the first aspect or any optional implementation of the first aspect.
In a ninth aspect, an embodiment of the present application provides a training apparatus, which may also be called a digital processing chip, or simply a chip. The chip includes a processing unit and a communication interface; the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the processing-related functions of the second aspect or any optional implementation of the second aspect.
In a tenth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform the method of any optional implementation of the first or second aspect.
In an eleventh aspect, an embodiment of the present application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of any optional implementation of the first or second aspect.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of an artificial intelligence main framework applied in this application;
Fig. 2 is a schematic structural diagram of a convolutional neural network provided by an embodiment of this application;
Fig. 3 is a schematic diagram of a system architecture provided by an embodiment of this application;
Fig. 4 is a schematic diagram of another system architecture provided by an embodiment of this application;
Fig. 5 is a schematic diagram of an application scenario provided by an embodiment of this application;
Fig. 6 is a schematic diagram of another application scenario provided by an embodiment of this application;
Fig. 7 is a schematic flowchart of an image processing method provided by an embodiment of this application;
Fig. 8 is a schematic flowchart of another image processing method provided by an embodiment of this application;
Fig. 9 is a schematic flowchart of generating bilateral grid data provided by an embodiment of this application;
Fig. 10 is a schematic flowchart of generating a guide map provided by an embodiment of this application;
Fig. 11 is another schematic flowchart of generating a guide map provided by an embodiment of this application;
Fig. 12 is a schematic flowchart of feature reconstruction for one channel provided by an embodiment of this application;
Fig. 13 is another schematic flowchart of feature reconstruction for one channel provided by an embodiment of this application;
Fig. 14 is a schematic flowchart of image reconstruction provided by an embodiment of this application;
Fig. 15 is a schematic structural diagram of a neural network provided by an embodiment of this application;
Fig. 16 is a schematic flowchart of a neural network training method provided by this application;
Fig. 17 is a schematic diagram of an image enhancement effect provided by this application;
Fig. 18A is a schematic diagram of an image enhancement effect provided by this application;
Fig. 18B is a schematic diagram of another image enhancement effect provided by this application;
Fig. 18C is a schematic diagram of another image enhancement effect provided by this application;
Fig. 19 is a schematic structural diagram of an image processing apparatus provided by this application;
Fig. 20 is a schematic structural diagram of a neural network training apparatus provided by this application;
Fig. 21 is a schematic structural diagram of another image processing apparatus provided by this application;
Fig. 22 is a schematic structural diagram of another neural network training apparatus provided by this application;
Fig. 23 is a schematic structural diagram of a chip provided by this application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
The overall workflow of an artificial intelligence system is described first. Fig. 1 shows a schematic structural diagram of an artificial intelligence main framework, which is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The intelligent information chain reflects the sequence of processes from data acquisition to processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data goes through a refinement from data to information to knowledge to wisdom. The IT value chain reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technical implementations) to the industrial ecology of the system.
(1)基础设施(1) Infrastructure
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片,如中央处理器(central processing unit,CPU)、网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程逻辑门阵列(field programmable gate array,FPGA)等硬件加速芯片提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。The infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and is supported by the basic platform. Communication with the outside world is performed through sensors. Computing power is provided by smart chips, that is, hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA). The basic platform includes related platform guarantees and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside world to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for computation.
(2)数据(2) Data
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。The data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence. The data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
(3)数据处理(3) Data processing
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。Among them, machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。Reasoning refers to the process of simulating human's intelligent reasoning method in a computer or intelligent system, using formalized information to carry out machine thinking and solving problems according to the reasoning control strategy, and the typical function is search and matching.
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
(4)通用能力(4) General ability
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。After the above-mentioned data processing, some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image identification, etc.
(5)智能产品及行业应用(5) Smart products and industry applications
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能交通、智能医疗、自动驾驶、平安城市等。Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical applications. The main application fields include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, safe cities, and the like.
本申请实施例涉及了大量神经网络的相关应用,为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的神经网络的相关术语和概念进行介绍。The embodiments of the present application involve a large number of related applications of neural networks. In order to better understand the solutions of the embodiments of the present application, the related terms and concepts of the neural networks that may be involved in the embodiments of the present application are first introduced below.
(1)神经网络(1) Neural network
神经网络可以是由神经单元组成的,神经单元可以是指以xs和截距1为输入的运算单元,该运算单元的输出可以如公式(1-1)所示:h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s·x_s + b) (1-1)。A neural network can be composed of neural units. A neural unit can refer to an operation unit that takes xs and an intercept of 1 as inputs, and the output of the operation unit can be as shown in formula (1-1): h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s·x_s + b).
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。Among them, s=1, 2,...n, n is a natural number greater than 1, Ws is the weight of xs, and b is the bias of the neural unit. f is an activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function. A neural network is a network formed by connecting a plurality of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
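The computation of a single neural unit described above can be sketched in plain Python (a minimal illustration only, not part of the patent; the input values, weights, and bias below are arbitrary examples, and sigmoid is used as the activation function f):

```python
import math

def neural_unit(xs, ws, b):
    """Single neural unit: weighted sum of inputs plus bias b,
    passed through a sigmoid activation f (formula (1-1))."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

# Example: two inputs xs with weights Ws and a bias b (arbitrary values)
out = neural_unit([1.0, 2.0], [0.5, -0.25], 0.1)
```

The output lies in (0, 1), and can serve as an input to a unit in the next layer, as described above.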
(2)深度神经网络(2) Deep neural network
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层中间层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,中间层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是中间层,或者称为隐层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple intermediate layers. The DNN is divided according to the position of different layers. The neural network inside the DNN can be divided into three categories: input layer, intermediate layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the middle layers are all intermediate layers, or hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
虽然DNN看起来很复杂,其每一层可以表示为线性关系表达式:y = α(W·x + b),其中,x是输入向量,y是输出向量,b是偏移向量或者称为偏置参数,W是权重矩阵(也称系数),α()是激活函数。每一层仅仅是对输入向量x经过如此简单的操作得到输出向量y。由于DNN层数多,系数W和偏移向量b的数量也比较多。这些参数在DNN中的定义如下所述:以系数w为例:假设在一个三层的DNN中,第二层的第4个神经元到第三层的第2个神经元的线性系数定义为W^3_{24},上标3代表系数W所在的层数,而下标对应的是输出的第三层索引2和输入的第二层索引4。Although a DNN looks complicated, each of its layers can be expressed as the linear relational expression y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset vector (also called the bias parameter), W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, there are also many coefficient matrices W and offset vectors b. These parameters are defined in the DNN as follows. Taking the coefficient w as an example, suppose that in a three-layer DNN the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}: the superscript 3 represents the layer where the coefficient is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.
综上,第L-1层的第k个神经元到第L层的第j个神经元的系数定义为W^L_{jk}。To sum up, the coefficient from the kth neuron of layer L-1 to the jth neuron of layer L is defined as W^L_{jk}.
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的中间层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。It should be noted that the input layer does not have a W parameter. In a deep neural network, more intermediate layers allow the network to better capture the complexities of the real world. In theory, a model with more parameters is more complex and has a larger "capacity", which means that it can complete more complex learning tasks. Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
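The layer-by-layer computation of a fully connected DNN can be sketched as follows (a minimal pure-Python illustration, not the patent's implementation; the weights follow the W[j][k] indexing convention described above, and all values are arbitrary examples):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_forward(x, layers):
    """Forward pass through fully connected layers.
    Each layer is a pair (W, b): W[j][k] is the coefficient from
    neuron k of the previous layer to neuron j of this layer."""
    a = x
    for W, b in layers:
        a = [sigmoid(sum(w * v for w, v in zip(row, a)) + bj)
             for row, bj in zip(W, b)]
    return a

# Hypothetical three-layer net: 3 inputs -> 2 hidden neurons -> 1 output
layers = [
    ([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [0.0, 0.0]),
    ([[0.7, 0.8]], [0.1]),
]
y = dense_forward([1.0, 0.5, -0.5], layers)
```

Note that, as stated above, the input layer itself carries no W parameters; the first (W, b) pair belongs to the first hidden layer.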
(3)卷积神经网络(3) Convolutional Neural Network
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers, and this feature extractor can be regarded as a filter. A convolutional layer refers to a neuron layer in the convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, a neuron may be connected to only some of the neurons in adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted is independent of position. A convolution kernel can be initialized in the form of a matrix of random size, and during the training of the convolutional neural network the kernel can learn reasonable weights. In addition, a direct benefit of weight sharing is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
本申请以下提及的用于提取特征的网络,即可以包括一层或者多层卷积层,示例性地,该用于提取特征的网络即可以采用CNN来实现。The network for extracting features mentioned below in this application may include one or more layers of convolutional layers. Exemplarily, the network for extracting features may be implemented by using CNN.
(4)损失函数(4) Loss function
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数)。比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断地调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。该损失函数通常可以包括均方误差、交叉熵、对数、指数等损失函数。例如,可以使用均方误差作为损失函数,定义为 L = (1/N)∑_{i=1}^{N}(y_i − ŷ_i)²,其中 y_i 为目标值,ŷ_i 为预测值,N 为样本数。具体可以根据实际应用场景选择具体的损失函数。In the process of training a deep neural network, it is desirable that the output of the deep neural network be as close as possible to the value that is actually to be predicted. Therefore, the predicted value of the current network can be compared with the desired target value, and the weight vector of each layer of the neural network can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vector is adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value". This is the loss function (or objective function), an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible. The loss function may generally include loss functions such as mean squared error, cross entropy, logarithmic loss, and exponential loss. For example, the mean squared error can be used as the loss function, defined as L = (1/N)∑_{i=1}^{N}(y_i − ŷ_i)², where y_i is the target value, ŷ_i is the predicted value, and N is the number of samples. The specific loss function can be selected according to the actual application scenario.
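As an illustration of the mean squared error mentioned above, a minimal sketch (the sample values are arbitrary examples):

```python
def mse_loss(preds, targets):
    """Mean squared error between predicted and target values."""
    assert len(preds) == len(targets)
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

# Three predictions vs. three target values
loss = mse_loss([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

A perfect prediction yields a loss of zero; the larger the loss, the larger the difference, which is exactly the quantity training tries to shrink.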
(5)反向传播算法(5) Back propagation algorithm
神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。A neural network can use the error back propagation (BP) algorithm to correct the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is propagated forward until an error loss is produced at the output, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
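A toy illustration of the forward/backward update loop described above, reduced to a single weight (an assumption-laden sketch, not the patent's training procedure; the learning rate, step count, and analytic gradient apply to this toy squared-error loss only):

```python
def train_weight(x, target, w=0.0, lr=0.1, steps=50):
    """Minimal gradient-descent loop on a single weight w for the
    model pred = w * x with squared-error loss (pred - target)**2.
    The 'backward' step applies the analytic gradient
    dL/dw = 2 * (w*x - target) * x."""
    for _ in range(steps):
        pred = w * x                      # forward propagation
        grad = 2 * (pred - target) * x    # back-propagated error gradient
        w -= lr * grad                    # update to shrink the loss
    return w

w = train_weight(x=1.0, target=3.0)  # converges toward w = 3
```

Each iteration moves w against the gradient of the loss, so the error loss shrinks toward convergence, mirroring the BP description above.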
(6)感受野(Receptive Field)(6) Receptive Field
在计算机视觉领域的深度神经网络领域的一个术语,用来表示神经网络内部的不同位置的神经元对原图像的感受范围的大小。神经元感受野的值越大,表示其能接触到的原始图像范围就越大,也意味着该神经元可能蕴含更为全局、语义层次更高的特征;而值越小,则表示其包含的特征越趋向于局部和细节。感受野的值可以大致用来判断每一层的抽象层次。A term from the field of deep neural networks for computer vision, used to indicate the size of the region of the original image that neurons at different positions inside the neural network can perceive. The larger the value of a neuron's receptive field, the larger the range of the original image it can access, which also means the neuron may contain more global features at a higher semantic level; the smaller the value, the more the features it contains tend to be local and detailed. The value of the receptive field can be used to roughly judge the level of abstraction of each layer.
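The growth of the receptive field with depth can be computed with the standard recurrence r ← r + (k − 1)·j, where k is the kernel size and j is the accumulated stride (jump). A minimal sketch (the layer stack below is a hypothetical example, not from the patent):

```python
def receptive_field(layers):
    """Receptive field of the last layer w.r.t. the input, for a
    stack of (kernel_size, stride) convolution/pooling layers."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1)*jump
        jump *= s             # strides compound multiplicatively
    return rf

# Two 3x3 convs (stride 1), a 2x2 pool (stride 2), then another 3x3 conv
rf = receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)])
```

Later layers thus see a wider region of the original image, matching the statement that deeper neurons carry more global, higher-semantic features.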
(7)图像(图像质量)增强(7) Image (image quality) enhancement
指对图像的亮度、颜色、对比度、饱和度、动态范围等进行处理,满足某种特定指标的技术,或者称为图像质量增强,相当于提高图像的质量,使图像更清晰。Refers to the technology of processing the brightness, color, contrast, saturation, dynamic range, etc. of the image to meet certain specific indicators, or called image quality enhancement, which is equivalent to improving the quality of the image and making the image clearer.
(8)去雾(8) De-fog
是指图像增强的一种,使存在一定模糊的图像更清晰。例如,可以在存在雾的环境中拍摄图像,此时拍摄到的图像可能不清晰,可以通过本申请提供的方法来进行去雾处理,提高图像的清晰度或者对比度,使图像更清晰。It refers to a kind of image enhancement, which makes the image with certain blurring clearer. For example, an image may be captured in a foggy environment, and the captured image may not be clear at this time. The method provided in this application can be used to perform dehazing processing to improve the clarity or contrast of the image and make the image clearer.
(9)RGB图像(9) RGB image
RGB图像是具有至少三个通道的图像,分别为红(red)、绿(green)和蓝(blue)三种通道。这三种颜色组成了视力所能感知的所有颜色,是运用最广的颜色系统之一。例如,一帧RGB图像是一个M*N*3的彩色像素数组,其中,M*N为图像的尺寸,每个彩色像素是一个三值组,这三个值分别对应红、绿和蓝分量。An RGB image is an image with at least three channels, namely red, green, and blue channels. These three colors make up all the colors that human vision can perceive, and RGB is one of the most widely used color systems. For example, a frame of an RGB image is an M*N*3 array of color pixels, where M*N is the size of the image and each color pixel is a triple whose three values correspond to the red, green, and blue components respectively.
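A minimal sketch of such an M*N*3 pixel array in plain Python (the image size and colors are arbitrary examples):

```python
def make_rgb_image(m, n, color=(255, 0, 0)):
    """An M*N*3 pixel array: each pixel is a [red, green, blue] triple."""
    return [[list(color) for _ in range(n)] for _ in range(m)]

img = make_rgb_image(4, 6)   # a 4x6 image, every pixel red
img[0][0] = [0, 255, 0]      # set the top-left pixel to green
```

Each of img's M rows holds N pixels, and each pixel holds exactly the three channel values described above.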
通常,CNN是一种常用的神经网络,如本申请以下所提及的用于进行特征提取的网络即可以是CNN或者其他包括卷积层的网络,为便于理解,下面示例性地,对卷积神经网络的结构进行介绍。Generally, CNN is a commonly used neural network. For example, the network for feature extraction mentioned below in this application may be a CNN or another network including convolutional layers. For ease of understanding, the structure of the convolutional neural network is introduced below by way of example.
下面结合图2示例性地对CNN的结构进行详细的介绍。如上文的基础概念介绍所述,卷积神经网络是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。The structure of the CNN is described in detail below by way of example with reference to FIG. 2. As mentioned in the introduction of basic concepts above, a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture. A deep learning architecture refers to performing multiple levels of learning at different levels of abstraction through machine learning algorithms. As a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron can respond to the image fed into it.
如图2所示,卷积神经网络(CNN)200可以包括输入层210,卷积层/池化层220(其中池化层为可选的),以及全连接层(fully connected layer)230。在本申请以下实施方式中,为便于理解,将每一层称为一个stage。下面对这些层的相关内容做详细介绍。As shown in FIG. 2 , a convolutional neural network (CNN) 200 may include an input layer 210 , a convolutional/pooling layer 220 (where the pooling layer is optional), and a fully connected layer 230 . In the following embodiments of the present application, for ease of understanding, each layer is referred to as a stage. The relevant contents of these layers are described in detail below.
卷积层/池化层220:Convolutional layer/pooling layer 220:
如图2所示卷积层/池化层220可以包括如示例221-226层,举例来说:在一种实现方式中,221层为卷积层,222层为池化层,223层为卷积层,224层为池化层,225为卷积层,226为池化层;在另一种实现方式中,221、222为卷积层,223为池化层,224、225为卷积层,226为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。As shown in FIG. 2, the convolutional/pooling layer 220 may include, as an example, layers 221-226. For example, in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
下面将以卷积层221为例,介绍一层卷积层的内部工作原理。The following takes the convolutional layer 221 as an example to introduce the inner working principle of one convolutional layer.
卷积层221可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用多个尺寸(行×列)相同的权重矩阵,即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度,这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的特征图的尺寸也相同,再将提取到的多个尺寸相同的特征图合并形成卷积运算的输出。The convolutional layer 221 may include many convolution operators, also called kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, which is usually predefined. In the process of performing a convolution operation on an image, the weight matrix is usually processed on the input image along the horizontal direction one pixel after another (or two pixels after two pixels, depending on the value of the stride), so as to extract specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends over the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolved output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where this dimension can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features from the image; for example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the feature maps extracted by them also have the same size, and the extracted feature maps of the same size are then combined to form the output of the convolution operation.
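The sliding weight-matrix operation described above can be sketched for a single-channel image and a single weight matrix (a simplified pure-Python illustration without the depth dimension; the 3×3 edge-detection kernel below is a hypothetical example of a weight matrix that extracts edge information):

```python
def conv2d(image, kernel, stride=1):
    """Valid 2D convolution (cross-correlation form) of a single-channel
    image with one weight matrix, sliding by `stride` pixels."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - kh + 1, stride):
        row = []
        for j in range(0, w - kw + 1, stride):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# An edge-detection kernel responds with 0 on a perfectly flat region
edge_kernel = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
flat = [[5] * 5 for _ in range(5)]
response = conv2d(flat, edge_kernel)
```

On the flat 5x5 input the kernel weights cancel everywhere, so the 3x3 output is all zeros; an image with edges would produce nonzero responses at those positions.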
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络200进行正确的预测。The weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained by training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions .
当卷积神经网络200有多个卷积层的时候,初始的卷积层(例如221)往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络200深度的加深,越往后的卷积层(例如226)提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (for example, 221) often extracts more general features, which may also be called low-level features. As the depth of the convolutional neural network 200 increases, the features extracted by later convolutional layers (for example, 226) become more and more complex, such as high-level semantic features; features with higher semantics are more suitable for the problem to be solved.
在本申请以下实施方式中,所提及的特征提取的过程,可以是通过卷积来提取特征的过程。如卷积层可以包括卷积核大小为3×3的分离卷积(sep_conv_3x3)、卷积核大小为5×5的分离卷积(sep_conv_5x5)、卷积核大小为3×3且空洞率为2的空洞卷积(dil_conv_3x3)、卷积核大小为5×5且空洞率为2的空洞卷积(dil_conv_5x5)等等。In the following embodiments of the present application, the feature extraction process mentioned may be a process of extracting features through convolution. For example, a convolutional layer may include a separable convolution with a kernel size of 3×3 (sep_conv_3x3), a separable convolution with a kernel size of 5×5 (sep_conv_5x5), a dilated convolution with a kernel size of 3×3 and a dilation rate of 2 (dil_conv_3x3), a dilated convolution with a kernel size of 5×5 and a dilation rate of 2 (dil_conv_5x5), and so on.
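The spatial extent of the dilated convolutions named above can be checked with the standard formula k_eff = k + (k − 1)(d − 1), where d is the dilation rate (a small sketch; dilation inserts gaps between kernel taps without adding weights):

```python
def effective_kernel_size(k, dilation):
    """Effective spatial extent of a k x k kernel with the given
    dilation rate: (dilation - 1) gaps are inserted between taps."""
    return k + (k - 1) * (dilation - 1)

# dil_conv_3x3 with dilation rate 2 covers the extent of a 5x5 kernel
e33 = effective_kernel_size(3, 2)
# dil_conv_5x5 with dilation rate 2 covers a 9x9 extent
e55 = effective_kernel_size(5, 2)
```

This is why dilated convolutions enlarge the receptive field without increasing the number of weights.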
池化层:Pooling layer:
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,池化层也可以称为下采样层,可以用于对图像进行下采样。在如图2中220所示例的221-226各层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中,池化层的目的是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子,以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外,就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。Since the number of training parameters often needs to be reduced, a pooling layer is often introduced periodically after the convolutional layer. The pooling layer can also be called a downsampling layer, which can be used to downsample images. In each layer 221-226 exemplified by 220 in Figure 2, it can be a convolutional layer followed by a pooling layer, or a multi-layer convolutional layer followed by one or more pooling layers. During image processing, the purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain a smaller size image. The average pooling operator can calculate the pixel values in the image within a certain range to produce an average value as the result of average pooling. The max pooling operator can take the pixel with the largest value within a specific range as the result of max pooling. Also, just as the size of the weight matrix used in the convolutional layer should be related to the size of the image, the operators in the pooling layer should also be related to the size of the image. The size of the output image after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
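The average and max pooling operators described above can be sketched as follows (a minimal single-channel pure-Python illustration; the 2×2 window, stride of 2, and input values are arbitrary examples):

```python
def pool2d(image, size=2, stride=2, mode="max"):
    """Downsample a single-channel image: each output pixel is the max
    (or average) of the corresponding size x size sub-region."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            window = [image[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max"
                       else sum(window) / len(window))
        out.append(row)
    return out

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
pooled_max = pool2d(img)                 # 4x4 input -> 2x2 output
pooled_avg = pool2d(img, mode="avg")
```

As stated above, each output pixel represents the maximum or average of its sub-region, and the output is smaller than the input.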
全连接层230:Fully connected layer 230:
在经过卷积层/池化层220的处理后,卷积神经网络200还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层220只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息),卷积神经网络200需要利用全连接层230来生成一个或者一组所需要的类的数量的输出。因此,在全连接层230中可以包括多层隐含层(如图2所示的231、232至23n),该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像增强,图像识别,图像分类,图像超分辨率重建等。After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet sufficient to output the required output information. As mentioned above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the fully connected layer 230 to generate one output or a set of outputs equal to the required number of classes. Therefore, the fully connected layer 230 may include multiple hidden layers (231, 232 to 23n as shown in FIG. 2), and the parameters contained in these hidden layers can be obtained by pre-training based on relevant training data for a specific task type; for example, the task type may include image enhancement, image recognition, image classification, image super-resolution reconstruction, and so on.
在全连接层230中的多层隐含层之后,也就是整个卷积神经网络200的最后层为输出层240,该输出层240具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络200的前向传播(如图2由210至240方向的传播为前向传播)完成,反向传播(如图2由240至210方向的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络200的损失,及卷积神经网络200通过输出层输出的结果和理想结果之间的误差。After the multiple hidden layers in the fully connected layer 230, the last layer of the entire convolutional neural network 200 is the output layer 240, which has a loss function similar to categorical cross entropy and is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 200 (propagation in the direction from 210 to 240 in FIG. 2) is completed, the back propagation (propagation in the direction from 240 to 210 in FIG. 2) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200, that is, the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
需要说明的是,如图2所示的卷积神经网络200仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在。例如,仅包括图2中所示的网络结构的一部分,比如,本申请实施例中所采用的卷积神经网络可以仅包括输入层210、卷积层/池化层220和输出层240。It should be noted that the convolutional neural network 200 shown in FIG. 2 is only used as an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models. For example, only a part of the network structure shown in FIG. 2 is included, for example, the convolutional neural network adopted in this embodiment of the present application may only include the input layer 210 , the convolutional/pooling layer 220 and the output layer 240 .
本申请中,可以采用图2所示的卷积神经网络200对待处理图像进行处理,得到增强后的图像。如图2所示,待处理图像经过输入层210、卷积层/池化层220以及全连接层的处理后输出增强后更清晰、包含更多纹理信息的图像。In this application, the image to be processed can be processed by using the convolutional neural network 200 shown in FIG. 2 to obtain an enhanced image. As shown in FIG. 2 , the image to be processed is processed by the input layer 210 , the convolution layer/pooling layer 220 and the fully connected layer to output an enhanced image that is clearer and contains more texture information.
本申请实施例提供的用于图像处理的方法可以在服务器上被执行,还可以在终端设备上被执行,或者,本申请以下提及的神经网络,可以部署于服务器,也可以部署于终端上,具体可以根据实际应用场景调整。其中该终端设备可以是具有图像处理功能的移动电话、平板个人电脑(tablet personal computer,TPC)、媒体播放器、智能电视、笔记本电脑(laptop computer,LC)、个人数字助理(personal digital assistant,PDA)、个人计算机(personal computer,PC)、照相机、摄像机、智能手表、可穿戴式设备(wearable device,WD)或者自动驾驶的车辆等,本申请实施例对此不作限定。The method for image processing provided by the embodiments of this application may be executed on a server or a terminal device, or the neural network mentioned below in this application may be deployed on a server or a terminal , which can be adjusted according to actual application scenarios. The terminal device may be a mobile phone with image processing function, tablet personal computer (TPC), media player, smart TV, laptop computer (LC), personal digital assistant (PDA) ), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), an autonomous vehicle, etc., which are not limited in this embodiment of the present application.
如图3所示,本申请实施例提供了一种系统架构100。在图3中,数据采集设备160用于采集训练数据。在一些可选的实现中,针对于图像增强来说,训练数据中所包括的样本对可以包括画质较低的图像与清晰的图像,例如,一个样本对中可以包括雾天拍摄到的图像,以及经过大量处理的清晰的图像(或者称为真值图像)。As shown in FIG. 3, an embodiment of the present application provides a system architecture 100. In FIG. 3, the data collection device 160 is used to collect training data. In some optional implementations, for image enhancement, a sample pair included in the training data may include a lower-quality image and a clear image; for example, a sample pair may include an image captured in foggy weather and a heavily processed clear image (also called a ground-truth image).
在采集到训练数据之后,数据采集设备160将这些训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到目标模型/规则101。可选地,在本申请以下实施方式中所提及的训练集,可以是从该数据库130中得到,也可以是通过用户的输入数据得到。After collecting the training data, the data collection device 160 stores the training data in the
其中,目标模型/规则101可以为本申请实施例中进行训练后的神经网络。The target model/rule 101 may be the neural network after training in the embodiment of the present application.
下面对训练设备120基于训练数据得到目标模型/规则101进行描述,训练设备120对输入的原始图像进行处理,将输出的图像与原始图像进行对比,直到训练设备120输出的图像与原始图像的差值小于一定的阈值,从而完成目标模型/规则101的训练。The following describes how the training device 120 obtains the target model/rule 101 based on the training data. The training device 120 processes the input original image and compares the output image with the original image until the difference between the image output by the training device 120 and the original image is smaller than a certain threshold, thereby completing the training of the target model/rule 101.
上述目标模型/规则101能够用于实现本申请实施例的用于图像处理的方法训练得到的神经网络,即,将待处理数据(如图像)通过相关预处理后输入该目标模型/规则101,即可得到处理结果。本申请实施例中的目标模型/规则101具体可以为本申请以下所提及的第一神经网络,该第一神经网络可以是前述的CNN、DNN或者RNN等类型的神经网络。需要说明的是,在实际的应用中,所述数据库130中维护的训练数据不一定都来自于数据采集设备160的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备120也不一定完全基于数据库130维护的训练数据进行目标模型/规则101的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。The above target model/rule 101 can be used to implement the neural network obtained by training the method for image processing of the embodiments of the present application; that is, the data to be processed (such as an image) is input into the target model/rule 101 after relevant preprocessing, and the processing result can then be obtained. The target model/rule 101 in the embodiments of the present application may specifically be the first neural network mentioned below in this application, and the first neural network may be a neural network of the aforementioned CNN, DNN, or RNN type. It should be noted that, in practical applications, the training data maintained in the database 130 does not necessarily all come from the collection of the data collection device 160 and may also be received from other devices. In addition, the training device 120 does not necessarily train the target model/rule 101 entirely based on the training data maintained by the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.
根据训练设备120训练得到的目标模型/规则101可以应用于不同的系统或设备中,如应用于图3所示的执行设备110,该执行设备110也可以称为计算设备,所述执行设备110可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端设备等。在图3中,执行设备110配置输入/输出(input/output,I/O)接口112,用于与外部设备进行数据交互,用户可以通过客户设备140向I/O接口112输入数据,所述输入数据在本申请实施例中可以包括:客户设备输入的待处理数据。The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, for example, to the execution device 110 shown in FIG. 3. The execution device 110 may also be referred to as a computing device and may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server, a cloud device, or the like. In FIG. 3, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. A user can input data to the I/O interface 112 through the client device 140, and in this embodiment of the present application the input data may include data to be processed that is input by the client device.
预处理模块113和预处理模块114用于根据I/O接口112接收到的输入数据(如待处理数据)进行预处理,在本申请实施例中,也可以没有预处理模块113和预处理模块114(也可以只有其中的一个预处理模块),而直接采用计算模块111对输入数据进行处理。The preprocessing module 113 and the preprocessing module 114 are used to perform preprocessing according to the input data (such as data to be processed) received by the I/
在执行设备110对输入数据进行预处理,或者在执行设备110的计算模块111执行计算等相关的处理过程中,执行设备110可以调用数据存储系统150中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统150中。When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation and other related processing, the execution device 110 can call data, code, etc. in the data storage system 150 for the corresponding processing, and can also store the data, instructions, etc. obtained by the corresponding processing into the data storage system 150.
最后,I/O接口112将处理结果返回给客户设备140,从而提供给用户,例如若第一神经网络用于进行图像分类,处理结果为分类结果,则I/O接口112将上述得到的分类结果返回给客户设备140,从而提供给用户。Finally, the I/O interface 112 returns the processing result to the client device 140 so as to provide it to the user. For example, if the first neural network is used for image classification and the processing result is a classification result, the I/O interface 112 returns the obtained classification result to the client device 140 and thus provides it to the user.
需要说明的是,训练设备120可以针对不同的目标或称不同的任务,基于不同的训练数据生成相应的目标模型/规则101,该相应的目标模型/规则101即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。在一些场景中,执行设备110和训练设备120可以是相同的设备,或者位于相同的计算设备内部,为便于理解,本申请将执行设备和训练设备分别进行介绍,并不作为限定。It should be noted that the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals, or different tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results. In some scenarios, the execution device 110 and the training device 120 may be the same device or located inside the same computing device. For ease of understanding, this application introduces the execution device and the training device separately, which is not intended as a limitation.
在图3所示情况下,用户可以手动给定输入数据,该手动给定可以通过I/O接口112提供的界面进行操作。另一种情况下,客户设备140可以自动地向I/O接口112发送输入数据,如果要求客户设备140自动发送输入数据需要获得用户的授权,则用户可以在客户设备140中设置相应权限。用户可以在客户设备140查看执行设备110输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备140也可以作为数据采集端,采集如图所示输入I/O接口112的输入数据及输出I/O接口112的预测标签作为新的样本数据,并存入数据库130。当然,也可以不经过客户设备140进行采集,而是由I/O接口112直接将如图所示输入I/O接口112的输入数据及输出I/O接口112的预测标签,作为新的样本数据存入数据库130。In the case shown in FIG. 3 , the user can manually give input data, and the manual setting can be operated through the interface provided by the I/
需要说明的是,图3仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图3中,数据存储系统150相对执行设备110是外部存储器,在其它情况下,也可以将数据存储系统150置于执行设备110中。It should be noted that FIG. 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation. For example, in FIG. 3, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
如图3所示,根据训练设备120训练得到目标模型/规则101,该目标模型/规则101在本申请实施例中可以是本申请中的神经网络,具体的,本申请实施例提供的神经网络可以CNN,深度卷积神经网络(deep convolutional neural networks,DCNN),循环神经网络(recurrent neural network,RNN)等等。As shown in FIG. 3 , the target model/rule 101 is obtained by training according to the training device 120. In this embodiment of the present application, the target model/rule 101 may be the neural network in the present application. Specifically, the neural network provided in the embodiment of the present application is the neural network in the present application. It can be CNN, deep convolutional neural networks (DCNN), recurrent neural network (RNN) and so on.
Referring to FIG. 4, an embodiment of this application further provides a system architecture 400. The execution device 110 is implemented by one or more servers (for example, a server cluster), optionally cooperating with other computing devices such as data storage, routers, and load balancers; the execution device 110 may be arranged on one physical site or distributed across multiple physical sites. The execution device 110 can use data in the data storage system 150, or call program code in the data storage system 150, to implement the steps of the image processing method or the neural network training method corresponding to FIG. 7 to FIG. 16 below.
Users may operate their respective user devices (for example, the local device 401 and the local device 402) to interact with the execution device 110. Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car, other type of cellular phone, media consumption device, wearable device, set-top box, or game console.
Each user's local device can interact with the execution device 110 through a communication network of any communication mechanism/communication standard; the communication network may be a wide area network, a local area network, a point-to-point connection, or any combination thereof. Specifically, the communication network may include a wireless network, a wired network, or a combination of the two. The wireless network includes, but is not limited to, any one or combination of: a fifth-generation mobile communication technology (5th-Generation, 5G) system, a long term evolution (long term evolution, LTE) system, the global system for mobile communication (global system for mobile communication, GSM), a code division multiple access (code division multiple access, CDMA) network, a wideband code division multiple access (wideband code division multiple access, WCDMA) network, wireless fidelity (wireless fidelity, WiFi), Bluetooth (bluetooth), Zigbee, radio frequency identification (radio frequency identification, RFID), long range (Long Range, Lora) wireless communication, and near field communication (near field communication, NFC). The wired network may include an optical fiber communication network, a network composed of coaxial cables, and the like.
In another implementation, one or more aspects of the execution device 110 may be implemented by each local device; for example, the local device 401 may provide local data to, or feed computation results back to, the execution device 110. Such a local device may also be called a computing device.
It should be noted that all functions of the execution device 110 can also be implemented by a local device. For example, the local device 401 may implement the functions of the execution device 110 and provide services to its own user, or provide services, such as an image optimization service, to the user of the local device 402.
Usually, in scenes with fog or weak illumination, captured images may be blurred or low in contrast, making them unclear. Taking dehazing as an example, the early mainstream methods were traditional ones that model and estimate light propagation, but hand-designed models and image priors cannot be applied accurately to the many complex types of real images. After convolutional neural networks achieved breakthrough results across a large number of vision tasks, learning-based methods became mainstream, replacing some modules of the traditional methods with learnable network layers. More recent methods discard hand-designed modules such as light estimation altogether and solve the dehazing task with an end-to-end network: the entire image transformation is learned by the network, and the algorithm is driven completely by data. However, current end-to-end networks that achieve state-of-the-art (SOTA) results demand a large amount of computation; processing higher-resolution images takes considerable time, so such images cannot be processed in real time.
As one common dehazing approach, for example, U-Net can be used as the backbone network, borrowing a method from the denoising field to progressively enhance the dehazing effect, and a feature fusion module can be introduced to fuse features of different scales together, overcoming native U-Net's loss of spatial information and lack of connections between non-adjacent levels. However, this approach has high computational complexity and takes a long time to process an image, so real-time processing cannot be achieved.
As another example, a low-resolution image can be processed to obtain conversion coefficients stored in a bilateral grid; features of the full-resolution image are extracted to obtain a guide map; under the guidance of the guide map, the bilateral grid coefficients are upsampled to obtain a pixel transformation matrix, which is then used to transform every pixel of the input image to produce the final output. However, obtaining the guide map requires compressing the image features, which may cause information loss, and the way the bilateral grid coefficients are upsampled is inefficient to implement and takes a long time.
Therefore, this application provides an image processing method that achieves efficient image enhancement, and even real-time image processing, through a lightweight architecture, improving the efficiency of image enhancement.
First, some scenarios to which this application can be applied are introduced by way of example.
For example, as shown in FIG. 5, low-quality video data captured by various monitoring devices can be collected and stored in memory, for example in smart-city, outdoor-scene, indoor, in-vehicle, pet, or wilderness monitoring, or field shooting. When the video data is played, image enhancement such as dehazing or contrast improvement can be performed on it by the image processing method provided in this application, so as to obtain clearer video data and improve the user's viewing experience.
As another example, the image processing method provided by this application can be applied to a live video streaming scenario. As shown in FIG. 6, a server may send a video stream to the client used by a user. After receiving the data stream sent by the server, the client can perform image enhancement processing on the stream using the image processing method provided by this application, thereby obtaining clearer video data and improving the user's viewing experience.
As another example, in an autonomous driving scenario, a camera mounted on a smart car can capture images of the vehicle's surroundings, and a driving path or driving decisions can be planned for the vehicle based on those images. With the image processing method provided by this application, the images captured by the camera can be enhanced to obtain clearer images; in particular, a dehazing effect can be achieved in foggy scenes, improving driving safety and user experience.
As yet another example, a user may take pictures with a terminal in a foggy environment, and the captured images may be unclear because of the fog. The image processing method provided by this application can dehaze the captured images, improving their clarity and the user experience.
As yet another example, in scenes with large differences in ambient light, the images a user captures may be unclear because of low contrast. In this case the captured images can be processed by the image processing method provided by this application to raise their contrast, making them clearer and improving the user experience.
The image processing method provided by this application is described in detail below with reference to the foregoing scenarios and system architecture.
Moreover, the image processing method provided by this application can be implemented by a neural network; for example, the neural network can be deployed on a user's terminal, and the terminal can run the neural network to carry out the steps of the image processing method provided by this application.
First, referring to FIG. 7, a schematic flowchart of an image processing method provided by this application is described as follows.
701. Obtain an input image.
The input image may be an image captured by the terminal or a received image, and the input image has multiple channels.
The input image may be provided by user input; for example, referring to FIG. 3, a user may send input data to the execution device 110 through the client device 140, and that input data may carry the input image.
Alternatively, if the image processing method provided by this application is executed by a terminal, the input image may also be an image captured by the terminal. For example, the input image may be an image captured by the terminal in a foggy scene that needs dehazing to improve its clarity.
702. Extract features separately from the information of the multiple channels of the input image to obtain multiple guide maps.
The input image has multiple channels, and this application can extract features along the dimension of each channel separately, obtaining a feature map per channel dimension, i.e. a guide map; each channel can correspond to one feature map, so multiple guide maps are obtained. These guide maps are subsequently used to guide the upsampling of the bilateral grid data, i.e. they play the role of guided upsampling.
For example, the input image may include information of three channels; extracting features from the information of each channel yields the features corresponding to each channel, i.e. the guide map corresponding to each channel.
703. Obtain the bilateral grid data corresponding to the input image.
The bilateral grid data may include data formed from information in the spatial dimensions and information in the luminance dimension; in other words, the bilateral grid data includes data formed from luminance-dimension information arranged in a preset space. It can be understood that the bilateral grid data includes data of at least three dimensions.
Specifically, the spatial dimensions may be preset or determined according to the size of the input image; they include at least two dimensions, i.e. the spatial dimensions may form a two-dimensional space, a three-dimensional space, and so on.
For example, the spatial dimensions correspond to the size of the input image, and each position in the spatial dimensions corresponds to one or more pixels of the input image. Accordingly, after features are extracted from the input image they can be processed so that the resulting information is related to the brightness of the input image; assigning those features to the spatial dimensions yields the bilateral grid data.
In a possible implementation, the input image is downsampled to obtain a downsampled image; features are extracted from the downsampled image to obtain downsampled features, and the bilateral grid data includes those downsampled features. The downsampling may be done in various ways, such as bilinear interpolation or bicubic interpolation.
This can be understood as follows: the input image is first downsampled, lowering its resolution and reducing the subsequent computational complexity, which helps achieve lightweight image enhancement; features are then extracted from the downsampled image and average-pooled, yielding information related to the brightness of the input image, i.e. the downsampled features; mapping the downsampled features to the spatial dimensions yields the bilateral grid data.
The spatial dimensions of the bilateral grid data may be preset or determined according to the size of the input image. For example, a size corresponding to the spatial dimensions of the bilateral grid data can be preset, and the pixels of the downsampled image can be put in correspondence with that size: if the size is 100*100 and the resolution of the downsampled image is 200*200, every 4 pixels of the downsampled image can correspond to one pixel at that size. After the downsampled features are extracted from the downsampled image, they are assigned to the corresponding pixels at that size according to this correspondence, yielding the bilateral grid data.
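The 200*200-to-100*100 assignment described above can be sketched as follows. This is a minimal illustration with hypothetical shapes and a hypothetical 8-bin luminance dimension; the actual embodiments extract the features with a multi-layer convolutional network, which is replaced here by a random feature map.

```python
import numpy as np

def build_bilateral_grid(feat, grid_h=100, grid_w=100):
    """Average-pool a down-sampled feature map (H, W, n_bins) into a
    coarser preset spatial grid, yielding bilateral grid data of shape
    (grid_h, grid_w, n_bins): two spatial dimensions + one luminance
    dimension, i.e. at least three-dimensional data."""
    h, w, n_bins = feat.shape
    fh, fw = h // grid_h, w // grid_w  # pooling factors (here 2x2: 4 pixels -> 1 cell)
    pooled = feat[:grid_h * fh, :grid_w * fw].reshape(grid_h, fh, grid_w, fw, n_bins)
    return pooled.mean(axis=(1, 3))

# Stand-in for the brightness-related features of a 200*200 down-sampled image.
feat = np.random.rand(200, 200, 8).astype(np.float32)
grid = build_bilateral_grid(feat)
print(grid.shape)  # (100, 100, 8)
```

Each grid cell thus aggregates the features of the 4 down-sampled pixels mapped to it, matching the 100*100 / 200*200 correspondence in the text.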
It should be noted that this application does not limit the order in which step 702 and step 703 are executed: step 702 may be executed first, step 703 may be executed first, or steps 702 and 703 may be executed simultaneously, which can be adjusted according to the actual application scenario.
704. Using each of the multiple guide maps in turn as a guidance condition, upsample the bilateral grid data to obtain multiple feature maps.
After the multiple guide maps are obtained, the bilateral grid data can be upsampled for each channel under the guidance of that channel's corresponding guide map, obtaining multiple feature maps corresponding to the multiple channels. During upsampling, each guide map is used to guide the selection, from the luminance-dimension information in the bilateral grid data, of the information relevant to the corresponding channel, so that a feature map is obtained for each channel.
The upsampling may use various interpolation algorithms, such as bilinear, bicubic, or trilinear interpolation, so as to obtain feature maps whose resolution is higher than that of the bilateral grid data.
Optionally, taking the processing of one of the guide maps (called the first guide map) as an example: using the first guide map as the guidance condition, the bilateral grid data is upsampled to obtain an upsampled feature; the upsampled feature is then fused with the input image to obtain a first feature map, i.e. one of the multiple feature maps. The fusion may include, but is not limited to, multiplication, concatenation, or weighted fusion.
For example, the luminance-dimension information in the bilateral grid data can be treated as coefficients, and the upsampling can be performed by interpolation. Each point in the spatial dimensions of the bilateral grid data may correspond to multiple coefficient values; during upsampling, under the guidance of the guide map, one or more of the coefficients corresponding to each point (or each group of points) are selected for interpolation, yielding an upsampled feature of the same size as the guide map. Concretely, the guidance works as follows: the value of each pixel in the guide map can be understood as a brightness level, and each point of the guide map corresponds to a point in the spatial dimensions of the bilateral grid data; according to the brightness level of each point in the guide map, the coefficient matching that brightness level is selected from the multiple coefficients of the corresponding point (or group of points) in the bilateral grid data, and interpolation is performed to obtain the upsampled feature.
Therefore, in the embodiments of this application, the bilateral grid data can be upsampled with the guide map as guidance, obtaining higher-resolution upsampled features; and because this application uses each channel's guide map to guide the upsampling of the bilateral grid data, a feature map is obtained for each channel, so that the final output image performs better and is clearer in every channel.
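The guided selection of coefficients can be sketched as follows. This is a minimal nearest-neighbour illustration with hypothetical shapes; the embodiments would interpolate (e.g. bilinearly or trilinearly) between neighbouring cells and bins rather than perform this hard lookup.

```python
import numpy as np

def guided_slice(grid, guide):
    """grid: (gh, gw, n_bins) bilateral grid; guide: (H, W) full-resolution
    guide map with values in [0, 1). For each output pixel, pick the grid
    cell it maps to spatially and the luminance bin indicated by the guide
    value (nearest neighbour for brevity)."""
    gh, gw, n_bins = grid.shape
    H, W = guide.shape
    ys = np.arange(H) * gh // H                     # spatial correspondence: rows
    xs = np.arange(W) * gw // W                     # spatial correspondence: cols
    bins = np.clip((guide * n_bins).astype(int), 0, n_bins - 1)  # brightness level
    return grid[ys[:, None], xs[None, :], bins]     # (H, W) upsampled feature

grid = np.random.rand(16, 16, 8)   # hypothetical low-resolution bilateral grid
guide = np.random.rand(64, 64)     # hypothetical full-resolution guide map
up = guided_slice(grid, guide)
print(up.shape)  # (64, 64)
```

The result has the same size as the guide map, with each pixel's coefficient chosen by that pixel's brightness level, as described above.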
Optionally, fusing the upsampled feature with the input image may include: compressing the upsampled feature to obtain a compressed feature, then fusing the compressed feature with the input image to realize feature reconstruction, obtaining the corresponding feature map. The specific fusion may include an element-wise product, weighted fusion, or the like. The compression reduces the number of channels of the upsampled feature, so that the compressed feature has fewer channels than the upsampled feature, thereby reducing the amount of data. The compression can be implemented by convolution, which amounts to further feature extraction from the upsampled feature; by adjusting the number of convolution kernels, a compressed feature with a smaller amount of data is obtained.
Therefore, in the embodiments of this application, the upsampled feature can be compressed and the compressed feature fused with the input image; fusing the compressed feature with the input image realizes feature reconstruction, yielding features that more accurately characterize, in each channel, the instances in the input image.
705. Fuse the multiple feature maps to obtain an output image.
After the multiple feature maps are obtained, they can be fused, which amounts to restoring the channels and obtaining a multi-channel output image. The number of channels of the output image is usually the same as that of the input image.
Therefore, in the embodiments of this application, features can be extracted along the dimensions of the multiple channels of the input image to obtain multiple guide maps; under the guidance of the multiple guide maps, the low-resolution bilateral grid data is upsampled to obtain multiple feature maps, and the multiple feature maps are fused into the output image. This amounts to smoothing the noise in the input image by upsampling low-resolution bilateral grid data, improving the clarity of the input image, achieving a dehazing effect or raising the contrast of the input image, and producing a clearer output image. Moreover, because this application achieves image enhancement by upsampling low-resolution bilateral grid data, and the lower the resolution of the bilateral grid data the less computation is consumed, lightweight image enhancement is achieved.
Optionally, the specific process of fusing the multiple feature maps may include: splicing the multiple feature maps to obtain a spliced image, then performing feature extraction on the spliced image at least once. When there are multiple feature extractions, they may be iterative, i.e. the current features are extracted from the features obtained the previous time, yielding at least one first feature; fusing that at least one feature with the input image yields the output image. Thus, in the embodiments of this application, the multiple feature maps can be fused by splicing. Since each feature map is obtained by upsampling the bilateral grid data under the guidance of one channel's guide map, splicing the multiple feature maps in this step restores the channels of the spliced image, giving it multiple channels, so that fusing it with the input image yields a multi-channel output image. This amounts to enhancing the input image along the dimension of each channel, obtaining an enhanced output image, improving image clarity and user experience.
Optionally, when splicing the multiple feature maps, the input image can also be spliced together with them to obtain the spliced image. The details included in the input image, such as low-frequency information like color and brightness, can thereby supplement the details of the multiple feature maps, enriching the detail of the spliced image, avoiding the over-smoothing that the preceding steps might cause, and thus improving the clarity of the output image.
The flow of the image processing method provided by this application has been described in detail above. For ease of understanding, the flow is described in more detail below using a specific application scenario as an example.
Referring to FIG. 8, a schematic flowchart of another image processing method provided by this application.
For ease of understanding, the image processing method of this application is divided into multiple steps for presentation, as shown in FIG. 8: bilateral grid generation 801, guide map generation 802, feature reconstruction 803, image reconstruction 804, and so on.
Bilateral grid generation 801: the full-resolution input image I can be downsampled to obtain a low-resolution image, and bilateral grid data g is then generated from the downsampled image; the bilateral grid data comprises at least three-dimensional data formed from the spatial dimensions together with luminance-related information.
Guide map generation 802: features are extracted from each channel of the full-resolution input image I, obtaining a guide map for each channel, such as the guide maps G1/G2/G3 shown in FIG. 8.
Feature reconstruction 803: for each channel, under the guidance of the corresponding guide map, the bilateral grid is upsampled, obtaining the feature map F1/F2/F3 corresponding to that channel.
Image reconstruction 804: the feature maps corresponding to the channels are fused, and the fused features are fused with the input image to obtain the output image O. Optionally, when fusing the feature maps, the input image may also be fused in, so that the details included in the input image (such as low-frequency features like color and brightness) are merged with the per-channel features, yielding a fused image with richer detail.
In the embodiments of this application, feature reconstruction can be performed on the basis of the bilateral grid data under the guidance of the per-channel guide maps, so that more accurate and clearer features are reconstructed for each channel of the input image; further fusing them with the input image can smooth the noise in the input image, making it clearer and achieving dehazing or contrast improvement.
Each step is described in more detail below.
1. Bilateral grid generation
The full-resolution input image I is downsampled to obtain a low-resolution image, i.e. the downsampled image; the downsampling may be done in various ways, such as bilinear, trilinear, or bicubic interpolation. Features are then extracted from the low-resolution image to obtain downsampled features, and the downsampled features are assigned to the corresponding space, yielding the bilateral grid data. That space may be preset, or obtained from the size of the input image, for example by shrinking the input image's size to obtain a two-dimensional space.
For example, the generation of bilateral grid data can be seen in FIG. 9: after the low-resolution image is obtained, it is used as the input of a multi-layer convolution; the extracted features can represent the brightness of the low-resolution image, and are then pooled by average pooling to obtain the downsampled features. A two-dimensional space can be preset, with a mapping between that space and the pixels of the input image or the low-resolution image, e.g. one or more pixels of the input image corresponding to one point of the two-dimensional space. Mapping the downsampled features to each pixel of the two-dimensional space then yields the bilateral grid data. That is, the grid data comprises at least three-dimensional data, including the two-dimensional space plus information related to the brightness of the input image, that brightness-related information being representable by the downsampled features.
2. Guide map generation
The input image I may include information of multiple channels; for example, an RGB image includes information of three channels. Information can be extracted from each channel of the input image I, obtaining a feature map for each channel, and each channel's feature map is used as a guide map to guide the subsequent upsampling of the bilateral grid data.
For an example of how the guide maps are generated, see FIG. 10: the input image I has three channels, channel 1, channel 2, and channel 3 as shown in FIG. 10, e.g. the RGB channels can be divided into R, G, and B. Feature extraction is then performed on each channel, obtaining the feature map corresponding to each channel, i.e. the feature maps G1, G2, and G3 shown in FIG. 10.
Concretely, as shown in FIG. 11, the feature extraction can be implemented by multi-layer convolution: one or more convolutional layers and one or more parametric rectified linear unit (PReLU) layers form a feature extraction network, each channel of the input image is fed to this network as input, and the feature map of each channel, i.e. the feature maps G1, G2, and G3, is output.
Therefore, in the embodiments of this application, feature extraction is performed on each channel separately, obtaining multiple guide maps, which can extract more of the detail in the input image and reduce detail loss.
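The per-channel convolution + PReLU extraction can be sketched as follows. This is a single-stage illustration with random, untrained kernels and hypothetical shapes; the actual network stacks several trained convolutional and PReLU layers.

```python
import numpy as np

def prelu(x, a=0.25):
    """Parametric ReLU: identity for positives, slope a for negatives."""
    return np.where(x > 0, x, a * x)

def conv3x3(img, kernel):
    """'Same' 3x3 convolution on a single-channel image, stride 1,
    zero padding."""
    H, W = img.shape
    p = np.pad(img, 1)
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * p[dy:dy + H, dx:dx + W]
    return out

def guide_map(channel, kernel):
    """One conv + PReLU stage applied to one input channel."""
    return prelu(conv3x3(channel, kernel))

rgb = np.random.rand(32, 32, 3)                      # hypothetical input image
kernels = [np.random.randn(3, 3) * 0.1 for _ in range(3)]
guides = [guide_map(rgb[:, :, c], k) for c, k in enumerate(kernels)]
print(len(guides), guides[0].shape)  # 3 (32, 32)
```

Each channel gets its own guide map of the same spatial size as the input, which is what the guided upsampling step requires.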
三、特征重建3. Feature reconstruction
示例性地,其中一个通道的特征重建的方式可以如图12所示,在得到引导图和双边网格数据之后,在引导图的引导下,对双边网格数据进行上采样,上采样的方式包括但不限于双线性插值、双立方插值或者三线性插值等方式,得到上采样特征U,然后对上采样特征U进行特征压缩,得到压缩特征C,然后基于输入图像I和压缩特征C来进行特征重构,如对输入图像I和压缩特征C进行逐项乘积,得到当前通道对应的特征图F。例如,输入图像I的分辨率可以是100*100,压缩特征C的分辨率也可以是100*100,输入图像I的像素点和压缩特征C的像素点一一对应,可以将输入图像I中的每个像素点的值和压缩特征C中对应的像素点的值进行相乘,得到特征图F。当然,除了逐项乘积,还可以是加权融合或者拼接等方式来实现特征重构,本申请示例性地以相乘为例进行示例性介绍,并不作为限定,具体的特征重构方式可以根据实际应用场景调整。Exemplarily, the feature reconstruction method of one of the channels can be as shown in FIG. 12. After obtaining the guide map and bilateral grid data, under the guidance of the guide map, the bilateral grid data is upsampled, and the method of upsampling is performed. Including but not limited to bilinear interpolation, bicubic interpolation or trilinear interpolation, etc., to obtain the up-sampling feature U, and then perform feature compression on the up-sampling feature U to obtain the compressed feature C, and then based on the input image I and the compressed feature C. Perform feature reconstruction, such as item-by-item product of the input image I and the compressed feature C, to obtain the feature map F corresponding to the current channel. For example, the resolution of the input image I may be 100*100, and the resolution of the compression feature C may also be 100*100. The pixels of the input image I correspond to the pixels of the compression feature C one-to-one. The value of each pixel of , and the value of the corresponding pixel in the compressed feature C are multiplied to obtain the feature map F. Of course, in addition to item-by-item multiplication, feature reconstruction can also be achieved by weighted fusion or splicing. This application exemplifies multiplication as an example for an exemplary introduction, which is not intended to be a limitation. The specific feature reconstruction method can be based on Actual application scene adjustment.
Specifically, for example, as shown in FIG. 13, the feature maps G1, G2 and G3 are respectively used as guide maps to guide the upsampling of the bilateral grid g, obtaining the upsampled feature U; the number of channels of U is then reduced by convolution to obtain the compressed feature C. The feature map F is then obtained by multiplying the compressed feature C with the input image I.
Therefore, in this embodiment of the present application, feature reconstruction can be performed by multiplying the compressed feature with the input image, which facilitates training convergence of the model and involves little computation, helping to achieve lightweight image enhancement.
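The guided upsampling of the bilateral grid described above can be sketched as a slicing operation: bilinear interpolation over the grid's spatial cells combined with linear interpolation along the luminance axis, where the guide value selects the luminance coordinate (trilinear interpolation, one of the options named in the text). For brevity the grid here stores one scalar per cell rather than a full set of coefficients; this simplification is an assumption for illustration.

```python
import numpy as np

def slice_bilateral_grid(grid, guide):
    """Upsample a low-resolution bilateral grid to full resolution.

    grid:  Gh x Gw x Gd array (spatial cells x luminance bins).
    guide: H x W guide map in [0, 1]; its value selects the luminance bin.
    """
    gh, gw, gd = grid.shape
    h, w = guide.shape
    out = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            # Continuous grid coordinates for this full-resolution pixel.
            gy = (y + 0.5) * gh / h - 0.5
            gx = (x + 0.5) * gw / w - 0.5
            gz = guide[y, x] * (gd - 1)
            y0 = int(np.clip(np.floor(gy), 0, gh - 1)); y1 = min(y0 + 1, gh - 1)
            x0 = int(np.clip(np.floor(gx), 0, gw - 1)); x1 = min(x0 + 1, gw - 1)
            z0 = int(np.floor(gz)); z1 = min(z0 + 1, gd - 1)
            fy = float(np.clip(gy - y0, 0, 1))
            fx = float(np.clip(gx - x0, 0, 1))
            fz = gz - z0
            # Trilinear blend of the 8 surrounding grid cells.
            c = 0.0
            for yi, wy in ((y0, 1 - fy), (y1, fy)):
                for xi, wx in ((x0, 1 - fx), (x1, fx)):
                    for zi, wz in ((z0, 1 - fz), (z1, fz)):
                        c += wy * wx * wz * grid[yi, xi, zi]
            out[y, x] = c
    return out

grid = np.ones((2, 2, 4))            # a constant grid upsamples to a constant U
guide = np.random.default_rng(1).random((8, 8))
U = slice_bilateral_grid(grid, guide)
F = U * np.ones((8, 8))              # element-wise product with one channel of I
```

Because the interpolation weights sum to one, a constant grid yields a constant upsampled feature, which makes the sketch easy to sanity-check.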
4. Image reconstruction
Taking an input image with three channels as an example, after the foregoing feature reconstruction, the feature maps F1, F2 and F3 are obtained; they can be concatenated to obtain a concatenated image. At least one round of iterative feature extraction is then performed on the concatenated image to obtain at least one feature. The at least one feature is concatenated and fused with the input image to obtain the final output image.
Specifically, for example, the image reconstruction may proceed as shown in FIG. 14. First, the feature maps F1, F2 and F3 are concatenated to obtain a concatenated image C1; optionally, when concatenating F1, F2 and F3, the input image may also be concatenated in, so that the concatenated image carries richer information and detail loss is avoided. At least one round of iterative feature extraction is performed on C1, and the feature obtained by each extraction is recorded; each extraction may be implemented by a basic unit (block) formed by stacking one or more convolutions, and each block may output one feature map. The extracted features are concatenated to obtain a concatenated feature C2, which is then fused by one block to obtain a fused feature M; the fused feature M is fused with the input image to obtain the full-resolution enhanced output image O. The fusion of M and the input image may be weighted fusion or an element-wise product, and may be adjusted according to the actual application scenario.
Therefore, in this embodiment of the present application, the feature maps of the channels can be concatenated to restore each channel of the image. During concatenation, the input image can be concatenated in to supplement the details of the feature maps, so that the resulting concatenated image is richer in detail. Moreover, by fusing the feature M with the input image through multiplication, the input image can be dehazed, yielding an enhanced image.
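The concatenate-then-fuse stage above can be sketched as follows. The stacked convolutional blocks are stood in for by a single 1×1 mixing matrix, an assumption made only to keep the sketch self-contained; the final fusion uses the element-wise product named in the text.

```python
import numpy as np

def reconstruct(feature_maps, image):
    """Sketch of the image reconstruction stage: concatenate the per-channel
    feature maps plus the input image (the optional variant in the text),
    mix the channels with a stand-in "block", and fuse the fused feature M
    with the input by element-wise product to get the output image O."""
    c1 = np.concatenate(feature_maps + [image], axis=2)   # H x W x (3 + C)
    n = c1.shape[2]
    mix = np.full((n, image.shape[2]), 1.0 / n)           # placeholder block weights
    m = c1 @ mix                                          # fused feature M, H x W x C
    return m * image                                      # output image O

rng = np.random.default_rng(2)
i = rng.random((4, 4, 3))
f1, f2, f3 = (i[:, :, [k]] for k in range(3))             # toy per-channel maps
o = reconstruct([f1, f2, f3], i)
```

In a trained network the mixing weights would be learned; here they merely demonstrate that the fusion preserves the full input resolution and channel count.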
The flow of the image processing method provided by this application has been described in detail above. The following supplements the description with the neural network provided by this application for executing the image processing method, as well as the training method of that neural network; the neural network can be used to perform the steps of the methods corresponding to FIGS. 7-14.
Exemplarily, refer to FIG. 15, a schematic structural diagram of a neural network provided by this application.
Corresponding to the aforementioned FIG. 8, the network can be divided into multiple modules, such as the bilateral grid generation network, the guide map generation network, the feature reconstruction network and the image reconstruction network shown in FIG. 15.
The bilateral grid generation network may be used to downsample the full-resolution input image I to obtain a low-resolution image, and then generate bilateral grid data based on the downsampled image; the bilateral grid data includes information of the spatial dimensions and luminance-related information, forming data of at least three dimensions.
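The grid-generation step can be sketched as follows. A real network would predict the grid contents with convolutions; here, as a labeled simplification, the downsampled luminance is simply splatted into (spatial cell, luminance bin) slots to show the three-dimensional layout of the bilateral grid data.

```python
import numpy as np

def build_bilateral_grid(image, gh=2, gw=2, gd=4):
    """Illustrative bilateral grid generation: reduce the full-resolution
    input to a gh x gw x gd grid (spatial cells x luminance bins), where
    each cell accumulates the average luminance of the pixels falling into
    that (spatial, luminance) slot."""
    h, w, _ = image.shape
    lum = image.mean(axis=2)                 # simple luminance proxy
    grid = np.zeros((gh, gw, gd))
    count = np.zeros((gh, gw, gd))
    for y in range(h):
        for x in range(w):
            cy, cx = y * gh // h, x * gw // w
            cz = min(int(lum[y, x] * gd), gd - 1)
            grid[cy, cx, cz] += lum[y, x]
            count[cy, cx, cz] += 1
    return grid / np.maximum(count, 1)       # empty cells stay zero

img = np.random.default_rng(3).random((8, 8, 3))
g = build_bilateral_grid(img)
```

The resulting grid resolution (2×2×4 here) is far below the input resolution, which is what makes the later slicing step cheap.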
The guide map generation network is used to extract features from each channel of the full-resolution input image I, obtaining a guide map corresponding to each channel, such as the guide maps G1/G2/G3 shown in FIG. 8.
The feature reconstruction network is used to upsample the bilateral grid for each channel under the guidance of the corresponding guide map, obtaining the feature map F1/F2/F3 corresponding to each channel.
The image reconstruction network is used to fuse the feature maps corresponding to the channels, and to fuse the fused features with the input image to obtain the output image O. Optionally, when fusing the feature maps, the input image may also be fused in, so that the details included in the input image are combined with the features of each channel, yielding a fused image with richer detail.
Based on the structure of this neural network, the training method of the neural network is introduced below.
Referring to FIG. 16, a schematic flowchart of a neural network training method provided by this application is as follows.
1601. Obtain a training set.
The training set includes multiple image samples and a label corresponding to each image sample, and each image sample includes information of multiple channels.
Specifically, the training set may be obtained in various ways. For example, if this application is executed by the training device 120 shown in the aforementioned FIG. 2, the training set may be information extracted from the database 130, or data sent by the client device 140.
For example, the training set may include multiple sample pairs, each having an image sample and a corresponding ground-truth image (i.e., the label), where the ground-truth image is clearer than the image sample. For instance, after a clear multi-channel image is captured, it is taken as the ground-truth image and then degraded, for example by adding haze or reducing contrast, to obtain the image sample, which together with the ground-truth image forms a sample pair.
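One way to synthesize such a pair is sketched below. The text only says haze may be added or contrast reduced; the atmospheric scattering model I = J·t + A·(1 − t) used here (t: transmission, A: airlight) is an assumption chosen for illustration, not the patent's specified degradation.

```python
import numpy as np

def hazify(clear, t=0.6, airlight=0.9):
    """Degrade a clear ground-truth image J into a hazy training sample I
    using the atmospheric scattering model I = J*t + A*(1 - t)."""
    return clear * t + airlight * (1.0 - t)

j = np.random.default_rng(4).random((4, 4, 3))   # clear ground-truth image
sample = hazify(j)                               # hazy image sample
pair = (sample, j)                               # (image sample, ground-truth label)
```

Since the model is an affine map with slope t < 1, the sample's contrast (standard deviation) shrinks by exactly the factor t, matching the "reduce contrast" description.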
1602. Perform at least one round of iterative training on the neural network using the training set, to obtain a trained neural network.
The image samples in the training set can be used as the input of the neural network to perform at least one round of iterative training, obtaining the trained neural network. The trained neural network can be used for image enhancement, for example to implement the steps of the methods corresponding to FIGS. 7-14.
The following takes one iteration as an example to illustrate the training process, as in steps 16021-16026 below.
16021. Extract features from the information of the multiple channels of the input image, respectively, to obtain multiple guide maps.
During one round of iterative training, one or more image samples in the training set may be used as the input of the neural network; this application exemplarily takes a single input image as an example.
16022. Obtain bilateral grid data corresponding to the input image.
16023. Using each of the multiple guide maps as a guidance condition, upsample the bilateral grid data to obtain multiple feature maps.
16024. Fuse the multiple feature maps to obtain an output image.
Steps 16021-16023 are similar to the processes of the aforementioned steps 701-705, and are not repeated here.
16025. Update the neural network according to the output image and the corresponding ground-truth image, to obtain the neural network updated in the current round.
Various loss functions can be used, such as mean squared error, cross-entropy, logarithmic or exponential loss functions. This amounts to using the function to measure the deviation between the output image and the ground-truth image.
After the loss value is computed, the gradient of the parameters of the neural network can be calculated based on the loss value; the gradient represents the derivative used when updating the parameters, so the parameters of the neural network can be updated according to this gradient, yielding the updated network.
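The loss-then-gradient update of step 16025 can be shown on a toy model. The one-parameter "network" out = param · x, the MSE loss, and the learning rate below are all illustrative assumptions; the point is only the mechanics of computing a loss, deriving its gradient, and stepping the parameter against it.

```python
import numpy as np

def train_step(param, x, target, lr=0.05):
    """One gradient-descent update: MSE loss against the ground truth,
    analytic gradient dLoss/dparam, and a step opposite the gradient."""
    out = param * x
    loss = float(np.mean((out - target) ** 2))
    grad = float(np.mean(2.0 * (out - target) * x))
    return param - lr * grad, loss

x = np.array([1.0, 2.0, 3.0])
target = 2.0 * x                     # ground truth generated with param = 2
p, losses = 0.0, []
for _ in range(20):
    p, loss = train_step(p, x, target)
    losses.append(loss)
```

After a handful of updates the parameter approaches the value that generated the ground truth and the loss falls monotonically, which is the behavior the convergence check in step 16026 then tests for.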
16026. Determine whether the convergence condition is met; if yes, perform step 1603; if not, perform step 16021.
After the updated neural network is obtained, it can be determined whether the convergence condition is met. If it is met, the updated neural network can be output, completing the training of the neural network. If it is not met, training can continue, i.e., step 16021 is repeated until the convergence condition is satisfied.
The convergence condition may include one or more of the following: the number of training iterations of the neural network reaches a preset number; the output accuracy of the neural network exceeds a preset accuracy value; the average accuracy of the neural network exceeds a preset average value; or the training duration of the neural network exceeds a preset duration.
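The four listed conditions combine as a simple disjunction, which can be written as a predicate; the concrete thresholds below are illustrative placeholders, not values from the patent.

```python
import time

def converged(epoch, accuracy, avg_accuracy, started_at,
              max_epochs=100, acc_thresh=0.95, avg_thresh=0.9,
              max_seconds=3600.0):
    """Step 16026 as a predicate: training stops as soon as any one of the
    listed convergence conditions holds."""
    return (epoch >= max_epochs                                 # iteration budget
            or accuracy > acc_thresh                            # output accuracy
            or avg_accuracy > avg_thresh                        # average accuracy
            or time.monotonic() - started_at > max_seconds)     # wall-clock budget

t0 = time.monotonic()
keep_going = not converged(epoch=3, accuracy=0.5, avg_accuracy=0.4, started_at=t0)
stop = converged(epoch=100, accuracy=0.5, avg_accuracy=0.4, started_at=t0)
```

Because the conditions are or-ed, whichever budget is exhausted first ends training.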
In this embodiment of the present application, when fusing the upsampled feature with the input image, the upsampled feature can be compressed to obtain a compressed feature, and the compressed feature and the input image are then multiplied element by element. Through this multiplication, the information in the input image is quickly fed back into the feature map, accelerating the convergence of the neural network and efficiently obtaining a trained network.
1603. Output the updated neural network.
After the convergence condition is satisfied, the trained neural network can be output. The network can be deployed in a terminal or a server, such as a mobile phone, a camera, a smart vehicle or a surveillance device, e.g., in scenarios such as mobile-phone photography, camera photography, autonomous driving or smart cities, to enhance captured images and improve their clarity. Especially in scenarios with demanding real-time and computational-overhead requirements, image enhancement can be achieved through the lightweight model provided by this application.
Therefore, through the method provided by this application, a convergent neural network can be obtained efficiently and applied in various lightweight scenarios, especially on weak computing devices, where it can be deployed on the computing nodes of the relevant device to enhance images degraded by haze, yielding clear results with normal colors, or to improve image contrast, achieving an image enhancement effect.
To further facilitate understanding of the image enhancement effect of the method provided by this application, the enhancement effect achieved by this method is described below in comparison with some common approaches.
First, some common image enhancement approaches include: a fast single image haze removal algorithm using color attenuation prior (CAP); non-local image dehazing (NLD); single image haze removal using dark channel prior (DCP); efficient image dehazing with boundary constraint and contextual regularization (BCCR); an end-to-end system for single image haze removal (DehazeNet); the all-in-one dehazing network (AOD); single image dehazing via multiscale convolutional neural networks (MSCNN); robust haze removal based on patch map for single images (PMS); the attention-based multi-scale network for image dehazing (GridDehaze); domain adaptation for image dehazing (DA); and the multi-scale boosted dehazing network with dense feature fusion (MSBDN).
Exemplarily, the running time and peak signal-to-noise ratio (PSNR) achieved by the above image enhancement approaches and by the image processing method provided by this application can be as shown in FIG. 17. Clearly, the image processing method provided by this application can obtain images with a higher PSNR at a lower running time, achieving a lightweight model with a better dehazing effect and the ability to process 4K images in real time.
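PSNR, the metric used in this comparison, is a short computation worth making concrete; the standard definition PSNR = 10·log10(peak² / MSE) is sketched below (the images and peak value are illustrative).

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = float(np.mean((ref - test) ** 2))
    return float("inf") if mse == 0 else 10.0 * float(np.log10(peak ** 2 / mse))

ref = np.zeros((4, 4))
noisy = ref + 0.1        # uniform error of 0.1 -> MSE = 0.01 -> 20 dB
```

A uniform error of 0.1 on a unit-peak image gives an MSE of 0.01 and hence 20 dB, a handy sanity check when reading PSNR tables.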
In more detail, Table 1 below compares the output results of the above common dehazing methods and of the image processing method provided by this application on the O-HAZE dataset, using PSNR and the structural similarity index measure (SSIM) for comparison.
Table 1
Obviously, the image processing method provided by this application can obtain images with a better dehazing effect while achieving high PSNR and high SSIM and consuming less computation.
More intuitively, comparisons of the image enhancement effects of this application and some common approaches can be as shown in FIG. 18A, FIG. 18B and FIG. 18C. Clearly, the image processing method provided by this application achieves a better dehazing effect, with better PSNR and SSIM, greatly improving image clarity and the user experience.
The flow of the method and the neural network provided by this application have been described in detail above. The structure of the apparatus provided by this application for performing the steps of the foregoing methods is described below.
Referring to FIG. 19, a schematic structural diagram of an image processing apparatus provided by this application.
The image processing apparatus may include:
a transceiver module 1901, configured to obtain an input image, where the input image includes information of multiple channels;
a feature extraction module 1902, configured to extract features from the information of the multiple channels of the input image, respectively, to obtain multiple guide maps;
a bilateral generation module 1903, configured to obtain bilateral grid data corresponding to the input image, where the bilateral grid data includes data formed by information of a luminance dimension arranged in a spatial dimension, the information of the luminance dimension is obtained according to features extracted from the input image, the resolution of the bilateral grid data is lower than the resolution of the input image, and the spatial dimension is a preset space or a space determined according to the input image;
a guidance module 1904, configured to upsample the bilateral grid data using each of the multiple guide maps as a guidance condition, to obtain multiple feature maps; and
a fusion module 1905, configured to fuse the multiple feature maps to obtain an output image.
In a possible implementation, the guidance module 1904 may be specifically configured to: upsample the bilateral grid data using a first guide map as the guidance condition to obtain an upsampled feature, where the first guide map is any one of the multiple guide maps; and fuse the upsampled feature with the input image to obtain a first feature map, where the first feature map is included in the multiple feature maps.
In a possible implementation, the guidance module 1904 may be specifically configured to: compress the upsampled feature to obtain a compressed feature, where the number of channels of the compressed feature is less than the number of channels of the upsampled feature; and take the element-wise product of the compressed feature and the input image to obtain the first feature map.
In a possible implementation, the bilateral generation module 1903 may be specifically configured to: downsample the input image to obtain a downsampled image; and extract features from the downsampled image to obtain downsampled features, where the bilateral grid data includes the downsampled features.
In a possible implementation, the fusion module 1905 may be specifically configured to: concatenate the multiple feature maps to obtain a concatenated image; perform at least one round of feature extraction on the concatenated image to obtain at least one first feature; and fuse the at least one first feature with the input image to obtain the output image.
In a possible implementation, the fusion module 1905 may be specifically configured to concatenate the multiple feature maps and the input image to obtain the concatenated image.
Referring to FIG. 20, a schematic structural diagram of a neural network training apparatus provided by this application is as follows.
The training apparatus may include:
an obtaining module 2001, configured to obtain a training set, where the training set includes multiple image samples and a ground-truth image corresponding to each image sample, and each image sample includes information of multiple channels; and
a training module 2002, configured to perform at least one round of iterative training on the neural network using the training set, to obtain a trained neural network.
In any one round of iterative training, the neural network extracts features from the information of the multiple channels of the input image, respectively, to obtain multiple guide maps; obtains bilateral grid data corresponding to the input image; upsamples the bilateral grid data using each of the multiple guide maps as a guidance condition to obtain multiple feature maps; and fuses the multiple feature maps to obtain an output image. The neural network is updated according to the output image and the ground-truth image corresponding to the input image, to obtain the neural network updated in the current round. The bilateral grid data includes data formed by information of a luminance dimension arranged in a preset space, the information of the luminance dimension is obtained according to features extracted from the input image, and the resolution of the bilateral grid data is lower than the resolution of the input image.
In a possible implementation, the training module 2002 may be specifically configured to: upsample the bilateral grid data using a first guide map as the guidance condition to obtain an upsampled feature, where the first guide map is any one of the multiple guide maps; and fuse the upsampled feature with the input image to obtain a first feature map, where the first feature map is included in the multiple feature maps.
In a possible implementation, the training module 2002 may be specifically configured to: compress the upsampled feature to obtain a compressed feature, where the number of channels of the compressed feature is less than the number of channels of the upsampled feature; and take the element-wise product of the compressed feature and the input image to obtain the first feature map.
In a possible implementation, the training module 2002 may be specifically configured to: downsample the input image to obtain a downsampled image; and extract features from the downsampled image to obtain downsampled features, where the bilateral grid data includes the downsampled features.
In a possible implementation, the training module 2002 may be specifically configured to: concatenate the multiple feature maps to obtain a concatenated image; perform at least one round of feature extraction on the concatenated image to obtain at least one first feature; and fuse the at least one first feature with the input image to obtain the output image.
In a possible implementation, the training module 2002 may be specifically configured to concatenate the multiple feature maps and the input image to obtain the concatenated image.
Referring to FIG. 21, a schematic structural diagram of another image processing apparatus provided by this application is as follows.
The image processing apparatus may include a processor 2101 and a memory 2102. The processor 2101 and the memory 2102 are interconnected through a line, and the memory 2102 stores program instructions and data.
The memory 2102 stores the program instructions and data corresponding to the steps in the aforementioned FIGS. 7-14.
The processor 2101 is configured to perform the method steps performed by the image processing apparatus shown in any of the embodiments of FIGS. 7-14.
Optionally, the image processing apparatus may further include a transceiver 2103, configured to receive or send data.
An embodiment of this application further provides a computer-readable storage medium storing a program that, when run on a computer, causes the computer to perform the steps of the methods described in the embodiments shown in FIGS. 7-14.
Optionally, the aforementioned image processing apparatus shown in FIG. 21 is a chip.
Referring to FIG. 22, a schematic structural diagram of another training apparatus provided by this application is as follows.
The training apparatus may include a processor 2201 and a memory 2202. The processor 2201 and the memory 2202 are interconnected through a line, and the memory 2202 stores program instructions and data.
The memory 2202 stores the program instructions and data corresponding to the steps in the aforementioned FIGS. 15-16.
The processor 2201 is configured to perform the method steps performed by the training apparatus shown in any of the embodiments of FIGS. 15-16.
Optionally, the training apparatus may further include a transceiver 2203, configured to receive or send data.
An embodiment of this application further provides a computer-readable storage medium storing a program that, when run on a computer, causes the computer to perform the steps of the methods described in the embodiments shown in FIGS. 15-16.
Optionally, the aforementioned training apparatus shown in FIG. 22 is a chip.
An embodiment of this application further provides an image processing apparatus, which may also be called a digital processing chip or a chip. The chip includes a processing unit and a communication interface; the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the method steps shown in any of the embodiments of FIGS. 7-14.
An embodiment of this application further provides a training apparatus, which may also be called a digital processing chip or a chip. The chip includes a processing unit and a communication interface; the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the steps of the methods shown in any of the embodiments of FIGS. 15-16.
An embodiment of this application further provides a digital processing chip. The digital processing chip integrates circuits for implementing the functions of the processor 2101 or the processor 2201 described above, and one or more interfaces. When a memory is integrated in the digital processing chip, the chip can complete the method steps of any one or more of the foregoing embodiments. When no memory is integrated in the digital processing chip, it can be connected to an external memory through a communication interface, and implements the method steps of the foregoing embodiments according to program code stored in the external memory.
An embodiment of this application further provides a computer program product that, when run on a computer, causes the computer to perform the steps of the methods described in any of the embodiments of FIGS. 7-14 or FIGS. 15-16.
The image processing apparatus or the training apparatus provided in the embodiments of this application may be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit. The processing unit can execute computer-executable instructions stored in a storage unit, so that the chip performs the methods described in the embodiments shown in FIGS. 7-14. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip within the radio access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM), or the like.
Specifically, the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
Exemplarily, refer to FIG. 23, a schematic structural diagram of a chip provided by an embodiment of this application. The chip may be embodied as a neural network processing unit NPU 230; the NPU 230 is mounted to a host CPU as a coprocessor, and the host CPU allocates tasks. The core part of the NPU is the operation circuit 2303; the controller 2304 controls the operation circuit 2303 to extract matrix data from the memory and perform multiplication.
In some implementations, the operation circuit 2303 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 2303 is a two-dimensional systolic array. The operation circuit 2303 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 2303 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 2302 and buffers it on each PE in the operation circuit. The operation circuit fetches the data of matrix A from the input memory 2301 and performs a matrix operation with matrix B; partial or final results of the resulting matrix are stored in the accumulator 2308.
The unified memory 2306 is used to store input data and output data. The weight data is transferred to the weight memory 2302 directly through a direct memory access controller (DMAC) 2305; the input data is also transferred to the unified memory 2306 through the DMAC.
A bus interface unit (BIU) 2310 is used for the interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 2309.
The bus interface unit (BIU) 2310 is used by the instruction fetch buffer 2309 to obtain instructions from an external memory, and is also used by the storage unit access controller 2305 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 2306, to transfer the weight data to the weight memory 2302, or to transfer the input data to the input memory 2301.
The vector calculation unit 2307 includes multiple operation processing units that, when necessary, further process the output of the operation circuit, for example by vector multiplication, vector addition, exponential operations, logarithmic operations and magnitude comparison. It is mainly used for non-convolutional/non-fully-connected layer computations in the neural network, such as batch normalization, pixel-level summation and upsampling of feature planes.
In some implementations, the vector calculation unit 2307 can store the processed output vector in the unified memory 2306. For example, the vector calculation unit 2307 may apply a linear function and/or a nonlinear function to the output of the operation circuit 2303, for example linear interpolation of the feature planes extracted by the convolutional layers, or a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 2307 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 2303, for example for use in subsequent layers of the neural network.
控制器2304连接的取指存储器(instruction fetch buffer)2309,用于存储控制器2304使用的指令;an instruction fetch buffer 2309 connected to the controller 2304 for storing instructions used by the controller 2304;
统一存储器2306,输入存储器2301,权重存储器2302以及取指存储器2309均为On-Chip存储器。外部存储器私有于该NPU硬件架构。The unified memory 2306, the input memory 2301, the weight memory 2302 and the instruction fetch memory 2309 are all On-Chip memories. External memory is private to the NPU hardware architecture.
The operation at each layer of the recurrent neural network may be performed by the operation circuit 2303 or the vector calculation unit 2307.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling execution of the programs of the methods of any of the embodiments of FIG. 7 to FIG. 14 or FIG. 15 to FIG. 16.
It should further be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, in the drawings of the apparatus embodiments provided in this application, a connection relationship between modules indicates a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
From the description of the foregoing embodiments, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, or, of course, by dedicated hardware, including an application-specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function may take many forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For this application, however, a software implementation is in most cases the better choice. Based on such an understanding, the technical solutions of this application, in essence or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over coaxial cable, optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, over infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
The terms "first", "second", "third", "fourth", and so on (if any) in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so termed may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
Finally, it should be noted that the foregoing are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110293366.2A CN113284055B (en) | 2021-03-18 | 2021-03-18 | Image processing method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110293366.2A CN113284055B (en) | 2021-03-18 | 2021-03-18 | Image processing method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113284055A true CN113284055A (en) | 2021-08-20 |
| CN113284055B CN113284055B (en) | 2024-09-27 |
Family
ID=77276003
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110293366.2A Active CN113284055B (en) | 2021-03-18 | 2021-03-18 | Image processing method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113284055B (en) |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113628144A (en) * | 2021-08-25 | 2021-11-09 | 厦门美图之家科技有限公司 | Portrait restoration method and device, electronic equipment and storage medium |
| CN114792347A (en) * | 2022-03-09 | 2022-07-26 | 同济大学 | An image compression method based on fusion of multi-scale space and context information |
| CN114998117A (en) * | 2022-05-05 | 2022-09-02 | 武汉理工大学 | Port ship intelligent identification method and system combined with image dehazing and denoising |
| CN116228580A (en) * | 2023-03-08 | 2023-06-06 | 河海大学 | Image defogging method, system and embedded device |
| CN116310276A (en) * | 2023-05-24 | 2023-06-23 | 泉州装备制造研究所 | Target detection method, device, electronic equipment and storage medium |
| CN117575969A (en) * | 2023-10-31 | 2024-02-20 | 广州成至智能机器科技有限公司 | Infrared image quality enhancement method and device, electronic equipment and storage medium |
| CN118761921A (en) * | 2024-09-06 | 2024-10-11 | 杭州星犀科技有限公司 | Lighting simulation network model and lighting simulation image processing method |
| WO2025180090A1 (en) * | 2024-03-01 | 2025-09-04 | 华为技术有限公司 | Data processing method and apparatus |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140241643A1 (en) * | 2009-01-20 | 2014-08-28 | Entropic Communications, Inc. | Image processing using a bilateral grid |
| CN105913384A (en) * | 2016-03-21 | 2016-08-31 | 温州大学 | Real-time weighted median filtering method based on bilateral grid |
| CN110033003A (en) * | 2019-03-01 | 2019-07-19 | 华为技术有限公司 | Image partition method and image processing apparatus |
| CN110634147A (en) * | 2019-09-19 | 2019-12-31 | 延锋伟世通电子科技(上海)有限公司 | Image matting method based on bilateral boot up-sampling |
2021
- 2021-03-18 CN CN202110293366.2A patent/CN113284055B/en active Active
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140241643A1 (en) * | 2009-01-20 | 2014-08-28 | Entropic Communications, Inc. | Image processing using a bilateral grid |
| CN105913384A (en) * | 2016-03-21 | 2016-08-31 | 温州大学 | Real-time weighted median filtering method based on bilateral grid |
| CN110033003A (en) * | 2019-03-01 | 2019-07-19 | 华为技术有限公司 | Image partition method and image processing apparatus |
| CN110634147A (en) * | 2019-09-19 | 2019-12-31 | 延锋伟世通电子科技(上海)有限公司 | Image matting method based on bilateral boot up-sampling |
Non-Patent Citations (1)
| Title |
|---|
| ZHUORAN ZHENG, ET AL.: "Ultra-High-Definition Image Dehazing via Multi-Guided Bilateral Learning", 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 25 June 2021 (2021-06-25), pages 16180 - 16189, XP034008838, DOI: 10.1109/CVPR46437.2021.01592 * |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113628144A (en) * | 2021-08-25 | 2021-11-09 | 厦门美图之家科技有限公司 | Portrait restoration method and device, electronic equipment and storage medium |
| CN114792347A (en) * | 2022-03-09 | 2022-07-26 | 同济大学 | An image compression method based on fusion of multi-scale space and context information |
| CN114998117A (en) * | 2022-05-05 | 2022-09-02 | 武汉理工大学 | Port ship intelligent identification method and system combined with image dehazing and denoising |
| CN116228580A (en) * | 2023-03-08 | 2023-06-06 | 河海大学 | Image defogging method, system and embedded device |
| CN116310276A (en) * | 2023-05-24 | 2023-06-23 | 泉州装备制造研究所 | Target detection method, device, electronic equipment and storage medium |
| CN116310276B (en) * | 2023-05-24 | 2023-08-08 | 泉州装备制造研究所 | Target detection method, target detection device, electronic equipment and storage medium |
| CN117575969A (en) * | 2023-10-31 | 2024-02-20 | 广州成至智能机器科技有限公司 | Infrared image quality enhancement method and device, electronic equipment and storage medium |
| CN117575969B (en) * | 2023-10-31 | 2024-05-07 | 广州成至智能机器科技有限公司 | Infrared image quality enhancement method and device, electronic equipment and storage medium |
| WO2025180090A1 (en) * | 2024-03-01 | 2025-09-04 | 华为技术有限公司 | Data processing method and apparatus |
| CN118761921A (en) * | 2024-09-06 | 2024-10-11 | 杭州星犀科技有限公司 | Lighting simulation network model and lighting simulation image processing method |
| CN118761921B (en) * | 2024-09-06 | 2025-04-01 | 杭州星犀科技有限公司 | Lighting simulation network model and lighting simulation image processing method |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113284055B (en) | 2024-09-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110532871B (en) | Image processing method and device | |
| CN113284055B (en) | Image processing method and device | |
| CN113284054B (en) | Image enhancement method and image enhancement device | |
| CN113011562B (en) | Model training method and device | |
| WO2022042713A1 (en) | Deep learning training method and apparatus for use in computing device | |
| CN112070664B (en) | Image processing method and device | |
| CN113066017B (en) | An image enhancement method, model training method and device | |
| CN111797983A (en) | A kind of neural network construction method and device | |
| WO2021018163A1 (en) | Neural network search method and apparatus | |
| WO2022116856A1 (en) | Model structure, model training method, and image enhancement method and device | |
| CN112990211A (en) | Neural network training method, image processing method and device | |
| CN112446270A (en) | Training method of pedestrian re-identification network, and pedestrian re-identification method and device | |
| CN114359289B (en) | Image processing method and related device | |
| CN112257759A (en) | Method and device for image processing | |
| CN112598597A (en) | Training method of noise reduction model and related device | |
| WO2022001372A1 (en) | Neural network training method and apparatus, and image processing method and apparatus | |
| CN109993707A (en) | Image denoising method and device | |
| CN113065645A (en) | Twin attention network, image processing method and device | |
| CN112529904A (en) | Image semantic segmentation method and device, computer readable storage medium and chip | |
| WO2022165722A1 (en) | Monocular depth estimation method, apparatus and device | |
| CN113066018B (en) | Image enhancement method and related device | |
| CN112862828A (en) | Semantic segmentation method, model training method and device | |
| WO2022179606A1 (en) | Image processing method and related apparatus | |
| CN114708172A (en) | Image fusion method, computer program product, storage medium, and electronic device | |
| CN115081588A (en) | Neural network parameter quantification method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |