WO2024244842A1

WO2024244842A1 - Repair model training method, video repair method, device, and medium

Info

Publication number: WO2024244842A1
Application number: PCT/CN2024/089864
Authority: WO
Inventors: 李岁缠
Original assignee: Alibaba Cloud Computing Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2023-05-26
Filing date: 2024-04-25
Publication date: 2024-12-05
Anticipated expiration: 2025-11-26
Also published as: CN116681607A

Abstract

Provided are a repair model training method, a video repair method, a device, and a medium. The repair model training method comprises: obtaining a sample face image and a sample non-face image, and attaching the sample face image to the sample non-face image to obtain a target sample image and an image mask corresponding to the target sample image (S201), the image mask being used for representing an area in the target sample image corresponding to the sample face image; degrading the target sample image to obtain a degraded sample image (S202); and using the degraded sample image to train a repair model on the basis of a preset loss function, wherein the loss function comprises a global loss function, and a local loss function which corresponds to the image mask (S203).

Description

Restoration Model Training Method, Video Restoration Method, Device, and Medium

This application claims priority to Chinese Patent Application No. 2023106121136, filed with the China Patent Office on May 26, 2023 and entitled "Training Method for Restoration Model, Video Restoration Method, Device, and Medium", the entire contents of which are incorporated herein by reference.

Technical Field

Embodiments of the present application relate to the field of computer technology, and in particular to a training method for a restoration model, a video restoration method, a device, and a medium.

Background Art

As the multimedia industry develops, more and more users watch videos and expect ever-higher clarity. Typical audio and video playback devices therefore use a restoration model to enhance video quality during playback.

However, video restoration solutions in the related art are largely single-purpose. For example, some restoration models can remove compression-coding noise and artifacts from videos. Others extract face images from video frames, restore the faces separately, and then merge the restored faces back into the frames.

However, when these related-art solutions must remove artifacts and restore faces at the same time, multiple models have to work together. This makes model deployment and inference for the restoration process costly.

Summary of the Invention

In view of this, embodiments of the present application provide a training scheme for a restoration model and a video restoration scheme, which at least partially solve the above problems.

According to a first aspect of the embodiments of the present application, a training method for a restoration model is provided. The method includes the following steps:
1. Obtain a sample face image and a sample non-face image. Paste the sample face image into the sample non-face image to obtain a target sample image and an image mask corresponding to the target sample image. The image mask represents the region of the target sample image that corresponds to the sample face image.
2. Degrade the target sample image to obtain a degraded sample image.
3. Use the degraded sample image to train the restoration model based on a preset loss function. The loss function includes a global loss function and a local loss function corresponding to the image mask.

According to a second aspect of the embodiments of the present application, a video restoration method is provided. The method includes: obtaining a video frame to be restored; and restoring the video frame with a restoration model trained by the method described above.

According to a third aspect of the embodiments of the present application, an electronic device is provided. The device includes a processor, a memory, a communication interface, and a communication bus. The processor, the memory, and the communication interface communicate with one another via the communication bus. The memory stores at least one executable instruction, which causes the processor to perform operations corresponding to the method according to any one of the above aspects.

According to a fourth aspect of the embodiments of the present application, a computer storage medium is provided. The medium stores a computer program that, when executed by a processor, implements the method according to any one of the above aspects.

BRIEF DESCRIPTION OF THE DRAWINGS

To describe the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings needed to describe the embodiments or the related art are briefly introduced below. The drawings described below show only some embodiments of the present application. A person of ordinary skill in the art may derive other drawings from them.

FIG. 1 is a schematic diagram of an exemplary system to which the restoration model training method or the video restoration method of the embodiments of the present application is applicable;

FIG. 2A is a flowchart of the steps of a restoration model training method according to an embodiment of the present application;

FIG. 2B is a schematic diagram of a target sample image and its corresponding image mask in the embodiment shown in FIG. 2A;

FIG. 3 is a schematic structural diagram of a second-order degradation pipeline according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a restoration model according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of a video restoration method according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

DETAILED DESCRIPTION

To help those skilled in the art better understand the technical solutions in the embodiments of the present application, these solutions are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person of ordinary skill in the art obtains based on these embodiments fall within the protection scope of the embodiments of the present application.

Specific implementations of the embodiments of the present application are further described below with reference to the accompanying drawings.

FIG. 1 shows an exemplary system to which the restoration model training method or the video restoration method of the embodiments of the present application is applicable. As shown in FIG. 1, the system 100 may include a cloud server 102, a communication network 104, and/or one or more user devices 106. FIG. 1 illustrates multiple user devices.

The cloud server 102 may be any suitable device for storing information, data, programs, and/or any other suitable type of content. Examples include, but are not limited to, distributed storage system devices, server clusters, and cloud computing server clusters. In some embodiments, the cloud server 102 may perform any suitable function. For example, in some embodiments the cloud server 102 may store videos. In some embodiments it may be used to train the restoration model, and in some embodiments it may be used to deploy the restoration model.

In some embodiments, the communication network 104 may be any suitable combination of one or more wired and/or wireless networks. For example, the communication network 104 may include any one or more of the following: the Internet, an intranet, a wide area network (WAN), a local area network (LAN), a wireless network, a digital subscriber line (DSL) network, a frame relay network, an asynchronous transfer mode (ATM) network, a virtual private network (VPN), and/or any other suitable communication network. The user devices 106 may connect to the communication network 104 via one or more communication links (e.g., communication link 112). The communication network 104 may be linked to the cloud server 102 via one or more communication links (e.g., communication link 114). A communication link may be any link suitable for transmitting data between the user devices 106 and the cloud server 102. Examples include a network link, a dial-up link, a wireless link, a hardwired link, any other suitable communication link, or any suitable combination of such links.

The user devices 106 may include any one or more user devices suitable for presenting images or videos, of any suitable type. For example, in some embodiments, the user devices 106 may include a mobile device, a tablet computer, a laptop computer, a desktop computer, a wearable computer, a game console, a media player, an in-vehicle entertainment system, and/or any other suitable type of user device.

Based on the above system, an embodiment of the present application provides a training method for a restoration model, which is described below through several embodiments.

FIG. 2A shows a flowchart of the steps of a restoration model training method according to an embodiment of the present application. The method includes:

S201: Obtain a sample face image and a sample non-face image. Paste the sample face image into the sample non-face image to obtain a target sample image and an image mask corresponding to the target sample image.

In this embodiment, the sample face image may be any image that contains a face, obtained in any manner. For example, face data covering various poses may be obtained from an open-source database, or face images may be collected with a face recognition model. All such sources fall within the protection scope of the present application.

The sample non-face image may be any image, obtained in any manner, that does not contain a face. In one implementation, the sample non-face image may depict a scene in which a person could plausibly appear.

In this embodiment, the sample face image may be superimposed at a random position on the sample non-face image to obtain the target sample image. The image mask may then be generated from the position at which the sample face image was superimposed. The image mask represents the region of the target sample image that corresponds to the sample face image.

In FIG. 2B, the left side shows a target sample image obtained by superimposing a sample face image on a sample non-face image. The right side shows the corresponding image mask. For example, the image mask may be a black-and-white binary image in which the region corresponding to the sample non-face image is black and the region corresponding to the sample face image is white.
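A minimal sketch of S201 follows. It assumes both images are NumPy arrays and that the face is pasted as a hard-edged rectangle at a random position. The source does not specify blending or placement rules, and the name `paste_face` is hypothetical. The mask uses 255 (white) for the face region and 0 (black) elsewhere, matching FIG. 2B.

```python
import numpy as np


def paste_face(face, background, rng):
    """Paste `face` at a random position in `background` (hypothetical helper).

    Returns (target_sample_image, image_mask). The mask is uint8: 255 (white)
    inside the pasted face region, 0 (black) elsewhere, as in FIG. 2B.
    Assumes a hard rectangular paste with no blending.
    """
    fh, fw = face.shape[:2]
    bh, bw = background.shape[:2]
    if fh > bh or fw > bw:
        raise ValueError("face image must fit inside the non-face image")
    # Random top-left corner such that the face stays fully inside the background.
    top = int(rng.integers(0, bh - fh + 1))
    left = int(rng.integers(0, bw - fw + 1))
    target = background.copy()
    target[top:top + fh, left:left + fw] = face
    mask = np.zeros((bh, bw), dtype=np.uint8)
    mask[top:top + fh, left:left + fw] = 255
    return target, mask


rng = np.random.default_rng(0)
face = np.ones((16, 16, 3))          # stand-in sample face image
background = np.zeros((64, 64, 3))   # stand-in sample non-face image
target, mask = paste_face(face, background, rng)
```

Because the mask is derived from the same coordinates used for pasting, the mask and the face region always line up exactly. This alignment is what the local loss in S203 relies on.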

S202: Degrade the target sample image to obtain a degraded sample image.

In this embodiment, any degradation scheme may be applied to the target sample image to obtain the degraded sample image.

Degradation: a high-quality image is processed with operations such as blurring and compression to lower its quality (for example, by reducing its resolution or its size). The result is a low-quality degraded sample image.

In one implementation, the target sample image may undergo online degradation and/or offline degradation to obtain the degraded sample image.

Online degradation: an online process degrades the target sample image to obtain the degraded sample image.

Offline degradation: the target sample image is submitted to an offline video compression encoding program. The encoded image that the program outputs is used as the degraded sample image.

Using online and offline degradation together simulates the blur and compression artifacts of real videos as closely as possible. The trained restoration model is then better at removing compression artifacts and improving video clarity. Online degradation can use a fast compression scheme such as JPEG, so that it occupies few resources and takes little time. Offline degradation can use slower video compression encoders such as H.264 and H.265. These yield a wider variety of degraded sample images without significantly increasing the time and resources consumed by training.
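One way to approximate the offline route is to round-trip frames through a real encoder. The sketch below only builds `ffmpeg` command lines: `libx264` for H.264 or `libx265` for H.265, with `-crf` controlling the quality level. The helper names, the frame-file patterns, and the CRF and frame-rate values are illustrative assumptions, not values from the source.

```python
import subprocess


def offline_degrade_cmds(src_pattern, tmp_video, out_pattern,
                         codec="libx264", crf=32, fps=25):
    """Build ffmpeg commands (hypothetical helper) that encode a frame sequence
    with a real video codec and decode it back, so the decoded frames carry
    genuine codec artifacts.

    codec: "libx264" (H.264) or "libx265" (H.265). A higher CRF means
    stronger compression. All defaults here are illustrative assumptions.
    """
    encode = ["ffmpeg", "-y", "-framerate", str(fps), "-i", src_pattern,
              "-c:v", codec, "-crf", str(crf), "-pix_fmt", "yuv420p", tmp_video]
    decode = ["ffmpeg", "-y", "-i", tmp_video, out_pattern]
    return encode, decode


def run_offline_degradation(*args, **kwargs):
    """Run the encode and decode commands in order (requires ffmpeg on PATH)."""
    for cmd in offline_degrade_cmds(*args, **kwargs):
        subprocess.run(cmd, check=True)
```

For example, `run_offline_degradation("frames/%05d.png", "tmp.mp4", "degraded/%05d.png", codec="libx265")` would write decoded H.265 frames that could serve as degraded sample images. The encoder is slow, so this runs as a batch job rather than inside the training loop.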

In one implementation, the online process is a degradation pipeline with at least two stages. For any two adjacent stages, the output of the earlier stage serves as the input of the later stage. The output of the last stage is the degraded sample image.

Using at least two degradation stages brings the blur level of the degraded sample images closer to that of the real images the restoration model will process. This improves the accuracy of the trained restoration model.

In one implementation, each degradation stage includes three modules:
- a blur processing module, which blurs the input target sample image;
- a sampling module, which upsamples and/or downsamples the blurred target sample image;
- a compression module, which encodes and compresses the sampled target sample image to obtain the degraded target sample image.

FIG. 3 shows a schematic structural diagram of a second-order degradation pipeline. Each stage of the pipeline includes a blur processing module, a sampling module, and a compression module. The compression module is specifically a JPEG compression module.

The target sample image is input to the first stage. In that stage, the blur processing module blurs the input target sample image, and the sampling module upsamples and/or downsamples the blurred image. The compression module then encodes and compresses the sampled image to obtain the degraded target sample image.

The target sample image degraded by the first stage is then input to the second stage, which repeats the degradation described above. The output of the second stage is the degraded sample image.
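The following sketch mirrors the chained structure of FIG. 3 on a single-channel image in [0, 1]. Each stage applies blur, resampling, and compression, and the output of one stage becomes the input of the next. Several assumptions apply. `gaussian_blur`, `resample`, `jpeg_like`, `make_stage`, and `degrade` are hypothetical helpers. `jpeg_like` is a crude stand-in for JPEG (8x8 block DCT with uniform quantization, no chroma subsampling or entropy coding), not a real JPEG codec. The per-stage parameters are arbitrary.

```python
import numpy as np


def gaussian_blur(img, sigma, radius=3):
    """Separable Gaussian blur with reflect padding (img: 2-D float array)."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    p = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def resample(img, factor=2):
    """Downsample by block averaging, then upsample (nearest) back to the input size."""
    h, w = img.shape
    small = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix, so its inverse is its transpose."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def jpeg_like(img, q):
    """Crude JPEG stand-in: 8x8 block DCT, uniform quantization with step q, inverse DCT."""
    h, w = img.shape
    c = dct_matrix(8)
    blocks = img.reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    coeffs = c @ blocks @ c.T
    coeffs = np.round(coeffs / q) * q      # the lossy step
    rec = c.T @ coeffs @ c
    return np.clip(rec.transpose(0, 2, 1, 3).reshape(h, w), 0.0, 1.0)


def make_stage(sigma, factor, q):
    """One degradation stage: blur, then sampling, then compression (as in FIG. 3)."""
    def stage(img):
        return jpeg_like(resample(gaussian_blur(img, sigma), factor), q)
    return stage


def degrade(img, stages):
    """Chain the stages: each stage's output is the next stage's input."""
    for stage in stages:
        img = stage(img)
    return img


rng = np.random.default_rng(0)
hq = rng.random((64, 64))   # stand-in target sample image (H and W multiples of 8)
lq = degrade(hq, [make_stage(1.2, 2, 0.05), make_stage(0.8, 2, 0.1)])
```

Adding a third stage is just a longer list passed to `degrade`, which matches the note below that pipelines with three or more stages are also covered.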

This embodiment uses a second-order degradation pipeline only as an example. Other schemes, such as degradation pipelines with three or more stages, also fall within the protection scope of the present application.

In one implementation, the blur processing module in a degradation stage applies at least one of Gaussian blur and motion blur to the input target sample image.

Gaussian blur convolves an image with the probability density function of a Gaussian distribution. Variants include isotropic Gaussian blur, anisotropic Gaussian blur, and generalized Gaussian blur, all of which fall within the protection scope of the present application.
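One common parameterization of these kernels is shown below for reference. It is used, for example, in Real-ESRGAN-style degradation pipelines; the source itself does not fix a formula. Here $\mathbf{x}$ is a 2-D pixel offset from the kernel center, $N$ normalizes the kernel so that it sums to 1, $I$ is the image being blurred, $R_{\theta}$ is a rotation by angle $\theta$, and $\beta$ is a shape parameter. Setting $\beta = 1$ recovers the ordinary Gaussian.

```latex
\begin{aligned}
I_{\mathrm{blur}} &= I \ast k, \\
k(\mathbf{x}) &= \frac{1}{N}\exp\!\left(-\tfrac{1}{2}\,\mathbf{x}^{\top}\Sigma^{-1}\mathbf{x}\right)
  && \text{(Gaussian)}, \\
k_{\beta}(\mathbf{x}) &= \frac{1}{N}\exp\!\left(-\tfrac{1}{2}\left(\mathbf{x}^{\top}\Sigma^{-1}\mathbf{x}\right)^{\beta}\right)
  && \text{(generalized Gaussian)}, \\
\Sigma &= \sigma^{2}\mathbf{I}_{2}
  && \text{(isotropic)}, \\
\Sigma &= R_{\theta}\,\mathrm{diag}\!\left(\sigma_{1}^{2},\,\sigma_{2}^{2}\right)R_{\theta}^{\top}
  && \text{(anisotropic)}.
\end{aligned}
```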

Motion blur, also known as dynamic blur, processes an image so that fast-moving objects in it leave visible blurred streaks.
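Both blur types can be expressed as explicit kernels. The NumPy sketch below rests on several assumptions. The function names `anisotropic_gaussian_kernel`, `motion_kernel`, and `convolve2d` are hypothetical, and the kernel sizes and angles are arbitrary. The motion kernel models only straight-line motion, which is one simple choice among many.

```python
import numpy as np


def anisotropic_gaussian_kernel(size, sigma_x, sigma_y, theta):
    """k(x) proportional to exp(-0.5 x^T Sigma^-1 x), Sigma = R diag(sx^2, sy^2) R^T.

    Equal sigmas give an isotropic kernel. size should be odd.
    """
    r = size // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    pts = np.stack([xs, ys], axis=-1).astype(float)      # (size, size, 2) offsets
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    cov = rot @ np.diag([sigma_x**2, sigma_y**2]) @ rot.T
    quad = np.einsum("...i,ij,...j->...", pts, np.linalg.inv(cov), pts)
    k = np.exp(-0.5 * quad)
    return k / k.sum()


def motion_kernel(size, angle):
    """Linear motion blur: equal weights along a straight line through the center."""
    k = np.zeros((size, size))
    r = size // 2
    for t in np.linspace(-r, r, 4 * size):
        x = int(round(r + t * np.cos(angle)))
        y = int(round(r - t * np.sin(angle)))   # image rows grow downwards
        k[y, x] = 1.0
    return k / k.sum()


def convolve2d(img, k):
    """Same-size 2-D convolution with reflect padding; k must be square and odd-sized."""
    r = k.shape[0] // 2
    p = np.pad(img, r, mode="reflect")
    kf = k[::-1, ::-1]                           # flip: convolution, not correlation
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += kf[dy, dx] * p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out


img = np.random.default_rng(0).random((32, 32))
k_aniso = anisotropic_gaussian_kernel(7, 2.0, 0.5, np.pi / 4)
k_motion = motion_kernel(9, 0.0)                 # horizontal motion
blurred = convolve2d(convolve2d(img, k_aniso), k_motion)
```

Both kernels are normalized to sum to 1. As a result, blurring changes local detail but preserves overall brightness, so a flat region stays flat.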

Gaussian blur and motion blur make the blurred target sample images closer to real video frames, so the trained restoration model performs better.

S203: Use the degraded sample image to train the restoration model based on a preset loss function. The loss function includes a global loss function and a local loss function corresponding to the image mask.

In this embodiment, the global loss function computes a loss value over the whole of the degraded sample image. The local loss function, which corresponds to the image mask, computes a loss value only over the face region of the degraded sample image (that is, the region corresponding to the sample face image). The two loss values can then be summed, and the restoration model is adjusted according to the summed loss to train it.

In this embodiment, the global and local loss functions may be any loss functions suitable for images, such as a cross-entropy loss function. The two may use the same formula or different formulas. All such choices fall within the protection scope of the present application.

In this embodiment, the restoration model may be trained in any manner, such as supervised, semi-supervised, or unsupervised training. All of these fall within the protection scope of the present application.

In the embodiments of the present application, the training samples are not limited to the degraded sample images that have corresponding image masks. Images without image masks may also be added as training samples. For such samples, the restoration model may be trained based on the global loss function.
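A minimal sketch of the combined loss follows. It assumes a supervised setting (one of the options above): `pred` is the model's output for a degraded sample, and `target` is the corresponding clean target sample image. The source names cross-entropy only as an example and gives no formulas. This version therefore uses a per-pixel L1 (mean absolute error) for both terms, a common choice for image restoration. `restoration_loss` and `face_weight` are hypothetical. The source simply sums the two losses, which corresponds to `face_weight=1.0`.

```python
import numpy as np


def restoration_loss(pred, target, mask=None, face_weight=1.0):
    """Global L1 over the whole image plus L1 restricted to the face mask.

    pred, target: H x W or H x W x C float arrays (restored output vs. clean image).
    mask: optional H x W array, non-zero inside the face region.
    L1 for both terms and the face_weight factor are assumptions.
    """
    err = np.abs(pred - target)
    global_loss = err.mean()
    if mask is None:              # sample without an image mask: global term only
        return float(global_loss)
    region = mask > 0
    local_loss = err[region].mean() if region.any() else 0.0
    return float(global_loss + face_weight * local_loss)


pred = np.zeros((4, 4))
target = np.zeros((4, 4))
target[:2, :2] = 1.0                      # error only inside the "face"
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 255
print(restoration_loss(pred, target))        # 0.25
print(restoration_loss(pred, target, mask))  # 1.25
```

The example shows why the local term matters. An error confined to a small face region barely moves the global average (0.25 here), but the masked term counts it at full strength (1.0). This steers the model toward restoring faces well.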

In one implementation, the smallest feature map of the restoration model is 1/32 of the input patch size. The input patch size of the restoration model is greater than or equal to 1/3 of the target sample image.

In actual restoration scenarios, the input video may have a high resolution (such as 1920x1080), so the portraits it contains may also be large. To keep training consistent with testing, the input patch size used during training is therefore at least 1/3 of the target sample image; for example, the patch size may be set to 576. In addition, the smallest feature map of the restoration model is 1/32 of the input patch size. This gives the network a larger receptive field for capturing large-scale portrait semantics.
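As a worked check of these numbers, take the example patch size of 576 and assume that each of five downsampling steps halves the side length exactly:

```latex
576 \xrightarrow{/2} 288 \xrightarrow{/2} 144 \xrightarrow{/2} 72 \xrightarrow{/2} 36 \xrightarrow{/2} 18,
\qquad \frac{576}{2^{5}} = \frac{576}{32} = 18
```

The smallest feature map is therefore 18x18. Under the same halving assumption, any patch side must be a multiple of 32 for every downsampling step to be exact, and 576 = 18 x 32 satisfies this.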

In one implementation, the restoration model includes a multi-stage encoder and a multi-stage decoder, with skip connections between encoder and decoder stages at the same level. The number of encoder and decoder stages is greater than or equal to 5, so that the smallest feature map of the restoration model is 1/32 of the input patch size.

FIG. 4 shows a schematic structural diagram of a restoration model according to an embodiment of the present application. The left side of the figure contains a 5-stage encoder and the right side a 5-stage decoder. Encoder and decoder stages at the same level are joined by skip connections, shown in FIG. 4 as dashed arrows from the encoder to the decoder.

The five encoder stages correspond to feature maps of 1/2, 1/4, 1/8, 1/16, and 1/32 of the input patch size, respectively. After the original degraded sample image is input to the restoration model, the encoder performs five rounds of convolution and downsampling, and each downsampling halves the patch size. The decoder then performs five rounds of convolution and upsampling, and each upsampling doubles the patch size. As a result, the restored image output by the decoder has the same size as the input image.
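The sketch below traces array shapes through a generic 5-level encoder-decoder with skip connections, written in NumPy. It is not the model of FIG. 4. Convolutions are omitted, 2x2 average pooling stands in for convolution plus downsampling, and nearest-neighbor repetition stands in for upsampling. Skips are merged by addition rather than concatenation. All function names are hypothetical. The sketch only illustrates how five halvings take a 576x576 patch to an 18x18 bottleneck and back.

```python
import numpy as np


def down(x):
    """2x2 average pooling on a (C, H, W) array: stand-in for conv + stride-2 downsampling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def up(x):
    """Nearest-neighbor 2x upsampling: stand-in for conv + upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def unet_forward(x, levels=5):
    """Trace a generic U-Net-style pass; returns (output, bottleneck_shape)."""
    skips = []
    for _ in range(levels):
        skips.append(x)   # kept for the skip connection at this resolution
        x = down(x)       # halve height and width
    bottleneck_shape = x.shape
    for skip in reversed(skips):
        x = up(x) + skip  # double height and width, then merge the same-level skip
    return x, bottleneck_shape


patch = np.random.default_rng(0).random((3, 576, 576))
out, bottleneck = unet_forward(patch)
print(bottleneck, out.shape)  # (3, 18, 18) (3, 576, 576)
```

The skip connections reinject full-resolution detail that the coarse 18x18 path cannot carry. The deep bottleneck supplies the wide context needed for large faces.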

In the solution of this embodiment, a sample face image and a sample non-face image are obtained. The sample face image is pasted into the sample non-face image to obtain a target sample image and a corresponding image mask, which represents the region of the target sample image that corresponds to the sample face image. The target sample image is degraded to obtain a degraded sample image. The restoration model is then trained with the degraded sample image based on a preset loss function that includes a global loss function and a local loss function corresponding to the image mask. As a result, the trained restoration model can both improve video clarity and adaptively restore portraits. No separate portrait restoration model or compression-artifact removal model is needed, which lowers model deployment and inference costs.

The solution of this embodiment can be executed by any electronic device with data processing capability, including but not limited to servers, PCs, tablet computers, and mobile phones.

FIG. 5 shows a schematic flowchart of a video restoration method according to an embodiment of the present application. The method includes:

S501: Obtain a video frame to be restored.

In this embodiment, the video frame may be obtained in any manner. Examples include a frame from a video before or after editing, or a frame before or after compression encoding. All such frames fall within the protection scope of the present application.

During video playback, each obtained video frame may first be restored as a frame to be restored, and the restored frame is then played. This improves the quality of the video shown to the user.

S502: Restore the video frame with the restoration model.

The restoration model is trained by the method described above.
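A minimal per-frame inference loop for S501 and S502 is sketched below. It rests on two assumptions. First, `model` is any callable that maps an H x W x C array to a restored array of the same shape. Second, frame sides must be multiples of 32 because the model halves them five times. The names `pad_to_multiple` and `restore_video` are hypothetical. Under these assumptions, a 1080-row frame is reflect-padded to 1088 rows and cropped back after restoration.

```python
import numpy as np


def pad_to_multiple(frame, m=32):
    """Reflect-pad the bottom/right of an H x W (x C) frame so H and W are multiples of m."""
    h, w = frame.shape[:2]
    ph, pw = (-h) % m, (-w) % m
    pad = ((0, ph), (0, pw)) + ((0, 0),) * (frame.ndim - 2)
    return np.pad(frame, pad, mode="reflect"), (h, w)


def restore_video(frames, model, m=32):
    """Yield restored frames one by one (model is a hypothetical callable)."""
    for frame in frames:                    # S501: obtain a frame to be restored
        padded, (h, w) = pad_to_multiple(frame, m)
        restored = model(padded)            # S502: restore with the trained model
        yield restored[:h, :w]              # crop back to the original size
```

Because `restore_video` is a generator, frames can be restored and handed to the player or encoder one at a time. This suits the playback and low-latency scenarios described below.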

The video restoration method of this embodiment applies to a variety of scenarios:
- Storage and streaming of ordinary videos or games. In one implementation, a video is restored with the restoration model of the embodiments of the present application before encoding. It is then encoded into a video bitstream for storage and transmission in a video streaming service or a similar application.
- Low-latency scenarios such as video conferencing and live streaming. In one implementation, a video capture device captures conference video data, which is encoded and sent to a conference terminal. The terminal decodes the video, and the restoration model restores it to produce the conference picture that is played.
- Virtual reality. Captured video data is encoded into a video bitstream and sent to a VR device (such as VR glasses). The VR device decodes the bitstream, restores the decoded picture with the restoration model of the embodiments of the present application, and implements the corresponding VR functions based on the restored picture.
- And so on.

The solution of this embodiment is particularly suitable for narrowband high-definition video playback. Between capture and playback on a terminal, video content typically goes through several editing, processing, and re-encoding operations, and each one degrades the picture quality. Even with a high-quality capture device, the video that finally reaches the user may be of poor quality because of losses in these intermediate steps.

For this reason, the final transcoding step generally includes video restoration. This recovers the quality lost during transmission and improves video quality without increasing the bandwidth needed to transmit the video.

The solution of this embodiment can be used in the above scenarios to improve video quality at lower model deployment and inference costs.

The restoration model training scheme of the embodiments of the present application proceeds as follows. A sample face image is pasted into a sample non-face image to obtain a target sample image and an image mask that represents the face region. The target sample image is degraded to obtain a degraded sample image. The restoration model is then trained with the degraded sample image based on a loss function that includes a global loss function and a local loss function corresponding to the image mask. The trained restoration model can therefore both improve video clarity and adaptively restore portraits. No separate portrait restoration model or compression-artifact removal model is needed, which lowers model deployment and inference costs.

Referring to FIG. 6, a schematic structural diagram of an electronic device according to Embodiment 5 of the present application is shown. The specific embodiments of the present application do not limit the specific implementation of the electronic device.

As shown in FIG. 6, the electronic device may include a processor 602, a communications interface 604, a memory 606, and a communication bus 608.

Where:

The processor 602, the communications interface 604, and the memory 606 communicate with one another via the communication bus 608.

The communications interface 604 is used to communicate with other electronic devices or servers.

The processor 602 is used to execute a program 610, and may specifically perform the relevant steps in the above embodiments of the restoration model training method or the video restoration method.

In one implementation, the program 610 may include program code, and the program code includes computer operation instructions.

The processor 602 may be a CPU, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in the smart device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 606 is used to store the program 610. The memory 606 may include a high-speed RAM, and may also include a non-volatile memory, such as at least one disk storage.

The program 610 may include multiple computer instructions, through which it may cause the processor 602 to perform the operations corresponding to the restoration model training method or the video restoration method described in any of the foregoing method embodiments.

Specifically, through the computer instructions, the program 610 may cause the processor 602 to perform: an operation of acquiring a sample face image and a sample non-face image and pasting the sample face image into the sample non-face image to obtain a target sample image and an image mask corresponding to the target sample image, where the image mask may be used to characterize the region of the target sample image that corresponds to the sample face image;

an operation of degrading the target sample image to obtain a degraded sample image; and

an operation of training the restoration model with the degraded sample image based on a preset loss function, where the loss function includes a global loss function and a local loss function corresponding to the image mask.
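A combined loss of this kind might look like the sketch below. The choice of L1 distance for both terms, averaging the local term over the masked pixels only, and the `local_weight` parameter are assumptions made for illustration; the source only states that the loss has a global part and a part tied to the image mask, and `global_local_l1` is a hypothetical name.

```python
import numpy as np


def global_local_l1(pred, target, mask, local_weight=1.0, eps=1e-8):
    """Global L1 over the whole image plus an L1 restricted to the face mask.

    Assumptions (not from the source): L1 for both terms, and the local term
    is normalized by the number of masked values so small faces are not diluted.
    pred, target: H x W x C arrays; mask: H x W array, 1 inside the face region.
    """
    abs_err = np.abs(pred - target)
    global_loss = abs_err.mean()
    m = mask[..., None]  # broadcast the H x W mask over the channel axis
    local_loss = (abs_err * m).sum() / (m.sum() * pred.shape[-1] + eps)
    return global_loss + local_weight * local_loss, global_loss, local_loss


target = np.zeros((4, 4, 1))
pred = target.copy()
pred[0:2, 0:2] = 1.0          # error only inside the face region
mask = np.zeros((4, 4))
mask[0:2, 0:2] = 1.0
total, g, l = global_local_l1(pred, target, mask)
print(round(g, 4), round(l, 4))  # 0.25 1.0
```

In this toy call the same four wrong pixels contribute 0.25 to the global term but about 1.0 to the local term. That illustrates one plausible motivation for a mask-based term: faces usually occupy a small part of a frame, so a purely global loss would weight errors on them lightly.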

In one implementation, degrading the target sample image to obtain the degraded sample image includes: performing online degradation and/or offline degradation on the target sample image to obtain the degraded sample image;

where the online degradation includes: blurring the target sample image in an online process to obtain the degraded sample image.

In one implementation, the blurring applied to the input target sample image includes at least one of the following: Gaussian blur and motion blur.
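The two blur types named above can be generated as filter kernels and applied to the image. The sketch below assumes odd kernel sizes, reflect padding, and the parameter ranges used in `random_blur`, all chosen purely for illustration; the source does not give kernel sizes, sigma values, or motion angles, and every helper name here is hypothetical.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Isotropic 2-D Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def motion_kernel(size: int, angle_deg: float) -> np.ndarray:
    """Linear motion-blur kernel: a normalized line through the centre at angle_deg."""
    k = np.zeros((size, size))
    c = (size - 1) / 2.0
    t = np.deg2rad(angle_deg)
    for s in np.linspace(-c, c, 4 * size):
        k[int(round(c - s * np.sin(t))), int(round(c + s * np.cos(t)))] = 1.0
    return k / k.sum()


def blur(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Filter (correlate) an H x W x C image with a 2-D kernel, reflect padding."""
    kh, kw = kernel.shape
    if kh % 2 == 0 or kw % 2 == 0:
        raise ValueError("kernel sides must be odd so the output keeps the input size")
    pad = ((kh // 2, kh // 2), (kw // 2, kw // 2), (0, 0))
    padded = np.pad(image, pad, mode="reflect")
    # windows has shape H x W x C x kh x kw
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw), axis=(0, 1))
    return np.einsum("hwcij,ij->hwc", windows, kernel)


def random_blur(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Pick Gaussian or motion blur at random (illustrative parameter ranges)."""
    if rng.random() < 0.5:
        return blur(image, gaussian_kernel(7, rng.uniform(0.5, 2.0)))
    return blur(image, motion_kernel(7, rng.uniform(0.0, 180.0)))
```

Normalizing each kernel to sum to 1 keeps the overall brightness of the image unchanged, so the blur only removes detail and does not shift the intensity distribution the model sees.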

In one implementation, the restoration model includes multi-stage encoders and decoders, with skip connections between the encoder and the decoder of the same stage, where the number of encoder and decoder stages is greater than or equal to 5, so that the smallest feature map of the restoration model is 1/32 of the input image block size.
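The link between the number of stages and the 1/32 ratio is simple arithmetic if each stage halves the spatial resolution once: five halvings give 2^5 = 32. The weight-free numpy skeleton below only traces feature-map shapes. It assumes 2x2 average pooling for downsampling, nearest-neighbour upsampling, and element-wise addition for the skip connections, whereas a trained model would use learned convolutions and possibly concatenation; the function names are hypothetical.

```python
import numpy as np


def down(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling: halves H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def up(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling: doubles H and W."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def toy_encoder_decoder(x: np.ndarray, stages: int = 5):
    """Shape-only skeleton: `stages` downsampling steps, mirrored upsampling,
    and a same-stage skip connection (here an addition) at every level."""
    if x.shape[0] % 2 ** stages or x.shape[1] % 2 ** stages:
        raise ValueError("H and W must be divisible by 2**stages")
    skips, sizes = [], []
    for _ in range(stages):
        skips.append(x)         # keep the encoder map for the skip connection
        x = down(x)
        sizes.append(x.shape)
    for skip in reversed(skips):
        x = up(x) + skip        # decoder map and same-stage encoder map must match
    return x, sizes


out, sizes = toy_encoder_decoder(np.random.default_rng(0).random((256, 256)))
print(sizes[-1], out.shape)  # (8, 8) (256, 256)
```

The skeleton also shows why input blocks are typically sized to a multiple of 32 for a five-stage design: otherwise the encoder and decoder maps at the same stage would not line up for the skip connection.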

Specifically, through the computer instructions, the program 610 may also cause the processor 602 to perform: an operation of acquiring a video frame to be restored; and an operation of restoring the video frame with a restoration model, where the restoration model is trained by the method described above.
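At inference time, a frame whose height or width is not a multiple of 32 cannot pass cleanly through five halvings. A minimal sketch, assuming edge padding followed by cropping and frame-by-frame processing with no temporal context, is shown below; `model` stands for any hypothetical callable that maps an H x W x C array to an array of the same shape, and the helper names are illustrative only.

```python
from typing import Callable, Iterable, Iterator

import numpy as np


def restore_frame(frame: np.ndarray, model: Callable[[np.ndarray], np.ndarray],
                  multiple: int = 32) -> np.ndarray:
    """Pad H and W up to a multiple of `multiple`, run the model, crop back."""
    h, w = frame.shape[:2]
    ph = (-h) % multiple   # rows needed to reach the next multiple
    pw = (-w) % multiple   # columns needed to reach the next multiple
    padded = np.pad(frame, ((0, ph), (0, pw), (0, 0)), mode="edge")
    restored = model(padded)
    return restored[:h, :w]


def restore_video(frames: Iterable[np.ndarray],
                  model: Callable[[np.ndarray], np.ndarray]) -> Iterator[np.ndarray]:
    """Apply the restoration model to each frame independently."""
    for frame in frames:
        yield restore_frame(frame, model)
```

For example, a 1920 x 1080 frame would be padded from 1080 to 1088 rows before inference and cropped back afterwards, while its width of 1920 is already a multiple of 32.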

For the specific implementation of each step in the program 610, refer to the corresponding descriptions of the steps and units in the above method embodiments, which have corresponding beneficial effects and are not repeated here. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above can be found in the corresponding process descriptions in the foregoing method embodiments, and are not repeated here.

An embodiment of the present application further provides a computer storage medium storing a computer program which, when executed by a processor, implements the method described in any of the foregoing method embodiments. The computer storage medium includes, but is not limited to, a compact disc read-only memory (CD-ROM), a random access memory (RAM), a floppy disk, a hard disk, or a magneto-optical disk.

An embodiment of the present application further provides a computer program product including computer instructions that instruct a computing device to perform the operations corresponding to any restoration model training method or video restoration method in the foregoing method embodiments.

In addition, it should be noted that the user-related information (including but not limited to user device information and user personal information) and data (including but not limited to sample data used to train the model, data used for analysis, stored data, and displayed data) involved in the embodiments of the present application are all information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for users to grant or refuse authorization.

It should be noted that, depending on implementation needs, the components/steps described in the embodiments of the present application may be split into more components/steps, or two or more components/steps or partial operations of components/steps may be combined into new components/steps, to achieve the purpose of the embodiments of the present application.

The above methods according to the embodiments of the present application may be implemented in hardware or firmware; as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk); or as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium. The methods described herein can thus be carried out by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware (such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA)). It can be understood that a computer, processor, microprocessor controller, or programmable hardware includes a storage component (for example, random access memory (RAM), read-only memory (ROM), or flash memory) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the methods described herein are implemented. Furthermore, when a general-purpose computer accesses code for implementing the methods shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the methods shown herein.

Those of ordinary skill in the art will appreciate that the units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the embodiments of the present application.

The above implementations are only used to illustrate the embodiments of the present application and do not limit them. Those of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application; therefore, all equivalent technical solutions also fall within the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application shall be defined by the claims.

Claims

1. A training method for a restoration model, comprising:

acquiring a sample face image and a sample non-face image, and pasting the sample face image into the sample non-face image to obtain a target sample image and an image mask corresponding to the target sample image, wherein the image mask is used to characterize a region in the target sample image corresponding to the sample face image;

degrading the target sample image to obtain a degraded sample image; and

training a restoration model with the degraded sample image based on a preset loss function, wherein the loss function comprises a global loss function and a local loss function corresponding to the image mask.

2. The method according to claim 1, wherein degrading the target sample image to obtain the degraded sample image comprises:

performing online degradation processing and/or offline degradation processing on the target sample image to obtain the degraded sample image;

wherein the online degradation processing comprises: blurring the target sample image in an online process to obtain the degraded sample image; and

the offline degradation processing comprises: submitting the target sample image to an offline video compression encoding program, and taking the encoded image output by the offline video compression encoding program as the degraded sample image.

3. The method according to claim 2, wherein the online process comprises degradation links of at least two orders; for any two adjacent orders, the output of the degradation link of the preceding order is used as the input of the degradation link of the following order, and the output of the degradation link of the last order is the degraded sample image.

4. The method according to claim 3, wherein the degradation link comprises: a blur processing module for blurring the input target sample image; a sampling module for upsampling and/or downsampling the blurred target sample image; and a compression module for encoding and compressing the sampled target sample image to obtain the degraded target sample image.

5. The method according to claim 4, wherein the blur processing performed on the input target sample image comprises at least one of the following: Gaussian blur and motion blur.

6. The method according to claim 1, wherein the size of the smallest feature map of the restoration model is 1/32 of the input image block size, and the input image block size of the restoration model is greater than or equal to 1/3 of the size of the target sample image.

7. The method according to claim 6, wherein the restoration model comprises multi-stage encoders and decoders, with skip connections between the encoder and the decoder of the same stage, and the number of encoder and decoder stages is greater than or equal to 5, so that the size of the smallest feature map of the restoration model is 1/32 of the input image block size.

8. A video restoration method, comprising:

acquiring a video frame to be restored; and

restoring the video frame by using a restoration model, wherein the restoration model is trained by the method according to any one of claims 1 to 7.

9. An electronic device, comprising: a processor, a memory, a communications interface, and a communication bus, wherein the processor, the memory, and the communications interface communicate with one another through the communication bus;

the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the method according to any one of claims 1 to 7.

10. A computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 7.
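The multi-order online degradation described in the claims (links of at least two orders, each made of blurring, resampling, and compression, with each link's output fed to the next) can be sketched as a function chain. This is a minimal numpy illustration under loudly stated assumptions: a 3x3 box blur stands in for Gaussian or motion blur, strided downsampling followed by nearest-neighbour upsampling stands in for the sampling module, and uniform quantization is only a crude stand-in for the compression module, where a real pipeline would call an actual image or video encoder. All names are hypothetical.

```python
import numpy as np


def box_blur(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge padding (stand-in for Gaussian/motion blur)."""
    h, w = img.shape[:2]
    p = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def resample(img: np.ndarray, factor: int = 2) -> np.ndarray:
    """Downsample by striding, then upsample back to the original size."""
    small = img[::factor, ::factor]
    big = small.repeat(factor, axis=0).repeat(factor, axis=1)
    return big[: img.shape[0], : img.shape[1]]


def quantize(img: np.ndarray, levels: int = 32) -> np.ndarray:
    """Crude stand-in for codec compression: uniform quantization of [0, 1] values."""
    return np.round(img * (levels - 1)) / (levels - 1)


def degradation_link(img: np.ndarray) -> np.ndarray:
    """One link: blur -> sample -> compress."""
    return quantize(resample(box_blur(img)))


def degrade(img: np.ndarray, orders: int = 2) -> np.ndarray:
    """Chain `orders` links; each link's output is the next link's input."""
    out = img
    for _ in range(orders):
        out = degradation_link(out)
    return out
```

Chaining links rather than applying one strong degradation lets the artifacts compound the way repeated processing and re-encoding do along a real delivery path, which matches the motivation given in the description.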