HK40015762A - Method, apparatus, terminal and storage medium for video noise reduction
- Publication number
- HK40015762A (application number HK42020006044.0A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- pixel
- image
- pixel point
- target image
- noise reduction
- Prior art date
Description
Technical field
The present application relates to the field of multimedia technologies, and in particular, to a method, an apparatus, a terminal, and a storage medium for video denoising.
Background
With the development of multimedia technology, office collaboration products have become an indispensable meeting tool for more and more large and medium-sized enterprises. The remote video conference is an important component of such office collaboration products and provides great convenience for enterprises. During remote video conference communication, the video captured by the camera often contains a great deal of noise; if the noise is not reduced, the video conference effect is poor.
In the related art, when processing the noise contained in a video, the current video image is first temporally filtered according to the previous, already noise-reduced video image, to obtain a temporal filtering weight for each pixel point of the current video image. Spatial filtering based on direction statistics is then performed on the current video image to obtain a spatial filtering result for each pixel point, and the spatial filtering result and the previous video image are weighted and fused according to the temporal filtering weights to obtain the noise reduction result of the current video image.
In the process of filtering a video image, when later pixel points are filtered in sequence, their neighborhoods often include pixel points that have already been filtered, so the later pixel points depend on the processed pixel points. The filtering process is therefore serial, which results in a slow algorithm execution speed.
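The serial dependence described above can be made concrete with a small sketch. The following minimal example (a plain box filter stands in for the spatial filter; all names and shapes are illustrative, not from the patent) contrasts in-place filtering, which forces a fixed visiting order, with a dependency-free formulation that reads only original values and can therefore process every pixel independently:

```python
import numpy as np

def box_filter_inplace(img):
    """Serial variant: writes back into the buffer it reads, so each pixel
    sees already-filtered neighbours above/left of it (pixel dependence)."""
    out = img.astype(float)
    for y in range(1, out.shape[0] - 1):
        for x in range(1, out.shape[1] - 1):
            out[y, x] = out[y - 1:y + 2, x - 1:x + 2].mean()  # mixes old and new values
    return out

def box_filter_dependency_free(img):
    """Dependency-free variant: every output reads only original values,
    so all pixels can be computed independently (here, vectorised)."""
    src = img.astype(float)
    out = src.copy()
    h, w = src.shape
    out[1:-1, 1:-1] = sum(
        src[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
    ) / 9.0
    return out
```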
Disclosure of Invention
The embodiments of the application provide a video denoising method, apparatus, terminal and storage medium, which are used to solve the problem of slow algorithm execution speed in the related art. The technical solution is as follows:
in one aspect, a video denoising method is provided, including:
performing spatial filtering for removing pixel dependence on pixel points of a target image in a video to be processed to obtain a first image;
according to the frame difference between the first image and the first noise reduction image, performing time domain filtering on pixel points of the target image in parallel to obtain a second image, wherein the first noise reduction image is an image which is subjected to noise reduction processing and corresponds to a previous frame of image of the target image;
and fusing the first image and the second image according to the gain coefficient corresponding to the pixel point of the second image to obtain a second noise-reduced image which is subjected to noise reduction processing and corresponds to the target image.
In another aspect, a video noise reduction apparatus is provided, including:
the spatial filtering module is used for performing spatial filtering for removing pixel dependence on pixel points of a target image in a video to be processed to obtain a first image;
the time domain filtering module is used for performing time domain filtering on pixel points of the target image in parallel according to a frame difference between the first image and the first noise reduction image to obtain a second image, wherein the first noise reduction image is an image which is subjected to noise reduction processing and corresponds to a previous frame image of the target image;
and the fusion module is used for fusing the first image and the second image according to the gain coefficient corresponding to the pixel point of the second image to obtain a second noise-reduced image which corresponds to the target image and is subjected to noise reduction processing.
In an optional implementation manner, the spatial filtering module is further configured to obtain, for all pixel points of a target image in the video to be processed, an initial pixel value of at least one neighborhood pixel point of each pixel point; and performing spatial filtering on the pixel points according to the initial pixel value of the at least one neighborhood pixel point.
In another optional implementation manner, the apparatus further includes:
the interface calling module is used for calling an image processing interface of the graphics processor, and the image processing interface is used for carrying out spatial filtering for removing pixel dependence on pixel points of a target image in a video to be processed in parallel;
and the parallel acquisition module is used for acquiring each pixel point of the target image in the video to be processed in parallel.
In another optional implementation manner, the time-domain filtering module is further configured to obtain each pixel point of the target image in parallel; for any pixel point of the target image, determining a second variance of the pixel point according to a corresponding first variance of the pixel point in the first noise-reduced image, a frame difference between the first image and the first noise-reduced image and a variance offset coefficient; determining a first gain coefficient corresponding to the pixel point according to the second variance, the first gain offset coefficient corresponding to the pixel point and a motion compensation coefficient; and determining a first pixel value of the pixel point after time domain filtering according to the first gain coefficient, the initial pixel value of the pixel point and the corresponding noise reduction pixel value of the pixel point in the first noise-reduced image.
In another optional implementation manner, the apparatus further includes:
a first determining module, configured to determine the motion compensation coefficient according to the frame difference.
In another optional implementation manner, the apparatus further includes:
the acquisition module is used for acquiring a second gain coefficient and a second gain offset coefficient corresponding to the pixel point in the first noise reduction image;
and the second determining module is used for determining the first gain offset coefficient corresponding to the pixel point according to the second gain coefficient and the second gain offset coefficient.
In another optional implementation manner, the time-domain filtering module is further configured to use a product of a first gain coefficient corresponding to the pixel point and a first pixel value of the pixel point as a first fusion value for any pixel point of the second image; taking a product of a first gain coefficient corresponding to the pixel point and a second pixel value of the pixel point as a second fusion value, wherein the second pixel value is a pixel value of the pixel point after spatial filtering; and summing the first fusion value and the second fusion value to obtain a noise reduction pixel value corresponding to the pixel point.
In another alternative implementation, the spatial filtering and the temporal filtering each process the luminance component of the pixel points.
In another aspect, a terminal is provided, where the terminal includes a processor and a memory, and the memory is used to store at least one program code, where the at least one program code is loaded and executed by the processor to implement the operations performed in the video noise reduction method in the embodiments of the present application.
In another aspect, a storage medium is provided, where at least one program code is stored, and the at least one program code is used for being executed by a processor and implementing a video noise reduction method in the embodiment of the present application.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
in the embodiment of the application, the spatial filtering for removing the pixel dependence is performed on the pixel points of the target image, so that the dependency relationship does not exist among the pixel points in the target image, and the time-domain filtering is performed on the pixel points of the target image in parallel according to the frame difference between the first image and the first noise-reduction image obtained by the spatial filtering, so that the video noise reduction process is converted from the serial processing to the parallel processing, and the noise reduction process is accelerated.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that a person skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a video image captured by a low-performance camera configured in a notebook computer according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a video conference provided in an embodiment of the present application;
fig. 3 is a block diagram of a video noise reduction system according to an embodiment of the present disclosure;
fig. 4 is a flowchart of a video denoising method provided in an embodiment of the present application;
FIG. 5 is a diagram of an image before pixel dependency is removed by filtering according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an image after pixel dependency is removed by filtering according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram illustrating a spatial filtering effect comparison according to an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating a pre-and post-noise reduction process according to an embodiment of the present disclosure;
fig. 9 is a schematic key flow diagram of a video denoising method according to an embodiment of the present application;
FIG. 10 is a schematic diagram illustrating an algorithm flow of a video denoising method according to an embodiment of the present application;
fig. 11 is a block diagram of a video noise reduction apparatus according to an embodiment of the present application;
fig. 12 is a block diagram of a terminal according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The embodiments of the present application mainly relate to scenes in which a video is subjected to noise reduction processing; noise reduction of a teleconference video is taken as the running example. The remote video conference is an important part of the functions of an office collaboration product and has very strict requirements on the captured video, generally requiring a high-definition camera for video capture. When a low-performance camera is used for video capture, the captured video generally contains noise, and if the noise is not processed, the video conference experience is poor. For example, referring to fig. 1, fig. 1 is a video image captured by a low-performance camera built into a notebook computer; as can be seen from fig. 1, the image contains a large amount of noise. Optionally, the embodiments of the present application may also be applied to noise reduction of video captured by a mobile phone camera during a video call, of video captured by a monitoring device, and the like; the embodiments of the present application do not limit this.
The following briefly introduces the video denoising method provided by the embodiments of the present application. In order to make the video captured by the camera meet the requirements of a remote video conference, the captured video is usually subjected to noise reduction processing. At present there are various methods for denoising a video, and they generally run the video-denoising algorithm on the CPU of the terminal. Because an office collaboration product includes not only the teleconference video function but also various functions such as process approval and project management, if the teleconference function occupies most of the CPU resources, the other functions cannot be used normally; alternatively, the high demands on CPU processing capacity make the product unusable in many scenarios. The video denoising method provided by the embodiments of the present application removes the dependency relationships between pixel points so that the pixels of an image meet the requirements of parallel computing. Since the parallel computing capability of a GPU (Graphics Processing Unit) is stronger than that of a CPU (Central Processing Unit), the method moves the computation from the CPU to the GPU through an image processing interface such as Metal (an image processing interface provided by Apple Inc.) or DirectX (an image processing interface provided by Microsoft Corporation), so that each pixel point is processed in parallel. This increases the processing speed of the noise reduction and reduces the occupancy of the CPU. The video denoising method provided by the embodiments of the present application can realize fast video denoising with a very low CPU occupancy rate; the denoised video stream is transmitted to the remote end for display, ensuring a good video conference experience while leaving plenty of CPU resources for the other functions of the office collaboration product. Fig. 2 shows this flow; fig. 2 is a schematic flow chart of a video conference provided in an embodiment of the present application. As shown in fig. 2, a video image captured by the camera is displayed locally after operations such as noise reduction, for example on the screen of a notebook computer; the denoised video image is encoded by an encoder and transmitted to the remote end through a network, a decoder at the remote end decodes it, and the decoded video image is displayed at the remote end, which may also be a notebook computer.
Fig. 3 is a block diagram of a video noise reduction system 300 according to an embodiment of the present application, where the video noise reduction system 300 may be used to implement video noise reduction, and includes: a terminal 310 and a video service platform 320.
The terminal 310 may be connected to the video service platform 320 through a wireless network or a wired network. The terminal 310 may be at least one of a smartphone, a camcorder, a desktop computer, a tablet computer, an MP4 player, and a laptop computer. An application program supporting remote video conferences is installed and run on the terminal 310. Illustratively, the terminal 310 may be a terminal used by a user, with the user's account logged into the application program run by the terminal.
The video service platform 320 includes at least one of a server, a plurality of servers, and a cloud computing platform. The video service platform 320 is used for providing background services of the remote video conference, such as user management, video stream forwarding and the like. Optionally, the video service platform 320 includes: the system comprises an access server, a data management server, a user management server and a database. The access server is used to provide an access service for the terminal 310. And the data management server is used for forwarding the video stream uploaded by the terminal and the like. The number of the data management servers may be one or more, and when the number of the data management servers is multiple, there are at least two data management servers for providing different services, and/or there are at least two data management servers for providing the same service, such as providing the same service in a load balancing manner or providing the same service in a manner of a primary server and a mirror server, which is not limited in the embodiments of the present application. The database is used for storing account information of the user. The account information is data information that the user has authorized to collect.
Terminal 310 may refer broadly to one of a plurality of terminals, and the present embodiment is illustrated with only local terminal 310 and two remote terminals 310. Those skilled in the art will appreciate that the number of terminals may be greater or fewer. For example, the number of the remote terminals may be only one, or the number of the remote terminals may be tens or hundreds, or more. The number and type of the terminals 310 are not limited in the embodiments of the present application.
Fig. 4 is a flowchart of a video denoising method according to an embodiment of the present application. As shown in fig. 4, the method comprises the following steps:
401. the terminal carries out spatial filtering for removing pixel dependence on pixel points of a target image in a video to be processed to obtain a first image.
In this embodiment of the application, the terminal may implement spatial filtering of the pixel points of the target image based on a first filter: the target image is input into the first filter, and the output of the first filter is the spatially filtered first image. The first filter may be an improved bilateral filter that can process the pixel points of the target image in parallel. The first filter is explained below:
in the field of image denoising, the bilateral filtering algorithm is a nonlinear edge-preserving filtering algorithm; it is a compromise that combines the spatial proximity and the pixel-value similarity of an image. The bilateral filtering algorithm considers spatial information and gray-level similarity at the same time to achieve edge-preserving denoising, and has the characteristics of being simple, non-iterative, and local; here, edge-preserving denoising means replacing the original pixel value of a pixel point with an average over at least one neighborhood pixel point of the currently processed pixel point. When a bilateral filter based on the bilateral filtering algorithm filters an image to be processed, a filtering template generally scans the whole image from left to right and then from top to bottom (or from top to bottom and then from left to right, or in another order that covers the whole image). Spatially filtering a pixel point is usually implemented by linear or nonlinear processing of the neighborhood pixel points of the currently processed pixel point. During this process, when a pixel point later in the processing order is filtered, its neighborhood often includes pixel points that have already been spatially filtered, so the later pixel point depends on the already-filtered pixel points, and this dependency turns the spatial filtering of the whole image into a serial process. The principle of the method can be seen in formulas (1) and (2).
$$\hat{I}(p) = \frac{1}{\sum_{q \in \Omega_p} \omega(p, q)} \sum_{q \in \Omega_p} \omega(p, q)\, I(q) \quad (1)$$
$$\omega(p, q) = g_{\sigma_s}(\lVert p - q \rVert)\; g_{\sigma_r}(\lvert I(p) - I(q) \rvert) \quad (2)$$
where $\hat{I}(p)$ denotes the spatially filtered pixel value of the currently processed pixel point in the image, $I(p)$ denotes the pixel value of the currently processed pixel point in the image, $I(q)$ denotes the pixel value of a neighborhood pixel point of the currently processed pixel point, $p$ denotes the coordinate of the currently processed pixel point, $q$ denotes the coordinate of the neighborhood pixel point (ranging over the neighborhood $\Omega_p$), $\omega(p, q)$ denotes the weight related to the position of the pixel point, $g(\cdot)$ denotes a Gaussian function, and $\sigma_s$ and $\sigma_r$ respectively denote the variances of the Gaussian functions.
It should be noted that, for a neighborhood pixel point that precedes the currently processed pixel point, the corresponding I(q) is the spatially filtered pixel value, while for a neighborhood pixel point that follows the currently processed pixel point, the corresponding I(q) is the original pixel value of that neighborhood pixel point.
For example, referring to fig. 5, fig. 5 is a schematic diagram before pixel dependency is removed by image filtering according to an embodiment of the present application. In fig. 5, the currently processed pixel is a central pixel, the central pixel corresponds to 12 neighborhood pixels, the neighborhood pixels located on the left side and above the central pixel are processed pixels, and the neighborhood pixels located on the right side and below the central pixel are unprocessed pixels.
Because the spatial filtering process is serial, and therefore time-consuming compared with a parallel process, the embodiment of the application makes a first improvement to the processing: the bilateral filter is improved to remove the pixel dependence between pixel points, obtaining the first filter. The first filter is also based on the bilateral filtering algorithm; the difference is that when the pixel points of the target image are filtered through formulas (1) and (2), the pixel value of each neighborhood pixel point, i.e. the value of I(q), is always the original pixel value of the image, and never a filtered pixel value. Therefore, no pixel point depends on the pixel points placed before it in the processing order, and the influence of previously filtered pixel points on the currently filtered pixel point is eliminated.
For example, referring to fig. 6, fig. 6 is a schematic diagram of image filtering after pixel dependency is removed, according to an embodiment of the present application. In fig. 6, the currently processed pixel point is the central pixel point, which corresponds to 12 neighborhood pixel points, all of which are unprocessed, i.e. the pixel values of the neighborhood pixel points are all initial pixel values.
Because the pixel dependence among pixel points is removed, the terminal performs the same spatial filtering process for every pixel point based on the first filter. The process comprises the following steps: for every pixel point of the target image, the terminal obtains the initial pixel value of at least one neighborhood pixel point of the pixel point; the terminal then spatially filters the pixel point through the first filter according to the initial pixel value of the at least one neighborhood pixel point, obtaining the spatially filtered pixel value of the pixel point. When the terminal has processed all pixel points of the target image based on the first filter, the first image is obtained, in which the pixel value of each pixel point is the spatially filtered value. Referring to fig. 7, fig. 7 is a schematic diagram illustrating a spatial filtering effect comparison according to an embodiment of the present application. Fig. 7 shows an example of a target image, the image filtered by a bilateral filter, and the image filtered by the first filter.
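A minimal NumPy sketch of such a dependency-free bilateral filter is given below, assuming a single-channel image and illustrative parameter values (the radius and sigma values are not specified in the text). Because every I(q) comes from the original image, the loop runs over window offsets rather than over pixels in a fixed order, and the per-pixel work maps directly onto parallel GPU threads:

```python
import numpy as np

def first_filter(img, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Bilateral filtering per formulas (1) and (2), with every I(q) taken
    from the ORIGINAL image, so no pixel depends on filtered neighbours."""
    src = img.astype(np.float64)
    h, w = src.shape
    pad = np.pad(src, radius, mode="edge")
    acc = np.zeros((h, w))
    norm = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # I(q): original (never filtered) neighbourhood values
            shifted = pad[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            w_s = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))   # g_sigma_s(||p-q||)
            w_r = np.exp(-(src - shifted) ** 2 / (2.0 * sigma_r ** 2))  # g_sigma_r(|I(p)-I(q)|)
            weight = w_s * w_r                                          # omega(p, q)
            acc += weight * shifted
            norm += weight
    return acc / norm  # normalised weighted sum, formula (1)
```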
It should be noted that, because the parallel computing capability of the GPU is stronger than that of the CPU, the embodiment of the present application makes a second improvement to the processing: the terminal may transfer the step of spatially filtering the pixel points of the target image to the GPU by calling an image processing interface provided for the GPU, such as Metal or DirectX. Correspondingly, the terminal calls the image processing interface of the graphics processor, which is used to perform, in parallel, the spatial filtering that removes pixel dependence on the pixel points of the target image in the video to be processed, and to acquire each pixel point of the target image in parallel. This parallelizes the spatial filtering, accelerates the whole spatial filtering process, saves CPU resources, and reduces the CPU occupancy rate.
402. The terminal acquires a first noise-reduced image, which is the image that has undergone noise reduction processing and corresponds to the previous frame image of the target image.
In the embodiment of the application, after spatially filtering the pixel points of the target image, the terminal may also temporally filter the pixel points of the target image. Before doing so, the terminal may obtain the first noise-reduced image, which has undergone noise reduction processing and corresponds to the previous frame image of the target image, and then perform the subsequent temporal filtering of the target image based on the first noise-reduced image and the first image.
403. The terminal determines a frame difference between the first image and the first noise-reduced image.
In this embodiment of the application, after acquiring the first noise-reduced image, the terminal may store the noise-reduced pixel value of each pixel point of the first noise-reduced image in the form of a two-dimensional array. Correspondingly, the terminal may also store the filtered pixel value of each pixel point of the first image in the form of a two-dimensional array; the pixel points of the first image correspond one-to-one to the pixel points of the first noise-reduced image, and the size of each two-dimensional array is the product of the height and the width of the target image. For any pixel point, the terminal may calculate the difference between the noise-reduced pixel value corresponding to the pixel point in the first noise-reduced image and the filtered pixel value corresponding to the pixel point in the first image, and take this difference as the pixel frame difference for that pixel point. The result is the frame difference between the first image and the first noise-reduced image, which may likewise take the form of a two-dimensional array.
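As a small illustration of this step (the array names and frame size are hypothetical), the frame difference is just an element-wise subtraction of two two-dimensional arrays:

```python
import numpy as np

H, W = 720, 1280                           # illustrative frame size
first_image = np.zeros((H, W))             # filtered pixel values of the first image
first_denoised = np.zeros((H, W))          # noise-reduced pixel values of the previous frame
frame_diff = first_denoised - first_image  # per-pixel frame difference, an H x W array
```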
404. And the terminal performs time domain filtering on the pixel points of the target image in parallel according to the frame difference between the first image and the first noise reduction image to obtain a second image.
In this embodiment, after obtaining the frame difference between the first image and the first noise-reduced image, the terminal may input the frame difference and the target image into a second filter and perform temporal filtering based on the second filter; the output of the second filter is the second image. The second filter may be an improved Kalman filter based on the Kalman filtering algorithm; that is, the third improvement in the embodiment of the present application is to improve the Kalman filter to obtain the second filter. The second filter is explained below:
the time-domain filtering process based on the Kalman filtering algorithm mainly comprises two steps, namely prediction and correction. When the prediction step is carried out, the terminal predicts the corresponding pixel value and variance of any pixel point in the target image based on the corresponding noise reduction pixel value and variance of the pixel point in the first noise reduction image. During the correction step, the terminal determines a gain coefficient corresponding to each pixel point, and determines a first pixel value of the pixel point after time-domain filtering according to the gain coefficient, a pixel value of the pixel point corresponding to the target image and a noise-reduced pixel value of the pixel point corresponding to the first noise-reduced image. The principle of implementation of the above steps can be seen in the following formulas (3) to (7).
$$\hat{x}_k^- = \hat{x}_{k-1} \quad (3)$$
where $\hat{x}_k^-$ denotes the predicted pixel value of the pixel point in the target image, and $\hat{x}_{k-1}$ denotes the corresponding noise-reduced pixel value of the pixel point in the first noise-reduced image.
$$P_k^- = P_{k-1} + Q \quad (4)$$
where $P_k^-$ denotes the predicted variance of the pixel point in the target image, $P_{k-1}$ denotes the corresponding variance of the pixel point in the first noise-reduced image, and $Q$ denotes the variance offset coefficient.
$$K_k = \frac{P_k^-}{P_k^- + R} \quad (5)$$
where $K_k$ denotes the gain coefficient corresponding to the pixel point, and $R$ denotes the gain offset coefficient.
$$x_k = \hat{x}_k^- + K_k \left( z_k - \hat{x}_k^- \right) \quad (6)$$
where $x_k$ denotes the temporally filtered pixel value of the pixel point, and $z_k$ denotes the pixel value of the pixel point in the target image.
$$P_k = (1 - K_k)\, P_k^- \quad (7)$$
where $P_k$ denotes the variance of the pixel point needed in the next frame image.
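Element-wise over whole frames, formulas (3) to (7) amount to one scalar Kalman predict/correct cycle per pixel. A minimal sketch follows; the default Q matches the initialization given for fig. 10 below, while the default R and the epsilon guard are illustrative additions, not values from the patent:

```python
import numpy as np

def kalman_step(x_prev, P_prev, z, Q=0.05, R=1.0, eps=1e-12):
    """One predict/correct cycle of formulas (3)-(7), applied element-wise
    to whole H x W arrays, so each pixel is an independent scalar filter."""
    x_pred = x_prev                        # (3) predicted pixel value
    P_pred = P_prev + Q                    # (4) predicted variance
    K = P_pred / (P_pred + R + eps)        # (5) gain coefficient (eps avoids 0/0)
    x = x_pred + K * (z - x_pred)          # (6) corrected, temporally filtered value
    P = (1.0 - K) * P_pred                 # (7) variance carried to the next frame
    return x, P, K
```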
In order to make the algorithm run faster, the video denoising method provided by the embodiment of the application optimizes formula (4) by introducing the frame difference into the variance calculation, obtaining formula (8).
where $\Delta$ denotes the frame difference between the first image and the first noise-reduced image.
In order to solve the problem of motion jitter in the noise reduction filtering process, the video noise reduction method provided by the embodiment of the application adds the formula (9) and the formula (10), and optimizes the formula (5) to obtain the formula (11).
$$R_k = 1 + R_{k-1}\,(1 + K_{k-1})^{-1} \quad (9)$$
where $R_k$ denotes the gain offset coefficient corresponding to the pixel point in the target image, $R_{k-1}$ denotes the gain offset coefficient of the pixel point in the first noise-reduced image, and $K_{k-1}$ denotes the corresponding gain coefficient of the pixel point in the first noise-reduced image.
where $U_k$ denotes the motion compensation coefficient.
Accordingly, this step can be realized through the following sub-steps 4041 to 4044. Since the terminal can temporally filter the pixel points of the target image in parallel, sub-steps 4041 to 4044 take any one pixel point in the target image as an example; the other pixel points are processed in the same way. When the terminal has processed all pixel points of the target image, the second image is obtained.
4041. The terminal determines a second variance of the pixel point according to the corresponding first variance of the pixel point in the first noise-reduced image, the frame difference between the first image and the first noise-reduced image, and the variance offset coefficient.
For example, if the corresponding first variance of the pixel point in the first noise-reduced image is $P_{k-1}$, the frame difference between the first image and the first noise-reduced image is $\Delta$, and the variance offset coefficient is $Q$, the second variance $P_k^-$ of the pixel point can be calculated according to formula (8) above.
4042. The terminal obtains a second gain coefficient and a second gain offset coefficient corresponding to the pixel point in the first noise reduction image, and determines a first gain offset coefficient corresponding to the pixel point according to the second gain coefficient and the second gain offset coefficient.
For example, if the second gain coefficient corresponding to the pixel point in the first noise-reduced image is $K_{k-1}$ and the second gain offset coefficient corresponding to the pixel point in the first noise-reduced image is $R_{k-1}$, the first gain offset coefficient $R_k$ corresponding to the pixel point can be calculated according to formula (9).
4043. And the terminal determines the motion compensation coefficient corresponding to the pixel point according to the frame difference.
For example, if the frame difference is $\Delta$, the motion compensation coefficient $U_k$ corresponding to the pixel point can be calculated according to formula (10).
4044. And the terminal determines a first gain coefficient corresponding to the pixel point according to the second variance, the first gain offset coefficient corresponding to the pixel point and the motion compensation coefficient.
For example, according to formula (11) above, the first gain coefficient $K_k$ corresponding to the pixel point is calculated from the second variance $P_k^-$ obtained in sub-step 4041, the first gain offset coefficient $R_k$, and the motion compensation coefficient $U_k$.
It should be noted that after obtaining the first gain coefficient $K_k$ corresponding to the pixel point, the terminal can also determine, according to formula (7) and the second variance $P_k^-$, the third variance $P_k$ that the pixel point needs in the next frame image.
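Sub-steps 4041 to 4044 can be sketched as follows. Formula (9) is implemented as given in the text; the exact forms of formulas (8), (10) and (11) do not survive in this copy, so the lines marked ASSUMED use plausible stand-in expressions (the frame difference entering the process variance, and a motion compensation factor that shrinks the gain where the frame difference is large) rather than the patented ones:

```python
import numpy as np

def temporal_filter_step(z, x_prev, P_prev, R_prev, K_prev, delta, Q=0.05, eps=1e-12):
    """Sub-steps 4041-4044 for one pixel (or, element-wise, a whole frame)."""
    R = 1.0 + R_prev / (1.0 + K_prev)      # (9) first gain offset coefficient
    P_pred = P_prev + Q * delta ** 2       # (8) ASSUMED: frame difference enters the variance
    U = 1.0 / (1.0 + np.abs(delta))        # (10) ASSUMED motion compensation coefficient
    K = P_pred / (P_pred + R * U + eps)    # (11) ASSUMED gain with motion compensation
    x = x_prev + K * (z - x_prev)          # (6) temporally filtered pixel value
    P = (1.0 - K) * P_pred                 # (7) third variance, kept for the next frame
    return x, P, R, K
```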
405. And the terminal fuses the first image and the second image according to the gain coefficient corresponding to the pixel point of the second image to obtain a second noise reduction image which is corresponding to the target image and has undergone noise reduction processing.
In the embodiment of the application, the terminal also obtains the gain coefficient corresponding to each pixel point of the second image in the process of temporally filtering the pixel points of the target image. For any pixel point, the terminal may take the product of the first gain coefficient corresponding to the pixel point and the first pixel value of the pixel point as a first fusion value, and take the product of the first gain coefficient corresponding to the pixel point and the second pixel value of the pixel point as a second fusion value, where the second pixel value is the spatially filtered pixel value of the pixel point. The terminal sums the first fusion value and the second fusion value to obtain the noise-reduced pixel value corresponding to the pixel point. This summation can be implemented according to formula (12).
where $\hat{x}_k$ denotes the noise-reduced pixel value corresponding to the pixel point.
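The fusion of step 405 can be sketched as below. Read literally, the text weights both fusion values by the first gain coefficient $K_k$; a complementary weighting of $(1 - K_k)$ and $K_k$ would be the more conventional convex blend, and the surviving text does not disambiguate, so both are shown (all array names and values are illustrative):

```python
import numpy as np

H, W = 720, 1280
K = np.full((H, W), 0.5)        # first gain coefficients from step 404
x_temporal = np.zeros((H, W))   # first pixel values (pixels of the second image)
x_spatial = np.zeros((H, W))    # second pixel values (pixels of the first image)

# Literal reading of the text: both fusion values use the first gain coefficient.
fused_literal = K * x_temporal + K * x_spatial
# ASSUMED alternative: complementary weights, the usual convex blend.
fused_blend = (1.0 - K) * x_temporal + K * x_spatial
```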
When all pixel points have been fused, the noise-reduced target image is obtained. For example, referring to fig. 8, fig. 8 is a schematic diagram illustrating a comparison before and after noise reduction according to an embodiment of the present application. Fig. 8 includes the target image before noise reduction and the target image after noise reduction; as can be seen from the figure, the noise in the target image after noise reduction is significantly reduced compared with the target image before noise reduction, i.e. the video noise reduction method provided in the embodiment of the present application effectively realizes noise reduction processing of the target image.
It should be noted that steps 401 to 405 above are an optional implementation of the video denoising method provided in the embodiment of the application, and the method may also be executed in an order other than that of steps 401 to 405. Alternatively, a third filter may be provided, whose structure is the same as that of the first filter; the third filter, the first filter, and the second filter process each pixel point of the target image in parallel by calling the image processing interface of the GPU, thereby realizing the noise reduction processing of the target image.
For example, referring to fig. 9, fig. 9 is a schematic key flow diagram of a video denoising method according to an embodiment of the present application. As shown in fig. 9, the flow comprises three parts: input, noise reduction processing, and output. The target image $f_C$ and the first noise-reduced image are input. In the noise reduction part, the first filter and the third filter are denoted as image noise reduction filters $F_1$ and $F_2$, respectively, and the second filter is denoted as a Kalman filter $F_K$. Parallel acceleration is performed through the image processing interface of the GPU. When the terminal denoises the target image, it processes the target image $f_C$ through the image noise reduction filter $F_1$ to obtain the first image, computes from this result the frame difference $f_D$ between the first noise-reduced image and the first image, and inputs the frame difference $f_D$ and the target image $f_C$ into the Kalman filter $F_K$. The second image output by the Kalman filter $F_K$ is fused with the output of the image noise reduction filter $F_2$ to obtain the second noise-reduced image which corresponds to the target image and has undergone noise reduction processing. Optionally, the second noise-reduced image may be stored in the Kalman filter to participate in the operations on subsequent images. The algorithm flow corresponding to fig. 9 can be seen in fig. 10; fig. 10 is a schematic diagram of the algorithm flow of a video denoising method provided in the embodiment of the present application. The initialization parameters comprise: $P = 0$, $Q = 0.05$, $R = 0$, $K = 0$, and the pixel value obtained by $F_1$ for the previous frame image is initialized to zero. The target image is first spatially filtered (in fig. 10, an arrow $\leftarrow$ denotes assignment). Temporally filtering any pixel point of the target image then comprises: calculating the frame difference $\Delta$; calculating the gain offset coefficient, $R_k \leftarrow 1 + R_{k-1}(1 + K_{k-1})^{-1}$; taking the corresponding noise-reduced pixel value in the first noise-reduced image as the predicted pixel value of the pixel point in the target image, $\hat{x}_k^- \leftarrow \hat{x}_{k-1}$; calculating the second variance according to formula (8); calculating the motion compensation coefficient according to formula (10); calculating the first gain coefficient according to formula (11); calculating the temporally filtered pixel value, $x_k \leftarrow \hat{x}_k^- + K_k(z_k - \hat{x}_k^-)$; calculating the noise-reduced pixel value according to formula (12); and calculating the variance used for the next frame image, $P_k \leftarrow (1 - K_k)P_k^-$, and returning it.
It should be further noted that, in the video denoising method provided in the embodiment of the present application, the dependency relationships of the pixel points are removed during spatial filtering, so the GPU can compute each pixel point in parallel; during temporal filtering there is likewise no pixel dependency, so the GPU can also compute each pixel point in parallel, and the whole video denoising process can therefore run in parallel. When this complex noise reduction process is migrated to the GPU, the CPU occupancy rate on the computer side is very low. In addition, in order to further speed up the noise reduction process, the video denoising method provided in the embodiment of the present application makes a fourth improvement: the format of the input image is set to the YCbCr (YUV) format, and when the image is denoised, the first filter and the second filter perform spatial filtering and temporal filtering, respectively, on the luminance component of the target image only, i.e. only the Y channel, which carries the luminance detail information, is denoised.
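A sketch of this fourth improvement, under the assumption of a planar H x W x 3 YCbCr frame and an arbitrary per-plane denoiser (both hypothetical), shows that only the Y plane is touched:

```python
import numpy as np

def denoise_y_only(frame_ycbcr, denoise_fn):
    """Run a denoiser on the Y (luma) plane only, leaving Cb/Cr untouched.
    frame_ycbcr is assumed to be an H x W x 3 YCbCr array; denoise_fn maps
    an H x W plane to an H x W plane (e.g. the spatial + temporal pipeline)."""
    out = frame_ycbcr.copy()
    out[..., 0] = denoise_fn(frame_ycbcr[..., 0])  # Y channel carries the luminance detail
    return out
```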
In order to show more clearly the effect of the video noise reduction method provided by the embodiment of the application in saving CPU occupancy, a comparative experiment was carried out using two notebook computers of different models. The comparative results can be seen in Table 1.
TABLE 1
As can be seen from Table 1, the CPU occupancy rate is significantly reduced when decoupling and GPU parallel computing are used, compared with when they are not used.
In the embodiment of the application, the spatial filtering for removing the pixel dependence is performed on the pixel points of the target image, so that the dependency relationship does not exist among the pixel points in the target image, and the time-domain filtering is performed on the pixel points of the target image in parallel according to the frame difference between the first image and the first noise-reduction image obtained by the spatial filtering, so that the video noise reduction process is converted from the serial processing to the parallel processing, and the noise reduction process is accelerated.
Fig. 11 is a block diagram of a video noise reduction apparatus according to an embodiment of the present application. The apparatus is used for executing the steps of the video denoising method, and referring to fig. 11, the apparatus includes: a spatial filtering module 1101, a temporal filtering module 1102, and a fusion module 1103.
The spatial filtering module 1101 is configured to perform spatial filtering for removing pixel dependency on a pixel point of a target image in a video to be processed to obtain a first image;
the time domain filtering module 1102 is configured to perform time domain filtering on pixel points of the target image in parallel according to a frame difference between the first image and the first noise-reduced image to obtain a second image, where the first noise-reduced image is an image that has been subjected to noise reduction processing and corresponds to a previous frame image of the target image;
the fusion module 1103 is configured to fuse the first image and the second image according to a gain coefficient corresponding to a pixel point of the second image, so as to obtain a second noise-reduced image which is corresponding to the target image and has undergone noise reduction processing.
In an optional implementation manner, the spatial filtering module 1101 is further configured to obtain, for all pixel points of a target image in the video to be processed, an initial pixel value of at least one neighborhood pixel point of each pixel point; and performing spatial filtering on the pixel points according to the initial pixel value of at least one neighborhood pixel point.
In another optional implementation manner, the apparatus further includes:
the interface calling module is used for calling an image processing interface of the graphics processor, and the image processing interface is used for carrying out spatial filtering for removing pixel dependence on pixel points of a target image in a video to be processed in parallel;
and the parallel acquisition module is used for acquiring each pixel point of the target image in the video to be processed in parallel.
In another optional implementation manner, the time-domain filtering module 1102 is further configured to obtain each pixel point of the target image in parallel; for any pixel point of the target image, determining a second variance of the pixel point according to a corresponding first variance of the pixel point in the first noise-reduced image, a frame difference between the first image and the first noise-reduced image and a variance offset coefficient; determining a first gain coefficient corresponding to the pixel point according to the second variance, the first gain offset coefficient corresponding to the pixel point and the motion compensation coefficient; and determining a first pixel value of the pixel point after time domain filtering according to the first gain coefficient, the initial pixel value of the pixel point and the corresponding noise reduction pixel value of the pixel point in the first noise-reduced image.
In another optional implementation manner, the apparatus further includes:
a first determining module for determining a motion compensation coefficient according to the frame difference.
In another optional implementation manner, the apparatus further includes:
the acquisition module is used for acquiring a second gain coefficient and a second gain offset coefficient corresponding to the pixel point in the first noise reduction image;
and the second determining module is used for determining the first gain offset coefficient corresponding to the pixel point according to the second gain coefficient and the second gain offset coefficient.
In another optional implementation manner, the time-domain filtering module 1102 is further configured to use a product of a first gain coefficient corresponding to a pixel point and a first pixel value of the pixel point as a first fusion value for any pixel point of the second image; taking the product of the first gain coefficient corresponding to the pixel point and the second pixel value of the pixel point as a second fusion value, wherein the second pixel value is the pixel value of the pixel point after spatial filtering; and summing the first fusion value and the second fusion value to obtain a noise reduction pixel value corresponding to the pixel point.
In another alternative implementation, the spatial filtering and the temporal filtering each process the luminance component of the pixel points.
In the embodiment of the application, the spatial filtering for removing the pixel dependence is performed on the pixel points of the target image, so that the dependency relationship does not exist among the pixel points in the target image, and the time-domain filtering is performed on the pixel points of the target image in parallel according to the frame difference between the first image and the first noise-reduction image obtained by the spatial filtering, so that the video noise reduction process is converted from the serial processing to the parallel processing, and the noise reduction process is accelerated.
It should be noted that when the apparatus provided in the above embodiment runs an application program, only the division into the above functional modules is taken as an example for description; in practical applications, the above functions may be distributed among different functional modules as needed, i.e. the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and method embodiments provided above belong to the same concept; for their specific implementation, see the method embodiments, which are not repeated here.
Fig. 12 is a block diagram of a terminal 1200 according to an embodiment of the present application. The terminal 1200 may be: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III, motion video Experts compression standard Audio Layer 3), an MP4 player (Moving Picture Experts Group Audio Layer IV, motion video Experts compression standard Audio Layer 4), a notebook computer, or a desktop computer. Terminal 1200 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so forth.
In general, terminal 1200 includes: a processor 1201 and a memory 1202.
The processor 1201 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1201 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1201 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1201 may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing content that the display screen needs to display. In some embodiments, the processor 1201 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.
Memory 1202 may include one or more computer-readable storage media, which may be non-transitory. Memory 1202 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1202 is used to store at least one instruction for execution by processor 1201 to implement the video denoising method provided by the method embodiments herein.
In some embodiments, the terminal 1200 may further optionally include: a peripheral interface 1203 and at least one peripheral. The processor 1201, memory 1202, and peripheral interface 1203 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 1203 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1204, display 1205, camera assembly 1206, audio circuitry 1207, positioning assembly 1208, and power supply 1209.
The peripheral interface 1203 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1201 and the memory 1202. In some embodiments, the processor 1201, memory 1202, and peripheral interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202 and the peripheral device interface 1203 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1204 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1204 communicates with communication networks and other communication devices via electromagnetic signals, converting electric signals into electromagnetic signals for transmission or converting received electromagnetic signals into electric signals. Optionally, the radio frequency circuit 1204 comprises an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1204 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1204 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 1205 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1205 is a touch display screen, it also has the ability to acquire touch signals on or over its surface. A touch signal may be input to the processor 1201 as a control signal for processing. At this point, the display screen 1205 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1205, providing the front panel of the terminal 1200; in other embodiments, there may be at least two display screens 1205, respectively disposed on different surfaces of the terminal 1200 or in a folded design; in still other embodiments, the display screen 1205 may be a flexible display screen disposed on a curved or folded surface of the terminal 1200. The display screen 1205 may even be arranged as a non-rectangular irregular figure, i.e., an irregularly-shaped screen. The display screen 1205 can be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
Camera assembly 1206 is used to capture images or video. Optionally, camera assembly 1206 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1206 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 1207 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals into the processor 1201 for processing or inputting the electric signals into the radio frequency circuit 1204 to achieve voice communication. For stereo capture or noise reduction purposes, multiple microphones may be provided at different locations of terminal 1200. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1201 or the radio frequency circuit 1204 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 1207 may also include a headphone jack.
The positioning component 1208 is configured to locate the current geographic location of the terminal 1200 to implement navigation or LBS (Location Based Service). The positioning component 1208 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 1209 is used to supply power to the various components in the terminal 1200. The power supply 1209 may be an alternating current source, a direct current source, a disposable battery, or a rechargeable battery. When the power supply 1209 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, terminal 1200 also includes one or more sensors 1210. The one or more sensors 1210 include, but are not limited to: acceleration sensor 1211, gyro sensor 1212, pressure sensor 1213, fingerprint sensor 1214, optical sensor 1215, and proximity sensor 1216.
The acceleration sensor 1211 can detect the magnitude of acceleration on three coordinate axes of a coordinate system established with respect to the terminal 1200. For example, the acceleration sensor 1211 may be used to detect the components of gravitational acceleration along the three coordinate axes. The processor 1201 may control the display screen 1205 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1211. The acceleration sensor 1211 may also be used to collect motion data of a game or a user.
The gyro sensor 1212 may detect the body direction and rotation angle of the terminal 1200, and may cooperate with the acceleration sensor 1211 to collect the user's 3D motion on the terminal 1200. Based on the data collected by the gyro sensor 1212, the processor 1201 can implement functions such as motion sensing (for example, changing the UI according to the user's tilting operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1213 may be disposed on a side frame of the terminal 1200 and/or at a lower layer of the display screen 1205. When the pressure sensor 1213 is disposed on a side frame of the terminal 1200, it can detect the user's holding signal on the terminal 1200, and the processor 1201 performs left- or right-hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 1213. When the pressure sensor 1213 is disposed at a lower layer of the display screen 1205, the processor 1201 controls operability controls on the UI according to the user's pressure operation on the display screen 1205. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1214 is used to collect the user's fingerprint, and the processor 1201 identifies the user according to the fingerprint collected by the fingerprint sensor 1214, or the fingerprint sensor 1214 identifies the user according to the collected fingerprint. When the user's identity is identified as trusted, the processor 1201 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1214 may be provided on the front, back, or side of the terminal 1200. When a physical button or a vendor logo is provided on the terminal 1200, the fingerprint sensor 1214 may be integrated with the physical button or the vendor logo.
The optical sensor 1215 is used to collect ambient light intensity. In one embodiment, the processor 1201 may control the display brightness of the display screen 1205 according to the ambient light intensity collected by the optical sensor 1215. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1205 is increased; when the ambient light intensity is low, the display brightness of the display screen 1205 is reduced. In another embodiment, the processor 1201 may also dynamically adjust the shooting parameters of the camera assembly 1206 based on the ambient light intensity collected by the optical sensor 1215.
The proximity sensor 1216, also known as a distance sensor, is typically disposed on the front panel of the terminal 1200. The proximity sensor 1216 is used to collect the distance between the user and the front surface of the terminal 1200. In one embodiment, when the proximity sensor 1216 detects that the distance between the user and the front surface of the terminal 1200 gradually decreases, the processor 1201 controls the display screen 1205 to switch from the screen-on state to the screen-off state; when the proximity sensor 1216 detects that the distance between the user and the front surface of the terminal 1200 gradually increases, the processor 1201 controls the display screen 1205 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the structure shown in Fig. 12 does not constitute a limitation on the terminal 1200, which may include more or fewer components than shown, combine some components, or adopt a different component arrangement.
The embodiment of the present application further provides a storage medium applied to a terminal. The storage medium stores at least one program code, and the at least one program code is used to be executed by a processor to implement the video noise reduction method in the embodiments of the present application.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a storage medium such as a read-only memory, a magnetic disk, or an optical disk.
The present application is intended to cover various modifications, alternatives, and equivalents, which may be included within the spirit and scope of the present application.
Claims (12)
1. A method for video denoising, the method comprising:
performing spatial filtering for removing pixel dependence on pixel points of a target image in a video to be processed to obtain a first image;
determining a frame difference between the first image and a first noise-reduced image, wherein the first noise-reduced image is an image which is subjected to noise reduction processing and corresponds to a previous frame of image of the target image, the frame difference is used for indicating a pixel frame difference of each pixel point, and the pixel frame difference is a difference value between a noise-reduced pixel value corresponding to the pixel point in the first noise-reduced image and a filtering pixel value corresponding to the pixel point in the first image;
determining a motion compensation coefficient according to the frame difference;
according to the frame difference and the motion compensation coefficient, performing time domain filtering on pixel points of the target image in parallel to obtain a second image;
fusing the first image and the second image according to a gain coefficient corresponding to a pixel point of the second image to obtain a second noise-reduced image which is subjected to noise reduction processing and corresponds to the target image;
the step of performing time-domain filtering on the pixel points of the target image comprises the following steps:
for any pixel point of the target image, determining a second variance corresponding to the pixel point in the target image according to a first variance corresponding to the pixel point in the first noise-reduced image, a pixel frame difference corresponding to the pixel point and a variance bias coefficient;
determining a first gain coefficient corresponding to the pixel point according to the second variance, a first gain offset coefficient corresponding to the pixel point and the motion compensation coefficient, wherein the first gain offset coefficient is a gain offset coefficient corresponding to the pixel point in the target image, and the first gain coefficient is a gain coefficient corresponding to the pixel point in the target image;
and determining a first pixel value of the pixel point after time domain filtering according to the first gain coefficient, the initial pixel value of the pixel point, and the noise-reduced pixel value corresponding to the pixel point in the first noise-reduced image.
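Read as a recursive, Kalman-style temporal filter, the step above can be sketched as follows. The claim names only the inputs (first variance, pixel frame difference, variance bias coefficient, gain offset coefficient, motion compensation coefficient), not the exact functional forms, so the variance update, the gain formula, and the coefficient names `variance_bias`, `gain_bias`, and `motion_comp` below are illustrative assumptions rather than the patented formulas. The sketch is vectorized over the whole frame, consistent with the per-pixel independence that allows parallel filtering:

```python
import numpy as np

def temporal_filter(target, prev_denoised, prev_var, frame_diff,
                    variance_bias=0.9, gain_bias=1e-2, motion_comp=1.0):
    """Hedged sketch of the claimed temporal filtering step.

    target        : initial pixel values of the current frame (H x W array)
    prev_denoised : noise-reduced pixel values of the previous frame
    prev_var      : first variance per pixel (carried over from the previous frame)
    frame_diff    : pixel frame difference (previous noise-reduced frame minus
                    the spatially filtered first image)
    The update rules below are assumed forms; the claim fixes only the inputs.
    """
    # Second variance: blend the first variance with the squared frame
    # difference, weighted by an (assumed) variance bias coefficient.
    var = variance_bias * prev_var + (1.0 - variance_bias) * frame_diff ** 2

    # First gain coefficient: larger variance (stronger motion or noise change)
    # pushes the gain toward the current frame; the motion compensation
    # coefficient scales the gain. The formula is illustrative only.
    gain = motion_comp * var / (var + gain_bias)

    # First pixel value: recursive blend of the initial pixel value and the
    # previous noise-reduced value. Each pixel depends only on its own
    # history, so the whole frame can be filtered in parallel.
    first_pixel = gain * target + (1.0 - gain) * prev_denoised
    return first_pixel, var, gain
```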
2. The method according to claim 1, wherein the performing spatial filtering for removing pixel dependence on pixel points of a target image in the video to be processed comprises:
acquiring, for each pixel point of a target image in the video to be processed, an initial pixel value of at least one neighborhood pixel point of the pixel point;
and performing spatial filtering on the pixel points according to the initial pixel value of the at least one neighborhood pixel point.
3. The method of claim 2, wherein before the performing spatial filtering for removing pixel dependence on pixel points of the target image in the video to be processed, the method further comprises:
and calling an image processing interface of the graphic processor, wherein the image processing interface is used for carrying out spatial filtering for removing pixel dependence on pixel points of a target image in the video to be processed in parallel.
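Claims 2 and 3 hinge on the filter reading only the initial, unfiltered pixel values of each neighborhood, so no output pixel depends on a previously filtered output and every pixel can be processed in parallel, for example on a graphics processor through an image processing interface. A minimal sketch under that reading follows; the 3x3 box kernel is an arbitrary stand-in for whatever spatial kernel the implementation actually uses:

```python
import numpy as np

def spatial_filter_no_dependence(image):
    """Sketch of pixel-dependence-free spatial filtering.

    Every output pixel is computed from the *initial* pixel values of its
    neighborhood (here a 3x3 box kernel, chosen only for illustration),
    never from already-filtered outputs, so all pixels are independent
    and can be filtered in parallel.
    """
    h, w = image.shape
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    # Sum the 9 shifted views of the padded input; every read touches only
    # the original image, never `out`, which removes the serial dependency.
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0
```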
4. The method of claim 1, wherein before determining the first gain factor corresponding to the pixel point according to the second variance, the first gain bias factor corresponding to the pixel point, and the motion compensation factor, the method further comprises:
acquiring a second gain coefficient and a second gain offset coefficient corresponding to the pixel point in the first noise reduction image;
and determining a first gain offset coefficient corresponding to the pixel point according to the second gain coefficient and the second gain offset coefficient.
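Claim 4 derives the current frame's gain offset coefficient from the previous frame's gain coefficient and gain offset coefficient. The recurrence below is again an assumed form, a simple exponential-style accumulation, since the claim names only the inputs and the output; the `decay` constant is purely illustrative:

```python
def update_gain_bias(prev_gain, prev_gain_bias, decay=0.95):
    """Assumed recurrence for the first gain offset coefficient.

    prev_gain      : second gain coefficient (previous noise-reduced frame)
    prev_gain_bias : second gain offset coefficient (previous frame)
    decay          : illustrative smoothing constant, not from the claim
    """
    return decay * prev_gain_bias + (1.0 - decay) * prev_gain
```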
5. The method according to claim 1, wherein the fusing the first image and the second image according to the gain coefficient corresponding to the pixel point of the second image to obtain a second noise-reduced image which is corresponding to the target image and has undergone noise reduction processing, comprises:
for any pixel point of the second image, taking the product of a first gain coefficient corresponding to the pixel point and a first pixel value of the pixel point as a first fusion value;
taking a product of a first gain coefficient corresponding to the pixel point and a second pixel value of the pixel point as a second fusion value, wherein the second pixel value is a pixel value of the pixel point after spatial filtering;
and summing the first fusion value and the second fusion value to obtain a noise reduction pixel value corresponding to the pixel point.
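A per-pixel sketch of the fusion in claim 5 follows. Note that, read literally, the claim multiplies both the temporally filtered value and the spatially filtered value by the same first gain coefficient; the complementary weighting used below, gain and (1 - gain), is the common convention for such a blend and is an assumption on our part, not the claim's wording:

```python
def fuse(gain, first_pixel, second_pixel):
    """Weighted fusion of temporal and spatial results for one pixel.

    gain         : first gain coefficient for this pixel point
    first_pixel  : first pixel value (after temporal filtering)
    second_pixel : second pixel value (after spatial filtering)
    """
    first_fusion = gain * first_pixel            # first fusion value
    second_fusion = (1.0 - gain) * second_pixel  # assumed complementary weight
    return first_fusion + second_fusion          # noise-reduced pixel value
```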
6. The method according to any of claims 1-5, wherein the spatial filtering and the temporal filtering process the luminance component of a pixel point separately.
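Claim 6 restricts both filters to the luminance component. For YUV video, the usual format in video-conferencing pipelines, a sketch of this would denoise only the Y plane and pass the chroma planes through unchanged; the `denoise_luma` callable here is hypothetical:

```python
def denoise_frame_yuv(y, u, v, denoise_luma):
    """Apply the (spatial + temporal) denoising only to the luminance plane.

    y, u, v      : planes of one YUV frame
    denoise_luma : hypothetical callable implementing the combined
                   spatial/temporal pipeline on a single plane
    """
    return denoise_luma(y), u, v  # chroma is left untouched per claim 6
```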
7. A video noise reduction apparatus, the apparatus comprising:
the spatial filtering module is used for performing spatial filtering for removing pixel dependence on pixel points of a target image in a video to be processed to obtain a first image;
a time domain filtering module, configured to determine a frame difference between the first image and a first noise-reduced image, where the first noise-reduced image is an image that has undergone noise reduction processing and corresponds to a previous frame of image of the target image, the frame difference is used to indicate a pixel frame difference of each pixel point, and the pixel frame difference is a difference between a noise-reduced pixel value corresponding to the pixel point in the first noise-reduced image and a filtered pixel value corresponding to the pixel point in the first image;
the time domain filtering module is further configured to determine a motion compensation coefficient according to the frame difference;
the time domain filtering module is further configured to perform time domain filtering on the pixel points of the target image in parallel according to the frame difference and the motion compensation coefficient to obtain a second image;
the fusion module is used for fusing the first image and the second image according to the gain coefficient corresponding to the pixel point of the second image to obtain a second noise-reduced image which corresponds to the target image and is subjected to noise reduction processing;
the step of performing time-domain filtering on the pixel points of the target image comprises the following steps:
for any pixel point of the target image, determining a second variance corresponding to the pixel point in the target image according to a first variance corresponding to the pixel point in the first noise-reduced image, a pixel frame difference corresponding to the pixel point and a variance bias coefficient;
determining a first gain coefficient corresponding to the pixel point according to the second variance, a first gain offset coefficient corresponding to the pixel point and the motion compensation coefficient, wherein the first gain offset coefficient is a gain offset coefficient corresponding to the pixel point in the target image, and the first gain coefficient is a gain coefficient corresponding to the pixel point in the target image;
and determining a first pixel value of the pixel point after time domain filtering according to the first gain coefficient, the initial pixel value of the pixel point, and the noise-reduced pixel value corresponding to the pixel point in the first noise-reduced image.
8. The apparatus according to claim 7, wherein the spatial filtering module is further configured to acquire, for each pixel point of a target image in the video to be processed, an initial pixel value of at least one neighborhood pixel point of the pixel point; and perform spatial filtering on the pixel point according to the initial pixel value of the at least one neighborhood pixel point.
9. The apparatus of claim 8, further comprising:
the interface calling module is used for calling an image processing interface of the graphics processor, and the image processing interface is used for carrying out spatial filtering for removing pixel dependence on pixel points of a target image in a video to be processed in parallel;
and the parallel acquisition module is used for acquiring each pixel point of the target image in the video to be processed in parallel.
10. The apparatus of claim 7, wherein the fusion module is further configured to, for any pixel point of the second image, take a product of a first gain coefficient corresponding to the pixel point and a first pixel value of the pixel point as a first fusion value; take a product of a first gain coefficient corresponding to the pixel point and a second pixel value of the pixel point as a second fusion value, wherein the second pixel value is a pixel value of the pixel point after spatial filtering; and sum the first fusion value and the second fusion value to obtain a noise-reduced pixel value corresponding to the pixel point.
11. A terminal, comprising a processor and a memory, wherein the memory is configured to store at least one program code, and the at least one program code is loaded and executed by the processor to perform the video denoising method according to any one of claims 1 to 6.
12. A storage medium, configured to store at least one program code, wherein the at least one program code is executed by a processor to implement the video noise reduction method according to any one of claims 1 to 6.
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK40015762A true HK40015762A (en) | 2020-09-04 |
| HK40015762B HK40015762B (en) | 2021-11-26 |