
CN111784578B - Image processing, model training method and device, equipment, storage medium - Google Patents


Info

Publication number
CN111784578B
Authority
CN
China
Prior art keywords
image
images
resolution
frames
model
Prior art date
Legal status
Active
Application number
CN202010599465.9A
Other languages
Chinese (zh)
Other versions
CN111784578A (en)
Inventor
张弓
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010599465.9A
Publication of CN111784578A
Application granted
Publication of CN111784578B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract


The embodiments of the present application disclose image processing and model training methods and apparatuses, devices, and storage media. The method comprises: obtaining an image to be processed; calling at least one trained image reconstruction model, wherein the image reconstruction model is obtained by training on a training sample set comprising multiple frames of sample images and a truth image corresponding to each frame, the truth image is obtained by fusing multiple frames of reference images, and the resolution of the truth image is greater than that of the corresponding sample image; and performing super-resolution processing on the image to be processed through the at least one image reconstruction model to obtain a target image.

Description

Image processing and model training methods and apparatuses, device, and storage medium
Technical Field
Embodiments of the present application relate to image processing technology, and in particular, but not exclusively, to image processing and model training methods and apparatuses, devices, and storage media.
Background
Images are an important form of information for perceiving the world, and the richness and detail of their content directly determine how much of that content can be perceived. The higher the pixel density per unit area of an image, the clearer the image, the more detail it expresses, and the more information can be perceived from it; in other words, the higher the resolution of the image. Super-resolution reconstruction of images has therefore been studied in many fields, such as remote sensing, satellite imaging, medical imaging, and high-definition display.
However, in the related art, super-resolution is limited by defects of the original image, so the definition of the high-resolution image obtained by the super-resolution method is low.
Disclosure of Invention
The image processing and model training methods, apparatuses, devices, and storage media provided by the embodiments of the application can obtain high-resolution images with higher definition, and are implemented as follows:
The image processing method provided by the embodiment of the application comprises: obtaining an image to be processed; calling at least one trained image reconstruction model, wherein the image reconstruction model is obtained by training on a training sample set comprising multiple frames of sample images and a truth image corresponding to each frame, the truth image is obtained by fusing multiple frames of reference images, and the resolution of the truth image is greater than that of the corresponding sample image; and performing super-resolution processing on the image to be processed through the at least one image reconstruction model to obtain a target image.
The model training method provided by the embodiment of the application comprises: downsampling an original image set according to each preset downsampling parameter value to obtain a downsampled image set under the corresponding downsampling parameter value; generating a training sample set according to the downsampled image set and the truth images corresponding to the downsampled images, wherein a truth image is obtained by fusing multiple frames of reference images and its resolution is greater than that of the corresponding sample image; and training an original deep learning model with the training sample set under each downsampling parameter value to obtain an image reconstruction model under the corresponding downsampling parameter value.
The image processing apparatus provided by the embodiment of the application comprises an acquisition module, a calling module, and a super-resolution processing module. The acquisition module is used to acquire an image to be processed; the calling module is used to call at least one trained image reconstruction model, wherein the image reconstruction model is obtained by training on a training sample set comprising multiple frames of sample images and a truth image corresponding to each frame, the truth image is obtained by fusing multiple frames of reference images, and the resolution of the truth image is greater than that of the corresponding sample image; and the super-resolution processing module is used to perform super-resolution processing on the image to be processed through the at least one image reconstruction model to obtain a target image.
The model training apparatus provided by the embodiment of the application comprises a downsampling module, a sample generation module, and a model training module. The downsampling module is used to downsample an original image set according to each preset downsampling parameter value to obtain a downsampled image set under the corresponding downsampling parameter value; the sample generation module is used to generate a training sample set according to the downsampled image set and the truth images corresponding to the downsampled images, wherein a truth image is obtained by fusing multiple frames of reference images and its resolution is greater than that of the corresponding sample image; and the model training module is used to train an original deep learning model with the training sample set under each downsampling parameter value to obtain an image reconstruction model under the corresponding downsampling parameter value.
The electronic device provided by the embodiment of the application comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor realizes the steps in any one of the image processing methods in the embodiment of the application when executing the program, or realizes the steps in any one of the model training methods in the embodiment of the application when executing the program.
The computer readable storage medium provided by the embodiment of the present application stores a computer program thereon, and is characterized in that the computer program when executed by a processor implements the steps in any of the image processing methods of the embodiment of the present application, or the computer program when executed by a processor implements the steps in any of the model training methods of the embodiment of the present application.
In the embodiment of the application, the image reconstruction model for performing super-resolution processing on the image to be processed is obtained by training a training sample set comprising a plurality of frame sample images and a truth image corresponding to each frame, wherein the truth image is formed by fusing a plurality of frame reference images instead of directly taking a single frame reference image as the truth image, so that the signal-to-noise ratio of the truth image obtained by fusion is higher and the details are clearer, the performance of the trained model can be improved, and the image to be processed can be reconstructed into a target image with higher definition when the model is used.
Drawings
FIG. 1 is a schematic view of an application scenario of an image reconstruction model;
FIG. 2 is a schematic view of another application scenario of an image reconstruction model;
FIG. 3 is a schematic diagram of an audio/video playing interface;
FIG. 4A is a schematic diagram of an implementation flow of a model training method according to an embodiment of the present application;
FIG. 4B is a flowchart illustrating a method for generating a training sample set according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a process for training multiple initial models (e.g., original deep learning models) according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another implementation flow of a model training method according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating an implementation flow of an image processing method according to an embodiment of the present application;
FIG. 8 is a flowchart illustrating another implementation of an image processing method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a high resolution reconstructed image 902 versus an original image 901;
FIG. 10 is a schematic flow chart of generating real training data of a super-resolution network based on multi-frame fusion according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a flow chart for implementing multi-frame fusion according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a flowchart for implementing homography matrix calculation according to an embodiment of the present application;
FIG. 13A is a schematic diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 13B is a schematic diagram illustrating another embodiment of an image processing apparatus according to the present application;
FIG. 14A is a schematic diagram of a model training apparatus according to an embodiment of the present application;
FIG. 14B is a schematic diagram of another embodiment of a model training apparatus;
Fig. 15 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the specific technical solutions of the present application will be described in further detail below with reference to the accompanying drawings in the embodiments of the present application. The following examples are illustrative of the application and are not intended to limit the scope of the application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
It should be noted that the term "first\second\third" in relation to embodiments of the present application is merely to distinguish similar or different objects and does not represent a specific ordering for the objects, it being understood that the "first\second\third" may be interchanged in a specific order or sequence, where allowed, to enable embodiments of the present application described herein to be practiced in an order other than that illustrated or described herein.
Super-resolution processing converts a low-resolution image into a corresponding high-resolution image. For example, an image with 540P resolution can be converted into an image with 1080P resolution after super-resolution processing. In the super-resolution process, the representation value of each pixel position in the low-resolution image is first acquired; then, based on these representation values, a trained image reconstruction model computes multi-channel output data, which yields several values for each input representation value; these values serve as the representation values of new pixel positions in the super-resolved image; finally, the high-resolution image is generated by arranging the representation values of the new pixel positions.
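To make the final rearrangement step concrete, the sketch below shows one common way multi-channel model outputs can be arranged into new pixel positions (a depth-to-space rearrangement). The 2x factor, the function name, and the random stand-in for the model output are illustrative assumptions, not the patent's specified implementation.

```python
import numpy as np

def depth_to_space(model_output: np.ndarray, r: int) -> np.ndarray:
    """Rearrange an (H, W, r*r) multi-channel output into an (H*r, W*r) image.

    Each of the r*r channel values predicted for a low-resolution pixel position
    becomes the representation value of one new pixel position in the
    super-resolved image (illustrative sketch only).
    """
    h, w, c = model_output.shape
    assert c == r * r, "expected r*r channels per low-resolution pixel"
    # (H, W, r, r) -> (H, r, W, r) -> (H*r, W*r)
    return model_output.reshape(h, w, r, r).transpose(0, 2, 1, 3).reshape(h * r, w * r)

# Usage: a hypothetical model output with 4 channels per pixel for 2x upscaling,
# e.g. turning a 960x540 plane into a 1920x1080 one.
fake_model_output = np.random.rand(540, 960, 4)
high_res = depth_to_space(fake_model_output, r=2)
print(high_res.shape)  # (1080, 1920)
```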
The application scenario of the embodiment of the present application in the image reconstruction model described below is for more clearly describing the technical solution of the embodiment of the present application, and does not constitute a limitation on the technical solution provided by the embodiment of the present application. As can be known by those skilled in the art, with the appearance of a new service scenario, the technical solution provided by the embodiment of the present application is applicable to similar technical problems.
The trained, available image reconstruction model is deployed in an image playback application installed on a terminal with image display and/or video playback functions, such as a cell phone, tablet, personal computer (Personal Computer, PC) or wearable device. The video playback application may be, for example, various applications capable of playing video and/or various applications capable of displaying images (e.g., camera applications), and the like. The user opens the image playing application on the terminal and can request to play video or display images.
Fig. 1 is a schematic view of an application scenario of an image reconstruction model. As shown in fig. 1, when a user 101 requests to play a video through an image playing application, the terminal 102 requests the video from a server 103 that provides a video playing service. After receiving the video returned by the server 103, the terminal 102 extracts the low-resolution video therein, treats each video frame 104 in the low-resolution video as a low-resolution image, and inputs the low-resolution image, as an image to be processed, into at least one image reconstruction model. The at least one image reconstruction model performs super-resolution processing on the image to be processed to obtain at least one frame of high-resolution image; when there is one high-resolution image, it is taken as the target image, and when there are multiple high-resolution images, they are fused to obtain the target image. Each target image corresponds to a video frame. After the target images are obtained, the image playing application plays them in sequence according to factors such as the playing time sequence, thereby realizing playback of the high-resolution video.
Similarly, for a single image, the terminal can directly use it as the image to be processed and then display the obtained target image to the user. In this way, on the one hand, the terminal only needs to download low-resolution video data, which saves bandwidth and the terminal's data traffic; on the other hand, the server's storage space is saved; moreover, the user can conveniently watch higher-definition video.
The user can request display switching between the low resolution image and the high resolution image by clicking the switching button. Of course, the terminal can also receive user operation through an application interface, and only when the user selects the high-resolution play button on the application interface, the terminal performs super-resolution processing on the low-resolution image to obtain a high-resolution target image and plays the target image to the user.
Fig. 2 is a schematic diagram of another application scenario of an image reconstruction model, where the image reconstruction model is configured in an application server 201 that provides a video playing service or an image display service, and a corresponding video playing application is installed on a terminal 202. The application server 201 may store low resolution video or image therein, and when the user 203 requests to view the low resolution video or image through the video playing application installed on the terminal 202, the application server 201 may transmit the data of the super resolution processed video or image to the terminal 202 and display the data to the user through the application installed on the terminal 202.
While watching a low-resolution video, if the user wishes to watch it in high resolution, the user may click the high-resolution play button provided on the application interface. The terminal 202 then sends a K-times high-resolution play request to the application server 201. In response to the high-resolution play request, the application server 201 determines the low-resolution video data, acquires the image to be processed from it (for example, a single low-resolution picture or a video frame of the low-resolution video), performs super-resolution processing through at least one image reconstruction model, and finally outputs the target image 204. The target image is transmitted to the terminal 202, and the terminal 202 displays the high-resolution target image through the audio/video playing application, or plays the high-resolution target images in sequence according to the playing time sequence and other factors, thereby realizing playback of the super-resolution video.
Fig. 3 is a schematic diagram of an audio-visual playing interface, after the user clicks the "super-resolution" play button 301 at the lower right corner, the audio-visual player performs super-resolution processing on the video or image according to the manner corresponding to fig. 1 or the related application server performs super-resolution processing on the video or image according to the manner corresponding to fig. 2, so as to provide the super-resolution video or image for the user. The "super resolution" button 301 in the lower right corner may be a plurality of buttons of other types, such as an icon button 302 of "2 times super resolution" and an icon button 303 of "4 times super resolution", etc.
In some embodiments, the server may also switch between the low resolution image and the high resolution image based on the network resource information between the server and the terminal, automatically convert the low resolution image to a high resolution target image to be sent to the user when the network resource satisfies a condition (e.g., when the bandwidth is sufficient), and send the low resolution image when the network resource does not satisfy the condition (e.g., when the bandwidth is small).
The image processing method of the embodiment of the present application is described in detail below from two aspects: on the one hand, the training of the model; on the other hand, the process of performing super-resolution processing on the image to be processed based on the at least one trained image reconstruction model.
The model training method provided by the embodiment of the application can be applied to electronic equipment, wherein the electronic equipment can be a terminal or a server, and the image processing method provided by the embodiment of the application can be applied to the electronic equipment.
Fig. 4A is a schematic flow chart of an implementation of the model training method according to an embodiment of the present application, as shown in fig. 4A, the method at least may include the following steps 401 to 403:
In step 401, the original image set is downsampled according to each preset downsampling parameter value to obtain a downsampled image set under the corresponding downsampling parameter value.
The preset downsampling parameter values may include one or more different values: when there is one value, one image reconstruction model is obtained by the corresponding training; when there are multiple values, multiple image reconstruction models are obtained by the corresponding training. For example, as shown in fig. 5, the preset downsampling parameter values are 2×2, 3×3, and 4×4: each image 501 in the original image set is downsampled according to 2×2 to obtain a downsampled image set 502, according to 3×3 to obtain a downsampled image set 503, and according to 4×4 to obtain a downsampled image set 504. It should be noted that "binning" means merging, and "2×2 binning" means merging 2×2 pixels into one pixel, for example by taking the average of those 4 pixel values as the value of the merged pixel; binning may therefore also be understood as downsampling.
In some embodiments, when downsampling an original image in the original image set according to the downsampling parameter value, the electronic device may first perform Gaussian blur on the original image, with the downsampling parameter value as the size of the Gaussian kernel, to obtain a Gaussian-blurred image, and then perform bicubic interpolation on the Gaussian-blurred image according to the downsampling parameter value to obtain the downsampled image.
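A minimal sketch of this downsampling with OpenCV is given below; the patent only states that the Gaussian kernel size equals the downsampling parameter value, so the sigma heuristic, the randomly generated stand-in image, and the set of factors are assumptions made for illustration.

```python
import cv2
import numpy as np

def binning_downsample(original: np.ndarray, factor: int) -> np.ndarray:
    """Gaussian blur tied to the downsampling factor, followed by bicubic downscaling.

    `factor` plays the role of the downsampling parameter value, e.g. 2 for 2x2 binning.
    The sigma below is a common anti-aliasing heuristic, not a value from the patent.
    """
    sigma = 0.5 * factor
    blurred = cv2.GaussianBlur(original, ksize=(0, 0), sigmaX=sigma)
    h, w = original.shape[:2]
    return cv2.resize(blurred, (w // factor, h // factor), interpolation=cv2.INTER_CUBIC)

# Example: build downsampled image sets for 2x2, 3x3 and 4x4 binning from one original image.
original = (np.random.rand(1080, 1920, 3) * 255).astype(np.uint8)  # stand-in for a real frame
downsampled = {b: binning_downsample(original, b) for b in (2, 3, 4)}
```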
Step 402: a training sample set is generated according to each downsampled image set and the truth images (Ground Truth, GT) corresponding to the downsampled images, wherein a truth image is obtained by fusing multiple frames of reference images, and the resolution of the truth image is greater than that of the corresponding sample image.
It will be appreciated that in supervised learning, each downsampled image corresponds to a frame of truth image, and in semi-supervised learning, each image of a portion of the downsampled images in the set corresponds to a frame of truth image.
In implementing step 402, the device may first determine a second displacement of each downsampled image with respect to a corresponding truth image, then displace each downsampled image pixel by pixel according to the corresponding second displacement to align to the corresponding truth image, thereby obtaining a corresponding sample image, and generate the training sample set using each sample image and the corresponding truth image as a data pair. In this way, through the alignment mode, when the super-resolution processing is performed on the low-resolution image, the target image with local no blurring and no deformation can be obtained by the trained image reconstruction model.
The method of determining the second displacement is the same as the method of determining the first displacement. For example, the electronic device may detect feature points of two frames of images and extract feature description information of each feature point, then match the feature points of the two frames of images according to the feature description information of each feature point of the two frames of images, so as to find a plurality of feature point matching pairs between the two frames of images, determine a homography matrix between the two frames of images according to pixel coordinates of the feature points of the feature point matching pairs, and finally determine displacement of one frame of images relative to another frame of images, that is, displacement of each pixel point in one frame of images relative to a corresponding pixel point in another frame of images according to the homography matrix.
In some embodiments, the device may post-process the downsampled image prior to aligning the downsampled image to the corresponding truth image and then align the post-processed downsampled image to the corresponding truth image. The post-processing is not described in detail here, see in particular the examples of implementation of the post-processing below.
And step 403, training the original deep learning model by using the training sample set under each downsampling parameter value to obtain an image reconstruction model under the corresponding downsampling parameter value.
The structure of the original deep learning model may be varied; for example, the model may be a recurrent neural network (Recurrent Neural Network, RNN), a recursive neural network (Recursive Neural Network), a convolutional neural network (Convolutional Neural Network, CNN), or the like.
It can be appreciated that training sample sets under different downsampling parameter values yield image reconstruction models with different super-resolution conversion magnifications. Still taking fig. 5 as an example, the training sample set 512 obtained at a downsampling parameter value of 2×2 binning, which includes truth images and the sample images aligned with them, trains an image reconstruction model 522 whose super-resolution conversion magnification is 2, i.e., which has a 2× magnification capability. When used, the model 522 can transform each pixel position in a low-resolution image to be processed into 2×2 pixel positions. Similarly, the training sample set 513 obtained at a downsampling parameter value of 3×3 binning trains an image reconstruction model 523 with a 3× magnification capability, and the training sample set 514 obtained at a downsampling parameter value of 4×4 binning trains an image reconstruction model 524 with a 4× magnification capability.
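For illustration only, the sketch below trains a toy 2x reconstruction model on (sample image, truth image) pairs such as those produced by the 2×2 binning set; the network architecture, the L1 loss, the optimizer settings, and the `loader` are assumptions for demonstration rather than the patent's specified training procedure.

```python
import torch
import torch.nn as nn

class TinySR2x(nn.Module):
    """A toy 2x super-resolution network: convolutional features + sub-pixel upsampling."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, channels * 4, 3, padding=1),
        )
        self.upscale = nn.PixelShuffle(2)  # rearranges 4 channels into 2x2 pixel positions

    def forward(self, x):
        return self.upscale(self.body(x))

model = TinySR2x()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

def train_epoch(loader):
    """`loader` is assumed to yield (sample, truth) tensor pairs, the truth image being
    the multi-frame-fused image with twice the sample image's resolution."""
    model.train()
    for sample, truth in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(sample), truth)
        loss.backward()
        optimizer.step()
```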
In the embodiment of the application, a model training method is provided, in which, in a training sample set for training an original deep learning model, a true value image corresponding to a sample image is obtained by fusion of multiple frames of reference images instead of only taking a single frame of reference image as the true value image, so that the signal to noise ratio of the obtained true value image is higher, the details are clearer, the performance of the image reconstruction model obtained by training is better, and when the model is used, the image to be processed can be reconstructed into a target image with higher quality. I.e. the signal-to-noise ratio of the output target image is higher and the details are clearer.
In research it was found that a truth image obtained by fusing multiple frames of reference images may deviate slightly from the corresponding downsampled image, and this deviation may cause local blurring or deformation in the output target image when the trained image reconstruction model processes an image to be processed. To solve this problem, in some embodiments, step 402 of generating a training sample set according to the set of downsampled images and the corresponding truth images may, as shown in fig. 4B, be implemented by a device such as a terminal or a server through the following steps 4021 to 4023:
step 4021, determining a second displacement of each of the downsampled images relative to the corresponding truth image.
When implemented, the device may find a plurality of feature point matching pairs between the downsampled image and the corresponding truth image, then solve a homography matrix between the downsampled image and the corresponding truth image by direct linear transformation (Direct Linear Transformation, DLT) based on the feature point matching pairs, and finally determine the displacement of the downsampled image relative to the corresponding truth image according to the homography matrix.
Step 4022, shifting each downsampled image pixel by pixel according to the corresponding second shift to align to the corresponding truth image, thereby obtaining a corresponding sample image;
In step 4023, the training sample set is generated using each of the sample images and the corresponding truth image as a data pair.
An embodiment of the present application further provides a model training method, and fig. 6 is a schematic flow chart of another implementation of the model training method according to the embodiment of the present application, as shown in fig. 6, where the method at least may include the following steps 601 to 610:
In step 601, N frame candidate images are acquired, N being an integer greater than 1.
In general, the candidate image is a high resolution image. The N-frame candidate image may be a multi-frame image acquired for an actual scene by a camera (e.g., a single-lens reflex or a mobile phone having a high-definition photographing function, etc.). Of course, the N frame candidate image may also be a simulated high resolution image. Compared with the latter, the image reconstruction model obtained according to the former has better universality, and the reconstructed high-resolution image (namely the target image) has better image quality for the truly acquired image to be processed.
Step 602, determining the definition of each candidate image.
Definition may be characterized by a variety of image parameters. In some embodiments, definition is characterized by sharpness: sharpness estimation is performed on each candidate image to obtain the sharpness of that image. For example, the sharpness of an image may be determined by a sharpness estimation model as shown in the following equation (1):
In equation (1), the entire image is divided into k1 × k2 windows, where I_{max,k,l} and I_{min,k,l} represent the maximum luminance value and the minimum luminance value in the (k, l)-th window, respectively.
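Since equation (1) is only described in words here, the sketch below computes one plausible window-based sharpness score from the per-window maximum and minimum luminance (a Michelson-style local contrast averaged over all windows). The way the per-window values are combined, and the default window grid, are assumptions rather than the patent's exact formula.

```python
import numpy as np

def window_sharpness(luma: np.ndarray, k1: int = 8, k2: int = 8) -> float:
    """Split the luminance plane into k1 x k2 windows and average a local-contrast score.

    The per-window score (I_max - I_min) / (I_max + I_min) is an assumed stand-in
    for the patent's equation (1).
    """
    rows = np.array_split(np.arange(luma.shape[0]), k1)
    cols = np.array_split(np.arange(luma.shape[1]), k2)
    scores = []
    for r in rows:
        for c in cols:
            win = luma[np.ix_(r, c)].astype(np.float64)
            scores.append((win.max() - win.min()) / (win.max() + win.min() + 1e-8))
    return float(np.mean(scores))

# Candidate frames with higher scores would be kept as reference images in step 603.
```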
And 603, determining an image with definition meeting a specific condition in the N frames of candidate images as the reference image, thereby obtaining a multi-frame reference image.
In some embodiments, an image of the N frame candidate images having a sharpness greater than a particular threshold is determined as the reference image.
For example, the camera continuously collects 30 candidate images, 8 images with higher sharpness are selected as reference images, and a true image is obtained based on fusion. It will be appreciated that the purpose of multi-frame fusion is to be able to obtain true images with higher signal-to-noise ratios and clearer details. Then, the higher the definition of the image used for fusion, the clearer the true image obtained by fusion, and further the better the performance of the model obtained by training. In other words, when the model is used online, a high-resolution target image with clearer details and higher signal-to-noise ratio can be reconstructed.
In step 604, one frame of the multi-frame reference image is selected as the first reference image.
When implemented, any frame among the multiple frames of reference images can be selected as the first reference image, that is, the image to which the other images are to be aligned.
Step 605 determines a first displacement of each other of the multiple frames of reference pictures relative to the first reference picture.
The method for determining the first displacement is the same as the method for determining the second displacement described above, and therefore will not be described here.
Step 606, shifting each of the other reference images pixel by pixel according to the corresponding first shift to align the other reference images to the first reference image, thereby obtaining a corresponding second reference image;
and step 607, fusing the first reference image and each second reference image to obtain a true value image.
It should be noted that, the method of image fusion, i.e. the method of implementing step 607, may be various. The device can realize image fusion in a space domain and can realize image fusion in a frequency domain.
In a spatial-domain implementation, for example, the sharpness of each region of the first reference image and of each second reference image may be determined first; the weight of each pixel position in a region is then determined according to the sharpness of that region; and the first reference image and each second reference image are weighted and averaged, pixel position by pixel position, according to these weights to obtain the truth image.
The values used at each pixel position may take various forms. For example, the value may be the brightness value V at the pixel position, the lightness value L at the pixel position, or the luminance channel data (i.e., the Y channel value) at the pixel position; or any one or more of the red (R), green (G), and blue (B) values at the pixel position; or the gray values of each channel of a multispectral camera; or the gray values of each channel of a special camera (infrared camera, ultraviolet camera, depth camera); and so on.
It will be appreciated that sharpness has a certain mapping relation to weights. The greater the sharpness of an image region, the greater the weighting of pixel locations in that region. In this way, the influence of the blurred region in the reference image on the sharpness of the true image can be reduced when image fusion is performed.
An image may not be uniformly sharp across its regions: an in-focus part has high definition while an out-of-focus part is blurred. If weights were not used and the values were simply averaged numerically, the image information of the out-of-focus regions would degrade the definition of the corresponding regions of the truth image. Therefore, blurred regions are given lower weights and regions with high definition are given higher weights, so that each region of the fused truth image is as clear as possible.
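A spatial-domain fusion sketch along these lines is shown below: per-region sharpness is measured by a local contrast, mapped to per-pixel weights, and the aligned reference frames are then weighted-averaged. The block size and the sharpness-to-weight mapping are assumptions for illustration, not values given by the patent.

```python
import numpy as np

def region_weights(luma: np.ndarray, block: int = 32) -> np.ndarray:
    """Per-pixel weights derived from per-block local contrast (assumed sharpness measure)."""
    weights = np.empty(luma.shape, dtype=np.float64)
    h, w = luma.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            win = luma[y:y + block, x:x + block].astype(np.float64)
            weights[y:y + block, x:x + block] = (win.max() - win.min()) / (win.max() + win.min() + 1e-8)
    return weights

def fuse_frames(aligned_frames: list) -> np.ndarray:
    """Weighted average of aligned luminance frames: sharper regions contribute more."""
    ws = [region_weights(f) for f in aligned_frames]
    total = np.sum(ws, axis=0) + 1e-8
    return np.sum([f.astype(np.float64) * w for f, w in zip(aligned_frames, ws)], axis=0) / total

# `aligned_frames` would be the first reference image plus the second reference images
# obtained after the pixel-by-pixel alignment of steps 604 to 606.
```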
When image fusion is implemented in the frequency domain, in some embodiments the device may apply a wavelet transform to perform multi-layer filtering on the first reference image and the second reference images to be fused, decomposing each image into corresponding high-frequency and low-frequency sub-images; fuse the high-frequency components according to the high-frequency sub-images of the respective images to form high-frequency fusion coefficients; fuse the low-frequency components according to the low-frequency sub-images to form low-frequency fusion coefficients; and perform an inverse wavelet transform on the high-frequency components corresponding to the high-frequency fusion coefficients and the low-frequency components corresponding to the low-frequency fusion coefficients to generate the truth image. In other embodiments, the device may perform a discrete Fourier transform (Discrete Fourier Transform, DFT) or a discrete cosine transform (Discrete Cosine Transform, DCT) on each reference image to convert the image signal into the frequency domain, average the images in the frequency domain, and then inverse-transform the result back to the spatial domain to obtain the truth image.
In step 608, the original image set is subjected to downsampling according to each preset downsampling parameter value, so as to obtain a downsampled image set with a corresponding downsampling parameter value.
It should be noted that the execution order of the step of acquiring the truth images (steps 601 to 607) and the step of acquiring the sets of downsampled images (step 608) is not limited: step 608 may be performed first to obtain the downsampled image sets under the different downsampling parameter values and the truth images then determined through steps 601 to 607; alternatively, steps 601 to 607 may be performed first and then step 608; the two steps may also be performed in parallel.
Step 609, generating a training sample set according to the downsampled image set and the truth image corresponding to the downsampled image;
And step 610, training the original deep learning model by using the training sample set under each downsampling parameter value to obtain an image reconstruction model under the corresponding downsampling parameter value.
An embodiment of the present application provides an image processing method, and fig. 7 is a schematic flow chart of an implementation of the image processing method according to the embodiment of the present application, as shown in fig. 7, the method may include the following steps 701 to 703:
Step 701, acquiring an image to be processed.
As described in the application scenario above, the image to be processed may be any video frame image in the low resolution video, or may be a single low resolution image, for example, an image captured by the user through the terminal.
Step 702, invoking at least one trained image reconstruction model, wherein the image reconstruction model is obtained after training based on a training sample set comprising a plurality of frame sample images and a truth image corresponding to each frame, the truth image is obtained by fusing a plurality of frame reference images, and the resolution of the truth image is larger than that of the corresponding sample image.
It will be appreciated that different image reconstruction models have different magnifications. However, in the online phase, i.e., when these models are in use, not all image reconstruction models are called; only some of them are, in order to reduce computation. In some embodiments, the device may determine the magnification to be applied to the image to be processed and, according to that magnification, select from the plurality of trained image reconstruction models the at least one image reconstruction model that matches it. In this way the user's visual requirement on image resolution can be met while the amount of computation is reduced, satisfying the real-time requirements of high-resolution image or video display.
And 703, performing super-resolution processing on the image to be processed through the at least one image reconstruction model to obtain a target image, wherein the resolution of the target image is greater than that of the image to be processed.
It will be appreciated that the at least one image reconstruction model may be one model, or two or more models. When there is one model, that model directly converts the image to be processed, and the high-resolution image it outputs is taken as the target image. When there are two or more models, in some embodiments the device may perform super-resolution processing on the image to be processed with each of the models to obtain the high-resolution image output by each model, and then fuse these high-resolution images to obtain the target image.
In the embodiment of the application, the image reconstruction model for performing super-resolution processing on the image to be processed is obtained by training a training sample set comprising a plurality of frame sample images and a truth image corresponding to each frame, wherein the truth image is formed by fusing a plurality of frame reference images instead of directly taking a single frame reference image as the truth image, so that the signal-to-noise ratio of the truth image obtained by fusion is higher and the details are clearer, thereby improving the performance of the model after training, and reconstructing the image to be processed into a target image with higher quality when the model is used, namely the signal-to-noise ratio of the obtained target image is higher and the details are clearer.
An embodiment of the present application provides a further image processing method, and fig. 8 is a schematic flowchart of an implementation of the image processing method according to the embodiment of the present application, as shown in fig. 8, where the method may include the following steps 801 to 805:
Step 801, acquiring an image to be processed;
step 802, determining a magnification to be performed on the image to be processed.
In general, a user selects a magnification desired to be amplified through an application interface, that is, the apparatus receives an operation instruction indicating the magnification, and determines the magnification to be amplified of the image to be processed according to the operation instruction.
Step 803, selecting at least one image reconstruction model matched with the to-be-amplified magnification from a plurality of trained image reconstruction models according to the to-be-amplified magnification, wherein the image reconstruction model is obtained after training based on a training sample set comprising a plurality of frames of sample images and true images corresponding to each frame, the true images are obtained by fusion of a plurality of frames of reference images, and the resolution of the true images is larger than that of the corresponding sample images;
Step 804, performing super-resolution processing on the image to be processed by using each model in the at least one image reconstruction model to obtain a high-resolution image output by the corresponding model;
And step 805, fusing each high-resolution image to obtain the target image.
As described above, different image reconstruction models have different magnification capabilities, i.e., different corresponding magnifications. For example, the pre-trained image reconstruction models include model 1, model 2, and model 3, where model 1 has a 2× magnification capability, model 2 has a 4× magnification capability, and model 3 has a 6× magnification capability. If the magnification to be applied to the image to be processed is 2.3, then reconstructing with model 1 alone yields a target image whose definition cannot meet the user's requirement, and reconstructing with model 2 alone likewise yields a target image that cannot meet the user's requirement. Therefore, so that the target image matches the image effect actually corresponding to the magnification to be applied as closely as possible, model 1 and model 2 can each be selected to reconstruct the image to be processed, and the high-resolution images output by model 1 and model 2 are then fused to obtain the target image.
In some embodiments, if the magnification to be amplified is different from the magnification of each of the plurality of image reconstruction models, the apparatus may select two models closest to the magnification to be amplified as the at least one image reconstruction model to reconstruct the image to be processed respectively, and then fuse the high-resolution images output by the two models, so as to obtain the target image.
In other embodiments, if the magnification to be amplified is the same as that of a certain model in the plurality of image reconstruction models, the model can be used as the at least one image reconstruction model, and accordingly, the image to be processed is reconstructed through the model, and the output high-resolution image is used as the target image.
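A sketch of this selection logic is given below: use the model whose magnification matches exactly, otherwise pick the two models with the closest magnifications and blend their outputs after resizing them to a common target size. The inverse-distance blending weights and the bicubic resizing step are assumptions, since the patent only states that the high-resolution outputs are fused.

```python
import cv2
import numpy as np

def reconstruct(image: np.ndarray, target_mag: float, models: dict) -> np.ndarray:
    """`models` maps magnification -> callable performing super-resolution at that factor."""
    if target_mag in models:                      # exact match: one model is enough
        return models[target_mag](image)

    # otherwise use the two models whose magnifications are closest to target_mag
    m1, m2 = sorted(models, key=lambda m: abs(m - target_mag))[:2]
    out1, out2 = models[m1](image), models[m2](image)

    h, w = image.shape[:2]
    size = (int(w * target_mag), int(h * target_mag))
    out1 = cv2.resize(out1, size, interpolation=cv2.INTER_CUBIC)
    out2 = cv2.resize(out2, size, interpolation=cv2.INTER_CUBIC)

    # assumed fusion rule: inverse-distance weights, so the closer magnification counts more
    d1, d2 = abs(m1 - target_mag), abs(m2 - target_mag)
    w1 = d2 / (d1 + d2)
    fused = w1 * out1.astype(np.float64) + (1 - w1) * out2.astype(np.float64)
    return fused.astype(out1.dtype)
```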
Single image super-resolution (Single Image Super-Resolution, SISR) refers to restoring a low-resolution image to a high-resolution image using signal processing methods, without changing the shooting hardware.
The super-resolution method mainly adopts a data-driven deep learning model to reconstruct required details so as to obtain accurate super-resolution.
Therefore, the quality of the data pairs greatly affects how well a deep learning model can reconstruct high resolution. In conventional deep-learning-based single image super-resolution, bicubic (Bicubic) downsampling and Gaussian (Gaussian) blur plus decimation downsampling are the two most commonly used degradation models: a high-resolution image (High Resolution Image, HR image) captured in a single shot is used as the training ground truth (Ground Truth) of the deep learning model, and the low-resolution image (Low Resolution Image, LR image) obtained by downsampling the HR image with bicubic interpolation or Gaussian blur plus decimation is used as the training input (Input) of the model, thereby forming the data pair.
The two degradation models perform well in processing simulation data sets with the same degradation process. However, in a complex real imaging system, these degradation models cannot accurately simulate the real degradation process, so that the performance of the super-resolution algorithm is significantly reduced on a real image, such as an image acquired by a smart phone.
Therefore, how to construct training data conforming to the degradation model of a real imaging system is a key for further improving the performance of a super-resolution deep learning model and reconstructing a more real high-resolution image.
The related methods for generating super-resolution network training data are basically single-frame simulation methods, in which bicubic or Gaussian-blur downsampling is used to obtain the low-resolution image. A super-resolution network trained on data obtained with these two degradation models often performs well on simulated data with the same degradation process, but its performance drops markedly on truly acquired images. In recent years, in order to better simulate the complex real degradation process, additional degradation factors (such as noise, blurring, and quantization) have also been added on top of downsampling to come closer to the real degradation.
Although various simulated degradation methods improve the reconstruction performance of deep-learning-based super-resolution algorithms on noisy or blurred low-resolution images, they still cannot accurately model the degradation process of a real imaging system. As shown in fig. 9, the related literature reports that, for an image captured by a smartphone, the high-resolution reconstruction 902 produced by a current state-of-the-art (State Of The Art, SOTA) super-resolution model (for example, ESRGAN) looks very unrealistic compared with the original image 901.
Based on this, in one related art, a residual channel attention mechanism is used to effectively enhance useful features and suppress noise while ensuring that multi-scale high-frequency image features can be accurately recovered; however, the training data pairs are generated by shrinking the high-resolution image I_HR through bicubic interpolation to obtain the corresponding low-resolution image I_LR. This gives good image quality on data simulated with the same bicubic interpolation, but the effect on truly acquired images is greatly reduced.
In another related art, high-resolution and low-resolution images are actually acquired by changing the field of view, thereby implicitly modeling the real degradation in a data-driven manner. However, there are three problems: ① a high-resolution image acquired in a single frame still contains noise and blur, so such high-resolution images are not good enough to serve as the Ground Truth of a super-resolution neural network; ② the high-resolution and low-resolution images are acquired in two separate shots, which increases the difficulty of aligning the image content as well as the brightness and color, and a local alignment failure may cause local blurring or even deformation in the super-resolution results; ③ for the zoom application of a mobile phone, images would have to be acquired at every zoom focal length, which makes the workload enormous and hard to realize.
Based on this, an exemplary application of the embodiment of the present application in one practical application scenario will be described below.
In order to solve the above technical problems, the embodiment of the application provides a method for generating real training data for a super-resolution network based on multi-frame fusion. On the one hand, compared with a high-definition image acquired in a single frame, this method gives the obtained Ground Truth a higher signal-to-noise ratio and clearer details; on the other hand, by using truly acquired images it directly reflects, without simulation, the degradation effects (noise, quantization, etc.) that real acquisition devices (smartphones, single-lens reflex cameras, etc.) impose on an image. Meanwhile, combining real-device acquisition with simulated post-processing makes it possible to obtain image pairs at multiple magnifications.
Fig. 10 shows a main flow of generating super-resolution network real training data based on multi-frame fusion. Mainly comprises the following steps of one to four:
Step one: binning, i.e., downsampling; here Gaussian blur followed by bicubic interpolation is used to obtain the binning result.
Step two: multi-frame fusion, i.e., synthesizing a training Ground Truth with higher definition and lower noise from continuously acquired multi-frame images.
Step three: post-processing, an optional step that simulates the image compression coding process.
Step four: alignment. The multi-frame fused image and the binned image may be slightly offset from each other, so the post-processed image needs to be aligned to prevent local blurring or deformation in the model's output.
The details of the above steps are as follows:
for Binning in step one:
binning is a down-sampling process, as shown in equation (2):
D_b{u} = (H_binning * H_blur)(u) (2);
In equation (2), H_blur denotes Gaussian blur and * denotes the convolution operation. The Gaussian kernel size is the same as the downsampling (downsample) size: if downsample is 2×2, the Gaussian kernel size is also 2×2. H_binning denotes the neighboring-pixel merging process, i.e., downsampling; here downsampling is performed by averaging neighboring pixels, simulating the hardware downsampling (hardware binning) process of the sampling device. b denotes the binning parameter, i.e., the downsample size; if downsample is 2×2, the binning parameter b = 2, meaning that neighboring 2×2 pixels are averaged.
For multi-frame fusion in step two:
Multi-frame synthesis uses a large number of input frames to obtain a Ground Truth image with lower noise and clearer detail. As shown in fig. 11, the process includes reference frame selection (sharpness estimation), image registration, and multi-frame averaging.
For sharpness estimation, a sharpness reference model based on local luminance features can be used to estimate the sharpness of an image, as shown in the following equation (3):
In equation (3), the whole image is divided into k1 × k2 windows, where I_{max,k,l} and I_{min,k,l} represent the maximum luminance value and the minimum luminance value in the (k, l)-th window, respectively.
For image registration, homography matrix estimation based on SIFT feature point detection may be used, in which the homography matrix H_K from the k-th image Y_K to the reference frame image Y_0 is estimated, as shown in the following formula (4):
[x', y', w']^T = H_K · [x, y, w]^T (4);
In formula (4), (x, y) is the coordinate of any point in Y_K, H_K is the 3×3 matrix, and the transformed point (x', y') is the coordinate of (x, y) registered to the reference frame image Y_0, with w = w' = 1.
In this way, the displacement [mv_xk, mv_yk] of each point in Y_K relative to Y_0 can be calculated from the homography matrix H_K, forming a two-channel offset vector map of the same size as Y_0 and Y_K. The main flow, shown in fig. 12, comprises the following steps 121 to 124:
Step 121, feature point detection is performed on images Y_0 and Y_K respectively;
Step 122, determining a feature point description of the feature point of each image;
Step 123, based on the feature point description, performing feature point matching on the two frames of images;
Step 124, calculating a homography matrix based on the feature point matching result.
Scale-Invariant Feature Transform (SIFT) is an algorithm for detecting keypoints. In essence, it searches for keypoints (feature points) in different scale spaces, calculates their magnitude, orientation, and scale information, and uses this information to describe the feature points. The keypoints found by SIFT are highly salient, stable feature points that are not affected by factors such as illumination, affine transformation, and noise. After the feature points are obtained, the gradient histograms of the points around each feature point are used to form a feature vector, and this feature vector is the description of that feature point. Computing the feature points of the two images Y_0 and Y_k respectively yields two sets of feature points.
The feature point matching process finds, from the two sets of feature points obtained above, 4 or more pairs of mutually nearest feature points across the two images Y_0 and Y_k. In implementation, the nearest feature point pairs can be found by the Euclidean distance shown in the following formula (5).
After the nearest feature point pairs are found, the coefficients of the homography matrix H_K can be solved by Direct Linear Transformation (DLT), which yields the displacement [mv_x, mv_y] of each point in Y_K relative to Y_0. Assuming that the matched feature points on Y_K have coordinates (x_1, y_1), (x_2, y_2), ..., (x_t, y_t) and the corresponding feature points on Y_0 have coordinates (x'_1, y'_1), (x'_2, y'_2), ..., (x'_t, y'_t), applying the homography matrix to each corresponding point pair and rewriting the equations gives the form shown in the following formula (6):
or, equivalently, A · H_K = 0 (6);
In equation (6), A is a matrix whose number of rows is twice the number of corresponding point pairs: the coefficients contributed by each corresponding point pair are stacked into the matrix, and the least-squares solution of H_K can be found with a Singular Value Decomposition (SVD) algorithm, giving the displacement [mv_xk, mv_yk] of each point in frame Y_K relative to Y_0. After the displacement is estimated, Y_K can be shifted pixel by pixel according to [mv_xk, mv_yk] to obtain the aligned result of the k-th frame, denoted Y_k'.
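The DLT/SVD solution of equation (6) and the subsequent pixel-wise alignment can be sketched as follows; the row construction and the use of cv2.warpPerspective for the per-pixel shift are one concrete realisation given only as an illustration under those assumptions.

```python
import numpy as np
import cv2

def dlt_homography(pts_k: np.ndarray, pts_0: np.ndarray) -> np.ndarray:
    """Solve A * h = 0 (equation (6)) by SVD; needs at least 4 point pairs.

    pts_k, pts_0 : (t, 2) arrays of matched coordinates on Y_K and Y_0.
    """
    rows = []
    for (x, y), (xp, yp) in zip(pts_k, pts_0):
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    a = np.asarray(rows, dtype=np.float64)    # 2t rows: two equations per corresponding point pair
    _, _, vt = np.linalg.svd(a)
    h = vt[-1].reshape(3, 3)                  # least-squares (smallest singular value) solution
    return h / h[2, 2]

def align_to_reference(yk: np.ndarray, h_k: np.ndarray) -> np.ndarray:
    """Shift Y_K pixel by pixel into the frame of Y_0, producing Y_K'."""
    hgt, wid = yk.shape
    return cv2.warpPerspective(yk, h_k, (wid, hgt))
```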
For multi-frame averaging, the registered images are numerically averaged on the Y channel at each pixel position to obtain the final multi-frame synthesis result.
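A minimal sketch of this averaging step, assuming the aligned frames have already been cropped to a common size:

```python
import numpy as np

def fuse_frames(aligned_y):
    """Pixel-wise numerical average of the registered Y-channel frames.

    aligned_y : sequence of float32 arrays of identical shape (the reference
    frame plus each aligned frame Y_k'); the mean is the fused Ground Truth.
    """
    return np.mean(np.stack(aligned_y, axis=0), axis=0)
```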
For the post-processing in step three, the following explanation is given here.
This step is optional and depends on whether the trained model performs super-resolution in the YUV->YUV domain or on compressed jpg->jpg images. If the model is a YUV->YUV super-resolution model, the image coding and compression process does not need to be simulated; if the model is a jpg->jpg super-resolution model, the reconstruction detail lost to image coding and compression must also be considered.
This is expressed by the following formula (7):
X_n^compress = C_c{X_n} (7);
In formula (7), n denotes the binning parameter, and C_c{·} denotes image compression with compression strength c. The image can be encoded and compressed using the JPEG2000 compression standard. To ensure the robustness of the model, for the jpg->jpg super-resolution model the compression strength, i.e. the quantization parameter (QP), is randomly selected from {0, 10, 20, 30, 40}, so that compression artifacts of different degrees are generated, ranging from no compression at all to strong compression.
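A hedged sketch of this compression stage is shown below. It substitutes plain JPEG encoding for JPEG2000 and maps each QP value to an OpenCV JPEG quality setting; both substitutions are assumptions made only so the example runs with a stock OpenCV install.

```python
import random
import numpy as np
import cv2

def compress_like_capture(img: np.ndarray) -> np.ndarray:
    """Simulate C_c{X_n}: encode and decode the image at a randomly chosen strength.

    img : uint8 image. The QP is drawn from {0, 10, 20, 30, 40} as in the text;
    the QP-to-quality mapping below is purely illustrative.
    """
    qp = random.choice([0, 10, 20, 30, 40])
    quality = 100 - 2 * qp                       # qp = 0 -> essentially uncompressed
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return cv2.imdecode(buf, cv2.IMREAD_UNCHANGED)
```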
For the alignment in step four, the following explanation is given here:
The image after multi-frame fusion and the image after binning may be slightly offset from each other. To prevent the model from learning local blur or deformation, the image registration method of step two is used to align X_n^compress to the Ground Truth Y_gt, yielding the training input image X_n^SRinput of the final super-resolution neural network.
In the embodiment of the application, a method for generating real training data for the super-resolution neural network based on multi-frame fusion is adopted. On the one hand, compared with a high-definition image acquired as a single frame, multi-frame fusion gives the resulting Ground Truth a higher signal-to-noise ratio and clearer details; on the other hand, using really acquired images directly reflects the degradation (noise, blur, quantization and the like) that real acquisition devices (smartphones, DSLRs and the like) impose on the image, without simulation. Meanwhile, combining real-device acquisition with simulated post-processing makes it possible to obtain image pairs at multiple magnifications.
The embodiment of the application provides a method for generating real training data for a super-resolution neural network by means of multi-frame synthesis. Compared with previous data generated by simulated degradation, this gives the deep-learning-based super-resolution model better performance on really acquired images and raises the upper limit of what deep learning can achieve in super-resolution algorithms. It also takes into account the artifacts produced by image compression coding when processing the LR image, extending the training data to super-resolution models that process coded images such as jpg images.
In an embodiment of the present application, the manner of image registration includes, but is not limited to, the following:
As one possible implementation, image registration may employ Speeded-Up Robust Features (SURF), corner points, or other features for feature point detection and description.
As one possible implementation, the optical flow vector of each pixel from the current frame to the reference frame is solved based on the luminance information around each point in the adjacent frames, and the motion vector of that pixel is then calculated from the optical flow vector.
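As an illustration of this optical-flow alternative, a sketch using Farneback dense flow is given below; the specific flow algorithm and its parameters are assumptions, since the text only requires per-pixel flow estimated from local luminance.

```python
import cv2
import numpy as np

def dense_motion_vectors(y_ref: np.ndarray, y_cur: np.ndarray) -> np.ndarray:
    """Per-pixel motion from the current frame to the reference frame via dense optical flow.

    y_ref, y_cur : 8-bit grayscale (Y channel) frames.
    Returns an (H, W, 2) array holding the x/y displacement of every pixel.
    """
    # Positional arguments: prev, next, flow, pyr_scale, levels, winsize,
    # iterations, poly_n, poly_sigma, flags.
    return cv2.calcOpticalFlowFarneback(y_cur, y_ref, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
```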
In an embodiment of the present application, the manner of multi-frame averaging includes, but is not limited to, the following:
As a possible implementation, the multi-frame averaging may also be weighted based on sharpness estimates of image portions.
As a possible implementation, multi-frame averaging may also be performed in the frequency domain, for example by applying a Wavelet Transform (WT), a Discrete Fourier Transform, or a Discrete Cosine Transform to convert the image signal into the frequency domain, averaging the images there, and then converting back into the spatial domain.
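A minimal sketch of frequency-domain fusion is given below, using a 2-D DFT as a stand-in for the wavelet or DCT transforms named above; with the uniform weights shown it reduces to spatial averaging, and any per-frequency weighting that would make the transform worthwhile is left as an assumption.

```python
import numpy as np

def fuse_in_frequency_domain(frames):
    """Average registered frames in the frequency domain and convert back.

    frames : sequence of equally sized single-channel arrays. A real merging
    scheme would replace the uniform mean below with per-frequency weights.
    """
    spectra = [np.fft.fft2(f.astype(np.float64)) for f in frames]
    fused_spectrum = np.mean(spectra, axis=0)     # uniform weights as a placeholder
    return np.real(np.fft.ifft2(fused_spectrum))
```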
In an embodiment of the present application, the modes of post-processing image compression encoding include, but are not limited to, the following modes:
As a possible implementation, the compression coding may also employ the H.265/HEVC video coding standard, for generating data for a video super-resolution model.
As another possible implementation, the compression encoding may employ the JPEG-XR or MPEG series (e.g., MPEG-2) compression standards.
Based on the foregoing embodiments, the image processing apparatus and the model training apparatus provided in the embodiments of the present application include a number of modules, and the units contained in those modules may be implemented by a processor in an electronic device or, of course, by specific logic circuits. In implementation, the processor may be a Central Processing Unit (CPU), a Microprocessor (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 13A is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, as shown in fig. 13A, the apparatus 130 includes an obtaining module 131, a calling module 132, and a super-resolution processing module 133, where:
An acquiring module 131, configured to acquire an image to be processed;
The invoking module 132 is configured to invoke at least one trained image reconstruction model, where the image reconstruction model is obtained after training based on a training sample set including multiple frames of sample images and true images corresponding to each frame, the true images are obtained by fusing multiple frames of reference images, and resolution of the true images is greater than resolution of the corresponding sample images;
And the super-processing module 133 is configured to perform super-resolution processing on the image to be processed through the at least one image reconstruction model, so as to obtain a target image.
In some embodiments, the invoking module 132 is configured to determine the magnification required for the image to be processed and to select, according to that magnification, the at least one image reconstruction model matching it from a plurality of trained image reconstruction models, where different image reconstruction models correspond to different magnifications.
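As an illustration of how the invoking module might match models to the required magnification, a small sketch follows; the dictionary keyed by magnification and the nearest-scale fallback are assumptions, not part of the described apparatus.

```python
def select_models(target_scale, trained_models):
    """Pick the reconstruction model(s) whose magnification matches the request.

    trained_models : dict mapping magnification (e.g. 2, 3, 4) to a loaded model.
    The nearest-scale fallback below is an assumption; the text only requires
    selecting models whose magnification matches the required ratio.
    """
    if target_scale in trained_models:
        return [trained_models[target_scale]]
    nearest = sorted(trained_models, key=lambda s: abs(s - target_scale))[:2]
    return [trained_models[s] for s in nearest]
```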
In some embodiments, the super-processing module 133 is configured to perform super-resolution processing on the image to be processed by using each model in the at least one image reconstruction model to obtain a high-resolution image output by the corresponding model, and fuse each high-resolution image to obtain the target image.
In some embodiments, as shown in fig. 13B, the apparatus 130 further includes a downsampling module 134, a sample generating module 135 and a model training module 136, where the downsampling module 134 is configured to perform downsampling processing on an original image set according to each preset downsampling parameter value to obtain a downsampled image set with a corresponding downsampling parameter value, the sample generating module 135 is configured to generate a training sample set according to the downsampled image set and a true image corresponding to the downsampled image, and the model training module 136 is configured to train the original deep learning model by using the training sample set with each downsampling parameter value to obtain an image reconstruction model with a corresponding downsampling parameter value.
In some embodiments, the downsampling module 134 is configured to take the downsampling parameter value as a size of a gaussian kernel, perform gaussian blur processing on the original image to obtain a gaussian blurred image, and perform bicubic interpolation on the gaussian blurred image according to the downsampling parameter value to obtain the downsampled image.
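A sketch of this blur-then-bicubic downsampling is shown below; the Gaussian sigma and the use of cv2.resize with INTER_CUBIC are assumptions, as the text fixes only the kernel size and the bicubic interpolation.

```python
import numpy as np
import cv2

def downsample_for_training(img: np.ndarray, d: int, sigma: float = 1.0) -> np.ndarray:
    """Blur with a d x d Gaussian kernel, then shrink by 1/d with bicubic interpolation.

    d     : downsampling parameter value (also used as the Gaussian kernel size)
    sigma : assumed blur strength; the text fixes only the kernel size
    """
    ax = np.arange(d) - (d - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    kernel = (kernel / kernel.sum()).astype(np.float32)
    blurred = cv2.filter2D(img.astype(np.float32), -1, kernel)
    new_size = (img.shape[1] // d, img.shape[0] // d)     # cv2.resize expects (width, height)
    return cv2.resize(blurred, new_size, interpolation=cv2.INTER_CUBIC)
```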
In some embodiments, the clarity is characterized by sharpness. The obtaining module 131 is further configured to obtain N frames of candidate images, where N is an integer greater than 1, determine the clarity of each of the candidate images, and determine an image among the N frames of candidate images whose clarity satisfies a specific condition as the reference image.
In some embodiments, the obtaining module 131 is configured to perform sharpness estimation on each of the candidate images to obtain sharpness of a corresponding image, and determine an image with sharpness greater than a specific threshold value in the N frames of candidate images as the reference image.
In some embodiments, the obtaining module 131 is further configured to select one frame of image of the multiple frames of reference images as a first reference image, determine a first displacement of each other reference image of the multiple frames of reference images except the first reference image relative to the first reference image, shift each other reference image pixel by pixel according to the corresponding first displacement to align the other reference images with the first reference image, thereby obtaining a corresponding second reference image, and fuse the first reference image with each second reference image, thereby obtaining the true value image.
In some embodiments, an obtaining module 131 is configured to detect a feature point of each reference image and extract feature description information of each feature point, perform feature point matching on the other reference images of an i-th frame and the first reference image according to the feature description information of each feature point of the other reference images of the i-th frame and the first reference image to obtain a feature point matching pair set, where i is an integer greater than 0 and less than or equal to the total number of the other reference images, determine a homography matrix between two frames of images according to pixel coordinates of feature points in the feature point matching pair set, and determine a first displacement of the other reference images of the i-th frame relative to the first reference image according to the homography matrix.
An embodiment of the present application provides a model training device. Fig. 14A is a schematic structural diagram of the model training device of the embodiment of the present application. As shown in fig. 14A, the device 140 includes a downsampling module 141, a sample generating module 142 and a model training module 143, where:
The downsampling module 141 is configured to downsample the original image set according to each preset downsampling parameter value, so as to obtain a downsampled image set with a corresponding downsampling parameter value;
The sample generation module 142 is configured to generate a training sample set according to the downsampled image set and a truth image corresponding to the downsampled image, where the truth image is obtained by fusing multiple frames of reference images, and a resolution of the truth image is greater than a resolution of a corresponding sample image;
the model training module 143 is configured to train the original deep learning model by using the training sample set under each of the downsampling parameter values, so as to obtain an image reconstruction model under the corresponding downsampling parameter values.
In some embodiments, the downsampling module 141 is configured to perform gaussian blur processing on the original image with the downsampling parameter value as a size of a gaussian kernel to obtain a gaussian blurred image, and perform bicubic interpolation on the gaussian blurred image according to the downsampling parameter value to obtain the downsampled image.
In some embodiments, as shown in fig. 14B, the apparatus 140 further includes a determining module 144 configured to obtain N frames of candidate images, where N is an integer greater than 1, determine the clarity of each of the candidate images, and determine an image among the N frames of candidate images whose clarity satisfies a specific condition as the reference image.
In some embodiments, the clarity is characterized by sharpness, and the determining module 144 is configured to perform sharpness estimation on each of the candidate images to obtain the sharpness of the corresponding image, and determine an image among the N frames of candidate images whose sharpness is greater than a specific threshold as the reference image.
In some embodiments, as shown in fig. 14B, the apparatus 140 further includes an image fusion module 145 configured to select one of the multiple frames of reference images as a first reference image, determine a first displacement of each other of the multiple frames of reference images except the first reference image relative to the first reference image, displace each of the other reference images pixel by pixel according to the corresponding first displacement to align with the first reference image, thereby obtaining a corresponding second reference image, and fuse the first reference image with each of the second reference images, thereby obtaining the true value image.
In some embodiments, the image fusion module 145 is configured to perform feature point detection on each of the reference images and extract feature description information of each feature point, perform feature point matching on the other reference images of an i-th frame and the first reference image according to the feature description information of each feature point of the other reference images of the i-th frame and the first reference image to obtain a feature point matching pair set, where i is an integer greater than 0 and less than or equal to the total number of the other reference images, determine a homography matrix between two frames of images according to pixel coordinates of feature points in the feature point matching pair set, and determine a first displacement of the other reference images of the i-th frame relative to the first reference image according to the homography matrix.
The description of the apparatus embodiments above is similar to that of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present application, please refer to the description of the embodiments of the method of the present application.
It should be noted that, in the embodiment of the present application, if the above-mentioned image processing method is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solutions of the embodiments of the present application that in essence contributes to the related art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an electronic device to execute all or part of the methods described in the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the application are not limited to any specific combination of hardware and software.
Correspondingly, as shown in fig. 15, the electronic device 150 provided by the embodiment of the present application may include a memory 151 and a processor 152, where the memory 151 stores a computer program that can be run on the processor 152, and the processor 152 implements the steps in the method provided in the above embodiment when executing the program.
The memory 151 is configured to store instructions and applications executable by the processor 152, and may also cache data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or processed by the respective modules in the processor 152 and the electronic device 150, which may be implemented by a FLASH memory (FLASH) or a random access memory (Random Access Memory, RAM).
Accordingly, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method provided in the above embodiments.
It should be noted here that the description of the storage medium and the device embodiments above is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and the apparatus of the present application, please refer to the description of the method embodiments of the present application.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application. The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other divisions in actual implementation, such as combining or integrating multiple units or components into another system, or omitting or not performing some features. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through certain interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, may be located in one place or distributed on a plurality of network units, and may select some or all of the units according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as a unit, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of hardware plus a form of software functional unit.
It will be appreciated by those of ordinary skill in the art that implementing all or part of the steps of the above method embodiments may be implemented by hardware associated with program instructions, where the above program may be stored in a computer readable storage medium, where the program when executed performs the steps comprising the above method embodiments, where the above storage medium includes various media that may store program code, such as a removable storage device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, if the above-described integrated units of the application are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solutions of the embodiments of the present application that in essence contributes to the related art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an electronic device to execute all or part of the methods described in the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
The methods disclosed in the method embodiments provided by the application can be arbitrarily combined under the condition of no conflict to obtain a new method embodiment.
The features disclosed in the several product embodiments provided by the application can be combined arbitrarily under the condition of no conflict to obtain new product embodiments.
The features disclosed in the embodiments of the method or the apparatus provided by the application can be arbitrarily combined without conflict to obtain new embodiments of the method or the apparatus.
The foregoing is merely an embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. An image processing method, characterized in that the method comprises: acquiring an image to be processed; calling at least one trained image reconstruction model, wherein the image reconstruction model is obtained by training on a training sample set comprising multiple frames of sample images and a true value image corresponding to each frame, the true value image is obtained by fusing multiple frames of reference images, and the resolution of the true value image is greater than the resolution of the corresponding sample image; and performing super-resolution processing on the image to be processed through the at least one image reconstruction model to obtain a target image; wherein the method for obtaining the reference images comprises: acquiring N frames of candidate images, N being an integer greater than 1; determining the clarity of each of the candidate images; and determining an image among the N frames of candidate images whose clarity satisfies a specific condition as the reference image; wherein the clarity is characterized by sharpness, and determining the clarity of each of the candidate images comprises: performing sharpness estimation on each of the candidate images to obtain the sharpness of the corresponding image; and correspondingly, determining an image among the N frames of candidate images whose clarity satisfies a specific condition as the reference image comprises: determining an image among the N frames of candidate images whose sharpness is greater than a specific threshold as the reference image.

2. The method according to claim 1, characterized in that calling at least one trained image reconstruction model comprises: determining the magnification required for the image to be processed; and selecting, according to the required magnification, the at least one image reconstruction model matching that magnification from a plurality of trained image reconstruction models, wherein different image reconstruction models correspond to different magnifications.

3. The method according to claim 1, characterized in that performing super-resolution processing on the image to be processed through the at least one image reconstruction model to obtain a target image comprises: performing super-resolution processing on the image to be processed with each model of the at least one image reconstruction model to obtain a high-resolution image output by the corresponding model; and fusing each of the high-resolution images to obtain the target image.

4. The method according to claim 2, characterized in that the training process of the plurality of image reconstruction models comprises: downsampling an original image set according to each preset downsampling parameter value to obtain a downsampled image set under the corresponding downsampling parameter value; generating a training sample set according to the downsampled image set and the true value images corresponding to the downsampled images; and training an original deep learning model with the training sample set under each downsampling parameter value to obtain an image reconstruction model under the corresponding downsampling parameter value.

5. The method according to claim 4, characterized in that downsampling an original image in the original image set according to the downsampling parameter value to obtain the downsampled image comprises: performing Gaussian blur processing on the original image with the downsampling parameter value as the size of the Gaussian kernel to obtain a Gaussian blurred image; and performing bicubic interpolation on the Gaussian blurred image according to the downsampling parameter value to obtain the downsampled image.

6. The method according to any one of claims 1 to 5, characterized in that the method for obtaining the true value image comprises: selecting one frame of the multiple frames of reference images as a first reference image; determining a first displacement, relative to the first reference image, of each other reference image in the multiple frames of reference images except the first reference image; displacing each of the other reference images pixel by pixel according to the corresponding first displacement so as to align it with the first reference image, thereby obtaining a corresponding second reference image; and fusing the first reference image with each of the second reference images to obtain the true value image.

7. The method according to claim 6, characterized in that determining a first displacement, relative to the first reference image, of each other reference image in the multiple frames of reference images except the first reference image comprises: performing feature point detection on each of the reference images and extracting feature description information of each feature point; performing feature point matching between the i-th frame of the other reference images and the first reference image according to the feature description information of each feature point of the i-th frame of the other reference images and of the first reference image to obtain a set of feature point matching pairs, wherein i is an integer greater than 0 and less than or equal to the total number of the other reference images; determining a homography matrix between the two frames of images according to the pixel coordinates of the feature points in the set of feature point matching pairs; and determining the first displacement of the i-th frame of the other reference images relative to the first reference image according to the homography matrix.

8. A model training method, characterized in that the method comprises: downsampling an original image set according to each preset downsampling parameter value to obtain a downsampled image set under the corresponding downsampling parameter value; generating a training sample set according to the downsampled image set and the true value images corresponding to the downsampled images, wherein the true value image is obtained by fusing multiple frames of reference images, and the resolution of the true value image is greater than the resolution of the corresponding sample image; and training an original deep learning model with the training sample set under each downsampling parameter value to obtain an image reconstruction model under the corresponding downsampling parameter value; wherein the method for obtaining the reference images comprises: acquiring N frames of candidate images, N being an integer greater than 1; determining the clarity of each of the candidate images; and determining an image among the N frames of candidate images whose clarity satisfies a specific condition as the reference image; wherein the clarity is characterized by sharpness, and determining the clarity of each of the candidate images comprises: performing sharpness estimation on each of the candidate images to obtain the sharpness of the corresponding image; and correspondingly, determining an image among the N frames of candidate images whose clarity satisfies a specific condition as the reference image comprises: determining an image among the N frames of candidate images whose sharpness is greater than a specific threshold as the reference image.

9. An image processing apparatus, characterized by comprising: an acquiring module, configured to acquire an image to be processed; a calling module, configured to call at least one trained image reconstruction model, wherein the image reconstruction model is obtained by training on a training sample set comprising multiple frames of sample images and a true value image corresponding to each frame, the true value image is obtained by fusing multiple frames of reference images, and the resolution of the true value image is greater than the resolution of the corresponding sample image; and a super-resolution processing module, configured to perform super-resolution processing on the image to be processed through the at least one image reconstruction model to obtain a target image; the acquiring module being further configured to acquire N frames of candidate images, N being an integer greater than 1, determine the clarity of each of the candidate images, and determine an image among the N frames of candidate images whose clarity satisfies a specific condition as the reference image, wherein the clarity is characterized by sharpness, determining the clarity of each of the candidate images comprises performing sharpness estimation on each of the candidate images to obtain the sharpness of the corresponding image, and correspondingly, determining an image among the N frames of candidate images whose clarity satisfies a specific condition as the reference image comprises determining an image among the N frames of candidate images whose sharpness is greater than a specific threshold as the reference image.

10. A model training apparatus, characterized in that the apparatus comprises: a downsampling module, configured to downsample an original image set according to each preset downsampling parameter value to obtain a downsampled image set under the corresponding downsampling parameter value; a sample generation module, configured to generate a training sample set according to the downsampled image set and the true value images corresponding to the downsampled images, wherein the true value image is obtained by fusing multiple frames of reference images, and the resolution of the true value image is greater than the resolution of the corresponding sample image; a model training module, configured to train an original deep learning model with the training sample set under each downsampling parameter value to obtain an image reconstruction model under the corresponding downsampling parameter value; and a determining module, configured to acquire N frames of candidate images, N being an integer greater than 1, determine the clarity of each of the candidate images, and determine an image among the N frames of candidate images whose clarity satisfies a specific condition as the reference image, wherein the clarity is characterized by sharpness, determining the clarity of each of the candidate images comprises performing sharpness estimation on each of the candidate images to obtain the sharpness of the corresponding image, and correspondingly, determining an image among the N frames of candidate images whose clarity satisfies a specific condition as the reference image comprises determining an image among the N frames of candidate images whose sharpness is greater than a specific threshold as the reference image.

11. An electronic device, comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the program, implements the steps of the image processing method according to any one of claims 1 to 7, or the processor, when executing the program, implements the steps of the model training method according to claim 8.

12. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the image processing method according to any one of claims 1 to 7, or the computer program, when executed by a processor, implements the steps of the model training method according to claim 8.
CN202010599465.9A 2020-06-28 2020-06-28 Image processing, model training method and device, equipment, storage medium Active CN111784578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010599465.9A CN111784578B (en) 2020-06-28 2020-06-28 Image processing, model training method and device, equipment, storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010599465.9A CN111784578B (en) 2020-06-28 2020-06-28 Image processing, model training method and device, equipment, storage medium

Publications (2)

Publication Number Publication Date
CN111784578A CN111784578A (en) 2020-10-16
CN111784578B true CN111784578B (en) 2025-04-25

Family

ID=72760717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010599465.9A Active CN111784578B (en) 2020-06-28 2020-06-28 Image processing, model training method and device, equipment, storage medium

Country Status (1)

Country Link
CN (1) CN111784578B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488947B (en) * 2020-12-04 2025-09-05 北京字跳网络技术有限公司 Model training and image processing method, device, equipment and computer-readable medium
CN112614056B (en) * 2020-12-31 2023-09-05 北京纳析光电科技有限公司 Image super-resolution processing method
CN114723603B (en) * 2021-01-05 2024-12-24 北京小米移动软件有限公司 Image processing method, image processing device and storage medium
CN113570510A (en) * 2021-01-19 2021-10-29 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN112581372B (en) * 2021-02-26 2021-05-28 杭州海康威视数字技术股份有限公司 A kind of super-resolution light field imaging method, device and equipment for mapping across time and space
CN113033616B (en) * 2021-03-02 2022-12-02 北京大学 High-quality video reconstruction method, device, equipment and storage medium
CN113038267B (en) * 2021-03-09 2025-01-03 Oppo广东移动通信有限公司 Video processing method and device, computer readable storage medium and electronic device
CN115311177B (en) * 2021-05-07 2025-07-01 北京金山云网络技术有限公司 Image processing method, device, computer equipment and storage medium
CN113222178B (en) * 2021-05-31 2024-02-09 Oppo广东移动通信有限公司 Model training method, user interface generation method, device and storage medium
CN113538537B (en) * 2021-07-22 2023-12-12 北京世纪好未来教育科技有限公司 Image registration and model training method, device, equipment, server and medium
CN113570531B (en) * 2021-07-27 2024-09-06 Oppo广东移动通信有限公司 Image processing method, apparatus, electronic device, and computer-readable storage medium
CN113628134B (en) * 2021-07-28 2024-06-14 商汤集团有限公司 Image noise reduction method and device, electronic equipment and storage medium
CN113689335B (en) * 2021-08-24 2024-05-07 Oppo广东移动通信有限公司 Image processing method and device, electronic device and computer readable storage medium
CN113947521B (en) * 2021-10-14 2025-01-24 展讯通信(上海)有限公司 Image resolution conversion method and device based on deep neural network, and terminal equipment
CN114187057A (en) * 2021-12-16 2022-03-15 国网冀北电力有限公司计量中心 Electric power marketing data acquisition method, device, equipment and readable storage medium
CN114549307B (en) * 2022-01-28 2023-05-30 电子科技大学 High-precision point cloud color reconstruction method based on low-resolution image
CN115034967B (en) * 2022-06-27 2025-02-07 北京奇艺世纪科技有限公司 Image processing method, device, electronic device and computer readable storage medium
CN115619638A (en) * 2022-09-27 2023-01-17 深圳先进技术研究院 Dangerous behavior identification method and system based on super-resolution reconstruction and related equipment
CN115965531A (en) * 2022-12-28 2023-04-14 华人运通(上海)自动驾驶科技有限公司 Model training method, image generation method, device, equipment and storage medium
CN116309066B (en) * 2023-03-22 2025-09-26 北京航空航天大学 Super-resolution imaging method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110650295A (en) * 2019-11-26 2020-01-03 展讯通信(上海)有限公司 Image processing method and device
CN111080528A (en) * 2019-12-20 2020-04-28 北京金山云网络技术有限公司 Image super-resolution and model training method, device, electronic equipment and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10373019B2 (en) * 2016-01-13 2019-08-06 Ford Global Technologies, Llc Low- and high-fidelity classifiers applied to road-scene images
CN109360190B (en) * 2018-09-21 2020-10-16 清华大学 Building damage detection method and device based on image superpixel fusion
CN111080527B (en) * 2019-12-20 2023-12-05 北京金山云网络技术有限公司 Image super-resolution method and device, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110650295A (en) * 2019-11-26 2020-01-03 展讯通信(上海)有限公司 Image processing method and device
CN111080528A (en) * 2019-12-20 2020-04-28 北京金山云网络技术有限公司 Image super-resolution and model training method, device, electronic equipment and medium

Also Published As

Publication number Publication date
CN111784578A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
CN111784578B (en) Image processing, model training method and device, equipment, storage medium
CN111539879B (en) Blind video denoising method and device based on deep learning
Liu et al. End-to-End Blind Quality Assessment of Compressed Videos Using Deep Neural Networks.
CN108694705A (en) A kind of method multiple image registration and merge denoising
CN114339030B (en) Network live video image stabilizing method based on self-adaptive separable convolution
Marinč et al. Multi-kernel prediction networks for denoising of burst images
CN113902647B (en) Image deblurring method based on double closed-loop network
CN112150400A (en) Image enhancement method and device and electronic equipment
Cheng et al. A dual camera system for high spatiotemporal resolution video acquisition
CN112750092B (en) Training data acquisition method, image quality enhancement model and method, and electronic equipment
CN115063301A (en) Video denoising method, video processing method and device
Gryaditskaya et al. Motion aware exposure bracketing for HDR video
CN115311149A (en) Image denoising method, model, computer-readable storage medium and terminal device
WO2017124298A1 (en) Video encoding and decoding method, and inter-frame prediction method, apparatus, and system thereof
CN102572502A (en) Selecting method of keyframe for video quality evaluation
Kong et al. A comprehensive comparison of multi-dimensional image denoising methods
Mehta et al. Gated multi-resolution transfer network for burst restoration and enhancement
CN116797462B (en) Real-time video super-resolution reconstruction method based on deep learning
CN116630152A (en) Image resolution reconstruction method and device, storage medium and electronic equipment
CN111861877A (en) Method and apparatus for video superdivision variability
CN114514746B (en) Systems and methods for motion adaptive filtering as preprocessing for video encoding
WO2023133889A1 (en) Image processing method and apparatus, remote control device, system and storage medium
CN118608387A (en) Method, device and apparatus for super-resolution reconstruction of satellite video frames
Zhang et al. Spatial-temporal color video reconstruction from noisy CFA sequence
CN116847087A (en) Video processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant