WO2023246041A1 - Method, apparatus and system for video generation, electronic device and readable storage medium
- Publication number
- WO2023246041A1 (PCT/CN2022/141033; CN2022141033W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image frame
- frame
- reconstructed
- exposure image
- short exposure
- Legal status
- Ceased
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
- H04N5/265—Mixing
Definitions
- the present disclosure relates to the field of image processing, and more specifically, to a video generation method, device, system, electronic device, and readable storage medium.
- Video compressed sensing technology can generate video from a single frame of encoded image. This technology allows users to achieve the effect of high-speed video shooting while using a low-speed camera.
- the quality of videos generated by encoding single-frame images is often not satisfactory to users.
- the images in the generated videos are often blurry and cannot clearly show details.
- this is an underdetermined inverse problem: the number of known measurements is usually much smaller than the number of unknowns, so the unknowns cannot be solved effectively without additional constraints.
- commonly used methods include adding artificially prescribed regularization terms, such as total variation, or introducing statistical prior information from a data set through neural networks to constrain the results, thereby improving the quality of the generated videos.
- the present disclosure provides a video generation method, device, system, electronic equipment and readable storage medium.
- This method can improve the clarity of images in videos generated from single-frame coded images, thereby improving the quality of the video and providing users with a good visual experience.
- a video generation method, including: acquiring a first short exposure image frame, a long exposure image frame, and a second short exposure image frame sequentially captured by an image capture device, wherein the first short exposure image frame and the second short exposure image frame are short exposure image frames obtained in a first coded exposure mode, and the long exposure image frame is a single-frame coded long exposure image frame obtained by superimposing multiple image frames captured by continuous exposure in a second coded exposure mode different from the first coded exposure mode; reconstructing the long exposure image frame to obtain a plurality of pre-reconstructed frames; for each pre-reconstructed frame of the plurality of pre-reconstructed frames, fusing the first short exposure image frame, the second short exposure image frame, and that pre-reconstructed frame to generate a reconstructed frame; and generating a reconstructed video based on the plurality of reconstructed frames corresponding to the plurality of pre-reconstructed frames.
- the first coded exposure mode encodes the captured scene information with a spatially uniform modulation pattern, and the second coded exposure mode encodes the scene information captured at N consecutive moments with N mutually different, spatially non-uniform modulation patterns to obtain N image frames, and the N image frames are superimposed to generate the long exposure image frame, where N is an integer greater than or equal to 2.
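To make the two coded exposure modes concrete, the following is a minimal sketch (not from the patent; the array shapes, names, and binary patterns are assumptions) of how a spatially uniform short exposure and an N-pattern coded long exposure could be simulated in Python with NumPy:

```python
import numpy as np

N, H, W = 8, 64, 64                       # number of moments, frame size (assumed)
rng = np.random.default_rng(0)
X = rng.random((N, H, W))                 # scene at N consecutive moments
M = (rng.random((N, H, W)) > 0.5).astype(X.dtype)  # N non-uniform patterns

# Second coded exposure mode: each moment is modulated by its own pattern,
# and the N coded frames are superimposed into one long-exposure frame.
long_exposure = (M * X).sum(axis=0)       # single-frame coded measurement

# First coded exposure mode: a spatially uniform (global constant) pattern,
# so the spatial information of the short-exposure frame is fully preserved.
uniform = np.ones((H, W), dtype=X.dtype)
short_exposure = uniform * X[0]           # e.g. the frame at moment 1
```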
- fusing the first short exposure image frame, the second short exposure image frame, and each pre-reconstructed frame to generate the reconstructed frame includes: for each pre-reconstructed frame of the plurality of pre-reconstructed frames, determining a first interpolated image frame between that pre-reconstructed frame and the first short exposure image frame and a second interpolated image frame between that pre-reconstructed frame and the second short exposure image frame; and fusing the first interpolated image frame, the second interpolated image frame, and that pre-reconstructed frame to generate the reconstructed frame.
- determining the first interpolated image frame and the second interpolated image frame includes: for each pre-reconstructed frame of the plurality of pre-reconstructed frames, determining a first relative position relationship information set between objects in that pre-reconstructed frame and corresponding objects in the first short exposure image frame, and a second relative position relationship information set between objects in that pre-reconstructed frame and corresponding objects in the second short exposure image frame; performing spatial position mapping interpolation on the first short exposure image frame based on the first relative position relationship information set to obtain the first interpolated image frame; and performing spatial position mapping interpolation on the second short exposure image frame based on the second relative position relationship information set to obtain the second interpolated image frame.
- the first relative position relationship information set and/or the second relative position relationship information set includes optical flow information describing the movement direction and offset of the object.
- the method further includes: inputting the first short exposure image frame, the first relative position relationship information set, and the first interpolated image frame, together with the second short exposure image frame, the second relative position relationship information set, and the second interpolated image frame, into a pre-trained first neural network model for fusion to obtain a refined first relative position relationship information set with a first information weight set and a refined second relative position relationship information set with a second information weight set, wherein the first information weight set indicates the weight, in each pre-reconstructed frame, of the information of the objects in the first short exposure image frame, and the second information weight set indicates the weight, in each pre-reconstructed frame, of the information of the objects in the second short exposure image frame; performing spatial position mapping interpolation on the first short exposure image frame based on the refined first relative position relationship information set to align the spatial positions of the objects in each pre-reconstructed frame and the corresponding objects in the first short exposure image frame, and multiplying the interpolated result by the corresponding first information weight in the first information weight set to obtain a first fine interpolated image frame; and performing spatial position mapping interpolation on the second short exposure image frame based on the refined second relative position relationship information set to align the spatial positions of the objects in each pre-reconstructed frame and the corresponding objects in the second short exposure image frame, and multiplying the interpolated result by the corresponding second information weight in the second information weight set to obtain a second fine interpolated image frame.
- fusing the first interpolated image frame, the second interpolated image frame, and each pre-reconstructed frame to generate the reconstructed frame includes: fusing the first fine interpolated image frame, the second fine interpolated image frame, and each pre-reconstructed frame to obtain the reconstructed frame.
- determining the first interpolated image frame and the second interpolated image frame may alternatively include: for each pre-reconstructed frame of the plurality of pre-reconstructed frames, inputting that pre-reconstructed frame together with the first short exposure image frame, and that pre-reconstructed frame together with the second short exposure image frame, respectively into a second neural network; aligning, through the second neural network, the spatial positions of the objects in that pre-reconstructed frame and the corresponding objects in the first short exposure image frame to obtain the first interpolated image frame; and aligning the spatial positions of the objects in that pre-reconstructed frame and the corresponding objects in the second short exposure image frame to obtain the second interpolated image frame; wherein the second neural network is pre-trained and uses deformable convolution.
- the object includes one of a pixel, a coding unit, or an identifiable feature of an image frame.
- fusing the first interpolated image frame, the second interpolated image frame, and each pre-reconstructed frame to generate the reconstructed frame includes: inputting the first interpolated image frame, the second interpolated image frame, and each pre-reconstructed frame into a pre-trained third neural network for fusion to generate the reconstructed frame, wherein the third neural network is based on a UNet-style network structure.
- the image capture device captures image frames at a frame rate lower than a frame rate of the reconstructed video.
- the image capture device includes an optical encoding device for encoding the captured scene using different coded exposure modes, and the optical encoding device includes a digital micromirror device (DMD) or a liquid crystal on silicon (LCoS) modulator.
- the first short exposure image frame, the long exposure image frame, and the second short exposure image frame are continuously captured by the image capture device.
- the method further includes: acquiring a second long exposure image frame and a third short exposure image frame sequentially captured by the image capture device after the second short exposure image frame, wherein the third short exposure image frame is a short exposure image frame obtained in the first coded exposure mode, and the second long exposure image frame is a single-frame coded long exposure image frame obtained by superimposing multiple image frames captured by continuous exposure in the second coded exposure mode.
- the first short exposure image frame and the second short exposure image frame have higher-quality spatial information than the long exposure image frame; and the long exposure image frame has more temporal information than the first short exposure image frame and the second short exposure image frame.
- a video generation device, including: an image frame acquisition module configured to acquire a first short exposure image frame, a long exposure image frame, and a second short exposure image frame sequentially captured by an image capture device, wherein the first short exposure image frame and the second short exposure image frame are short exposure image frames obtained in a first coded exposure mode, and the long exposure image frame is a single-frame coded long exposure image frame obtained by superimposing multiple image frames captured by continuous exposure in a second coded exposure mode different from the first coded exposure mode; a pre-reconstruction module configured to reconstruct the long exposure image frame to obtain a plurality of pre-reconstructed frames; a fusion module configured to, for each pre-reconstructed frame of the plurality of pre-reconstructed frames, fuse the first short exposure image frame, the second short exposure image frame, and that pre-reconstructed frame to generate a reconstructed frame; and a reconstruction module configured to generate a reconstructed video based on the plurality of reconstructed frames corresponding to the plurality of pre-reconstructed frames.
- the first coded exposure mode encodes the captured scene information with a spatially uniform modulation pattern, and the second coded exposure mode encodes the scene information captured at N consecutive moments with N mutually different, spatially non-uniform modulation patterns to obtain N image frames, and the N image frames are superimposed to generate the long exposure image frame, where N is an integer greater than or equal to 2.
- the fusion module includes an interpolation unit and a fusion reconstruction unit, wherein the interpolation unit is configured to determine, for each pre-reconstructed frame of the plurality of pre-reconstructed frames, a first interpolated image frame between that pre-reconstructed frame and the first short exposure image frame and a second interpolated image frame between that pre-reconstructed frame and the second short exposure image frame; and the fusion reconstruction unit is configured to fuse the first interpolated image frame, the second interpolated image frame, and that pre-reconstructed frame to generate the reconstructed frame.
- the interpolation unit is configured to: for each pre-reconstructed frame of the plurality of pre-reconstructed frames, determine a first relative position relationship information set between objects in that pre-reconstructed frame and corresponding objects in the first short exposure image frame, and a second relative position relationship information set between objects in that pre-reconstructed frame and corresponding objects in the second short exposure image frame; perform spatial position mapping interpolation on the first short exposure image frame based on the first relative position relationship information set to align the spatial positions of the objects in that pre-reconstructed frame and the corresponding objects in the first short exposure image frame, obtaining the first interpolated image frame; and perform spatial position mapping interpolation on the second short exposure image frame based on the second relative position relationship information set to align the spatial positions of the objects in that pre-reconstructed frame and the corresponding objects in the second short exposure image frame, obtaining the second interpolated image frame.
- the first relative position relationship information set and/or the second relative position relationship information set includes optical flow information describing the movement direction and offset of the object.
- the fusion module further includes a refinement unit configured to: input the first short exposure image frame, the first relative position relationship information set, and the first interpolated image frame, together with the second short exposure image frame, the second relative position relationship information set, and the second interpolated image frame, into a pre-trained first neural network model for fusion to obtain a refined first relative position relationship information set with a first information weight set and a refined second relative position relationship information set with a second information weight set, wherein the first information weight set indicates the weight, in each pre-reconstructed frame, of the information of the objects in the first short exposure image frame, and the second information weight set indicates the weight, in each pre-reconstructed frame, of the information of the objects in the second short exposure image frame; perform spatial position mapping interpolation on the first short exposure image frame based on the refined first relative position relationship information set to align the spatial positions of the objects in each pre-reconstructed frame and the corresponding objects in the first short exposure image frame, and multiply the interpolated result by the corresponding first information weight in the first information weight set to obtain a first fine interpolated image frame; and perform spatial position mapping interpolation on the second short exposure image frame based on the refined second relative position relationship information set likewise, multiplying the interpolated result by the corresponding second information weight in the second information weight set to obtain a second fine interpolated image frame.
- the fusion reconstruction unit is configured to fuse the first fine interpolated image frame, the second fine interpolated image frame, and each pre-reconstructed frame to obtain the reconstructed frame.
- the interpolation unit is configured to: for each pre-reconstructed frame of the plurality of pre-reconstructed frames, input that pre-reconstructed frame together with the first short exposure image frame, and that pre-reconstructed frame together with the second short exposure image frame, respectively into a second neural network; align, through the second neural network, the spatial positions of the objects in that pre-reconstructed frame and the corresponding objects in the first short exposure image frame to obtain the first interpolated image frame; and align the spatial positions of the objects in that pre-reconstructed frame and the corresponding objects in the second short exposure image frame to obtain the second interpolated image frame; wherein the second neural network is pre-trained and uses deformable convolution.
- the object includes one of a pixel, a coding unit, or an identifiable feature of an image frame.
- the fusion reconstruction unit is configured to: input the first interpolated image frame, the second interpolated image frame, and each pre-reconstructed frame into a pre-trained third neural network for fusion to generate the reconstructed frame, wherein the third neural network is based on a UNet-style network structure.
- the image capture device captures image frames at a frame rate lower than a frame rate of the reconstructed video.
- the image capture device includes an optical encoding device for encoding the captured scene using different coded exposure modes, and the optical encoding device includes a digital micromirror device (DMD) or a liquid crystal on silicon (LCoS) modulator.
- the first short exposure image frame, the long exposure image frame, and the second short exposure image frame are continuously captured by the image capture device.
- the image frame acquisition module is further configured to acquire a second long exposure image frame and a third short exposure image frame sequentially captured by the image capture device after the second short exposure image frame, wherein the third short exposure image frame is a short exposure image frame obtained in the first coded exposure mode, and the second long exposure image frame is a single-frame coded long exposure image frame obtained by superimposing multiple image frames captured by continuous exposure in the second coded exposure mode; the pre-reconstruction module is further configured to reconstruct the second long exposure image frame to obtain a plurality of second pre-reconstructed frames; the fusion module is further configured to, for each second pre-reconstructed frame of the plurality of second pre-reconstructed frames, fuse the second short exposure image frame, the third short exposure image frame, and that second pre-reconstructed frame to generate a second reconstructed frame; and the reconstruction module is further configured to generate a second reconstructed video based on the plurality of second reconstructed frames corresponding to the plurality of second pre-reconstructed frames.
- the first short exposure image frame and the second short exposure image frame have higher-quality spatial information than the long exposure image frame; and the long exposure image frame has more temporal information than the first short exposure image frame and the second short exposure image frame.
- a video generation system, including: an optical encoding device configured to set, in response to a driving signal, a plurality of coded exposure modes for a scene to be photographed; an image capture sensor configured to expose sequentially, in response to the driving signal, to capture a first short exposure image frame, a long exposure image frame, and a second short exposure image frame, wherein the first short exposure image frame and the second short exposure image frame are short exposure image frames obtained in a first coded exposure mode, and the long exposure image frame is a single-frame coded long exposure image frame obtained by superimposing multiple image frames captured by continuous exposure in a second coded exposure mode different from the first coded exposure mode; and an image processor configured to reconstruct the long exposure image frame to obtain a plurality of pre-reconstructed frames, to fuse, for each pre-reconstructed frame of the plurality of pre-reconstructed frames, the first short exposure image frame, the second short exposure image frame, and that pre-reconstructed frame to generate a reconstructed frame, and to generate a reconstructed video based on the plurality of reconstructed frames corresponding to the plurality of pre-reconstructed frames.
- an electronic device, including: a processor; and a memory storing computer-readable code which, when executed by the processor, implements the above video generation method.
- a non-transitory computer-readable storage medium storing computer-readable instructions, wherein the above video generation method is implemented when the computer-readable instructions are executed by a processor.
- Embodiments of the present disclosure provide a video generation method, device, system, electronic device, and readable storage medium.
- in addition to the single-frame coded long exposure image frame captured in the second coded exposure mode, two short exposure image frames captured before and after the long exposure image frame in a first coded exposure mode different from the second coded exposure mode are also acquired. By exploiting the higher-quality spatial information in the short exposure image frames and the richer temporal information in the long exposure image frame, the two short exposure image frames are fused with each of the multiple pre-reconstructed frames generated from the long exposure image frame to produce multiple optimized reconstructed frames with higher definition. The quality of the video generated from such reconstructed frames is thereby improved, providing users with a good visual experience.
- FIG. 1A is a flowchart illustrating a video generation method according to the first embodiment of the present disclosure
- FIG. 1B is a schematic diagram illustrating a video generation method according to the first embodiment of the present disclosure
- FIG. 2 is a schematic diagram illustrating an example of a portion of an image capture device according to an embodiment of the present disclosure;
- FIG. 3 is a schematic diagram illustrating the use of modulation patterns to encode scene information to be captured to generate a long exposure image frame;
- FIG. 4 is a schematic diagram of reconstructing a long exposure image frame to obtain multiple pre-reconstructed frames, with reference to FIG. 3;
- FIG. 5 is a diagram comparing reconstructed frames generated by a conventional video reconstruction method based on a long exposure image frame with reconstructed frames generated by the method of the present disclosure;
- FIG. 6 is a flowchart illustrating a video generation method according to a second embodiment of the present disclosure
- FIG. 7 is a schematic diagram illustrating a video generation method according to a second embodiment of the present disclosure.
- FIG. 8 is a schematic diagram illustrating a video generation method according to a third embodiment of the present disclosure.
- FIG. 9 is a block diagram illustrating a video generating device according to a fourth embodiment of the present disclosure.
- FIG. 10 is a block diagram illustrating a video generating device according to a fifth embodiment of the present disclosure.
- FIG. 11 is a block diagram illustrating a video generating device according to a sixth embodiment of the present disclosure.
- FIG. 12 is a block diagram illustrating a video generation system according to a seventh embodiment of the present disclosure.
- FIG. 13 is a structural diagram illustrating an electronic device according to some embodiments of the present disclosure.
- FIG. 1A shows a flowchart of a video generation method according to the first embodiment of the present disclosure.
- FIG. 1B shows a schematic diagram of a video generation method according to the first embodiment of the present disclosure.
- in addition to acquiring the single-frame coded long exposure image frame captured in the second coded exposure mode, the method also acquires two short exposure image frames captured before and after the long exposure image frame in a first coded exposure mode different from the second coded exposure mode. By exploiting the higher-quality spatial information in the short exposure image frames and the richer temporal information in the long exposure image frame, the two short exposure image frames are fused with each of the multiple pre-reconstructed frames generated from the long exposure image frame to produce multiple optimized reconstructed frames with higher definition. The quality of the video generated from such reconstructed frames is thereby improved, breaking through the quality constraints of video generated from a single-frame coded long exposure image frame alone and giving users a good visual experience.
- the video generation method described in the present disclosure will be described in detail below with reference to Figure 1A and Figure 1B. The method includes the following steps:
- in step S110, a first short exposure image frame, a long exposure image frame, and a second short exposure image frame sequentially captured by the image capture device are acquired, wherein the first short exposure image frame and the second short exposure image frame are short exposure image frames obtained in a first coded exposure mode, and the long exposure image frame is a single-frame coded long exposure image frame obtained by superimposing multiple image frames captured by continuous exposure in a second coded exposure mode different from the first coded exposure mode.
- a relatively clear first short-exposure image frame encoded using the first encoding exposure method is obtained at time 1. That is to say, the first short exposure image frame has more spatial information.
- short-exposure image frames contain relatively little temporal information due to their shorter exposure time.
- a plurality of image frames used to generate the long exposure image frame are obtained at times 2 to N-1. These image frames are captured by continuous exposure in a second coded exposure mode different from the first coded exposure mode, where the second coded exposure mode uses a different modulation pattern at each moment, and the image frames are superimposed within one long exposure to generate the long exposure image frame. The spatial information of these image frames is limited by the quality of the reconstruction.
- at time N, a clearer second short exposure image frame encoded using the first coded exposure mode is obtained; its characteristics are similar to those of the first short exposure image frame. As can be seen from Figure 1B, from time 1 to time N the objects in the picture move over time. Because of the long exposure time, the long exposure image frame carries relatively more temporal information.
- in step S120, the long exposure image frame is reconstructed to obtain a plurality of pre-reconstructed frames.
- in step S130, for each pre-reconstructed frame of the plurality of pre-reconstructed frames, the first short exposure image frame, the second short exposure image frame, and that pre-reconstructed frame are fused to generate a reconstructed frame.
- this fusion utilizes the higher spatial information in short-exposure image frames and the greater temporal information in long-exposure image frames, making the multiple reconstructed frames generated clearer.
- in step S140, a reconstructed video is generated based on the plurality of reconstructed frames corresponding to the plurality of pre-reconstructed frames.
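As an illustration only, the four steps S110-S140 can be summarized in the following Python sketch. Every callable here is a hypothetical placeholder supplied by the caller, not the patent's actual implementation:

```python
def generate_video(acquire_triplet, pre_reconstruct, fuse, make_video):
    """All four arguments are caller-supplied callables (placeholders)."""
    # S110: acquire the short / long / short frames captured in sequence.
    s1, long_frame, s2 = acquire_triplet()
    # S120: reconstruct the coded long-exposure frame into N pre-reconstructed frames.
    pre_frames = pre_reconstruct(long_frame)          # e.g. GAP-TV or a CNN
    # S130: fuse each pre-reconstructed frame with the two short-exposure frames.
    recon_frames = [fuse(s1, s2, p) for p in pre_frames]
    # S140: assemble the reconstructed video from the reconstructed frames.
    return make_video(recon_frames)
```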
- specifically, a first short exposure image frame, a long exposure image frame, and a second short exposure image frame sequentially captured by the image capture device may be acquired, wherein the first short exposure image frame and the second short exposure image frame may be short exposure image frames obtained in a first coded exposure mode, and the long exposure image frame may be obtained by superimposing multiple image frames captured by continuous exposure in a second coded exposure mode different from the first coded exposure mode.
- an image capture device can be used to capture image frames.
- the image capture device may be any type of device capable of capturing images, such as a camera, video camera, smartphone, tablet, laptop, or fixed or portable device with image capture capabilities, etc.
- image frame, image, frame, etc. may be used interchangeably.
- when capturing images with the image capture device, image frames can be generated by encoding the scene information to be captured in a specific coded exposure mode, so that multiple coded frames can be superimposed to generate a single-frame coded long exposure image frame.
- encoded exposure involves encoding the scene information to be captured using a modulation pattern or mask set by an optical encoding device.
- an image capture device may include an optical encoding device for encoding the captured scene using different coded exposure modes.
- Optical encoding devices may include digital micromirror devices (DMDs), liquid crystal on silicon modulators (LCoS), or other optical devices capable of setting modulation patterns or masks.
- the optical encoding device may also be referred to as a spatial light modulator.
- modulation patterns and masks are used to encode scene information whose information is to be captured, and thus modulation patterns and masks may be used interchangeably herein.
- the image capture device may include an objective lens, a DMD, a relay lens, and an image sensor, where the image sensor may be, for example, a charge coupled device (CCD) or complementary metal oxide semiconductor (CMOS) sensor.
- the DMD can also be replaced with other optical encoding devices.
- Objective lenses and relay lenses are conventional components used to transmit optical information and will not be described in detail here.
- the image capture device can control the optical encoding device it contains to encode the scene to be captured, and control the corresponding image sensor to perform exposure imaging to generate the encoded image frame.
- drive signals can be used to control optical encoding devices and image sensors. The drive signal may be generated by the image capture device or by other external devices coupled to the image capture device.
- the image capture device can obtain a short exposure image frame in a first coded exposure mode at a single moment, or can obtain multiple image frames by continuous exposure in a second coded exposure mode and superimpose them to obtain a single-frame coded long exposure image frame. The terms "first" and "second" merely indicate that the short exposure image frames and the long exposure image frame use different coded exposure modes; the coded exposure modes of individual short exposure image frames and/or long exposure image frames may also differ from each other.
- continuous exposure to obtain multiple image frames refers to encoding scene information at multiple consecutive moments within one exposure to obtain multiple image frames.
- the optical encoding device can dynamically refresh the modulation pattern multiple times in response to the driving signal, with each modulation pattern corresponding to the scene information at the current moment; meanwhile, the image sensor captures the scene information encoded by each corresponding modulation pattern according to the driving signal, and the multiple coded image frames are superimposed along the time dimension within one long exposure to obtain a single-frame coded long exposure image frame.
- the encoding exposure method for the long-exposure image frame is different for the scene at each moment, so the encoding method can also be called time-varying encoding with spatial structure.
- Figure 3 shows a schematic diagram of using a modulation pattern to encode scene information to be captured to generate a long exposure image frame.
- the coded exposure mode used to generate a long exposure image frame uses N mutually different, spatially non-uniform modulation patterns to encode the scene information captured at N consecutive moments, obtaining N image frames, and the N image frames are superimposed to generate the long exposure image frame, where N is an integer greater than or equal to 2.
- the captured short exposure image frame will typically also be encoded with the modulation pattern.
- the captured scene information may be encoded in a spatially uniform modulation pattern.
- Such a modulation pattern is usually a global constant pattern such that the scene spatial information in short-exposure image frames can be completely preserved.
- the exposure duration of the short exposure image frame is less than the total exposure duration of the long exposure image frame.
- a short exposure image frame may have a minimum unit exposure duration that the image capture device can support, while a long exposure image frame may have 8 or 16 minimum unit exposure durations.
- the exposure duration can be set according to actual needs and is not limited to a fixed exposure duration.
- a short exposure image frame may have 2 minimum unit exposure durations.
- multiple short exposure image frames and/or long exposure image frames may each have different exposure durations.
- for example, the first long exposure image frame may have 8 minimum unit exposure durations, while the second long exposure image frame may have 16 minimum unit exposure durations.
- the method described in the present disclosure requires acquiring at least three exposure image frames, namely a first short exposure image frame, a long exposure image frame, and a second short exposure image frame.
- the first short exposure image frame, the long exposure image frame and the second short exposure image frame are captured sequentially in time sequence.
- the first short exposure image frame is captured at time 1
- the long exposure image frame is captured at time 2 to N-1
- the second short exposure image frame is captured at time N.
- the above acquired image frames may be discontinuous, for example, the second short exposure image frame is not captured at time N but at time N+1.
- the first short exposure image frame, the long exposure image frame and the second short exposure image frame are continuously captured by the image capture device.
- the image frames may be acquired in any order rather than in capture order, since captured image frames typically carry a mark indicating, for example, the capture time.
- after acquiring the first short exposure image frame, the long exposure image frame, and the second short exposure image frame, at step S120 the long exposure image frame may be reconstructed to obtain a plurality of pre-reconstructed frames.
- the long exposure image frames can be reconstructed to obtain multiple pre-reconstructed frames for subsequent processing.
- the reconstruction may use a neural network (for example, a convolutional neural network based on residual blocks); the reconstruction algorithm can also be based on, for example, GAP-TV, E2E-CNN, DUN, or any other applicable algorithm.
- FIG. 4 is a schematic diagram of reconstructing a long exposure image frame to obtain multiple pre-reconstructed frames with reference to FIG. 3 .
- as shown in Figure 4, N pre-reconstructed frames can be obtained by performing the corresponding processing on the long exposure measurement using the N mutually different, spatially non-uniform modulation patterns used in Figure 3.
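As a hedged illustration of what such a pre-reconstruction could look like, the sketch below initializes the N pre-reconstructed frames by re-masking the single-frame measurement and normalizing by the per-pixel pattern energy. This is only a common coarse baseline under assumed shapes and names, which algorithms such as GAP-TV, E2E-CNN, or DUN would refine further:

```python
import numpy as np

def pre_reconstruct(long_exposure, M, eps=1e-6):
    """long_exposure: (H, W) coded measurement; M: (N, H, W) modulation patterns."""
    overlap = (M * M).sum(axis=0) + eps     # per-pixel energy of the N patterns
    # Re-masking the normalized measurement yields N coarse pre-reconstructed frames.
    return M * (long_exposure / overlap)    # shape (N, H, W)
```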
- at step S130, the first short exposure image frame, the second short exposure image frame, and each pre-reconstructed frame may be fused to generate the reconstructed frames.
- for example, the first interpolated image frame, the second interpolated image frame, and each pre-reconstructed frame can be input into a pre-trained neural network for fusion to generate the reconstructed frame, where the neural network can be based on a UNet-style structure such as the conventional U-Net, RA-UNet, or Swin-Conv-UNet, or on another suitable neural network structure.
- Neural networks with a UNet structure usually have cross-layer connections and spatial up- and down-sampling; such a network structure can fuse image frames more effectively.
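For illustration, a heavily simplified UNet-style fusion network is sketched below in PyTorch. The single-channel inputs, channel counts, and the single down/up-sampling stage with one skip connection are assumptions standing in for a full U-Net; this is not the patent's actual third neural network:

```python
import torch
import torch.nn as nn

class TinyFusionUNet(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)   # spatial down-sampling
        self.up = nn.ConvTranspose2d(ch, ch, 2, stride=2)       # spatial up-sampling
        self.dec = nn.Conv2d(2 * ch, 1, 3, padding=1)           # after skip concat

    def forward(self, interp1, interp2, pre_frame):
        # Concatenate the two interpolated frames and the pre-reconstructed frame.
        x = torch.cat([interp1, interp2, pre_frame], dim=1)     # (B, 3, H, W)
        e = self.enc(x)
        u = self.up(self.down(e))
        # Cross-layer (skip) connection: fuse encoder features with up-sampled ones.
        return self.dec(torch.cat([e, u], dim=1))               # reconstructed frame

frame = torch.rand(1, 1, 64, 64)
out = TinyFusionUNet()(frame, frame, frame)                     # (1, 1, 64, 64)
```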
- short exposure image frames generally have higher quality spatial information than long exposure image frames, and long exposure image frames have more temporal information than short exposure image frames.
- the fusion in the method described in the present disclosure utilizes the higher quality spatial information of two short exposure image frames near the pre-reconstructed frame to optimize the pre-reconstructed frame, so that the generated reconstructed frame has higher definition.
- Figure 5 is a diagram illustrating a comparative effect of a video reconstruction method based on a conventional long-exposure image frame and a reconstructed frame generated based on the method of the present disclosure.
- the first row of images shows the acquired first short exposure image frame, long exposure image frame, and second short exposure image frame sequentially captured by the image capture device.
- the second row of images shows partially reconstructed frames (reconstructed frames 3, 9, 15) from the video generated based on the prior art.
- the fourth image in the second row shows the magnification effect of the area surrounded by the black frame in the reconstructed frame 15 based on the prior art.
- the third row of images shows partially reconstructed frames from a video generated based on the method of the present disclosure.
- the fourth image in the third row shows the magnification effect of the area surrounded by the black frame in the reconstructed frame 15 based on the method of the present disclosure. It can be seen that compared with the reconstructed frames generated based on the prior art, the clarity of the reconstructed frames generated based on the method described in the present disclosure is significantly improved.
- a reconstructed video may be generated based on a plurality of reconstructed frames corresponding to a plurality of pre-reconstructed frames.
- the frame rate at which the image capture device captures image frames may be lower than the frame rate of the reconstructed video, thereby allowing the user to achieve the effect of high-speed video shooting while using a low-speed camera.
- a sequence of short exposure image frames and long exposure image frames can also be captured alternately in sequence, so that a continuous longer reconstructed video can be generated, which can provide users with better visual experience.
- furthermore, a second long exposure image frame and a third short exposure image frame sequentially captured by the image capture device after the second short exposure image frame may be acquired, wherein the third short exposure image frame is a short exposure image frame obtained in the first coded exposure mode, and the second long exposure image frame is a single-frame coded long exposure image frame obtained by superimposing multiple image frames captured by continuous exposure in the second coded exposure mode; the second long exposure image frame is reconstructed to obtain a plurality of second pre-reconstructed frames; and for each second pre-reconstructed frame of the plurality of second pre-reconstructed frames, the second short exposure image frame, the third short exposure image frame, and that second pre-reconstructed frame are fused to generate a second reconstructed frame.
- data such as captured image frames or other data in the image capture device may be stored in a storage device, which may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium.
- the data in the image capture device can also be transmitted to the server via a wired network or a wireless network for acquisition by other devices, or directly transmitted to other devices, so that the captured images are further processed by other devices.
- the network may be a wired network and/or a wireless network.
- wired networks can use twisted pair, coaxial cable or optical fiber transmission for data transmission
- wireless networks can use 3G/4G/5G and other mobile communication networks, Bluetooth, Zigbee or WiFi for data transmission.
- the image processor for processing the image frames may also be directly or indirectly coupled to the image capture device to obtain the image frames captured by the image capture device.
- the image processor may be a central processing unit (CPU), a graphics processing unit (GPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or another programmable logic device, discrete gate or transistor logic device, discrete hardware component, or other device that can perform image processing functions.
- the image processor may also be integrated with an image capture device, such as a camera including an image capture device and an image processor.
- the video generation method of the present disclosure has been described in detail above with reference to Figures 1-5. As is clear from the above description, according to the first embodiment of the present disclosure, in addition to the single-frame coded long exposure image frame captured in the second coded exposure mode, two short exposure image frames captured before and after the long exposure image frame in a first coded exposure mode different from the second coded exposure mode are also acquired. By exploiting the higher-quality spatial information in the short exposure image frames and the richer temporal information in the long exposure image frame, the two short exposure image frames are fused with each of the plurality of pre-reconstructed frames generated from the long exposure image frame to generate a plurality of optimized reconstructed frames with higher definition, and the quality of the video generated from such reconstructed frames is improved, giving users a good visual experience.
- FIG. 6 is a flowchart illustrating a video generation method according to the second embodiment of the present disclosure.
- in the second embodiment, the steps of the first embodiment are further optimized to obtain reconstructed frames with higher definition, thereby improving the quality of the generated video.
- Some of the steps shown in FIG. 6 are similar to the steps shown in FIG. 1 , and therefore are marked with the same reference numerals and will not be repeated here.
- step S130 may include step S610: for each pre-reconstructed frame of the plurality of pre-reconstructed frames, determining a first interpolated image frame between that pre-reconstructed frame and the first short exposure image frame and a second interpolated image frame between that pre-reconstructed frame and the second short exposure image frame; and step S620: fusing the first interpolated image frame, the second interpolated image frame, and that pre-reconstructed frame to generate the reconstructed frame.
- the interpolated image frame can be obtained by mapping the spatial positions in the image frame based on the motion information and offsets of the objects in the image; it can also be obtained by processing the image frames using, for example, a neural network with deformable convolution; or it can be obtained in other ways based on the information of the objects in the two image frames.
- the image processor can perform corresponding operations using objects in the image as processing units.
- the objects in the image include at least one of the following: pixels of the image frame, coding units, identifiable features, or other processing units that can represent motion information and offsets in the image frame. Identifiable features may be specific features in images recognized by image recognition technology.
- for example, the identifiable feature may be the airplane graphic shown in Figure 1B.
- for each pre-reconstructed frame of the plurality of pre-reconstructed frames, a first relative position relationship information set between the objects in that pre-reconstructed frame and the corresponding objects in the first short exposure image frame, and a second relative position relationship information set between the objects in that pre-reconstructed frame and the corresponding objects in the second short exposure image frame, can be determined. Based on the first relative position relationship information set, spatial position mapping interpolation is performed on the first short exposure image frame to align the spatial positions of the objects in that pre-reconstructed frame and the corresponding objects in the first short exposure image frame, obtaining the first interpolated image frame; based on the second relative position relationship information set, spatial position mapping interpolation is performed on the second short exposure image frame to align the spatial positions of the objects in that pre-reconstructed frame and the corresponding objects in the second short exposure image frame, obtaining the second interpolated image frame.
- the first relative position relationship information set and/or the second relative position relationship information set includes optical flow information describing the movement direction and offset of the object.
- optical flow information can be considered as optical flow information for a single object in an image frame, or as a collection of optical flow information for multiple or all objects in an image, depending on the context.
- the method of calculating optical flow information can be implemented based on existing technologies such as PWCNet, RAFT, and AMP, so it will not be described again in this article.
- the relative position relationship information set may also be other information describing the movement direction and offset of the object obtained according to the existing technology.
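As an assumed concretization of spatial position mapping interpolation, the following PyTorch sketch backward-warps a short exposure image frame toward a pre-reconstructed frame using per-pixel optical flow. The flow (from the pre-reconstructed frame to the short exposure image frame, in pixels) would come from a method such as PWCNet or RAFT and is simply an input here; the function name `warp` is an illustration, not the patent's implementation:

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """frame: (B, C, H, W); flow: (B, 2, H, W) holding (dx, dy) per pixel."""
    B, _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid_x = xs[None] + flow[:, 0]        # where to sample in x, per pixel
    grid_y = ys[None] + flow[:, 1]        # where to sample in y, per pixel
    # Normalize sample positions to [-1, 1], as grid_sample expects.
    grid = torch.stack([2 * grid_x / (W - 1) - 1,
                        2 * grid_y / (H - 1) - 1], dim=-1)
    return F.grid_sample(frame, grid, align_corners=True)
```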
- the first interpolated image frame, the second interpolated image frame, and each pre-reconstructed frame can then be input into a pre-trained neural network for fusion to generate the reconstructed frame, where the neural network is based on a UNet-style structure such as the conventional U-Net, RA-UNet, or Swin-Conv-UNet, or on another suitable neural network structure.
- FIG. 7 is a schematic diagram illustrating a video generation method according to a second embodiment of the present disclosure.
- steps similar to S110 are performed to obtain a first short exposure image frame, a long exposure image frame, and a second short exposure image frame.
- steps similar to S120 are performed to obtain N pre-reconstructed frames.
- in step S730, for the k-th pre-reconstructed frame among the N pre-reconstructed frames, first optical flow information from the k-th pre-reconstructed frame to the first short exposure image frame and second optical flow information from the k-th pre-reconstructed frame to the second short exposure image frame are calculated. Spatial position mapping interpolation is then performed on the first short exposure image frame based on the first optical flow information to align the spatial positions of the objects in the k-th pre-reconstructed frame and the corresponding objects in the first short exposure image frame, obtaining a first interpolated image frame for the k-th pre-reconstructed frame; likewise, spatial position mapping interpolation is performed on the second short exposure image frame based on the second optical flow information to obtain a second interpolated image frame for the k-th pre-reconstructed frame. The first interpolated image frame, the second interpolated image frame, and the k-th pre-reconstructed frame are then fused by a neural network to generate the reconstructed frame corresponding to the k-th pre-reconstructed frame.
- in step S740, a reconstructed video is generated based on the N reconstructed frames for the N pre-reconstructed frames.
- the reconstructed frame may also be generated based on bidirectional optical flow information, that is, both the optical flow information from the pre-reconstructed frame to the short exposure image frame and the optical flow information from the short exposure image frame to the pre-reconstructed frame.
- the method based on bidirectional optical flow information is similar to the method described with respect to Figure 7.
- bidirectional optical flow information can bring the spatial positions of the objects in the interpolated image frame closer to the positions of the corresponding objects in the pre-reconstructed frame, so compared with unidirectional optical flow information it can further improve the clarity of the generated reconstructed frames.
- bidirectional optical flow information usually consumes more computing resources.
- the interpolated image frame may also be obtained without relying on a spatial relative position relationship information set between the image frames.
- for example, each pre-reconstructed frame together with the first short exposure image frame, and each pre-reconstructed frame together with the second short exposure image frame, are respectively input into the neural network.
- a neural network using deformable convolution uses the information in the image frames to directly apply spatial displacements to the objects in the image frame and generate an interpolated image frame, without explicitly calculating the relative position relationships of the objects or information such as motion direction and offset.
- since the spatial positions of the objects in the interpolated frames are more consistent with those of the corresponding objects in the pre-reconstructed frame than the spatial positions of the objects in the short exposure image frames are, fusing the interpolated frames with the pre-reconstructed frame further improves the clarity of the obtained reconstructed frames, thereby further improving the quality of the image.
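A hedged sketch of such alignment via deformable convolution is given below, using torchvision's DeformConv2d. The offset-prediction layer, channel sizes, and the idea of predicting offsets from the concatenated frame pair are illustrative assumptions, not the patent's second neural network:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAlign(nn.Module):
    def __init__(self, ch=8, k=3):
        super().__init__()
        # Predict one (dx, dy) pair per kernel tap and output location:
        # 2 * k * k offset channels for a k x k deformable kernel.
        self.offset_pred = nn.Conv2d(2, 2 * k * k, 3, padding=1)
        self.align = DeformConv2d(1, ch, k, padding=k // 2)

    def forward(self, pre_frame, short_frame):
        # Offsets come from both frames; the short-exposure frame is displaced.
        offsets = self.offset_pred(torch.cat([pre_frame, short_frame], dim=1))
        return self.align(short_frame, offsets)     # aligned feature map

pre, short = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
aligned = DeformAlign()(pre, short)                  # (1, 8, 64, 64)
```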
- the method of the third embodiment of the present disclosure may also include further optimization of the interpolated image frame to further improve the clarity of the reconstructed frame.
- specifically, the first short exposure image frame, the first relative position relationship information set, and the first interpolated image frame, together with the second short exposure image frame, the second relative position relationship information set, and the second interpolated image frame, may be input into a pre-trained neural network model for fusion to obtain a refined first relative position relationship information set with a first information weight set and a refined second relative position relationship information set with a second information weight set, wherein the first information weight set indicates the weight, in each pre-reconstructed frame, of the information of the objects in the first short exposure image frame, and the second information weight set indicates the weight, in each pre-reconstructed frame, of the information of the objects in the second short exposure image frame. Based on the refined first relative position relationship information set, spatial position mapping interpolation can be performed on the first short exposure image frame to align the spatial positions of the objects in each pre-reconstructed frame and the corresponding objects in the first short exposure image frame, and the interpolated result is multiplied by the corresponding first information weight to obtain a first fine interpolated image frame; a second fine interpolated image frame is obtained likewise from the second short exposure image frame. The first fine interpolated image frame, the second fine interpolated image frame, and each pre-reconstructed frame may then be fused to obtain the reconstructed frame.
- FIG. 8 is a schematic diagram illustrating a video generation method according to the third embodiment of the present disclosure.
- steps similar to S110 and S710 are performed to obtain a first short exposure image frame, a long exposure image frame, and a second short exposure image frame.
- steps similar to S120 and S720 are performed to obtain N pre-reconstructed frames.
- in step S830, the first interpolated image frame and the second interpolated image frame are first obtained in a manner similar to step S730.
- the method of obtaining the first interpolated image frame and the second interpolated image frame may be based on a relative position relationship information set or may be based on a neural network with deformation convolution.
- since a relative position relationship information set is usually needed when further optimizing the interpolated image frame, it is preferable to obtain the first interpolated image frame and the second interpolated image frame based on such a set.
- here, the relative position relationship information set takes the first optical flow information and the second optical flow information shown in Figure 7 as an example, but it is not limited to optical flow information.
- the first short exposure image frame, the first optical flow information and the first interpolated image frame, together with the second short exposure image frame, the second optical flow information and the second interpolated image frame, are input into the pre-trained neural network model for fusion, yielding refined first optical flow information and a first information weight set, and refined second optical flow information and a second information weight set. The first information weight set indicates the weight, in the k-th pre-reconstructed frame, of the information of the objects in the first short exposure image frame, and the second information weight set indicates the weight, in the k-th pre-reconstructed frame, of the information of the objects in the second short exposure image frame; each information weight set may take the form of a weight map. Based on the refined first optical flow information, spatial position mapping interpolation is performed on the first short exposure image frame to align the spatial positions of the objects in the k-th pre-reconstructed frame with the corresponding objects in the first short exposure image frame, and the interpolated result is multiplied by the corresponding first information weight in the first information weight set to obtain a first fine interpolation image. Based on the refined second optical flow information, spatial position mapping interpolation is likewise performed on the second short exposure image frame, and the interpolated result is multiplied by the corresponding second information weight in the second information weight set to obtain a second fine interpolation image. The first fine interpolation image, the second fine interpolation image and the k-th pre-reconstructed frame are then fused by the neural network to generate the reconstructed frame at the corresponding moment.
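- As a non-limiting illustration of this refine-warp-weight-fuse step, the following minimal NumPy sketch is provided; all function names are illustrative assumptions, the warping uses nearest-neighbour sampling for brevity, and the plain average stands in for the trained fusion network:

```python
import numpy as np

def warp_by_flow(frame, flow):
    """Backward-warp a (H, W) frame by per-pixel optical flow (H, W, 2)."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)
    # Nearest-neighbour sampling keeps the sketch dependency-free.
    return frame[src_y.round().astype(int), src_x.round().astype(int)]

def fine_interpolation(short_frame, refined_flow, weight_map):
    """Align the short exposure frame with the k-th pre-reconstructed frame,
    then scale it by the per-pixel information weight."""
    return warp_by_flow(short_frame, refined_flow) * weight_map

def fuse(first_fine, second_fine, pre_frame):
    # Placeholder for the trained fusion network: a plain average.
    return (first_fine + second_fine + pre_frame) / 3.0
```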
- in step S840, a reconstructed video is generated based on the N reconstructed frames corresponding to the N pre-reconstructed frames, in a manner similar to steps S140 and S740.
- the spatial position of an object in the refined interpolated image frame is more consistent with the spatial position of that object in the pre-reconstructed frame; therefore, the clarity of the reconstructed frame obtained by fusing the fine interpolation frames with the pre-reconstructed frame is further improved.
- the present disclosure also provides a video generation device 900 , which will be described in detail next with reference to FIG. 9 .
- FIG. 9 is a block diagram illustrating a video generating device 900 according to a fourth embodiment of the present disclosure.
- the video generation device 900 of the present disclosure may include an image frame acquisition module 910 , a pre-reconstruction module 920 , a fusion module 930 and a reconstruction module 940 .
- the image frame acquisition module 910 may be configured to acquire a first short exposure image frame, a long exposure image frame, and a second short exposure image frame sequentially captured by the image capture device, wherein the first short exposure image frame and the second short exposure image frame are short exposure image frames obtained in a first coded exposure mode, and the long exposure image frame is a single-frame coded long exposure image frame obtained by superposing a plurality of image frames obtained by continuous exposure in a second coded exposure mode different from the first coded exposure mode.
- the first encoding exposure manner for generating short exposure image frames may be to encode the captured scene information in a spatially uniform modulation pattern.
- the second coded exposure mode for generating a long exposure image frame may encode the scene information captured at N consecutive moments with N mutually different, spatially non-uniform modulation patterns to obtain N image frames, which are then superimposed to generate the long exposure image frame, where N is an integer greater than or equal to 2.
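- As a non-limiting illustration, the two coded exposure modes can be simulated as follows; the mask shapes, the random binary patterns, and the frame sizes below are assumptions made for the sketch, not parameters taught by the present disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)
N, H, W = 8, 64, 64                    # N sub-frames folded into one long exposure
scene = rng.random((N, H, W))          # scene intensity at N consecutive moments

# First coded exposure mode: a spatially uniform (all-ones) modulation pattern
# applied over a short exposure window yields the short exposure image frame.
short_exposure = scene[0] * np.ones((H, W))

# Second mode: N mutually different, spatially non-uniform binary masks, one
# per moment; superposing the N modulated frames gives the single-frame coded
# long exposure image frame.
masks = rng.integers(0, 2, size=(N, H, W)).astype(float)
long_exposure = (masks * scene).sum(axis=0)
```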
- short exposure image frames generally have higher quality spatial information than long exposure image frames, and long exposure image frames have more temporal information than short exposure image frames.
- the first short exposure image frame, the long exposure image frame, and the second short exposure image frame may be continuously captured by the image capture device.
- an image capture device may include an optical encoding device for encoding a captured scene using different encoding exposures.
- Optical encoding devices may include digital micromirror devices (DMDs), liquid crystal on silicon modulators (LCoS), or other optical devices capable of setting modulation patterns or masks.
- the pre-reconstruction module 920 may be configured to reconstruct the long exposure image frame to obtain a plurality of pre-reconstruction frames.
- the fusion module 930 may be configured to, for each of the plurality of pre-reconstructed frames, fuse the first short exposure image frame, the second short exposure image frame and each pre-reconstructed frame to generate a reconstructed frame.
- the first interpolated image frame, the second interpolated image frame and each pre-reconstructed frame can be input into a pre-trained neural network for fusion to generate a reconstructed frame, wherein the neural network may be based on a U-Net-style structure such as a conventional U-Net, RA-UNet, or Swin-Conv-UNet, or on another suitable neural network structure.
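- As a non-limiting sketch of such a fusion network, assuming PyTorch is available, the miniature U-Net below concatenates the three input frames channel-wise and regresses a reconstructed frame; the layer widths and depth are illustrative and far smaller than a practical model:

```python
import torch
import torch.nn as nn

class TinyFusionUNet(nn.Module):
    def __init__(self, in_ch=3, base=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(base, base * 2, 3, stride=2, padding=1)
        self.enc2 = nn.Sequential(nn.Conv2d(base * 2, base * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(base, 1, 3, padding=1))

    def forward(self, first_interp, second_interp, pre_frame):
        # Concatenate the two interpolated frames and the pre-reconstructed
        # frame along the channel axis, encode, then decode with a skip link.
        x = torch.cat([first_interp, second_interp, pre_frame], dim=1)
        e1 = self.enc1(x)
        e2 = self.enc2(self.down(e1))
        u = self.up(e2)
        return self.dec(torch.cat([u, e1], dim=1))

# Usage: single-channel (batch, 1, H, W) inputs with H and W divisible by 2.
model = TinyFusionUNet()
frame = lambda: torch.rand(1, 1, 64, 64)
reconstructed = model(frame(), frame(), frame())
```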
- the reconstruction module 940 may be configured to generate a reconstructed video based on a plurality of reconstructed frames corresponding to a plurality of pre-reconstructed frames.
- the frame rate at which the image capture device captures image frames may be lower than the frame rate of the reconstructed video; for example, since each long exposure image frame is expanded into N reconstructed frames, the reconstructed video contains more frames per unit time than were captured.
- the image frame acquisition module 910 may be further configured to acquire a second long exposure image frame and a third short exposure image frame sequentially captured by the image capture device after the second short exposure image frame, wherein the third short exposure image frame is a short exposure image frame obtained in the first coded exposure mode, and the second long exposure image frame is a single-frame coded long exposure image frame obtained by superposing multiple image frames obtained by continuous exposure in the second coded exposure mode.
- the pre-reconstruction module 920 may also be configured to reconstruct the second long exposure image frame to obtain a plurality of second pre-reconstructed frames;
- the fusion module 930 may also be configured to, for each of the plurality of second pre-reconstructed frames, fuse the second short exposure image frame, the third short exposure image frame and each second pre-reconstructed frame to generate a second reconstructed frame;
- the reconstruction module 940 may also be configured to generate the reconstructed video based on a plurality of second reconstructed frames corresponding to the plurality of second pre-reconstructed frames.
- FIG. 10 is a block diagram illustrating a video generation device 1000 according to a fifth embodiment of the present disclosure. Some components of FIG. 10 are the same as those shown in FIG. 9; they are denoted by the same reference numerals and will not be elaborated again.
- the fusion module 930 may include an interpolation unit 1010 and a fusion reconstruction unit 1020 .
- the interpolation unit 1010 may be configured to, for each of the plurality of pre-reconstructed frames, determine a first interpolated image frame between each pre-reconstructed frame and the first short exposure image frame and a second interpolated image frame between each pre-reconstructed frame and the second short exposure image frame;
- the fusion reconstruction unit 1020 may be configured to fuse the first interpolated image frame, the second interpolated image frame and each pre-reconstructed frame to generate a reconstructed frame.
- the interpolation unit 1010 may be further configured to: for each of the plurality of pre-reconstructed frames, determine a first relative position relationship information set between the objects in each pre-reconstructed frame and the corresponding objects in the first short exposure image frame, and a second relative position relationship information set between the objects in each pre-reconstructed frame and the corresponding objects in the second short exposure image frame; based on the first relative position relationship information set, perform spatial position mapping interpolation on the first short exposure image frame to align the spatial positions of the objects in each pre-reconstructed frame and the corresponding objects in the first short exposure image frame to obtain the first interpolated image; and, based on the second relative position relationship information set, perform spatial position mapping interpolation on the second short exposure image frame to align the spatial positions of the objects in each pre-reconstructed frame and the corresponding objects in the second short exposure image frame to obtain the second interpolated image.
- the first relative position relationship information set and/or the second relative position relationship information set may include optical flow information describing the movement direction and offset of the object.
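- As a non-limiting example of obtaining such optical flow information, assuming OpenCV is available, dense flow between a short exposure image frame and a pre-reconstructed frame can be estimated as follows; the Farneback parameters shown are generic defaults, not values taught by the present disclosure:

```python
import cv2
import numpy as np

# Stand-in 8-bit grayscale images; in practice these would be the short
# exposure image frame and one pre-reconstructed frame.
short_frame = (np.random.rand(64, 64) * 255).astype(np.uint8)
pre_frame = (np.random.rand(64, 64) * 255).astype(np.uint8)

# flow[y, x] = (dx, dy): per-pixel movement direction and offset.
flow = cv2.calcOpticalFlowFarneback(short_frame, pre_frame, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
```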
- the interpolation unit 1010 may be further configured to: for each of the plurality of pre-reconstructed frames, input each pre-reconstructed frame together with the first short exposure image frame, and each pre-reconstructed frame together with the second short exposure image frame, respectively into a neural network; align, through the neural network, the spatial positions of the objects in each pre-reconstructed frame and the corresponding objects in the first short exposure image frame to obtain the first interpolated image; and align, through the neural network, the spatial positions of the objects in each pre-reconstructed frame and the corresponding objects in the second short exposure image frame to obtain the second interpolated image; wherein the neural network is pre-trained and uses deformable convolution.
- an object may include one of a pixel, a coding unit, or an identifiable feature of an image frame.
- FIG. 11 is a block diagram illustrating a video generation device 1100 according to a sixth embodiment of the present disclosure, in which some components of FIG. 11 are the same as those shown in FIGS. 9 and 10; they are denoted by the same reference numerals and will not be described in detail again.
- the fusion module 930 may also include a refinement unit 1110.
- the refinement unit 1110 may be configured to: input the first short exposure image frame, the first relative position relationship information set and the first interpolated image frame into a pre-trained neural network model for fusion to obtain the refined first relative position relationship information set and the first information weight set; input the second short exposure image frame, the second relative position relationship information set and the second interpolated image frame into the neural network model for fusion to obtain the refined second relative position relationship information set and the second information weight set, where the first information weight set indicates the weight, in each pre-reconstructed frame, of the object information in the first short exposure image frame, and the second information weight set indicates the weight, in each pre-reconstructed frame, of the object information in the second short exposure image frame; based on the refined first relative position relationship information set, perform spatial position mapping interpolation on the first short exposure image frame to align the spatial positions of the objects in each pre-reconstructed frame with the corresponding objects in the first short exposure image frame, and multiply the interpolated result by the corresponding first information weight in the first information weight set to obtain the first fine interpolation image; and, based on the refined second relative position relationship information set, perform spatial position mapping interpolation on the second short exposure image frame in the same manner to obtain the second fine interpolation image.
- the fusion reconstruction unit 1020 may be configured to fuse the first fine interpolation image frame and the second fine interpolation image frame with each pre-reconstructed frame to obtain a reconstructed frame.
- the present disclosure also provides a video generation system, which will be described in detail next with reference to FIG. 12 .
- FIG. 12 is a block diagram illustrating a video generation system 1200 according to a seventh embodiment of the present disclosure.
- the video generation system 1200 of the present disclosure may include an optical encoding device 1210 , an image capture sensor 1220 , and an image processor 1230 .
- the optical encoding device 1210 may be configured to set a plurality of encoding exposure modes for the scene to be photographed in response to the driving signal.
- the image capture sensor 1220 may be configured to sequentially perform exposures in response to the driving signal to capture a first short exposure image frame, a long exposure image frame, and a second short exposure image frame, wherein the first short exposure image frame and the second short exposure image frame may be short exposure image frames obtained in a first coded exposure mode, and the long exposure image frame may be a single-frame coded long exposure image frame obtained by superposing multiple image frames obtained by continuous exposure in a second coded exposure mode different from the first coded exposure mode.
- the image processor 1230 may be configured to: reconstruct the long exposure image frame to obtain a plurality of pre-reconstructed frames; for each pre-reconstructed frame of the plurality of pre-reconstructed frames, fuse the first short exposure image frame, the second short exposure image frame and each pre-reconstructed frame to generate a reconstructed frame; and generate a reconstructed video based on the multiple reconstructed frames corresponding to the multiple pre-reconstructed frames.
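- As a non-limiting, high-level sketch of the image processor's pipeline, the helpers below stand in for the trained pre-reconstruction and fusion models described above; all names and the normalisation heuristic here are illustrative assumptions:

```python
import numpy as np

def pre_reconstruct(long_frame, masks):
    # Stand-in for the real inverse solver: normalise the coded long
    # exposure by each mask's average coverage to get N coarse sub-frames.
    return [long_frame / max(m.mean(), 1e-6) for m in masks]

def fuse(short_a, short_b, pre_frame):
    # Stand-in for the trained fusion network.
    return (short_a + short_b + pre_frame) / 3.0

def generate_video(short_a, long_frame, short_b, masks):
    pre_frames = pre_reconstruct(long_frame, masks)
    return [fuse(short_a, short_b, p) for p in pre_frames]  # N reconstructed frames
```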
- video generation system 1200 may be any device, such as a camera, that has capture, encoded exposure, and image processing capabilities.
- the video generation system 1200 may also include optical devices such as lenses to capture scene information.
- video generation system 1200 may also include drive circuitry that may generate drive signals to drive optical encoding device 1210 and image capture sensor 1220 .
- video generation system 1200 may also include input/output (I/O) components. I/O components may also be coupled to video generation system 1200, directly or indirectly.
- An I/O component may represent or interact with a modem, keyboard, mouse, touch screen, or similar device. In some cases, the I/O component may be implemented as part of the processor.
- users may interact with system 1200 via I/O components or via hardware components controlled by I/O components.
- the video generated by the image processor can be presented to the user via the I/O components.
- the user can adjust the encoding exposure mode of the image capture sensor 1220 or adjust parameters of the image capture sensor 1220 via the I/O component.
- Figure 13 is a structural diagram illustrating an electronic device 1300 according to some embodiments of the present disclosure.
- electronic device 1300 may include a processor 1301 and a memory 1302. Both processor 1301 and memory 1302 may be connected via bus 1303.
- the electronic device 1300 may be any type of portable device (such as a smart camera, a smartphone, a tablet, etc.) or any type of fixed device (such as a desktop computer, a server, etc.).
- the processor 1301 can perform various actions and processes according to programs stored in the memory 1302.
- the processor 1301 may be an integrated circuit chip with signal processing capabilities.
- the above-mentioned processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
- the processor may implement or execute each method, step, and logic block diagram disclosed in the embodiments of the present disclosure.
- the general-purpose processor may be a microprocessor or any conventional processor, and may be of x86 architecture or ARM architecture.
- the memory 1302 stores computer-executable instructions, and when the computer-executable instructions are executed by the processor 1301, the above video generation method is implemented.
- Memory 1302 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
- Non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
- Volatile memory may be random access memory (RAM), which acts as an external cache.
- Many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct memory bus random access memory (DR RAM).
- the video generation method according to the present disclosure may be recorded in a computer-readable recording medium.
- a computer-readable recording medium having computer-executable instructions stored therein may be provided, and when the computer-executable instructions are executed by a processor, the processor may be caused to perform the video generation method as described above.
- each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
- each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
- the various example embodiments of the present disclosure may be implemented in hardware or special-purpose circuits, software, firmware, logic, or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software executed by a controller, microprocessor, or other computing device. While aspects of the embodiments of the present disclosure are illustrated or described as block diagrams, flowcharts, or other graphical representations, it will be understood that the blocks, devices, systems, techniques, or methods described herein may, as non-limiting examples, be implemented in hardware, software, firmware, special-purpose circuitry or logic, general-purpose hardware or controllers, or other computing devices, or some combination thereof.
Abstract
The present disclosure relates to a video generation method, apparatus and system, an electronic device, and a readable storage medium. The method comprises: acquiring a first short exposure image frame, a long exposure image frame, and a second short exposure image frame sequentially captured by an image capture device, the first short exposure image frame and the second short exposure image frame being short exposure image frames obtained in a first coded exposure mode, and the long exposure image frame being a single-frame coded long exposure image frame obtained by superposing a plurality of image frames obtained by continuous exposure in a second coded exposure mode different from the first coded exposure mode; reconstructing the long exposure image frame to obtain a plurality of pre-reconstructed frames; for each pre-reconstructed frame of the plurality of pre-reconstructed frames, fusing the first short exposure image frame, the second short exposure image frame, and each pre-reconstructed frame to generate a reconstructed frame; and generating a video based on a plurality of reconstructed frames corresponding to the plurality of pre-reconstructed frames.