WO2025060587A1 - Video jitter removal method, electronic device, system, and storage medium
- Publication number
- WO2025060587A1 (PCT/CN2024/103322)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- processed
- posture information
- camera
- camera posture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/64—Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/68—Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
- H04N23/682—Vibration or motion blur correction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Definitions
- This application relates to, but is not limited to, the field of video processing technology.
- the embodiments of the present application provide a video jitter removal method, an electronic device, a system, and a storage medium, which use neural radiance field technology to render along a generated smooth camera path, thereby processing the video to be processed to achieve a jitter-removal effect.
- the present application provides a method for removing video jitter, including:
- the video scene of the video to be processed is rendered using the trained neural radiance field to generate a new video with the jitter removed.
- the present application also provides an electronic device, including one or more processors and a storage device for storing one or more programs;
- when the one or more programs are executed by the one or more processors, the one or more processors implement the video jitter removal method as described in any embodiment of the present application.
- the embodiment of the present application also provides a video jitter removal system, including a client and a server;
- the client is configured to obtain a video to be processed and camera posture information corresponding to the video to be processed, and send the video to be processed and the camera posture information to the server;
- the server is configured to train the initial neural radiance field according to the video to be processed and the camera posture information to obtain a trained neural radiance field;
- the server is further configured to generate a smooth camera path according to the video to be processed and the camera posture information;
- the server is also configured to render the video scene of the video to be processed using the trained neural radiance field according to the smooth camera path to generate a new video with the jitter removed.
- An embodiment of the present application further provides a computer storage medium, in which a computer program is stored, wherein the computer program is configured to execute the video jitter removal method described in any embodiment of the present application when running.
- the video jitter removal system proposed in the embodiments of the present application adopts an architecture supported remotely by a server.
- the server is responsible for computing tasks that require high computing resources, such as neural radiance field training, camera posture correction, and rendering to generate the new video. This significantly alleviates the computing pressure on the client and ensures the practicality of the video jitter removal solution.
- FIG. 1 is a flow chart of a video jitter removal method provided in an embodiment of the present application.
- FIG. 2 is a schematic diagram of the structure of a video jitter removal system provided in an embodiment of the present application.
- FIG. 3 is a schematic diagram of the structure of another video jitter removal system provided in an embodiment of the present application.
- FIG. 4 is a flow chart of another video jitter removal method provided in an embodiment of the present application.
- FIG. 5 is a flow chart of another video jitter removal method provided in an embodiment of the present application.
- the terms "first" and "second" used in this application are only used for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, features defined as "first" and "second" may explicitly or implicitly include at least one such feature. In the description of this application, "plurality" means at least two, such as two, three, etc., unless otherwise clearly and specifically defined.
- the embodiment of the present application provides an implementable solution, which uses the IMU (Inertial Measurement Unit) data sent back by the shooting device to calculate the translation and rotation, and then distorts the video frame by frame according to the translation and rotation to remove the video jitter.
- the resulting video obtained by this type of method loses a certain amount of the frame area (field of view).
- the embodiment of the present application provides a video jitter removal method, as shown in FIG. 1, comprising:
- Step 110: obtaining a video to be processed and camera posture information corresponding to the video to be processed;
- Step 120: training the initial neural radiance field according to the video to be processed and the camera posture information to obtain a trained neural radiance field;
- Step 130: generating a smooth camera path according to the video to be processed and the camera posture information;
- Step 140: rendering the video scene of the video to be processed using the trained neural radiance field according to the smooth camera path, to generate a new video with the jitter removed.
- a Neural Radiance Field (NeRF) is an emerging technology for novel view synthesis. It implicitly models the input video through a multi-layer perceptron, so that realistic images can be rendered from new viewpoints.
- this solution uses a fully connected network to represent a three-dimensional scene: it takes a three-dimensional spatial position and viewing direction as input and outputs the volume density at that position together with view-dependent color information. Volume rendering is then used to project the output color information and volume density onto a 2D image, thereby realizing novel view synthesis and obtaining an image from a new viewpoint.
- the camera path refers to the camera position and motion trajectory used to generate each frame of a given video sequence in a three-dimensional scene.
- a video can be viewed as a sequence of continuous image frames. In order to generate these image frames, it is necessary to determine the position, direction, and movement of the camera in the scene.
- the camera path corresponding to the video describes the change process of the camera in time, that is, how the camera moves from one position and posture to the next position and posture and captures each image frame.
- the camera path corresponding to the video can be obtained by a variety of methods, including manually setting camera parameters, using a motion capture system to record real camera motion, and creating a virtual camera path through mathematical models and interpolation calculations. By defining and controlling the camera path corresponding to the video, it is possible to achieve perspective transformation, object tracking, simulation of photographic effects, etc., and ultimately generate a video sequence with coherent motion and visual sense.
- the camera path can be represented by rotations and translations that describe the position and orientation of the camera in three-dimensional space.
- Rotations are usually represented using Euler angles (such as pitch, yaw, and roll) or quaternions, and translations or displacements are represented using three-dimensional vectors.
- the camera path can be described by a series of discrete path points, each of which contains the position and posture information of the camera. These path points can be set manually or generated by other methods, such as real camera motion data recorded by a motion capture system or virtual path points calculated based on an interpolation algorithm.
- the camera path can be described by curve parameterization.
- common curve types for parameterization include the Bezier curve and the spline curve.
- Curve parameterization describes the change of the camera over time, and determines the position and direction of the camera at different time points according to the curve shape and control points.
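- as an illustrative sketch (not taken from the embodiments), the following Python snippet evaluates a cubic Bezier position curve from four assumed control points to produce a smooth sequence of camera positions; spline curves would be handled analogously.

```python
import numpy as np

def bezier_point(control_points, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1].

    control_points: (4, 3) array of 3D control points (an illustrative
    assumption; the embodiments also mention spline curves).
    """
    p0, p1, p2, p3 = control_points
    u = 1.0 - t
    return (u**3) * p0 + 3 * (u**2) * t * p1 + 3 * u * (t**2) * p2 + (t**3) * p3

# Sample a smooth positional path for 30 frames from 4 hand-picked control points.
controls = np.array([[0, 0, 0], [1, 0.5, 0], [2, 0.5, 0], [3, 0, 0]], dtype=float)
path = np.stack([bezier_point(controls, t) for t in np.linspace(0, 1, 30)])
```

- evaluating the curve at evenly spaced parameters yields camera positions that change smoothly over time, which is the property the smooth camera path needs.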
- the new video generated by rendering based on the smooth camera path has more coherent frames and smoother transitions compared to the initial video obtained from the shooting device. Therefore, it is also called a video with removed jitter.
- the step of obtaining a video to be processed and camera posture information corresponding to the video to be processed includes:
- the camera posture information corresponding to each frame image in the video to be processed is obtained through a simultaneous localization and mapping (SLAM) algorithm.
- the frame image is also called an image frame, or frame for short.
- the step of obtaining a video to be processed and camera posture information corresponding to the video to be processed includes:
- the initial camera posture information is optimized by using the Structure from Motion (SFM) algorithm to obtain the camera posture information corresponding to each frame image in the video to be processed.
- the camera posture information obtained after SFM algorithm optimization is more accurate than the initial camera posture information.
- generating a smooth camera path according to the video to be processed and the camera posture information includes:
- Interpolation processing is performed on the camera posture information corresponding to the multiple key frames to obtain a smooth camera path corresponding to the video to be processed.
- the step of obtaining a plurality of key frames in the video to be processed includes:
- the plurality of key frames are determined from all frame images included in the video to be processed according to a preset frame interval.
- for example, when the preset frame interval is 5, a key frame is selected every 5 frames from the video to be processed; the interval is flexibly set according to system processing performance needs and is not limited to a specific value.
- alternatively, the frame interval is determined dynamically according to the motion information of the shooter: for example, during intense movement the frame interval is small, and during gentle movement the frame interval is large. More examples are not listed here one by one.
- the step of obtaining a plurality of key frames in the video to be processed includes: selecting a frame image satisfying a first condition as a key frame in the video to be processed, wherein the first condition is that a change in attribute information of the frame image on the original camera path compared to attribute information of a previous frame image of the frame image on the original camera path is greater than a first threshold;
- the original camera path is a camera path determined according to the camera posture information corresponding to the video to be processed.
- the acquiring a plurality of key frames in the video to be processed includes: acquiring a plurality of key frames selected by a user from the video to be processed.
- a user interaction instruction is received through a user interaction module to determine the selected multiple key frames.
- the attribute information of the frame image on the camera path includes one or more of the following:
- position information and orientation information.
- orientation information is simply referred to as orientation, also called rotation information, or direction information.
- when the attribute information includes both position information and orientation information, the attribute information is also referred to as camera attitude information, or attitude information for short.
- the position information is coordinates; in some exemplary embodiments, the coordinates are three-dimensional coordinates.
- for each type of attribute information, a corresponding first threshold is set.
- when the attribute information is position information, the first threshold is a distance threshold.
- when the change in the position information of the frame image on the original camera path compared with the position information of the previous frame image is greater than the distance threshold, the frame image is determined to be a key frame; that is, the distance between the position of the frame image on the original camera path and the position of the previous frame image on the original camera path is greater than the distance threshold.
- when the attribute information is orientation information, the first threshold is a direction angle difference threshold.
- when the change in the orientation information of the frame image on the original camera path compared with the orientation information of the previous frame image is greater than the direction angle difference threshold, the frame image is determined to be a key frame; that is, the angle difference between the orientation of the frame image on the original camera path and the orientation of the previous frame image on the original camera path is greater than the direction angle difference threshold.
- in some embodiments, the attribute information includes both position information and orientation information, and the first threshold correspondingly includes: a distance threshold and a direction angle difference threshold.
- when the change in the orientation information of the frame image on the original camera path compared with the orientation information of the previous frame image is greater than the direction angle difference threshold, or when the change in the position information of the frame image on the original camera path compared with the position information of the previous frame image is greater than the distance threshold, the frame image is determined to be a key frame. More examples are not listed here one by one.
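- as an illustration of the first condition, the following Python sketch selects key frames by comparing each frame's position and orientation on the original camera path with those of the previous frame; the threshold values and the (x, y, z, w) quaternion convention are assumptions, not values from the embodiments.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def select_key_frames(positions, quaternions, dist_thresh=0.05, angle_thresh_deg=2.0):
    """Select frame i as a key frame when its position or orientation on the
    original camera path changes by more than a threshold versus frame i-1.

    positions: (N, 3) camera positions; quaternions: (N, 4) in (x, y, z, w)
    order. The threshold values are illustrative assumptions.
    """
    key_frames = [0]  # keep the first frame as a key frame by convention
    for i in range(1, len(positions)):
        dist = np.linalg.norm(positions[i] - positions[i - 1])
        # relative rotation between consecutive path points, as an angle
        rel = R.from_quat(quaternions[i]) * R.from_quat(quaternions[i - 1]).inv()
        angle_deg = np.degrees(rel.magnitude())
        if dist > dist_thresh or angle_deg > angle_thresh_deg:
            key_frames.append(i)
    return key_frames
```

- the returned frame indices can then be fed to the interpolation step described below.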
- the original camera path corresponding to the video to be processed may not be smooth, and there may be large jumps in spatial position, orientation or posture.
- frame images at path points with large changes in attribute information are selected to form the multiple key frames, and then interpolation processing is performed based on these selected key frames to obtain a smooth camera path corresponding to the video to be processed.
- in some embodiments, the duration corresponding to the smoothed camera path obtained after interpolation processing based on multiple key frames is the same as the duration of the video to be processed. It can be understood that the new video rendered based on this smoothed camera path has the same duration as the video to be processed obtained in step 110.
- the position information is interpolated using a linear interpolation method: interpolation is performed on the position information corresponding to the plurality of key frames to obtain the position information of each insertion point.
- the orientation information is interpolated using a spherical linear interpolation (slerp) method: interpolation is performed on the orientation information corresponding to the plurality of key frames to obtain the orientation information of each insertion point.
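- a minimal Python sketch of the two interpolation steps above, using linear interpolation for positions and scipy's Slerp for orientations; the time parameterization is an assumption for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R, Slerp

def smooth_path(key_times, key_positions, key_quats, out_times):
    """Densify key-frame poses into a smooth camera path: per-axis linear
    interpolation for positions, spherical linear interpolation (slerp)
    for orientations.

    key_positions: (K, 3); key_quats: (K, 4) in (x, y, z, w) order;
    out_times must lie within [key_times[0], key_times[-1]].
    """
    pos = np.stack(
        [np.interp(out_times, key_times, key_positions[:, axis]) for axis in range(3)],
        axis=1,
    )
    slerp = Slerp(key_times, R.from_quat(key_quats))
    quat = slerp(out_times).as_quat()
    return pos, quat
```

- out_times can simply be the frame timestamps of the video to be processed, so that the smoothed path has the same duration as the original video, consistent with the equal-duration embodiment above.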
- in other embodiments, the duration of the smoothed camera path obtained after interpolation processing based on multiple key frames is shorter than the duration of the video to be processed. It can be understood that the new video rendered based on this shorter smoothed camera path is shorter than the video to be processed obtained in step 110.
- generating a smooth camera path according to the video to be processed and the camera posture information includes:
- a smooth camera path corresponding to the video to be processed is generated according to the corrected camera posture information corresponding to each frame image in the video to be processed.
- the camera posture information includes: coordinates and orientation.
- the coordinates include three-dimensional coordinates.
- the orientation comprises a quaternion.
- correcting the posture information includes: correcting coordinates and correcting orientation.
- correcting the camera posture information corresponding to the frame image by using multiple frames of images adjacent to the frame image according to a preset number of iterations includes:
- the camera posture information corresponding to the frame image is used as the initial camera posture information to be corrected; according to the preset number of iterations, the following steps are performed in each round of iteration:
- the corrected camera posture information is used as the camera posture information to be corrected in the next round of iteration.
- the camera posture information corresponding to the frame image in the video to be processed is used as the initial camera posture information to be corrected, which is corrected in the first iteration, and the corrected camera posture information is used as the camera posture information to be corrected in the next iteration, until N corrections are completed to obtain the final corrected camera posture information.
- in some embodiments, the preset number of iterations N is 3, that is, each set of initial camera posture information to be corrected is iterated 3 times to obtain the final correction result.
- the specific value of the number of iterations can be flexibly set as needed and is not limited to the aspects of the embodiments of the present application.
- correcting the camera posture information corresponding to the frame image by using multiple frames of images adjacent to the frame image according to a preset number of iterations includes:
- p is the three-dimensional coordinate to be corrected;
- d is the orientation quaternion to be corrected;
- n is the number of frame images adjacent to the frame image to be corrected;
- p_i is the three-dimensional coordinate of the i-th adjacent frame image, {p_i | i = 1…n};
- d_i is the orientation quaternion of the i-th adjacent frame image, {d_i | i = 1…n};
- w_i is the weight of the i-th adjacent frame image, {w_i | i = 1…n};
- λ is the set correction degree coefficient, λ ∈ [0, 1];
- A(d) is the orthogonal attitude matrix of the quaternion d;
- p* is the corrected three-dimensional coordinate;
- d* is the corrected orientation quaternion.
- the three-dimensional coordinates to be corrected are the three-dimensional coordinates corresponding to the frame image to be corrected, and the orientation quaternion to be corrected is the quaternion corresponding to the frame image to be corrected;
- the three-dimensional coordinates to be corrected are the corrected three-dimensional coordinates obtained in the previous iteration, and the orientation quaternion to be corrected is the corrected quaternion obtained in the previous iteration.
- the weight w_i is determined according to the following method:
- the multiple frames of images adjacent to the frame image in the above correction method are the n frames of images adjacent to the frame image, where n is an integer greater than or equal to 1, flexibly set according to needs and not limited to a specific value.
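- a minimal Python sketch of the iterative correction loop described above; since the exact update equations involving w_i and A(d) are not reproduced in this text, uniform weights and a sign-aligned weighted quaternion average are used as stand-ins, blended by the correction degree coefficient λ (lam).

```python
import numpy as np

def correct_poses(positions, quats, n=2, lam=0.5, iterations=3):
    """Iteratively blend each camera pose toward a weighted average of its
    n neighboring frames on each side, by the correction degree lam in [0, 1].

    A sketch under stated assumptions: uniform placeholder weights w_i and an
    approximate quaternion average stand in for the patent's exact equations.
    """
    pos = np.asarray(positions, dtype=float).copy()
    qs = np.asarray(quats, dtype=float).copy()  # (N, 4), (x, y, z, w)
    N = len(pos)
    for _ in range(iterations):
        new_pos, new_qs = pos.copy(), qs.copy()
        for i in range(N):
            idx = [j for j in range(max(0, i - n), min(N, i + n + 1)) if j != i]
            w = np.full(len(idx), 1.0 / len(idx))  # placeholder uniform weights w_i
            avg_p = (w[:, None] * pos[idx]).sum(axis=0)
            new_pos[i] = (1.0 - lam) * pos[i] + lam * avg_p
            # sign-align neighbor quaternions to qs[i] before averaging
            neigh = qs[idx].copy()
            neigh[neigh @ qs[i] < 0.0] *= -1.0
            avg_q = (w[:, None] * neigh).sum(axis=0)
            blended = (1.0 - lam) * qs[i] + lam * avg_q
            new_qs[i] = blended / np.linalg.norm(blended)
        # the corrected poses become the poses to be corrected in the next round
        pos, qs = new_pos, new_qs
    return pos, qs
```

- with lam = 0 the poses are unchanged, and with lam = 1 each pose is replaced by its neighborhood average, matching the role of λ ∈ [0, 1] as the correction degree coefficient.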
- the training of the initial neural radiance field according to the video to be processed and the camera posture information includes:
- a plurality of training samples are obtained according to the video to be processed and the camera posture information, wherein each training sample is composed of the light ray emitted by a pixel point of a frame image in the video to be processed and the corresponding color;
- the initial neural radiance field is trained according to the multiple training samples.
- the light ray emitted by a pixel point of a frame image in the video to be processed is determined based on the camera posture information corresponding to the frame image in which the pixel point is located and the position of the pixel point within that frame image.
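- for illustration, a Python sketch of how a pixel's ray can be derived from the frame's camera posture information and the pixel position, following common NeRF-style ray construction; the intrinsics K and the camera-to-world sign conventions are assumptions.

```python
import numpy as np

def pixel_ray(K, c2w, u, v):
    """Compute the ray emitted by pixel (u, v), given the intrinsics K (3x3)
    and the camera-to-world pose c2w (4x4) derived from the frame's camera
    posture information. Returns (origin, unit direction) in world space.
    """
    d_cam = np.array([
        (u - K[0, 2]) / K[0, 0],  # x offset from principal point / focal length
        (v - K[1, 2]) / K[1, 1],  # y offset from principal point / focal length
        1.0,                      # unit depth along the viewing axis (+z assumed)
    ])
    d_world = c2w[:3, :3] @ d_cam
    d_world /= np.linalg.norm(d_world)
    origin = c2w[:3, 3]  # the ray origin is the camera position for this frame
    return origin, d_world
```

- pairing each ray with the color of its pixel yields one training sample, as described above.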
- the step of obtaining a plurality of training samples according to the video to be processed and the camera posture information includes:
- the light emitted by each pixel and the color of the pixel form a sample.
- the color of each pixel point p of a frame image is c.
- φ represents the azimuth or longitude, i.e., the angle on the reference plane between the projection of the direction onto that plane and a reference direction (usually the positive x-axis); the value range of φ is usually 0 to 2π.
- a frame image in the video to be processed includes multiple pixel points, each corresponding to one training sample composed of the light emitted by the pixel point and its color; that is, one pixel corresponds to one training sample, and the multiple image frames in the video to be processed thus yield a large number of training samples.
- training the initial neural radiance field according to the plurality of training samples comprises:
- all training samples are randomly shuffled, and then the initial neural radiance field is trained.
- where l is the light ray and c is the color.
- the coordinate p is spatially deformed, and its position is queried through hash coding.
- the multi-layer perceptron of the corresponding node is used to encode it.
- the encoded features f and the viewing direction d are then processed together by a global multi-layer perceptron to output the predicted color c_pred and the disparity disp.
- the spatial division in the neural radiance field may be a uniform grid-based spatial division or an octree-based spatial division.
- the spatial deformation in the neural radiance field may be a spatial deformation based on normalized device coordinates, or a spatial deformation based on a perspective projection coordinate system.
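- a minimal PyTorch sketch of the rendering head described above; a sinusoidal frequency encoding stands in for the hash-coding query (an assumption for illustration), and a global multi-layer perceptron outputs the color c_pred and the disparity disp.

```python
import torch
import torch.nn as nn

def freq_encode(x, n_freqs=6):
    """Sinusoidal positional encoding, used here as a simple stand-in for the
    hash-coding query mentioned above (an illustrative assumption)."""
    feats = [x]
    for k in range(n_freqs):
        feats.append(torch.sin((2.0 ** k) * x))
        feats.append(torch.cos((2.0 ** k) * x))
    return torch.cat(feats, dim=-1)

class RadianceHead(nn.Module):
    """Global MLP that fuses the encoded position features f with the encoded
    viewing direction d and outputs the color c_pred and a disparity disp."""
    def __init__(self, n_freqs=6, hidden=64):
        super().__init__()
        in_dim = 2 * 3 * (2 * n_freqs + 1)  # encoded position + encoded direction
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 disparity channel
        )

    def forward(self, p, d):
        f = freq_encode(p)  # features of the (deformed) spatial coordinate p
        g = freq_encode(d)  # features of the viewing direction d
        out = self.mlp(torch.cat([f, g], dim=-1))
        c_pred = torch.sigmoid(out[..., :3])  # predicted color in [0, 1]
        disp = torch.relu(out[..., 3:])       # non-negative disparity
        return c_pred, disp
```

- the (c_pred, disp) outputs correspond to the predicted color and disparity named above; volume rendering along the sampled rays is omitted from this sketch.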
- the video de-jittering solution provided in the embodiments of the present application obtains training samples from the video to be processed to train the neural radiance field, uses novel view synthesis based on the neural radiance field, and then renders a new de-jittered video corresponding to the video to be processed based on the smoothed camera path obtained by smoothing the camera path of the video to be processed. This effectively removes video jitter and avoids the frame-area loss caused by some de-jittering solutions.
- the present application also provides an electronic device, including:
- one or more processors;
- a storage device for storing one or more programs;
- when the one or more programs are executed by the one or more processors, the one or more processors implement the video jitter removal method as described in any embodiment of the present application.
- the embodiment of the present application also provides a video jitter removal system, as shown in FIG. 2, comprising:
- the client 210 is configured to obtain a video to be processed and camera posture information corresponding to the video to be processed, and send the video to be processed and the camera posture information to the server 220;
- the server 220 is configured to train the initial neural radiance field according to the video to be processed and the camera posture information to obtain a trained neural radiance field;
- the server 220 is further configured to generate a smooth camera path according to the video to be processed and the camera posture information;
- the server 220 is further configured to render the video scene of the video to be processed using the trained neural radiance field according to the smoothed camera path, so as to generate a new video with the jitter removed.
- the server 220 is further configured to send the new video after the jitter is removed to the client 210;
- the client 210 is also configured to display the new video after the jitter is removed.
- the client 210 is further configured to obtain a plurality of key frames in the video to be processed, and send the plurality of key frames to the server 220 .
- the client 210 includes: a user interaction module, configured to receive a user operation instruction and determine the selected multiple key frames from the video to be processed.
- the server 220 is further configured to perform interpolation processing on camera posture information corresponding to multiple key frames in the video to be processed, so as to obtain a smooth camera path corresponding to the video to be processed.
- the server 220 or the client 210 selects the multiple key frames from the video to be processed, and the frame images that meet the first condition are used as key frames;
- the first condition is that the change amplitude of the attribute information of the frame image on the original camera path compared with the attribute information of the previous frame image on the original camera path of the frame image is greater than a first threshold;
- the original camera path is a camera path determined according to the camera posture information corresponding to the video to be processed.
- the key frames used to determine the smooth camera path can be determined by the server 220 according to the video to be processed, or by the client 210 according to the video to be processed, or by the client 210 according to the user's operation instruction.
- the user uses the user interaction module to mark to determine multiple key frames.
- the embodiment of the present application also provides a video jitter removal system, as shown in FIG. 3, comprising:
- the client 210 includes: an image acquisition module 2110, a posture information acquisition module 2120, and a user interaction module 2130;
- the server 220 includes: a neural radiance field training module 2210, a smooth path determination module 2220, and a neural radiance field rendering module 2230.
- the image acquisition module 2110 is configured to acquire a video to be processed; it can be understood that the video to be processed includes a plurality of frame images.
- the posture information acquisition module 2120 is configured to acquire the camera posture information corresponding to the video to be processed; accordingly, the corresponding camera posture information includes the camera posture information corresponding to each frame of image.
- the user interaction module 2130 is configured to receive a user operation instruction and determine the multiple key frames selected from the video to be processed.
- the neural radiance field training module 2210 is configured to train the initial neural radiance field according to the video to be processed and the camera posture information to obtain a trained neural radiance field.
- the smooth path determination module 2220 is configured to generate a smooth camera path according to the video to be processed and the camera posture information.
- the neural radiance field rendering module 2230 is configured to render the video to be processed using the trained neural radiance field according to the smoothed camera path to obtain a rendering result.
- the posture information acquisition module 2120 is configured to acquire the camera posture information corresponding to each frame image in the video to be processed through a SLAM algorithm.
- the posture information acquisition module 2120 is configured to obtain the initial camera posture information corresponding to each frame image in the video to be processed through the SLAM algorithm; optimize the initial camera posture information through the SFM algorithm to obtain the camera posture information corresponding to each frame image in the video to be processed.
- the client 210 further includes: a first data transceiver module, configured to send the video to be processed and camera posture information corresponding to the video to be processed to the server 220 .
- the first data transceiver module is further configured to send the multiple key frames selected by the user to the server 220 .
- the first data transceiver module is further configured to receive a rendering result from the server 220 .
- the user interaction module 2130 is configured to generate a new video after de-jittering according to the rendering result.
- the server 220 further includes: a second data transceiver module, configured to receive the video to be processed and the camera posture information corresponding to the video to be processed from the client 210.
- the second data transceiver module is further configured to receive the selected key frame from the client 210 .
- the second data transceiver module is further configured to send the rendering result to the client 210 .
- the smooth path determination module 2220 is configured to obtain a plurality of key frames in the video to be processed; and perform interpolation processing on the camera posture information corresponding to the plurality of key frames to obtain a smooth camera path corresponding to the video to be processed.
- the smooth path determination module 2220 is configured to correct the camera posture information corresponding to each frame image in the video to be processed according to a preset number of iterations using multiple frame images adjacent to the frame image; and generate a smooth camera path corresponding to the video to be processed based on the corrected camera posture information corresponding to each frame image in the video to be processed.
- the embodiment of the present application also provides a video jitter removal method, as shown in FIG. 4, comprising:
- Step 410: the image acquisition module acquires the video to be processed;
- Step 420: the posture information acquisition module acquires the camera posture information corresponding to the video to be processed through the SLAM algorithm;
- Step 430: the first data transceiver module sends the video to be processed and the corresponding camera posture information to the server;
- Step 440: the neural radiance field training module trains the initial neural radiance field according to the video to be processed and the camera posture information to obtain a trained neural radiance field;
- Step 450: the user interaction module obtains a plurality of key frames and sends them to the server through the first data transceiver module;
- Step 460: the smooth path determination module performs interpolation processing according to the multiple key frames to obtain a smooth camera path;
- Step 470: the neural radiance field rendering module renders the video scene of the video to be processed using the trained neural radiance field according to the smoothed camera path to generate a new video after jitter removal;
- Step 480: the second data transceiver module sends the new video to the client;
- Step 490: the user interaction module displays the new video.
- in some embodiments, steps 450-460 are replaced by steps 451-461, as shown in FIG. 5:
- Step 451: the smooth path determination module corrects the camera posture information corresponding to each frame image in the video to be processed by using multiple frames of images adjacent to the frame image according to a preset number of iterations;
- Step 461: the smooth path determination module generates a smooth camera path corresponding to the video to be processed according to the corrected camera posture information corresponding to each frame image.
- the system architecture proposed in the disclosed embodiment is that the client executes the basic steps of video acquisition and camera posture information acquisition, while the server executes the steps that require high computing resources, such as neural radiance field training, camera posture information correction, and rendering to generate the new video.
- a distributed computing solution combining the server and the client is adopted to improve the feasibility and practicality of the solution.
- the powerful computing power of the server is utilized to avoid the insufficient computing power that a video capture device may face when executing the solution entirely locally, which would affect the final jitter removal effect.
- An embodiment of the present application further provides a computer storage medium, in which a computer program is stored, wherein the computer program is configured to execute the video jitter removal method as described in any embodiment of the present application when running.
- a server remote support system architecture is adopted, and the server undertakes computing tasks that require high computing resources, such as neural radiance field training, camera posture correction, and rendering to generate new videos, which significantly alleviates the computing pressure on the client and ensures the practicality of the solution of the present application.
- Such software can be distributed on a computer-readable medium, which can include a computer storage medium (or non-transitory medium) and a communication medium (or temporary medium).
- a computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data).
- Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by a computer.
- communication media generally contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium.
Abstract
A video jitter removal method, an electronic device, a system, and a storage medium are disclosed. The method comprises: obtaining a video to be processed and camera posture information corresponding to the video to be processed (110); training an initial neural radiance field according to the video to be processed and the camera posture information to obtain a trained neural radiance field (120); generating a smooth camera path according to the video to be processed and the camera posture information (130); and rendering a video scene of the video to be processed using the trained neural radiance field according to the smooth camera path to generate a new de-jittered video (140). According to the solution provided by the present application, neural radiance field technology is used and rendering is performed along the generated smooth camera path, so that jitter is removed by processing the video to be processed.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311221184.X | 2023-09-20 | | |
| CN202311221184.XA CN117596482A (zh) | 2023-09-20 | 2023-09-20 | 一种视频抖动去除方法、电子设备、系统和存储介质 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2025060587A1 true WO2025060587A1 (fr) | 2025-03-27 |
| WO2025060587A9 WO2025060587A9 (fr) | 2025-06-05 |
Family
ID=89918993
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2024/103322 Pending WO2025060587A1 (fr) | 2023-09-20 | 2024-07-03 | Procédé d'élimination de gigue vidéo, dispositif électronique, système et support de stockage |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN117596482A (fr) |
| WO (1) | WO2025060587A1 (fr) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117596482A (zh) * | 2023-09-20 | 2024-02-23 | 虹软科技股份有限公司 | 一种视频抖动去除方法、电子设备、系统和存储介质 |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180115706A1 (en) * | 2016-10-22 | 2018-04-26 | Microsoft Technology Licensing, Llc | Controlling generation of hyperlapse from wide-angled, panoramic videos |
| CN113542600A (zh) * | 2021-07-09 | 2021-10-22 | Oppo广东移动通信有限公司 | 一种图像生成方法、装置、芯片、终端和存储介质 |
| WO2022141445A1 (fr) * | 2020-12-31 | 2022-07-07 | 华为技术有限公司 | Procédé et dispositif de traitement d'image |
| KR20230026246A (ko) * | 2021-08-12 | 2023-02-24 | 주식회사 딥엑스 | 인공지능 기반 영상 안정화 방법 및 이에 대한 카메라 모듈 |
| CN116095487A (zh) * | 2021-11-05 | 2023-05-09 | Oppo广东移动通信有限公司 | 图像防抖方法、装置、电子设备及计算机可读存储介质 |
| CN116309137A (zh) * | 2023-02-17 | 2023-06-23 | 北京航空航天大学 | 一种多视点图像去模糊方法、装置、系统和电子介质 |
| CN116596963A (zh) * | 2023-04-11 | 2023-08-15 | 华南理工大学 | 一种基于神经辐射场的定位建图方法、装置及存储介质 |
| CN117596482A (zh) * | 2023-09-20 | 2024-02-23 | 虹软科技股份有限公司 | 一种视频抖动去除方法、电子设备、系统和存储介质 |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9953400B2 (en) * | 2013-07-23 | 2018-04-24 | Microsoft Technology Licensing, Llc | Adaptive path smoothing for video stabilization |
| CN113766117B (zh) * | 2020-11-09 | 2023-08-08 | 北京沃东天骏信息技术有限公司 | 一种视频去抖动方法和装置 |
| CN113436113B (zh) * | 2021-07-22 | 2023-04-18 | 黑芝麻智能科技有限公司 | 防抖动的图像处理方法、装置、电子设备和存储介质 |
| CN114979785B (zh) * | 2022-04-15 | 2023-09-08 | 荣耀终端有限公司 | 视频处理方法、电子设备及存储介质 |
- 2023-09-20: CN application CN202311221184.XA filed (published as CN117596482A (zh), pending)
- 2024-07-03: PCT application PCT/CN2024/103322 filed (published as WO2025060587A1 (fr), pending)
Also Published As
| Publication number | Publication date |
|---|---|
| CN117596482A (zh) | 2024-02-23 |
| WO2025060587A9 (fr) | 2025-06-05 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24867006; Country of ref document: EP; Kind code of ref document: A1 |