WO2025060587A1

WO2025060587A1 - Video jitter removal method, electronic device, system, and storage medium

Info

Publication number: WO2025060587A1
Application number: PCT/CN2024/103322
Authority: WO
Inventors: 朱捷; 林奶养; 王斌; 吴优; 王磊; 王进
Original assignee: ArcSoft Corp Ltd
Current assignee: ArcSoft Corp Ltd
Priority date: 2023-09-20
Filing date: 2024-07-03
Publication date: 2025-03-27
Anticipated expiration: 2026-03-20
Also published as: CN117596482A; WO2025060587A9

Abstract

Provided are a video jitter removal method, an electronic device, a system, and a storage medium. The method comprises: obtaining a video to be processed and camera attitude information corresponding to the video to be processed (110); on the basis of the video to be processed and the camera attitude information, training an initial neural radiance field, and obtaining a trained neural radiance field (120); on the basis of the video to be processed and the camera attitude information, generating a smooth camera path (130); and on the basis of the smooth camera path, rendering a video scene of the video to be processed by using the trained neural radiance field, and generating a new video after jitter removal (140). According to the solution provided by the present application, neural radiance field technology is utilized and rendering is carried out on the basis of the generated smooth camera path, achieving the effect of jitter removal by means of processing of the video to be processed.

Description

Video jitter removal method, electronic device, system and storage medium

本申请要求于2023年09月20日提交的、申请号为202311221184X、发明名称为“一种视频抖动去除方法、电子设备、系统和存储介质”的中国专利申请的优先权，其内容应理解为通过引用的方式并入本申请中。This application claims priority to the Chinese patent application filed on September 20, 2023, with application number 202311221184X, and invention name “A video jitter removal method, electronic device, system and storage medium”, the content of which should be understood as incorporated into this application by reference.

Technical Field

本文涉及但不限于视频处理技术领域。This article relates to but is not limited to the field of video processing technology.

Background Art

在日常生活中，拍摄视频已经成为了人们的习惯性行为，人们以此来记录信息，分享美好生活。然而普通用户在使用手持设备进行视频拍摄时，往往会出现抖动的问题，特别是在移动过程中。因此，在视频处理领域，对抖动去除技术的需求也日益广泛，并且要求越来越高。In daily life, shooting videos has become a habitual behavior for people to record information and share their beautiful lives. However, when ordinary users use handheld devices to shoot videos, jitter problems often occur, especially when moving. Therefore, in the field of video processing, the demand for jitter removal technology is becoming increasingly widespread and the requirements are becoming higher and higher.

发明内容Summary of the invention

以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.

本申请实施例提供了一种视频抖动去除方法、电子设备、系统和存储介质，利用神经辐射场技术，基于生成的平滑相机路径进行渲染，实现针对待处理视频的处理，以达到去除抖动的效果。The embodiments of the present application provide a video jitter removal method, electronic device, system and storage medium, which utilize neural radiation field technology to perform rendering based on a generated smooth camera path to achieve processing of the video to be processed to achieve the effect of removing jitter.

本申请实施例提供一种视频抖动去除方法，包括：The present application provides a method for removing video jitter, including:

获取待处理视频及所述待处理视频对应的相机姿态信息；Obtaining a video to be processed and camera posture information corresponding to the video to be processed;

根据所述待处理视频和所述相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场；Training the initial neural radiation field according to the video to be processed and the camera posture information to obtain a trained neural radiation field;

根据所述待处理视频和所述相机姿态信息，生成平滑的相机路径；Generate a smooth camera path according to the video to be processed and the camera posture information;

根据所述平滑的相机路径，利用所述训练后的神经辐射场对所述待处理视频的视频场景进行渲染，生成去除抖动后的新视频。According to the smoothed camera path, the video scene of the video to be processed is rendered using the trained neural radiance field to generate a new video with the jitter removed.

本申请实施例还提供一种电子设备，包括，一个或多个处理器；存储装置，用于存储一个或多个程序，The present application also provides an electronic device, including one or more processors; a storage device for storing one or more programs;

当所述一个或多个程序被所述一个或多个处理器执行，使得所述一个或多个处理器实现如本申请任一实施例所述的视频抖动去除方法。When the one or more programs are executed by the one or more processors, the one or more processors implement the video jitter removal method as described in any embodiment of the present application.

本申请实施例还提供一种视频抖动去除系统，包括，客户端和服务器；The embodiment of the present application also provides a video jitter removal system, including a client and a server;

其中，所述客户端配置为，获取待处理视频及所述待处理视频对应的相机姿态信息，并将所述待处理视频和所述相机姿态信息发送至所述服务器；The client is configured to obtain a video to be processed and camera posture information corresponding to the video to be processed, and send the video to be processed and the camera posture information to the server;

所述服务器配置为，根据所述待处理视频和所述相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场；The server is configured to train the initial neural radiation field according to the video to be processed and the camera posture information to obtain a trained neural radiation field;

所述服务器还配置为，根据所述待处理视频和所述相机姿态信息，生成平滑的相机路径； The server is further configured to generate a smooth camera path according to the video to be processed and the camera posture information;

所述服务器还配置为，根据所述平滑的相机路径，利用所述训练后的神经辐射场对所述待处理视频的视频场景进行渲染，生成去除抖动后的新视频。The server is also configured to render the video scene of the video to be processed using the trained neural radiation field according to the smooth camera path to generate a new video with the jitter removed.

本申请实施例还提供一种计算机存储介质，所述存储介质中存储有计算机程序，其中，所述计算机程序被设置为运行时执行本申请任一实施例所述的视频抖动去除方法。An embodiment of the present application further provides a computer storage medium, in which a computer program is stored, wherein the computer program is configured to execute the video jitter removal method described in any embodiment of the present application when running.

本申请实施例提出的视频抖动去除系统架构，采用了服务器远程支撑的体系架构，由服务器承担神经辐射场训练、相机姿态校正以及渲染生成新视频的等对计算资源要求较高的计算任务，显著缓解了客户端的计算压力，确保了视频抖动去除方案的实用性。The video jitter removal system architecture proposed in the embodiment of the present application adopts a system architecture supported remotely by a server. The server is responsible for computing tasks that require high computing resources, such as neural radiation field training, camera posture correction, and rendering to generate new videos. This significantly alleviates the computing pressure on the client and ensures the practicality of the video jitter removal solution.

本申请的其它特征和优点将在随后的说明书中阐述，并且，部分地从说明书中变得显而易见，或者通过实施本申请而了解。本申请的其他优点可通过在说明书以及附图中所描述的方案来实现和获得。Other features and advantages of the present application will be described in the following description, and partly become apparent from the description, or be understood by implementing the present application. Other advantages of the present application can be realized and obtained by the schemes described in the description and the drawings.

在阅读并理解了附图和详细描述后，可以明白其他方面。Other aspects will be apparent upon reading and understanding the drawings and detailed description.

附图概述BRIEF DESCRIPTION OF THE DRAWINGS

附图用来提供对本申请技术方案的理解，并且构成说明书的一部分，与本申请的实施例一起用于解释本申请的技术方案，并不构成对本申请技术方案的限制。The accompanying drawings are used to provide an understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they are used to explain the technical solution of the present application and do not constitute a limitation on the technical solution of the present application.

图1为本申请实施例提供的一种视频抖动去除方法的流程图；FIG1 is a flow chart of a method for removing video jitter provided by an embodiment of the present application;

图2为本申请实施例提供的一种视频抖动去除系统的结构示意图；FIG2 is a schematic diagram of the structure of a video jitter removal system provided in an embodiment of the present application;

图3为本申请实施例提供的另一种视频抖动去除系统的结构示意图；FIG3 is a schematic diagram of the structure of another video jitter removal system provided in an embodiment of the present application;

图4为本申请实施例提供的另一种视频抖动去除方法的流程图；FIG4 is a flow chart of another video jitter removal method provided in an embodiment of the present application;

图5为本申请实施例提供的另一种视频抖动去除方法的流程图。FIG. 5 is a flow chart of another video jitter removal method provided in an embodiment of the present application.

详述Details

为使本申请的目的、技术方案和优点更加清楚明白，下文中将结合附图对本申请的实施例进行详细说明。需要说明的是，在不冲突的情况下，本申请中的实施例及实施例中的特征可以相互任意组合。In order to make the purpose, technical solution and advantages of the present application more clear, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features in the embodiments of the present application can be combined with each other arbitrarily without conflict.

为了便于理解本申请，下面将参照相关附图对本申请进行更全面的描述。附图中给出了本申请的实施例。但是，本申请可以以许多不同的形式来实现，并不限于本文所描述的实施例。相反地，提供这些实施例的目的是使本申请的公开内容更加透彻全面。In order to facilitate understanding of the present application, the present application will be described more fully below with reference to the relevant drawings. Embodiments of the present application are provided in the drawings. However, the present application can be implemented in many different forms and is not limited to the embodiments described herein. On the contrary, the purpose of providing these embodiments is to make the disclosure of the present application more thorough and comprehensive.

除非另有定义，本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中在本申请的说明书中所使用的术语只是为了描述具体的实施例的目的，不是旨在于限制本申请。Unless otherwise defined, all technical and scientific terms used herein have the same meaning as those commonly understood by those skilled in the art to which this application belongs. The terms used herein in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit this application.

可以理解，本申请所使用的术语“第一”、“第二”仅用于描述目的，而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此，限定有“第一”、“第二”的特征可以明示或隐含地包括至少一个该特征。在本申请的描述中，“多个”的含义是至少两个，例如两个、三个等，除非另有明确具体的限定。It is understood that the terms "first" and "second" used in this application are only used for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, the features defined as "first" and "second" may explicitly or implicitly include at least one of the features. In the description of this application, the meaning of "plurality" is at least two, such as two, three, etc., unless otherwise clearly and specifically defined.

在此使用时，单数形式的“一”、“一个”和“所述/该”也可以包括复数形式，除非上下文清楚指出另外的方式。还应当理解的是，术语“包括/包含”或“具有”等指定所陈述的特征、整体、步骤、操作、组件、部分或它们的组合的存在，但是不排除存在或添加一个或更多个其他特征、整体、步骤、操作、组件、部分或它们的组合的可能性。同时，在本说明书中使用的术语“和/或”包括相关所列项目的任何及所有组合。When used herein, the singular forms "a", "an", and "said/the" may also include plural forms, unless the context clearly indicates otherwise. It should also be understood that the terms "include/comprise" or "have" and the like specify the presence of stated features, wholes, steps, operations, components, parts, or combinations thereof, but do not exclude the possibility of the presence or addition of one or more other features, wholes, steps, operations, components, parts, or combinations thereof. At the same time, in this specification The term "and/or" used in the present invention includes any and all combinations of the associated listed items.

本申请实施例提供一种可实现方案，采用拍摄设备传回的IMU(Inertial Measurement Unit，惯性测量单元)数据，计算出的平移和旋转量，再根据平移和旋转量对视频逐帧进行扭曲，以进行视频去抖动。这一类方法得到的结果视频会损失一定的画幅。The embodiment of the present application provides an implementable solution, which uses the IMU (Inertial Measurement Unit) data sent back by the shooting device to calculate the translation and rotation, and then distorts the video frame by frame according to the translation and rotation to remove the video jitter. The resulting video obtained by this type of method will lose a certain amount of frame.

本申请实施例提供一种视频抖动去除方法，如图1所示，包括：The embodiment of the present application provides a video jitter removal method, as shown in FIG1 , comprising:

步骤110，获取待处理视频及所述待处理视频对应的相机姿态信息；Step 110, obtaining a video to be processed and camera posture information corresponding to the video to be processed;

步骤120，根据所述待处理视频和所述相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场；Step 120, training the initial neural radiation field according to the video to be processed and the camera posture information to obtain a trained neural radiation field;

步骤130，根据所述待处理视频和所述相机姿态信息，生成平滑的相机路径；Step 130, generating a smooth camera path according to the video to be processed and the camera posture information;

步骤140，根据所述平滑的相机路径，利用所述训练后的神经辐射场对所述待处理视频的视频场景进行渲染，生成去除抖动后的新视频。Step 140, according to the smoothed camera path, the video scene of the video to be processed is rendered using the trained neural radiation field to generate a new video with the jitter removed.

其中，神经辐射场(Neural Radiance Field，NERF)是一种新兴的新视点合成技术，通过多层感知机对输入视频进行隐式建模，从而可以实现从新视点渲染出具有真实感的画面。该方案利用全连接网络来表示三维场景，输入三维空间位置和视角信息，输出该空间位置处的体积密度和视角相关的颜色信息，进一步结合体渲染技术，将可输出的颜色信息和体积密度渲染到2D图像上，从而实现新视图合成，得到了新视点图像。Among them, Neural Radiance Field (NERF) is an emerging new viewpoint synthesis technology. It implicitly models the input video through a multi-layer perceptron, so that realistic images can be rendered from a new viewpoint. This solution uses a fully connected network to represent a three-dimensional scene, inputs three-dimensional spatial position and viewing angle information, outputs the volume density at the spatial position and the color information related to the viewing angle, and further combines the volume rendering technology to render the output color information and volume density onto the 2D image, thereby realizing new view synthesis and obtaining a new viewpoint image.

需要说明的是，相机路径是指在三维场景中，用于生成给定视频序列的每一帧的相机位置和运动轨迹。在计算机图形学中，视频可以被看作是由连续的图像帧组成的序列。为了生成这些图像帧，需要确定相机在场景中的位置、方向和移动方式。视频对应的相机路径描述了相机在时间上的变化过程，即相机如何从一个位置和姿态移动到下一个位置和姿态，并捕捉到每个图像帧。视频对应的相机路径可以通过多种方法获得，包括手动设置相机参数、使用运动捕捉系统记录真实相机运动、通过数学模型和插值计算来创建虚拟相机路径等。通过定义和控制视频对应的相机路径，可以实现视角的变换、物体的追踪、摄影效果的模拟等，并最终生成具有连贯运动和视觉感的视频序列。It should be noted that the camera path refers to the camera position and motion trajectory used to generate each frame of a given video sequence in a three-dimensional scene. In computer graphics, a video can be viewed as a sequence of continuous image frames. In order to generate these image frames, it is necessary to determine the position, direction, and movement of the camera in the scene. The camera path corresponding to the video describes the change process of the camera in time, that is, how the camera moves from one position and posture to the next position and posture and captures each image frame. The camera path corresponding to the video can be obtained by a variety of methods, including manually setting camera parameters, using a motion capture system to record real camera motion, and creating a virtual camera path through mathematical models and interpolation calculations. By defining and controlling the camera path corresponding to the video, it is possible to achieve perspective transformation, object tracking, simulation of photographic effects, etc., and ultimately generate a video sequence with coherent motion and visual sense.

一些示例性实施例中，相机路径可以通过描述相机在三维空间中的位置的位移和方向的旋转来表示。通常使用欧拉角(如俯仰角、偏航角和滚转角)或四元数(quaternions)来表示旋转，并使用三维向量表示平移或位移。In some exemplary embodiments, the camera path can be represented by a rotation that describes the position and orientation of the camera in three-dimensional space. Rotations are usually represented using Euler angles (such as pitch, yaw, and roll) or quaternions, and translations or displacements are represented using three-dimensional vectors.

一些示例性实施例中，相机路径可以通过一系列离散的路径点来描述，每个路径点包含相机的位置和姿态信息。这些路径点可以手动设置或由其他方法生成，例如运动捕捉系统记录的真实相机运动数据或基于插值算法计算的虚拟路径点。In some exemplary embodiments, the camera path can be described by a series of discrete path points, each of which contains the position and posture information of the camera. These path points can be set manually or generated by other methods, such as real camera motion data recorded by a motion capture system or virtual path points calculated based on an interpolation algorithm.

一些示例性实施例中，相机路径可以通过曲线参数化的方式进行描述。常见的曲线类型包括贝塞尔曲线(Bezier Curve)、样条曲线(Spline Curve)等。曲线参数化描述了相机在时间上的变化，并根据曲线形状和控制点确定相机在不同时间点的位置和方向。In some exemplary embodiments, the camera path can be described by curve parameterization. Common curve types include Bezier Curve, Spline Curve, etc. Curve parameterization describes the change of the camera over time, and determines the position and direction of the camera at different time points according to the curve shape and control points.

可以理解，本申请实施例方案中，基于平滑的相机路径所渲染生成的新视频，相比于从拍摄设备中获取到的初始视频，是画面更连贯、变化更平滑的视频，因此，也被称为去除了抖动的视频。It can be understood that in the embodiment of the present application, the new video generated by rendering based on the smooth camera path is a video with more coherent pictures and smoother changes compared to the initial video obtained from the shooting device. Therefore, it is also called a video with removed jitter.

一些示例性实施例中，所述获取待处理视频及所述待处理视频对应的相机姿态信息，包括：In some exemplary embodiments, the step of obtaining a video to be processed and camera posture information corresponding to the video to be processed includes:

通过同步定位与地图构建(SLAM，Simultaneous Localization and Mapping)算法，获取所述待处理视频中的每一帧图像对应的相机姿态信息。 The camera posture information corresponding to each frame image in the video to be processed is obtained through a simultaneous localization and mapping (SLAM) algorithm.

其中，所述帧图像，也称为图像帧，简称帧。The frame image is also called an image frame, or frame for short.

通过SLAM算法，获取所述待处理视频中的每一帧图像对应的初始相机姿态信息；Obtaining initial camera posture information corresponding to each frame of the video to be processed through the SLAM algorithm;

通过运动恢复结构(SFM，Structure from Motion)算法对所述初始相机姿态信息进行优化，得到所述待处理视频中的每一帧图像对应的相机姿态信息。The initial camera posture information is optimized by using the Structure from Motion (SFM) algorithm to obtain the camera posture information corresponding to each frame image in the video to be processed.

可以理解，经过SFM算法优化后得到在相机姿态信息，相比于初始相机姿态信息精度更高。It can be understood that the camera posture information obtained after SFM algorithm optimization is more accurate than the initial camera posture information.

一些示例性实施例中，所述根据所述待处理视频和所述相机姿态信息，生成平滑的相机路径，包括：In some exemplary embodiments, generating a smooth camera path according to the video to be processed and the camera posture information includes:

获取所述待处理视频中的多个关键帧；Acquire multiple key frames in the video to be processed;

对所述多个关键帧所对应的相机姿态信息进行插值处理，得到与所述待处理视频对应的平滑的相机路径。Interpolation processing is performed on the camera posture information corresponding to the multiple key frames to obtain a smooth camera path corresponding to the video to be processed.

一些示例性实施例中，所述获取所述待处理视频中的多个关键帧，包括：In some exemplary embodiments, the step of obtaining a plurality of key frames in the video to be processed includes:

根据预设的帧间隔，从所述待处理视频包括的全部帧图像中，确定所述多个关键帧。According to a preset frame interval, the plurality of key frames are determined from all frame images included in the video to be processed.

例如，预设的帧间隔为5，则从所述待处理视频中每隔5帧确定一个关键帧，根据系统处理性能需要灵活设定，不限于特定的方面。或者，根据拍摄者的运动信息动态确定。例如，剧烈运动的情况下，帧间隔较小，平缓行动的情况下，帧间隔较大，更多示例不在此一一例举。For example, if the preset frame interval is 5, a key frame is determined every 5 frames from the video to be processed, which is flexibly set according to the system processing performance needs and is not limited to a specific aspect. Alternatively, it is dynamically determined according to the motion information of the shooter. For example, in the case of intense exercise, the frame interval is small, and in the case of gentle action, the frame interval is large. More examples are not listed here one by one.

一些示例性实施例中，所述获取所述待处理视频中的多个关键帧，包括：在所述待处理视频中，选取满足第一条件的帧图像作为关键帧，其中，所述第一条件为所述帧图像在原始相机路径上的属性信息相较于所述帧图像的上一帧图像在原始相机路径上的属性信息的变化幅度大于第一阈值；In some exemplary embodiments, the step of obtaining a plurality of key frames in the video to be processed includes: selecting a frame image satisfying a first condition as a key frame in the video to be processed, wherein the first condition is that a change in attribute information of the frame image on the original camera path compared to attribute information of a previous frame image of the frame image on the original camera path is greater than a first threshold;

其中，所述原始相机路径为根据所述待处理视频对应的相机姿态信息所确定的相机路径。The original camera path is a camera path determined according to the camera posture information corresponding to the video to be processed.

一些示例性实施例中，所述获取所述待处理视频中的多个关键帧，包括：获取用户从所述待处理视频中选定的多个关键帧。In some exemplary embodiments, the acquiring a plurality of key frames in the video to be processed includes: acquiring a plurality of key frames selected by a user from the video to be processed.

一些示例性实施例中，通过用户交互模块，接收用户的交互指令，确定所选定的多个关键帧。In some exemplary embodiments, a user interaction instruction is received through a user interaction module to determine the selected multiple key frames.

一些示例性实施例中，帧图像在相机路径上的属性信息包括以下一种或多种：In some exemplary embodiments, the attribute information of the frame image on the camera path includes one or more of the following:

位置信息、朝向信息。Location information, orientation information.

一些示例性实施例中，所述朝向信息简称为朝向，也称为旋转信息，或方向信息。In some exemplary embodiments, the orientation information is simply referred to as orientation, also called rotation information, or direction information.

一些示例性实施例中，所述属性信息包括位置信息和朝向信息时，所述属性信息也称为相机姿态信息，简称姿态信息。In some exemplary embodiments, when the attribute information includes position information and orientation information, the attribute information is also referred to as camera attitude information, or attitude information for short.

一些示例性实施例中，所述位置信息为坐标；一些示例性实施例中，所述坐标为三维坐标。In some exemplary embodiments, the position information is coordinates; in some exemplary embodiments, the coordinates are three-dimensional coordinates.

其中，根据属性信息的不同，对应设定相应的第一阈值。例如，属性信息包括位置信息的情况下，第一阈值为距离阈值，在所述帧图像在原始相机路径上的位置信息相较于所述帧图像的上一帧图像在原始相机路径上的位置信息的变化幅度大于所述距离阈值的情况下，所述帧图像被确定为关键帧；也就是说，所述帧图像在原始相机路径上的位置与所述帧图像的上一帧图像在原始相机路径上的位置之间的距离大于所述距离阈值。According to different attribute information, a corresponding first threshold is set. For example, when the attribute information includes position information, the first threshold is a distance threshold, and when the change range of the position information of the frame image on the original camera path compared with the position information of the previous frame image on the original camera path is greater than the distance threshold, In this case, the frame image is determined as a key frame; that is, the distance between the position of the frame image on the original camera path and the position of the previous frame image of the frame image on the original camera path is greater than the distance threshold.

又例如，属性信息包括朝向信息的情况下，第一阈值为方向角度差阈值，在所述帧图像在原始相机路径上的朝向信息相较于所述帧图像的上一帧图像在原始相机路径上的朝向信息的变化幅度大于所述方向角度阈值的情况下，所述帧图像被确定为关键帧；也就是说，所述帧图像在原始相机路径上的朝向与所述帧图像的上一帧图像在原始相机路径上的朝向之间的角度差大于所述距离方向角度差阈值。For another example, when the attribute information includes orientation information, the first threshold is the direction angle difference threshold. When the change amplitude of the orientation information of the frame image on the original camera path compared to the orientation information of the previous frame image on the original camera path is greater than the direction angle threshold, the frame image is determined as a key frame; that is, the angle difference between the orientation of the frame image on the original camera path and the orientation of the previous frame image on the original camera path is greater than the distance direction angle difference threshold.

又例如，属性信息包括位置信息和朝向信息的情况下，第一阈值也相应包括：距离阈值和方向角度差阈值。在所述帧图像在原始相机路径上的朝向信息相较于所述帧图像的上一帧图像在原始相机路径上的朝向信息的变化幅度大于所述方向角度阈值，或者，所述帧图像在原始相机路径上的位置信息相较于所述帧图像的上一帧图像在原始相机路径上的位置信息的变化幅度大于所述距离阈值的情况下，所述帧图像被确定为关键帧。For another example, when the attribute information includes position information and orientation information, the first threshold also correspondingly includes: a distance threshold and a direction angle difference threshold. When the change amplitude of the orientation information of the frame image on the original camera path compared to the orientation information of the previous frame image on the original camera path of the frame image is greater than the direction angle threshold, or when the change amplitude of the position information of the frame image on the original camera path compared to the position information of the previous frame image on the original camera path of the frame image is greater than the distance threshold, the frame image is determined to be a key frame.

或者，在所述帧图像在原始相机路径上的朝向信息相较于所述帧图像的上一帧图像在原始相机路径上的朝向信息的变化幅度大于所述方向角度阈值，并且，所述帧图像在原始相机路径上的位置信息相较于所述帧图像的上一帧图像在原始相机路径上的位置信息的变化幅度大于所述距离阈值的情况下，所述帧图像被确定为关键帧。更多示例在此不一一举例。Alternatively, when the change amplitude of the orientation information of the frame image on the original camera path compared to the orientation information of the previous frame image on the original camera path is greater than the direction angle threshold, and the change amplitude of the position information of the frame image on the original camera path compared to the position information of the previous frame image on the original camera path is greater than the distance threshold, the frame image is determined to be a key frame. More examples are not listed here one by one.

可以理解，由于拍摄时抖动的存在，待处理视频对应的原始相机路径可能并不平滑，出现空间位置、朝向或姿态的大幅度跳变。一些示例性实施中，选择这些属性信息变化幅度大的路径点上的帧图像构成所述多个关键帧，再基于这些选定的关键帧进行插值处理，得到与所述待处理视频对应的平滑的相机路径。It is understandable that due to the presence of jitter during shooting, the original camera path corresponding to the video to be processed may not be smooth, and there may be large jumps in spatial position, orientation or posture. In some exemplary implementations, frame images at path points with large changes in attribute information are selected to form the multiple key frames, and then interpolation processing is performed based on these selected key frames to obtain a smooth camera path corresponding to the video to be processed.

一些示例性实施例中，根据多个关键帧进行插值处理后得到的平滑的相机路径对应的时长，与所述待处理视频的时长相同。In some exemplary embodiments, the duration corresponding to the smoothed camera path obtained after interpolation processing based on multiple key frames is the same as the duration of the video to be processed.

一些示例性实施例中，根据多个关键帧进行插值处理后得到的平滑的相机路径对应的时长，与所述待处理视频的时长相同。可以理解，基于相同时长的平滑相机所渲染得到的新视频与步骤110中所获取的待处理视频具有相同时长。In some exemplary embodiments, the duration corresponding to the smoothed camera path obtained after interpolation processing based on multiple key frames is the same as the duration of the video to be processed. It can be understood that the new video rendered based on the smoothed camera with the same duration has the same duration as the video to be processed obtained in step 110.

一些示例性实施例中，对位置信息的插值采用线性插值法，根据多个关键帧对应的位置信息进行插值处理，得到插入点的位置信息。In some exemplary embodiments, the position information is interpolated using a linear interpolation method, and interpolation processing is performed based on the position information corresponding to a plurality of key frames to obtain the position information of the insertion point.

一些示例性实施例中，对朝向信息的插值采用球面线性插值法，根据多个关键帧对应的朝向信息进行插值处理，得到插入点的朝向信息。In some exemplary embodiments, the orientation information is interpolated using a spherical linear interpolation method, and interpolation processing is performed based on the orientation information corresponding to a plurality of key frames to obtain the orientation information of the insertion point.

一些示例性实施例中，根据多个关键帧进行插值处理后得到的平滑的相机路径对应的时长小于所述待处理视频的时长。可以理解，基于更小时长的平滑相机所渲染得到的新视频的时长小于步骤110中所获取的待处理视频的时长。In some exemplary embodiments, the duration of the smoothed camera path obtained after interpolation processing based on multiple key frames is shorter than the duration of the video to be processed. It can be understood that the duration of the new video rendered based on the smoothed camera with a shorter duration is shorter than the duration of the video to be processed obtained in step 110.

对所述待处理视频中的每一帧图像，按照预设的迭代次数，利用与该帧图像相邻的多帧图像，对该帧图像对应的相机姿态信息进行校正；For each frame image in the video to be processed, according to a preset number of iterations, using multiple frames of images adjacent to the frame image, correct the camera posture information corresponding to the frame image;

根据所述待处理视频中的每一帧图像对应的校正后的相机姿态信息，生成与所述待处理视频对应的平滑的相机路径。A smooth camera path corresponding to the video to be processed is generated according to the corrected camera posture information corresponding to each frame image in the video to be processed.

一些示例性实施例中，所述相机姿态信息包括：坐标和朝向。In some exemplary embodiments, the camera posture information includes: coordinates and orientation.

一些示例性实施例中，所述坐标包括三维坐标。 In some exemplary embodiments, the coordinates include three-dimensional coordinates.

一些示例性实施例中，所述朝向包括四元数。In some exemplary embodiments, the orientation comprises a quaternion.

一些示例性实施例中，对姿态信息进行校正，包括：校正坐标和校正朝向。In some exemplary embodiments, correcting the posture information includes: correcting coordinates and correcting orientation.

一些示例性实施例中，所述对所述待处理视频中的每一帧图像，按照预设的迭代次数，利用与该帧图像相邻的多帧图像，对该帧图像对应的相机姿态信息进行校正，包括：In some exemplary embodiments, for each frame image in the video to be processed, correcting the camera posture information corresponding to the frame image by using multiple frames of images adjacent to the frame image according to a preset number of iterations includes:

针对所述待处理视频中的每一帧图像，以该帧图像对应的相机姿态信息作为初始的待校正相机姿态信息，按照预设的迭代次数，在每一轮迭代中，分别执行以下步骤：For each frame of the video to be processed, the camera posture information corresponding to the frame is used as the initial camera posture information to be corrected. According to the preset number of iterations, in each round of iteration, the following steps are performed respectively:

获取所述相邻的多帧图像的图像参数，根据所述相邻的多帧图像的图像参数对该帧图像的待校正相机姿态信息进行校正，其中，所述图像参数包括坐标、朝向及权重；Acquire image parameters of the adjacent multi-frame images, and correct the to-be-corrected camera posture information of the frame image according to the image parameters of the adjacent multi-frame images, wherein the image parameters include coordinates, orientations, and weights;

以校正后的相机姿态信息作为下一轮迭代的待相机校正姿态信息。The corrected camera posture information is used as the camera posture information to be corrected in the next round of iteration.

即对所述待处理视频中的每一帧图像的相机姿态信息分别执行多次迭代校正，总的重复次数为预设的迭代次数N，以待处理视频中的帧图像对应的相机姿态信息为初始的待校正相机姿态信息，在首次迭代中被校正，以校正后的相机姿态信息作为下一次迭代的待校正相机姿态信息，直到完成N次校正，得到最终校正后的相机姿态信息。That is, multiple iterative corrections are performed on the camera posture information of each frame image in the video to be processed, and the total number of repetitions is the preset number of iterations N. The camera posture information corresponding to the frame image in the video to be processed is used as the initial camera posture information to be corrected, which is corrected in the first iteration, and the corrected camera posture information is used as the camera posture information to be corrected in the next iteration, until N corrections are completed to obtain the final corrected camera posture information.

一些示例性实施例中，预设的迭代次数N＝3，即对每一个初始的待校正的相机姿态信息进行3次迭代，得到最终的校正结果。迭代次数的具体数值根据需要灵活设定，不限于本申请实施例示例的方面。In some exemplary embodiments, the preset number of iterations N=3, that is, each initial camera posture information to be corrected is iterated 3 times to obtain the final correction result. The specific value of the number of iterations can be flexibly set as needed and is not limited to the aspects of the embodiments of the present application.

按照预设的迭代次数，重复以下方法进行校正：Repeat the following method for calibration according to the preset number of iterations:

根据确定校正后的坐标信息；according to Determine the corrected coordinate information;

根据确定校正后的朝向信息；according to Determine the corrected orientation information;

其中，p为待校正的三维坐标，d为待校正的朝向四元数；n为与该待校正帧图像相邻的多帧图像的数量；p_i为第i帧图像的三维坐标，{p_i|i＝1…n}，d_i为第i帧图像的朝向四元数，{d_i|i＝1…n}；w_i为第i帧图像的权重，{w_i|i＝1…n}；λ为设定的矫正程度系数，λ∈[0,1]；是Frobenius范数，A(d)是四元数d的正交姿态矩阵；p^*为校正后的三维坐标，d^*为校正后的朝向四元数。Wherein, p is the three-dimensional coordinate to be corrected, d is the orientation quaternion to be corrected; n is the number of multiple frames adjacent to the frame image to be corrected; p _i is the three-dimensional coordinate of the i-th frame image, {p _i |i＝1…n}, d _i is the orientation quaternion of the i-th frame image, {d _i |i＝1…n}; _wi is the weight of the i-th frame image, {w _i |i＝1…n}; λ is the set correction degree coefficient, λ∈[0,1]; is the Frobenius norm, A(d) is the orthogonal attitude matrix of the quaternion d; p ^* is the corrected three-dimensional coordinate, and d ^* is the corrected orientation quaternion.

其中，首次迭代中，所述待校正的三维坐标为该待校正帧图像对应的三维坐标，所述待校正的朝向四元数为该待校正帧图像对应的四元数；Wherein, in the first iteration, the three-dimensional coordinates to be corrected are the three-dimensional coordinates corresponding to the frame image to be corrected, and the orientation quaternion to be corrected is the quaternion corresponding to the frame image to be corrected;

非首次迭代中，所述待校正的三维坐标为上一次迭代所得到的校正后的三维坐标，所述待校正的朝向四元数为上一次迭代所得到的校正后的四元数。In a non-first iteration, the three-dimensional coordinates to be corrected are the corrected three-dimensional coordinates obtained in the previous iteration, and the orientation quaternion to be corrected is the corrected quaternion obtained in the previous iteration.

一些示例性实施例中，所述权重w_i根据以下方法确定：
In some exemplary embodiments, the weight _wi is determined according to the following method:

需要说明的是，上述校正方法中该帧图像相邻的多帧图像为与该帧图像相邻的n帧图像，n为大于或等于1的整数，根据需要灵活设定，不限于特定的方面。It should be noted that the multiple frames of images adjacent to the frame image in the above correction method are n frames of images adjacent to the frame image, where n is an integer greater than or equal to 1 and is flexibly set according to needs and is not limited to a specific aspect.

一些示例性实施例中，所述根据待处理视频和所述相机姿态信息，对初始神经辐射场进行训练，包括：In some exemplary embodiments, the training of the initial neural radiation field according to the video to be processed and the camera posture information includes:

根据所述待处理视频和所述相机姿态信息获取多个训练样本，其中，每个训练样本由所述待处理视频中的帧图像的像素点所发射的光线及对应的颜色构成；A plurality of training samples are obtained according to the video to be processed and the camera posture information, wherein each training sample is composed of The light emitted by the pixel points of the frame image in the video to be processed and the corresponding color composition;

根据所述多个训练样本，对所述初始神经辐射场进行训练。The initial neural radiation field is trained according to the multiple training samples.

一些示例性实施例中，所述待处理视频中的帧图像的像素点所发射的光线根据所述像素点所在帧图像对应的相机姿态信息及所述像素点在归属帧图像中的位置确定。即，根据所述像素点所在的帧图像对应的相机姿态信息及所述像素点在图像中的位置，确定所述像素点所发射的光线。In some exemplary embodiments, the light emitted by a pixel point of a frame image in the video to be processed is determined based on the camera posture information corresponding to the frame image where the pixel point is located and the position of the pixel point in the frame image to which the pixel point belongs. That is, the light emitted by the pixel point is determined based on the camera posture information corresponding to the frame image where the pixel point is located and the position of the pixel point in the image.

一些示例性实施例中，所述根据所述待处理视频和所述相机姿态信息获取多个训练样本，包括：In some exemplary embodiments, the step of obtaining a plurality of training samples according to the video to be processed and the camera posture information includes:

根据待处理视频和所述相机姿态信息，确定多个数据组，每一数据组包括一个帧图像和对应的相机姿态信息；Determine a plurality of data groups according to the video to be processed and the camera posture information, each data group including a frame image and corresponding camera posture information;

针对每一个数据组包括的帧图像和相机姿态信息，执行以下步骤得到多个训练样本：For each data set including frame images and camera posture information, perform the following steps to obtain multiple training samples:

将该帧图像与相机姿态信息解析成每一个像素点发射的光线；Analyze the frame image and camera posture information into the light emitted by each pixel;

将每一个像素点发射的光线和该像素点的颜色，构成一个样本。The light emitted by each pixel and the color of the pixel form a sample.

一些示例性实施例中，一个帧图像的每一个像素点p的颜色为c，结合该帧图像对应的相机姿态信息和像素点位置，可以得到该像素点所发射的光线记为l(p,d)，其中p＝(x，y，z)为三维笛卡尔空间坐标系中该像素点的位置坐标，d＝(θ,φ)为球坐标系中该光线方向的立体角参数，其中，θ表示极角或纬度，它是从参考轴(通常是正z轴)到点的矢量与参考轴之间的夹角，θ的取值范围通常是0到π。φ表示方位角或经度，它是从参考平面上的某个基准方向(通常是正x轴)到点的投影与基准方向之间的夹角，φ的取值范围通常是0到2π。In some exemplary embodiments, the color of each pixel point p of a frame image is c. Combining the camera posture information and pixel position corresponding to the frame image, the light emitted by the pixel point can be obtained and recorded as l(p, d), where p = (x, y, z) is the position coordinate of the pixel point in the three-dimensional Cartesian space coordinate system, and d = (θ, φ) is the solid angle parameter of the light direction in the spherical coordinate system, where θ represents the polar angle or latitude, which is the angle between the vector from the reference axis (usually the positive z axis) to the point and the reference axis, and the value range of θ is usually 0 to π. φ represents the azimuth or longitude, which is the angle between the projection from a certain reference direction on the reference plane (usually the positive x axis) to the point and the reference direction, and the value range of φ is usually 0 to 2π.

可以理解，待处理视频中的一个帧图像包括多个像素点，对应得到多个由像素点所发射光线和颜色构成的多个训练样本，即一个像素点对应一个训练样本，所述待处理视频中的多个图像帧得到更多的训练样本。It can be understood that a frame image in the video to be processed includes multiple pixels, corresponding to multiple training samples composed of light and color emitted by the pixels, that is, one pixel corresponds to one training sample, and multiple image frames in the video to be processed obtain more training samples.

一些示例性实施例中，根据所述多个训练样本，对所述初始神经辐射场进行训练，包括：In some exemplary embodiments, training the initial neural radiation field according to the plurality of training samples comprises:

将全部训练样本，随机打乱后，对初始神经辐射场进行训练。All training samples are randomly shuffled and then the initial neural radiation field is trained.

其中，每一个输入样本(l，c)，l为光线，c为颜色，在划分过后的空间中对坐标p进行空间变形并通过哈希编码查询到其位置，使用该结点的多层感知机对其进行编码。编码所得的特征f和d一起使用一个全局的多层感知机进行编码，输出颜色c_pred和差异度disp。For each input sample (l, c), l is light and c is color. In the divided space, the coordinate p is spatially deformed and its position is queried through hash coding. The multi-layer perceptron of the node is used to encode it. The encoded features f and d are encoded together using a global multi-layer perceptron to output the color c _pred and the difference disp.

其中，损失函数 Among them, the loss function

一些示例性实施例中，神经辐射场中的空间划分可以是基于均匀网格的空间划分，或者是基于八叉树的空间划分。In some exemplary embodiments, the spatial division in the neural radiation field may be a uniform grid-based spatial division or an octree-based spatial division.

一些示例性实施例中，神经辐射场中的空间变形可以是基于规范化设备坐标的空间变形，或者是基于透视投影坐标系的空间变形。In some exemplary embodiments, the spatial deformation in the neural radiation field may be a spatial deformation based on normalized device coordinates, or a spatial deformation based on a perspective projection coordinate system.

可以理解，本申请实施例提供的视频去除抖动方案，从待处理视频中获取训练样本对神经辐射场进行训练，利用了基于神经辐射场的新视点合成技术，再根据待处理视频进行相机路径平滑处理后得到的平滑相机路径，渲染生成与待处理视频相对应的已去除抖动的新视频，能够有效去除视频抖动，避免了一些去抖动方案所造成的画幅损失。 It can be understood that the video de-jittering solution provided in the embodiment of the present application obtains training samples from the video to be processed to train the neural radiation field, utilizes the new viewpoint synthesis technology based on the neural radiation field, and then renders and generates a new video with de-jittering corresponding to the video to be processed based on the smoothed camera path obtained after smoothing the camera path of the video to be processed. This can effectively remove video jitter and avoid the frame loss caused by some de-jittering solutions.

本申请实施例还提供一种电子设备，包括，The present application also provides an electronic device, including:

一个或多个处理器；one or more processors;

存储装置，用于存储一个或多个程序，a storage device for storing one or more programs,

本申请实施例还提供一种视频抖动去除系统，如图2所示，包括，The embodiment of the present application also provides a video jitter removal system, as shown in FIG2 , comprising:

客户端210和服务器220；Client 210 and server 220;

其中，所述客户端210配置为，获取待处理视频及所述待处理视频对应的相机姿态信息，并将所述待处理视频和所述相机姿态信息发送至所述服务器220；The client 210 is configured to obtain a video to be processed and camera posture information corresponding to the video to be processed, and send the video to be processed and the camera posture information to the server 220;

所述服务器220配置为，根据所述待处理视频和所述相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场；The server 220 is configured to train the initial neural radiation field according to the video to be processed and the camera posture information to obtain a trained neural radiation field;

所述服务器220还配置为，根据所述待处理视频和所述相机姿态信息，生成平滑的相机路径；The server 220 is further configured to generate a smooth camera path according to the video to be processed and the camera posture information;

所述服务器220还配置为，根据所述平滑的相机路径，利用所述训练后的神经辐射场对所述待处理视频的视频场景进行渲染，生成去除抖动后的新视频。The server 220 is further configured to render the video scene of the video to be processed using the trained neural radiation field according to the smoothed camera path, so as to generate a new video with the jitter removed.

一些示例性实施例中，服务器220还配置为，发送所述去除抖动后的新视频给客户端210；In some exemplary embodiments, the server 220 is further configured to send the new video after the jitter is removed to the client 210;

相应地，所述客户端210还配置为，显示所述去除抖动后的新视频。Correspondingly, the client 210 is also configured to display the new video after the jitter is removed.

一些示例性实施例中，所述客户端210还配置为，获取所述待处理视频中的多个关键帧，并将所述多个关键帧发送至所述服务器220。In some exemplary embodiments, the client 210 is further configured to obtain a plurality of key frames in the video to be processed, and send the plurality of key frames to the server 220 .

一些示例性实施例中，所述客户端210包括：用户交互模块，设置为接收用户操作指令，从所述待处理视频中确定被选定的所述多个关键帧。In some exemplary embodiments, the client 210 includes: a user interaction module, configured to receive a user operation instruction and determine the selected multiple key frames from the video to be processed.

一些示例性实施例中，所述服务器220还配置为，根据所述待处理视频中的多个关键帧，对所述多个关键帧所对应的相机姿态信息进行插值处理，得到与所述待处理视频对应的平滑的相机路径。In some exemplary embodiments, the server 220 is further configured to perform interpolation processing on camera posture information corresponding to multiple key frames in the video to be processed, so as to obtain a smooth camera path corresponding to the video to be processed.

一些示例性实施例中，所述多个关键帧由服务器220或客户端210从所述待处理视频中，选取满足第一条件的帧图像作为关键帧；In some exemplary embodiments, the server 220 or the client 210 selects the multiple key frames from the video to be processed, and the frame images that meet the first condition are used as key frames;

其中，所述第一条件为所述帧图像在原始相机路径上的属性信息相较于所述帧图像的上一帧图像在原始相机路径上的属性信息的变化幅度大于第一阈值；所述原始相机路径为根据所述待处理视频对应的相机姿态信息所确定的相机路径。Among them, the first condition is that the change amplitude of the attribute information of the frame image on the original camera path compared with the attribute information of the previous frame image on the original camera path of the frame image is greater than a first threshold; the original camera path is a camera path determined according to the camera posture information corresponding to the video to be processed.

可以看到，用于确定平滑的相机路径的关键帧可以由服务器220根据待处理视频自行确定，也可以由客户端210根据待处理视频自行确定，也可以是由客户端210根据用户的操作指令确定。例如，用户在视频拍摄或视频播放期间，利用用户交互模块进行标记，以确定多个关键帧。It can be seen that the key frames used to determine the smooth camera path can be determined by the server 220 according to the video to be processed, or by the client 210 according to the video to be processed, or by the client 210 according to the user's operation instruction. For example, during video shooting or video playback, the user uses the user interaction module to mark to determine multiple key frames.

本申请实施例还提供一种视频抖动去除系统，如图3所示，包括，The embodiment of the present application also provides a video jitter removal system, as shown in FIG3 , comprising:

客户端210和服务器220；Client 210 and server 220;

所述客户端210包括：图像获取模块2110，姿态信息获取模块2120，用户交互模块2130；The client 210 includes: an image acquisition module 2110, a posture information acquisition module 2120, and a user interaction module 2130;

所述服务器220包括：神经辐射场训练模块2210，平滑路径确定模块2220，神经辐射场渲染模块2230。The server 220 includes: a neural radiation field training module 2210, a smooth path determination module 2220, a neural radiation field training module 2211, a smooth path determination module 2221, a neural radiation field training module 2212, a neural radiation field training module 2213, a neural radiation field training module 2214, a neural radiation field training module 2215, a neural radiation field training module 2216, a neural radiation field training module 2217, a Shooting field rendering module 2230.

其中，所述图像获取模块2110设置为，获取待处理视频；可以理解，所述待处理视频包括多个帧图像。The image acquisition module 2110 is configured to acquire a video to be processed; it can be understood that the video to be processed includes a plurality of frame images.

所述姿态信息获取模块2120设置为，获取待处理视频对应的相机姿态信息；相应地，所述对应的相机姿态信息，包括每一帧图像对应的相机姿态信息。The posture information acquisition module 2120 is configured to acquire the camera posture information corresponding to the video to be processed; accordingly, the corresponding camera posture information includes the camera posture information corresponding to each frame of image.

所述用户交互模块2130设置为，接收用户操作指令，从所述待处理视频中确定被选定的所述多个关键帧。The user interaction module 2130 is configured to receive a user operation instruction and determine the multiple key frames selected from the video to be processed.

所述神经辐射场训练模块2210，设置为根据所述待处理视频和所述相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场。The neural radiation field training module 2210 is configured to train the initial neural radiation field according to the video to be processed and the camera posture information to obtain a trained neural radiation field.

平滑路径确定模块2220，设置为根据所述待处理视频和所述相机姿态信息，生成平滑的相机路径。The smooth path determination module 2220 is configured to generate a smooth camera path according to the video to be processed and the camera posture information.

所述神经辐射场渲染模块2230，设置为根据所述平滑的相机路径，利用所述训练后的神经辐射场对所述待处理视频进行渲染，得到渲染结果。The neural radiation field rendering module 2230 is configured to render the video to be processed using the trained neural radiation field according to the smoothed camera path to obtain a rendering result.

一些示例性实施例中，所述姿态信息获取模块2120设置为，通过SLAM算法，获取所述待处理视频中的每一帧图像对应的相机姿态信息。In some exemplary embodiments, the posture information acquisition module 2120 is configured to acquire the camera posture information corresponding to each frame image in the video to be processed through a SLAM algorithm.

一些示例性实施例中，所述姿态信息获取模块2120设置为，通过SLAM算法，获取所述待处理视频中的每一帧图像对应的初始相机姿态信息；通过SFM算法对所述初始相机姿态信息进行优化，得到所述待处理视频中的每一帧图像对应的相机姿态信息。In some exemplary embodiments, the posture information acquisition module 2120 is configured to obtain the initial camera posture information corresponding to each frame image in the video to be processed through the SLAM algorithm; optimize the initial camera posture information through the SFM algorithm to obtain the camera posture information corresponding to each frame image in the video to be processed.

一些示例性实施例中，所述客户端210还包括：第一数据收发模块，设置为向服务器220发送所述待处理视频及所述待处理视频对应的相机姿态信息。In some exemplary embodiments, the client 210 further includes: a first data transceiver module, configured to send the video to be processed and camera posture information corresponding to the video to be processed to the server 220 .

一些示例性实施例中，所述第一数据收发模块还设置为，为向服务器220发送用户选定的所述多个关键帧。In some exemplary embodiments, the first data transceiver module is further configured to send the multiple key frames selected by the user to the server 220 .

一些示例性实施例中，所述第一数据收发模块还设置为，接收来自服务器220的渲染结果。In some exemplary embodiments, the first data transceiver module is further configured to receive a rendering result from the server 220 .

一些示例性实施例中，所述用户交互模块2130设置为，根据所述渲染结果，生成去抖动后的新视频。In some exemplary embodiments, the user interaction module 2130 is configured to generate a new video after de-jittering according to the rendering result.

一些示例性实施例中，所述服务器220还包括：第二数据收发模块，设置为接收来自客户端210的所述待处理视频及所述待处理视频对应的相机姿态信息。In some exemplary embodiments, the server 220 further includes: a second data transceiver module, configured to receive the video to be processed and the camera posture information corresponding to the video to be processed from the client 210.

一些示例性实施例中，所述第二数据收发模块还设置为，接收客户端210来自客户端所述选定的关键帧。In some exemplary embodiments, the second data transceiver module is further configured to receive the selected key frame from the client 210 .

一些示例性实施例中，所述第二数据收发模块还设置为，向客户端210发送所述渲染结果。In some exemplary embodiments, the second data transceiver module is further configured to send the rendering result to the client 210 .

一些示例性实施例中，所述平滑路径确定模块2220设置为，获取所述待处理视频中的多个关键帧；对所述多个关键帧所对应的相机姿态信息进行插值处理，得到与所述待处理视频对应的平滑的相机路径。In some exemplary embodiments, the smooth path determination module 2220 is configured to obtain a plurality of key frames in the video to be processed; and perform interpolation processing on the camera posture information corresponding to the plurality of key frames to obtain a smooth camera path corresponding to the video to be processed.

一些示例性实施例中，所述平滑路径确定模块2220设置为，对所述待处理视频中的每一帧图像，按照预设的迭代次数，利用与该帧图像相邻的多帧图像，对该帧图像对应的相机姿态信息进行校正；根据所述待处理视频中的每一帧图像对应的校正后的相机姿态信息，生成与所述待处理视频对应的平滑的相机路径。 In some exemplary embodiments, the smooth path determination module 2220 is configured to correct the camera posture information corresponding to each frame image in the video to be processed according to a preset number of iterations using multiple frame images adjacent to the frame image; and generate a smooth camera path corresponding to the video to be processed based on the corrected camera posture information corresponding to each frame image in the video to be processed.

本申请实施例还提供一种视频抖动去除方法，如图4所示，包括：The embodiment of the present application also provides a video jitter removal method, as shown in FIG4 , comprising:

步骤410，图像获取模块获取待处理视频；Step 410, the image acquisition module acquires the video to be processed;

步骤420，姿态信息获取模块通过SLAM算法，获取所述待处理视频对应的相机姿态信息；Step 420, the posture information acquisition module acquires the camera posture information corresponding to the video to be processed through the SLAM algorithm;

步骤430，第一数据收发模块向服务器发送所述待处理视频和所述待处理视频对应的相机姿态信息；Step 430: The first data transceiver module sends the video to be processed and the camera posture information corresponding to the video to be processed to the server;

步骤440，神经辐射场训练模块根据所述待处理视频和所述相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场；Step 440, the neural radiation field training module trains the initial neural radiation field according to the video to be processed and the camera posture information to obtain a trained neural radiation field;

步骤450，用户交互模块获取多个关键帧，并通过第一数据收发模块发送给服务器；Step 450: the user interaction module obtains a plurality of key frames and sends them to the server through the first data transceiver module;

步骤460，平滑路径确定模块根据所述多个关键帧，进行插值处理得到平滑的相机路径；Step 460: the smooth path determination module performs interpolation processing according to the multiple key frames to obtain a smooth camera path;

步骤470，神经辐射场渲染模块根据所述平滑的相机路径利用所述训练后的神经辐射场对所述待处理视频的视频场景进行渲染，生成去除抖动后的新视频；Step 470: The neural radiation field rendering module renders the video scene of the video to be processed using the trained neural radiation field according to the smoothed camera path to generate a new video after jitter removal;

步骤480，通过第二收发模块发送所述新视频给客户端；Step 480, sending the new video to the client through the second transceiver module;

步骤490，用户交互模块显示所述新视频。Step 490: The user interaction module displays the new video.

一些示例性实施例中，步骤450-460替换为步骤451-461，如图5所示：In some exemplary embodiments, steps 450-460 are replaced by steps 451-461, as shown in FIG5 :

步骤451，平滑路径确定模块对所述待处理视频中的每一帧图像，按照预设的迭代次数，利用与该帧图像相邻的多帧图像，对该帧图像对应的相机姿态信息进行校正；Step 451, the smooth path determination module corrects the camera posture information corresponding to each frame image in the video to be processed by using multiple frames of images adjacent to the frame image according to a preset number of iterations;

步骤461，平滑路径确定模块根据所述待处理视频中的每一帧图像对应的校正后的相机姿态信息，生成与所述待处理视频对应的平滑的相机路径。Step 461: The smooth path determination module generates a smooth camera path corresponding to the video to be processed according to the corrected camera posture information corresponding to each frame image in the video to be processed.

可以看到，公开实施例提出的系统构架，由客户端执行视频获取和相机姿态信息获取的基础步骤，由服务器执行神经辐射场训练，相机姿态信息校正以及渲染生成新视频等对计算资源要求高的步骤，充分考虑到神经辐射场方案的实施特点，采用服务器与客户端相结合的分布式计算方案，提升了方案可行性和实用性，利用服务器的强大计算能力，避免了单独依靠视频拍摄设备本地执行方案可能面临的算力不足，而影响最终抖动去除效果。It can be seen that the system architecture proposed in the disclosed embodiment is that the client executes the basic steps of video acquisition and camera posture information acquisition, and the server executes the neural radiation field training, camera posture information correction and rendering to generate new videos and other steps that require high computing resources. Fully considering the implementation characteristics of the neural radiation field solution, a distributed computing solution combining the server and the client is adopted to improve the feasibility and practicality of the solution. The powerful computing power of the server is utilized to avoid the insufficient computing power that may be faced by relying solely on the local execution of the solution by the video capture device, which affects the final jitter removal effect.

本申请实施例还提供一种计算机存储介质，所述存储介质中存储有计算机程序，其中，所述计算机程序被设置为运行时执行如本申请任一实施例所述的视频抖动去除方法。An embodiment of the present application further provides a computer storage medium, in which a computer program is stored, wherein the computer program is configured to execute the video jitter removal method as described in any embodiment of the present application when running.

本申请实施例提供的视频抖动去除方案，采用了全新的神经辐射场方案，通过多层感知机对输入视频进行隐式建模，从而可以实现从新视点渲染出具有真实感的画面，克服一些抖动去除方案所引起的画幅损失，达到了良好的抖动去除效果。一些示例性实施例中，支持用户选定关键帧，再确定平滑的相机路径的方案，能够充分满足用户的视点设定需要。一些示例性实施例中，采用自动校正的方式，对待处理视频进行相机姿态进行校正，再进一步得到平滑的相机路径，显著提升了抖动去除效果。一些示例性实施例中，采用了服务器远程支撑的体系架构，由服务器承担神经辐射场训练、相机姿态校正以及渲染生成新视频的等对计算资源要求较高的计算任务，显著缓解了客户端的计算压力，确保了本申请方案的实用性。The video jitter removal solution provided in the embodiments of the present application adopts a new neural radiation field solution, and implicitly models the input video through a multi-layer perceptron, so that a realistic picture can be rendered from a new viewpoint, overcoming the frame loss caused by some jitter removal solutions, and achieving a good jitter removal effect. In some exemplary embodiments, the solution that supports users to select key frames and then determine a smooth camera path can fully meet the user's viewpoint setting needs. In some exemplary embodiments, an automatic correction method is used to correct the camera posture of the video to be processed, and then a smooth camera path is obtained, which significantly improves the jitter removal effect. In some exemplary embodiments, a server remote support system architecture is adopted, and the server undertakes computing tasks that require high computing resources such as neural radiation field training, camera posture correction, and rendering to generate new videos, which significantly alleviates the computing pressure of the client and ensures the practicality of the solution of the present application.

本领域普通技术人员可以理解，上文中所公开方法中的全部或某些步骤、系统、装置中的功能模块/单元可以被实施为软件、固件、硬件及其适当的组合。在硬件实施方式中，在以上描述中提及的功能模块/单元之间的划分不一定对应于物理组件的划分；例如，一个物理组件可以具有多个功能，或者一个功能或步骤可以由若干物理组件合作执行。某些组件或所有组件可以被实施为由处理器，如数字信号处理器或微处理器执行的软件，或者被实施为硬件，或者被实施为集成电路，如专用集成电路。这样的软件可以分布在计算机可读介质上，计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的，术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外，本领域普通技术人员公知的是，通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据，并且可包括任何信息递送介质。 Those skilled in the art will appreciate that all or some of the steps, systems, and functional modules/units in the above disclosed methods may be implemented as software, firmware, hardware, or a suitable combination thereof. In hardware implementations, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed by several physical components in cooperation. Components or all components can be implemented as software executed by a processor, such as a digital signal processor or a microprocessor, or implemented as hardware, or implemented as an integrated circuit, such as an application-specific integrated circuit. Such software can be distributed on a computer-readable medium, which can include a computer storage medium (or non-transitory medium) and a communication medium (or temporary medium). As known to those of ordinary skill in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by a computer. In addition, it is known to those of ordinary skill in the art that communication media generally contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium.

Claims

A video jitter removal method, comprising:

Obtaining a video to be processed and camera posture information corresponding to the video to be processed;

Training the initial neural radiation field according to the video to be processed and the camera posture information to obtain a trained neural radiation field;

Generate a smooth camera path according to the video to be processed and the camera posture information;

According to the smoothed camera path, the video scene of the video to be processed is rendered using the trained neural radiance field to generate a new video with the jitter removed.

The video jitter removal method according to claim 1, wherein the step of obtaining the video to be processed and the camera posture information corresponding to the video to be processed comprises:

The camera posture information corresponding to each frame image in the video to be processed is obtained by synchronously positioning and mapping the SLAM algorithm.

Acquire the initial camera posture information corresponding to each frame image in the video to be processed by synchronous positioning and mapping SLAM algorithm;

The initial camera posture information is optimized by using the structure from motion (SFM) algorithm to obtain the camera posture information corresponding to each frame of the video to be processed.

The video jitter removal method according to claim 1, wherein generating a smooth camera path according to the video to be processed and the camera posture information comprises:

Acquire multiple key frames in the video to be processed;

Interpolation processing is performed on the camera posture information corresponding to the multiple key frames to obtain a smooth camera path corresponding to the video to be processed.

The video jitter removal method according to claim 4, wherein the obtaining of a plurality of key frames in the video to be processed comprises:

Selecting, from the video to be processed, a frame image that meets a first condition as a key frame, wherein the first condition is that a change in attribute information of the frame image on the original camera path compared to attribute information of a previous frame image of the frame image on the original camera path is greater than a first threshold;

The original camera path is a camera path determined according to the camera posture information corresponding to the video to be processed.

For each frame image in the video to be processed, according to a preset number of iterations, using multiple frames of images adjacent to the frame image, correct the camera posture information corresponding to the frame image;

A smooth camera path corresponding to the video to be processed is generated according to the corrected camera posture information corresponding to each frame image in the video to be processed.

The video jitter removal method according to claim 6, wherein the step of correcting the camera posture information corresponding to each frame image in the video to be processed by using multiple frames of images adjacent to the frame image according to a preset number of iterations comprises:

For each frame of the video to be processed, the camera posture information corresponding to the frame is used as the initial camera posture information to be corrected. According to the preset number of iterations, in each round of iteration, the following steps are performed respectively:

Acquire image parameters of the adjacent multi-frame images, and correct the to-be-corrected camera posture information of the frame image according to the image parameters of the adjacent multi-frame images, wherein the image parameters include coordinates, orientations, and weights;

The corrected camera posture information is used as the camera posture information to be corrected for the next round of iteration.

The video jitter removal method according to any one of claims 1 to 7, wherein:

The training of the initial neural radiation field according to the video to be processed and the camera posture information includes:

Acquire a plurality of training samples according to the video to be processed and the camera posture information, wherein each training sample is composed of light emitted by a pixel point of a frame image in the video to be processed and a corresponding color;

The initial neural radiation field is trained according to the multiple training samples.

The video jitter removal method according to claim 8, wherein:

The light emitted by the pixel point of the frame image in the video to be processed is determined according to the camera posture information corresponding to the frame image where the pixel point is located and the position of the pixel point in the belonging frame image.

An electronic device comprising:

one or more processors;

a storage device for storing one or more programs,

When the one or more programs are executed by the one or more processors, the one or more processors implement the video jitter removal method according to any one of claims 1 to 9.

A video jitter removal system, comprising:

Client and server;

The client is configured to obtain a video to be processed and camera posture information corresponding to the video to be processed, and send the video to be processed and the camera posture information to the server;

The server is configured to train the initial neural radiation field according to the video to be processed and the camera posture information to obtain a trained neural radiation field;

The server is further configured to generate a smooth camera path according to the video to be processed and the camera posture information;

The server is also configured to render the video scene of the video to be processed using the trained neural radiation field according to the smooth camera path to generate a new video with the jitter removed.

The video jitter removal system according to claim 11, wherein the server is further configured to, based on multiple key frames in the video to be processed, perform interpolation processing on camera posture information corresponding to the multiple key frames to obtain a smooth camera path corresponding to the video to be processed.

The video jitter removal system according to claim 11, wherein the client is further configured to obtain multiple key frames in the video to be processed and send the multiple key frames to the server.

The video jitter removal system according to claim 13, wherein the client comprises: a user interaction module, configured to receive user operation instructions and determine the selected multiple key frames from the video to be processed.

A computer storage medium having a computer program stored therein, wherein the computer program is configured to execute the video jitter removal method according to any one of claims 1 to 9 when running.