WO2025097814A1

WO2025097814A1 - New viewpoint image synthesis method and system, electronic device, and storage medium

Info

Publication number: WO2025097814A1
Application number: PCT/CN2024/103328
Authority: WO
Inventors: 朱捷; 林奶养; 吴优; 王斌; 王磊; 王进
Original assignee: ArcSoft Corp Ltd
Current assignee: ArcSoft Corp Ltd
Priority date: 2023-11-09
Filing date: 2024-07-03
Publication date: 2025-05-15
Anticipated expiration: 2026-05-09
Also published as: CN117576542A

Abstract

Disclosed are a new viewpoint image synthesis method and system, an electronic device, and a storage medium. The method comprises: acquiring a sample image and first camera pose information corresponding to the sample image; by means of a spatial matching method, optimizing the first camera pose information of the sample image to obtain second camera pose information corresponding to the sample image; on the basis of the sample image and the second camera pose information, training an initial neural radiance field to obtain a trained neural radiance field; and by means of the trained neural radiance field, performing new viewpoint rendering to obtain a new viewpoint image. According to the solution of the embodiments of the present application, a neural radiance field is trained on the basis of optimized camera pose information, thereby significantly improving the training effect; and by means of the trained neural radiance field, a more realistic new viewpoint synthesized image can be obtained.

Description

New viewpoint image synthesis method, system, electronic device and storage medium

本申请要求于2023年11月09日提交的、申请号为202311492488X、发明名称为“一种新视点图像合成方法、系统、设备和存储介质”的中国专利申请的优先权，其内容应理解为通过引用的方式并入本申请中。This application claims priority to the Chinese patent application filed on November 9, 2023, with application number 202311492488X, and invention name “A new viewpoint image synthesis method, system, device and storage medium”, the content of which should be understood as incorporated into this application by reference.

Technical Field

本文涉及但不限于图像处理技术领域。This article relates to but is not limited to the field of image processing technology.

Background Art

人工智能技术和硬件计算能力的不断的发展，给视频/图像处理技术带来了新的发展契机。突破实际拍摄视频/图像的视点约束，按需生成各种视点下的新图像，逐渐成为很多应用系统的必需功能。这种从输入视频/图像中不存在的视点下生成新视频/图像的技术，称为新视点合成技术。如何得到更具真实感的新图像，是新视点图像合成技术方案努力追求的重要方面。The continuous development of artificial intelligence technology and hardware computing power has brought new development opportunities to video/image processing technology. Breaking through the viewpoint constraints of actual video/image shooting and generating new images from various viewpoints on demand has gradually become a necessary function for many application systems. This technology of generating new videos/images from viewpoints that do not exist in the input video/image is called new viewpoint synthesis technology. How to obtain more realistic new images is an important aspect that new viewpoint image synthesis technology strives to pursue.

发明内容Summary of the invention

以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.

本申请实施例提供了一种新视点图像合成方法、系统、电子设备和存储介质，基于优化后的相机姿态信息进行神经辐射场训练，显著提升了训练效果，能够得到更具真实感的新视点合成图像。The embodiments of the present application provide a new viewpoint image synthesis method, system, electronic device and storage medium, which perform neural radiation field training based on optimized camera posture information, significantly improve the training effect, and can obtain a more realistic new viewpoint synthesized image.

本申请实施例提供一种新视点图像合成方法，包括：The present application provides a new viewpoint image synthesis method, including:

获取样本图像和所述样本图像对应的第一相机姿态信息；Acquire a sample image and first camera posture information corresponding to the sample image;

通过空间匹配方法，对所述样本图像的第一相机姿态信息进行优化，得到所述样本图像对应的第二相机姿态信息；By using a spatial matching method, the first camera posture information of the sample image is optimized to obtain the second camera posture information corresponding to the sample image;

根据所述样本图像和所述第二相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场；Training the initial neural radiation field according to the sample image and the second camera posture information to obtain a trained neural radiation field;

通过所述训练后的神经辐射场进行新视点渲染，得到新视点图像。The trained neural radiation field is used to perform new viewpoint rendering to obtain a new viewpoint image.

本申请实施例还提供一种新视点图像合成系统，包括：The present application also provides a new viewpoint image synthesis system, including:

客户端和服务器；Client and server;

所述客户端设置为，获取样本图像和所述样本图像对应的第一相机姿态信息；The client is configured to obtain a sample image and first camera posture information corresponding to the sample image;

所述服务器设置为，通过空间匹配方法，对所述样本图像的第一相机姿态信息进行优化，得到所述样本图像对应的第二相机姿态信息；The server is configured to optimize the first camera posture information of the sample image through a spatial matching method to obtain the second camera posture information corresponding to the sample image;

所述服务器还设置为，根据所述样本图像和所述第二相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场。The server is also configured to train the initial neural radiation field according to the sample image and the second camera posture information to obtain a trained neural radiation field.

本申请实施例还提供一种电子设备，包括，一个或多个处理器；An embodiment of the present application further provides an electronic device, including one or more processors;

存储装置，用于存储一个或多个程序，a storage device for storing one or more programs,

当所述一个或多个程序被所述一个或多个处理器执行，使得所述一个或多个处理器实现如本申请任一实施例所述的新视点图像合成方法。When the one or more programs are executed by the one or more processors, the one or more processors implement Now, a new viewpoint image synthesis method as described in any embodiment of the present application.

本申请实施例还提供一种计算机存储介质，所述存储介质中存储有计算机程序，其中，所述计算机程序被设置为运行时执行如本申请任一实施例所述的新视点图像合成方法。An embodiment of the present application further provides a computer storage medium, in which a computer program is stored, wherein the computer program is configured to execute the new viewpoint image synthesis method as described in any embodiment of the present application when running.

本申请实施例提供的新视点图像合成系统框架，采用客户端和服务器端相结合的体系架构，分布式进行算力部署，能够支持灵活的渲染执行部署，满足更多应用场景的功能要求。The new viewpoint image synthesis system framework provided in the embodiment of the present application adopts a system architecture that combines the client and the server, and deploys computing power in a distributed manner. It can support flexible rendering execution deployment and meet the functional requirements of more application scenarios.

本申请的其它特征和优点将在随后的说明书中阐述，并且，部分地从说明书中变得显而易见，或者通过实施本申请而了解。本申请的其他优点可通过在说明书以及附图中所描述的方案来实现和获得。Other features and advantages of the present application will be described in the following description, and partly become apparent from the description, or be understood by implementing the present application. Other advantages of the present application can be realized and obtained by the schemes described in the description and the drawings.

在阅读并理解了附图和详细描述后，可以明白其他方面。Other aspects will be apparent upon reading and understanding the drawings and detailed description.

附图概述BRIEF DESCRIPTION OF THE DRAWINGS

附图用来提供对本申请技术方案的理解，并且构成说明书的一部分，与本申请的实施例一起用于解释本申请的技术方案，并不构成对本申请技术方案的限制。The accompanying drawings are used to provide an understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they are used to explain the technical solution of the present application and do not constitute a limitation on the technical solution of the present application.

图1为本申请实施例提供的一种新视点图像合成方法的流程图；FIG1 is a flow chart of a new viewpoint image synthesis method provided in an embodiment of the present application;

图2为本申请实施例提供的另一种新视点图像合成方法的流程图；FIG2 is a flow chart of another new viewpoint image synthesis method provided in an embodiment of the present application;

图3为本申请实施例提供的另一种新视点图像合成方法的流程图；FIG3 is a flow chart of another new viewpoint image synthesis method provided in an embodiment of the present application;

图4为本申请实施例提供的另一种新视点图像合成方法的流程图；FIG4 is a flow chart of another new viewpoint image synthesis method provided in an embodiment of the present application;

图5为本申请实施例提供的另一种新视点图像合成方法的流程图；FIG5 is a flow chart of another new viewpoint image synthesis method provided in an embodiment of the present application;

图6为本申请实施例提供的一种新视点图像合成系统结构图；FIG6 is a structural diagram of a new viewpoint image synthesis system provided in an embodiment of the present application;

图7为本申请实施例提供的另一种新视点图像合成系统结构图。FIG. 7 is a structural diagram of another new viewpoint image synthesis system provided in an embodiment of the present application.

详述Details

为使本申请的目的、技术方案和优点更加清楚明白，下文中将结合附图对本申请的实施例进行详细说明。需要说明的是，在不冲突的情况下，本申请中的实施例及实施例中的特征可以相互任意组合。In order to make the purpose, technical solution and advantages of the present application more clear, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features in the embodiments of the present application can be combined with each other arbitrarily without conflict.

为了便于理解本申请，下面将参照相关附图对本申请进行更全面的描述。附图中给出了本申请的实施例。但是，本申请可以以许多不同的形式来实现，并不限于本文所描述的实施例。相反地，提供这些实施例的目的是使本申请的公开内容更加透彻全面。In order to facilitate understanding of the present application, the present application will be described more fully below with reference to the relevant drawings. Embodiments of the present application are provided in the drawings. However, the present application can be implemented in many different forms and is not limited to the embodiments described herein. On the contrary, the purpose of providing these embodiments is to make the disclosure of the present application more thorough and comprehensive.

除非另有定义，本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中在本申请的说明书中所使用的术语只是为了描述具体的实施例的目的，不是旨在于限制本申请。Unless otherwise defined, all technical and scientific terms used herein have the same meaning as those commonly understood by those skilled in the art to which this application belongs. The terms used herein in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit this application.

可以理解，本申请所使用的术语“第一”、“第二”仅用于描述目的，而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此，限定有“第一”、“第二”的特征可以明示或隐含地包括至少一个该特征。在本申请的描述中，“多个”的含义是至少两个，例如两个、三个等，除非另有明确具体的限定。It is understood that the terms "first" and "second" used in this application are only used for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, the features defined as "first" and "second" may explicitly or implicitly include at least one of the features. In the description of this application, the meaning of "plurality" is at least two, such as two, three, etc., unless otherwise clearly and specifically defined.

在此使用时，单数形式的“一”、“一个”和“所述/该”也可以包括复数形式，除非上下文清楚指出另外的方式。还应当理解的是，术语“包括/包含”或“具有”等指定所陈述的特征、整体、步骤、操作、组件、部分或它们的组合的存在，但是不排除存在或添加一个或更多个其他特征、整体、步骤、操作、组件、部分或它们的组合的可能性。同时，在本说明书中使用的术语“和/或”包括相关所列项目的任何及所有组合。As used herein, the singular forms "a", "an" and "the" may include the plural forms as well, unless the context so dictates. It should also be understood that the terms "include/comprise" or "have" etc. specify the presence of stated features, wholes, steps, operations, components, parts or combinations thereof, but do not exclude the possibility of the presence or addition of one or more other features, wholes, steps, operations, components, parts or combinations thereof. At the same time, the term "and/or" used in this specification includes any and all combinations of the relevant listed items.

本申请实施例提供一种新视点图像合成方法，如图1所示，包括：The present application embodiment provides a new viewpoint image synthesis method, as shown in FIG1 , comprising:

步骤110，获取样本图像和所述样本图像对应的第一相机姿态信息；Step 110, obtaining a sample image and first camera posture information corresponding to the sample image;

步骤120，通过空间匹配方法，对所述样本图像的第一相机姿态信息进行优化，得到所述样本图像对应的第二相机姿态信息；Step 120, optimizing the first camera pose information of the sample image by a spatial matching method to obtain second camera pose information corresponding to the sample image;

步骤130，根据所述样本图像和所述第二相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场；Step 130, training the initial neural radiation field according to the sample image and the second camera posture information to obtain a trained neural radiation field;

步骤140，通过所述训练后的神经辐射场进行新视点渲染，得到新视点图像。Step 140, performing new viewpoint rendering using the trained neural radiation field to obtain a new viewpoint image.

一些示例性实施例中，步骤110包括：获取待处理图像，通过即时定位与地图构建方法，获取所述待处理图像对应的相机姿态信息；In some exemplary embodiments, step 110 includes: obtaining an image to be processed, and obtaining camera posture information corresponding to the image to be processed by a real-time positioning and mapping method;

根据视点多样性原则，选取所述待处理图像中的多帧图像作为所述样本图像，所述样本图像对应的相机姿态信息为第一相机姿态信息。According to the viewpoint diversity principle, multiple frames of images in the image to be processed are selected as the sample images, and the camera posture information corresponding to the sample images is the first camera posture information.

一些示例性实施例中，所述即时定位与地图构建方法包括：SLAM(即时定位与地图构建，Simultaneous Localization and Mapping)算法。In some exemplary embodiments, the simultaneous positioning and mapping method includes: a SLAM (Simultaneous Localization and Mapping) algorithm.

可以理解，一些示例性实施例中，所述待处理图像包括所拍摄视频中的多个帧图像，也称为多个图像帧，简称多帧。即通过拍摄视频的方式获取样本图像的最原始来源。可以是面向目标物体的拍摄视频，或者是环绕目标物体一周的拍摄视频，或者，其他针对目标物体的自由路径的拍摄视频，不限于特定的拍摄方式。It can be understood that in some exemplary embodiments, the image to be processed includes multiple frame images in the captured video, also referred to as multiple image frames, or multiple frames for short. That is, the original source of the sample image is obtained by capturing the video. It can be a video facing the target object, or a video circling the target object, or other videos captured on a free path to the target object, and is not limited to a specific shooting method.

一些示例性实施例中，所述根据视点多样性原则，选取所述待处理图像中的多帧图像作为所述样本图像，包括：In some exemplary embodiments, selecting multiple frames of images from the to-be-processed images as the sample images according to the viewpoint diversity principle includes:

根据所述待处理图像对应的相机姿态信息，按照视点多样性原则，从所述待处理图像中选取稀疏的多帧图像作为所述样本图像。According to the camera posture information corresponding to the image to be processed and in accordance with the viewpoint diversity principle, sparse multiple frame images are selected from the image to be processed as the sample images.

由于待处理图像一般包括多个视点下的多帧图像，根据每个帧图像对应的相机姿态信息，可以知晓其对应的视点。按照视点多样性原则选取关键帧，在全部帧图像对应的视点范围内，分散选取多个帧图像，作为关键帧，使所选取的帧图像对应的视点对全部视点的覆盖率超过第一设定比例。其中，从全部帧图像中选择第二设定比例的帧图像作为关键帧。第一设定比例和第二设定比例根据需要灵活设定。例如，从全部帧图像中选取40％(第二设定比例)的图像作为关键帧，这些关键帧对应的视点覆盖了全部帧图像视点范围的80％(第一设定比例)，任意两个关键帧对应的视点可以相同，或者，不同。Since the image to be processed generally includes multiple frames of images under multiple viewpoints, the corresponding viewpoint can be known according to the camera posture information corresponding to each frame of image. According to the principle of viewpoint diversity, key frames are selected, and multiple frame images are dispersedly selected as key frames within the viewpoint range corresponding to all frame images, so that the coverage rate of the viewpoint corresponding to the selected frame image to all viewpoints exceeds the first set ratio. Among them, frame images of the second set ratio are selected from all frame images as key frames. The first set ratio and the second set ratio are flexibly set as needed. For example, 40% (the second set ratio) of images are selected from all frame images as key frames, and the viewpoints corresponding to these key frames cover 80% (the first set ratio) of the viewpoint range of all frame images, and the viewpoints corresponding to any two key frames can be the same, or different.

可以理解，从待处理图像中按照视点多样性原则选取帧图像作为关键帧，可以减少样本数量，并保持样本的丰富性，能够在减小训练数据量的情况下，保证训练的有效性，确保训练效果。It can be understood that selecting frame images as key frames from the images to be processed according to the viewpoint diversity principle can reduce the number of samples and maintain the richness of the samples, which can ensure the effectiveness of training and the training effect while reducing the amount of training data.

一些示例性实施例中，所述通过空间匹配方法，对所述样本图像的第一相机姿态信息进行优化，得到所述样本图像对应的第二相机姿态信息，包括：In some exemplary embodiments, the optimizing the first camera pose information of the sample image by a spatial matching method to obtain the second camera pose information corresponding to the sample image includes:

提取所述样本图像的特征点；Extracting feature points of the sample image;

根据所述第一相机姿态信息，建立初始地图；Establishing an initial map according to the first camera posture information;

根据所述初始地图，获取所述样本图像之间的空间距离； According to the initial map, obtaining the spatial distance between the sample images;

对所述空间距离在空间距离阈值以内的所述样本图像的特征点进行匹配，得到匹配结果；Matching the feature points of the sample image whose spatial distance is within the spatial distance threshold to obtain a matching result;

根据所述匹配结果，对所述第一相机姿态信息进行优化，得到所述第二相机姿态信息。According to the matching result, the first camera posture information is optimized to obtain the second camera posture information.

一些示例性实施例中，所述根据所述匹配结果，对所述第一相机姿态信息进行优化，得到所述第二相机姿态信息，包括：In some exemplary embodiments, optimizing the first camera posture information according to the matching result to obtain the second camera posture information includes:

通过光束法平差对所述第一相机姿态信息进行优化，得到所述第二相机姿态信息。The first camera posture information is optimized through bundle adjustment to obtain the second camera posture information.

样本图像包括所选取的多帧图像，每个帧图像对应一个第一相机姿态信息，根据这些第一相机姿态信息建立初始地图；针对每个帧图像分别提取特征点。The sample image includes a plurality of selected frame images, each frame image corresponds to a first camera posture information, and an initial map is established according to the first camera posture information; and feature points are extracted for each frame image.

根据所述样本图像的特征点和所述样本图像对应的第一相机姿态信息，建立初始地图；Establishing an initial map according to the feature points of the sample image and the first camera posture information corresponding to the sample image;

根据所述初始地图，获取所述样本图像之间的空间距离；According to the initial map, obtaining the spatial distance between the sample images;

通过光束法平差对所述初始姿态变换进行优化，得到所述相机姿态变换。The initial posture transformation is optimized through bundle adjustment to obtain the camera posture transformation.

样本图像包括所选取的多帧图像，针对每个帧图像分别提取特征点，每个帧图像对应一个第一相机姿态信息，根据这些特征点和这些第一相机姿态信息建立初始地图。The sample image includes a plurality of selected frame images, feature points are extracted for each frame image, each frame image corresponds to a first camera posture information, and an initial map is established according to the feature points and the first camera posture information.

可以理解，对所述空间距离在空间距离阈值以内的所述样本图像的特征点进行匹配，得到匹配结果，意味着在空间距离相近的关键帧之间进行特征点匹配，再采用光束法平差对所述第一相机姿态信息进行优化，可以减少重投影误差。相比于一些SFM(Structure from Motion，运动求取结构)算法中不考虑空间距离约束而进行特征点匹配后，将关键帧置入地图的方式，本申请实施例方案对空间距离在空间距离阈值以内的样本图像(关键帧)的特征点进行匹配后，再进行优化，不但提升了计算效果，也避免了由于重复纹理导致的距离较远的帧图像出现错误的匹配。It can be understood that matching the feature points of the sample images whose spatial distances are within the spatial distance threshold to obtain matching results means matching the feature points between key frames with similar spatial distances, and then optimizing the first camera posture information using bundle adjustment, which can reduce reprojection errors. Compared to some SFM (Structure from Motion) algorithms that do not consider spatial distance constraints and match feature points before placing key frames into a map, the embodiment of the present application matches the feature points of sample images (key frames) whose spatial distances are within the spatial distance threshold and then optimizes them, which not only improves the calculation effect, but also avoids erroneous matching of frame images at a distance due to repeated textures.

一些示例性实施例中，步骤130包括：In some exemplary embodiments, step 130 includes:

根据所述样本图像和所述第二相机姿态信息获取多个训练样本，其中，每个训练样本由所述样本图像的像素点所发射的光线及像素点对应的颜色构成；Acquire a plurality of training samples according to the sample image and the second camera posture information, wherein each training sample is composed of light emitted by a pixel of the sample image and a color corresponding to the pixel;

根据所述多个训练样本，对所述初始神经辐射场进行训练。The initial neural radiation field is trained according to the multiple training samples.

一些示例性实施例中，所述根据所述样本图像和所述第二相机姿态信息获取多个训练样本，包括：In some exemplary embodiments, the step of acquiring a plurality of training samples according to the sample image and the second camera posture information includes:

根据所述像素点所在的样本图像对应的第二相机姿态信息及所述像素点在所述样本图像中的位置，确定所述像素点所发射的光线。The light emitted by the pixel point is determined according to the second camera posture information corresponding to the sample image where the pixel point is located and the position of the pixel point in the sample image.

一些示例性实施例中，根据所述样本图像和所述第二相机姿态信息获取多个训练样本，包括：In some exemplary embodiments, obtaining a plurality of training samples according to the sample image and the second camera posture information includes:

将样本图像与第二相机姿态信息解析成每一个像素点发射的光线；Resolve the sample image and the second camera posture information into the light emitted by each pixel;

将每一个像素点发射的光线和该像素点的颜色，构成一个样本。The light emitted by each pixel and the color of the pixel form a sample.

一些示例性实施例中，一个样本图像的每一个像素点p的颜色为c，结合该样本图像对应的第二相机姿态信息和像素点位置，可以得到该像素点所发射的光线记为l(p,d)，其中p＝(x，y，z)为三维笛卡尔空间坐标系中该像素点的位置坐标，为球坐标系中该光线方向的立体角参数，其中，θ表示极角或纬度，它是从参考轴(通常是正z轴)到点的矢量与参考轴之间的夹角，θ的取值范围通常是0到π。表示方位角或经度，它是从参考平面上的某个基准方向(通常是正x轴)到点的投影与基准方向之间的夹角，的取值范围通常是0到2π。In some exemplary embodiments, the color of each pixel p of a sample image is c. Combining the second camera posture information and the pixel position corresponding to the sample image, the light emitted by the pixel can be obtained and recorded as l(p, d). Where p = (x, y, z) is the position coordinate of the pixel point in the three-dimensional Cartesian space coordinate system. is the solid angle parameter of the light direction in the spherical coordinate system, where θ represents the polar angle or latitude, which is the angle between the vector from the reference axis (usually the positive z-axis) to the point and the reference axis. The value range of θ is usually 0 to π. Represents azimuth or longitude, which is the angle between the projection of a reference direction (usually the positive x-axis) to a point on a reference plane and the reference direction. The value range of is usually 0 to 2π.

可以理解，一个样本图像包括多个像素点，对应得到多个由像素点所发射光线和颜色构成的多个训练样本，即一个像素点对应一个训练样本，所述多个样本图像得到更多的训练样本。It can be understood that a sample image includes multiple pixels, and correspondingly obtains multiple training samples consisting of light and color emitted by the pixels, that is, one pixel corresponds to one training sample, and the multiple sample images obtain more training samples.

一些示例性实施例中，根据所述多个训练样本，对所述初始神经辐射场进行训练，包括：In some exemplary embodiments, training the initial neural radiation field according to the plurality of training samples comprises:

将全部训练样本随机打乱后，对初始神经辐射场进行训练。After randomly shuffling all training samples, the initial neural radiation field is trained.

其中，每一个输入样本(l，c)，l为光线，c为颜色，在划分过后的空间中对坐标p进行空间变形并通过哈希编码查询到其位置，使用该结点的多层感知机对其进行编码。编码所得的特征f和d一起使用一个全局的多层感知机进行编码，输出颜色c_pred和差异度disp。For each input sample (l, c), l is light and c is color. In the divided space, the coordinate p is spatially deformed and its position is queried through hash coding. The multi-layer perceptron of the node is used to encode it. The encoded features f and d are encoded together using a global multi-layer perceptron to output the color c _pred and the difference disp.

一些示例性实施例中，进行训练所对应的损失函数 In some exemplary embodiments, the loss function corresponding to the training is

其中，n_r是样本数量。Where n _r is the sample size.

一些示例性实施例中，神经辐射场中的空间划分可以是基于均匀网格的空间划分，或者是基于八叉树的空间划分。In some exemplary embodiments, the spatial division in the neural radiation field may be a uniform grid-based spatial division or an octree-based spatial division.

一些示例性实施例中，所述对初始神经辐射场进行训练，包括以下步骤：空间划分、空间变形、哈希编码和训练；In some exemplary embodiments, the training of the initial neural radiation field comprises the following steps: space division, space deformation, hash coding and training;

一些示例性实施例中，空间划分包括：对于感兴趣区域通过八叉树进行空间划分，如果存在一个可见的相机到一个结点的距离小于其边长的λ倍(例如，λ取3)，则将该结点平均划分为八个子结点。重复这个过程直到每一个结点都不可再细分。每一个结点都包含一个多层感知机。In some exemplary embodiments, the spatial partitioning includes: performing spatial partitioning on the region of interest by using an octree, if there is a visible camera with a distance to a node that is less than λ times its side length (for example, λ is 3), then the node is evenly divided into eight child nodes. This process is repeated until each node cannot be further subdivided. Each node contains a multi-layer perceptron.

一些示例性实施例中，空间变形包括：对八叉树的每一个子结点进行空间变形；即为了更好地对空间进行表达，需要对八叉树的每一个子结点进行空间变形，包括：In some exemplary embodiments, the spatial deformation includes: performing spatial deformation on each child node of the octree; that is, in order to better express the space, it is necessary to perform spatial deformation on each child node of the octree, including:

空间中所有相机记为{C_i|i＝1…n_c}，n_c为相机数量，将三维空间中的点变形至相机投影空间的函数{y＝G(x)＝{C₁(x)…C_nc(x)}。在空间中均匀采样n_p个点{x_j|j＝1…n_p}，则通过y＝G(x)得到变形后的点{y_j|j＝1…n_p}。{y_j}的协方差矩阵的前三个特征向量组成的矩阵记为M，则得到了最终的空间变形函数F(x)＝M·G(x)。All cameras in the space are denoted as {C _i |i＝1…n _c }, where n _c is the number of cameras. The function that deforms a point in the three-dimensional space to the camera projection space is {y＝G(x)＝{C ₁ (x)…C _nc (x)}. Uniformly sample n _p points in the space {x _j |j＝1…n _p }, and then obtain the deformed point {y _j |j＝1…n _p } through y＝G(x). The matrix composed of the first three eigenvectors of the covariance matrix of {y _j } is denoted as M, and the final space deformation function F(x)＝M·G(x) is obtained.

通过空间划分和空间变形能够有效处理灵活拍摄方式下获取的多种样本图像，改善了一些方案中，需要对样本图像采集的拍摄角度进行较多约束的问题，使得在自由拍摄方式下所获取的图像，都能作为有效样本参与训练，提升训练效果。Through spatial division and spatial deformation, we can effectively process a variety of sample images obtained under flexible shooting methods, which improves the problem of some solutions requiring more constraints on the shooting angles of sample image acquisition. This makes it possible for images obtained under free shooting methods to participate in training as effective samples, thereby improving the training effect.

一些示例性实施例中，哈希编码包括：使用多重哈希函数对经过变形的空间进行编码，从而加速空间查询。In some exemplary embodiments, hash coding includes: using multiple hash functions to encode the deformed space, thereby accelerating spatial query.

一些示例性实施例中，训练包括：In some exemplary embodiments, the training includes:

每一个输入样本(l，c)，l为光线，c为颜色，在划分过后的空间中对坐标p进行空间变形并通过哈希编码查询到其位置，使用该结点的多层感知机对其进行编码。编码所得的特征f和d一起使用一个全局的多层感知机进行编码，输出颜色c_pred和差异度disp。 For each input sample (l, c), l is light and c is color. In the divided space, the coordinate p is spatially deformed and its position is queried through hash coding. The multi-layer perceptron of the node is used to encode it. The encoded features f and d are encoded together with a global multi-layer perceptron to output the color c _pred and the difference disp.

一些示例性实施例中，进行训练的损失函数
In some exemplary embodiments, the loss function for training is

其中，n_r是样本数量，f₀，f₁是一对在空间中随机采样得到的两个相邻八叉树结点提取的特征，n_b是采样数量；一些示例性实施例中，n_b设为10000。Wherein, n _r is the number of samples, f ₀ , f ₁ are features extracted from two adjacent octree nodes randomly sampled in space, and n _b is the number of samples; in some exemplary embodiments, n _b is set to 10,000.

一些示例性实施例中，神经辐射场中的空间变形可以是基于规范化设备坐标的空间变形，或者是基于透视投影坐标系的空间变形。In some exemplary embodiments, the spatial deformation in the neural radiation field may be a spatial deformation based on normalized device coordinates, or a spatial deformation based on a perspective projection coordinate system.

一些示例性实施例中，步骤140包括：根据待渲染视点信息，通过所述训练后的神经辐射场，得到所述新视点图像；In some exemplary embodiments, step 140 includes: obtaining the new viewpoint image through the trained neural radiance field according to the viewpoint information to be rendered;

其中，所述待渲染视点信息包括待渲染视点的相机姿态及预览图像分辨率，所述新视点图像的分辨率不小于所述预览图像分辨率。The viewpoint information to be rendered includes the camera posture of the viewpoint to be rendered and the preview image resolution, and the resolution of the new viewpoint image is not less than the preview image resolution.

一些示例性实施例中，根据待渲染视点信息，通过所述训练后的神经辐射场，得到所述新视点图像，包括：In some exemplary embodiments, obtaining the new viewpoint image through the trained neural radiance field according to the viewpoint information to be rendered includes:

根据待渲染视点信息和图像高度、宽度，计算出整个图像上每一个像素点的射线；Calculate the ray of each pixel on the entire image based on the viewpoint information to be rendered and the image height and width;

针对每一个像素点，通过光线前进方法，沿射线进行采样，对训练好的神经辐射场给出的采样点的颜色值进行积分，得到该像素点的颜色值。For each pixel, sampling is performed along the ray through the ray advancing method, and the color value of the sampling point given by the trained neural radiation field is integrated to obtain the color value of the pixel.

可以知晓，根据待渲染视点信息得到全部像素点的颜色值后，最终得到整幅图像，即根据待渲染视点信息渲染得到了新视点图像。It can be known that after the color values of all pixels are obtained according to the viewpoint information to be rendered, the entire image is finally obtained, that is, a new viewpoint image is rendered according to the viewpoint information to be rendered.

一些示例性实施例中，步骤140包括：根据所述训练后的神经辐射场，生成三维模型，其中，所述三维模型包括三维网格和纹理贴图；In some exemplary embodiments, step 140 includes: generating a three-dimensional model according to the trained neural radiation field, wherein the three-dimensional model includes a three-dimensional mesh and a texture map;

通过三维渲染方法对所述三维模型进行渲染，得到所述新视点图像。The three-dimensional model is rendered by a three-dimensional rendering method to obtain the new viewpoint image.

可以理解，三维模型包括的三维网格和纹理贴图能够被用于通过三维管线渲染得到新视点图像。一些示例性实施例中，通过所述训练后的神经辐射场，得到的新视点图像，记为第一新视点图像；通过三维渲染方法对所述三维模型进行渲染，得到的新视点图像，记为第二新视点图像。It can be understood that the three-dimensional mesh and texture map included in the three-dimensional model can be used to obtain a new viewpoint image through three-dimensional pipeline rendering. In some exemplary embodiments, the new viewpoint image obtained by the trained neural radiation field is recorded as the first new viewpoint image; the new viewpoint image obtained by rendering the three-dimensional model through a three-dimensional rendering method is recorded as the second new viewpoint image.

一些示例性实施例中，根据所述训练后的神经辐射场，生成三维模型，包括：In some exemplary embodiments, generating a three-dimensional model according to the trained neural radiation field includes:

根据神经辐射场的训练结果，通过提取等值面算法，计算出近似神经辐射场的三角形网格G＝{V,E}；其中顶点V＝{v_i|i＝1…n_v}，边E＝{e_i|i＝1…n_e}，n_v为顶点数量，n_e为边数量。According to the training results of the neural radiation field, the triangular mesh G = {V, E} that approximates the neural radiation field is calculated by extracting the isosurface algorithm; where vertices V = { _vi |i = 1… _nv }, edges E = { _ei |i = 1… _ne }, _nv is the number of vertices, and _ne is the number of edges.

为每一个顶点v_i赋予偏移量Δv_i，权值向量w_i，每一张输入图像I_j的相机姿态p_i，在对应顶点v_i位置的颜色值为c_ij，则通过最小化以下能量函数求解出最优的Δv_i与c_i：
Assign an offset _Δvi and a weight vector _wi to each vertex _vi , a camera pose _pi of each input image _Ij , and a color value cij at the corresponding vertex _vi . Then, the optimal _Δvi and _cij are solved by minimizing the following _energy function:

其中v_l是v_i一邻域内相邻顶点的均值，n_v为顶点数量，m为图像数量。Where v _l is the mean of adjacent vertices in the neighborhood of _vi , n _v is the number of vertices, and m is the number of images.

根据优化的结果，更新三角形网格G的顶点，并根据优化得到的权值向量w_i生成三维模型所包括的纹理贴图；所述三角形网格G＝{V,E}即为三维模型所包括的三维网格。According to the optimization result, the vertices of the triangular mesh G are updated, and the texture map included in the three-dimensional model is generated according to the optimized weight vector _wi ; the triangular mesh G = {V, E} is the three-dimensional mesh included in the three-dimensional model.

本申请实施例还提供一种新视点图像合成方法，如图2所示，包括：The present application also provides a new viewpoint image synthesis method, as shown in FIG2 , including:

步骤210，第一客户端获取样本图像和所述样本图像对应的第一相机姿态信息，并发送给服务器；Step 210: The first client obtains a sample image and first camera posture information corresponding to the sample image, and sends Send to the server;

步骤220，服务器通过空间匹配方法，对所述样本图像的第一相机姿态信息进行优化，得到所述样本图像对应的第二相机姿态信息；Step 220: The server optimizes the first camera pose information of the sample image by a spatial matching method to obtain second camera pose information corresponding to the sample image;

步骤230，服务器根据所述样本图像和所述第二相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场。Step 230: The server trains the initial neural radiation field according to the sample image and the second camera posture information to obtain a trained neural radiation field.

一些示例性实施例中，如图3所示，所述方法还包括：In some exemplary embodiments, as shown in FIG3 , the method further includes:

步骤2401，第二客户端获取待渲染视点信息，并发送至所述服务器；Step 2401: The second client obtains viewpoint information to be rendered and sends it to the server;

步骤2501，服务器根据所述待渲染视点信息，通过所述训练后的神经辐射场进行渲染，得到新视点图像，记为第一新视点图像，并将所述新视点图像发送给所述第二客户端。Step 2501: The server renders according to the viewpoint information to be rendered using the trained neural radiation field to obtain a new viewpoint image, recorded as the first new viewpoint image, and sends the new viewpoint image to the second client.

一些示例性实施例中，步骤2401包括：第二客户端获取待渲染视点信息和目标图像分辨率，并发送至所述服务器；In some exemplary embodiments, step 2401 includes: the second client obtains viewpoint information to be rendered and target image resolution, and sends them to the server;

相应地，步骤2501包括：服务器根据所述待渲染视点信息和目标图像分辨率，通过所述训练后的神经辐射场进行渲染，得到新视点图像，记为第一新视点图像，并将所述新视点图像发送给所述第二客户端。Correspondingly, step 2501 includes: the server renders according to the viewpoint information to be rendered and the target image resolution through the trained neural radiation field to obtain a new viewpoint image, recorded as the first new viewpoint image, and sends the new viewpoint image to the second client.

一些示例性实施例中，如图4所示，所述方法还包括：In some exemplary embodiments, as shown in FIG4 , the method further includes:

步骤260，服务器根据所述训练后的神经辐射场，生成三维模型，并将所述三维模型发送至第二客户端；其中，所述三维模型包括三维网格和纹理贴图；Step 260: the server generates a three-dimensional model according to the trained neural radiation field, and sends the three-dimensional model to the second client; wherein the three-dimensional model includes a three-dimensional mesh and a texture map;

步骤270，第二客户端根据待渲染视点信息，通过三维渲染方法对所述三维模型进行渲染，得到新视点图像，记为第二新视点图像。Step 270: The second client renders the 3D model using a 3D rendering method according to the viewpoint information to be rendered, to obtain a new viewpoint image, which is recorded as a second new viewpoint image.

一些示例性实施例中，所述第一客户端和所述第二客户端是同一个客户端，或者，不同的客户端，不限于特定的方面。In some exemplary embodiments, the first client and the second client are the same client, or different clients, without limitation to a specific aspect.

一些示例性实施例中，步骤260还包括：服务器根据第二客户端的三维模型请求，将所述三维模型发送给所述第二客户端。In some exemplary embodiments, step 260 further includes: the server sending the three-dimensional model to the second client according to the three-dimensional model request of the second client.

一些示例性实施例中，如图5所示，所述方法包括：In some exemplary embodiments, as shown in FIG5 , the method includes:

步骤2401，第二客户端获取多个候选的待渲染视点信息；Step 2401: The second client obtains information of multiple candidate viewpoints to be rendered;

步骤2402，第二客户端将所述多个候选的待渲染视点信息和预览图像分辨率发送给服务器；Step 2402: The second client sends the plurality of candidate viewpoint information to be rendered and the preview image resolution to the server;

步骤2501，服务器根据多个候选的待渲染视点信息和预览图像分辨率，通过所述训练后的神经辐射场进行渲染，得到多个候选的新视点图像，并将所述多个候选的新视点图像发送给所述第二客户端；Step 2501: the server performs rendering through the trained neural radiance field according to multiple candidate viewpoint information to be rendered and the preview image resolution, obtains multiple candidate new viewpoint images, and sends the multiple candidate new viewpoint images to the second client;

步骤2502，第二客户端响应用户的选择指令，从所述多个候选的新视点图像中确定一个待渲染视点信息；Step 2502: The second client determines a viewpoint information to be rendered from the plurality of candidate new viewpoint images in response to the user's selection instruction;

步骤2503，第二客户端将所述待渲染视点信息和目标图像分辨率发送给服务器；Step 2503: The second client sends the viewpoint information to be rendered and the target image resolution to the server;

步骤2504，服务器根据所述待渲染视点信息和目标图像分辨率，通过所述训练后的神经辐射场进行渲染，得到新视点图像(记为第一新视点图像)，并将所述新视点图像发送给所述第二客户端；Step 2504: the server performs rendering through the trained neural radiance field according to the viewpoint information to be rendered and the target image resolution to obtain a new viewpoint image (referred to as the first new viewpoint image), and sends the new viewpoint image to the second client;

其中，所述目标图像分辨率大于或等于所述预览图像分辨率。The target image resolution is greater than or equal to the preview image resolution.

可以理解，根据该实施例方案，客户端可以先提交多个新视点的预览需求到服务器，根据对应的渲染结果，确定是否是自己所需要的新视点效果，在选定后，再由服务器按照更高的分辨率渲染生成最终的新视点图像。在预览阶段，为了提高服务器响应速度，减少服务器算力浪费和网络资源浪费，渲染生成分辨率更低的预览图像。It can be understood that according to this embodiment, the client can first submit preview requirements of multiple new viewpoints to the server, and determine whether it is the new viewpoint effect required by the client according to the corresponding rendering results. After the selection, the server will A higher resolution rendering generates the final new viewpoint image. In the preview stage, in order to improve the server response speed and reduce the waste of server computing power and network resources, a lower resolution preview image is rendered.

本申请实施例还提供一种新视点图像合成系统，如图6所示，包括：The embodiment of the present application further provides a new viewpoint image synthesis system, as shown in FIG6 , comprising:

客户端610和服务器620；Client 610 and server 620;

所述客户端610设置为，获取样本图像和所述样本图像对应的第一相机姿态信息；The client 610 is configured to obtain a sample image and first camera posture information corresponding to the sample image;

所述服务器620设置为，通过空间匹配方法，对所述样本图像的第一相机姿态信息进行优化，得到所述样本图像对应的第二相机姿态信息；The server 620 is configured to optimize the first camera pose information of the sample image through a spatial matching method to obtain the second camera pose information corresponding to the sample image;

所述服务器620还设置为，根据所述样本图像和所述第二相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场。The server 620 is further configured to train the initial neural radiation field according to the sample image and the second camera posture information to obtain a trained neural radiation field.

一些示例性实施例中，所述客户端610还设置为，获取待渲染视点信息，并发送至所述服务器220；In some exemplary embodiments, the client 610 is further configured to obtain viewpoint information to be rendered and send it to the server 220;

所述服务器620还设置为，根据所述待渲染视点信息，通过所述训练后的神经辐射场进行渲染，得到新视点图像，记为第一新视点图像，并将所述新视点图像发送给所述客户端610。The server 620 is further configured to perform rendering according to the viewpoint information to be rendered using the trained neural radiation field to obtain a new viewpoint image, recorded as a first new viewpoint image, and send the new viewpoint image to the client 610 .

一些示例性实施例中，所述客户端610还设置为，显示所述新视点图像。In some exemplary embodiments, the client 610 is further configured to display the new viewpoint image.

可以理解，一些示例性实施例中，神经辐射场训练和第一新视点图像的渲染生成都在服务器端执行，可以充分利用服务器的强大运算能力，保证渲染效果，显著减小了客户端的运算压力。It can be understood that in some exemplary embodiments, neural radiation field training and rendering generation of the first new viewpoint image are both performed on the server side, which can fully utilize the powerful computing power of the server, ensure the rendering effect, and significantly reduce the computing pressure of the client.

一些示例性实施例中，所述服务器620还设置为，根据所述训练后的神经辐射场，生成三维模型，并将所述三维模型发送至所述客户端610，其中，所述三维模型包括三维网格和纹理贴图；In some exemplary embodiments, the server 620 is further configured to generate a three-dimensional model according to the trained neural radiation field, and send the three-dimensional model to the client 610, wherein the three-dimensional model includes a three-dimensional mesh and a texture map;

所述客户端610还设置为，通过三维渲染方法对所述三维模型进行渲染，得到新视点图像，记为第二新视点图像。The client 610 is further configured to render the three-dimensional model by a three-dimensional rendering method to obtain a new viewpoint image, which is recorded as a second new viewpoint image.

一些示例性实施例中，所述客户端610还设置为，获取待渲染视点信息；根据所述待渲染视点信息，通过三维渲染方法对所述三维模型进行渲染，得到新视点图像，记为第二新视点图像。In some exemplary embodiments, the client 610 is further configured to obtain viewpoint information to be rendered; render the three-dimensional model by a three-dimensional rendering method according to the viewpoint information to be rendered, and obtain a new viewpoint image, which is recorded as a second new viewpoint image.

可以理解，一些示例性实施例中，神经辐射场训练在服务器端执行，第二新视点图像的渲染生成在客户端执行，可以克服一些应用场景下，客户端和服务器交互实时性较差，无法及时满足应用需要的情况下，可以采用客户端本地渲染生成新视点图像的方案。It can be understood that in some exemplary embodiments, neural radiation field training is performed on the server side, and the rendering and generation of the second new viewpoint image is performed on the client side. This can overcome the problem that in some application scenarios, the real-time interaction between the client and the server is poor and cannot meet the application needs in a timely manner. In this case, the client can use local rendering to generate a new viewpoint image.

需要说明的是，所述客户端包括一个或多个，提供样本图像的客户端与获取待渲染视点信息的客户端可以是同一个客户端，或者，是不同的客户端，不限于特定的方面。It should be noted that the client includes one or more clients, and the client that provides the sample image and the client that obtains the viewpoint information to be rendered can be the same client, or different clients, and is not limited to a specific aspect.

一些示例性实施例中，所述客户端610还设置为，获取多个候选的待渲染视点信息；将所述多个候选的待渲染视点信息和预览图像分辨率发送给服务器620；In some exemplary embodiments, the client 610 is further configured to obtain information of a plurality of candidate viewpoints to be rendered; send the information of the plurality of candidate viewpoints to be rendered and the preview image resolution to the server 620;

所述服务器620还设置为，根据多个候选的待渲染视点信息和预览图像分辨率，通过所述训练后的神经辐射场进行渲染，得到多个候选的新视点图像，并将所述多个候选的新视点图像发送给客户端610。The server 620 is also configured to render multiple candidate new viewpoint images based on multiple candidate viewpoint information to be rendered and preview image resolutions through the trained neural radiation field, and send the multiple candidate new viewpoint images to the client 610.

一些示例性实施例中，所述客户端610还设置为，响应用户的选择指令，从所述多个候选的新视点图像中确定一个待渲染视点信息；将所述待渲染视点信息和目标图像分辨率发送给服务器620； In some exemplary embodiments, the client 610 is further configured to, in response to a user's selection instruction, determine a viewpoint information to be rendered from the multiple candidate new viewpoint images; and send the viewpoint information to be rendered and the target image resolution to the server 620;

所述服务器620还设置为，根据所述待渲染视点信息和目标图像分辨率，通过所述训练后的神经辐射场进行渲染，得到新视点图像，记为第一新视点图像，并将所述新视点图像发送给所述客户端；The server 620 is further configured to perform rendering through the trained neural radiance field according to the viewpoint information to be rendered and the target image resolution, obtain a new viewpoint image, recorded as a first new viewpoint image, and send the new viewpoint image to the client;

本申请实施例还提供一种新视点图像合成系统，如图7所示，包括：客户端610和服务器620；The embodiment of the present application also provides a new viewpoint image synthesis system, as shown in FIG7 , including: a client 610 and a server 620;

所述客户端610包括：图像获取模块6110，SLAM模块6120和关键帧选取模块6130；所述服务器620包括：姿态优化模块6210，神经辐射场训练模块6220，神经辐射场渲染模块6230和三维模型生成模块6240；The client 610 includes: an image acquisition module 6110, a SLAM module 6120 and a key frame selection module 6130; the server 620 includes: a posture optimization module 6210, a neural radiation field training module 6220, a neural radiation field rendering module 6230 and a three-dimensional model generation module 6240;

其中，所述图像获取模块6110设置为，获取待处理图像；Wherein, the image acquisition module 6110 is configured to acquire an image to be processed;

所述SLAM模块6120设置为，采用SLAM算法获取所述待处理图像对应的相机姿态信息；The SLAM module 6120 is configured to use a SLAM algorithm to obtain camera posture information corresponding to the image to be processed;

关键帧选取模块6130设置为，根据视点多样性原则，选取所述待处理图像中的多帧图像作为所述样本图像，所述样本图像对应的相机姿态信息为第一相机姿态信息；发送所述样本图像和所述样本图像对应的第一相机姿态信息给服务器620；The key frame selection module 6130 is configured to select, according to the viewpoint diversity principle, multiple frames of images in the image to be processed as the sample images, and the camera posture information corresponding to the sample images is the first camera posture information; and send the sample images and the first camera posture information corresponding to the sample images to the server 620;

姿态优化模块6210设置为，通过空间匹配方法，对所述样本图像的第一相机姿态信息进行优化，得到所述样本图像对应的第二相机姿态信息。The posture optimization module 6210 is configured to optimize the first camera posture information of the sample image through a spatial matching method to obtain the second camera posture information corresponding to the sample image.

神经辐射场训练模块6220设置为，根据所述样本图像和所述第二相机姿态信息，对初始神经辐射场进行训练，得到训练后的神经辐射场。The neural radiation field training module 6220 is configured to train the initial neural radiation field according to the sample image and the second camera posture information to obtain a trained neural radiation field.

一些示例性实施例中，所述待处理图像包括所拍摄视频中的多个帧图像，也称为多个图像帧，简称多帧。即通过拍摄视频的方式获取样本图像的最原始来源。可以是面向目标物体的拍摄视频，或者是环绕目标物体一周的拍摄视频，或者，其他针对目标物体的自由路径的拍摄视频，不限于特定的拍摄方式。In some exemplary embodiments, the image to be processed includes multiple frame images in the captured video, also referred to as multiple image frames, or multiple frames for short. That is, the original source of the sample image is obtained by capturing a video. It can be a video facing the target object, or a video circling the target object, or other videos captured on a free path to the target object, and is not limited to a specific shooting method.

一些示例性实施例中，所述客户端还包括：交互模块6140，所述服务器还包括：神经辐射场渲染模块6230；In some exemplary embodiments, the client further includes: an interaction module 6140, and the server further includes: a neural radiation field rendering module 6230;

其中，交互模块6140设置为，获取待渲染视点信息，并发送至所述服务器；The interaction module 6140 is configured to obtain viewpoint information to be rendered and send it to the server;

神经辐射场渲染模块6230设置为，根据所述待渲染视点信息，通过所述训练后的神经辐射场进行渲染，得到新视点图像，并将所述新视点图像发送给所述客户端；The neural radiation field rendering module 6230 is configured to perform rendering according to the viewpoint information to be rendered by using the trained neural radiation field to obtain a new viewpoint image, and send the new viewpoint image to the client;

交互模块6140还设置为，显示所述新视点图像。The interaction module 6140 is also configured to display the new viewpoint image.

一些示例性实施例中，所述客户端还包括：交互模块6140和实时渲染模块6150，所述服务器还包括：三维模型生成模块6240；In some exemplary embodiments, the client further includes: an interaction module 6140 and a real-time rendering module 6150, and the server further includes: a three-dimensional model generation module 6240;

其中，交互模块6140设置为，获取待渲染视点信息，并发送给所述实时渲染模块6150；The interaction module 6140 is configured to obtain viewpoint information to be rendered and send it to the real-time rendering module 6150;

三维模型生成模块6240设置为，根据所述训练后的神经辐射场，生成三维模型，并将所述三维模型发送至所述客户端，其中，所述三维模型包括三维网格和纹理贴图；The three-dimensional model generation module 6240 is configured to generate a three-dimensional model according to the trained neural radiation field, and send the three-dimensional model to the client, wherein the three-dimensional model includes a three-dimensional mesh and a texture map;

实时渲染模块6150，根据待渲染视点信息，通过三维渲染方法对所述三维模型进行渲染，得到新视点图像；A real-time rendering module 6150 renders the three-dimensional model by a three-dimensional rendering method according to the viewpoint information to be rendered, to obtain a new viewpoint image;

存储装置，用于存储一个或多个程序， a storage device for storing one or more programs,

当所述一个或多个程序被所述一个或多个处理器执行，使得所述一个或多个处理器实现如本申请任一实施例所述的新视点图像合成方法。When the one or more programs are executed by the one or more processors, the one or more processors implement the new viewpoint image synthesis method as described in any embodiment of the present application.

可以看到，根据本申请实施例提供的新视点图像合成系统架构，以分布式部署方式，结合了服务器的算力优势进行神经辐射场模型训练，以及客户端的拍摄便利性和丰富性进行初始样本采集，避免了对客户端本身就并不强大的计算资源的占用，以高稳定性、合理的计算开销分布，使实时渲染的客户体验更佳。以客户端获取的第一相机姿态信息为基础，通过空间匹配方法，进行相机姿态优化后再进行训练，可以提高样本图像的相机姿态精度，减小匹配错误，提升了训练效果。一些示例性实施中提供了服务器渲染方案，另一些示例性实施例中提供了客户端基于三维模型的渲染方案，充分满足了灵活的渲染需求。It can be seen that according to the new viewpoint image synthesis system architecture provided in the embodiment of the present application, a distributed deployment method is used to combine the computing power advantage of the server for neural radiation field model training, and the shooting convenience and richness of the client for initial sample collection, thereby avoiding the occupation of computing resources that are not powerful on the client itself, and providing a better customer experience for real-time rendering with high stability and reasonable distribution of computing overhead. Based on the first camera posture information obtained by the client, the camera posture is optimized through a spatial matching method before training, which can improve the camera posture accuracy of the sample image, reduce matching errors, and improve the training effect. Some exemplary implementations provide a server rendering solution, and other exemplary embodiments provide a client-side rendering solution based on a three-dimensional model, which fully meets flexible rendering requirements.

本领域普通技术人员可以理解，上文中所公开方法中的全部或某些步骤、系统、装置中的功能模块/单元可以被实施为软件、固件、硬件及其适当的组合。在硬件实施方式中，在以上描述中提及的功能模块/单元之间的划分不一定对应于物理组件的划分；例如，一个物理组件可以具有多个功能，或者一个功能或步骤可以由若干物理组件合作执行。某些组件或所有组件可以被实施为由处理器，如数字信号处理器或微处理器执行的软件，或者被实施为硬件，或者被实施为集成电路，如专用集成电路。这样的软件可以分布在计算机可读介质上，计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的，术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外，本领域普通技术人员公知的是，通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据，并且可包括任何信息递送介质。 It will be appreciated by those skilled in the art that all or some of the steps, systems, and functional modules/units in the methods disclosed above may be implemented as software, firmware, hardware, and appropriate combinations thereof. In hardware implementations, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed by several physical components in cooperation. Some or all components may be implemented as software executed by a processor, such as a digital signal processor or a microprocessor, or implemented as hardware, or implemented as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on a computer-readable medium, which may include a computer storage medium (or non-transitory medium) and a communication medium (or temporary medium). As known to those skilled in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, it is well known to those of ordinary skill in the art that communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims

A new viewpoint image synthesis method, comprising:

Acquire a sample image and first camera posture information corresponding to the sample image;

By using a spatial matching method, the first camera posture information of the sample image is optimized to obtain the second camera posture information corresponding to the sample image;

Training the initial neural radiation field according to the sample image and the second camera posture information to obtain a trained neural radiation field;

The trained neural radiation field is used to perform new viewpoint rendering to obtain a new viewpoint image.

The new viewpoint image synthesis method according to claim 1, wherein:

The obtaining of a sample image and first camera posture information corresponding to the sample image includes:

Acquire an image to be processed, and acquire camera posture information corresponding to the image to be processed by a real-time positioning and map building method;

According to the viewpoint diversity principle, multiple frames of images in the image to be processed are selected as the sample images, and the camera posture information corresponding to the sample images is the first camera posture information.

The new viewpoint image synthesis method according to claim 1, wherein:

The method of optimizing the first camera posture information of the sample image by a spatial matching method to obtain the second camera posture information corresponding to the sample image includes:

Extracting feature points of the sample image;

Establishing an initial map according to the first camera posture information;

According to the initial map, obtaining the spatial distance between the sample images;

Matching the feature points of the sample image whose spatial distance is within the spatial distance threshold to obtain a matching result;

According to the matching result, the first camera posture information is optimized to obtain the second camera posture information.

The new viewpoint image synthesis method according to claim 3, wherein:

The optimizing the first camera posture information according to the matching result to obtain the second camera posture information includes:

The first camera posture information is optimized through bundle adjustment to obtain the second camera posture information.

The new viewpoint image synthesis method according to claim 1, wherein:

The step of training the initial neural radiation field according to the sample image and the second camera posture information to obtain the trained neural radiation field includes:

Acquire a plurality of training samples according to the sample image and the second camera posture information, wherein each training sample is composed of light emitted by a pixel of the sample image and a color corresponding to the pixel;

The initial neural radiation field is trained according to the multiple training samples.

The new viewpoint image synthesis method according to claim 5, wherein the step of obtaining a plurality of training samples according to the sample image and the second camera posture information comprises:

The light emitted by the pixel point is determined according to the second camera posture information corresponding to the sample image where the pixel point is located and the position of the pixel point in the sample image.

The new viewpoint image synthesis method according to claim 1, wherein the performing new viewpoint rendering through the trained neural radiation field to obtain the new viewpoint image comprises:

According to the viewpoint information to be rendered, obtaining the new viewpoint image through the trained neural radiation field;

The viewpoint information to be rendered includes the camera posture of the viewpoint to be rendered and the preview image resolution, and the resolution of the new viewpoint image is not less than the preview image resolution.

Generate a three-dimensional model according to the trained neural radiation field, wherein the three-dimensional model includes a three-dimensional mesh and a texture map;

The three-dimensional model is rendered by a three-dimensional rendering method to obtain the new viewpoint image.

A new viewpoint image synthesis system, comprising:

Client and server;

The client is configured to obtain a sample image and first camera posture information corresponding to the sample image;

The server is configured to optimize the first camera posture information of the sample image through a spatial matching method to obtain the second camera posture information corresponding to the sample image;

The server is also configured to train the initial neural radiation field according to the sample image and the second camera posture information to obtain a trained neural radiation field.

The new viewpoint image synthesis system according to claim 9, wherein:

The client is also configured to obtain viewpoint information to be rendered and send it to the server;

The server is also configured to perform rendering according to the viewpoint information to be rendered using the trained neural radiation field to obtain a new viewpoint image, and send the new viewpoint image to the client.

The new viewpoint image synthesis system according to claim 9 or 10, wherein:

The server is further configured to generate a three-dimensional model according to the trained neural radiation field, and send the three-dimensional model to the client, wherein the three-dimensional model includes a three-dimensional mesh and a texture map;

The client is also configured to render the three-dimensional model by a three-dimensional rendering method to obtain a new viewpoint image.

The new viewpoint image synthesis system according to claim 9, wherein:

The client is further configured to obtain information of a plurality of candidate viewpoints to be rendered; and send the information of the plurality of candidate viewpoints to be rendered and the preview image resolution to the server;

The server is also configured to render multiple candidate new viewpoint images based on multiple candidate viewpoint information to be rendered and preview image resolutions through the trained neural radiation field, and send the multiple candidate new viewpoint images to the client.

The new viewpoint image synthesis system according to claim 12, wherein:

The client is further configured to, in response to a user's selection instruction, determine a viewpoint information to be rendered from the multiple candidate new viewpoint images; and send the viewpoint information to be rendered and the target image resolution to the server;

The server is further configured to render the trained neural radiation field according to the viewpoint information to be rendered and the target image resolution to obtain a new viewpoint image, and send the new viewpoint image to the client;

The target image resolution is greater than or equal to the preview image resolution.

An electronic device comprising:

one or more processors;

a storage device for storing one or more programs,

When the one or more programs are executed by the one or more processors, the one or more processors implement the new viewpoint image synthesis method as described in any one of claims 1 to 8.

A computer storage medium having a computer program stored therein, wherein the computer program is configured to execute the new viewpoint image synthesis method according to any one of claims 1 to 8 when running.