HK40091125A - Method, apparatus, device and medium for generating video based on virtual reality - Google Patents
- Publication number: HK40091125A
- Application number: HK42023080059.1A
- Authority: HK (Hong Kong)
- Prior art keywords: depth, real, camera, depth map, virtual
Description
Technical Field
This application relates to the field of virtual reality, and in particular to a virtual reality-based video generation method, apparatus, device, and medium.
Background
Virtual production refers to computer-aided, visualization-driven filmmaking. It encompasses several approaches, such as visualization, performance capture, hybrid virtual production, and live LED wall in-camera production.
In the related art, an LED (Light-Emitting Diode) wall is set up behind the actors, props, and other real-world foreground elements during filming, and a real-time virtual background is projected onto it. A real camera captures the actors, the props, and the content displayed on the LED wall simultaneously, and the captured footage is fed into a computer, which outputs it in real time.
However, each frame captured by the real camera displays poorly, which in turn degrades the quality of the resulting video.
Summary of the Invention
The embodiments of this application provide a virtual reality-based video generation method, apparatus, device, and medium. The method updates depth information for the virtual background, making the resulting video more natural and improving its display quality. The technical solution is as follows:
According to one aspect of this application, a virtual reality-based video generation method is provided, the method comprising:
obtaining a target video frame from a video frame sequence, the video frame sequence being obtained by a real camera capturing a target scene, the target scene comprising a real foreground and a virtual background, the virtual background being displayed on a physical screen in the real environment;
obtaining a real foreground depth map and a virtual background depth map of the target video frame, the real foreground depth map comprising depth information from the real foreground to the real camera, and the virtual background depth map comprising depth information from the virtual background, after it is mapped into the real environment, to the real camera;
fusing the real foreground depth map and the virtual background depth map to obtain a fused depth map, the fused depth map comprising depth information from each reference point in the target scene to the real camera in the real environment;
adjusting display parameters of the target video frame according to the fused depth map to generate a depth-of-field effect map of the target video frame; and
generating, based on the depth-of-field effect map of the target video frame, a target video having a depth-of-field effect.
According to another aspect of this application, a virtual reality-based video generation apparatus is provided, the apparatus comprising:
an acquisition module, configured to obtain a target video frame from a video frame sequence, the video frame sequence being obtained by a real camera capturing a target scene, the target scene comprising a real foreground and a virtual background, the virtual background being displayed on a physical screen in the real environment;
the acquisition module being further configured to obtain a real foreground depth map and a virtual background depth map of the target video frame, the real foreground depth map comprising depth information from the real foreground to the real camera, and the virtual background depth map comprising depth information from the virtual background, after it is mapped into the real environment, to the real camera;
a fusion module, configured to fuse the real foreground depth map and the virtual background depth map to obtain a fused depth map, the fused depth map comprising depth information from each reference point in the target scene to the real camera in the real environment;
an update module, configured to adjust display parameters of the target video frame according to the fused depth map and generate a depth-of-field effect map of the target video frame;
the update module being further configured to generate, based on the depth-of-field effect map of the target video frame, a target video having a depth-of-field effect.
According to another aspect of this application, a computer device is provided, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the virtual reality-based video generation method described above.
According to another aspect of this application, a computer storage medium is provided, the computer-readable storage medium storing at least one piece of program code, the program code being loaded and executed by a processor to implement the virtual reality-based video generation method described above.
According to another aspect of this application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the virtual reality-based video generation method described above.
The beneficial effects of the technical solutions provided by the embodiments of this application include at least the following:
A real camera captures the target scene to generate a video frame sequence. A target video frame is then obtained from the sequence and its depth information is updated so that the depth information it carries is more accurate, and a target video with a depth-of-field effect is generated based on the target video frame. Because the depth information of the virtual background is more accurate, the video combining the virtual background and the real foreground appears more natural and displays better.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of this application more clearly, the drawings required in the description of the embodiments are briefly introduced below. The drawings described below are clearly only some embodiments of this application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1 shows a schematic diagram of a computer system provided by an exemplary embodiment of this application;
Figure 2 shows a schematic diagram of a virtual reality-based depth-of-field effect map generation method provided by an exemplary embodiment of this application;
Figure 3 shows a flowchart of a virtual reality-based video generation method provided by an exemplary embodiment of this application;
Figure 4 shows an interface diagram of a virtual reality-based video generation method provided by an exemplary embodiment of this application;
Figure 5 shows an interface diagram of a virtual reality-based video generation method provided by an exemplary embodiment of this application;
Figure 6 shows an interface diagram of a virtual reality-based video generation method provided by an exemplary embodiment of this application;
Figure 7 shows an interface diagram of a virtual reality-based video generation method provided by an exemplary embodiment of this application;
Figure 8 shows a flowchart of a virtual reality-based video generation method provided by an exemplary embodiment of this application;
Figure 9 shows a schematic diagram of computing depth information provided by an exemplary embodiment of this application;
Figure 10 shows a schematic diagram of a virtual reality-based depth-of-field effect map generation method provided by an exemplary embodiment of this application;
Figure 11 shows a schematic diagram of virtual reality-based video generation and production provided by an exemplary embodiment of this application;
Figure 12 shows an interface diagram of a virtual reality-based depth-of-field effect map generation method provided by an exemplary embodiment of this application;
Figure 13 shows a schematic diagram of a computer device provided by an exemplary embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the drawings.
Depth of field (DOF): the range in front of and behind a camera's focal point within which imaging is acceptably sharp. In optics, particularly in video recording and photography, it describes the range of distances in space over which a clear image can be formed. A camera lens can only focus light at one fixed distance; images of points away from that distance gradually blur. Within a certain span of distances, however, the blur is imperceptible to the naked eye, and that span is called the depth of field.
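As an illustration of this definition (not part of the patent text), the near and far limits of the acceptably sharp range can be sketched with the standard hyperfocal-distance approximation from thin-lens optics; the function name and the 0.03 mm circle of confusion are illustrative assumptions:

```python
def dof_limits(focal_mm, f_number, focus_mm, coc_mm=0.03):
    """Return (near, far) limits in mm of the acceptably sharp range.

    Standard approximation: hyperfocal H = f^2 / (N * c) + f;
    near = s(H - f) / (H + s - 2f), far = s(H - f) / (H - s).
    A point counts as "sharp" while its blur circle stays below the
    circle of confusion `coc_mm`.
    """
    h = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = focus_mm * (h - focal_mm) / (h + focus_mm - 2 * focal_mm)
    if focus_mm >= h:          # focused at or beyond the hyperfocal distance
        far = float("inf")     # everything out to infinity stays sharp
    else:
        far = focus_mm * (h - focal_mm) / (h - focus_mm)
    return near, far

# A 50 mm lens at f/2.8 focused at 3 m: a narrow band around the subject.
near, far = dof_limits(50, 2.8, 3000)
```

Points whose distance falls inside `(near, far)` render sharply; everything else blurs, which is exactly the effect the method below reproduces for the virtual background.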
Real foreground: physical objects in the real environment, typically including the actors and the surrounding physical set pieces. Whatever is close to the camera serves as the camera's foreground.
Virtual background: a pre-designed virtual environment. Virtual sets generally contain scenes that are hard to build physically as well as fantasy-style scenes; after being rendered by an engine, they are output to an LED wall placed behind the physical set, serving as the camera's background.
YUV: a color encoding method used in video processing components. "Y" denotes luminance (luma), i.e., the grayscale value, while "U" and "V" denote chrominance (chroma), describing color and saturation and specifying the color of a pixel.
Intrinsic and extrinsic parameters: the camera's intrinsic and extrinsic parameters. Intrinsic parameters relate to the camera's own characteristics, such as focal length and pixel size. Extrinsic parameters describe the camera in the world coordinate system, such as its position and rotation. Together, the intrinsic and extrinsic parameters map a point in the world coordinate system to a pixel captured by the camera.
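The world-to-pixel mapping described above can be sketched with a minimal pinhole-camera model; the helper name and the example parameter values are illustrative, not taken from the patent:

```python
def project(world_pt, rotation, translation, fx, fy, cx, cy):
    """Map a 3-D world point to pixel coordinates with a pinhole model.

    `rotation` (3x3 nested list) and `translation` (3-vector) are the
    extrinsic parameters (world -> camera); fx, fy, cx, cy are the
    intrinsics (focal lengths and principal point, in pixels).
    """
    # World -> camera coordinates: Xc = R @ Xw + t
    xc = [sum(rotation[i][j] * world_pt[j] for j in range(3)) + translation[i]
          for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide, then apply the intrinsics.
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    return u, v

# Identity pose: a point 2 m straight ahead lands on the principal point.
u, v = project([0.0, 0.0, 2.0],
               [[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.0, 0.0, 0.0],
               fx=1000, fy=1000, cx=960, cy=540)
```

This is the operation used throughout the method whenever depth values measured by one camera must be attached to the pixels of another.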
It should be noted that the information (including but not limited to user device information and user personal information), data (including but not limited to data used for analysis, stored data, and displayed data), and signals involved in this application are all authorized by the users or fully authorized by all parties, and the collection, use, and processing of the related data comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Virtual production technology is used in the making of video content such as films and television. In the related art, an LED wall is set up at the shooting location to display a virtual background, and a real foreground, meaning real-world objects or living beings, is placed in front of the wall. A real camera captures the real foreground and the virtual background together to obtain the video.
However, in the related art, the real camera can only perceive distances up to the LED wall itself and cannot obtain the depth information of the virtual background, so the virtual background in the resulting video lacks realism and the combination of real foreground and virtual background appears unnatural.
Figure 1 shows a schematic diagram of a computer system provided by an exemplary embodiment of this application. The computer system 100 includes a computer device 110, a physical screen 120, a real camera 130, and a depth camera 140.
A first application for video production is installed on the computer device 110. The first application may be a mini-program inside an app, a dedicated application, or a web client. A second application for generating the virtual world is also installed on the computer device 110; it may likewise be a mini-program inside an app, a dedicated application, or a web client. The first and second applications may be the same application or different applications. If they are the same application, that application provides both the video production and the virtual world generation functions. If they are different applications, data can be exchanged between them. For example, the second application provides the first application with the depth information from each virtual object in the virtual world to the virtual camera, and the first application updates the image of the video frame according to that depth information.
The physical screen 120 is used to display the virtual background. The computer device 110 transmits the data of the virtual world to the physical screen 120, and the physical screen 120 displays the virtual world as the virtual background.
The real camera 130 is used to capture the real foreground 150 and the virtual background displayed on the physical screen 120. The real camera 130 transmits the captured video to the computer device 110, either in real time or at preset intervals.
The depth camera 140 is used to obtain a depth map of the real foreground 150. The depth camera 140 and the real camera 130 are installed at different positions. The depth camera 140 transmits the captured depth map of the real foreground 150 to the computer device 110, which determines the depth information of the real foreground 150 from the depth map.
Figure 2 shows a schematic diagram of a virtual reality-based depth-of-field effect map generation method provided by an exemplary embodiment of this application. The method can be executed by the computer system 100 shown in Figure 1.
As shown in Figure 2, the real camera 210 captures the target scene to obtain a target video frame 220. The depth camera 250 also captures the target scene to obtain a depth map. Through pixel matching, pixels in the target video frame 220 are matched with pixels in the depth map, and the depth information of the depth map is assigned to the target video frame 220, yielding a real foreground depth map 260. Meanwhile, the virtual camera 230 obtains the depth information of the render target corresponding to the virtual background, yielding a virtual background depth map 240. The depth information of the real foreground depth map 260 and the virtual background depth map 240 is fused to obtain a fused depth map 270. Based on the fused depth map 270, a depth-of-field effect is added to the target video frame 220 to obtain a depth-of-field effect map 280.
Figure 3 shows a flowchart of a virtual reality-based video generation method provided by an exemplary embodiment of this application. The method can be executed by the computer system 100 shown in Figure 1 and includes:
Step 302: obtain a target video frame from a video frame sequence, where the video frame sequence is obtained by a real camera capturing a target scene, the target scene includes a real foreground and a virtual background, and the virtual background is displayed on a physical screen in the real environment.
Optionally, the real camera transmits the captured video frame sequence to the computer device.
The target video frame is any frame in the video frame sequence. For example, the video frame sequence includes 120 frames, and the 45th frame is taken as the target video frame. Optionally, the video frame sequence is played at a preset frame rate to generate the video. For example, every 24 frames of the sequence constitute one second of video.
Optionally, the real foreground includes at least one of a real object, a real creature, or a real person. For example, as shown in Figure 4, the real foreground 401 is an actor on the video shooting set.
The virtual background refers to the virtual content displayed on the screen. Optionally, the virtual content includes at least one of a virtual environment, a virtual character, a virtual object, a virtual prop, or a virtual image; the embodiments of this application do not specifically limit what the virtual background displays. For example, as shown in Figure 4, the virtual background 402 is displayed on the screen 403 and is a virtual image of a city.
Step 304: obtain a real foreground depth map and a virtual background depth map of the target video frame, where the real foreground depth map includes depth information from the real foreground to the real camera, and the virtual background depth map includes depth information from the virtual background, after it is mapped into the real environment, to the real camera.
Optionally, the real foreground depth map is determined from the depth map provided by the depth camera together with the target video frame. For example, the depth information of each pixel in the target video frame is obtained from the depth map provided by the depth camera; this depth information represents the distance from the real reference point corresponding to each pixel to the real camera. When the depth value of a first pixel is greater than a first depth threshold, the first pixel is determined to belong to the virtual background; when the depth value of a second pixel is less than a second depth threshold, the second pixel is determined to belong to the real foreground. The first depth threshold is not smaller than the second depth threshold, and both thresholds can be set by technicians as needed.
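The threshold test above can be sketched as follows; this assumes the near side of the scene is classified by falling below the second threshold, and all names and threshold values are illustrative:

```python
def classify_pixels(depth_map, first_threshold, second_threshold):
    """Label each pixel 'background' or 'foreground' by its depth.

    Depths beyond `first_threshold` are treated as virtual background
    (at or behind the LED wall); depths below `second_threshold` are
    treated as real foreground. Since first_threshold >= second_threshold,
    pixels whose depth falls between the two stay unlabeled (None).
    """
    labels = []
    for row in depth_map:
        out = []
        for d in row:
            if d > first_threshold:
                out.append("background")
            elif d < second_threshold:
                out.append("foreground")
            else:
                out.append(None)
        labels.append(out)
    return labels

# Wall at ~6 m, actor at ~2 m; thresholds at 5 m and 4 m.
labels = classify_pixels([[6.1, 2.0], [4.5, 6.3]], 5.0, 4.0)
```

The resulting label map is what later steps use to decide which pixels take their depth from the depth camera and which from the virtual render target.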
Optionally, the virtual background depth map is generated through a virtual camera, which is used to capture, in the virtual environment, the render target corresponding to the virtual background. For example, the computer device obtains the distance from the render target to the virtual camera and converts that distance into a real-world distance, yielding the virtual background depth map.
For example, Figure 5 shows a real foreground depth map. The real foreground depth map includes depth information from the real foreground to the real camera, and it does not include depth information from the virtual background, mapped into the real environment, to the real camera.
For example, Figure 6 shows a virtual background depth map. The virtual background depth map includes depth information from the virtual background, mapped into the real environment, to the real camera, and it does not include depth information from the real foreground to the real camera.
Step 306: fuse the real foreground depth map and the virtual background depth map to obtain a fused depth map, where the fused depth map includes depth information from each reference point in the target scene to the real camera in the real environment.
Optionally, the first depth information of each pixel in the real foreground depth map is updated according to the second depth information of each pixel in the virtual background depth map to obtain the fused depth map.
The real foreground depth map includes a foreground region corresponding to the real foreground and a background region corresponding to the virtual background. The embodiments of this application may update the first depth information of pixels in the background region, and may also update the first depth information of pixels in the foreground region.
For example, the first depth information of first pixels belonging to the background region in the real foreground depth map is updated according to the second depth information of second pixels belonging to the background region in the virtual background depth map, yielding the fused depth map.
For example, the third depth information of third pixels belonging to the foreground region in the real foreground depth map is updated, the third pixels being the pixels corresponding to a target object.
For example, as shown in Figure 7, the fused depth map includes the depth information of the real foreground 701 and the virtual background 702. Comparing Figures 5 and 7, the fused depth map additionally provides the depth information of the virtual background 702 that the real foreground depth map lacks.
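The per-pixel fusion of the two depth maps might be sketched like this, assuming background pixels are identified by a mask and simply take the virtual depth (already converted to real-world units); the names and values are illustrative:

```python
def fuse_depth(foreground_depth, virtual_depth, background_mask):
    """Fuse per-pixel depth maps of the same resolution.

    Foreground pixels keep the depth measured by the depth camera;
    background pixels (mask True) substitute the virtual background's
    depth, which has already been mapped into real-world units.
    """
    fused = []
    for fg_row, vb_row, mask_row in zip(foreground_depth, virtual_depth,
                                        background_mask):
        fused.append([vb if is_bg else fg
                      for fg, vb, is_bg in zip(fg_row, vb_row, mask_row)])
    return fused

# An actor at 2 m keeps the measured depth; the LED-wall pixel is
# replaced by the depth of a distant virtual building (120 m).
fused = fuse_depth([[2.0, 6.0]], [[0.0, 120.0]], [[False, True]])
```

After this substitution, background pixels carry the depth of the virtual scene rather than the physical distance to the LED wall, which is what makes the later depth-of-field pass behave as if the virtual scenery were physically present.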
Step 308: adjust the display parameters of the target video frame according to the fused depth map to generate a depth-of-field effect map of the target video frame.
Optionally, the display parameters include at least one of sharpness, brightness, grayscale, contrast, or saturation, and they can be adjusted according to the technicians' actual needs. For example, the brightness of the pixels corresponding to the virtual background in the fused depth map is increased as required.
Optionally, a distance interval is determined according to the preset aperture or preset focal length of the real camera; the distance interval represents the range of distances to the real camera within which the corresponding reference points yield pixels with a sharpness above a sharpness threshold. According to the fused depth map and the distance interval, the sharpness of each pixel in the target video frame is adjusted to generate the depth-of-field effect map of the target scene.
Optionally, according to the focus distance of the real camera and the fused depth map, the sharpness of the region corresponding to the virtual background in the target video frame is adjusted to generate the depth-of-field effect map of the target scene.
Optionally, the sharpness of each pixel in the target video frame is adjusted according to preset conditions determined by technicians based on actual needs. For example, the sharpness of pixels within a preset region of the target video frame is adjusted; the preset region can be set by technicians.
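One way the depth-dependent sharpness adjustment could be derived, assuming the standard thin-lens circle-of-confusion formula stands in for the patent's unspecified sharpness computation; all names and numbers are illustrative:

```python
def blur_radius(depth, focus_dist, focal, f_number):
    """Circle-of-confusion diameter (same length unit as the inputs)
    for a point at `depth` when the lens is focused at `focus_dist`.

    Standard thin-lens formula:
    c = (f / N) * f * |depth - focus| / (depth * (focus - f)).
    """
    aperture = focal / f_number
    return (aperture * focal * abs(depth - focus_dist)
            / (depth * (focus_dist - focal)))

def sharpness_weight(depth, focus_dist, focal, f_number, coc_threshold):
    """1.0 for in-focus pixels, decaying toward 0 as the blur grows."""
    c = blur_radius(depth, focus_dist, focal, f_number)
    return 1.0 if c <= coc_threshold else coc_threshold / c

# An in-focus pixel keeps full sharpness; a far background pixel from
# the fused depth map is softened (units: mm).
w_focus = sharpness_weight(3000, 3000, 50, 2.8, 0.03)    # depth == focus
w_far = sharpness_weight(120000, 3000, 50, 2.8, 0.03)    # distant virtual scenery
```

Because the background pixels now carry the fused (virtual) depth rather than the LED-wall distance, distant virtual scenery is blurred exactly as a physically distant object would be.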
Step 310: generate a target video with a depth-of-field effect based on the depth-of-field effect map of the target video frame.
Optionally, the depth-of-field effect maps of the target video frames are played at a preset frame rate to obtain the target video with the depth-of-field effect.
For example, there are at least two target video frames, and their depth-of-field effect maps are arranged in chronological order to obtain the target video with the depth-of-field effect. Optionally, the depth-of-field effect maps of consecutive target video frames are arranged in chronological order to generate the target video with the depth-of-field effect.
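The chronological assembly step can be sketched as follows; the pairing of timestamps with effect maps and the helper name are illustrative assumptions:

```python
def assemble_video(effect_frames, fps=24):
    """Order depth-of-field effect maps chronologically for playback.

    `effect_frames` is a list of (timestamp, frame) pairs; the frames,
    played back in order at the preset rate `fps`, form the target
    video, whose duration in seconds is also reported.
    """
    ordered = [frame for _, frame in sorted(effect_frames,
                                            key=lambda pair: pair[0])]
    duration_s = len(ordered) / fps
    return ordered, duration_s

# Three effect maps arriving out of order are sorted by timestamp.
frames = [(2, "frame_c"), (0, "frame_a"), (1, "frame_b")]
ordered, duration = assemble_video(frames, fps=24)
```

At the 24 fps rate mentioned earlier, 24 effect maps would yield one second of the target video.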
In summary, in the embodiments of this application, a real camera captures the target scene to generate a video frame sequence. A target video frame is then obtained from the sequence and its depth information is updated so that the depth information it carries is more accurate, and a target video with a depth-of-field effect is generated based on the target video frame. Because the depth information of the virtual background is more accurate, the video combining the virtual background and the real foreground appears more natural and displays better.
In addition, the depth-of-field effect map is generated using the real camera's parameters, such as its focus distance, focal length, and aperture. The virtual background in the depth-of-field effect map can therefore simulate the imaging of a real camera, so the virtual background is displayed more naturally and looks closer to a real object.
In the following embodiments, the depth information of the background region in the real foreground depth map is updated to make it more accurate. The real foreground depth map can be obtained in two alternative ways: by setting up a depth camera, or by setting up an auxiliary camera. Moreover, the sharpness of the virtual background's pixels can be updated using the real camera's parameters, for example its focus distance, aperture, and focal length.
图8示出了本申请一个示例性实施例提供的基于虚拟现实的视频生成方法的流程示意图。方法可由图1所示的计算机系统100执行,该方法包括:Figure 8 illustrates a flowchart of a virtual reality-based video generation method provided in an exemplary embodiment of this application. The method can be executed by the computer system 100 shown in Figure 1, and includes:
步骤801:从视频帧序列中获取目标视频帧。Step 801: Obtain the target video frame from the video frame sequence.
可选地,现实摄像机将拍摄到的视频帧序列传输给计算机设备。Optionally, the real-world camera transmits the captured video frame sequence to the computer device.
目标视频帧是视频帧序列中任意一帧的图像。示例性的,视频帧序列包括120帧图像,随机取其中的第45帧图像作为目标视频帧。可选地,按照预设帧率播放视频帧序列,生成具有景深效果的目标视频。示例性的,在视频帧序列中每24帧图像构成1秒的视频。The target video frame is any image in the video frame sequence. For example, the video frame sequence includes 120 frames, and the 45th frame is randomly selected as the target video frame. Optionally, the video frame sequence is played at a preset frame rate to generate a target video with a depth-of-field effect. For example, every 24 frames in the video frame sequence constitute one second of video.
可选地,现实前景包括现实物体、现实生物、真人中的至少一种。Optionally, the realistic foreground includes at least one of a real object, a real creature, or a real person.
虚拟背景指显示在显示屏上的虚拟内容。可选地,虚拟内容包括虚拟环境、虚拟人物、虚拟物体、虚拟道具、虚拟影像中的至少一种。本申请实施例对虚拟背景的显示内容不做具体限定。A virtual background refers to virtual content displayed on a screen. Optionally, virtual content includes at least one of virtual environments, virtual characters, virtual objects, virtual props, and virtual images. This application does not specifically limit the content displayed as a virtual background.
步骤802:获取目标视频帧的现实前景深度图。Step 802: Obtain the real foreground depth map of the target video frame.
在一种可选的实现方式中,通过深度摄像机获取现实前景的深度信息,该方法可包括以下步骤:In one alternative implementation, depth information of the real foreground is acquired using a depth camera, and the method may include the following steps:
1、根据深度摄像机的内外参数和现实摄像机的内外参数,生成深度摄像机与现实摄像机之间的空间偏移信息。1. Generate spatial offset information between the depth camera and the real camera based on the intrinsic and extrinsic parameters of the depth camera and the real camera.
可选地,深度摄像机的内外参数包括内参数和外参数。内参数是与深度摄像机自身特性相关的参数,内参数包括焦距、像素大小等。外参数是深度摄像机在世界坐标系中的参数,外参数包括相机的位置、旋转方向等。Optionally, the intrinsic and extrinsic parameters of the depth camera include intrinsic parameters and extrinsic parameters. Intrinsic parameters are parameters related to the characteristics of the depth camera itself, such as focal length and pixel size. Extrinsic parameters are parameters of the depth camera in the world coordinate system, such as camera position and rotation direction.
可选地,现实摄像机的内外参数包括内参数和外参数。内参数是与现实摄像机自身特性相关的参数,内参数包括焦距、像素大小等。外参数是现实摄像机在世界坐标系中的参数,外参数包括相机的位置、旋转方向等。Optionally, the intrinsic and extrinsic parameters of the real-world camera include intrinsic parameters and extrinsic parameters. Intrinsic parameters are those related to the characteristics of the real-world camera itself, such as focal length and pixel size. Extrinsic parameters are those of the real-world camera in the world coordinate system, such as the camera's position and rotation direction.
在一种可选的实施方式中,空间偏移信息指深度摄像机的相机坐标系与现实摄像机的相机坐标系之间的映射关系。示例性的,根据深度摄像机的内外参数,确定深度摄像机的深度映射关系,深度映射关系指深度摄像机的相机坐标系到现实坐标系的映射关系;根据现实摄像机的内外参数,确定现实摄像机的现实映射关系,现实映射关系指现实摄像机的相机坐标系到现实坐标系的映射关系;通过现实坐标系,确定深度摄像机的相机坐标系与现实摄像机的相机坐标系之间的映射关系,得到深度摄像机与现实摄像机之间的空间偏移信息。In one optional implementation, the spatial offset information refers to the mapping relationship between the camera coordinate system of the depth camera and the camera coordinate system of the real-world camera. For example, based on the intrinsic and extrinsic parameters of the depth camera, the depth mapping relationship of the depth camera is determined, which refers to the mapping relationship from the camera coordinate system of the depth camera to the real-world coordinate system; based on the intrinsic and extrinsic parameters of the real-world camera, the real-world mapping relationship of the real-world camera is determined, which refers to the mapping relationship from the camera coordinate system of the real-world camera to the real-world coordinate system; through the real-world coordinate system, the mapping relationship between the camera coordinate system of the depth camera and the camera coordinate system of the real-world camera is determined, thus obtaining the spatial offset information between the depth camera and the real-world camera.
可选地,深度摄像机和现实摄像机拍摄目标场景的角度不同。可选地,深度摄像机和现实摄像机的设置位置不同。Optionally, the depth camera and the reality camera may capture the target scene from different angles. Optionally, the depth camera and the reality camera may be positioned differently.
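第1步中由两台摄像机的外参组合得到空间偏移信息,可以用如下代码示意(假设外参以相机坐标系到现实坐标系的4×4齐次矩阵给出,矩阵数值仅为假设):Step 1, composing the extrinsics of the two cameras to obtain the spatial offset information, can be sketched as follows (assuming the extrinsics are given as 4×4 homogeneous matrices from camera coordinates to the real-world coordinate system; the matrix values are hypothetical):

```python
def mat4_mul(a, b):
    # 4x4 matrix product, plain Python lists.
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def rigid_inverse(t):
    # Inverse of a rigid transform [R|t]: [R^T | -R^T t].
    r = [[t[j][i] for j in range(3)] for i in range(3)]
    tr = [-sum(r[i][j] * t[j][3] for j in range(3)) for i in range(3)]
    return [r[0] + [tr[0]], r[1] + [tr[1]], r[2] + [tr[2]], [0.0, 0.0, 0.0, 1.0]]

# Hypothetical extrinsics (camera-to-world): the depth camera sits 1 m to the
# right of the real-world camera; both share the same orientation.
T_world_from_depth = [[1, 0, 0, 1.0], [0, 1, 0, 0.0], [0, 0, 1, 0.0], [0, 0, 0, 1]]
T_world_from_real = [[1, 0, 0, 0.0], [0, 1, 0, 0.0], [0, 0, 1, 0.0], [0, 0, 0, 1]]

# Spatial offset information: maps depth-camera coordinates to real-camera coordinates.
T_real_from_depth = mat4_mul(rigid_inverse(T_world_from_real), T_world_from_depth)

def transform(t, p):
    x, y, z = p
    return tuple(t[i][0] * x + t[i][1] * y + t[i][2] * z + t[i][3] for i in range(3))

# A point 5 m in front of the depth camera, expressed in real-camera coordinates.
print(transform(T_real_from_depth, (0.0, 0.0, 5.0)))  # (1.0, 0.0, 5.0)
```

两台摄像机的角度与位置不同时,旋转部分同样由该矩阵组合给出。When the two cameras differ in angle as well as position, the rotation part is likewise carried by this matrix composition.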
2、获取深度摄像机采集的深度图。2. Obtain the depth map captured by the depth camera.
可选地,深度摄像机将采集的深度图通过有线连接或无线连接传输给计算机设备。Optionally, the depth camera transmits the acquired depth map to a computer device via a wired or wireless connection.
3、根据空间偏移信息,将深度图的深度信息映射到目标视频帧上,得到现实前景深度图。3. Based on the spatial offset information, the depth information of the depth map is mapped onto the target video frame to obtain the real foreground depth map.
由于空间偏移信息指深度摄像机的相机坐标系与现实摄像机的相机坐标系之间的映射关系。因此,可以得到深度图和目标视频帧上的像素点之间的对应关系;根据该对应关系,将深度图上各个像素点的深度信息映射到目标视频帧上的各个像素点上,得到现实前景深度图。Since spatial offset information refers to the mapping relationship between the camera coordinate system of the depth camera and the camera coordinate system of the real camera, the correspondence between the depth map and the pixels on the target video frame can be obtained. Based on this correspondence, the depth information of each pixel on the depth map is mapped to each pixel on the target video frame to obtain the real foreground depth map.
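第3步的逐像素映射可以用如下代码示意(仅为一个最小示意:内参以fx、fy、cx、cy表示,空间偏移简化为纯平移,函数名与数值均为假设):The per-pixel mapping of step 3 can be sketched as follows (a minimal illustration only: intrinsics are written as fx, fy, cx, cy, the spatial offset is simplified to a pure translation, and the function names and numbers are hypothetical):

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    # Pixel + depth -> 3-D point in the depth camera's coordinate system.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)

def project(p, fx, fy, cx, cy):
    # 3-D point in the real camera's coordinate system -> pixel coordinate.
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)

def remap_depth(depth_map, offset, k_depth, k_real):
    """Map each depth-camera measurement onto the real camera's image plane.

    depth_map: {(u, v): depth}; offset: translation (tx, ty, tz) from the
    depth camera to the real camera (a pure-translation stand-in for the
    full spatial-offset transform of step 1).
    """
    tx, ty, tz = offset
    out = {}
    for (u, v), d in depth_map.items():
        x, y, z = backproject(u, v, d, *k_depth)
        p_real = (x + tx, y + ty, z + tz)
        uu, vv = project(p_real, *k_real)
        out[(round(uu), round(vv))] = p_real[2]  # depth as seen by the real camera
    return out

k = (100.0, 100.0, 64.0, 64.0)          # fx, fy, cx, cy (hypothetical)
sparse = {(64, 64): 5.0}                 # one pixel on the optical axis, 5 m away
print(remap_depth(sparse, (0.0, 0.0, 0.0), k, k))  # identical cameras -> same pixel
```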
在一种可选的实现方式中,通过另一个参考摄像机获取现实前景的深度信息,该方法可包括以下步骤:In one alternative implementation, depth information of the real foreground is acquired via another reference camera, and the method may include the following steps:
1、根据参考摄像机的内外参数,获取第一映射关系,第一映射关系用于表示参考摄像机的摄像机坐标系和现实坐标系之间的映射关系,参考摄像机用于从第二角度拍摄目标场景,第二角度与第一角度不同。1. Based on the intrinsic and extrinsic parameters of the reference camera, obtain the first mapping relationship. The first mapping relationship is used to represent the mapping relationship between the camera coordinate system of the reference camera and the real coordinate system. The reference camera is used to capture the target scene from a second angle, which is different from the first angle.
可选地,参考摄像机的内外参数包括内参数和外参数。内参数是与参考摄像机自身特性相关的参数,内参数包括焦距、像素大小等。外参数是参考摄像机在世界坐标系中的参数,外参数包括相机的位置、旋转方向等。Optionally, the intrinsic and extrinsic parameters of the reference camera include intrinsic parameters and extrinsic parameters. Intrinsic parameters are parameters related to the characteristics of the reference camera itself, such as focal length and pixel size. Extrinsic parameters are parameters of the reference camera in the world coordinate system, such as camera position and rotation direction.
可选地,第一映射关系还用于表示参考摄像机的拍摄的参考图像上的像素点与现实点的位置对应关系。例如,参考图像上像素点A的坐标是(x1,x2),第一映射关系满足函数关系f,函数关系f是根据参考摄像机的内外参数生成的,而在现实环境中与像素点A对应的现实点是(y1,y2,y3)=f(x1,x2)。Optionally, the first mapping relationship is also used to represent the positional correspondence between pixels on a reference image captured by a reference camera and real-world points. For example, the coordinates of pixel A on the reference image are (x1, x2), the first mapping relationship satisfies the function f, which is generated based on the intrinsic and extrinsic parameters of the reference camera, while the real-world point corresponding to pixel A in the real environment is (y1, y2, y3) = f(x1, x2).
2、根据现实摄像机的内外参数,获取第二映射关系,第二映射关系用于表示现实摄像机的摄像机坐标系和现实坐标系之间的映射关系。2. Based on the intrinsic and extrinsic parameters of the real camera, obtain the second mapping relationship, which is used to represent the mapping relationship between the camera coordinate system and the real coordinate system.
可选地,第二映射关系还用于表示现实摄像机拍摄的目标视频帧上的像素点与现实点的位置对应关系。例如,目标视频帧上像素点B的坐标是(x3,x4),第二映射关系满足函数关系g,函数关系g是根据现实摄像机的内外参数生成的,而在现实环境中与像素点B对应的现实点是(y4,y5,y6)=g(x3,x4)。Optionally, the second mapping relationship is also used to represent the positional correspondence between pixels on the target video frame captured by the real-world camera and real-world points. For example, if the coordinates of pixel B on the target video frame are (x3, x4), the second mapping relationship satisfies a function g, which is generated based on the intrinsic and extrinsic parameters of the real-world camera, and the real-world point corresponding to pixel B in the real environment is (y4, y5, y6) = g(x3, x4).
3、根据第一映射关系对参考摄像机拍摄的参考图像进行重构,得到重构参考图像。3. Reconstruct the reference image captured by the reference camera according to the first mapping relationship to obtain the reconstructed reference image.
可选地,根据第一映射关系将参考图像上的各个像素点映射到现实环境中,得到重构参考图像。其中,重构参考图像包括参考图像中的各个像素点对应的参考点在现实环境中的位置。Optionally, each pixel in the reference image is mapped to the real-world environment according to the first mapping relationship to obtain a reconstructed reference image. The reconstructed reference image includes the positions of reference points corresponding to each pixel in the reference image in the real-world environment.
4、根据第二映射关系对现实摄像机拍摄的目标视频帧进行重构,得到重构目标场景图像。4. Reconstruct the target video frames captured by the real camera according to the second mapping relationship to obtain the reconstructed target scene image.
可选地,根据第二映射关系将目标视频帧上的各个像素点映射到现实环境中,得到重构目标场景图像。其中,重构目标场景图像包括目标视频帧中的各个像素点对应的参考点在现实环境中的位置。Optionally, each pixel in the target video frame is mapped to the real-world environment according to the second mapping relationship to obtain a reconstructed target scene image. The reconstructed target scene image includes the positions of reference points corresponding to each pixel in the target video frame in the real-world environment.
5、根据重构参考图像和重构目标场景图像之间的视差,确定目标视频帧内各个像素点的深度信息,得到现实前景深度图。5. Based on the parallax between the reconstructed reference image and the reconstructed target scene image, determine the depth information of each pixel within the target video frame to obtain the real foreground depth map.
可选地,为方便计算视差,将重构参考图像和重构目标场景图像映射到同一平面上,确定重构参考图像和重构目标场景图像上对应同一个现实点的两个像素点;根据前述两个像素点的视差确定目标视频帧内各个像素点的深度信息,得到现实前景深度图。Optionally, to facilitate the calculation of disparity, the reconstructed reference image and the reconstructed target scene image are mapped onto the same plane, and two pixels corresponding to the same real point on the reconstructed reference image and the reconstructed target scene image are determined; the depth information of each pixel in the target video frame is determined based on the disparity of the aforementioned two pixels, and a real foreground depth map is obtained.
示例性的,如图9所示,假设现实摄像机和参考摄像机位于同一平面上,预先将重构参考图像和重构目标场景图像映射到X轴所在平面上,使得参考点903到目标视频帧所在平面的距离与参考点903到参考图像所在平面的距离相同。参考点903通过现实摄像机的中心点904在目标视频帧上形成像素点901,参考点903通过参考摄像机的中心点905在参考图像上形成像素点902。在图9中,f是现实摄像机和参考摄像机的焦距,现实摄像机和参考摄像机的焦距相同。z是参考点903到现实摄像机的距离,即参考点903到现实摄像机的深度信息。x是参考点903到Z轴的距离。x1是像素点901在目标视频帧上的位置。xr是像素点902在参考图像上的位置。b是现实摄像机的中心点904与参考摄像机的中心点905之间的距离,即基线。则通过图9中的相似三角形可得到以下等式:(b-(x1-xr))/(z-f)=b/z。For example, as shown in Figure 9, assuming the real-world camera and the reference camera are located on the same plane, the reconstructed reference image and the reconstructed target scene image are pre-mapped onto the plane containing the X-axis, so that the distance from reference point 903 to the plane of the target video frame is the same as the distance from reference point 903 to the plane of the reference image. Reference point 903 forms pixel 901 on the target video frame through the center point 904 of the real-world camera, and forms pixel 902 on the reference image through the center point 905 of the reference camera. In Figure 9, f is the focal length of the real-world camera and the reference camera, which is the same for both. z is the distance from reference point 903 to the real-world camera, that is, the depth information of reference point 903. x is the distance from reference point 903 to the Z-axis. x1 is the position of pixel 901 on the target video frame. xr is the position of pixel 902 on the reference image. b is the distance between center point 904 of the real-world camera and center point 905 of the reference camera, that is, the baseline. Then, from the similar triangles in Figure 9, the following equation can be obtained: (b-(x1-xr))/(z-f) = b/z.
因此,根据上述等式可得到z=f*b/(x1-xr)。其中,(x1-xr)即为视差。Therefore, according to the above equation, we can obtain z = f*b/(x1-xr). Where (x1-xr) is the parallax.
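上述由视差求深度的关系可以用代码示意(函数名与数值均为假设):The depth-from-disparity relationship above can be sketched in code (the function name and numbers are hypothetical):

```python
def depth_from_disparity(f, b, x_l, x_r):
    """z = f * b / (x_l - x_r): depth from the disparity of two rectified views.

    f: focal length in pixels, b: baseline (distance between the two camera
    centres), x_l / x_r: x-coordinates of the same reference point in the
    target video frame and in the reference image.
    """
    disparity = x_l - x_r
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return f * b / disparity

# Hypothetical numbers: 800-pixel focal length, 0.2 m baseline, 8-pixel disparity.
print(depth_from_disparity(800.0, 0.2, 412.0, 404.0))  # 20.0 metres
```

视差越大,参考点距摄像机越近,这与上式一致。The larger the disparity, the closer the reference point is to the cameras, consistent with the equation above.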
需要说明的是,本申请实施例对获取目标视频帧的现实前景深度图的方法不做具体限定。除上述两种可选方式,技术人员可根据实际需求选择其他方式来获取目标视频帧的现实前景深度图,这里不再赘述。It should be noted that the embodiments of this application do not specifically limit the method for obtaining the real foreground depth map of the target video frame. In addition to the two optional methods mentioned above, those skilled in the art can choose other methods to obtain the real foreground depth map of the target video frame according to actual needs, which will not be elaborated here.
步骤803:获取目标视频帧的虚拟背景深度图。Step 803: Obtain the virtual background depth map of the target video frame.
在一种可选的实现方式中,通过虚拟摄像机获取到虚拟背景深度图,该方法可包括以下步骤:In one alternative implementation, a virtual background depth map is acquired using a virtual camera. This method may include the following steps:
1、获取虚拟环境中与虚拟背景对应的渲染目标。1. Obtain the rendering target corresponding to the virtual background in the virtual environment.
可选地,根据现实摄像机的拍摄角度和位置,确定现实摄像机拍摄到的物理屏幕区域;根据物理屏幕区域确定物理屏幕区域上的显示内容;根据前述显示内容,获取虚拟环境中的渲染目标。示例性的,若物理屏幕的尺寸是30×4(m×m),而现实摄像机设置在距物理屏幕30米处的位置,现实摄像机与物理屏幕之间的夹角为90度,确定现实摄像机拍摄到了物理屏幕的一部分,这部分物理屏幕的大小是20×3(m×m)。Optionally, the physical screen area captured by the real camera is determined based on the shooting angle and position of the real camera; the display content on the physical screen area is determined based on the physical screen area; and the rendering target in the virtual environment is obtained based on the aforementioned display content. For example, if the size of the physical screen is 30×4 (m×m), and the real camera is set at a position 30 meters away from the physical screen, with an angle of 90 degrees between the real camera and the physical screen, it is determined that the real camera has captured a portion of the physical screen, and the size of this portion of the physical screen is 20×3 (m×m).
2、生成虚拟环境中的渲染目标的渲染目标深度图,渲染目标深度图包括虚拟深度信息,虚拟深度信息用于表示在虚拟环境中渲染目标到虚拟摄像机的距离。2. Generate a rendering target depth map of the rendering target in the virtual environment. The rendering target depth map includes virtual depth information, which is used to represent the distance from the rendering target to the virtual camera in the virtual environment.
示例性的,计算机设备存储有虚拟环境的各项数据,其中,该数据包括虚拟摄像机到虚拟环境中的各个点的距离。在确定渲染目标后,可直接确定渲染目标到虚拟摄像机的距离,得到渲染目标深度图。For example, a computer device stores various data about a virtual environment, including the distances from a virtual camera to various points in the virtual environment. After determining the rendering target, the distance from the rendering target to the virtual camera can be directly determined to obtain a depth map of the rendering target.
3、将渲染目标深度图中的虚拟深度信息转化为现实深度信息,得到虚拟背景深度图,现实深度信息用于表示被映射到现实环境中的渲染目标到现实摄像机的距离。3. Convert the virtual depth information in the rendered target depth map into real depth information to obtain a virtual background depth map. The real depth information is used to represent the distance from the rendered target mapped to the real environment to the real camera.
可选地,确定虚拟摄像机在虚拟环境中的第一位置;确定现实摄像机在现实环境中的第二位置;根据第一位置和第二位置之间的位置关系将渲染目标深度图中的虚拟深度信息转化为现实深度信息。示例性的,虚拟摄像机在虚拟环境中位于位置A,现实摄像机在现实环境中位于位置B,此时,虚拟环境中的点1到虚拟摄像机的距离为x,设被映射到现实环境中的渲染目标到现实摄像机的距离为y,则有y=f(x),f表示函数关系。Optionally, a first position of the virtual camera in the virtual environment is determined; a second position of the real-world camera in the real environment is determined; and the virtual depth information in the rendering target depth map is converted into real depth information based on the positional relationship between the first and second positions. For example, the virtual camera is located at position A in the virtual environment, and the real-world camera is located at position B in the real environment. In this case, the distance from point 1 in the virtual environment to the virtual camera is x. Let the distance from the rendering target mapped to the real environment to the real-world camera be y, then y = f(x), where f represents a functional relationship.
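第3步中虚拟深度到现实深度的转化y=f(x),最简单的一种形式可示意如下(线性模型仅为假设,实际的f由两台摄像机的位置关系决定,数值均为假设):One simplest form of the virtual-to-real depth conversion y = f(x) in step 3 can be sketched as follows (the linear model is only an assumption; the actual f is determined by the positional relationship of the two cameras, and all numbers are hypothetical):

```python
def virtual_to_real_depth(virtual_depth, scale, axial_offset):
    """One simple form of the mapping y = f(x): a scale factor accounting for
    virtual-world units versus metres, plus an offset along the viewing axis
    for the displacement between the virtual and the real camera positions."""
    return scale * virtual_depth + axial_offset

# Hypothetical: 1 virtual unit = 0.5 m, and the real camera sits 30 m in
# front of the point where the virtual camera is anchored.
print(virtual_to_real_depth(40.0, 0.5, 30.0))  # 50.0
```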
可选地,渲染目标深度图是以虚拟摄像机为视角得到的图像,而虚拟背景深度图是以现实摄像机为视角的图像,此时,需要对渲染目标深度图中的像素点做坐标变换。示例性的,确定渲染目标深度图的像素点在物理屏幕上的第一位置坐标;根据第一位置坐标和现实摄像机的内外参数,将第一位置坐标映射为第二位置坐标,第二位置坐标是虚拟背景深度图上的位置坐标;将前述像素点的深度信息填入到第二位置坐标对应的像素点上,得到虚拟背景深度图。Optionally, the target depth map is an image obtained from the perspective of a virtual camera, while the virtual background depth map is an image obtained from the perspective of a real camera. In this case, coordinate transformation of the pixels in the target depth map is required. For example, the first position coordinates of the pixels in the target depth map on the physical screen are determined; based on the first position coordinates and the intrinsic and extrinsic parameters of the real camera, the first position coordinates are mapped to second position coordinates, which are position coordinates on the virtual background depth map; the depth information of the aforementioned pixels is filled into the pixels corresponding to the second position coordinates to obtain the virtual background depth map.
需要说明的是,步骤802和步骤803不存在先后顺序,可以同时执行步骤802和步骤803,也可以先执行步骤802,后执行步骤803,还可以先执行步骤803,后执行步骤802。It should be noted that steps 802 and 803 are not in any particular order. Steps 802 and 803 can be executed simultaneously, or step 802 can be executed first and then step 803, or step 803 can be executed first and then step 802.
步骤804:在虚拟背景深度图中,确定与现实前景深度图中属于背景区域的第i个第一像素点对应的第j个第二像素点。Step 804: In the virtual background depth map, determine the j-th second pixel point that corresponds to the i-th first pixel point belonging to the background region in the real foreground depth map.
其中,i,j为正整数,i的初始值可以为任意整数。Where i and j are positive integers, and the initial value of i can be any integer.
可选地,根据现实摄像机的内外参数,确定背景区域的第i个第一像素点在物理屏幕上的屏幕坐标;在虚拟环境中,根据屏幕坐标确定与第i个第一像素点对应的虚拟点的坐标;根据虚拟摄像机的内外参数,将虚拟点的坐标映射到虚拟背景深度图上,得到第j个第二像素点。Optionally, based on the intrinsic and extrinsic parameters of the real camera, the screen coordinates of the i-th first pixel in the background region on the physical screen are determined; in the virtual environment, the coordinates of the virtual point corresponding to the i-th first pixel are determined based on the screen coordinates; based on the intrinsic and extrinsic parameters of the virtual camera, the coordinates of the virtual point are mapped onto the virtual background depth map to obtain the j-th second pixel.
可选地,根据第i个第一像素点的在现实前景深度图中的位置坐标,在虚拟背景深度图中确定第j个第二像素点的位置坐标。例如,第i个第一像素点在现实前景深度图中的位置坐标是(4,6),则第j个第二像素点在虚拟背景深度图中的位置坐标也是(4,6)。Optionally, the position coordinates of the j-th second pixel are determined in the virtual background depth map based on the position coordinates of the i-th first pixel in the real foreground depth map. For example, if the position coordinates of the i-th first pixel in the real foreground depth map are (4, 6), then the position coordinates of the j-th second pixel in the virtual background depth map are also (4, 6).
步骤805:使用第j个第二像素点的第二深度信息,在现实前景深度图中替换第i个第一像素点的第一深度信息。Step 805: Use the second depth information of the j-th second pixel to replace the first depth information of the i-th first pixel in the real foreground depth map.
示例性的,使用第j个第二像素点的第二深度信息中的深度值,在现实前景深度图中替换第i个第一像素点的第一深度信息中的深度值。例如,在现实前景深度图中,第i个第一像素点的深度值是20,与第i个第一像素点对应的第j个第二像素点的深度值是80,则将第i个第一像素点的深度值修改为80。For example, the depth value in the second depth information of the j-th second pixel is used to replace the depth value in the first depth information of the i-th first pixel in the real foreground depth map. For instance, in the real foreground depth map, if the depth value of the i-th first pixel is 20 and the depth value of the j-th second pixel corresponding to the i-th first pixel is 80, then the depth value of the i-th first pixel is modified to 80.
在本申请的另一个可选实现方式中,在使用第j个第二像素点的第二深度信息替换第i个第一像素点的第一深度信息前,还可以对第j个第二像素点的第二深度信息进行修改。示例性的,将第二深度信息中的深度值修改为第一目标深度值,或者,为第二深度信息中的深度值增加第二目标深度值,或者,为第二深度信息中的深度值减小第三目标深度值。其中,第一目标深度值、第二目标深度值和第三目标深度值都可以由技术人员自行设置。例如,假设有3个第二像素点,3个第二像素点的深度值分别是20、43和36,将这3个第二像素点的深度值统一设置为40。In another optional implementation of this application, before replacing the first depth information of the i-th first pixel with the second depth information of the j-th second pixel, the second depth information of the j-th second pixel can be modified. For example, the depth value in the second depth information can be modified to a first target depth value, or the depth value in the second depth information can be increased by a second target depth value, or the depth value in the second depth information can be decreased by a third target depth value. The first target depth value, the second target depth value, and the third target depth value can all be set by a person skilled in the art. For example, assuming there are 3 second pixels with depth values of 20, 43, and 36 respectively, the depth values of these 3 second pixels can be uniformly set to 40.
示例性的,请参考图5和图7,图7相较于图5,图7中虚拟背景的深度信息已经被替换。For example, please refer to Figures 5 and 7. In Figure 7, the depth information of the virtual background has been replaced compared to Figure 5.
步骤806:将i更新为i+1,重复上述两个步骤,直至遍历现实前景深度图中属于背景区域的各个第一像素点的第一深度信息,得到融合深度图。Step 806: Update i to i+1, repeat the above two steps until the first depth information of each first pixel in the background region of the real foreground depth map is traversed to obtain the fused depth map.
背景区域包括至少两个第一像素点,则遍历现实前景深度图中属于背景区域的各个第一像素点,直至将背景区域中各个第一像素点的第一深度信息替换为第二像素点的第二深度信息。If the background region includes at least two first pixels, then iterate through each first pixel in the real foreground depth map that belongs to the background region until the first depth information of each first pixel in the background region is replaced with the second depth information of the second pixel.
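步骤804至步骤806的遍历替换过程可示意如下(以深度图用字典表示、像素点按相同坐标对应为例,数值均为假设):The traversal-and-replacement process of steps 804 to 806 can be sketched as follows (representing depth maps as dictionaries and using the same-coordinate correspondence; all numbers are hypothetical):

```python
def fuse_depth_maps(foreground_depth, background_mask, virtual_depth):
    """Replace the depth of every background pixel in the real-world
    foreground depth map with the corresponding pixel's depth from the
    virtual background depth map (steps 804-806)."""
    fused = dict(foreground_depth)       # keep the original map unchanged
    for pixel in background_mask:
        fused[pixel] = virtual_depth[pixel]   # step 805, repeated per step 806
    return fused

fg = {(0, 0): 20.0, (0, 1): 3.5, (1, 0): 20.0}    # real foreground depth map
virt = {(0, 0): 80.0, (0, 1): 75.0, (1, 0): 90.0}  # virtual background depth map
mask = [(0, 0), (1, 0)]                            # pixels in the background region
print(fuse_depth_maps(fg, mask, virt))  # {(0, 0): 80.0, (0, 1): 3.5, (1, 0): 90.0}
```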
步骤807:根据融合深度图调整目标视频帧的显示参数,生成目标视频帧的景深效果图。Step 807: Adjust the display parameters of the target video frame according to the fused depth map to generate a depth-of-field effect map of the target video frame.
可选地,显示参数包括清晰度、亮度、灰度、对比度、饱和度中的至少一种。Optionally, the display parameters include at least one of sharpness, brightness, grayscale, contrast, and saturation.
可选地,根据现实摄像机的预设光圈或预设焦距,确定距离区间,距离区间用于表示清晰度大于清晰度阈值的像素点对应的参考点到现实摄像机的距离;根据融合深度图和距离区间,调整目标视频帧内各个像素点的清晰度,生成目标视频帧的景深效果图。示例性的,距离区间是[0,20],则将位于距离区间内的像素点的清晰度设置为100%,将位于距离区间外的像素点的清晰度设置为40%。Optionally, a distance range is determined based on the preset aperture or preset focal length of the real camera. This distance range represents the distance from the reference point corresponding to a pixel with a sharpness greater than a sharpness threshold to the real camera. Based on the fused depth map and the distance range, the sharpness of each pixel within the target video frame is adjusted to generate a depth-of-field effect map of the target video frame. For example, if the distance range is [0, 20], then the sharpness of pixels within the distance range is set to 100%, and the sharpness of pixels outside the distance range is set to 40%.
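根据融合深度图和距离区间调整清晰度的过程,可以用如下代码示意(清晰度取值与距离区间取自上文示例,均为假设数值):The process of adjusting sharpness from the fused depth map and the distance interval can be sketched as follows (the sharpness values and distance interval are the hypothetical numbers from the example above):

```python
def apply_depth_of_field(depths, near, far, sharp=1.0, blurred=0.4):
    """Per-pixel sharpness from the fused depth map and the distance interval
    [near, far] derived from the camera's aperture or focal length: pixels
    whose depth falls inside the interval stay sharp, the rest are softened
    (100% vs. 40%, as in the example above)."""
    return {p: (sharp if near <= d <= far else blurred) for p, d in depths.items()}

fused = {(0, 0): 5.0, (0, 1): 18.0, (1, 0): 35.0}  # hypothetical fused depths
print(apply_depth_of_field(fused, 0.0, 20.0))      # pixel (1, 0) falls outside [0, 20]
```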
可选地,根据现实摄像机的对焦距离和融合深度图,调整目标视频帧内虚拟背景对应的区域的清晰度,生成目标视频帧的景深效果图。Optionally, based on the focus distance and fusion depth map of the real camera, the sharpness of the area corresponding to the virtual background within the target video frame is adjusted to generate a depth-of-field effect map of the target video frame.
可选地,根据预设条件调整目标视频帧内的各个像素点的清晰度。预设条件是由技术人员根据实际需求确定的。示例性的,调整目标视频帧中预设区域内的像素点的清晰度,预设区域可由技术人员自行设置。Optionally, the sharpness of each pixel within the target video frame can be adjusted according to preset conditions. These preset conditions are determined by technicians based on actual needs. For example, the sharpness of pixels within a preset area in the target video frame can be adjusted; this preset area can be set by the technicians themselves.
步骤808:基于目标视频帧的景深效果图,生成具有景深效果的目标视频。Step 808: Generate a target video with depth-of-field effect based on the depth-of-field effect map of the target video frame.
可选地,按照预设帧率播放目标视频帧的景深效果图,得到具有景深效果的目标视频。Optionally, the depth-of-field effect map of the target video frame is played at a preset frame rate to obtain a target video with a depth-of-field effect.
示例性的,目标视频帧包括至少两个视频帧,按照时间顺序排列目标视频帧的景深效果图,得到具有景深效果的目标视频。可选地,按照时间顺序排列连续的目标视频帧的景深效果图,生成具有景深效果的目标视频。For example, the target video frames include at least two video frames, and the depth-of-field effect maps of the target video frames are arranged in chronological order to obtain a target video with a depth-of-field effect. Optionally, the depth-of-field effect maps of consecutive target video frames are arranged in chronological order to generate the target video with a depth-of-field effect.
综上所述,本申请实施例中,现实摄像机拍摄目标场景,生成视频帧序列。再根据视频帧序列获取目标视频帧,并对目标视频帧的深度信息进行更新,使得目标视频帧包括的深度信息更加准确,并基于目标视频帧生成具有景深效果的目标视频。由于虚拟背景的深度信息更加准确,使得虚拟背景和现实前景结合组成的视频更加自然,显示效果较好。In summary, in this embodiment, a real-world camera captures the target scene, generating a video frame sequence. Then, a target video frame is obtained from the video frame sequence, and its depth information is updated to make the depth information more accurate. A target video with a depth-of-field effect is then generated based on the target video frame. Because the depth information of the virtual background is more accurate, the video composed of the virtual background and the real foreground is more natural and has a better display effect.
而且,本实施例提供多种方法获取现实前景深度图,以便技术人员根据实际需求调整获取现实前景深度图的方式,不仅可以通过深度摄像机获取现实前景的深度信息,还可以通过两个现实摄像机获取现实前景的深度信息,增加了方案的灵活性。又因为融合深度图中背景区域的深度信息是通过虚拟背景深度图更新得到的,虚拟背景深度图的深度信息又是由虚拟摄像机采集虚拟环境生成的,这种方式获得的深度信息更加准确,得到的景深效果图更加符合实际需求,表现效果较好。Furthermore, this embodiment provides multiple methods for acquiring the real-world foreground depth map, allowing technicians to adjust the acquisition method according to actual needs: depth information of the real-world foreground can be acquired not only through a depth camera but also through two real-world cameras, increasing the flexibility of the solution. Moreover, because the depth information of the background region in the fused depth map is obtained by updating it from the virtual background depth map, and the depth information of the virtual background depth map is generated by a virtual camera capturing the virtual environment, the depth information obtained in this way is more accurate, and the resulting depth-of-field effect map better meets actual requirements and performs better.
在接下来的实施例中,考虑到在一些场景中,现实前景不便于移动,但是又希望可以修改现实前景在目标视频中的显示效果。因此,在接下来的实施例中,还可以对现实前景的深度信息进行调整,使得现实前景的深度信息符合预设的要求。In the following embodiments, considering that in some scenarios the real-world foreground is not easily movable, but it is desirable to modify the display effect of the real-world foreground in the target video, the depth information of the real-world foreground can also be adjusted to meet preset requirements.
图10示出了本申请一个示例性实施例提供的基于虚拟现实的深度信息更新方法的流程示意图。方法可由图1所示的计算机系统100执行,该方法包括:Figure 10 illustrates a flowchart of a virtual reality-based depth information update method provided in an exemplary embodiment of this application. The method can be executed by the computer system 100 shown in Figure 1, and includes:
步骤1001:在现实前景深度图的前景区域中,确定属于目标物体的第三像素点。Step 1001: In the foreground region of the real-world foreground depth map, determine the third pixel point belonging to the target object.
第三像素点是与目标物体对应的像素点。目标物体是现实环境中的物体。The third pixel is the pixel corresponding to the target object. The target object is an object in the real environment.
可选地,通过深度摄像机提供的深度图和目标视频帧,确定现实前景深度图。示例性的,通过深度摄像机提供的深度图获取目标视频帧内各个像素点的深度信息,该深度信息用于表示目标视频帧中各个像素点对应的现实参考点到现实摄像机的距离;在第一像素点的深度值大于第一深度阈值的情况下,确定该第一像素点属于虚拟背景;在第二像素点的深度值小于第二深度阈值的情况下,确定该第二像素点属于现实前景。其中,第一深度阈值不小于第二深度阈值,第一深度阈值和第二深度阈值可由技术人员自行设置。Optionally, the real-world foreground depth map is determined using the depth map provided by the depth camera and the target video frame. For example, depth information of each pixel within the target video frame is obtained from the depth map provided by the depth camera. This depth information represents the distance from the real-world reference point corresponding to each pixel in the target video frame to the real-world camera. If the depth value of a first pixel is greater than a first depth threshold, the first pixel is determined to belong to the virtual background; if the depth value of a second pixel is less than a second depth threshold, the second pixel is determined to belong to the real-world foreground. The first depth threshold is not less than the second depth threshold, and both thresholds can be set by a technician.
可选地,将前景区域内属于深度阈值区间的像素点确定为第三像素点。深度阈值区间可由技术人员自行设置。Optionally, pixels within the depth threshold range in the foreground region can be designated as the third pixel. The depth threshold range can be set by the technician.
可选地,将前景区域内属于目标物体区域的像素点确定为第三像素点。目标物体区域可由技术人员自行设置。可选地,通过选取框在前景区域内确定目标物体区域。Optionally, a pixel within the foreground region that belongs to the target object region can be designated as the third pixel. The target object region can be set by the technician. Alternatively, the target object region can be defined within the foreground region using a selection box.
第三像素点是前景区域中属于目标物体的任意一个像素点。The third pixel is any pixel in the foreground region that belongs to the target object.
步骤1002:响应于深度值更新指令,更新第三像素点的深度值。Step 1002: In response to the depth value update instruction, update the depth value of the third pixel.
可选地,响应于深度值设置指令,将第三像素点的深度值设置为第一预设深度值。第一预设深度值可由技术人员根据实际需求进行设置。示例性的,在一些场景中,目标物体不便于移动,或者希望该目标物体的深度值是一个较大的值,但是受场地限制,无法将该目标物体移动到希望的位置,此时,可以选择将目标物体对应的第三像素点的深度值统一设置为第一预设深度值,使得目标物体在景深效果图中的深度信息贴合实际需求。Optionally, in response to a depth value setting command, the depth value of the third pixel is set to a first preset depth value. The first preset depth value can be set by a technician according to actual needs. For example, in some scenarios, the target object is difficult to move, or a larger depth value is desired, but space constraints prevent moving the target object to the desired location. In such cases, the depth value of the third pixel corresponding to the target object can be uniformly set to the first preset depth value, ensuring that the depth information of the target object in the depth-of-field rendering matches the actual requirements.
可选地,响应于深度值增加指令,为第三像素点的深度值增加第二预设深度值,第二预设深度值可由技术人员根据实际需求进行设置。可选地,响应于深度值减小指令,为第三像素点的深度值减小第三预设深度值,第三预设深度值可由技术人员根据实际需求进行设置。示例性的,在一些场景中,希望目标物体与其他物体的位置进行交换,或者,希望目标物体能够移动到其它物体的前方,或者,希望目标物体能够移动到其它物体的后方,此时,可以改变目标物体对应的第三像素点的深度值,使得生成的景深效果图能够体现出目标物体与其他物体之间的位置关系。例如,现实前景中存在参考物体,参考物体对应的像素点对应的深度值是10,说明该参考物体距现实摄像机有10米,而目标物体对应的像素点对应的深度值是15,说明该目标物体距现实摄像机有15米,而实际需求希望现实前景深度图能够体现出目标物体与现实摄像机的距离小于参考物体与现实摄像机的距离,因此,可以选择将目标物体对应的第三像素点的深度值减小8,那么,目标物体对应第三像素点的深度值是7,可以满足上述实际需求。Optionally, in response to a depth value increase instruction, the depth value of the third pixel is increased by a second preset depth value, which can be set by a technician according to actual needs. Optionally, in response to a depth value decrease instruction, the depth value of the third pixel is decreased by a third preset depth value, which can also be set by a technician according to actual needs. For example, in some scenarios, it may be desired that the target object exchange positions with another object, or move in front of or behind another object. In such cases, the depth value of the third pixel corresponding to the target object can be changed so that the generated depth-of-field effect map reflects the positional relationship between the target object and other objects. For example, suppose the real-world foreground contains a reference object whose pixel has a depth value of 10, meaning the reference object is 10 meters from the real-world camera, while the pixel of the target object has a depth value of 15, meaning the target object is 15 meters from the real-world camera. If the actual requirement is that the real-world foreground depth map show the target object closer to the real-world camera than the reference object, the depth value of the third pixel corresponding to the target object can be reduced by 8, giving it a depth value of 7, which meets the above requirement.
在一个具体的例子中,现实前景中有一棵树,这棵树距现实摄像机有20米,由于移动这棵树并不方便,但是又希望目标视频的显示效果是能体现出这棵树距现实摄像机有40米,此时,可以让技术人员输入深度值设置指令,将这棵树对应的第三像素点的深度值统一设置为40,这样得到的景深效果图和目标视频的显示效果都能体现出这棵树距现实摄像机有40米,而且,实现这样的显示效果不需要移动这棵树,操作简便,效率高。In a specific example, there is a tree in the foreground, 20 meters away from the camera. Since moving the tree is inconvenient, but we want the target video to show that the tree is 40 meters away, technicians can input a depth value setting command to uniformly set the depth value of the third pixel corresponding to the tree to 40. In this way, both the resulting depth map and the target video display will show that the tree is 40 meters away from the camera. Moreover, achieving this display effect does not require moving the tree, making the operation simple and efficient.
在另一个具体的例子中,现实前景中有树A和树B,树A距现实摄像机有20米,树B距现实摄像机有25米,而技术人员希望拍摄出的目标视频中树A在树B的后方,但直接移动树A或树B是不太容易实现的。此时,技术人员可以通过深度值设置指令,直接将树A的深度值设置为30,此时,在目标视频的显示效果中,树A距现实摄像机有30米,树B距现实摄像机还是25米,满足树A在树B的后方的显示效果。或者,技术人员也可以通过深度值增加指令,将树A的深度值增加15,这样得到的树A的深度值是35,此时,在目标视频的显示效果中,树A距现实摄像机有35米,树B距现实摄像机还是25米,满足树A在树B的后方的显示效果。或者,技术人员还可以通过深度值减小指令,将树B的深度值减小10,这样得到的树B的深度值是15,此时,在目标视频的显示效果中,树A距现实摄像机还是20米,树B距现实摄像机有15米,满足树A在树B的后方的显示效果。In another specific example, the real foreground contains tree A and tree B. Tree A is 20 meters from the real camera and tree B is 25 meters away. The technician wants tree A to appear behind tree B in the target video, but directly moving either tree is impractical. In this case, the technician can use a depth value setting command to directly set the depth value of tree A to 30; the target video will then show tree A 30 meters from the real camera while tree B remains at 25 meters, achieving the effect of tree A being behind tree B. Alternatively, the technician can use a depth value increase command to increase the depth value of tree A by 15, giving a depth value of 35; the target video will then show tree A 35 meters away while tree B remains at 25 meters, again achieving the desired effect. Or the technician can use a depth value decrease command to reduce the depth value of tree B by 10, giving a depth value of 15; the target video will then show tree A still 20 meters away and tree B 15 meters away, likewise satisfying the effect of tree A being behind tree B.
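The three depth value update commands described above (set, increase, decrease) amount to simple masked arithmetic on the foreground depth map. The following is a minimal sketch, assuming the depth map is a NumPy array and the target object's third pixels are given by a boolean mask; the function and mode names are illustrative, not from the patent:

```python
import numpy as np

def update_depth(depth_map, mask, mode, value):
    """Apply a depth value update command to the pixels selected by mask.

    mode: "set" (depth value setting command), "add" (depth value increase
    command), or "sub" (depth value decrease command). Names are illustrative.
    """
    out = depth_map.astype(np.float32).copy()
    if mode == "set":
        out[mask] = value
    elif mode == "add":
        out[mask] += value
    elif mode == "sub":
        out[mask] -= value
    return out

# Tree A at 20 m, tree B at 25 m; make A appear behind B by setting A's depth to 30.
depth = np.array([[20.0, 25.0]])           # [tree A, tree B]
mask_a = np.array([[True, False]])
adjusted = update_depth(depth, mask_a, "set", 30.0)
```

After the masked update, tree A's depth reads 30 while tree B's is untouched, matching the first option in the example above; the increase and decrease commands differ only in the arithmetic applied to the masked pixels.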
综上所述,本实施例可以修改前景区域内的各个第三像素点,使得前景区域内的像素点的深度信息符合要求,不仅可以减少现实前景的物体的移动,而且还使现实前景的深度信息更加准确。In summary, this embodiment can modify each third pixel in the foreground region so that the depth information of the pixels in the foreground region meets the requirements. This not only reduces the movement of objects in the real foreground but also makes the depth information of the real foreground more accurate.
而且,在不便移动目标物体的情况下,可以根据技术人员的实际需求直接调整前景区域内的各个第三像素点的深度信息,使得目标视频或景深效果图中的前景区域能够呈现出技术人员希望的显示效果。Moreover, when it is inconvenient to move the target object, the depth information of each third pixel in the foreground area can be directly adjusted according to the actual needs of the technicians, so that the foreground area in the target video or depth-of-field effect map can present the display effect desired by the technicians.
图11示出了本申请一个示例性实施例提供的基于虚拟现实的景深效果图生成方法的示意图。该方法在UE4(Unreal Engine 4,虚幻4引擎)中以插件的形式实现。可选地,该方法也可以在Unity3D(一种实时3D互动内容创作和运营平台,属于创作引擎、开发工具)中以插件的形式实现。本申请实施例对该方法的应用平台不做具体限定。在图11所示的实施例中,基于虚拟现实的景深效果图生成方法通过虚拟制作景深插件1101实现,具体步骤如下所示:Figure 11 illustrates a schematic diagram of a virtual reality-based depth-of-field effect generation method provided in an exemplary embodiment of this application. This method is implemented as a plugin in UE4 (Unreal Engine 4). Optionally, this method can also be implemented as a plugin in Unity3D (a real-time 3D interactive content creation and operation platform, belonging to the creation engine and development tools). This application embodiment does not specifically limit the application platform of this method. In the embodiment shown in Figure 11, the virtual reality-based depth-of-field effect generation method is implemented through a virtual depth-of-field plugin 1101, and the specific steps are as follows:
1、插件二线程同步处理。1. Plugin implements two-thread synchronous processing.
1.1、现实前景深度处理线程:处理包含现实摄像机1102与深度摄像机1104的数据,包含从原始YUV转RGBA,结合深度摄像机1104提供的深度信息使用opencv(一个跨平台计算机视觉和机器学习软件库)处理得到现实摄像机1102下的现实前景的深度信息。并在这个现实前景深度处理线程里,利用DX11共享纹理,把虚拟背景深度图复制到当前线程,融合成包含现实前景和虚拟背景的深度信息的融合深度图1107,并利用Compute shader(一种计算机技术,可以实现GPU的并行处理,GPU即Graphics Processing Unit,图形处理器)根据现实摄像机1102拍摄到的图像与融合深度图1107得到景深表现效果1108。1.1 Real Foreground Depth Processing Thread: This thread processes data from both the real-world camera 1102 and the depth camera 1104. This includes converting the original YUV data to RGBA and using OpenCV (a cross-platform computer vision and machine learning library) to process the data, combining the depth information provided by the depth camera 1104 to obtain the depth information of the real-world foreground from the real-world camera 1102. Within this thread, a DX11 shared texture is used to copy the virtual background depth map to the current thread, fusing them into a fused depth map 1107 containing depth information from both the real-world foreground and the virtual background. Then, using a Compute Shader (a computer technology that enables parallel processing on a GPU, where GPU stands for Graphics Processing Unit), the depth-of-field effect 1108 is obtained based on the image captured by the real-world camera 1102 and the fused depth map 1107.
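The YUV-to-RGBA step in thread 1.1 can be illustrated with a pure-NumPy BT.601 conversion. In the actual pipeline this would be a single `cv2.cvtColor` call, and the exact coefficients depend on the camera's pixel format and color range — the full-range 4:4:4 variant below is an assumption for illustration:

```python
import numpy as np

def yuv444_to_rgba(y, u, v):
    """Full-range BT.601 YUV -> RGBA, a pure-NumPy stand-in for the plugin's
    OpenCV YUV->RGBA conversion step (cv2.cvtColor in the real pipeline)."""
    y = y.astype(np.float32)
    u = u.astype(np.float32) - 128.0   # chroma is stored offset by 128
    v = v.astype(np.float32) - 128.0
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    rgb = np.stack([r, g, b], axis=-1).clip(0, 255).astype(np.uint8)
    a = np.full(y.shape + (1,), 255, dtype=np.uint8)  # fully opaque alpha
    return np.concatenate([rgb, a], axis=-1)

# Neutral chroma (U = V = 128) with mid luma yields mid-grey RGBA.
y = np.full((2, 2), 128, dtype=np.uint8)
u = v = np.full((2, 2), 128, dtype=np.uint8)
rgba = yuv444_to_rgba(y, u, v)
```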
1.2、虚拟背景深度处理线程:在虚拟摄像机1103对应的Render Target(渲染目标)拿到深度信息,生成虚拟背景深度图1106,并复制给现实前景深度处理线程里的共享纹理。1.2 Virtual Background Depth Processing Thread: Obtain depth information from the Render Target corresponding to the virtual camera 1103, generate a virtual background depth map 1106, and copy it to the shared texture in the real foreground depth processing thread.
2、在现实摄像机数据处理线程中,先确定现实前景深度图的选择方案。本申请实施例包括以下两种可选的实施方式:2. In the real-world camera data processing thread, the selection scheme for the real-world foreground depth map is first determined. This application's embodiments include the following two optional implementation methods:
2.1、选择深度摄像机:标定深度摄像机1104的内外参数。根据深度摄像机1104的内外参数和现实摄像机1102的内外参数,把深度摄像机1104拍摄的深度图映射到现实摄像机1102上,得到现实摄像机1102在对应场景下的深度图,即得到现实前景深度图1105。2.1 Selecting a depth camera: Calibrate the intrinsic and extrinsic parameters of depth camera 1104. Based on the intrinsic and extrinsic parameters of depth camera 1104 and reality camera 1102, map the depth map captured by depth camera 1104 onto reality camera 1102 to obtain the depth map of reality camera 1102 in the corresponding scene, i.e., obtain the real foreground depth map 1105.
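Scheme 2.1 — mapping the depth camera's depth map onto the real camera — is a standard pinhole reprojection: back-project each depth pixel to a 3D point with the depth camera's intrinsics, transform it with the calibrated extrinsics, and project it with the real camera's intrinsics. A minimal sketch under ideal distortion-free pinhole assumptions (the function name is illustrative, and occlusion/hole handling is omitted):

```python
import numpy as np

def remap_depth(depth, K_d, K_c, R, t):
    """Reproject a depth camera's depth map into the real camera's view.

    K_d, K_c: 3x3 intrinsics of the depth/real camera; R, t: extrinsics
    mapping depth-camera coordinates into real-camera coordinates.
    Illustrative sketch; a production version also handles occlusion
    and fills holes left by the reprojection."""
    h, w = depth.shape
    out = np.zeros((h, w), dtype=np.float32)
    vs, us = np.nonzero(depth > 0)
    zs = depth[vs, us]
    # Back-project depth pixels to 3D points in the depth camera's frame.
    pts = np.linalg.inv(K_d) @ np.vstack([us * zs, vs * zs, zs])
    # Transform into the real camera's frame and project.
    pts = R @ pts + t.reshape(3, 1)
    uvw = K_c @ pts
    u2 = np.round(uvw[0] / uvw[2]).astype(int)
    v2 = np.round(uvw[1] / uvw[2]).astype(int)
    ok = (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
    out[v2[ok], u2[ok]] = uvw[2][ok]
    return out

K = np.array([[100.0, 0, 2], [0, 100.0, 2], [0, 0, 1]])
depth = np.zeros((4, 4), dtype=np.float32)
depth[2, 2] = 10.0
# Identical cameras with identity extrinsics leave the map unchanged.
same = remap_depth(depth, K, K, np.eye(3), np.zeros(3))
```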
2.2、选择增加一个辅助摄像机:标定辅助摄像机的内外参数,根据辅助摄像机的内外参数进行图像的立体校正,得到校正的映射关系。根据现实摄像机1102的内外参数进行图像的立体校正,得到校正的映射关系。然后在每帧数据处理中,使用前述的两种映射关系分别对两个摄像机提供的数据进行重构,然后生成视差图,根据视差图得到目标视频帧的深度信息,得到现实前景深度图1105。2.2. Adding an auxiliary camera: Calibrate the intrinsic and extrinsic parameters of the auxiliary camera, and perform stereo rectification of the image based on these parameters to obtain a rectification mapping. Perform stereo rectification of the image based on the intrinsic and extrinsic parameters of the real camera 1102 to obtain another rectification mapping. Then, in the processing of each frame, use the two mappings to reconstruct the data provided by the two cameras respectively, generate a disparity map, and obtain the depth information of the target video frame from the disparity map, yielding the real foreground depth map 1105.
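In scheme 2.2, once the rectified images from the real and auxiliary cameras have been matched into a disparity map (in OpenCV, rectification maps typically come from `cv2.stereoRectify`/`cv2.initUndistortRectifyMap` and disparity from a matcher such as `cv2.StereoSGBM`), depth follows from the classic relation Z = f·B/d. A sketch of that final conversion only, assuming rectified inputs, focal length in pixels, and baseline in meters:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (in pixels) between the rectified real and
    auxiliary camera images into metric depth: Z = f * B / d.
    Zero disparity marks invalid pixels and is left at depth 0."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# f = 1000 px, baseline 0.1 m, disparity 50 px  ->  depth = 2 m
disp = np.array([[50.0, 0.0]], dtype=np.float32)
depth = disparity_to_depth(disp, 1000.0, 0.1)
```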
3、在虚拟背景深度处理线程中,在得到虚拟摄像机1103拍摄的渲染目标,再从渲染目标得到深度信息,转化此深度信息为对应的现实线性距离,得到虚拟背景深度图1106,然后同步复制给现实前景深度处理线程的另一个纹理中。3. In the virtual background depth processing thread, after obtaining the rendering target captured by the virtual camera 1103, the depth information is obtained from the rendering target, and this depth information is converted into the corresponding real linear distance to obtain the virtual background depth map 1106. Then, it is synchronously copied to another texture in the real foreground depth processing thread.
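Step 3's conversion of the render target's depth information into a "real linear distance" typically means linearizing the nonlinear depth-buffer value. The sketch below uses the classic OpenGL-style perspective mapping; the exact formula depends on the engine's projection convention (UE4, for instance, uses reversed-Z), so this form is an assumption for illustration:

```python
def linearize_depth(d, near, far):
    """Convert a [0,1] depth-buffer value from a standard perspective
    projection into linear eye-space distance. d=0 maps to the near plane,
    d=1 to the far plane. Engine conventions vary (e.g. reversed-Z)."""
    return near * far / (far - d * (far - near))

# Near plane at 0.1 m, far plane at 100 m.
z_near = linearize_depth(0.0, 0.1, 100.0)   # -> 0.1
z_far = linearize_depth(1.0, 0.1, 100.0)    # -> 100.0
```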
4、在现实摄像机数据处理线程中,融合成融合深度图1107,这样在现实摄像机的成像中,根据对焦距离,可以确定虚拟背景的像素点的清晰度;根据设定的光圈或焦距,决定显示清晰像素点的最大距离与最小距离,以及模糊区域的模糊程度。根据上述参数匹配融合深度图1107,可以决定像素点如何显示,生成最终的景深效果图1108。4. In the real-world camera data processing thread, the data is fused into a fused depth map 1107. This allows the image from the real-world camera to determine the sharpness of the virtual background pixels based on the focus distance, and to determine the maximum and minimum distances for displaying sharp pixels, as well as the degree of blur in blurred areas, based on the set aperture or focal length. By matching the fused depth map 1107 against these parameters, how each pixel is displayed can be determined, generating the final depth-of-field effect map 1108.
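Step 4's per-pixel decision — sharp inside the distance interval derived from the aperture or focal length, increasingly blurred outside it — can be sketched as follows. In the real pipeline this runs in a compute shader; the simple linear blur model and the parameter names here are illustrative assumptions:

```python
import numpy as np

def depth_of_field_mask(fused_depth, focus_dist, near_sharp, far_sharp):
    """Classify each pixel of the fused depth map as sharp or blurred.

    near_sharp/far_sharp: minimum/maximum distances that stay sharp,
    derived from the chosen aperture or focal length. Outside that
    interval, blur strength grows with distance from the focus distance
    (clamped to 1.0). Illustrative model, not the patent's shader."""
    sharp = (fused_depth >= near_sharp) & (fused_depth <= far_sharp)
    blur = np.where(
        sharp, 0.0,
        np.minimum(np.abs(fused_depth - focus_dist) / focus_dist, 1.0))
    return sharp, blur

# Pixels at 1 m, 5 m, 40 m with focus at 5 m and a 3-8 m sharp interval.
fused = np.array([[1.0, 5.0, 40.0]])
sharp, blur = depth_of_field_mask(fused, focus_dist=5.0,
                                  near_sharp=3.0, far_sharp=8.0)
```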
请参考图12,其示出了本申请一个实施例提供的基于虚拟现实的视频生成装置的示意图。上述功能可以由硬件实现,也可以由硬件执行相应的软件实现。该装置1200包括:Please refer to Figure 12, which shows a schematic diagram of a virtual reality-based video generation apparatus according to an embodiment of this application. The above functions can be implemented in hardware or by hardware executing corresponding software. The apparatus 1200 includes:
获取模块1201,用于从视频帧序列中获取目标视频帧,所述视频帧序列是由现实摄像机采集目标场景得到的,所述目标场景包括现实前景和虚拟背景,所述虚拟背景显示在现实环境中的物理屏幕上;The acquisition module 1201 is used to acquire target video frames from a video frame sequence, wherein the video frame sequence is obtained by capturing a target scene using a real camera, and the target scene includes a real foreground and a virtual background, wherein the virtual background is displayed on a physical screen in a real environment;
所述获取模块1201,还用于获取所述目标视频帧的现实前景深度图和虚拟背景深度图,所述现实前景深度图包括所述现实前景到所述现实摄像机的深度信息,所述虚拟背景深度图包括被映射到所述现实环境后的所述虚拟背景到所述现实摄像机的深度信息;The acquisition module 1201 is further configured to acquire the real foreground depth map and the virtual background depth map of the target video frame. The real foreground depth map includes the depth information from the real foreground to the real camera, and the virtual background depth map includes the depth information from the virtual background to the real camera after being mapped onto the real environment.
融合模块1202,用于融合所述现实前景深度图和所述虚拟背景深度图,得到融合深度图,所述融合深度图包括所述目标场景内的各个参考点在所述现实环境中到所述现实摄像机的深度信息;The fusion module 1202 is used to fuse the real foreground depth map and the virtual background depth map to obtain a fused depth map. The fused depth map includes the depth information of each reference point in the target scene from the real environment to the real camera.
更新模块1203,用于根据所述融合深度图调整所述目标视频帧的显示参数,生成所述目标视频帧的景深效果图;The update module 1203 is used to adjust the display parameters of the target video frame according to the fused depth map and generate a depth effect map of the target video frame.
所述更新模块1203,还用于基于所述目标视频帧的所述景深效果图,生成具有景深效果的目标视频。The update module 1203 is also used to generate a target video with a depth effect based on the depth effect map of the target video frame.
在本申请的一个可选设计中,所述现实前景深度图包括与所述虚拟背景对应的背景区域;所述融合模块1202,还用于根据所述虚拟背景深度图内属于所述背景区域的第二像素点的第二深度信息,更新所述现实前景深度图内属于所述背景区域的第一像素点的第一深度信息,得到所述融合深度图。In an optional design of this application, the real foreground depth map includes a background region corresponding to the virtual background; the fusion module 1202 is further configured to update the first depth information of the first pixel in the real foreground depth map belonging to the background region according to the second depth information of the second pixel in the virtual background depth map belonging to the background region, so as to obtain the fused depth map.
在本申请的一个可选设计中,所述获取模块1201,还用于在所述虚拟背景深度图中,确定与所述现实前景深度图中属于所述背景区域的第i个第一像素点对应的第j个第二像素点,所述i,j为正整数;使用所述第j个第二像素点的所述第二深度信息,在所述现实前景深度图中替换所述第i个第一像素点的所述第一深度信息;将i更新为i+1,重复上述两个步骤,直至遍历所述现实前景深度图中属于所述背景区域的各个第一像素点的所述第一深度信息,得到所述融合深度图。In an optional design of this application, the acquisition module 1201 is further configured to: determine, in the virtual background depth map, a j-th second pixel corresponding to the i-th first pixel in the real foreground depth map that belongs to the background region, where i and j are positive integers; use the second depth information of the j-th second pixel to replace the first depth information of the i-th first pixel in the real foreground depth map; update i to i+1, and repeat the above two steps until the first depth information of each first pixel in the real foreground depth map belonging to the background region is traversed to obtain the fused depth map.
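The pixel-by-pixel replacement described above — iterating i over the first pixels of the background region and copying each corresponding second pixel's depth — collapses, once the pixel correspondence has been precomputed, to a masked copy. A minimal sketch, assuming aligned depth maps and a boolean background-region mask (names are illustrative):

```python
import numpy as np

def fuse_depth_maps(fg_depth, bg_depth, bg_mask):
    """Build the fused depth map: inside the background region (bg_mask),
    the real foreground depth map's values are replaced by the
    corresponding virtual background depths; foreground pixels keep
    their original depth."""
    fused = fg_depth.copy()
    fused[bg_mask] = bg_depth[bg_mask]
    return fused

fg = np.array([[2.0, 3.0], [60.0, 60.0]])    # bottom row: LED wall at 60 m
bg = np.array([[0.0, 0.0], [120.0, 90.0]])   # mapped virtual background depths
mask = np.array([[False, False], [True, True]])
fused = fuse_depth_maps(fg, bg, mask)
```

After fusion the LED-wall pixels carry the virtual background's mapped distances (120 m, 90 m) instead of the physical screen distance, while the real foreground row is unchanged.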
在本申请的一个可选设计中,所述获取模块1201,还用于根据所述现实摄像机的内外参数,确定所述背景区域的所述第i个第一像素点在所述物理屏幕上的屏幕坐标;在虚拟环境中,根据所述屏幕坐标确定与所述第i个第一像素点对应的虚拟点的坐标;根据虚拟摄像机的内外参数,将所述虚拟点的坐标映射到所述虚拟背景深度图上,得到所述第j个第二像素点,所述虚拟摄像机用于在所述虚拟环境中拍摄与所述虚拟背景对应的渲染目标。In an optional design of this application, the acquisition module 1201 is further configured to determine the screen coordinates of the i-th first pixel in the background region on the physical screen based on the intrinsic and extrinsic parameters of the real camera; in the virtual environment, determine the coordinates of the virtual point corresponding to the i-th first pixel based on the screen coordinates; and map the coordinates of the virtual point onto the virtual background depth map based on the intrinsic and extrinsic parameters of the virtual camera to obtain the j-th second pixel. The virtual camera is used to capture a rendering target corresponding to the virtual background in the virtual environment.
在本申请的一个可选设计中,所述现实前景深度图包括与所述现实前景对应的前景区域;所述融合模块1202,还用于更新所述现实前景深度图内属于所述前景区域的第三像素点的第三深度信息,所述第三像素点是与目标物体对应的像素点。In an optional design of this application, the real foreground depth map includes a foreground region corresponding to the real foreground; the fusion module 1202 is further configured to update the third depth information of a third pixel point belonging to the foreground region within the real foreground depth map, wherein the third pixel point is a pixel point corresponding to the target object.
在本申请的一个可选设计中,所述融合模块1202,还用于在所述现实前景深度图的所述前景区域中,确定属于所述目标物体的第三像素点;响应于深度值更新指令,更新所述第三像素点的深度值。In an optional design of this application, the fusion module 1202 is further configured to determine a third pixel belonging to the target object in the foreground region of the real foreground depth map; and update the depth value of the third pixel in response to a depth value update instruction.
在本申请的一个可选设计中,所述融合模块1202,还用于根据深度值设置指令,将所述第三像素点的深度值设置为第一预设深度值;或,根据深度值增加指令,为所述第三像素点的深度值增加第二预设深度值;或,根据深度值减小指令,为所述第三像素点的深度值减小第三预设深度值。In an optional design of this application, the fusion module 1202 is further configured to set the depth value of the third pixel to a first preset depth value according to a depth value setting instruction; or, to increase the depth value of the third pixel by a second preset depth value according to a depth value increase instruction; or, to decrease the depth value of the third pixel by a third preset depth value according to a depth value decrease instruction.
在本申请的一个可选设计中,所述获取模块1201,还用于根据深度摄像机的内外参数和所述现实摄像机的内外参数,生成所述深度摄像机与所述现实摄像机之间的空间偏移信息;获取所述深度摄像机采集的深度图;根据所述空间偏移信息,将所述深度图的深度信息映射到所述目标视频帧上,得到所述现实前景深度图。In an optional design of this application, the acquisition module 1201 is further configured to generate spatial offset information between the depth camera and the real camera based on the intrinsic and extrinsic parameters of the depth camera and the intrinsic and extrinsic parameters of the real camera; acquire the depth map captured by the depth camera; and map the depth information of the depth map onto the target video frame based on the spatial offset information to obtain the real foreground depth map.
在本申请的一个可选设计中,所述现实摄像机用于从第一角度拍摄所述目标场景;所述获取模块1201,还用于根据参考摄像机的内外参数,获取第一映射关系,所述第一映射关系用于表示所述参考摄像机的摄像机坐标系和现实坐标系之间的映射关系,所述参考摄像机用于从第二角度拍摄所述目标场景,所述第二角度与所述第一角度不同;根据所述现实摄像机的内外参数,获取第二映射关系,所述第二映射关系用于表示所述现实摄像机的摄像机坐标系和所述现实坐标系之间的映射关系;根据所述第一映射关系对所述参考摄像机拍摄的参考图像进行重构,得到重构参考图像;根据所述第二映射关系对所述现实摄像机拍摄的所述目标视频帧进行重构,得到重构目标场景图像;根据所述重构参考图像和所述重构目标场景图像之间的视差,确定所述目标视频帧内各个像素点的深度信息,得到所述现实前景深度图。In an optional design of this application, the real-world camera is used to capture the target scene from a first angle; the acquisition module 1201 is further used to acquire a first mapping relationship based on the intrinsic and extrinsic parameters of a reference camera, the first mapping relationship representing the mapping relationship between the camera coordinate system of the reference camera and the real-world coordinate system, the reference camera being used to capture the target scene from a second angle, the second angle being different from the first angle; acquire a second mapping relationship based on the intrinsic and extrinsic parameters of the real-world camera, the second mapping relationship representing the mapping relationship between the camera coordinate system of the real-world camera and the real-world coordinate system; reconstruct the reference image captured by the reference camera based on the first mapping relationship to obtain a reconstructed reference image; reconstruct the target video frame captured by the real-world camera based on the second mapping relationship to obtain a reconstructed target scene image; determine the depth information of each pixel in the target video frame based on the parallax between the reconstructed reference image and the reconstructed target scene image to obtain the real-world foreground depth map.
在本申请的一个可选设计中,所述获取模块1201,还用于获取虚拟环境中与所述虚拟背景对应的渲染目标;生成虚拟环境中的所述渲染目标的渲染目标深度图,所述渲染目标深度图包括虚拟深度信息,所述虚拟深度信息用于表示在所述虚拟环境中所述渲染目标到虚拟摄像机的距离;将所述渲染目标深度图中的所述虚拟深度信息转化为现实深度信息,得到所述虚拟背景深度图,所述现实深度信息用于表示被映射到所述现实环境中的所述渲染目标到所述现实摄像机的距离。In an optional design of this application, the acquisition module 1201 is further configured to acquire a rendering target in a virtual environment corresponding to the virtual background; generate a rendering target depth map of the rendering target in the virtual environment, the rendering target depth map including virtual depth information, the virtual depth information being used to represent the distance from the rendering target to the virtual camera in the virtual environment; convert the virtual depth information in the rendering target depth map into real depth information to obtain the virtual background depth map, the real depth information being used to represent the distance from the rendering target mapped to the real environment to the real camera.
在本申请的一个可选设计中,所述更新模块1203,还用于根据所述现实摄像机的预设光圈或预设焦距,确定距离区间,所述距离区间用于表示清晰度大于清晰度阈值的像素点对应的参考点到所述现实摄像机的距离;根据所述融合深度图和所述距离区间,调整所述目标视频帧内各个像素点的清晰度,生成所述目标视频帧的所述景深效果图。In an optional design of this application, the update module 1203 is further configured to determine a distance range based on the preset aperture or preset focal length of the real camera, wherein the distance range represents the distance from the reference point corresponding to a pixel with a sharpness greater than a sharpness threshold to the real camera; and adjust the sharpness of each pixel in the target video frame based on the fused depth map and the distance range to generate the depth-of-field effect map of the target video frame.
在本申请的一个可选设计中,所述更新模块1203,还用于根据所述现实摄像机的对焦距离和所述融合深度图,调整所述目标视频帧内所述虚拟背景对应的区域的清晰度,生成所述目标视频帧的所述景深效果图。In an optional design of this application, the update module 1203 is further configured to adjust the sharpness of the region corresponding to the virtual background within the target video frame based on the focus distance of the real camera and the fusion depth map, and generate the depth-of-field effect map of the target video frame.
在本申请的一个可选设计中,所述获取模块1201,还用于获取所述视频帧序列对应的至少两张景深效果图;所述更新模块1203,还用于按照时间顺序排列所述至少两张景深效果图,得到所述视频帧序列对应的景深视频。In an optional design of this application, the acquisition module 1201 is further configured to acquire at least two depth-of-field effect images corresponding to the video frame sequence; the update module 1203 is further configured to arrange the at least two depth-of-field effect images in chronological order to obtain the depth-of-field video corresponding to the video frame sequence.
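Arranging the per-frame depth-of-field effect images in chronological order to obtain the depth-of-field video can be sketched as a sort by timestamp. The (timestamp, image) pair representation is an assumption, and actual encoding of the ordered frames (e.g. via a video writer such as OpenCV's `cv2.VideoWriter`) is omitted:

```python
def frames_to_video(frames):
    """Order per-frame depth-of-field effect images by ascending timestamp,
    as the update module does before producing the depth-of-field video.
    frames: iterable of (timestamp, image) pairs; encoding is out of scope."""
    return [img for _, img in sorted(frames, key=lambda f: f[0])]

# Frames arriving out of order are restored to chronological order.
video = frames_to_video([(2, "frame_b"), (1, "frame_a"), (3, "frame_c")])
```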
综上所述,本申请实施例中,现实摄像机拍摄目标场景,生成视频帧序列。再根据视频帧序列获取目标视频帧,并对目标视频帧的深度信息进行更新,使得目标视频帧包括的深度信息更加准确。由于增加了虚拟背景的深度信息,使得虚拟背景和现实前景结合组成的图更加自然,显示效果较好。In summary, in this embodiment, a real-world camera captures the target scene, generating a video frame sequence. The target video frame is then obtained from the video frame sequence, and its depth information is updated to ensure greater accuracy. The addition of depth information from the virtual background results in a more natural and visually appealing image formed by the combination of the virtual background and the real foreground.
图13是根据一示例性实施例示出的一种计算机设备的结构示意图。计算机设备1300包括中央处理单元(Central Processing Unit,CPU)1301、包括随机存取存储器(Random Access Memory,RAM)1302和只读存储器(Read-Only Memory,ROM)1303的系统存储器1304,以及连接系统存储器1304和中央处理单元1301的系统总线1305。计算机设备1300还包括帮助计算机设备内的各个器件之间传输信息的基本输入/输出系统(Input/Output,I/O系统)1306,和用于存储操作系统1313、应用程序1314和其他程序模块1315的大容量存储设备1307。Figure 13 is a schematic diagram of a computer device according to an exemplary embodiment. The computer device 1300 includes a central processing unit (CPU) 1301, a system memory 1304 including random access memory (RAM) 1302 and read-only memory (ROM) 1303, and a system bus 1305 connecting the system memory 1304 and the CPU 1301. The computer device 1300 also includes a basic input/output system (I/O system) 1306 that facilitates the transfer of information between various devices within the computer device, and a mass storage device 1307 for storing the operating system 1313, application programs 1314, and other program modules 1315.
基本输入/输出系统1306包括有用于显示信息的显示器1308和用于用户输入信息的诸如鼠标、键盘之类的输入设备1309。其中显示器1308和输入设备1309都通过连接到系统总线1305的输入输出控制器1310连接到中央处理单元1301。基本输入/输出系统1306还可以包括输入输出控制器1310以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器1310还提供输出到显示屏、打印机或其他类型的输出设备。The basic input/output system 1306 includes a display 1308 for displaying information and an input device 1309 for user input, such as a mouse or keyboard. Both the display 1308 and the input device 1309 are connected to the central processing unit 1301 via an input/output controller 1310 connected to the system bus 1305. The basic input/output system 1306 may also include the input/output controller 1310 for receiving and processing input from multiple other devices such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 1310 also provides output to a display screen, printer, or other types of output devices.
大容量存储设备1307通过连接到系统总线1305的大容量存储控制器(未示出)连接到中央处理单元1301。大容量存储设备1307及其相关联的计算机设备可读介质为计算机设备1300提供非易失性存储。也就是说,大容量存储设备1307可以包括诸如硬盘或者只读光盘(Compact Disc Read-Only Memory,CD-ROM)驱动器之类的计算机设备可读介质(未示出)。The mass storage device 1307 is connected to the central processing unit 1301 via a mass storage controller (not shown) connected to the system bus 1305. The mass storage device 1307 and its associated computer device readable media provide non-volatile storage for the computer device 1300. That is, the mass storage device 1307 may include computer device readable media (not shown) such as a hard disk or a compact disc read-only memory (CD-ROM) drive.
不失一般性,计算机设备可读介质可以包括计算机设备存储介质和通信介质。计算机设备存储介质包括以用于存储诸如计算机设备可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机设备存储介质包括RAM、ROM、可擦除可编程只读存储器(Erasable Programmable ReadOnly Memory,EPROM)、带电可擦可编程只读存储器(Electrically ErasableProgrammable Read-Only Memory,EEPROM),CD-ROM、数字视频光盘(Digital Video Disc,DVD)或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然,本领域技术人员可知计算机设备存储介质不局限于上述几种。上述的系统存储器1304和大容量存储设备1307可以统称为存储器。Without loss of generality, computer device readable media can include computer device storage media and communication media. Computer device storage media includes volatile and non-volatile, removable and non-removable media implemented using any method or technology for storing information such as computer device readable instructions, data structures, program modules, or other data. Computer device storage media includes RAM, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), CD-ROM, digital video disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape, disk storage, or other magnetic storage devices. Of course, those skilled in the art will recognize that computer device storage media are not limited to the above-mentioned types. The system memory 1304 and mass storage device 1307 described above can be collectively referred to as memory.
根据本公开的各种实施例,计算机设备1300还可以通过诸如因特网等网络连接到网络上的远程计算机设备运行。也即计算机设备1300可以通过连接在系统总线1305上的网络接口单元1311连接到网络1312,或者说,也可以使用网络接口单元1311来连接到其他类型的网络或远程计算机设备系统(未示出)。According to various embodiments of this disclosure, computer device 1300 can also be connected to and operated on a remote computer device on a network, such as the Internet. That is, computer device 1300 can be connected to network 1312 via network interface unit 1311 connected to system bus 1305, or network interface unit 1311 can be used to connect to other types of networks or remote computer device systems (not shown).
存储器还包括一个或者一个以上的程序,一个或者一个以上程序存储于存储器中,中央处理器1301通过执行该一个或一个以上程序来实现上述基于虚拟现实的视频生成方法的全部或者部分步骤。The memory also includes one or more programs, which are stored in the memory. The central processing unit 1301 executes the one or more programs to implement all or part of the steps of the above-described virtual reality-based video generation method.
在示例性实施例中,还提供了一种计算机可读存储介质,计算机可读存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,至少一条指令、至少一段程序、代码集或指令集由处理器加载并执行以实现上述各个方法实施例提供的基于虚拟现实的视频生成方法。In an exemplary embodiment, a computer-readable storage medium is also provided, which stores at least one instruction, at least one program, code set, or instruction set, wherein the at least one instruction, at least one program, code set, or instruction set is loaded and executed by a processor to implement the virtual reality-based video generation method provided in the above-described method embodiments.
本申请还提供一种计算机可读存储介质,存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,至少一条指令、至少一段程序、代码集或指令集由处理器加载并执行以实现上述方法实施例提供的基于虚拟现实的视频生成方法。This application also provides a computer-readable storage medium storing at least one instruction, at least one program, code set, or instruction set, wherein the at least one instruction, at least one program, code set, or instruction set is loaded and executed by a processor to implement the virtual reality-based video generation method provided in the above-described method embodiments.
本申请还提供一种计算机程序产品或计算机程序,上述计算机程序产品或计算机程序包括计算机指令,上述计算机指令存储在计算机可读存储介质中。计算机设备的处理器从上述计算机可读存储介质读取上述计算机指令,上述处理器执行上述计算机指令,使得上述计算机设备执行如上方面实施例提供的基于虚拟现实的视频生成方法。This application also provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the virtual reality-based video generation method provided in the above embodiments.
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。The sequence numbers of the embodiments in this application are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,该程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
以上仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。The above are merely optional embodiments of this application and are not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.
Claims (17)
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK40091125A true HK40091125A (en) | 2023-12-01 |