CN118354056A - A three-dimensional video communication method with high sensitivity to motion parallax - Google Patents
- Publication number: CN118354056A (application CN202410526045.6A)
- Authority: CN (China)
- Prior art keywords: eye, video communication, camera, motion parallax, communication method
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H—Electricity › H04—Electric communication technique › H04N—Pictorial communication, e.g. television › H04N13/00—Stereoscopic video systems; multi-view video systems; details thereof
- H04N13/30—Image reproducers › H04N13/366—Image reproducers using viewer tracking › H04N13/383—Tracking with gaze detection, i.e. detecting the lines of sight of the viewer's eyes
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals › H04N13/106—Processing image signals › H04N13/122—Improving the 3D impression of stereoscopic images by modifying image signal contents, e.g. by filtering or adding monoscopic depth cues
- H04N13/20—Image signal generators › H04N13/204—Image signal generators using stereoscopic image cameras › H04N13/239—Using two 2D image sensors having a relative position equal to or related to the interocular distance
Abstract
The invention discloses a three-dimensional video communication method with high-sensitivity motion parallax, belonging to the field of computer technology. The method tracks the eye pose with a computer vision algorithm, predicts the eye's motion trajectory in space and its future pose, and transmits the predicted eye pose to a three-dimensional reconstruction and rendering module, which computes the image for the future viewpoint. This compensates for the acquisition, transmission, and computation delays and realizes three-dimensional video communication with high-sensitivity motion parallax. By tracking and predicting the eye pose and rendering the image for the future viewpoint, the method achieves high-quality three-dimensional video communication and provides a more immersive and realistic experience.
Description
Technical Field
The present invention belongs to the field of computers, and in particular relates to a three-dimensional video communication method with high-sensitivity motion parallax.
Background Art
With the continuous development of networks, AI, and cloud computing, three-dimensional video communication is becoming the mainstream mode of future remote communication. Compared with traditional two-dimensional video, three-dimensional video communication captures the three-dimensional images and environmental information of remote participants, enabling local users to perceive the other party's movement, depth, and spatial position more accurately. It not only overcomes the obstacles of geographical distance but also provides a more immersive experience, promoting real-time cooperation and exchange. To achieve a 3D viewing effect, binocular parallax, motion parallax, and depth parallax must be considered. Motion parallax is the parallax effect produced by the viewer's own movement: by capturing the local user's eye position, the 3D display shows the remote participant's image from that viewpoint. However, the delays of data acquisition, data transmission, three-dimensional reconstruction, and rendering mean that by the time the view is switched the local user's eyes have already moved; that is, motion parallax exhibits hysteresis. To address this problem, the common industry approach is to add sensor devices or increase bandwidth to reduce the image delay, but these methods cannot fully eliminate the hysteresis of motion parallax. The prior art thus lacks a three-dimensional video communication method with high-sensitivity motion parallax.
Summary of the Invention
The technical problem to be solved by the present invention, in view of the deficiencies of the background art, is to provide a three-dimensional video communication method with high-sensitivity motion parallax. The eye pose is tracked with a computer vision algorithm, the eye's motion trajectory in space and its future pose are predicted, and the predicted eye pose is transmitted to a three-dimensional reconstruction and rendering module, which computes the image for the future viewpoint. This compensates for the acquisition, transmission, and computation delays and realizes three-dimensional video communication with high-sensitivity motion parallax.
To solve the above technical problem, the present invention adopts the following technical solution:
A three-dimensional video communication method with high-sensitivity motion parallax, specifically comprising the following steps:
Step 1: On the local user side, images captured by an acquisition device are used for eye tracking, and an eye-tracking module obtains the pose of the local user's eyes in space.
Step 2: A camera on the remote participant's side captures images for three-dimensional reconstruction and rendering.
Step 3: The spatial eye pose of the local user and the image data of the remote participant are transmitted to a cloud host, where a three-dimensional reconstruction and rendering module generates a new-viewpoint image from the eye pose and the image data.
Step 4: The new-viewpoint image is transmitted back to the local user, completing the process.
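The four steps above can be sketched as a minimal pipeline. All function names and bodies below are illustrative placeholders (not the patent's actual implementation); a real system would run face detection, network transmission, and 3D rendering where the stubs return fixed values:

```python
from dataclasses import dataclass

@dataclass
class EyePose:
    """Eye position (x, y, z in metres) and gaze direction in space (step 1 output)."""
    position: tuple
    direction: tuple

def track_eyes(local_frame):
    # Placeholder for the eye-tracking module (face/landmark detection, etc.).
    return EyePose(position=(0.0, 0.0, 0.6), direction=(0.0, 0.0, -1.0))

def render_new_view(remote_frames, pose):
    # Placeholder for cloud-side 3D reconstruction and novel-view rendering (step 3).
    return {"pose": pose, "image": "<rendered frame>"}

def communicate_once(local_frame, remote_frames):
    pose = track_eyes(local_frame)               # step 1: track eye pose locally
    view = render_new_view(remote_frames, pose)  # steps 2-3: cloud renders new viewpoint
    return view                                  # step 4: image returned to local user

view = communicate_once(local_frame=None, remote_frames=[])
```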
As a further preferred scheme of the three-dimensional video communication method with high-sensitivity motion parallax of the present invention, in step 1 the eye-tracking module obtains the pose of the local user's eyes in space through AI estimation or a geometric algorithm, specifically comprising the following steps:
Step 1.1: Initialize the positions of the two cameras and compute their relative pose: obtain the camera intrinsics and extrinsics from the captured images using a camera calibration algorithm, solve in three-dimensional space with the aid of a depth sensor, or estimate the relative pose with an artificial-intelligence algorithm.
Step 1.2: Perform face recognition and facial landmark detection on the images captured by the two cameras, extract the corresponding facial feature points, and obtain the pixel coordinates of each feature point and the correspondence of the feature points between the two cameras.
Step 1.3: From the relative pose of the two cameras and the feature points in the images, determine a unique spatial triangle and compute the position coordinates of the eyes in three-dimensional space by geometric solution.
Step 1.4: Combine the facial features and the head pose to compute the gaze direction of both eyes, obtaining the eye pose in three-dimensional space.
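Step 1.3 can be illustrated for the simplest geometry — two rectified, parallel cameras — where the spatial triangle reduces to the classic disparity relation z = f·b/d. This is a hedged sketch under that simplifying assumption (a general setup would triangulate with full intrinsics and extrinsics); all numbers are illustrative:

```python
def triangulate_rectified(u_left, u_right, v, focal_px, baseline_m, cx, cy):
    """Return (x, y, z) in metres of a landmark matched at pixel column
    u_left / u_right (same row v) in two rectified, parallel cameras."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("landmark must have positive disparity")
    z = focal_px * baseline_m / disparity  # depth from disparity
    x = (u_left - cx) * z / focal_px       # back-project pixel to 3D
    y = (v - cy) * z / focal_px
    return (x, y, z)

# Example: 800 px focal length, 10 cm baseline, 20 px disparity -> depth 4 m
eye_xyz = triangulate_rectified(u_left=660, u_right=640, v=360,
                                focal_px=800.0, baseline_m=0.10,
                                cx=640.0, cy=360.0)
```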
As a further preferred scheme of the three-dimensional video communication method with high-sensitivity motion parallax of the present invention, in step 1 an eye-pose prediction module is added; by predicting the eye trajectory over future time and the eye pose at a future moment, it compensates for the acquisition, transmission, and computation delays and resolves the hysteresis of motion parallax.
As a further preferred scheme of the three-dimensional video communication method with high-sensitivity motion parallax of the present invention, the eye-pose prediction module is added with the following specific steps:
The camera on the local user side captures images, and the eye-tracking module obtains the pose of the local user's eyes in space.
The eye-pose prediction module estimates the trajectory and position of the eye movement with a time-series prediction algorithm or a state-estimation algorithm, and sends the predicted future timestamps and the corresponding poses to the reconstruction and rendering module.
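As a minimal stand-in for the time-series or state-estimation prediction just described, the sketch below extrapolates the eye position at constant velocity from the last two tracked samples; a Kalman filter or a learned predictor would be drop-in replacements. All names and numbers are illustrative assumptions:

```python
def predict_pose(ts_prev, pos_prev, ts_curr, pos_curr, ts_pred):
    """Extrapolate the eye position (list of coordinates) to the future
    timestamp ts_pred, assuming constant velocity between samples."""
    dt = ts_curr - ts_prev
    velocity = [(c - p) / dt for p, c in zip(pos_prev, pos_curr)]
    horizon = ts_pred - ts_curr
    return [c + v * horizon for c, v in zip(pos_curr, velocity)]

# Eye moved 2 cm along x in 33 ms; predict one frame (33 ms) into the future.
pos = predict_pose(ts_prev=0.000, pos_prev=[0.00, 0.0, 0.6],
                   ts_curr=0.033, pos_curr=[0.02, 0.0, 0.6],
                   ts_pred=0.066)
# pos[0] is approximately 0.04: the motion continues at the same velocity
```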
As a further preferred scheme of the three-dimensional video communication method with high-sensitivity motion parallax of the present invention, the data transmission protocol for eye tracking and prediction is custom-defined, containing timestamps and pose information, in the following format:
{(ts_curr, pos_curr),
(ts_pred1, pos_pred1),
(ts_pred2, pos_pred2), …}
Here ts_curr is the current timestamp; pos_curr is the current eye pose; ts_pred is a future timestamp, where the index denotes one of several predictions; and pos_pred is the corresponding predicted eye pose.
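One possible concrete encoding of this custom message is shown below. JSON and the field layout are illustrative choices for the sketch; the method does not mandate a particular wire format:

```python
import json

def encode_tracking_message(ts_curr, pos_curr, predictions):
    """Serialize the current (timestamp, pose) pair followed by any number
    of predicted pairs. predictions: list of (ts_pred, pos_pred), soonest first."""
    return json.dumps({
        "current": {"ts": ts_curr, "pos": pos_curr},
        "predicted": [{"ts": ts, "pos": pos} for ts, pos in predictions],
    })

msg = encode_tracking_message(
    ts_curr=1.000, pos_curr=[0.02, 0.0, 0.6],
    predictions=[(1.033, [0.04, 0.0, 0.6]), (1.066, [0.06, 0.0, 0.6])],
)
decoded = json.loads(msg)  # round-trip check of the encoding
```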
As a further preferred scheme of the three-dimensional video communication method with high-sensitivity motion parallax of the present invention, the reconstruction and rendering module receives multiple predicted eye poses, generates multiple new-viewpoint images, and transmits the images back to the local user; the transmission protocol is custom-defined, containing timestamps, pose information, and frame image data, in the following format:
{(ts_curr, pos_curr, frame_curr),
(ts_pred1, pos_pred1, frame_pred1),
(ts_pred2, pos_pred2, frame_pred2), …}
Here frame denotes the new-viewpoint image frame generated for the corresponding eye pose at the corresponding moment.
As a further preferred scheme of the three-dimensional video communication method with high-sensitivity motion parallax of the present invention, in step 1 the acquisition device is a camera, and one or more cameras may be used according to the requirements of the eye-tracking algorithm.
As a further preferred scheme of the three-dimensional video communication method with high-sensitivity motion parallax of the present invention, in step 2 the acquisition device is a camera, and one or more cameras may be used according to the requirements of the reconstruction algorithm.
As a further preferred scheme of the three-dimensional video communication method with high-sensitivity motion parallax of the present invention, in step 2 the acquisition camera is an RGB-D camera.
As a further preferred scheme of the three-dimensional video communication method with high-sensitivity motion parallax of the present invention, in step 1 the acquisition camera is an RGB camera.
Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects:
1. The three-dimensional video communication method with high-sensitivity motion parallax of the present invention tracks the eye pose with a computer vision algorithm, predicts the eye's motion trajectory in space and its future pose, and transmits the predicted eye pose to a three-dimensional reconstruction and rendering module, which computes the image for the future viewpoint; this compensates for the acquisition, transmission, and computation delays and realizes three-dimensional video communication with high-sensitivity motion parallax.
2. The method tracks and predicts the eye pose, transmits the information to the three-dimensional reconstruction and rendering module, and generates the image for the future viewpoint, realizing high-quality three-dimensional video communication and providing a more immersive and realistic experience.
3. The method takes the acquisition, transmission, and computation delays into account during processing; by predicting the eye movement it compensates for these delays, fundamentally solving the hysteresis of motion parallax and improving the user experience.
4. The method requires no additional sensor devices or bandwidth, is highly portable, is suitable for cloud computing scenarios, and makes full use of cloud computing power; it is a general method for reducing latency.
Brief Description of the Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of the process that a local user goes through to see the 3D rendering of a remote participant in three-dimensional video communication;
FIG. 2 is a schematic diagram of the added eye-pose prediction module.
Detailed Description of the Embodiments
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention. The present invention is described in detail below with reference to the drawings and preferred embodiments, from which its purpose and effects will become clearer. It should be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it.
The main purpose of the present invention is to track the eye pose with a computer vision algorithm, predict the eye's motion trajectory in space and its future pose, transmit the predicted eye pose to a three-dimensional reconstruction and rendering module, and compute the image for the future viewpoint, thereby compensating for the acquisition, transmission, and computation delays and realizing three-dimensional video communication with high-sensitivity motion parallax.
With the continuous development of networks, AI, and cloud computing, three-dimensional video communication is becoming the mainstream mode of future remote communication. Compared with traditional two-dimensional video, three-dimensional video communication captures the three-dimensional images and environmental information of remote participants, enabling local users to perceive the other party's movement, depth, and spatial position more accurately. It not only overcomes the obstacles of geographical distance but also provides a more immersive experience, promoting real-time cooperation and exchange. To achieve a 3D viewing effect, binocular parallax, motion parallax, and depth parallax must be considered. Motion parallax is the parallax effect produced by the viewer's own movement: by capturing the local user's eye position, the 3D display shows the remote participant's image from that viewpoint. However, the delays of data acquisition, data transmission, three-dimensional reconstruction, and rendering mean that by the time the view is switched the local user's eyes have already moved; that is, motion parallax exhibits hysteresis.
In three-dimensional video communication, the local user sees the 3D rendering of the remote participant through the following steps, as shown in FIG. 1. First, the images captured by the camera on the local user side are used for eye tracking; depending on the requirements of the eye-tracking algorithm, one or more cameras may be used, and the eye-tracking module obtains the pose of the local user's eyes in space through AI estimation or a geometric algorithm (the pose includes position and direction). The core of this scheme is to track the eye pose and use eye-pose prediction to compensate for the delay in generating the new viewpoint; the eye-tracking steps described here are only one feasible means of realizing this function, and any other method that realizes the function of this module may be used. Because cloud services add transmission delay, the hysteresis of motion parallax is more pronounced there, but the scheme also has an optimizing effect in an end-to-end deployment; the cloud-service case is used here only for illustration.
At the same time, the camera on the remote participant's side captures images for three-dimensional reconstruction and rendering; depending on the requirements of the reconstruction algorithm, one or more cameras, or even an RGB-D camera, may be used. The acquisition device is not limited to cameras: any sensor that can obtain the relevant information may be used. Nor is it limited to RGB-D cameras: depending on the requirements of the three-dimensional reconstruction or novel-view synthesis algorithm, an RGB camera, or any other sensor that provides the data the algorithm needs, may be used.
Then, since the computing power on the client side may be limited, cloud services can be used to make full use of computing power: the spatial eye pose of the local user and the image data of the remote participant are transmitted to the cloud host, and the three-dimensional reconstruction and rendering module generates the new-viewpoint image from the eye pose and image information. Finally, the new-viewpoint image is transmitted back to the local user.
The eye-tracking module may use the following scheme. Initialize the positions of the two cameras and compute their relative pose: obtain the camera intrinsics and extrinsics from the captured images using a camera calibration algorithm, solve in three-dimensional space with the aid of a depth sensor, or estimate the relative pose with an artificial-intelligence algorithm. Perform face recognition and facial landmark detection on the images captured by the two cameras, extract the corresponding facial feature points, and obtain the pixel coordinates of each feature point and the correspondence of the feature points between the two cameras. From the relative pose of the two cameras and the feature points in the images, determine a unique spatial triangle and compute the position coordinates of the eyes in three-dimensional space by geometric solution. Combine the facial features and the head pose to compute the gaze direction of both eyes, obtaining the eye pose in three-dimensional space.
However, the delay that is unavoidable in the above process degrades the real-time motion-parallax effect of three-dimensional video communication. When the local user moves at time t1, the local camera captures the movement of the user's eyes; after the delays of data acquisition, data transmission, reconstruction and rendering computation, and image return, the image is not seen until time t2. In other words, the image for a given viewpoint is only seen (t2 − t1) after the eyes have moved, which is why motion parallax exhibits hysteresis.
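The hysteresis just described can be quantified with a one-line calculation: for roughly constant eye speed, the displayed viewpoint lags the true viewpoint by the distance travelled during the end-to-end delay (t2 − t1). The speed and delay values below are made-up illustrative numbers, not measurements from the patent:

```python
def parallax_lag(eye_speed_m_s, t1, t2):
    """Viewpoint error (metres) accumulated over the pipeline delay t2 - t1."""
    return eye_speed_m_s * (t2 - t1)

# 0.3 m/s head motion and a 150 ms end-to-end delay -> 45 mm viewpoint lag
lag = parallax_lag(eye_speed_m_s=0.3, t1=1.000, t2=1.150)
```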
To solve this sensitivity problem of motion parallax in three-dimensional video communication, an eye-pose prediction module is added: by predicting the eye trajectory over future time and the eye pose at a future moment, it compensates for the acquisition, transmission, and computation delays and fundamentally resolves the hysteresis of motion parallax. The specific steps are shown in FIG. 2.
First, as in the previous steps, the camera on the local user side captures images, and the eye-tracking module obtains the pose of the local user's eyes in space. Then the eye-pose prediction module estimates the trajectory and position of the eye movement with a time-series prediction algorithm or a state-estimation algorithm and sends the predicted future timestamps and the corresponding poses to the reconstruction and rendering module; to mitigate the accuracy limits of the prediction algorithm, multiple future eye poses can be predicted. The data transmission protocol for eye tracking and prediction is custom-defined and mainly contains timestamps and pose information, in the following format:
{(ts_curr, pos_curr),
(ts_pred1, pos_pred1),
(ts_pred2, pos_pred2), …}
Here ts_curr is the current timestamp; pos_curr is the current eye pose; ts_pred is a future timestamp (the index denotes one of several predictions); and pos_pred is the corresponding predicted eye pose.
In this way, the reconstruction and rendering module receives multiple predicted eye poses and generates multiple new-viewpoint images, which are transmitted back to the local user. The transmission protocol is custom-defined and mainly contains timestamps, pose information, and frame image data, in the following format:
{(ts_curr, pos_curr, frame_curr),
(ts_pred1, pos_pred1, frame_pred1),
(ts_pred2, pos_pred2, frame_pred2), …}
Here frame denotes the new-viewpoint image frame generated for the corresponding eye pose at the corresponding moment. These data are transmitted back to the local user; by then the time has reached t2, and the frame whose predicted eye pose is closest to the eye pose actually measured at t2 is selected from the above set for display.
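The selection step just described can be sketched as a nearest-pose lookup over the received (timestamp, pose, frame) tuples. The Euclidean distance metric and all data values below are illustrative assumptions:

```python
import math

def select_frame(candidates, pos_actual):
    """candidates: list of (ts, pos, frame) tuples from the rendering module.
    Return the frame whose pose is nearest the pose measured at display time."""
    best = min(candidates, key=lambda c: math.dist(c[1], pos_actual))
    return best[2]

candidates = [
    (1.000, [0.02, 0.0, 0.6], "frame_curr"),
    (1.033, [0.04, 0.0, 0.6], "frame_pred1"),
    (1.066, [0.06, 0.0, 0.6], "frame_pred2"),
]
# The eye pose measured at t2 is closest to the first prediction.
shown = select_frame(candidates, pos_actual=[0.041, 0.0, 0.6])
```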
Taking the prediction of a single future pose as an example, the whole process of high-sensitivity motion parallax realized by this scheme is described below.
The local user is in motion, and timing starts at t1. The moment the camera captures an image is recorded as ts_curr, and the eye pose computed from that image by the eye-tracking algorithm is recorded as pos_curr. The eye-pose prediction module predicts the eye pose at the future moment ts_pred as pos_pred. The timestamps and eye-pose data are transmitted, the reconstruction and rendering module generates the new-viewpoint image corresponding to pos_pred, and the predicted image is transmitted to the local user for viewing. By then the time has reached t2. Because the predicted pose pos_pred corresponds exactly to the viewpoint at t2, (t2 − ts_pred) is approximately zero; the prediction thus compensates for the acquisition, transmission, and computation delays, and the hysteresis of motion parallax is resolved.
综上所述,本发明旨在实现高灵敏运动视差的三维视频通信。通过跟踪眼部位姿并预测运动轨迹和未来位姿,三维重建和渲染未来视角的画面,以补偿采集、传输和计算时延,用户能够更准确地感知三维通信中的立体效果,提供身临其境的观看体验。In summary, the present invention aims to realize three-dimensional video communication with high-sensitivity motion parallax. By tracking eye posture and predicting motion trajectory and future posture, three-dimensional reconstruction and rendering of future perspective images are performed to compensate for acquisition, transmission and calculation delays, so that users can more accurately perceive the stereoscopic effect in three-dimensional communication, providing an immersive viewing experience.
1. The method tracks and predicts the eye pose and transmits this information to the 3D reconstruction and rendering module, which generates the image for the future viewpoint, achieving high-quality three-dimensional video communication and a more immersive, realistic experience.
2. The method accounts for capture, transmission, and computation delays during processing and compensates for them by predicting eye movement, fundamentally resolving the motion-parallax lag problem and improving the user experience.
3. The method requires no additional sensors or bandwidth, is highly portable, and is well suited to cloud-computing scenarios, where it makes full use of cloud compute; it is a general approach to reducing latency.
Those of ordinary skill in the art will understand that the above are merely preferred examples of the invention and are not intended to limit it. Although the invention has been described in detail with reference to the foregoing examples, those skilled in the art may still modify the technical solutions recorded in those examples or substitute equivalents for some of their technical features. Any modification, equivalent substitution, or the like made within the spirit and principles of the invention shall fall within its scope of protection. All technical features of this embodiment may be freely combined according to actual needs.
Claims (10)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410526045.6A CN118354056A (en) | 2024-04-29 | 2024-04-29 | A three-dimensional video communication method with high sensitivity to motion parallax |
| PCT/CN2024/136678 WO2025227710A1 (en) | 2024-04-29 | 2024-12-04 | Three-dimensional video communication method having highly-sensitive motion parallax |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118354056A true CN118354056A (en) | 2024-07-16 |
Family
ID=91821022
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN118354056A (en) |
| WO (1) | WO2025227710A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2025227710A1 (en) * | 2024-04-29 | 2025-11-06 | 天翼云科技有限公司 | Three-dimensional video communication method having highly-sensitive motion parallax |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH1020998A (en) * | 1996-06-28 | 1998-01-23 | Osaka Kagaku Gijutsu Center | Positioning device |
| CN105263050A (en) * | 2015-11-04 | 2016-01-20 | 山东大学 | Mobile terminal real-time rendering system and method based on cloud platform |
| CN106814853A (en) * | 2016-12-15 | 2017-06-09 | 上海眼控科技股份有限公司 | A kind of eye control tracking based on machine learning |
| CN114268784A (en) * | 2021-12-31 | 2022-04-01 | 东莞仲天电子科技有限公司 | Method for improving near-to-eye display experience effect of AR (augmented reality) equipment |
| US20220174257A1 (en) * | 2020-12-02 | 2022-06-02 | Facebook Technologies, Llc | Videotelephony with parallax effect |
| CN116166121A (en) * | 2023-01-28 | 2023-05-26 | 深圳锐视智芯科技有限公司 | Eyeball tracking method, device, equipment and medium based on binocular stereoscopic vision |
| CN117456113A (en) * | 2023-12-26 | 2024-01-26 | 山东山大华天软件有限公司 | Cloud offline rendering interactive application implementation method and system |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9978180B2 (en) * | 2016-01-25 | 2018-05-22 | Microsoft Technology Licensing, Llc | Frame projection for augmented reality environments |
| CN107333121B (en) * | 2017-06-27 | 2019-02-26 | 山东大学 | Immersive stereoscopic rendering projection system and method for moving viewpoint on curved screen |
| US11410331B2 (en) * | 2019-10-03 | 2022-08-09 | Facebook Technologies, Llc | Systems and methods for video communication using a virtual camera |
| CN115314696B (en) * | 2021-05-08 | 2024-07-16 | 中国移动通信有限公司研究院 | Image information processing method and device, server and terminal |
| CN114040184B (en) * | 2021-11-26 | 2024-07-16 | 京东方科技集团股份有限公司 | Image display method, system, storage medium and computer program product |
| CN118354056A (en) * | 2024-04-29 | 2024-07-16 | 天翼云科技有限公司 | A three-dimensional video communication method with high sensitivity to motion parallax |
Non-Patent Citations (2)

| Title |
|---|
| CHA ZHANG: "Improving Depth Perception with Motion Parallax and Its Application in Teleconferencing", IEEE MMSP 2009, 23 October 2009 |
| TROJE NIKOLAUS F: "Depth from motion parallax: Deictic consistency, eye contact, and a serious problem with Zoom", JOURNAL OF VISION, 1 September 2023 |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2025227710A1 (en) | 2025-11-06 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7566092B2 (en) | 3D model transmission method, 3D model reception method, 3D model transmission device, and 3D model reception device | |
| EP3899870B1 (en) | Cloud-based camera calibration | |
| US11924397B2 (en) | Generation and distribution of immersive media content from streams captured via distributed mobile devices | |
| US9774896B2 (en) | Network synchronized camera settings | |
| EP2406951B1 (en) | System and method for providing three dimensional imaging in a network environment | |
| JP2023083574A (en) | Receiving method, terminal, and program | |
| CN101651841B (en) | Method, system and equipment for realizing stereo video communication | |
| Uddin et al. | Unsupervised deep event stereo for depth estimation | |
| CN113272863A (en) | Depth prediction based on dual pixel images | |
| CN114040184B (en) | Image display method, system, storage medium and computer program product | |
| CN111385481A (en) | Image processing method and device, electronic device and storage medium | |
| CN118354056A (en) | A three-dimensional video communication method with high sensitivity to motion parallax | |
| WO2025227710A9 (en) | Three-dimensional video communication method having highly-sensitive motion parallax | |
| CN117784933A (en) | Multi-person AR interaction method based on cloud rendering | |
| CN115174941B (en) | Real-time motion performance analysis and real-time data sharing method based on multiple paths of video streams | |
| CN114979564B (en) | Video shooting method, electronic equipment, device, system and medium | |
| US20250329123A1 (en) | Display system for displaying mixed reality space image and processing method for use of display system | |
| US20240015264A1 (en) | System for broadcasting volumetric videoconferences in 3d animated virtual environment with audio information, and procedure for operating said device | |
| CN113515193B (en) | Model data transmission method and device | |
| Pan et al. | 5g mobile edge assisted metaverse light field video system: Prototype design and empirical evaluation | |
| CN113473172B (en) | VR video caching method and device, caching service device and storage medium | |
| Pan et al. | Mobile edge assisted multi-view light field video system: Prototype design and empirical evaluation | |
| KR101273634B1 (en) | Tracking Method of Multiple Objects using Mobile Device in Augumented Reality Environment and System Using the same | |
| KR101788005B1 (en) | Method for generating multi-view image by using a plurality of mobile terminal | |
| CN111193858A (en) | Method and system for shooting and displaying augmented reality |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |