
WO2022266556A1 - Methods and systems for motion prediction - Google Patents

Methods and systems for motion prediction

Info

Publication number
WO2022266556A1
WO2022266556A1 (PCT/US2022/039856)
Authority
WO
WIPO (PCT)
Prior art keywords
point
data set
time
data
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2022/039856
Other languages
French (fr)
Inventor
Shengqi FENG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innopeak Technology Inc
Original Assignee
Innopeak Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innopeak Technology Inc filed Critical Innopeak Technology Inc
Priority to PCT/US2022/039856 priority Critical patent/WO2022266556A1/en
Publication of WO2022266556A1 publication Critical patent/WO2022266556A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present invention is directed to extended reality systems and methods.
  • XR extended reality
  • AR augmented reality
  • VR virtual reality
  • Important design considerations and challenges for XR devices include performance, cost, and power consumption.
  • Various XR-related applications utilize motion prediction techniques. To date, existing motion prediction techniques used in XR devices have been inadequate for reasons further explained below.
  • the present invention is directed to extended reality systems and methods.
  • at least two location points of an object corresponding to two separate timestamps are captured by a sensor and utilized to predict a location point of the object at a third timestamp, which accounts for both the translation and rotation movements of the sensor.
  • the predicted location point may be used to determine an insertion location for rendering an overlaying image to reduce the end-to-end latency of the system.
  • a system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions.
  • One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by the data processing apparatus, cause the apparatus to perform the actions.
  • One general aspect includes a method for motion prediction. The method includes capturing a first data set for an object at a first time using a sensor, the first data set may include at least a first set of three positions and a first set of three rotations. The method also includes calculating a first adjusted point using at least the first data set.
  • the method also includes capturing a second data set for the object at a second time using the sensor, the second data set may include at least a second set of three positions and a second set of three rotations. The second time is later than the first time.
  • the method also includes calculating a second adjusted point using at least the second data set.
  • the method also includes calculating a position data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point.
  • the method also includes calculating a rotation data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point.
  • the method also includes calculating a velocity data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point.
  • the method also includes providing a predicted point for a third time using at least the position data and the rotation data.
  • Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
  • Implementations may include one or more of the following features.
  • the third time is later than the second time. In other embodiments, the third time is after the first time and before the second time.
  • the rotation data and the position data are obtained using a cubic Hermite spline model.
  • the method may include solving a position equation and a rotation equation based on the cubic Hermite spline model.
  • the method may include identifying the object.
  • a difference between the first data set and the second data set is attributed to a movement of the sensor.
  • the movement may be non-linear.
  • the movement may be characterized by a non-zero acceleration.
  • the method may include: capturing a first image at the first time, the object being positioned in the first image; generating an overlaying image; determining an insertion location for the overlaying image based on the predicted point; capturing a second image at the third time; placing the overlaying image at the insertion location of the second image.
  • Implementations of the described techniques may include hardware, a method or process, or computer software on a computer- accessible medium.
  • One general aspect is directed to an extended reality device, which includes a housing having a front side and a rear side.
  • the device also includes a sensor positioned on the front side, the sensor is configured to determine a first location point of an object at a first time and a second location point of the object at a second time.
  • the device also includes a display configured on the rear side of the housing.
  • the device also includes a memory coupled to the sensor and configured to store the first location point and the second location point.
  • the device also includes a processor coupled to the memory.
  • the processor is configured to: calculate a first adjusted point based on the first location point; calculate a second adjusted point based on the second location point; and calculate a predicted location point using the first location point, the first adjusted point, the second location point, and the second adjusted point.
  • Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
  • the sensor may include a camera.
  • the sensor may include a lidar.
  • the processor may include a central processing unit and a neural processing unit.
  • the display may be configured to display a generated image at the predicted location.
  • the method includes capturing a first data set for an object at a first time, the first data set may include at least a first set of three positions and a first set of three rotations. The method also includes storing the first data set at a memory. The method also includes calculating a first adjusted point using at least the first data set. The method also includes capturing a second data set for the object at a second time, the second data set may include at least a second set of three positions and a second set of three rotations. The method also includes storing the second data set at the memory. The method also includes calculating a second adjusted point using at least the second data set.
  • the method also includes calculating a position data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point.
  • the method also includes calculating a rotation data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point.
  • the method also includes calculating a velocity data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point.
  • the velocity data may include a linear velocity and an angular velocity.
  • the linear velocity may be measured in distance change over a predetermined time interval.
  • the angular velocity may be measured in an angular change over the predetermined time interval.
  • the method also includes providing a predicted point for a third time using at least the position data and the rotation data.
  • Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
  • Implementations may include one or more of the following features.
  • the method may include calculating a three-dimensional linear velocity for position.
  • the method may include calculating a three-dimensional angular velocity for rotation.
  • a time interval between the first time and the third time is less than about 20ms. When the time interval is greater than 20ms, the accuracy of prediction may be unacceptable, which leads to non-optimal user experience.
  • Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
  • motion detection techniques according to embodiments of the present invention provide many advantages over conventional techniques.
  • motion detection techniques according to embodiments of the present invention are highly accurate, especially in detecting motions associated with human head movements that are non-linear.
  • motion detection techniques according to embodiments of the present invention can be performed at high frame rates and satisfy various performance requirements of XR devices.
  • Embodiments of the present invention can be implemented in conjunction with existing systems and processes.
  • motion detection techniques according to the present invention can be used in a wide variety of XR systems and other devices.
  • Figure 1 is a simplified diagram illustrating extended reality (XR) apparatus 115n according to embodiments of the present invention.
  • Figure 2 is a simplified block diagram illustrating components of extended reality apparatus 115n according to embodiments of the present invention.
  • Figure 3 is a simplified diagram illustrating location mismatch in an AR environment.
  • Figure 4 is a simplified timing diagram illustrating location point capturing and prediction according to embodiments of the present invention.
  • Figure 5A is a simplified diagram illustrating location prediction based on two points on a non-linear path according to embodiments of the present invention.
  • Figure 5B is a simplified diagram illustrating location prediction based on two points on a rotational path according to embodiments of the present invention.
  • Figure 6 is a simplified flow diagram illustrating a method for predicting a location point in an XR environment according to embodiments of the present invention.
  • the present invention is directed to extended reality systems and methods.
  • at least two location points of an object corresponding to two separate timestamps are captured by a sensor and utilized to predict a location point of the object at a third timestamp, which accounts for both the translation and rotation movements of the sensor.
  • the predicted location point may be used to determine an insertion location for rendering an overlaying image to reduce the end-to-end latency of the system.
  • XR devices e.g., AR-glasses, head-mounted displays, etc.
  • XR devices that enable immersive AR/VR experiences are becoming more and more popular.
  • the ability to accurately provide a representation of the scene relative to the user's current perspective (e.g., position and/or orientation) in real-time (i.e., with low system latency) promises exciting new applications in immersive virtual and augmented realities.
  • it remains a challenging task due to various reasons such as computational complexity, unconstrained user movement, and ultra-low latency requirements.
  • a motion-to-photon latency may occur, which corresponds to an elapsed time between the initiation of the image rendering and the output of the rendered image.
  • the user's movement during this period may result in dissonance between the perceived perspective — based on historical head pose data — and the actual perspective of the moment.
  • Such dissonance may cause user discomfort or sickness, or even a loss of the sense of physical presence in the virtual world.
  • Embodiments of the present invention provide a motion prediction system for the XR apparatus.
  • the system utilizes historical pose information and raw sensor data to realize 6DOF motion prediction that works robustly in various conditions (e.g., non-linear movement, non- zero acceleration, and/or the like).
  • SLAM simultaneous localization and mapping
  • embodiments of the present invention provide a complete system-wide solution, which may involve features such as real-time on-edge devices (e.g., mobile phones, embedding devices), to simultaneously track the pose and motion of the user's movement and enable predictive image rendering for an improved immersive experience.
  • any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specific function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C. Section 112, Paragraph 6.
  • the use of “step of” or “act of” in the Claims herein is not intended to invoke the provisions of 35 U.S.C. 112, Paragraph 6.
  • FIG. 1 is a simplified diagram (top view) illustrating extended reality apparatus 115n according to embodiments of the present invention.
  • This diagram is merely an example, which should not unduly limit the scope of the claims.
  • One of ordinary skill in the art would recognize many variations, alternatives, and modifications.
  • XR extended reality
  • XR apparatus 115 as shown can be configured as VR, AR, or others.
  • ER apparatus 115 may include a small housing for AR applications or a relatively larger housing for VR applications.
  • Cameras 180A and 180B are configured on the front side of apparatus 115.
  • cameras 180A and 180B are respectively mounted on the left and right sides of the ER apparatus 115.
  • additional cameras may be configured below cameras 180A and 180B to provide an additional field of view and range estimation accuracy.
  • cameras 180A and 180B both include ultrawide angle or fisheye lenses that offer large fields of view, which offer wider coverage of the scene for enhanced immersion.
  • HMD head-mounted display
  • Display 185 is configured on the backside of ER apparatus 115.
  • display 185 may be a semitransparent display that overlays information on an optical lens in AR applications.
  • display 185 may include a non-transparent display.
  • FIG. 2 is a simplified block diagram illustrating components of extended reality apparatus 115 according to embodiments of the present invention.
  • an XR headset e.g., AR headset 115n as shown, or the like
  • the processor 150 might communicatively be coupled (e.g., via a bus, via wired connectors, or via electrical pathways (e.g., traces and/or pads, etc. ) of printed circuit boards ("PCBs") or integrated circuits ("ICs"), and/or the like) to each of one or more of the memory 155, communication interface 160, light source(s) 165, SLAM module 170, HMD tracking sensor(s) 175, camera(s) 180, display 185, and/or peripheral devices 190, and/or the like.
  • camera(s) 180 may be configured to capture one or more images of the scene relative to the user's 120n position and orientation.
  • HMD tracking sensor(s) 175 is configured to track the motion of the HMD device and capture one or more data sets associated with the position and/or rotation of the user's head.
  • HMD tracking sensor(s) 175 is positioned on the front side of the XR apparatus 115 and is configured to determine a first location point of an object at a first time and a second location point of the object at a second time, which will be described in further detail below.
  • HMD tracking sensor(s) may comprise an inertial measurement unit (IMU) including, without limitation, a gyroscope, a magnetometer, and/or an accelerometer, and/or the like.
  • IMU inertial measurement unit
  • HMD tracking sensor(s) further includes one or more infrared sensors and/or optical sensors (e.g., a lidar), which facilitate 6DOF tracking by tracking specific points on the HMD device in 3D space.
  • memory 155 is coupled to the HMD tracking sensor(s) 175 and includes dynamic random-access memory (DRAM) and/or non-volatile memory.
  • DRAM dynamic random-access memory
  • position and rotation data captured by HMD tracking sensor(s) 175 may be temporarily stored at the DRAM for processing, and executable instructions (e.g., linear and/or non-linear fitting algorithms) may be stored at the non-volatile memory.
  • executable instructions e.g., linear and/or non-linear fitting algorithms
  • memory 155 may be implemented as a part of the processor 150 in a system-on-chip (SoC) arrangement.
  • SoC system-on- chip
  • processor 150 includes different types of processing units, such as central processing unit (CPU) 151 and neural processing unit (NPU) 152.
  • Processor 150 may additionally include a graphic processing unit (GPU) for efficient image processing.
  • GPU graphic processing unit
  • Different types of processing units are optimized for different types of computations.
  • CPU 151 handles various types of system functions, such as managing cameras 180 and HMD tracking sensor(s) 175 and moving raw sensor data to memory 155.
  • NPU 152 is optimized for convolutional neural networks and predictive models.
  • NPU 152 is specifically configured to perform ER-related calculations, such as head motion tracking and detection, position calculation, rotation calculation, velocity calculation, acceleration calculation, and/or others.
  • SLAM module 170 is communicatively coupled to processor 150 and HMD tracking sensor(s) 175.
  • SLAM module 170 is configured to process raw sensor data captured by HMD tracking sensor(s) 175 and generate internal state information (e.g., position data, rotation data, velocity data, and/or the like) corresponding to each sensor timestamp to assist processor 150 in executing a sequence of instructions (e.g., head motion prediction algorithms).
  • internal state information e.g., position data, rotation data, velocity data, and/or the like
  • each camera 180 overlaps with a field of view of an eye of the user 120n.
  • the display 185 may be used to display or project the generated image overlays (and/or to display a composite image or video that combines the generated image overlays superimposed over images or video of the actual area).
  • the communication interface 160 provides wired or wireless communication with other devices and/or networks.
  • communication interface 160 may be connected to a computer for tether operations, where the computer provides the processing power needed for graphic-intensive applications.
  • XR apparatus 115n further includes one or more peripheral devices 190 configured to improve user interaction in various aspects.
  • peripheral devices 190 may include, without limitation, at least one of speaker(s) or earpiece(s), eye-tracking sensor(s), audio sensor(s) or microphone(s), noise sensors, touch screen(s), keyboard, mouse, and/or other input/output devices.
  • the camera(s) 180 include their respective lenses and sensors used to capture images or video of an area in front of the ER apparatus 115.
  • front cameras 180 include cameras 180A and 180B as shown in Figure 1, and they are configured respectively on the left and right sides of the housing.
  • the sensors of the front cameras may be low-resolution monochrome sensors, which are not only energy-efficient (without color filter and color processing thereof), but also relatively inexpensive, both in terms of device size and cost.
  • Other embodiments of this system include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
  • Figure 3 is a simplified diagram illustrating location mismatch in an AR environment. This diagram is merely an example, which should not unduly limit the scope of the claims.
  • a user wears an AR-glasses 305 to view an object 310 on a display screen 315
  • image processing e.g., 20ms
  • the rendered image e.g., image that is generated and/or rendered to be superimposed on the real- world object
  • a first image of object 310 is captured at a first timestamp when a first data set (including the position and rotation information of the AR-glasses 305) is detected by a sensor (e.g., HMD tracking sensor 175 in Figure 2).
  • the processor (e.g., processor 150 in Figure 2) renders a copy of the first image based on at least the first data set and outputs the rendered image to the display screen 315 at a first location 320 at a second timestamp (e.g., 20ms later).
  • a second timestamp e.g., 20ms later.
  • the time interval between the first timestamp and the second timestamp being less than or equal to 20ms is advantageous to ensure the accuracy of prediction, thereby allowing for enhanced user experience.
  • At the second timestamp, the AR-glasses 305 has moved downward, which leads to a mismatch between the rendered location 320 of the first image and the “actual” location 325 of the image relative to the new head pose.
  • Such location mismatch caused by the user's movement may result in a poorly aligned image, user discomfort, disorientation, and even a loss of the sense of presence in the virtual world. As such, it is desirable to minimize such mismatch by predicting the new location for the rendered image according to the user's head motion for an improved immersive experience.
  • FIG 4 is a simplified timing diagram illustrating location point capturing and prediction according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications.
  • a sensor e.g., HMD tracking sensor(s) 175 in Figure 2 captures a pose state sequence 401 associated with the position and rotation of the XR apparatus. The sensor detects the head motion and updates the pose information (e.g., position and/or rotation) at each timestamp (e.g., 410, 420, 440).
  • the historical and/or current pose information may be utilized to predict a pose at a past and/or future timestamp.
  • a SLAM module e.g., SLAM module 170 in Figure 2 may receive the raw sensor data and generate internal state information at each timestamp for further processing, which will be described in further detail below.
  • Figure 5A is a simplified diagram illustrating location prediction based on two points on a non-linear path according to the embodiments of the present invention.
  • Figure 5B is a simplified diagram illustrating location prediction based on two points on a rotational path according to the embodiments of the present invention.
  • This diagram is merely an example, which should not unduly limit the scope of the claims.
  • a non-linear model may be employed to predict the head motion of the user.
  • a cubic Hermite spline model is used for motion prediction.
  • the sensor detects the head motion and updates the pose information at separate timestamps. For example, the sensor captures a first data set for an object (e.g., object 310 in Figure 3) at a first timestamp 410, the first data set includes a first set of three positions and a first set of three rotations. Then at the second timestamp 420, the sensor captures the second data set for the object, the second data set includes a second set of three positions and a second set of three rotations.
  • To predict the head motion at a future timestamp such as a third timestamp 430, a cubic Hermite model is introduced as shown in Figure 5A.
  • a first point 501 represents the first data set associated with the head pose at the first timestamp 410.
  • the first point 501 is associated with the six degrees of freedom pose including three dimensions in position (x, y, z) and three dimensions in rotation (roll, yaw, pitch).
  • a first derivative 503 of the first point 501 is calculated and/or optimized by a SLAM module (e.g., SLAM module 170 in Figure 2).
  • the first derivative 503 of the first point 501 indicates the three-dimensional linear velocity for position (v_x, v_y, v_z) and three-dimensional angular velocity for rotation (w_x, w_y, w_z) at the first timestamp 410.
  • a second point 502 representing the second data set associated with the head pose (e.g., three-dimensional position and three-dimensional rotation) at the second timestamp 420 is captured by the sensor and processed by the SLAM module to calculate its first derivative 504.
  • the first derivative 504 of the second point 502 indicates the three-dimensional linear velocity for position (v_x, v_y, v_z) and three-dimensional angular velocity for rotation (w_x, w_y, w_z) at the second timestamp 420.
  • a first adjusted point 505 is calculated based on the first point 501 and its first derivative 503, and a second adjusted point 506 is then calculated based on the second point 502 and its first derivative 504.
  • a position data for a predicted point at the third timestamp 430 can thus be determined by solving a position equation with the first point 501, the first adjusted point 505, the second point 502, and the second adjusted point 506 (see the sketch after this list).
  • a non-linear model e.g., a cubic Hermite model
  • a first rotation point 551 represents a first rotation data set associated with the head pose at the first timestamp 410 in a 3D space.
  • the first rotation point 551 indicates a first rotation state on a rotation path 580.
  • the rotation path 580 is associated with the rotational motion of the user's head.
  • the first rotation point 551 is associated with the six degrees of freedom pose including three dimensions in position (x, y, z) and three dimensions in rotation (roll, yaw, pitch).
  • a first derivative 553 of the first rotation point 551 is calculated and/or optimized by a SLAM module (e.g., SLAM module 170 in Figure 2).
  • the first derivative 553 of the first rotation point 551 indicates the three-dimensional angular velocity for rotation (w_x, w_y, w_z) at the first timestamp 410.
  • a first rotation curve 565 may be determined based on the first rotation state (e.g., a first angular velocity and/or a first angular acceleration) at the first timestamp 410.
  • the first rotation curve 565 indicates a rotational motion associated with the first rotation state relative to a first rotation axis 560.
  • a first angular velocity at the first timestamp 410 (i.e., the first derivative 553 of the first rotation point 551) may be measured in an angular change relative to the first rotation curve 565 during a predetermined time interval (e.g., a unit time interval).
  • a first adjusted rotation point 555 is determined using the first rotation point 551 and its first derivative 553.
  • the second rotation point 552 represents a second rotation data set associated with the head pose at the second timestamp 420 in the 3D space. For example, the second rotation point 552 indicates a second rotation state on the rotation path 580.
  • the second rotation point 552 is associated with the six degrees of freedom pose including three dimensions in position (x, y, z) and three dimensions in rotation (roll, yaw, pitch).
  • a first derivative 554 of the second rotation point 552 is calculated and/or optimized by a SLAM module (e.g., SLAM module 170 in Figure 2).
  • the first derivative 554 of the second rotation point 552 indicates the three-dimensional angular velocity for rotation (w_x, w_y, w_z) at the second timestamp 420.
  • a second rotation curve 575 may be determined based on the second rotation state (e.g., a second angular velocity and/or a second angular acceleration) at the second timestamp 420.
  • the second rotation curve 575 indicates a rotational motion associated with the second rotation state relative to a second rotation axis 570.
  • a second angular velocity at the second timestamp 420 (i.e., the first derivative 554 of the second rotation point 552) may be measured in an angular change relative to the second rotation curve 575 during the predetermined time interval (e.g., a unit time interval).
  • a second adjusted rotation point 556 is determined using the second rotation point 552 and its first derivative 554.
  • a rotation data for the predicted point at the third timestamp 430 can thus be determined by solving a rotation equation with the first rotation point 551, the first adjusted rotation point 555, the second rotation point 552, and the second adjusted rotation point 556 (see the sketch after this list).
  • Equations 1 and 2 may be solved under the following assumption such that the above equations are in accordance with the physical meaning of the user's head motion (e.g., translational motion and/or rotational motion): (Eqn. 3)
  • Equation 3 describes the position, and its derivative — as expressed in Equation 4 — describes velocity (e.g., linear velocity and/or angular velocity).
  • the derivative of Equation 4 may also be used to account for the accelerations (e.g., linear acceleration and/or angular acceleration).
  • the position data and the rotation data at the third timestamp 430 — which is after the first timestamp and before the fourth timestamp — may be calculated using four points on a cubic Hermite spline.
  • velocity data include both linear velocity (as measured in distance change over a unit time interval) and angular velocity (as measured in angle change — or rotation — over a unit time interval).
  • a position data and the rotation data at the fourth timestamp 440 may be calculated using the data collected at the third timestamp 430 and the first timestamp 410 or the second timestamp. It is to be appreciated that the more data (e.g., data collected at previous timestamps) is included, the more accurate is the motion prediction.
  • embodiments of the present invention are not limited to the Hermite spline, which is one of the models for motion prediction. Depending on the implementation, other models or mathematical formulae may be used to describe the motion and acceleration of a user's pose.
  • the predicted location at the third timestamp 430 can then be determined using the position data and the rotation data.
  • timestamps 430 and 420 may be merged, as a pose at timestamp 430 (or 440) can be predicted at timestamp 420 (or 430) using a first data point at timestamp 410 and a second data point at timestamp 420. It is to be appreciated that taking the 6DOF motion — both linear and angular — into account provides a more robust solution for motion prediction (e.g., either at a future timestamp or a past timestamp), which effectively reduces the motion-to-photon latency and achieves high tracking stability and image projection accuracy.
  • Figure 6 is a simplified flow diagram illustrating a method for predicting a location point in an XR environment according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims.
  • One of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, one or more steps may be added, removed, repeated, rearranged, modified, overlapped, and/or replaced, and they should not limit the scope of the claims.
  • a first data set for an object is captured by a sensor.
  • the object may be identified first.
  • the first data set includes at least a first set of three positions and the first set of three rotations.
  • the first data set may be temporarily stored at a memory for further processing.
  • both imaging and motion information are used to generate the first data set.
  • an imaging sensor e.g., part of a camera
  • a motion sensor e.g., accelerometer
  • the object may be identified by, for example, a processor (e.g., processor 150 in Figure 2) using object detection algorithms.
  • a first image containing the object may also be captured (e.g., by camera 180 in Figure 2) at the first time.
  • a first adjusted point is calculated using at least the first data set.
  • the first adjusted point may be calculated using a first point associated with the first data set and its first derivative on a cubic Hermite spline.
  • points 505 and 506 in Figure 5A are adjusted points, which are calculated using the respective first derivatives 503 and 504 in Figure 5A.
  • a second data set for the object is captured by the sensor at a second time.
  • the second data set includes at least a second set of three positions and a second set of three rotations. The second time may be later than the first time.
  • a difference between the first data set and the second data set is attributed to a movement (e.g., translation and/or rotation movements) of the sensor.
  • the movement may be non-linear and may be characterized by a non-zero acceleration.
  • the path of motion is a non-linear curve, which reflects the non-linear motion of human users.
  • a second adjusted point is calculated using at least the second data set.
  • the second adjusted point may be calculated using a second point associated with the second data set and its first derivative on a cubic Hermite spline.
  • position data is calculated using at least the first data set, the first adjusted point, the second data set, and the second adjusted point.
  • the position data may be calculated using a cubic Hermite spline model, which is based on changes in linear velocity and angular velocity.
  • a predicted position may be calculated by solving equations based on the Hermite spline or other models.
  • a rotation data is calculated using at least the first data set, the first adjusted point, the second data set, and the second adjusted point.
  • the rotation data may be calculated using a cubic Hermite spline model, which includes at least a position equation and a rotation equation.
  • a velocity data is calculated using at least the first data set, the first adjusted point, the second data set, and the second adjusted point.
  • the velocity data include both linear velocity and angular velocity. Acceleration values, which can be calculated as derivatives of velocity data, may be used in position and velocity calculation as well.
  • at least steps 610, 612, and 614 may be performed in different sequences. In some embodiments, at least two of steps 610, 612, and 614 may be performed in parallel.
  • a predicted point for a third time is provided using at least the position data and the rotation data. In an example, the third time is later than the second time.
  • the third time is after the first time and before the second time.
  • the predicted point helps to determine an insertion location for an overlaying image (e.g., virtual content) on a second image captured at the third time.
  • the first image and the second image augmented with an overlaying image positioned at the predicted location may be displayed on a display screen (e.g., display 185 in Figure 2).
  • a time interval between the first time and the third time is less than 20ms to allow for real-time immersion.
  • the XR apparatus may receive a first image captured by a camera along with head pose data captured by the sensor to determine the image of the user's surrounding relative to his/her perspective at the first time.
  • the pose data may be used in understanding and calibrating the user's motion.
  • the linear and angular acceleration are taken into account to predict the user's perspective at a future (or past) timestamp, which advantageously improves the prediction accuracy in non-linear movement scenarios.
  • the processor e.g., processor 150 in Figure 2
  • the composite image (e.g., an augmented reality image) is then displayed on the display screen at a second time with the perspective substantially corresponding to the user's “actual” pose of the moment.
  • the XR apparatus tracks the head motion and updates the image as well as the head pose to provide real-time immersion to the user.
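
The position and rotation equations referenced in the items above are not reproduced in this text. The sketch below is a rough reconstruction that assumes the standard cubic Hermite spline written in its Bezier form, with each adjusted point taken as the captured point shifted along its own first derivative; all function and variable names are illustrative rather than taken from the patent.

```python
import numpy as np

def predict_pose(p1, d1, p2, d2, t1, t2, t3):
    """Two-sample cubic Hermite prediction (hedged reconstruction).

    p1, p2 : 6-vectors (x, y, z, roll, yaw, pitch) captured at times t1 and t2.
    d1, d2 : first derivatives at t1 and t2, i.e. linear velocity (v_x, v_y, v_z)
             and angular velocity (w_x, w_y, w_z), e.g. from a SLAM module.
    t3     : prediction time; t3 > t2 extrapolates a future pose, while
             t1 < t3 < t2 interpolates a past one.
    Treating the three rotation components with the same spline as the positions
    is a simplification that assumes small rotations between samples.
    """
    p1, d1, p2, d2 = (np.asarray(v, dtype=float) for v in (p1, d1, p2, d2))
    dt = t2 - t1
    u = (t3 - t1) / dt                  # normalized prediction time
    # "Adjusted points": each captured point shifted along its own derivative.
    # This is one common choice (the Bezier control points of a Hermite segment);
    # the exact formula for points 505/506 is not spelled out in the text above.
    adj1 = p1 + d1 * dt / 3.0
    adj2 = p2 - d2 * dt / 3.0
    # Cubic Bezier through (p1, adj1, adj2, p2); algebraically identical to the
    # cubic Hermite spline defined by (p1, d1) and (p2, d2).
    w = 1.0 - u
    return w**3 * p1 + 3 * w**2 * u * adj1 + 3 * w * u**2 * adj2 + u**3 * p2

# Example: poses captured at t = 0 ms and t = 10 ms, predicted at t = 20 ms.
pose_410 = [0.000, 0.0, 0.0, 0.0, 0.00, 0.0]
pose_420 = [0.012, 0.0, 0.0, 0.0, 0.05, 0.0]   # moved forward and yawed slightly
deriv_410 = [1.0, 0.0, 0.0, 0.0, 5.0, 0.0]     # m/s and rad/s
deriv_420 = [1.4, 0.0, 0.0, 0.0, 4.0, 0.0]
print(predict_pose(pose_410, deriv_410, pose_420, deriv_420, 0.000, 0.010, 0.020))
```

With a normalized time greater than one, the same polynomial extrapolates beyond the second sample, which is how a future point such as timestamp 430 would be obtained; differentiating it once or twice yields the velocity and acceleration terms discussed above.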

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention is directed to extended reality systems and methods. In an exemplary embodiment, at least two location points of an object corresponding to two separate timestamps are captured by a sensor and utilized to predict a location point of the object at a third timestamp, which accounts for both the translation and rotation movements of the sensor. The predicted location point may be used to determine an insertion location for rendering an overlaying image to reduce the end-to-end latency of the system. There are other embodiments as well.

Description

METHODS AND SYSTEMS FOR MOTION PREDICTION
BACKGROUND OF THE INVENTION
[0001] The present invention is directed to extended reality systems and methods.
[0002] Over the last decade, extended reality (XR) devices — including both augmented reality (AR) devices and virtual reality (VR) devices — have become increasingly popular. Important design considerations and challenges for XR devices include performance, cost, and power consumption. Various XR-related applications utilize motion prediction techniques. To date, existing motion prediction techniques used in XR devices have been inadequate for reasons further explained below.
[0003] It is desired to have new and improved XR systems and methods thereof.
BRIEF SUMMARY OF THE INVENTION
[0004] The present invention is directed to extended reality systems and methods. In an exemplary embodiment, at least two location points of an object corresponding to two separate timestamps are captured by a sensor and utilized to predict a location point of the object at a third timestamp, which accounts for both the translation and rotation movements of the sensor. The predicted location point may be used to determine an insertion location for rendering an overlaying image to reduce the end-to-end latency of the system. There are other embodiments as well.
[0005] A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by the data processing apparatus, cause the apparatus to perform the actions. One general aspect includes a method for motion prediction. The method includes capturing a first data set for an object at a first time using a sensor, the first data set may include at least a first set of three positions and a first set of three rotations. The method also includes calculating a first adjusted point using at least the first data set. The method also includes capturing a second data set for the object at a second time using the sensor, the second data set may include at least a second set of three positions and a second set of three rotations. The second time is later than the first time. The method also includes calculating a second adjusted point using at least the second data set. The method also includes calculating a position data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point. The method also includes calculating a rotation data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point. The method also includes calculating a velocity data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point. The method also includes providing a predicted point for a third time using at least the position data and the rotation data. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
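As a concrete illustration of this aspect, the "data set" captured at each time (three positions plus three rotations) can be pictured as a small timestamped record; the sketch below is only an assumed representation, and the class and field names are not taken from the patent.
```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class PoseSample:
    """One captured data set: a capture time, three positions, and three rotations."""
    t: float           # capture time, in seconds
    position: Vec3     # (x, y, z)
    rotation: Vec3     # (roll, yaw, pitch)

# Two data sets captured by the sensor at a first time and a later second time.
first_set = PoseSample(t=0.000, position=(0.0, 0.0, 0.0), rotation=(0.0, 0.0, 0.0))
second_set = PoseSample(t=0.010, position=(0.012, 0.0, 0.0), rotation=(0.0, 0.05, 0.0))
```
Two such records, together with the adjusted points derived from them, are the inputs from which the position data, rotation data, velocity data, and the predicted point for the third time are computed.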
[0006] Implementations may include one or more of the following features. In some embodiments, the third time is later than the second time. In other embodiments, the third time is after the first time and before the second time. The rotation data and the position data are obtained using a cubic Hermite spline model. The method may include solving a position equation and a rotation equation based on the cubic Hermite spline model. The method may include identifying the object. A difference between the first data set and the second data set is attributed to a movement of the sensor. The movement may be non-linear. The movement may be characterized by a non-zero acceleration. In some implementations, the method may include: capturing a first image at the first time, the object being positioned in the first image; generating an overlaying image; determining an insertion location for the overlaying image based on the predicted point; capturing a second image at the third time; placing the overlaying image at the insertion location of the second image. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer- accessible medium.
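The position equation and rotation equation themselves are not spelled out in this summary. A rough reconstruction, assuming a standard cubic Hermite segment written through the two captured points and their adjusted points, could take the following form; the symbols (P, R, A, B, v, w, and the normalized time u) are introduced here for illustration and are not the patent's own notation.
```latex
% Position equation: cubic segment through the captured points P_1, P_2 with
% adjusted (control) points A_1, A_2 built from the linear velocities v_1, v_2.
P(u) = (1-u)^3 P_1 + 3(1-u)^2 u \, A_1 + 3(1-u) u^2 A_2 + u^3 P_2,
\qquad u = \frac{t_3 - t_1}{t_2 - t_1},
\qquad A_1 = P_1 + \tfrac{t_2 - t_1}{3} v_1,
\qquad A_2 = P_2 - \tfrac{t_2 - t_1}{3} v_2

% Rotation equation: the same form applied to the rotation samples R_1, R_2 with
% adjusted points B_1, B_2 built from the angular velocities w_1, w_2.
R(u) = (1-u)^3 R_1 + 3(1-u)^2 u \, B_1 + 3(1-u) u^2 B_2 + u^3 R_2
```
Under this reading, u > 1 corresponds to a third time later than the second time, and 0 < u < 1 to a third time between the first and second times, matching the two cases listed above.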
[0007] One general aspect is directed to an extended reality device, which includes a housing having a front side and a rear side. The device also includes a sensor positioned on the front side, the sensor is configured to determine a first location point of an object at a first time and a second location point of the object at a second time. The device also includes a display configured on the rear side of the housing. The device also includes a memory coupled to the sensor and configured to store the first location point and the second location point. The device also includes a processor coupled to the memory. In some embodiments, the processor is configured to: calculate a first adjusted point based on the first location point; calculate a second adjusted point based on the second location point; and calculate a predicted location point using the first location point, the first adjusted point, the second location point, and the second adjusted point. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
[0008] Implementations may include one or more of the following features. According to some embodiments, the sensor may include a camera. The sensor may include a lidar. The processor may include a central processing unit and a neural processing unit. The display may be configured to display a generated image at the predicted location. The device may include a camera configured to capture a first image, a display configured to display the first image and a generated image at the predicted location of the first image. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer- accessible medium.
[0009] Another general aspect is directed to a method for motion prediction. The method includes capturing a first data set for an object at a first time, the first data set may include at least a first set of three positions and a first set of three rotations. The method also includes storing the first data set at a memory. The method also includes calculating a first adjusted point using at least the first data set. The method also includes capturing a second data set for the object at a second time, the second data set may include at least a second set of three positions and a second set of three rotations. The method also includes storing the second data set at the memory. The method also includes calculating a second adjusted point using at least the second data set. The method also includes calculating a position data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point. The method also includes calculating a rotation data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point. The method also includes calculating a velocity data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point. The velocity data may include a linear velocity and an angular velocity. For example, the linear velocity may be measured in distance change over a predetermined time interval. The angular velocity may be measured in an angular change over the predetermined time interval. The method also includes providing a predicted point for a third time using at least the position data and the rotation data. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
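Read literally, the velocity data in this aspect can be formed by finite differences over the sampling interval: position change divided by elapsed time for the linear part, and rotation change divided by elapsed time for the angular part. A minimal sketch under that reading (names are illustrative, and angle wrap-around is ignored) is:
```python
import numpy as np

def velocity_data(pos1, rot1, t1, pos2, rot2, t2):
    """Finite-difference linear and angular velocities between two stamped samples."""
    dt = t2 - t1
    linear = (np.asarray(pos2, dtype=float) - np.asarray(pos1, dtype=float)) / dt
    angular = (np.asarray(rot2, dtype=float) - np.asarray(rot1, dtype=float)) / dt
    return linear, angular

v, w = velocity_data((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.000,
                     (0.012, 0.0, 0.0), (0.0, 0.05, 0.0), 0.010)
print(v, w)   # roughly (1.2, 0, 0) in m/s and (0, 5, 0) in rad/s
```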
[0010] Implementations may include one or more of the following features. The method may include calculating a three-dimensional linear velocity for position. The method may include calculating a three-dimensional angular velocity for rotation. A time interval between the first time and the third time is less than about 20ms. When the time interval is greater than 20ms, the accuracy of prediction may be unacceptable, which leads to non-optimal user experience. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
[0011] It is to be appreciated that embodiments of the present invention provide many advantages over conventional techniques. Among other things, motion detection techniques according to embodiments of the present invention are highly accurate, especially in detecting motions associated with human head movements that are non-linear. Additionally, motion detection techniques according to embodiments of the present invention can be performed at high frame rates and satisfy various performance requirements of XR devices.
[0012] Embodiments of the present invention can be implemented in conjunction with existing systems and processes. For example, motion detection techniques according to the present invention can be used in a wide variety of XR systems and other devices.
Additionally, various techniques according to the present invention can be adopted into existing XR systems via software or firmware update. There are other benefits as well.
[0013] The present invention achieves these benefits and others in the context of known technology. However, a further understanding of the nature and advantages of the present invention may be realized by reference to the latter portions of the specification and attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Figure 1 is a simplified diagram illustrating extended reality (XR) apparatus 115n according to embodiments of the present invention.
[0015] Figure 2 is a simplified block diagram illustrating components of extended reality apparatus 115n according to embodiments of the present invention.
[0016] Figure 3 is a simplified diagram illustrating location mismatch in an AR environment.
[0017] Figure 4 is a simplified timing diagram illustrating location point capturing and prediction according to embodiments of the present invention.
[0018] Figure 5A is a simplified diagram illustrating location prediction based on two points on a non-linear path according to embodiments of the present invention.
[0019] Figure 5B is a simplified diagram illustrating location prediction based on two points on a rotational path according to embodiments of the present invention.
[0020] Figure 6 is a simplified flow diagram illustrating a method for predicting a location point in an XR environment according to embodiments of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0021] The present invention is directed to extended reality systems and methods. In an exemplary embodiment, at least two location points of an object corresponding to two separate timestamps are captured by a sensor and utilized to predict a location point of the object at a third timestamp, which accounts for both the translation and rotation movements of the sensor. The predicted location point may be used to determine an insertion location for rendering an overlaying image to reduce the end-to-end latency of the system. There are other embodiments as well.
[0022] With the advent of virtual reality and augmented reality applications, XR devices (e.g., AR-glasses, head-mounted displays, etc.) that enable immersive AR/VR experiences are becoming more and more popular. The ability to accurately provide a representation of the scene relative to the user's current perspective (e.g., position and/or orientation) in real-time (i.e., with low system latency) promises exciting new applications in immersive virtual and augmented realities. There has been great progress in recent years, especially with the arrival of deep learning technology. However, it remains a challenging task due to various reasons such as computational complexity, unconstrained user movement, and ultra-low latency requirements.
[0023] For example, when a user wears an AR-glasses that operates with six-degrees-of-freedom (6DOF) to view an augmented visual representation of the ambient environment, a motion-to-photon latency may occur, which corresponds to an elapsed time between the initiation of the image rendering and the output of the rendered image. The user's movement during this period may result in dissonance between the perceived perspective — based on historical head pose data — and the actual perspective of the moment. Such dissonance may cause user discomfort or sickness, or even a loss of the sense of physical presence in the virtual world. Conventional techniques typically employ linear fitting algorithms to predict the user's movement for image rendering, which perform poorly in real-world use cases that often contain direction change, speed change, and/or natural vibration, and/or the like. Hence, there is a need for more robust and scalable solutions for motion prediction.
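To make the limitation concrete, the linear fitting mentioned here amounts to constant-velocity extrapolation of the last known pose, roughly as in the hypothetical snippet below; any change of direction or speed during the latency window then appears directly as prediction error.
```python
def linear_extrapolate(pose, velocity, t_pose, t_target):
    """Constant-velocity (linear-fit) prediction: pose + velocity * elapsed time."""
    dt = t_target - t_pose
    return [p + v * dt for p, v in zip(pose, velocity)]

# Last known 6DOF pose and velocity at t = 10 ms, extrapolated to t = 30 ms.
pose_t2 = [0.012, 0.0, 0.0, 0.0, 0.05, 0.0]   # (x, y, z, roll, yaw, pitch)
vel_t2 = [1.2, 0.0, 0.0, 0.0, 5.0, 0.0]       # linear (m/s) and angular (rad/s)
print(linear_extrapolate(pose_t2, vel_t2, 0.010, 0.030))
```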
[0024] Embodiments of the present invention provide a motion prediction system for the XR apparatus. The system utilizes historical pose information and raw sensor data to realize 6DOF motion prediction that works robustly in various conditions (e.g., non-linear movement, non- zero acceleration, and/or the like). Implemented with simultaneous localization and mapping (SLAM) modules, embodiments of the present invention provide a complete system-wide solution, which may involve features such as real-time on-edge devices (e.g., mobile phones, embedding devices), to simultaneously track the pose and motion of the user's movement and enable predictive image rendering for an improved immersive experience.
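One way to picture the system-level flow described here is a render loop that asks the tracking/prediction stage for the pose expected at display time (the current time plus the remaining rendering latency) and places the overlay for that pose. The sketch below simulates such a loop; the names, the constant-velocity placeholder predictor, and the timing values are all illustrative, and the spline-based predictor sketched earlier in this document could be substituted for the placeholder.
```python
RENDER_LATENCY_S = 0.018   # assumed motion-to-photon budget, kept under 20 ms

def predict_pose_linear(state, t_display):
    """Placeholder predictor; a non-linear (spline-based) predictor would go here."""
    t, pose, velocity = state
    return [p + v * (t_display - t) for p, v in zip(pose, velocity)]

def render_frame(tracker_history, now):
    """Predict the pose at scan-out time and place the overlay accordingly."""
    t_display = now + RENDER_LATENCY_S
    predicted = predict_pose_linear(tracker_history[-1], t_display)
    # On a real device, the overlay's insertion location on the display would be
    # derived from `predicted` before the frame is shown.
    print(f"frame at t={now:.3f}s -> overlay placed for pose {predicted}")

# Simulated tracker history: (timestamp, 6DOF pose, 6DOF velocity) records, as a
# SLAM module might produce for each sensor timestamp.
tracker_history = [
    (0.000, [0.0, 0.0, 0.0, 0.0, 0.00, 0.0], [1.0, 0.0, 0.0, 0.0, 5.0, 0.0]),
    (0.010, [0.012, 0.0, 0.0, 0.0, 0.05, 0.0], [1.4, 0.0, 0.0, 0.0, 4.0, 0.0]),
]
render_frame(tracker_history, now=0.010)
```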
[0025] The following description is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it into the context of particular applications. Various modifications, as well as a variety of uses in different applications will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of embodiments. Thus, the present invention is not intended to be limited to the embodiments presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
[0026] In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without necessarily being limited to these specific details. In other instances, well-known structures and devices are shown in block, diagram form, rather than in detail, in order to avoid obscuring the present invention.
[0027] The reader's attention is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. All the features disclosed in this specification, (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
[0028] Furthermore, any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specific function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C. Section 112, Paragraph 6. In particular, the use of “step of” or “act of” in the Claims herein is not intended to invoke the provisions of 35 U.S.C. 112, Paragraph 6.
[0029] Please note, if used, the labels left, right, front, back, top, bottom, forward, reverse, clockwise and counterclockwise have been used for convenience purposes only and are not intended to imply any particular fixed direction. Instead, they are used to reflect relative locations and/or directions between various portions of an object.
[0030] Figure 1 is a simplified diagram (top view) illustrating extended reality apparatus 115n according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. It is to be understood that the term “extended reality” (XR) is broadly defined, which includes virtual reality (VR), augmented reality (AR), and/or other similar technologies. For example, XR apparatus 115 as shown can be configured as VR, AR, or others. Depending on the specific implementation, ER apparatus 115 may include a small housing for AR applications or a relatively larger housing for VR applications. Cameras 180A and 180B are configured on the front side of apparatus 115. For example, cameras 180A and 180B are respectively mounted on the left and right sides of the ER apparatus 115. In various applications, additional cameras may be configured below cameras 180A and 180B to provide an additional field of view and range estimation accuracy. For example, cameras 180A and 180B both include ultrawide angle or fisheye lenses that offer large fields of view, which offer wider coverage of the scene for enhanced immersion.
[0031] In addition to the cameras, one or more head-mounted display (HMD) tracking sensors (described in further detail below) may be mounted on the XR apparatus 115 to track the motion (e.g., position and/or rotation) of the user's head. Display 185 is configured on the backside of XR apparatus 115. For example, display 185 may be a semitransparent display that overlays information on an optical lens in AR applications. In VR implementations, display 185 may include a non-transparent display.
[0032] Figure 2 is a simplified block diagram illustrating components of extended reality apparatus 115 according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. In some embodiments, an XR headset (e.g., AR headset 115n as shown, or the like) might include, without limitation, at least one of a processor 150, a memory 155, a communication interface 160, light source(s) 165, a SLAM module 170, HMD tracking sensor(s) 175, camera(s) 180, a display 185, and/or peripheral devices 190, and/or the like.
[0033] In some instances, the processor 150 might be communicatively coupled (e.g., via a bus, via wired connectors, or via electrical pathways (e.g., traces and/or pads, etc.) of printed circuit boards ("PCBs") or integrated circuits ("ICs"), and/or the like) to each of one or more of the memory 155, communication interface 160, light source(s) 165, SLAM module 170, HMD tracking sensor(s) 175, camera(s) 180, display 185, and/or peripheral devices 190, and/or the like. In various implementations, camera(s) 180 may be configured to capture one or more images of the scene relative to the position and orientation of the user 120n. For example, a pair of stereo cameras are provided to capture a stereoscopic representation of the ambient environment for image processing. In other embodiments, a monocular camera is provided to capture monocular images of the scene.

[0034] In various embodiments, HMD tracking sensor(s) 175 is configured to track the motion of the HMD device and capture one or more data sets associated with the position and/or rotation of the user's head. For example, HMD tracking sensor(s) 175 is positioned on the front side of the XR apparatus 115 and is configured to determine a first location point of an object at a first time and a second location point of the object at a second time, which will be described in further detail below. HMD tracking sensor(s) may comprise an inertial measurement unit (IMU) including, without limitation, a gyroscope, a magnetometer, and/or an accelerometer, and/or the like. In some cases, HMD tracking sensor(s) further includes one or more infrared sensors and/or optical sensors (e.g., a lidar), which facilitate 6DOF tracking by tracking specific points on the HMD device in 3D space.
[0035] In various embodiments, memory 155 is coupled to the HMD tracking sensor(s) 175 and includes dynamic random-access memory (DRAM) and/or non-volatile memory. For example, position and rotation data captured by HMD tracking sensor(s) 175 may be temporarily stored at the DRAM for processing, and executable instructions (e.g., linear and/or non-linear fitting algorithms) may be stored at the non-volatile memory. In various embodiments, memory 155 may be implemented as a part of the processor 150 in a system-on-chip (SoC) arrangement.
[0036] In various embodiments, processor 150 includes different types of processing units, such as a central processing unit (CPU) 151 and a neural processing unit (NPU) 152. Processor 150 may additionally include a graphics processing unit (GPU) for efficient image processing. Different types of processing units are optimized for different types of computations. For example, CPU 151 handles various types of system functions, such as managing cameras 180 and HMD tracking sensor(s) 175 and moving raw sensor data to memory 155. NPU 152 is optimized for convolutional neural networks and predictive models. In certain embodiments, NPU 152 is specifically configured to perform XR-related calculations, such as head motion tracking and detection, position calculation, rotation calculation, velocity calculation, acceleration calculation, and/or others. In various implementations, SLAM module 170 is communicatively coupled to processor 150 and HMD tracking sensor(s) 175. SLAM module 170 is configured to process raw sensor data captured by HMD tracking sensor(s) 175 and generate internal state information (e.g., position data, rotation data, velocity data, and/or the like) corresponding to each sensor timestamp to assist processor 150 in executing a sequence of instructions (e.g., head motion prediction algorithms).
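As an illustration of the kind of per-timestamp internal state such a SLAM module may hand to the processor, the following sketch shows a minimal pose-state record in Python; the field names and the quaternion convention are assumptions made for illustration only, not a reproduction of any particular implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PoseState:
    """Hypothetical per-timestamp state produced by a SLAM/tracking front end."""
    timestamp: float              # sensor timestamp, in seconds
    position: np.ndarray          # (3,) translation x, y, z, in meters
    rotation: np.ndarray          # (4,) unit quaternion (x, y, z, w)
    linear_velocity: np.ndarray   # (3,) vx, vy, vz, in m/s
    angular_velocity: np.ndarray  # (3,) wx, wy, wz, in rad/s
```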
[0037] In AR applications, the field of view of each camera 180 overlaps with a field of view of an eye of the user 120n. The display 185 may be used to display or project the generated image overlays (and/or to display a composite image or video that combines the generated image overlays superimposed over images or video of the actual area). The communication interface 160 provides wired or wireless communication with other devices and/or networks. For example, communication interface 160 may be connected to a computer for tethered operation, where the computer provides the processing power needed for graphics-intensive applications.
[0038] In various implementations, XR apparatus 115n further includes one or more peripheral devices 190 configured to improve user interaction in various aspects. For example, peripheral devices 190 may include, without limitation, at least one of speaker(s) or earpiece(s), eye-tracking sensor(s), audio sensor(s) or microphone(s), noise sensors, touch screen(s), keyboard, mouse, and/or other input/output devices.
[0039] The camera(s) 180 include their respective lenses and sensors used to capture images or video of an area in front of the XR apparatus 115. For example, front cameras 180 include cameras 180A and 180B as shown in Figure 1B, and they are configured respectively on the left and right sides of the housing. In various implementations, the sensors of the front cameras may be low-resolution monochrome sensors, which are not only energy-efficient (without a color filter and the associated color processing) but also advantageous in terms of both device size and cost. Other embodiments of this system include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
[0040] Figure 3 is a simplified diagram illustrating location mismatch in an AR environment. This diagram is merely an example, which should not unduly limit the scope of the claims.
One of ordinary skill in the art would recognize many variations, alternatives, and modifications. As shown, when a user wears AR-glasses 305 to view an object 310 on a display screen 315, there may be a system latency due to image processing (e.g., 20ms) from the moment when the image is captured by AR-glasses 305 until the rendered image (e.g., an image that is generated and/or rendered to be superimposed on the real-world object) is output to the display screen 315. For example, a first image of object 310 is captured at a first timestamp when a first data set (including the position and rotation information of the AR-glasses 305) is detected by a sensor (e.g., HMD tracking sensor 175 in Figure 2). The processor (e.g., processor 150 in Figure 2) renders a copy of the first image based on at least the first data set and outputs the rendered image to the display screen 315 at a first location 320 at a second timestamp (e.g., 20ms later). In some cases, keeping the time interval between the first timestamp and the second timestamp less than or equal to 20ms is advantageous to ensure the accuracy of prediction, thereby allowing for an enhanced user experience. However, by the second timestamp, the AR-glasses 305 have moved downward, which leads to a mismatch between the rendered location 320 of the first image and the “actual” location 325 of the image relative to the new head pose. Such a location mismatch caused by the user's movement may result in a poorly aligned image, discomfort, disorientation, and even a loss of the sense of presence in the virtual world. As such, it is desirable to minimize such mismatch by predicting a new location for the rendered image according to the user's head motion for an improved immersive experience.
[0041] Figure 4 is a simplified timing diagram illustrating location point capturing and prediction according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. In various implementations, a sensor (e.g., HMD tracking sensor(s) 175 in Figure 2) captures a pose state sequence 401 associated with the position and rotation of the XR apparatus. The sensor detects the head motion and updates the pose information (e.g., position and/or rotation) at each timestamp (e.g., 410, 420, 440). In some cases, the historical and/or current pose information may be utilized to predict a pose at a past and/or future timestamp. A SLAM module (e.g., SLAM module 170 in Figure 2) may receive the raw sensor data and generate internal state information at each timestamp for further processing, which will be described in further detail below.
[0042] Figure 5A is a simplified diagram illustrating location prediction based on two points on a non-linear path according to the embodiments of the present invention. Figure 5B is a simplified diagram illustrating location prediction based on two points on a rotational path according to the embodiments of the present invention. These diagrams are merely examples, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. As XR apparatuses usually operate with six degrees of freedom, including three dimensions in position (x, y, z) and three dimensions in rotation (roll, yaw, pitch), a non-linear model may be employed to predict the head motion of the user. In various implementations, a cubic Hermite spline model is used for motion prediction.
[0043] Now referring back to Figure 4. As explained above, the sensor detects the head motion and updates the pose information at separate timestamps. For example, the sensor captures a first data set for an object (e.g., object 310 in Figure 3) at a first timestamp 410, the first data set including a first set of three positions and a first set of three rotations. Then, at the second timestamp 420, the sensor captures a second data set for the object, the second data set including a second set of three positions and a second set of three rotations.

[0044] To predict the head motion at a future timestamp such as a third timestamp 430, a cubic Hermite model is introduced as shown in Figure 5A. For example, a first point 501 represents the first data set associated with the head pose at the first timestamp 410. The first point 501 is associated with the six-degrees-of-freedom pose including three dimensions in position (x, y, z) and three dimensions in rotation (roll, yaw, pitch). A first derivative 503 of the first point 501 is calculated and/or optimized by a SLAM module (e.g., SLAM module 170 in Figure 2). The first derivative 503 of the first point 501 indicates the three-dimensional linear velocity for position (vx, vy, vz) and the three-dimensional angular velocity for rotation (wx, wy, wz) at the first timestamp 410. For example, calculations for linear velocity are illustrated in Figure 5A, and angular velocity is illustrated in Figure 5B. It is to be appreciated that by using both linear velocity and angular velocity (and their derivatives, i.e., linear acceleration and angular acceleration), a future position in a user's pose can be more accurately predicted; a person rarely moves linearly at a constant speed. A first adjusted point 505 is then calculated based on the first point 501 and its first derivative 503 using, for example, a CPU.
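One common way to form such adjusted points is to treat them as the interior control points of the Bézier representation of a cubic Hermite segment, that is, offsetting each endpoint along its velocity by one third of the time step. The following sketch illustrates that approach; it is an assumed construction consistent with the description above, and the function and variable names are illustrative rather than taken from the description.

```python
import numpy as np

def adjusted_points(p0, v0, p1, v1, t0, t1):
    """Bezier-style interior control points for a cubic Hermite segment.

    p0, p1: positions (3,) at timestamps t0 and t1
    v0, v1: linear velocities (3,) at timestamps t0 and t1
    Returns (a0, a1), the first and second adjusted points.
    """
    dt = t1 - t0
    a0 = p0 + v0 * dt / 3.0   # first adjusted point (e.g., point 505)
    a1 = p1 - v1 * dt / 3.0   # second adjusted point (e.g., point 506)
    return a0, a1
```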
[0045] Similarly, a second point 502 representing the second data set associated with the head pose (e.g., three-dimensional position and three-dimensional rotation) at the second timestamp 420 is captured by the sensor and processed by the SLAM module to calculate its first derivative 504. The first derivative 504 of the second point 502 indicates the three-dimensional linear velocity for position (vx, vy, vz) and the three-dimensional angular velocity for rotation (wx, wy, wz) at the second timestamp 420. A second adjusted point 506 is then calculated based on the second point 502 and its first derivative 504.
[0046] A position data for a predicted point at the third timestamp 430 can thus be determined by solving a position equation with the first point 501, the first adjusted point 505, the second point 502, and the second adjusted point 506, as follows:
(Eqn. 1: position equation interpolating the first point 501, the first adjusted point 505, the second adjusted point 506, and the second point 502; the equation appears as an image in the published application and is not reproduced here)
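For orientation, a standard cubic Hermite segment written in Bézier form over these four points takes the shape below, with s the normalized time within the interval; this is an assumed, illustrative formulation consistent with the surrounding description rather than a reproduction of the published equation.

```latex
% Assumed illustrative form; s is the normalized time within [t_0, t_1].
p(s) = (1-s)^3\, p_0 \;+\; 3s(1-s)^2\, a_0 \;+\; 3s^2(1-s)\, a_1 \;+\; s^3\, p_1,
\qquad s = \frac{t - t_0}{t_1 - t_0}
```

Here p_0 and p_1 stand for the first point 501 and the second point 502, and a_0 and a_1 for the first adjusted point 505 and the second adjusted point 506; choosing a_0 = p_0 + v_0 (t_1 - t_0)/3 and a_1 = p_1 - v_1 (t_1 - t_0)/3 recovers the familiar Hermite endpoint-velocity conditions.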
[0047] Similarly, in a rotational case shown in Figure 5B, a non-linear model (e.g., a cubic Hermite model) may be employed to predict the head motion using at least two data sets captured at separate timestamps. In various implementations, a first rotation point 551 represents a first rotation data set associated with the head pose at the first timestamp 410 in a 3D space. For example, the first rotation point 551 indicates a first rotation state on a rotation path 580. In some cases, the rotation path 580 is associated with the rotational motion of the user's head. The first rotation point 551 is associated with the six-degrees-of-freedom pose including three dimensions in position (x, y, z) and three dimensions in rotation (roll, yaw, pitch). A first derivative 553 of the first rotation point 551 is calculated and/or optimized by a SLAM module (e.g., SLAM module 170 in Figure 2). The first derivative 553 of the first rotation point 551 indicates the three-dimensional angular velocity for rotation (wx, wy, wz) at the first timestamp 410. A first rotation curve 565 may be determined based on the first rotation state (e.g., a first angular velocity and/or a first angular acceleration) at the first timestamp 410. It is to be appreciated that the first rotation curve 565 indicates a rotational motion associated with the first rotation state relative to a first rotation axis 560. A first angular velocity at the first timestamp 410 (i.e., the first derivative 553 of the first rotation point 551) may be measured as an angular change relative to the first rotation curve 565 during a predetermined time interval (e.g., a unit time interval). In some cases, a first adjusted rotation point 555 is determined using the first rotation point 551 and its first derivative 553.

[0048] The second rotation point 552 represents a second rotation data set associated with the head pose at the second timestamp 420 in the 3D space. For example, the second rotation point 552 indicates a second rotation state on the rotation path 580. The second rotation point 552 is associated with the six-degrees-of-freedom pose including three dimensions in position (x, y, z) and three dimensions in rotation (roll, yaw, pitch). A first derivative 554 of the second rotation point 552 is calculated and/or optimized by a SLAM module (e.g., SLAM module 170 in Figure 2). The first derivative 554 of the second rotation point 552 indicates the three-dimensional angular velocity for rotation (wx, wy, wz) at the second timestamp 420. A second rotation curve 575 may be determined based on the second rotation state (e.g., a second angular velocity and/or a second angular acceleration) at the second timestamp 420. It is to be appreciated that the second rotation curve 575 indicates a rotational motion associated with the second rotation state relative to a second rotation axis 570. A second angular velocity at the second timestamp 420 (i.e., the first derivative 554 of the second rotation point 552) may be measured as an angular change relative to the second rotation curve 575 during the predetermined time interval (e.g., a unit time interval). In some cases, a second adjusted rotation point 556 is determined using the second rotation point 552 and its first derivative 554.
[0049] In some embodiments, a rotation data for the predicted point at the third timestamp 430 can thus be determined by solving a rotation equation with the first rotation point 551, the first adjusted rotation point 555, the second rotation point 552, and the second adjusted rotation point 556, as follows:
(Eqn. 2: rotation equation over the first rotation point 551, the first adjusted rotation point 555, the second rotation point 552, and the second adjusted rotation point 556; the equation appears as an image in the published application and is not reproduced here)
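By analogy, a commonly used rotational counterpart evaluates a spherical cubic Bézier over unit quaternions by repeated spherical linear interpolation (slerp). The expression below is an assumed illustration of such an analog, with q_0 and q_1 corresponding to the first and second rotation points 551 and 552 and b_0 and b_1 to the adjusted rotation points 555 and 556; it is not necessarily the exact published equation.

```latex
% De Casteljau evaluation on the unit-quaternion sphere (assumed form).
q(s) = \mathrm{slerp}\Big(
  \mathrm{slerp}\big(\mathrm{slerp}(q_0, b_0; s),\ \mathrm{slerp}(b_0, b_1; s);\ s\big),\
  \mathrm{slerp}\big(\mathrm{slerp}(b_0, b_1; s),\ \mathrm{slerp}(b_1, q_1; s);\ s\big);\ s\Big)
```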
[0050] As an example, Equations 1 and 2 may be solved under the following assumptions, such that the above equations are in accordance with the physical meaning of the user's head motion (e.g., translational motion and/or rotational motion):
(Eqn. 3: position constraint; equation image not reproduced)
(Eqn. 4: velocity constraint, the first derivative of the position; equation image not reproduced)
[0051] For example, Equation 3 above describes the position, and its derivative, as expressed in Equation 4, describes velocity (e.g., linear velocity and/or angular velocity). The derivative of Equation 4 may also be used to account for the accelerations (e.g., linear acceleration and/or angular acceleration).
[0052] In some cases, when applying the predetermined time interval (e.g., from t0 to t1) to the above Equation 1, the following position equation may be obtained:
(Eqn. 5: position equation obtained by applying the interval from t0 to t1 to Equation 1, together with its coefficient definitions; equation images not reproduced)
[0053] In some embodiments, when applying the predetermined time interval (e.g., from t0 to t1) to the above Equation 2, the following rotation equation may be obtained:
(Eqn. 6: rotation equation obtained by applying the interval from t0 to t1 to Equation 2, together with its coefficient definitions; equation images not reproduced)
[0054] As an example, if the pose information at the first timestamp 410 and a fourth timestamp 440 is known, the position data and the rotation data at the third timestamp 430, which is after the first timestamp and before the fourth timestamp, may be calculated using four points on a cubic Hermite spline. In various embodiments, velocity data include both linear velocity (as measured in distance change over a unit time interval) and angular velocity (as measured in angle change, or rotation, over a unit time interval). In some cases, position data and rotation data at the fourth timestamp 440 may be calculated using the data collected at the third timestamp 430 and the first timestamp 410 or the second timestamp 420. It is to be appreciated that the more data (e.g., data collected at previous timestamps) is included, the more accurate the motion prediction becomes.
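As a concrete illustration of those two velocity measures, the sketch below estimates linear velocity from the change in position and angular velocity from the relative rotation between two timestamped poses; the quaternion convention and helper names are assumptions for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def estimate_velocities(p0, q0, t0, p1, q1, t1):
    """Finite-difference linear and angular velocity between two poses.

    p0, p1: positions (3,); q0, q1: quaternions (x, y, z, w) per SciPy's convention.
    Returns (linear_velocity, angular_velocity) in units/s and rad/s.
    """
    dt = t1 - t0
    linear_velocity = (p1 - p0) / dt
    # Relative rotation from pose 0 to pose 1, expressed as a rotation vector
    # (axis * angle), divided by the elapsed time.
    dq = R.from_quat(q1) * R.from_quat(q0).inv()
    angular_velocity = dq.as_rotvec() / dt
    return linear_velocity, angular_velocity
```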
[0055] It is to be understood that embodiments of the present invention are not limited to the Hermite spline, which is one of the models for motion prediction. Depending on the implementation, other models or mathematical formulae may be used to describe the motion and acceleration of a user's pose.
[0056] The predicted location at the third timestamp 430 can then be determined using the position data and the rotation data. As an example, timestamps 430 and 420 may be merged, as a pose at timestamp 430 (or 440) can be predicted at timestamp 420 (or 430) using a first data point 410 and a second data point 420. It is to be appreciated that taking the 6DOF motion, both linear and angular, into account provides a more robust solution for motion prediction (e.g., either at a future timestamp or a past timestamp), which effectively reduces the motion-to-photon latency and achieves high tracking stability and image projection accuracy.
[0057] Figure 6 is a simplified flow diagram illustrating a method for predicting a location point in an XR environment according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, one or more steps may be added, removed, repeated, rearranged, modified, overlapped, and/or replaced, and they should not limit the scope of the claims.
[0058] At step 602, a first data set for an object is captured by a sensor. In some cases, the object may be identified first. For example, the first data set includes at least a first set of three positions and a first set of three rotations. The first data set may be temporarily stored at a memory for further processing. In various embodiments, both imaging and motion information are used to generate the first data set. For example, an imaging sensor (e.g., part of a camera) captures object locations, while a motion sensor (e.g., an accelerometer) captures the angular position of a user's pose. In some cases, the object may be identified by, for example, a processor (e.g., processor 150 in Figure 2) using object detection algorithms. In addition to the first data set, a first image containing the object may also be captured (e.g., by camera 180 in Figure 2) at the first time.
[0059] At step 604, a first adjusted point is calculated using at least the first data set. For example, the first adjusted point may be calculated using a first point associated with the first data set and its first derivative on a cubic Hermite spline. Using the Hermite spline model as an example, points 505 and 506 in Figure 5A are the adjusted points, which are calculated using their respective derivatives 503 and 504.

[0060] At step 606, a second data set for the object is captured by the sensor at a second time. For example, the second data set includes at least a second set of three positions and a second set of three rotations. The second time may be later than the first time. In various implementations, a difference between the first data set and the second data set is attributed to a movement (e.g., translation and/or rotation movements) of the sensor. For example, the movement may be non-linear and may be characterized by a non-zero acceleration. As shown in Figure 5A, the path of motion is a non-linear curve, which reflects the non-linear motion of human users.
[0061] At step 608, a second adjusted point is calculated using at least the second data set. For example, the second adjusted point may be calculated using a second point associated with the second data set and its first derivative on a cubic Hermite spline.
[0062] At step 610, position data is calculated using at least the first data set, the first adjusted point, the second data set, and the second adjusted point. For example, the position data may be calculated using a cubic Hermite spline model, which is based on changes in linear velocity and angular velocity. For example, a predicted position may be calculated by solving equations based on the Hermite spline or other models.
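A minimal sketch of this step, under the same assumed Bézier-form construction used above (the function and variable names are illustrative, not taken from the description), evaluates the segment at the normalized prediction time, which may fall beyond 1 when extrapolating into the future:

```python
import numpy as np

def predict_position(p0, a0, a1, p1, t0, t1, t_pred):
    """Evaluate a cubic Hermite segment in Bezier form at time t_pred.

    p0, p1: first and second points; a0, a1: first and second adjusted points.
    t_pred may lie outside [t0, t1], in which case the cubic is extrapolated.
    """
    s = (t_pred - t0) / (t1 - t0)   # normalized time; s > 1 predicts the future
    return ((1 - s) ** 3) * p0 \
        + 3 * s * (1 - s) ** 2 * a0 \
        + 3 * s ** 2 * (1 - s) * a1 \
        + (s ** 3) * p1
```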
[0063] At step 612, a rotation data is calculated using at least the first data set, the first adjusted point, the second data set, and the second adjusted point. For example, the rotation data may be calculated using a cubic Hermite spline model, which includes at least a position equation and a rotation equation.
[0064] At step 614, a velocity data is calculated using at least the first data set, the first adjusted point, the second data set, and the second adjusted point. For example, the velocity data include both linear velocity and angular velocity. Acceleration values, which can be calculated as derivatives of the velocity data, may be used in the position and velocity calculations as well. In some cases, at least steps 610, 612, and 614 may be performed in different sequences. In some embodiments, at least two of steps 610, 612, and 614 may be performed in parallel.

[0065] At step 616, a predicted point for a third time is provided using at least the position data and the rotation data. In an example, the third time is later than the second time. In other embodiments, the third time is after the first time and before the second time. In various implementations, the predicted point helps to determine an insertion location for an overlaying image (e.g., virtual content) on a second image captured at the third time. The first image and the second image, augmented with an overlaying image positioned at the predicted location, may be displayed on a display screen (e.g., display 185 in Figure 2). For example, a time interval between the first time and the third time is less than 20ms to allow for real-time immersion.

[0066] Depending on the implementation and the specific application, additional processes may be performed. For example, the XR apparatus may receive a first image captured by a camera along with head pose data captured by the sensor to determine the image of the user's surroundings relative to his/her perspective at the first time. In some cases, the pose data may be used in understanding and calibrating the user's motion. In addition to the user's position and orientation at the current timestamp, the linear and angular accelerations are taken into account to predict the user's perspective at a future (or past) timestamp, which advantageously improves the prediction accuracy in non-linear movement scenarios. The processor (e.g., processor 150 in Figure 2) may generate an overlaying image (e.g., virtual content or a copy of the real object) to place at the predicted location. The composite image (e.g., an augmented reality image) is then displayed on the display screen at a second time with a perspective substantially corresponding to the user's “actual” pose at that moment. As the user continuously moves, the XR apparatus tracks the head motion and updates the image as well as the head pose to provide real-time immersion to the user.
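Putting the pieces together, the following sketch shows how such a pipeline might extrapolate a full pose to the render (third) timestamp from two timestamped poses; every function and field name here is illustrative (it reuses the hypothetical PoseState, adjusted_points, and predict_position helpers sketched earlier) rather than a reproduction of the described system, and the rotation step uses a deliberately simple constant-angular-velocity extrapolation.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def predict_pose(pose_a, pose_b, t_render):
    """Extrapolate a head pose to the render (third) timestamp.

    pose_a, pose_b: PoseState records at two earlier timestamps (t0 < t1).
    Position uses the cubic segment sketched above; rotation simply advances
    pose_b's orientation by its (assumed world-frame) angular velocity.
    """
    t0, t1 = pose_a.timestamp, pose_b.timestamp
    a0, a1 = adjusted_points(pose_a.position, pose_a.linear_velocity,
                             pose_b.position, pose_b.linear_velocity, t0, t1)
    position = predict_position(pose_a.position, a0, a1, pose_b.position,
                                t0, t1, t_render)
    # Rotate pose_b's orientation forward by angular_velocity * elapsed time.
    delta = R.from_rotvec(pose_b.angular_velocity * (t_render - t1))
    rotation = (delta * R.from_quat(pose_b.rotation)).as_quat()
    return position, rotation
```

The predicted position and rotation can then be used to place the overlaying image at the insertion location for the frame displayed at the render timestamp.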
[0067] While the above is a full description of the specific embodiments, various modifications, alternative constructions and equivalents may be used. Therefore, the above description and illustrations should not be taken as limiting the scope of the present invention, which is defined by the appended claims.

Claims

WHAT IS CLAIMED IS:
1. A method for motion prediction, the method comprising: identifying an object; capturing a first data set for the object at a first time using a sensor, the first data set comprising at least a first set of three positions and a first set of three rotations; calculating a first adjusted point using at least the first data set; capturing a second data set for the object at a second time using a sensor, the second data set comprising at least a second set of three positions and a second set of three rotations, the second time being later than the first time; calculating a second adjusted point using at least the second data set; calculating a position data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point; calculating a rotation data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point; calculating a velocity data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point; and providing a predicted point for a third time using at least the position data and the rotation data.
2. The method of claim 1 wherein the third time is later than the second time.
3. The method of claim 1 wherein the third time is after the first time and before the second time.
4. The method of claim 1 wherein the rotation data and the position data are obtained using a cubic Hermite spline model.
5. The method of claim 4 further comprising solving a position equation and a rotation equation based on the cubic Hermite spline model.
6. The method of claim 1 further comprising predicting a point at a fourth time based on the third time and at least one of the first time and the second time.
7. The method of claim 1 wherein a difference between the first data set and the second data set is attributed to a movement of the sensor.
8. The method of claim 7 wherein the movement is non-linear.
9. The method of claim 7 wherein the movement is characterized by a non-zero acceleration.
10. The method of claim 1 further comprising: capturing a first image at the first time, the object being positioned in the first image; generating an overlaying image; determining an insertion location for the overlaying image based on the predicted point; capturing a second image at the third time; placing the overlaying image at the insertion location of the second image.
11. An extended reality device comprising: a housing, the housing comprising a front side and a rear side; a sensor positioned on the front side, the sensor is configured to determine a first location point of an object at a first time and a second location point of the object at a second time; a display configured on the rear side of the housing; a memory coupled to the sensor and configured to store the first location point and the second location point; and a processor coupled to the memory; wherein the processor is configured to: calculate a first adjusted point based on the first location point; calculate a second adjusted point based on the second location point; and calculate a predicted location point using the first location point, the first adjusted point, the second location point, and the second adjusted point.
12. The device of claim 11 wherein the sensor comprises a camera.
13. The device of claim 11 wherein the sensor comprises a lidar.
14. The device of claim 11 wherein the processor comprises a central processing unit and a neural processing unit.
15. The device of claim 11 wherein the display is configured to display a generated image at the predicted location.
16. The device of claim 11 further comprising a camera configured to capture a first image, the display being configured to display the first image and a generated image at the predicted location of the first image.
17. A method for motion prediction, the method comprising: capturing a first data set for an object at a first time, the first data set comprising at least a first set of three positions and a first set of three rotations; storing the first data set at a memory; calculating a first adjusted point using at least the first data set; capturing a second data set for the object at a second time, the second data set comprising at least a second set of three positions and a second set of three rotations; storing the second data set at the memory; calculating a second adjusted point using at least the second data set; calculating a position data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point; calculating a rotation data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point; calculating a velocity data using at least the first data set, the first adjusted point, the second data set, and the second adjusted point, the velocity data including a linear velocity and an angular velocity, the linear velocity being measured in distance change over a predetermined time interval, the angular velocity being measured in an angular change over the predetermined time interval; and providing a predicted point for a third time using at least the position data and the rotation data.
18. The method of claim 17 further comprising calculating a three-dimensional linear velocity for position.
19. The method of claim 17 further comprising calculating a three-dimensional angular velocity for rotation.
20. The method of claim 17 wherein a time interval between the first time and the third time is less than 20ms.