
WO2023005457A1 - Pose calculation method and apparatus, electronic device, and readable storage medium - Google Patents

Pose calculation method and apparatus, electronic device, and readable storage medium

Info

Publication number
WO2023005457A1
Authority
WO
WIPO (PCT)
Prior art keywords
pose
rgb image
target
image
transformation matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/098295
Other languages
English (en)
Chinese (zh)
Inventor
尹赫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Publication of WO2023005457A1
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10024: Color image
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10028: Range image; Depth image; 3D point clouds

Definitions

  • the present application relates to the technical field of computer vision, in particular to a pose calculation method and device, electronic equipment, and a readable storage medium.
  • Determining the position and attitude of an electronic device in an unknown environment is one of the key technologies in industries such as augmented reality, virtual reality, mobile robots, and unmanned driving. With the rapid development of these industries, increasingly high requirements are placed on the accuracy with which electronic devices locate objects in the surrounding environment.
  • VIO: Visual-Inertial Odometry (visual-inertial odometer).
  • Embodiments of the present application provide a pose calculation method and device, electronic equipment, and a readable storage medium, which can reduce the waiting time for real-time output pose.
  • a pose calculation method, comprising:
  • if a depth image of the current environment is collected for the first time within a preset initialization sliding window, determining the pose of the electronic device when the depth image is collected as the initial pose; wherein the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window;
  • determining, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image, the pose of the electronic device when collecting the next frame of RGB image.
  • a pose computing device comprising:
  • an initial pose determination module, used to determine the pose of the electronic device when the depth image is collected as the initial pose if the depth image of the current environment is collected for the first time within the preset initialization sliding window; wherein the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window;
  • a pose determination module, configured to determine, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image, the pose of the electronic device when collecting the next frame of RGB image.
  • An electronic device includes a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor executes the operation of the pose calculation method as described above.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the operation of the pose calculation method as described above is realized.
  • Fig. 1 is an application environment diagram of a pose calculation method in an embodiment
  • Fig. 2 is the flow chart of pose computing method in an embodiment
  • Fig. 3 is the flow chart of pose calculation method in another embodiment
  • Fig. 4 is a schematic diagram of constructing a target three-dimensional map of the current environment if the target RGB image is not the first frame image in the preset initialization sliding window in one embodiment
  • Fig. 5 is a flowchart of a method for calculating the first pose transformation matrix between the next frame RGB image and the target RGB image using the preset perspective projection PnP algorithm, in one embodiment;
  • FIG. 6 is a schematic diagram of calculating a pose transformation matrix using a preset perspective projection PnP algorithm in an embodiment
  • Fig. 7 is a flowchart of the method in Fig. 5 for calculating the first translation transformation matrix between the next frame RGB image and the target RGB image;
  • FIG. 8 is a flowchart of a method for constructing a target three-dimensional map of the current environment according to the first pose transformation matrix and the depth image in one embodiment
  • Fig. 9 is a flowchart of the method in Fig. 3 for calculating, according to the target three-dimensional map, the pose of the electronic device when collecting RGB images after the next frame of RGB image within the preset initialization sliding window;
  • Fig. 10 is a schematic diagram of calculating the pose and the three-dimensional map for RGB images collected after the next frame of RGB image, in one embodiment;
  • Fig. 11 is a flowchart of a pose calculation method in another embodiment
  • Fig. 12 is a schematic diagram of a pose calculation method in yet another embodiment
  • Fig. 13 is a flowchart of a pose calculation method in a specific embodiment
  • Fig. 14 is a structural block diagram of a pose calculation device in an embodiment
  • Fig. 15 is a structural block diagram of a pose calculation device in an embodiment
  • Fig. 16 is a structural block diagram of a pose calculation device in another embodiment
  • Fig. 17 is a schematic diagram of the internal structure of an electronic device in one embodiment.
  • VINS: visual-inertial system.
  • VINS-MONO: a monocular visual-inertial system algorithm.
  • The specific operation of the VINS-MONO algorithm is as follows, assuming that there are 10 frames of images in the preset initialization sliding window. Of course, the size of the preset initialization sliding window is not specifically limited in this application.
  • In the first step, RGB images are collected through the camera on the electronic device. After 10 frames of RGB images have accumulated in the preset initialization sliding window, two frames whose parallax satisfies the conditions are selected from the 10 frames of RGB images (for example, an L frame and an R frame), and the pose between these two frames is then calculated using epipolar geometric constraints.
  • In the second step, this pose is used to restore the map points that are co-viewed between the two frames using the triangulation method.
  • In the third step, these map points are projected onto any frame of the above 10 frames of RGB images other than the L frame and the R frame, and the pose of that frame is calculated by minimizing the reprojection error.
  • In the fourth step, the triangulation method is used between that arbitrary frame and the L frame and the R frame to restore the map points that are co-viewed between them.
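  • As an illustration of the classical two-view initialization described in the above steps (epipolar pose estimation followed by triangulation), the following Python sketch uses OpenCV; the function and variable names (two_view_initialize, pts_L, pts_R, K) are illustrative assumptions and not part of the VINS-MONO implementation.

```python
import cv2
import numpy as np

def two_view_initialize(pts_L, pts_R, K):
    """pts_L, pts_R: Nx2 matched pixel coordinates (float arrays); K: 3x3 intrinsics."""
    # Essential matrix from epipolar constraints, with RANSAC outlier rejection.
    E, inliers = cv2.findEssentialMat(pts_L, pts_R, K, method=cv2.RANSAC, threshold=1.0)
    # Relative rotation and (unit-scale) translation between the L and R frames.
    _, R, t, _ = cv2.recoverPose(E, pts_L, pts_R, K)
    # Projection matrices for triangulation, taking the L frame as reference.
    P_L = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P_R = K @ np.hstack([R, t])
    # Triangulate co-viewed points and convert from homogeneous coordinates.
    X_h = cv2.triangulatePoints(P_L, P_R, pts_L.T, pts_R.T)
    points_3d = (X_h[:3] / X_h[3]).T
    return R, t, points_3d
```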
  • Alternatively, the VINS-RGBD algorithm may be used when determining the pose of the electronic device when capturing each frame of image.
  • The specific operation of the VINS-RGBD algorithm is as follows, again assuming that there are 10 frames of images in the preset initialization sliding window. In the first step, the camera on the electronic device collects the RGB image and the depth image at the same time, and it is necessary to ensure that each frame of RGB image has a corresponding depth image.
  • The correspondence means that the RGB image is aligned with the depth image in time and space.
  • In the second step, two frames of images (such as an L frame and an R frame) are screened out from these 10 frames of RGB images.
  • The traditional PnP algorithm is used to calculate the pose of the L frame, and then, combined with the depth image corresponding to the L frame, the effective map points whose reprojection error meets the preset threshold are filtered out and the map points are restored.
  • In the third step, these map points are projected onto any frame of the above 10 frames of RGB images other than the L frame and the R frame, and the pose of that frame is calculated by minimizing the reprojection error.
  • In the fourth step, the depth image of that arbitrary frame and the L frame is used to recover the map points that are co-viewed and whose reprojection error meets the threshold requirement.
  • In view of this, a pose calculation method is proposed in the embodiments of the present application. The electronic device does not need to wait until the preset initialization sliding window is full of RGB images before it can output the pose of the first frame of RGB image collected outside the preset initialization sliding window. Instead, when the depth image of the current environment is collected for the first time within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose, and the pose of the electronic device when collecting the next frame of RGB image is determined according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image. Obviously, the pose can be output in real time within the preset initialization sliding window, which reduces the waiting time for real-time pose output.
  • Fig. 1 is an application scene diagram of a pose calculation method in an embodiment.
  • the application environment includes an electronic device 120, and the electronic device 120 includes a first camera and a second camera.
  • the first camera is an RGB camera
  • the second camera is a camera for collecting depth images, for example, a TOF (Time-of-Flight) camera or a structured light camera, which is not limited in this application.
  • The electronic device 120 collects the RGB image and the depth image of the current environment through the first camera and the second camera respectively. If the electronic device 120 collects the depth image of the current environment for the first time within the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; wherein the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window. According to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image, the pose of the electronic device when collecting the next frame of RGB image is determined.
  • the electronic device 120 can be a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a wearable device (smart bracelet, smart watch, smart glasses, smart gloves, smart socks, smart belt, etc.), a VR (virtual reality) device, a smart home device, a driverless car, or any other terminal device.
  • Fig. 2 is a flowchart of a pose calculation method in an embodiment.
  • the pose calculation method in this embodiment is described by taking the operation on the electronic device 120 in FIG. 1 as an example, and the electronic device 120 includes a first camera and a second camera.
  • the first camera is an RGB camera
  • the second camera is a camera for collecting depth images.
  • the pose calculation method includes operation 220-operation 240, wherein,
  • Operation 220: if the depth image of the current environment is collected for the first time within the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; wherein the target RGB image corresponding to the depth image is not the last frame image within the preset initialization sliding window.
  • the size of the preset initialization sliding window is equal to the duration of collecting a certain number of image frames. For example, if the preset initialization sliding window is set to include 10 frames of RGB images, then after 10 frames of RGB images are collected, they fill the preset initialization sliding window; at this time, the size of the preset initialization sliding window is equal to the duration of collecting 10 frames of RGB images.
  • in the embodiments of the present application, the electronic device does not need to wait until the preset initialization sliding window is full of RGB images before it can output the pose when collecting the first frame of RGB image outside the preset initialization sliding window.
  • Specifically, the RGB image is collected through the first camera in the electronic device, and the depth image is collected through the second camera at the same time. If the depth image of the current environment is collected for the first time within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose, and the image coordinate system of the target RGB image is used as the world coordinate system, which realizes the visual initialization of the electronic device. In addition, there is no need to select two frames of images whose parallax meets the conditions from the 10 frames of RGB images, so the adaptability is wider.
  • Because the acquisition frequency of the second camera for depth images may be lower than the acquisition frequency of the first camera for RGB images, a depth image corresponding to every frame of RGB image may not be acquired; that is, relative to the RGB images, some of the depth images will be missing.
  • Operation 240: according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image, determine the pose of the electronic device when collecting the next frame of RGB image.
  • Specifically, the perspective projection PnP algorithm can be used. First, the matching 2D-2D feature point pairs between the target RGB image and the next frame of RGB image after the target RGB image are determined; secondly, these 2D-2D feature point pairs are combined with the 3D points from the depth image to obtain matching 3D-2D feature point pairs; finally, based on the 3D-2D feature point pairs, the pose transformation matrix between the next frame of RGB image and the target RGB image is calculated.
  • pose refers to position and attitude
  • the pose is a six-dimensional quantity, including three position components (X, Y, Z) and three attitude angles (heading, pitch, roll).
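  • As a purely illustrative aside, a pose as described above could be represented by a simple six-field container; the field names below are assumptions, not notation from this application.

```python
from dataclasses import dataclass

# Minimal illustrative container for a 6-DOF pose: three position components
# and three attitude angles, as described above. Field names are assumptions.
@dataclass
class Pose:
    x: float        # position
    y: float
    z: float
    heading: float  # attitude angles
    pitch: float
    roll: float
```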
  • the perspective projection PnP algorithm here can be the traditional perspective projection PnP algorithm, that is, the rotation transformation matrix and the translation transformation matrix between two frames are calculated simultaneously based on the 3D-2D feature point pairs, and the rotation transformation matrix and the translation transformation matrix constitute the pose transformation matrix.
  • the perspective projection PnP algorithm here may be a new perspective projection PnP algorithm, that is, the rotation transformation matrix and the translation transformation matrix between two frames are calculated step by step based on the 3D-2D feature point pairs.
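  • For reference, a minimal sketch of the traditional perspective projection PnP step (rotation and translation solved jointly from 3D-2D feature point pairs) using OpenCV is given below; the decoupled, step-by-step variant used in this application is sketched later. The helper name and the assumption of undistorted pixels are illustrative.

```python
import cv2
import numpy as np

def traditional_pnp(points_3d, points_2d, K):
    """Jointly estimate rotation and translation from 3D-2D feature point pairs.

    points_3d: Nx3 map points; points_2d: Nx2 pixel observations; K: 3x3 intrinsics.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64),
        points_2d.astype(np.float64),
        K.astype(np.float64),
        None)                          # None: assume undistorted pixel coordinates
    R, _ = cv2.Rodrigues(rvec)         # rotation vector -> rotation matrix
    return ok, R, tvec                 # R and t together form the pose transformation
```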
  • In the embodiments of the present application, the electronic device can output the pose of the first frame of RGB image collected outside the preset initialization sliding window without waiting for the preset initialization sliding window to be filled with RGB images. Instead, when the depth image of the current environment is collected for the first time within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose, and the pose of the electronic device when collecting the next frame of RGB image is determined according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image.
  • In this way, the time of the first pose output is greatly advanced: from after the preset initialization sliding window is full to inside the sliding window, namely the frame following the first received depth image. Therefore, the pose can be output in real time within the preset initialization sliding window, which reduces the waiting time for real-time pose output.
  • Moreover, the depth map can be collected at a frequency lower than the frequency of collecting RGB images, or at a variable frequency; as long as a depth map is collected within the sliding window, initialization can be started, and the pose is then output in real time after initialization. Since the depth map can be collected at a lower or variable frequency, initialization and real-time pose output can be based on a low-frequency depth map. This avoids collecting and processing a large amount of data, and further reduces the power consumption of the electronic device.
  • a pose calculation method which also includes:
  • operation 260 construct a target three-dimensional map of the current environment according to the pose and depth image when the electronic device collects the next frame of RGB image.
  • the pose of the electronic device when capturing the next frame of RGB image is calculated, and at this time, the target 3D map of the current environment can be constructed based on the depth image of the target RGB image and the pose of the next frame of RGB image .
  • When constructing the target three-dimensional map of the current environment, there are the following two situations.
  • In the first situation, the target RGB image is the first frame image in the preset initialization sliding window, that is, the corresponding depth image is collected for the first frame of RGB image in the preset initialization sliding window. Because there is no RGB image before the target RGB image, when constructing the target 3D map of the current environment, only the first pose transformation matrix between the next frame of RGB image and the target RGB image needs to be calculated; according to the first pose transformation matrix and the depth image, the target 3D map of the current environment can be constructed.
  • the first pose transformation matrix has been calculated in the process of calculating the pose of the next frame of RGB image, it only needs to be obtained directly.
  • In the second situation, the target RGB image is not the first frame of image in the preset initialization sliding window, that is, the corresponding depth image is not collected for the first frame of RGB image in the sliding window. In this case, because RGB images have also been collected before the target RGB image, when constructing the target 3D map of the current environment the map points can be restored not only based on the target RGB image but also based on the RGB images collected before the target RGB image, so as to restore more map points and enrich the 3D map of the current environment.
  • Specifically, the first pose transformation matrix between the next frame of RGB image and the target RGB image is calculated in the same way, and the initial three-dimensional map of the current environment is constructed according to the first pose transformation matrix and the depth image.
  • the first pose transformation matrix has been calculated in the process of calculating the pose of the next frame of RGB image, it only needs to be obtained directly.
  • Secondly, the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image is calculated, and the initial 3D map of the current environment is updated according to the second pose transformation matrix and the depth image to generate the target 3D map.
  • Operation 280 calculate the pose of the electronic device when collecting other RGB images located after the next frame of RGB images within the preset initialization sliding window.
  • Specifically, the pose of the electronic device when collecting other RGB images after the next frame of RGB image within the sliding window can be calculated according to the target three-dimensional map. Since the target 3D map is constructed based on the RGB images from the first frame in the sliding window up to the next frame of RGB image after the target RGB image, combined with the depth image of the target RGB image, the map points on the target three-dimensional map are obviously more comprehensive than those from the depth image alone.
  • the poses of other RGB images can be directly calculated according to the target three-dimensional map.
  • In the embodiments of the present application, the target three-dimensional map of the current environment is constructed according to the pose of the electronic device when collecting the next frame of RGB image and the depth image. Since the target 3D map is constructed based on the RGB images from the first frame in the sliding window up to the next frame of RGB image after the target RGB image, combined with the depth image of the target RGB image, the map points on the target 3D map are obviously more comprehensive than those from the depth image alone. Therefore, calculating, according to the target three-dimensional map, the pose of the electronic device when collecting other RGB images located after the next frame of RGB image in the preset initialization sliding window greatly improves the accuracy of the calculated poses of those RGB images.
  • operation 260 is to construct a target three-dimensional map of the current environment according to the pose and depth image when the electronic device collects the next frame of RGB image, including:
  • If the target RGB image is the first frame image in the preset initialization sliding window, there is no RGB image before the target RGB image, so it is only necessary to calculate the first pose transformation matrix between the next frame of RGB image and the target RGB image. Then, according to the first pose transformation matrix and the depth image, the target 3D map of the current environment is constructed.
  • the target RGB image is not the first frame image within the preset initialization sliding window, there is an RGB image before the target RGB image. Therefore, first, calculate the first pose transformation matrix between the next frame RGB image and the target RGB image, and construct the initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image. Secondly, calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and update the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map .
  • In the embodiments of the present application, if the target RGB image is the first frame image in the preset initialization sliding window, only the first pose transformation matrix between the next frame RGB image and the target RGB image needs to be calculated, and then the target 3D map of the current environment is directly constructed according to the first pose transformation matrix and the depth image. If the target RGB image is not the first frame image in the preset initialization sliding window, first calculate the first pose transformation matrix between the next frame RGB image and the target RGB image; then, according to the first pose transformation matrix and the depth image, an initial 3D map of the current environment is constructed.
  • the electronic device calculates the pose of other RGB images located after the next frame of RGB images in the preset initialization sliding window. It greatly improves the calculated pose accuracy of other RGB images after the next RGB image.
  • a target three-dimensional map of the current environment is constructed, including:
  • FIG. 4 is a schematic diagram of constructing the target three-dimensional map of the current environment when the target RGB image is not the first frame image in the preset initialization sliding window, in one embodiment. For example, suppose there are 10 frames of images in the preset initialization sliding window, and the depth image is collected at the 4th frame, that is, the target RGB image is the 4th frame of RGB image. Constructing the target 3D map of the current environment then specifically includes two operations:
  • First, an initial 3D map of the current environment is constructed. Specifically, the first pose transformation matrix between the next frame of RGB image (frame 5) and the target RGB image (frame 4) is calculated, and the initial three-dimensional map of the current environment is constructed based on the first pose transformation matrix and the depth image.
  • The method of calculating the initial three-dimensional map of the current environment here is the same as the method, described in operation 260 for the case where the target RGB image is the first frame image in the preset initialization sliding window, of constructing the target 3D map of the current environment according to the pose of the electronic device when collecting the next frame of RGB image and the depth image, and will not be repeated here.
  • the initial 3D map is updated to generate the target 3D map. Specifically, calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and supplement the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map.
  • In the embodiments of the present application, if the target RGB image is not the first frame image in the preset initialization sliding window, constructing the target three-dimensional map of the current environment includes: calculating the first pose transformation matrix between the next frame RGB image and the target RGB image, and constructing an initial 3D map of the current environment according to the first pose transformation matrix and the depth image; then calculating the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and updating the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map.
  • the integrity and accuracy of the target three-dimensional map obtained at this time are greatly improved.
  • a pose calculation method further comprising:
  • a preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix or the second pose transformation matrix.
  • Specifically, the preset perspective projection PnP algorithm is used to calculate, step by step, the rotation transformation matrix and translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image, where the relevant RGB image of the target RGB image is the RGB image before the target RGB image or the next frame of RGB image.
  • the first pose transformation matrix between the next frame RGB image and the target RGB image is calculated, including:
  • Specifically, the preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix (also referred to as the pose) between the next frame of RGB image and the target RGB image; wherein the preset perspective projection PnP algorithm is used to calculate, step by step, the rotation transformation matrix and the translation transformation matrix between the next frame of RGB image and the target RGB image.
  • the preset perspective projection PnP algorithm is relative to the traditional perspective projection PnP algorithm.
  • the traditional perspective projection PnP algorithm is based on 3D-2D feature point pairs to simultaneously calculate the rotation transformation matrix and translation transformation matrix between two frames.
  • the preset perspective projection PnP algorithm is used to calculate the rotation transformation matrix and translation transformation matrix between two frames step by step.
  • Specifically, when using the preset perspective projection PnP algorithm to calculate the first pose transformation matrix between the next frame RGB image and the target RGB image, first, the first rotation transformation matrix between the next frame RGB image and the target RGB image is calculated; secondly, the first translation transformation matrix between the next frame RGB image and the target RGB image is calculated according to the depth image and the first rotation transformation matrix. The first rotation transformation matrix and the first translation transformation matrix constitute the first pose transformation matrix.
  • Calculating the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image includes:
  • using the preset perspective projection PnP algorithm to calculate the second pose transformation matrix (also referred to as the pose) between the RGB image before the target RGB image and the target RGB image; wherein the preset perspective projection PnP algorithm is used to calculate, step by step, the second rotation transformation matrix and the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image.
  • the second rotation transformation matrix and the second translation transformation matrix constitute a second pose transformation matrix.
  • the process of calculating the rotation transformation matrix and the translation transformation matrix is decoupled. It is possible to avoid superimposing the error generated when calculating the rotation transformation matrix with the error generated when calculating the translation transformation matrix. Therefore, the accuracy of the first or second pose transformation matrix finally obtained by adopting the preset perspective projection PnP algorithm to realize the step-by-step calculation is improved.
  • a preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix or the second pose transformation matrix, including:
  • a first pose transformation matrix or a second pose transformation matrix between the relevant RGB image of the target RGB image and the target RGB image is generated.
  • As shown in Fig. 5, calculating the first pose transformation matrix between the next frame of RGB image and the target RGB image includes:
  • Operation 520 calculating a first rotation transformation matrix between the RGB image of the next frame and the target RGB image.
  • In general, there are a relatively large number of mutually matching 2D-2D feature point pairs between two frames of RGB images.
  • FIG. 6 it is a schematic diagram of calculating a pose transformation matrix using a preset perspective projection PnP algorithm in an embodiment.
  • image i is the target RGB image
  • image j is the next frame RGB image of the target RGB image
  • the oval frame corresponds to the depth image corresponding to the target RGB image.
  • a pair of image i and image j on the left side of FIG. 6 is a schematic diagram of determining mutually matching 2D-2D feature point pairs in image i and image j and calculating the first rotation transformation matrix R ij .
  • Specifically, matched 2D-2D feature point pairs are determined between the two frames, namely the next frame of RGB image and the target RGB image.
  • The optical flow method or other image matching methods can be used to determine the 2D-2D feature point pairs between the two frames. The first rotation transformation matrix R ij between the next frame of RGB image and the target RGB image is then calculated from these matching 2D-2D feature point pairs, combined with epipolar geometric constraints.
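  • A minimal sketch of this rotation-only step, assuming OpenCV and pinhole intrinsics K, is shown below; keeping only the rotation from the essential-matrix decomposition is one way to realize operation 520, and the names used are assumptions.

```python
import cv2

def rotation_from_2d2d(pts_i, pts_j, K):
    """Estimate only the rotation R_ij between image j and image i from
    matched 2D-2D pixel coordinates (Nx2 arrays) and epipolar geometry."""
    # Essential matrix from the matched 2D-2D feature point pairs (RANSAC rejects outliers).
    E, _ = cv2.findEssentialMat(pts_j, pts_i, K, method=cv2.RANSAC)
    # recoverPose also returns a unit-scale translation, which is discarded here:
    # the translation is computed separately (operation 540) using the depth image.
    _, R_ij, _, _ = cv2.recoverPose(E, pts_j, pts_i, K)
    return R_ij
```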
  • Operation 540 Calculate a first translation transformation matrix between the next frame RGB image and the target RGB image according to the depth image and the first rotation transformation matrix.
  • Specifically, outliers among the matching 2D-2D feature point pairs on the next frame RGB image and the target RGB image are culled according to the first rotation transformation matrix, to obtain the culled 2D-2D feature point pairs.
  • the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image.
  • Operation 560 based on the first rotation transformation matrix and the first translation transformation matrix, generate a first pose transformation matrix between the next frame of RGB image and the target RGB image.
  • the first pose transformation matrix between the next frame of RGB image and the target RGB image is generated.
  • the first rotation transformation matrix between the next frame RGB image and the target RGB image is calculated.
  • a first pose transformation matrix between the next frame of RGB image and the target RGB image is generated.
  • In the embodiments of the present application, the preset perspective projection PnP algorithm is used to calculate the rotation transformation matrix and the translation transformation matrix step by step, avoiding the superposition of the errors generated in the two calculation processes, and thus improving the accuracy of the final first pose transformation matrix.
  • operation 540 is to calculate the first translation transformation matrix between the next frame RGB image and the target RGB image according to the depth image and the first rotation transformation matrix, including:
  • Operation 542 Eliminate the matching 2D-2D feature point pairs on the RGB image of the next frame and the target RGB image according to the first rotation transformation matrix, and obtain the 2D-2D feature point pairs after elimination.
  • image i is the target RGB image
  • image j is the next frame of RGB image of the target RGB image
  • the oval frame corresponds to the depth image corresponding to the target RGB image.
  • the matching 2D-2D feature point pairs on image i and image j are eliminated according to the first rotation transformation matrix , to get the 2D-2D feature point pairs after elimination.
  • Operation 544 converting the culled 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image.
  • Specifically, the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image, and the RANSAC algorithm is used to eliminate the abnormal point pairs among the 3D-2D feature point pairs, generating the culled 3D-2D feature point pairs.
  • Operation 546 calculate the first translation transformation matrix between the RGB image of the next frame and the target RGB image.
  • Specifically, the first translation transformation matrix between the next frame of RGB image and the target RGB image can be calculated based on the translation transformation matrix calculation formula.
  • the formula for calculating the translation transformation matrix is as follows:
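  • One plausible form of formula (1-1), consistent with the symbol definitions below and assuming a standard pinhole reprojection model, is:

$$
t_{ij}^{*} = \arg\min_{t_{ij}} \sum_{k} \left\| p_{i}^{k} - \pi\!\left( R_{ij}\, \pi\!\left(p_{j}^{k}\right)^{-1} + t_{ij} \right) \right\|^{2} \qquad (1\text{-}1)
$$

where p_i^k and p_j^k are the k-th pair of matched 2D feature points in image i and image j, and π(·) denotes the camera projection.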
  • R ij is the rotation transformation matrix from image j to image i
  • t ij is the translation transformation matrix from image j to image i
  • π(·)^(-1) is the back-projection transformation that converts 2D points into 3D points.
  • The above formula (1-1) is used to construct a least squares problem, and the optimal variable t ij obtained is taken as the first translation transformation matrix between the next frame RGB image and the target RGB image.
  • In the embodiments of the present application, when calculating the first translation transformation matrix between the next frame of RGB image and the target RGB image, first, the mutually matching 2D-2D feature point pairs of the next frame of RGB image and the target RGB image are culled according to the first rotation transformation matrix to obtain the culled 2D-2D feature point pairs. Secondly, the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image. Finally, according to the 3D-2D feature point pairs, the first translation transformation matrix between the next frame RGB image and the target RGB image is calculated. The feature point pairs are culled multiple times, and the least squares method is used to calculate the first translation transformation matrix, which improves the accuracy of the calculated first translation transformation matrix.
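  • A numerical sketch of this translation-only step is given below; it uses one possible linear least-squares formulation of the reprojection constraint with the rotation held fixed, under assumed normalized homogeneous observations, and is not necessarily the exact optimization of formula (1-1).

```python
import numpy as np

def skew(v):
    """3x3 skew-symmetric matrix such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def translation_given_rotation(R_ij, X_j, x_i):
    """Estimate t_ij with R_ij fixed, from culled 3D-2D feature point pairs.

    R_ij: 3x3 rotation from image j to image i.
    X_j:  Nx3 3D points back-projected from the depth image of image j.
    x_i:  Nx3 normalized homogeneous observations of the same points in image i.
    Uses the linear constraint  [x_i]_x (R_ij X_j + t_ij) = 0  for each pair.
    """
    A_rows, b_rows = [], []
    for X, x in zip(X_j, x_i):
        S = skew(x)
        A_rows.append(S)                  # coefficient of t_ij
        b_rows.append(-S @ (R_ij @ X))    # right-hand side
    A = np.vstack(A_rows)                 # shape (3N, 3)
    b = np.concatenate(b_rows)            # shape (3N,)
    t_ij, *_ = np.linalg.lstsq(A, b, rcond=None)
    return t_ij
```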
  • In one embodiment, if the target RGB image is the first frame image in the preset initialization sliding window, then, as shown in Figure 8, calculating the first pose transformation matrix between the next frame RGB image and the target RGB image and constructing the target 3D map of the current environment according to the first pose transformation matrix and the depth image includes:
  • Operation 820 according to the first pose transformation matrix, project the 3D feature points on the depth image onto the next frame of RGB image to generate projected 2D feature points.
  • the first pose transformation matrix is the pose transformation matrix between the next frame RGB image and the target RGB image.
  • Specifically, the 3D-2D matching point pairs can be determined based on the depth image corresponding to the target RGB image and the target RGB image, and then, based on the first pose transformation matrix, the 3D feature points on the depth image can be projected onto the next frame of RGB image to generate projected 2D feature points.
  • Operation 840 calculating a reprojection error between the projected 2D feature point and the 2D feature point on the RGB image of the next frame.
  • the reprojection error between these projected 2D feature points and the original 2D feature point positions is calculated. If the re-projection error is smaller than the preset error threshold, the 3D feature point corresponding to the re-projection error smaller than the preset error threshold is considered to be a credible map point, and the 3D feature point on the depth image is used as the target map point.
  • the target 3D map of the current environment can be constructed based on these target map points.
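  • A hedged sketch of this reprojection-error filtering (operations 820-840) is given below; the pose convention, helper names, and the example threshold are assumptions.

```python
import numpy as np

def filter_target_map_points(points_3d, observed_2d, R, t, K, err_thresh_px=2.0):
    """Keep 3D feature points whose reprojection error into the next RGB frame
    is below the preset error threshold; the kept points are the target map points.

    points_3d:   Nx3 points from the depth image (target-frame coordinates).
    observed_2d: Nx2 matching 2D feature points observed in the next RGB frame.
    R, t:        first pose transformation (target frame -> next frame), assumed convention.
    """
    target_map_points = []
    for X, uv in zip(points_3d, observed_2d):
        Xc = R @ X + np.asarray(t).ravel()   # point in the next frame's camera coordinates
        proj = K @ Xc
        proj_uv = proj[:2] / proj[2]         # projected 2D feature point
        if np.linalg.norm(proj_uv - uv) < err_thresh_px:
            target_map_points.append(X)      # credible map point
    return np.asarray(target_map_points)
```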
  • In the embodiments of the present application, if the target RGB image is the first frame image in the preset initialization sliding window, the 3D feature points on the depth image are projected onto the next frame RGB image according to the first pose transformation matrix to generate projected 2D feature points, and the reprojection error between the projected 2D feature points and the 2D feature points on the next frame RGB image is calculated. If the reprojection error is less than the preset error threshold, the corresponding 3D feature points on the depth image are used as target map points, and the target 3D map of the current environment is constructed according to the target map points. This realizes calculating the target three-dimensional map of the current environment when the target RGB image is the first frame image in the preset initialization sliding window. Then, when calculating the pose of the electronic device when it collects other RGB images after the next frame of RGB image within the preset initialization sliding window, the target three-dimensional map can be used directly for the calculation.
  • a preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix or the second pose transformation matrix, including:
  • a first pose transformation matrix or a second pose transformation matrix between the relevant RGB image of the target RGB image and the target RGB image is generated.
  • In this case, the preset perspective projection PnP algorithm needs to be used to calculate both the first pose transformation matrix and the second pose transformation matrix.
  • the process of calculating the first pose transformation matrix is not repeated here, and the calculation of the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image includes:
  • a preset perspective projection PnP algorithm is used to calculate a second pose transformation matrix between the RGB image before the target RGB image and the target RGB image.
  • the preset perspective projection PnP algorithm is relative to the traditional perspective projection PnP algorithm.
  • the traditional perspective projection PnP algorithm is based on 3D-2D feature point pairs to simultaneously calculate the rotation transformation matrix and translation transformation matrix between two frames.
  • the preset perspective projection PnP algorithm is used to calculate the rotation transformation matrix and translation transformation matrix between two frames step by step.
  • Specifically, when using the preset perspective projection PnP algorithm to calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, first, the second rotation transformation matrix between the RGB image before the target RGB image and the target RGB image is calculated; secondly, the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image is calculated according to the depth image and the second rotation transformation matrix. The second rotation transformation matrix and the second translation transformation matrix constitute the second pose transformation matrix.
  • In the embodiments of the present application, the process of calculating the second rotation transformation matrix and the second translation transformation matrix is decoupled, which avoids superimposing the error generated when calculating the second rotation transformation matrix with the error generated when calculating the second translation transformation matrix. Therefore, calculating the second pose transformation matrix step by step with the preset perspective projection PnP algorithm improves the accuracy of the second pose transformation matrix.
  • calculating the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image includes:
  • Operation 1 Eliminate the 2D-2D feature point pairs that match each other on the RGB image before the target RGB image and the target RGB image according to the second rotation transformation matrix, and obtain the 2D-2D feature point pairs after elimination.
  • image i is the target RGB image
  • image j is the RGB image before the target RGB image
  • the oval frame corresponds to the depth image corresponding to the target RGB image.
  • the matching 2D-2D feature point pairs on image i and image j are eliminated according to the second rotation transformation matrix , to get the 2D-2D feature point pairs after elimination.
  • Operation 2 convert the culled 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image.
  • Specifically, the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image, and the RANSAC algorithm is used to eliminate the abnormal point pairs among the 3D-2D feature point pairs, generating the culled 3D-2D feature point pairs.
  • Operation 3: calculate the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image.
  • Specifically, the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image can be calculated based on the translation transformation matrix calculation formula, that is, formula (1-1) above, wherein:
  • R ij is the rotation transformation matrix from image j to image i
  • t ij is the translation transformation matrix from image j to image i
  • π(·)^(-1) is the back-projection transformation that converts 2D points into 3D points.
  • The above formula (1-1) is used to construct a least squares problem, and the optimal variable t ij obtained is taken as the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image.
  • In the embodiments of the present application, when calculating the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image, first, the mutually matching 2D-2D feature point pairs of the RGB image before the target RGB image and the target RGB image are culled according to the second rotation transformation matrix to obtain the culled 2D-2D feature point pairs. Secondly, the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image. Finally, according to the 3D-2D feature point pairs, the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image is calculated. The feature point pairs are culled multiple times, and the least squares method is used to calculate the second translation transformation matrix, which improves the accuracy of the calculated second translation transformation matrix.
  • the initial three-dimensional map of the current environment is updated according to the second pose transformation matrix and the depth image to generate a target three-dimensional map, including:
  • the 3D feature points on the depth image are respectively projected onto the RGB image before the target RGB image to generate projected 2D feature points;
  • the 3D feature point on the depth image is used as a new target map point
  • the second pose transformation matrix is a pose transformation matrix between the RGB image before the target RGB image and the target RGB image.
  • the 3D-2D matching point pairs can be determined based on the depth image and the target RGB image corresponding to the target RGB image, and then based on the second pose transformation matrix, the 3D feature points on the depth image can be projected onto the RGB image before the target RGB image , generating projected 2D feature points.
  • the reprojection error between these projected 2D feature points and the original 2D feature point positions is calculated. If the reprojection error is smaller than the preset error threshold, the 3D feature point corresponding to the reprojection error smaller than the preset error threshold is considered to be a credible map point, and the 3D feature point is used as the target map point.
  • the 3D feature points that satisfy the conditions are used as the target map points, and the target 3D map of the current environment can be constructed based on these target map points.
  • In the embodiments of the present application, the 3D feature points on the depth image are projected onto the RGB image before the target RGB image to generate projected 2D feature points, and the reprojection error between the projected 2D feature points and the 2D feature points on the RGB image before the target RGB image is calculated. If the reprojection error is less than the preset error threshold, the 3D feature points on the depth image are used as target map points, and the target 3D map of the current environment is constructed according to the target map points. This realizes calculating the target 3D map of the current environment when the target RGB image is not the first frame image in the preset initialization sliding window. Then, when calculating the pose of the electronic device when it collects other RGB images after the next frame of RGB image within the preset initialization sliding window, the target three-dimensional map can be used directly for the calculation.
  • operation 280 calculates the pose of the electronic device when the RGB image after the next frame of RGB image is collected within the preset initialization sliding window, including:
  • Operation 282 using the next frame of the next RGB image as the current frame, and performing the following target operations:
  • Operation 284 according to the target three-dimensional map, the current frame and the RGB image before the current frame, generate a pair of 3D-2D feature points that match each other between the target three-dimensional map and the current frame;
  • Operation 286, calculating the pose of the current frame based on the 3D-2D feature point pair;
  • Operation 288, update the target three-dimensional map according to the current frame, and use the updated three-dimensional map as a new target three-dimensional map.
  • the next frame of the current frame is used as the new current frame, and the target operation is executed cyclically until the pose of the last RGB image in the preset initialization sliding window is calculated.
  • FIG. 10 is a schematic diagram of calculating the pose and the three-dimensional map for RGB images collected after the next frame of RGB image, in one embodiment. For example, suppose there are 10 frames of images in the preset initialization sliding window, and the depth image is collected at the 4th frame, that is, the target RGB image is the 4th frame, and the next frame of RGB image after the target RGB image is the 5th frame. In this case, calculating the poses of the RGB images collected after the next frame of RGB image means calculating the poses when the 6th to 10th frames of images are collected.
  • next frame (frame 6) of the next RGB image (frame 5) is used as the current frame to calculate the pose of the current frame.
  • a 3D-2D feature point pair matching between the target 3D map and the current frame (6th frame) is generated.
  • the optical flow method or other image matching methods are used to obtain mutually matched 2D feature point pairs.
  • Then, the 3D feature points that match the 2D feature point pairs are obtained from the map points in the target 3D map, and the mutually matching 3D-2D feature point pairs between the target 3D map and the current frame are generated based on the matched 3D feature points and 2D feature point pairs.
  • the traditional perspective projection PnP algorithm is used to calculate the pose of the current frame.
  • the traditional perspective projection PnP algorithm is based on 3D-2D feature point pairs to simultaneously calculate the rotation transformation matrix and translation transformation matrix between two frames. Therefore, when the traditional perspective projection PnP algorithm is used to calculate the pose of the current frame, the rotation transformation matrix and translation transformation matrix between the RGB image before the current frame and the current frame are respectively calculated based on the 3D-2D feature point pairs.
  • the pose transformation matrix is obtained based on the rotation transformation matrix and the translation transformation matrix, and then the pose of the current frame is obtained based on the pose transformation matrix and the pose of the RGB image before the current frame respectively.
  • the pose transformation matrix may be obtained based on the multiplication of the rotation transformation matrix and the translation transformation matrix, and this calculation method is not limited in this application.
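  • As a small illustration of obtaining the pose of the current frame from the pose of a previous RGB frame and a relative pose transformation, 4x4 homogeneous transforms can be composed as sketched below; the composition order shown is one common convention and is an assumption, since this application does not limit the calculation method.

```python
import numpy as np

def compose_pose(T_world_prev, R_prev_cur, t_prev_cur):
    """Pose of the current frame from the previous frame's pose and the
    relative rotation/translation between the two frames (assumed convention)."""
    T_prev_cur = np.eye(4)
    T_prev_cur[:3, :3] = R_prev_cur
    T_prev_cur[:3, 3] = np.asarray(t_prev_cur).ravel()
    return T_world_prev @ T_prev_cur       # 4x4 pose of the current frame
```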
  • the target 3D map is updated according to the current frame, and the updated 3D map is used as a new target 3D map.
  • Then, the above target operations are cyclically executed until the pose of the last frame of RGB image in the preset initialization sliding window is calculated.
  • In the embodiments of the present application, a loop is adopted. First, the pose of the current frame is calculated based on the target 3D map, the current frame, and the RGB images located before the current frame. Secondly, the target three-dimensional map is updated based on the current frame, and the updated three-dimensional map is used as the new target three-dimensional map. Then, the pose of the new current frame is calculated based on the new target 3D map, the new current frame, and the RGB images before it, and the new target three-dimensional map is again updated based on the new current frame, with the updated three-dimensional map used as the new target three-dimensional map. This loops until the pose of the last frame of RGB image within the preset initialization sliding window is calculated.
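  • The loop over the remaining frames in the window can be sketched as follows; match_frame_to_map and update_map are hypothetical placeholders for the feature-matching and map-update steps described above, and the use of OpenCV's joint PnP solver here is an illustrative simplification.

```python
import cv2
import numpy as np

def process_remaining_frames(frames, target_map, K, match_frame_to_map, update_map):
    """For each remaining RGB frame: match it against the target 3D map,
    solve PnP for its pose, then update the map for the next iteration."""
    poses = []
    for frame in frames:                               # frames after the next RGB frame
        # Mutually matching 3D-2D feature point pairs between the map and the current frame.
        points_3d, points_2d = match_frame_to_map(target_map, frame)
        ok, rvec, tvec, _ = cv2.solvePnPRansac(
            np.asarray(points_3d, dtype=np.float64),
            np.asarray(points_2d, dtype=np.float64),
            K, None)
        R, _ = cv2.Rodrigues(rvec)
        poses.append((R, tvec))
        # Update the target 3D map with the current frame (depth and/or triangulation)
        # and use the updated map as the new target map.
        target_map = update_map(target_map, frame, (R, tvec))
    return poses, target_map
```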
  • a pair of 3D-2D feature points matching each other between the target three-dimensional map and the current frame is generated, including:
  • the optical flow method or other image matching methods are used to obtain mutually matched 2D feature point pairs. Since the RGB images before the current frame provide some map points when calculating the target 3D map, the 3D feature points matching the 2D feature point pair can be obtained from the map points in the target 3D map. Then, based on the 3D feature point and the 2D feature point of the current frame in the 2D feature point pair, a 3D-2D feature point pair matching between the target 3D map and the current frame is generated. Therefore, based on the 3D-2D feature point pair, the matching relationship between the 3D feature points on the target 3D map and the 2D feature points on the current frame is obtained.
  • In one embodiment, the current frame is an image frame after the next frame of RGB image of the target RGB image in the sliding window. If the current frame has a corresponding depth image, then operation 288, updating the target three-dimensional map according to the current frame to generate the updated three-dimensional map, includes:
  • the target 3D map is updated to generate an intermediate 3D map
  • the intermediate 3D map is updated by triangulation method to generate the updated 3D map.
  • Specifically, first, calculate the pose transformation matrix between the current frame and the RGB image before the current frame within the preset sliding window; secondly, according to the pose transformation matrix, project the 3D feature points on the depth image of the current frame onto the RGB image before the current frame to generate projected 2D feature points.
  • Then, calculate the reprojection error between the projected 2D feature points and the 2D feature points on the RGB image before the current frame; if the reprojection error is less than the preset error threshold, use the 3D feature points on the depth image of the current frame as target map points.
  • the target map is updated according to the target map points to generate an intermediate three-dimensional map.
  • a triangulation method is used to update the intermediate three-dimensional map to generate an updated three-dimensional map.
  • the camera observes the same space point at two positions, and the three-dimensional space point coordinates are obtained through two camera poses and image observation point coordinates. This process is the calculation process of the triangulation method. Depth information missing in some depth images can be recovered by using triangulation.
  • In the embodiments of the present application, the target three-dimensional map is updated according to the depth image corresponding to the current frame and the pose transformation matrix between the current frame and the RGB image before the current frame within the preset sliding window, to generate the intermediate 3D map. Then, the triangulation method is used to update the intermediate three-dimensional map to generate the updated three-dimensional map. If the current frame has a corresponding depth image, the target 3D map constructed from the previous image frames is updated based on the 3D feature points on the depth image of the current frame, and the triangulation method can recover the depth information missing from some depth images. Therefore, the integrity and accuracy of the three-dimensional map constructed at this time are greatly improved.
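  • A minimal sketch of the triangulation step is shown below, assuming OpenCV, known intrinsics K, and poses expressed as world-to-camera transforms; the convention is an assumption for illustration.

```python
import cv2
import numpy as np

def triangulate_point(K, R1, t1, R2, t2, uv1, uv2):
    """Recover one 3D space point observed at two camera positions from the two
    camera poses and the 2D image observation coordinates."""
    P1 = K @ np.hstack([R1, np.asarray(t1, dtype=float).reshape(3, 1)])
    P2 = K @ np.hstack([R2, np.asarray(t2, dtype=float).reshape(3, 1)])
    X_h = cv2.triangulatePoints(
        P1, P2,
        np.asarray(uv1, dtype=float).reshape(2, 1),
        np.asarray(uv2, dtype=float).reshape(2, 1))
    return (X_h[:3] / X_h[3]).ravel()      # triangulated 3D point
```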
  • the target 3D map is updated according to the current frame to generate an updated 3D map, including:
  • the target three-dimensional map is updated by the triangulation method, and the updated three-dimensional map is generated.
  • In this embodiment, the triangulation method is used to update the target three-dimensional map to generate the updated three-dimensional map. Since the depth information missing from some depth images can be recovered by the triangulation method, the completeness and accuracy of the three-dimensional map constructed at this time are also improved to a certain extent.
  • Operation 1120, acquire the IMU data collected within the preset sliding window.
  • Because the IMU (Inertial Measurement Unit) data acquisition frequency of the electronic device is greater than the acquisition frequency of RGB images, when 10 frames of RGB images are included in the sliding window, generally more than 10 sets of IMU data are collected.
  • The initialization information of the IMU is calculated based on all the IMU data collected within the preset sliding window. Therefore, it is necessary to acquire all the IMU data collected within the preset sliding window.
  • Operation 1140, calculate the initialization information of the IMU according to the pose of each frame of RGB image within the preset sliding window and the IMU data; the initialization information includes the initial velocity, the zero bias of the IMU, and the gravity vector of the IMU.
  • Specifically, the rotation transformation matrix in the pose transformation matrix of each RGB image can be used as a rotation constraint, and the translation transformation matrix of each RGB image can be used as a translation constraint, so as to calculate the initialization information of the IMU.
  • The initialization information of the IMU includes the initial velocity of the electronic device, the bias (Bias) of the IMU, and the gravity vector of the IMU.
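  • Full IMU initialization (velocity, gravity vector, and biases) is normally posed as an optimization problem. Purely as a much-simplified illustration of how the rotation constraint can be exploited, the sketch below estimates a constant gyroscope bias by comparing camera-derived relative rotations between consecutive frames with rotations integrated from raw gyroscope samples, under a small-angle, constant-bias assumption and assuming the camera and IMU frames coincide (no extrinsic rotation). The data layout and function name are assumptions made for this example, not part of this application.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def estimate_gyro_bias(cam_rotations, gyro_segments):
    """Very rough constant gyro-bias estimate from the rotation constraint.

    cam_rotations: list of 3x3 camera-to-world rotations, one per RGB frame in the window.
    gyro_segments: per frame pair, a tuple (timestamps, angular_rates) of raw gyro samples.
    """
    residuals, durations = [], []
    for (R_prev, R_curr), (ts, omega) in zip(zip(cam_rotations, cam_rotations[1:]),
                                             gyro_segments):
        dt = np.diff(ts)
        R_imu = Rotation.identity()
        for w, h in zip(omega[:-1], dt):                    # integrate raw gyro, no bias correction
            R_imu = R_imu * Rotation.from_rotvec(w * h)
        R_cam = Rotation.from_matrix(R_prev.T @ R_curr)     # relative rotation seen by the camera
        residuals.append((R_cam.inv() * R_imu).as_rotvec()) # mismatch attributed to the bias
        durations.append(ts[-1] - ts[0])
    residuals = np.asarray(residuals)
    durations = np.asarray(durations).reshape(-1, 1)
    # least squares on residual_i ≈ bias * duration_i (small-angle approximation)
    return np.sum(residuals * durations, axis=0) / np.sum(durations ** 2)
```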
  • Operation 1160, calculate the pose of the RGB images collected after the preset sliding window according to the initial pose, the target three-dimensional map, and the initialization information of the IMU.
  • the pose of the RGB image collected after the preset sliding window can be calculated according to the initial pose, the target 3D map, and the initialization information of the IMU.
  • the initial pose is the pose of the target RGB image corresponding to the depth image of the current environment collected for the first time.
  • the target 3D map at this time is the 3D map constructed based on all the image frames in the sliding window.
  • In this embodiment, when calculating the pose of the RGB images collected after the preset sliding window, the IMU is first initialized, so that the pose of the RGB images collected after the preset sliding window can be calculated not only by combining the image data collected by the camera, but also by combining the IMU data. The adaptability and robustness of the pose calculation are thus improved from the two dimensions of vision and the IMU.
  • a pose calculation method further comprising:
  • if the depth image of the current environment is not collected within the preset initialization sliding window, the initial pose and the initial three-dimensional map are calculated according to the RGB images collected within the preset initialization sliding window;
  • according to the initial pose and the initial three-dimensional map, the pose of the electronic device when collecting RGB images after the preset initialization sliding window is calculated.
  • Specifically, the traditional VINS-MONO algorithm is used to calculate the initial pose and the initial three-dimensional map according to the RGB images collected within the preset initialization sliding window, and the initial pose and the initial three-dimensional map are then used to calculate the pose of the electronic device when collecting RGB images after the preset initialization sliding window. This guarantees that, even when no depth image of the current environment is collected within the preset initialization sliding window, the pose of the electronic device when collecting RGB images can still be output in real time after 10 frames of RGB images have accumulated in the sliding window.
  • a pose calculation method further comprising:
  • if the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window, the initial pose and the initial three-dimensional map are calculated according to the RGB images collected within the preset initialization sliding window;
  • according to the initial pose and the initial three-dimensional map, the pose of the electronic device when collecting RGB images after the preset sliding window is calculated.
  • FIG. 12 is a schematic diagram of calculating the pose when the depth image of the current environment is not collected within the preset initialization sliding window, or when the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window.
  • When the depth image of the current environment is collected for the first time within the window and its target RGB image is not the last frame image, the pose of the electronic device when the depth image is collected is determined as the initial pose, and the pose of the electronic device when collecting the next frame of RGB image is determined according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image. In the two cases shown in FIG. 12, however, the pose calculation method in this application can only calculate the pose of the electronic device when RGB images are collected after the sliding window, and the time this takes is basically the same as when the traditional VINS-MONO algorithm is used to calculate the pose of the electronic device when collecting RGB images after the sliding window.
  • the traditional VINS-MONO algorithm or the pose calculation method in this application can be used for calculation. Therefore, a variety of pose calculation methods are provided, which is more flexible.
  • a pose calculation method is provided, which is illustrated by taking the case in which the RGB image corresponding to the depth image collected for the first time is the second frame image, or an image after the second frame image, within the preset initialization sliding window. The method includes:
  • Operation 130 collecting RGB images, IMU data and depth images
  • Operation 1306, performing de-distortion and alignment processing on the collected RGB image and depth image
  • Operation 1310 judging whether there is a corresponding depth image (Depth image) in the RGB image currently collected, and it is the Depth image of the current environment collected for the first time; if so, enter operation 1312; if not, enter operation 1318;
  • Operation 1312, the image coordinate system of the RGB image is set as the world coordinate system, the pose of the RGB image is used as the initial pose, and the initial pose is set to 0; and the frame_count of the RGB image is recorded as first_depth;
  • Operation 1318 judge whether first_depth is smaller than the sliding window size windowsize (10 frames); if so, then enter operation 1320; if not, then enter operation 1354;
  • Operation 1320 judge whether the frame number frame_count+1 of current frame is equal to first_depth+1; If so, then enter operation 1322; If not, then enter operation 1334;
  • Operation 1322 using the preset perspective projection PnP algorithm to calculate the first pose between the first_depth frame and the first_depth+1 frame;
  • Operation 1324, according to the first pose, project the 3D feature points on the Depth map onto the first_depth+1 frame to generate projected 2D feature points; calculate the reprojection error between the projected 2D feature points and the 2D feature points on the first_depth+1 frame;
  • Operation 1326 if the reprojection error is smaller than the preset error threshold, use the 3D feature point on the Depth map as the target map point; construct an initial 3D map of the current environment according to the target map point.
  • Operation 1330, respectively project the 3D feature points on the Depth map onto the RGB images before the first_depth frame to generate projected 2D feature points; calculate the reprojection error between the projected 2D feature points and the 2D feature points on the RGB images before the first_depth frame;
  • Operation 1332 if the reprojection error is less than the preset error threshold, then use the 3D feature point on the Depth map as a new target map point; add the new target map point to the initial 3D map to generate the target 3D map;
  • Operation 1334 according to the target 3D map, the current frame, and the RGB image before the current frame, generate 3D-2D feature point pairs that match each other between the target 3D map and the current frame;
  • Operation 1336 based on the 3D-2D feature point pair, the traditional PnP algorithm is used to calculate the pose of the current frame; and enter operation 1354 to output the pose;
  • Operation 1338 judging whether there is a corresponding Depth map in the current frame; if yes, then enter operation 1340, if not, then enter operation 1344;
  • Operation 1340 update the target 3D map to generate an intermediate 3D map according to the depth image corresponding to the current frame, the pose transformation matrix between the current frame and the RGB image before the current frame in the preset sliding window;
  • Operation 1342 using the triangulation method to update the intermediate three-dimensional map to generate an updated three-dimensional map
  • Operation 1344 using the triangulation method to update the target three-dimensional map to generate an updated three-dimensional map
  • Operation 1346 judge whether frame_count is equal to windowsize (10 frames); if so, then enter operation 1348;
  • Operation 1348 performing BA optimization on the poses corresponding to the calculated 10 frames of images
  • Operation 1350 based on the poses corresponding to the 10 frames of images after BA optimization, perform IMU initialization;
  • Operation 1352 according to the initial pose, the target three-dimensional map and the initialization information of the IMU, calculate the pose of the RGB image collected after the preset sliding window.
  • the VINS-MONO algorithm is used to calculate the pose.
  • In the embodiment of the present application, the electronic device no longer needs to wait until the preset initialization sliding window is full of RGB images before it can output the pose at the time the first frame of RGB image outside the window is collected. Instead, when the depth image of the current environment is collected for the first time within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose, and the pose of the electronic device when collecting the next frame of RGB image is determined according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image.
  • In this way, the time at which the first pose is output is greatly advanced, from after the preset initialization sliding window to inside the sliding window, namely to the frame following the first received depth image. Therefore, the pose can be output in real time within the preset initialization sliding window, which reduces the waiting time for real-time output of the pose.
  • If the depth image of the current environment is not collected within the preset initialization sliding window, the traditional VINS-MONO algorithm is used to calculate, according to the RGB images collected within the preset initialization sliding window, the pose of the electronic device when collecting RGB images after the preset initialization sliding window. This guarantees that, even in this case, the pose of the electronic device when collecting RGB images can still be output in real time after 10 frames of RGB images have accumulated in the sliding window.
  • The depth map can be collected at a frequency lower than the frequency at which RGB images are collected, or at a variable frequency, and as long as a depth map is collected within the sliding window, initialization can be started and the pose is then output in real time after initialization. Since the depth map can be collected at a lower or variable frequency, initialization and real-time pose output can be based on a low-frequency depth map. Therefore, collecting and processing a large amount of data is avoided, which further reduces the power consumption of the electronic device.
  • a pose calculation device 1400 is provided, and the device includes:
  • the initial pose determination module 1420 is configured to determine the pose of the electronic device when the depth image is collected as the initial pose if the depth image of the current environment is collected for the first time within the preset initialization sliding window; wherein, the target corresponding to the depth image The RGB image is not the last frame image in the preset initialization sliding window;
  • the next frame RGB image pose determination module 1440 is configured to determine the pose of the electronic device when the next frame of RGB image is collected according to the initial pose, the target RGB image, the depth image, and the next frame RGB image of the target RGB image.
  • a pose calculation device 1400 is provided, and the device further includes:
  • the target three-dimensional map construction module 1460 is used to construct the target three-dimensional map of the current environment according to the pose and depth image when the electronic device collects the next frame of RGB image;
  • the other RGB image pose determination module 1480 is configured to calculate the pose of the electronic device when collecting other RGB images located after the next frame of RGB images within the preset initialization sliding window according to the target three-dimensional map.
  • the target three-dimensional map construction module 1460 also includes:
  • An initial three-dimensional map construction unit configured to construct an initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image;
  • the target three-dimensional map construction unit is used to calculate the second pose transformation matrix between the RGB images before the target RGB image and the target RGB image, and to update the initial three-dimensional map of the current environment according to the second pose transformation matrix and the depth image, so as to construct the target three-dimensional map of the current environment.
  • a pose calculation device 1400 is provided, and the device further includes:
  • the pose transformation matrix calculation unit is also used to calculate the first pose transformation matrix or the second pose transformation matrix by using the preset perspective projection PnP algorithm;
  • the preset perspective projection PnP algorithm is used to calculate, step by step, the rotation transformation matrix and the translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image; the relevant RGB image of the target RGB image is the RGB image before the target RGB image or the next frame of RGB image.
  • the pose transformation matrix calculation unit further includes:
  • the rotation transformation matrix calculation subunit is used to calculate the rotation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image;
  • the translation transformation matrix calculation subunit is used to calculate the translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image according to the depth image and the rotation transformation matrix;
  • the pose transformation matrix calculation subunit is used to generate a first pose transformation matrix or a second pose transformation matrix between the relevant RGB image of the target RGB image and the target RGB image based on the rotation transformation matrix and the translation transformation matrix.
  • the translation transformation matrix calculation subunit is further used to eliminate mismatched pairs from the 2D-2D feature point pairs matched between the relevant RGB image of the target RGB image and the target RGB image according to the rotation transformation matrix, and obtain the 2D-2D feature point pairs after elimination; to convert the 2D-2D feature point pairs after elimination into 3D-2D feature point pairs according to the depth image; and to calculate the translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image according to the 3D-2D feature point pairs.
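  • Purely as an illustrative sketch of such a step-by-step rotation-then-translation scheme, the code below estimates the rotation from 2D-2D matches via OpenCV's essential-matrix decomposition (which also provides an inlier mask used to eliminate mismatched pairs), then solves the translation linearly from 3D-2D pairs with the rotation held fixed. This is an assumed decomposition for illustration only and is not asserted to be the exact "preset perspective projection PnP algorithm" of this application.

```python
import cv2
import numpy as np

def stepwise_pose(pts2d_a, pts2d_b, pts3d_a, K):
    """Estimate rotation first from 2D-2D matches, then translation from 3D-2D pairs.

    pts2d_a, pts2d_b: Nx2 matched pixel points in image A (e.g. the target RGB image) and image B.
    pts3d_a:          Nx3 3D points for the features of image A, e.g. back-projected from its depth image.
    K:                3x3 camera intrinsic matrix.
    """
    # Step 1: rotation, plus an inlier mask that eliminates mismatched 2D-2D pairs
    E, mask = cv2.findEssentialMat(pts2d_a, pts2d_b, K, method=cv2.RANSAC)
    _, R, _, mask = cv2.recoverPose(E, pts2d_a, pts2d_b, K, mask=mask)
    inl = mask.ravel().astype(bool)

    # Step 2: translation by linear least squares with R fixed.
    # For each inlier, the normalized point x_b is parallel to R @ X_a + t,
    # so [x_b]_x @ (R @ X_a + t) = 0 gives three linear equations in t.
    Kinv = np.linalg.inv(K)
    A_rows, b_rows = [], []
    for X, x in zip(pts3d_a[inl], pts2d_b[inl]):
        xb = Kinv @ np.array([x[0], x[1], 1.0])              # normalized image coordinates
        skew = np.array([[0.0, -xb[2], xb[1]],
                         [xb[2], 0.0, -xb[0]],
                         [-xb[1], xb[0], 0.0]])
        A_rows.append(skew)                                   # coefficient block of t
        b_rows.append(-skew @ (R @ np.asarray(X, dtype=float)))
    t, *_ = np.linalg.lstsq(np.vstack(A_rows), np.hstack(b_rows), rcond=None)
    return R, t
```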
  • the initial three-dimensional map construction unit is further configured to project the 3D feature points on the depth image onto the next frame of RGB image according to the first pose transformation matrix to generate projected 2D feature points; calculate the reprojection error between the projected 2D feature points and the 2D feature points on the next frame of RGB image; if the reprojection error is less than the preset error threshold, use the 3D feature points on the depth image as target map points; and construct the initial three-dimensional map of the current environment according to the target map points.
  • the target three-dimensional map construction unit is further configured to respectively project the 3D feature points on the depth image onto the RGB image before the target RGB image according to the second pose transformation matrix to generate projected 2D feature points; Calculate the reprojection error between the projected 2D feature point and the 2D feature point on the RGB image before the target RGB image; if the reprojection error is less than the preset error threshold, use the 3D feature point on the depth image as the new target map point; add new target map points to the initial 3D map to construct the target 3D map of the current environment.
  • other RGB image pose determination module 1480 includes:
  • the current frame definition unit is used to use the next frame of the next frame RGB image as the current frame, and perform the following target operations:
  • the target operation unit is used to generate a 3D-2D feature point pair matching between the target 3D map and the current frame according to the target 3D map, the current frame and the RGB image before the current frame; calculate the current frame based on the 3D-2D feature point pair The pose of the frame; update the target 3D map according to the current frame, and use the updated 3D map as the new target 3D map;
  • the loop unit is used to use the next frame of the current frame as a new current frame, and execute the target operation in a loop until the pose of the last frame of the RGB image in the preset initialization sliding window is calculated.
  • the target operation unit is further configured to obtain matching 2D feature point pairs from the current frame and the RGB image before the current frame; obtain the 2D feature point pairs from the map points in the target three-dimensional map The matched 3D feature points, according to the matched 3D feature points and 2D feature point pairs, generate 3D-2D feature point pairs that match each other between the target 3D map and the current frame.
  • the target operation unit is further configured to update the target three-dimensional map according to the depth image corresponding to the current frame and the pose transformation matrix between the current frame and the RGB images before the current frame in the preset sliding window, to generate an intermediate three-dimensional map; and to update the intermediate three-dimensional map by using the triangulation method to generate the updated three-dimensional map.
  • the target operation unit is further configured to update the target 3D map by using a triangulation method to generate an updated 3D map.
  • a pose calculation device 1600 is provided, and the device further includes:
  • IMU data acquisition module 1620 for acquiring the IMU data collected in the preset sliding window
  • the IMU initialization module 1640 is used to calculate the initialization information of the IMU according to the pose and IMU data of each frame of the RGB image in the preset sliding window; the initialization information includes the initial velocity, the zero bias of the IMU and the gravity vector of the IMU;
  • the first pose calculation module 1660 is configured to calculate the pose of the RGB image collected after the preset sliding window according to the initial pose, the target 3D map and the initialization information of the IMU.
  • a pose calculation device is provided, the device also includes:
  • the second pose calculation module is used to calculate the initial pose and initial three-dimensional map according to the RGB image collected in the preset initialization sliding window if the depth image of the current environment is not collected in the preset initialization sliding window;
  • and to calculate, according to the initial pose and the initial three-dimensional map, the pose of the electronic device when collecting RGB images after the preset initialization sliding window.
  • a pose calculation device is provided, the device also includes:
  • the third pose calculation module is used to calculate the initial pose and the initial three-dimensional map according to the RGB images collected within the preset initialization sliding window if the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window;
  • and to calculate, according to the initial pose and the initial three-dimensional map, the pose of the electronic device when collecting RGB images after the preset sliding window.
  • The division into the above modules in the pose calculation device is only for illustration. In other embodiments, the pose calculation device can be divided into different modules as required, so as to complete all or part of the functions of the pose calculation device.
  • Each module in the pose calculation device can be fully or partially realized by software, hardware and a combination thereof.
  • Each module can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can call and execute the corresponding operations of the above modules.
  • An electronic device is provided, including a memory and a processor. A computer program is stored in the memory, and when the computer program is executed by the processor, the processor is caused to perform the operations of the pose calculation method provided by each of the above embodiments.
  • Fig. 17 is a schematic diagram of the internal structure of an electronic device in one embodiment.
  • the electronic device includes a processor and a memory connected through a system bus.
  • the processor is used to provide calculation and control capabilities to support the operation of the entire electronic device.
  • the memory may include non-volatile storage media and internal memory.
  • Nonvolatile storage media store operating systems and computer programs.
  • the computer program can be executed by a processor, so as to implement a pose calculation method provided in the above embodiments.
  • the internal memory provides a high-speed running environment for the operating system and the computer program in the non-volatile storage medium.
  • the electronic device may be any terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant, a personal digital assistant), a POS (Point of Sales, a sales terminal), a vehicle-mounted computer, or a wearable device.
  • The implementation of each module in the pose calculation device provided in the embodiments of the present application may be in the form of a computer program.
  • The computer program can run on an electronic device.
  • The program modules constituted by the computer program can be stored in the memory of the electronic device.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • One or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform operations of the pose calculation method.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM), which acts as external cache memory.
  • RAM is available in many forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present application relates to a pose calculation method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: if a depth image of a current environment is collected within a preset initialization sliding window for the first time, determining the pose of the electronic device when collecting the depth image as an initial pose, wherein a target RGB image corresponding to the depth image is not the last frame image of the preset initialization sliding window (220); and determining, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, the pose of the electronic device when collecting the next frame of RGB image (240).
PCT/CN2022/098295 2021-07-29 2022-06-13 Procédé et appareil de calcul de pose, dispositif électronique et support de stockage lisible Ceased WO2023005457A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110866966.3A CN113610918B (zh) 2021-07-29 2021-07-29 位姿计算方法和装置、电子设备、可读存储介质
CN202110866966.3 2021-07-29

Publications (1)

Publication Number Publication Date
WO2023005457A1 true WO2023005457A1 (fr) 2023-02-02

Family

ID=78306042

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/098295 Ceased WO2023005457A1 (fr) 2021-07-29 2022-06-13 Procédé et appareil de calcul de pose, dispositif électronique et support de stockage lisible

Country Status (2)

Country Link
CN (1) CN113610918B (fr)
WO (1) WO2023005457A1 (fr)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610918B (zh) * 2021-07-29 2025-02-11 Oppo广东移动通信有限公司 位姿计算方法和装置、电子设备、可读存储介质


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190221000A1 (en) * 2017-01-16 2019-07-18 Shapetrace Inc. Depth camera 3d pose estimation using 3d cad models
CN107610175A (zh) * 2017-08-04 2018-01-19 华南理工大学 基于半直接法和滑动窗口优化的单目视觉slam算法
CN111160298A (zh) * 2019-12-31 2020-05-15 深圳市优必选科技股份有限公司 一种机器人及其位姿估计方法和装置
CN112164117A (zh) * 2020-09-30 2021-01-01 武汉科技大学 一种基于Kinect相机的V-SLAM位姿估算方法
CN112435206A (zh) * 2020-11-24 2021-03-02 北京交通大学 利用深度相机对物体进行三维信息重建的方法
CN112907620A (zh) * 2021-01-25 2021-06-04 北京地平线机器人技术研发有限公司 相机位姿的估计方法、装置、可读存储介质及电子设备
CN112819860A (zh) * 2021-02-18 2021-05-18 Oppo广东移动通信有限公司 视觉惯性系统初始化方法及装置、介质和电子设备
CN113610918A (zh) * 2021-07-29 2021-11-05 Oppo广东移动通信有限公司 位姿计算方法和装置、电子设备、可读存储介质

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071396A (zh) * 2023-02-23 2023-05-05 哈尔滨工业大学 一种同步定位方法
CN116363201A (zh) * 2023-03-13 2023-06-30 北醒(北京)光子科技有限公司 位姿估计方法、装置、电子设备及可读存储介质
CN117237553A (zh) * 2023-09-14 2023-12-15 广东省核工业地质局测绘院 一种基于点云图像融合的三维地图测绘系统
CN117419690A (zh) * 2023-12-13 2024-01-19 陕西欧卡电子智能科技有限公司 一种无人船的位姿估计方法、装置及介质
CN117419690B (zh) * 2023-12-13 2024-03-12 陕西欧卡电子智能科技有限公司 一种无人船的位姿估计方法、装置及介质

Also Published As

Publication number Publication date
CN113610918A (zh) 2021-11-05
CN113610918B (zh) 2025-02-11

Similar Documents

Publication Publication Date Title
CN109727288B (zh) 用于单目同时定位与地图构建的系统和方法
CN109506642B (zh) 一种机器人多相机视觉惯性实时定位方法及装置
WO2023005457A1 (fr) Procédé et appareil de calcul de pose, dispositif électronique et support de stockage lisible
CN110246147B (zh) 视觉惯性里程计方法、视觉惯性里程计装置及移动设备
CN107564061B (zh) 一种基于图像梯度联合优化的双目视觉里程计算方法
CN108648215B (zh) 基于imu的slam运动模糊位姿跟踪算法
US12062210B2 (en) Data processing method and apparatus
CN110702111A (zh) 使用双事件相机的同时定位与地图创建(slam)
WO2020206903A1 (fr) Procédé et dispositif de mise en correspondance d'images et support de mémoire lisible par ordinateur
Saurer et al. Homography based visual odometry with known vertical direction and weak manhattan world assumption
WO2019157925A1 (fr) Procédé et système d'implémentation d'odométrie visuelle-inertielle
CN111127524A (zh) 一种轨迹跟踪与三维重建方法、系统及装置
CN113190120B (zh) 位姿获取方法、装置、电子设备及存储介质
CN110111388A (zh) 三维物体位姿参数估计方法及视觉设备
WO2019104571A1 (fr) Procédé et dispositif de traitement d'image
CN110375732A (zh) 基于惯性测量单元和点线特征的单目相机位姿测量方法
CN113034347A (zh) 倾斜摄影图像处理方法、装置、处理设备及存储介质
WO2024164812A1 (fr) Procédé et dispositif de slam basés sur une fusion de multiples capteurs, et support
US20250037401A1 (en) System and methods for validating imagery pipelines
CN114092564A (zh) 无重叠视域多相机系统的外参数标定方法、系统、终端及介质
CN113610702A (zh) 一种建图方法、装置、电子设备及存储介质
US20250174036A1 (en) Hand pose recognition method and apparatus, device, storage medium, and program product
CN113516714A (zh) 基于imu预积分信息加速特征匹配的视觉slam方法
CN115619851A (zh) 基于锚点的vslam后端优化方法、装置、介质、设备和车辆
CN113963030B (zh) 一种提高单目视觉初始化稳定性的方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22848066

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22848066

Country of ref document: EP

Kind code of ref document: A1