
WO2022183137A1 - Visual-inertial SLAM system - Google Patents

Visual-inertial SLAM system

Info

Publication number
WO2022183137A1
WO2022183137A1 · PCT/US2022/018460
Authority
WO
WIPO (PCT)
Prior art keywords
visual
inertial
data
image
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2022/018460
Other languages
English (en)
Inventor
Pan JI
Yuan Tian
Qingan Yan
Yuxin MA
Yi Xu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innopeak Technology Inc
Original Assignee
Innopeak Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innopeak Technology Inc filed Critical Innopeak Technology Inc
Priority to CN202280017858.1A (published as CN116940969A)
Publication of WO2022183137A1
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/593 Depth or shape recovery from multiple images from stereo images
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10021 Stereoscopic video; Stereoscopic image sequence
    • G06T2207/10024 Color image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20 Special algorithmic details
    • G06T2207/20072 Graph-based image processing
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the disclosed technology relates generally to providing an RGB-D Inertial Simultaneous Localization and Mapping (SLAM) system that incorporates artificial intelligence (AI) and planar constraints.
  • Simultaneous Localization and Mapping is the computational technique of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it. This technique is generally used with autonomous vehicle systems, virtual reality (VR), augmented reality (AR), and robotics.
  • AR can be used to enhance images as viewed on a screen or other display by overlaying computer-generated images, sounds, or other data onto the real-world environment. AR may also be implemented not just as overlays, but to anchor virtual objects to the real-world environment and allow the user to interact with combined virtual/real objects.
  • the SLAM technique can allow the autonomous vehicle to build a map of the driving environment and localize the vehicle in that environment as the vehicle travels through the corresponding real-world environment.
  • the map can help the vehicle plan a driving path and avoid obstacles within the environment.
  • FIG. 1 is an illustrative visual-inertial SLAM system, user devices, sensors, and network, in accordance with the embodiments disclosed herein.
  • FIG. 2 is an illustrative process implemented by the visual-inertial SLAM system, in accordance with the embodiments disclosed herein.
  • FIG. 3 is an illustrative network structure, in accordance with the embodiments disclosed herein.
  • FIG. 4 is an illustrative process performed by one or more computing systems that may be used in implementing various features of embodiments of the disclosed technology.
  • FIG. 5 is an example of a computing system that may be used in implementing various features of embodiments of the disclosed technology.
  • a system that uses an artificial intelligence (AI) model can predict image classifications or object poses in a real-world environment.
  • a conventional system that implements a geometric SLAM system may determine the position and orientation of an object by analyzing the associated image and IMU data (e.g., using visual-inertial odometry (VIO)).
  • none of the conventional VIO systems incorporate CNN depth/uncertainty maps and planar data constraints, both of which may be determined by sensors embedded in mobile devices or some standalone neural networks.
  • Embodiments of the application are directed to a robust visual-inertial SLAM system that combines artificial intelligence (AI) and planar constraints.
  • a trained AI model may be used to predict the depth map in a physical environment, where the depth is the estimated depth or distance of each pixel from the viewing user device using the AI model.
  • the depth map may be combined with an uncertainty map (e.g., a measurement of how certain or confident the AI model depth prediction is) for each input image (e.g., interchangeably using a confidence score or uncertainty score based on the training and output of the AI model) to create an input map of the environment for the visual-inertial SLAM system.
  • the visual-inertial SLAM system may be a feature-based SLAM system that supports RGB-D inertial sensor input (e.g., a depth-sensing device in association with a red, green, blue input). Instead of relying on a depth sensor to get the depth image, the visual-inertial SLAM system may use the predicted depth map as the depth input. Additionally, the AI uncertainty determination can help adaptively weigh the contribution of feature points in the backend optimization of bundle adjustment.
  • the visual-inertial SLAM system may also receive a gravity direction value from one or more inertial sensors. Using the gravity direction value, the visual-inertial SLAM system can implement a fast plane detection method that detects horizontal planes via a one-point Random Sample Consensus (RANSAC) and vertical planes via a two-point RANSAC, resulting in determined horizontal and vertical planes. Those detected planes may be used to regularize the backend bundle adjustment of the visual-inertial SLAM system.
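  • As an illustrative (non-limiting) sketch of the gravity-aided plane detection described above, the following Python/numpy code assumes 3D map points and a gravity direction expressed in a common frame; the function names, thresholds, and iteration counts are assumptions for illustration, not the disclosed implementation.

```python
import numpy as np

def detect_horizontal_plane(points, gravity_dir, thresh=0.02, iters=100, rng=None):
    """One-point RANSAC: the plane normal is fixed to the gravity direction,
    so a single sampled point determines the plane offset d (plane: n.x + d = 0)."""
    rng = rng or np.random.default_rng()
    n = gravity_dir / np.linalg.norm(gravity_dir)
    best_d, best_inliers = 0.0, np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        p = points[rng.integers(len(points))]
        d = -np.dot(n, p)
        inliers = np.abs(points @ n + d) < thresh   # point-to-plane distances
        if inliers.sum() > best_inliers.sum():
            best_d, best_inliers = d, inliers
    return np.append(n, best_d), best_inliers

def detect_vertical_plane(points, gravity_dir, thresh=0.02, iters=200, rng=None):
    """Two-point RANSAC: a vertical plane's normal is perpendicular to gravity,
    so two sampled points (plus the gravity constraint) determine the plane."""
    rng = rng or np.random.default_rng()
    g = gravity_dir / np.linalg.norm(gravity_dir)
    best_plane, best_inliers = None, np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        n = np.cross(points[j] - points[i], g)      # normal is orthogonal to gravity
        if np.linalg.norm(n) < 1e-6:
            continue
        n /= np.linalg.norm(n)
        d = -np.dot(n, points[i])
        inliers = np.abs(points @ n + d) < thresh
        if inliers.sum() > best_inliers.sum():
            best_plane, best_inliers = np.append(n, d), inliers
    return best_plane, best_inliers
```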
  • the visual-inertial SLAM system may improve automated depth perception so that other systems that rely on an accurate depiction of the environment can determine a more accurate action in response to the determinations.
  • the use of the visual-inertial SLAM system may improve reconstruction of the physical environment and accurately detect the position of the agent with respect to the map simultaneously.
  • the proposed system maintains the minimal sensor setup of using a monocular camera and an inertial sensor (e.g., IMU) for SLAM, yet achieves the benefits of using the RGB-D inertial input via an AI model (e.g., CNN or other neural network) to predict depth and uncertainty.
  • the proposed system combines the best of both worlds (i.e., geometry and deep learning).
  • a fast and effective plane detection method is presented along with the disclosed visual-inertial SLAM system.
  • the system can operate in real-time and provide accurate mapping functionality that reconstructs map points, camera poses, and major planes, which suits AR applications well.
  • FIG. 1 is an illustrative visual-inertial SLAM system, user devices, sensors, and network, in accordance with the embodiments disclosed herein.
  • various computer systems are illustrated, including visual-inertial SLAM system 102, one or more user devices 130, one or more sensors 132, and network 140.
  • Visual-Inertial SLAM system 102 may be, for example, a server computer, a controller, or any other similar computing component capable of processing data.
  • visual-inertial SLAM system 102 comprises one or more processors 104 (e.g., controllers, control engines, or other processing devices), memory 105, and machine readable media 106.
  • visual-inertial SLAM system 102 may be implemented at a first user device and provide output to a second user device for implementation in a 3D environment.
  • Processor 104 may be one or more central processing units (CPUs), semiconductor-based microprocessors, and/or other hardware devices suitable for retrieval and execution of instructions stored in machine-readable storage medium 106.
  • processor 104 may fetch, decode, and execute instructions to control processes or operations for implementing 3D image processing.
  • processor 104 may include one or more electronic circuits that include electronic components for performing the functionality of one or more instructions, such as a field programmable gate array (FPGA), application specific integrated circuit (ASIC), or other electronic circuits.
  • Visual-Inertial SLAM system 102 might also include one or more memory 105 and machine readable media 106.
  • memory 105 and/or machine readable media 106 may comprise random-access memory (RAM) or other dynamic memory, which might be used for storing information and instructions to be executed by processor 104.
  • Memory 105 and/or machine readable media 106 might also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104.
  • Memory 105 and/or machine readable media 106 might likewise include a read only memory (“ROM”) or other static storage device coupled to a bus for storing static information and instructions for processor 104.
  • Visual-Inertial SLAM system 102 may comprise one or more modules that perform processes or operations.
  • the modules may include, for example, data processing module 108, visual-inertial module 110, artificial intelligence (AI) module 112, planar engine 114, optimization engine 116, and user interface simulation engine 118.
  • One of the methods executed by the modules of visual-inertial SLAM system 102 is illustrated in FIG. 2.
  • Data processing module 108 may receive data from one or more user devices 130 or sensors 132, and store the sensor data in sensor data store 120.
  • Data may be generated by the user devices 130 (e.g., computing device, mobile or smart phone, etc.) or sensors 132.
  • Sensors 132 may comprise, for example, monocular or stereo cameras, depth sensors, and inertial measurement units (IMU).
  • the sensor data generated by a refracting or monocular camera may comprise magnified images (e.g., produced via an optical prism) captured as a two-dimensional (2D) image.
  • the stereo and RGB-D cameras can provide metric scale recovery for map points and camera poses.
  • Depth sensors can generate a three-dimensional (3D) image by acquiring multi-point distance information across a wide Field-of-View (FoV).
  • the inertial sensors or IMUs can measure the acceleration and angular velocity of an object along three mutually perpendicular axes over time.
  • Visual-inertial module 110 is configured to determine visual-inertial data (e.g., the motion of the sensor).
  • visual-inertial module 110 may acquire one or more input images from sensor data store 120 or directly from a user device 130 or sensor 132 (e.g., single cameras, stereo cameras, omnidirectional cameras, or other image sensors). Based on the quality of the image data, visual-inertial module 110 may apply an image correction process or image processing techniques to improve the quality of one or more portions of the image (e.g., for lens distortion removal).
  • the VIO process may implement feature detection and feature extraction to help estimate the sensor motion.
  • For feature detection, the process may define interest operators and match features across frames to construct an optical flow field.
  • the process may establish a correlation between two images and perform feature extraction based on the comparison.
  • the process may construct an optical flow field (e.g., using the Lucas-Kanade method) and remove outliers.
  • the VIO process may further estimate the visual-inertial data (e.g., the motion of the sensor) from the optical flow.
  • a first embodiment may use a Kalman filter to estimate a state distribution.
  • a second embodiment may determine the geometric and 3D properties of the features that minimize a cost function (e.g., using an AI model) based on the re-projection error between two adjacent images. This can be done by, for example, mathematical minimization or random sampling.
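  • As a hedged illustration of the tracking steps described above (feature detection, Lucas-Kanade optical flow, and outlier removal followed by motion estimation), the following OpenCV sketch shows one common way to implement a monocular front end; it is not the implementation disclosed in the application, and the parameter values are assumptions.

```python
import cv2

def track_and_estimate_motion(prev_gray, cur_gray, K):
    """Detect corners, track them with pyramidal Lucas-Kanade optical flow,
    reject outliers with a RANSAC essential-matrix fit, and recover relative pose."""
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                       qualityLevel=0.01, minDistance=8)
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, prev_pts, None)
    ok = status.ravel() == 1
    p0, p1 = prev_pts[ok].reshape(-1, 2), cur_pts[ok].reshape(-1, 2)

    # Outlier removal: keep only correspondences consistent with epipolar geometry.
    E, mask = cv2.findEssentialMat(p0, p1, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    inl = mask.ravel().astype(bool)

    # Relative rotation/translation (translation is up to scale for a monocular camera;
    # inertial measurements or predicted depth can supply the metric scale).
    _, R, t, _ = cv2.recoverPose(E, p0[inl], p1[inl], K)
    return p0[inl], p1[inl], R, t
```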
  • Visual-inertial module 110 may access image data in sensor data store 120, identify a scale drift in the visual-inertial sensor data that can accumulate errors over time, and negate the scale drift by implementing a loop closure on the visual-inertial sensor data.
  • the scale drift may be erroneously added by one or more sensors 132 (e.g., an inertial sensor) and the scale drift can cause increased errors in the inertial values over time.
  • Artificial intelligence (AI) module 112 may train and implement a machine learning model to predict a depth map and corresponding uncertainty score (or confidence score, used interchangeably) from an input image. For example, during training, AI module 112 may access one or more training images and corresponding depth values for portions of the image (e.g., pixels). The training images may be provided to the model, such as a convolutional neural network (CNN) or other AI model, and the output of the AI model may comprise a depth map and/or depth values with a low uncertainty value that may be presented as maps or other formats. The uncertainty map corresponding with these correlated images and values may be lower than a threshold uncertainty value at this point of the training.
  • a deep learning model may be combined with a geometric-based model.
  • geometric modeling may not identify depth in image regions that lack texture (i.e., textureless regions), whereas deep learning based models can predict reasonable depth values in such regions by learning a mapping function from the training data.
  • AI module 112 may implement a deep learning model (e.g., a CNN) that can predict the depth and the uncertainty values to augment the geometric values.
  • the model can be trained in a supervised, self-supervised, or training agnostic manner, as further illustrated in FIG. 3.
  • the model may implement an auto-encoder structure with skip-connections between the encoder and the decoder portions of the model.
  • the uncertainty-weighted self-supervised loss is defined in terms of the photometric loss and the predicted per-pixel uncertainty, where r_pho is the photometric loss and u is the uncertainty value predicted for each pixel, so that pixels with high predicted uncertainty contribute less to the loss.
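  • As a hedged illustration (an assumed common formulation, not necessarily the exact equation of this application), an uncertainty-weighted self-supervised loss is often written as:

```latex
\mathcal{L}_{\mathrm{self}} = \frac{r_{\mathrm{pho}}}{u} + \log u
```

    Dividing the photometric loss by the predicted uncertainty down-weights unreliable pixels, while the log u term discourages the network from predicting arbitrarily large uncertainties everywhere.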
  • the total loss function is the summation of the uncertainty-weighted self-supervised loss and a depth smoothness loss, i.e., L_total = L_self + λ·L_smooth, where λ is a weighting parameter set to 1.0 × 10⁻³.
  • inference is performed for each incoming image to predict its depth map and uncertainty map, which are then fed to the disclosed visual-inertial SLAM system 102.
  • AI module 112 can generate an output that comprises at least the depth map D_t and/or its uncertainty map U_t from a single image I_t.
  • Because the visual-inertial SLAM system 102 may receive an AI-model-predicted depth value as input, and the AI model depth value is not always accurate, this uncertainty-guided optimization can effectively down-weight the contributions of those map points that potentially have high depth errors.
  • Inertial measurements Y_t from the user device 130 or sensors 132 may be determined, corresponding to the image I_t.
  • Planar engine 114 is configured to detect one or more planes and store corresponding planar constraints in planar constraints data store 122.
  • the planes are detected in a separate thread of the front-end user device 130 or sensor 132 (e.g., an accelerometer, optical, or motion sensor).
  • the planes may be detected from sensor data generated by user device 130 or sensor 132, including gravity direction and uncertainty measurements from a trained AI model, to determine i) horizontal planes that are perpendicular to the gravity direction; and ii) vertical planes that are parallel to the gravity direction.
  • the number of iterations for RANSAC equals N = log(1 − p) / log(1 − wⁿ), where p is the desired success probability, w is the inlier ratio, and n is the number of points sampled per plane hypothesis.
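  • For example (illustrative numbers, not values from the application): with a desired success probability p = 0.99 and an inlier ratio w = 0.5, the one-point case (n = 1) gives log(0.01)/log(0.5) ≈ 6.6, i.e., roughly 7 iterations, while the two-point case (n = 2) gives log(0.01)/log(0.75) ≈ 16 iterations, which is why fixing the normal to the gravity direction makes horizontal-plane detection fast.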
  • planar engine 114 is configured to assume that indoor scenes consist mainly of planes that are orthogonal to one another (e.g., to help further accelerate the detection of wall planes). Once planar engine 114 has detected one main wall plane (e.g., using the two-point RANSAC method), it may assume there are additional planes parallel or orthogonal to the main wall plane. This may enable removal of one degree-of-freedom (DoF) from the problem and allow testing of subsequent plane hypotheses with only one point per sample.
  • Planar engine 114 may mitigate these errors by augmenting map points with a normal direction computed from the CNN depth value. The normal direction for a CNN depth image pixel can be computed from the local depth gradient (e.g., by back-projecting the pixel and its neighbors to 3D and taking the cross product of the neighbor differences).
  • Map point normals are maintained and updated with each CNN depth frame. Map point normals can help significantly reduce false positive inlier points, resulting in more robust plane detection (e.g., for vertical planes).
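  • A minimal sketch, under assumed pinhole intrinsics, of computing per-pixel normals from a dense CNN depth image by back-projecting pixels and crossing neighboring 3D differences; this is an illustrative convention rather than the exact formula of the application.

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Back-project a depth image to camera-frame 3D points, then estimate a normal
    per pixel from the cross product of horizontal and vertical neighbor differences."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([(u - cx) / fx * depth, (v - cy) / fy * depth, depth], axis=-1)

    dx = pts[:, 2:, :] - pts[:, :-2, :]              # horizontal neighbor difference
    dy = pts[2:, :, :] - pts[:-2, :, :]              # vertical neighbor difference
    n = np.cross(dx[1:-1, :, :], dy[:, 1:-1, :])     # (H-2, W-2, 3)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8

    normals = np.zeros_like(pts)
    normals[1:-1, 1:-1, :] = n                       # leave a one-pixel border at zero
    return normals
```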
  • Planar engine 114 is configured to determine a global scale shift in the CNN depth and/or refine CNN depth values using SLAM map points.
  • the map points may be projected onto the camera image to compute an average scale difference between the map points and the CNN depth, s = (1/n) · Σ_i (d_i^SLAM / d_i^CNN), where n is the number of visible map points, d_i^SLAM is the depth from the i-th map point, and d_i^CNN is the corresponding CNN depth.
  • the CNN depth may be multiplied by this scale.
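  • A hedged numpy sketch of this scale-alignment step (the variable names and nearest-pixel lookup are illustrative assumptions): project the visible map points into the frame, compare their SLAM depths with the CNN depths at the same pixels, and multiply the CNN depth map by the average ratio.

```python
import numpy as np

def align_cnn_depth(map_points_cam, cnn_depth, fx, fy, cx, cy):
    """map_points_cam: (n, 3) visible map points already in the camera frame.
    Returns the CNN depth map rescaled by the average SLAM/CNN depth ratio."""
    z_slam = map_points_cam[:, 2]
    u = np.round(map_points_cam[:, 0] / z_slam * fx + cx).astype(int)
    v = np.round(map_points_cam[:, 1] / z_slam * fy + cy).astype(int)

    h, w = cnn_depth.shape
    valid = (z_slam > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    z_cnn = cnn_depth[v[valid], u[valid]]

    # Average scale difference s = (1/n) * sum_i d_i_SLAM / d_i_CNN
    s = np.mean(z_slam[valid] / np.maximum(z_cnn, 1e-6))
    return s * cnn_depth
```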
  • the dense AI model depth map may also be used to determine the 3D plane parameters.
  • the AI model uncertainty values may be employed to guide the sampling process: in regions of higher uncertainty, the depth points are sampled with lower probability, and vice versa.
  • the inlier count of the AI model depth values may be down-weighted, as they are less accurate than the map points determined by the visual-inertial SLAM system 102, which have undergone an optimization process with optimization engine 116.
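  • As an illustrative sketch of uncertainty-guided sampling (an assumption about one reasonable realization, not the disclosed code), pixels can be drawn with probability inversely related to their predicted uncertainty:

```python
import numpy as np

def sample_depth_points(uncertainty, num_samples=200, rng=None):
    """Sample pixel coordinates from a dense uncertainty map, favoring
    low-uncertainty regions (higher confidence -> higher sampling probability)."""
    rng = rng or np.random.default_rng()
    weights = 1.0 / (uncertainty.ravel() + 1e-6)
    probs = weights / weights.sum()
    idx = rng.choice(uncertainty.size, size=num_samples, replace=False, p=probs)
    return np.unravel_index(idx, uncertainty.shape)   # (row indices, column indices)
```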
  • an extra energy term may be built on top of the re-projection error term. For each plane v ∈ R^4, its inlying map points may be found by comparing the distance from each map point to plane v against a threshold value. The resulting plane fitting errors are further minimized in the backend optimization.
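  • A minimal sketch of the plane-fitting energy term described above, assuming a plane is parameterized as v = (n, d) in R^4 with unit normal n; the inlier threshold is an illustrative assumption.

```python
import numpy as np

def plane_fitting_errors(points, plane, inlier_thresh=0.05):
    """points: (N, 3) map points; plane: (4,) = [nx, ny, nz, d] with unit normal.
    Returns the point-to-plane distances of the inlying map points, which serve
    as the extra residuals added to the backend optimization."""
    n, d = plane[:3], plane[3]
    dist = points @ n + d
    return dist[np.abs(dist) < inlier_thresh]
```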
  • bundle adjustment may be implemented under the assumption that the planes in the images (e.g., walls, ground, etc.) may be constrained.
  • the cost for bundle adjustment may correspond with the summation of the re-projection error terms and the plane fitting error terms described above.
  • the planar constraints can be applied in both local and full bundle adjustment processes as guidance for map point optimization.
  • Planar constraints may be enforced by planar engine 114 (e.g., during backend processing) through the factor graph optimization after a set of reliable planes are detected.
  • Optimization engine 116 is configured to combine various data as input to the visual-inertial SLAM system 102.
  • the input may comprise, for example, the depth map D_t and its uncertainty map U_t from a single image I_t (which is the output from the AI model), inertial measurements Y_t, and planar constraints (from planar engine 114).
  • the visual-inertial SLAM system 102 may implement an existing feature-based SLAM that supports RGB-D inertial sensor inputs. However, instead of relying on a depth sensor to determine the depth image or depth map, the predicted depth determined by the AI model may be used as the input to the visual-inertial SLAM system 102.
  • the visual-inertial SLAM system 102 may be augmented with the uncertainty value or uncertainty map determined by the AI model (from AI module 112) to adaptively weight the contribution of feature points. For example, given map points in the world coordinates and the 2D observations x_i^j in frame i, the re-projection errors can be computed as a robustified, covariance-weighted distance between each observation and the projection of the corresponding map point into frame i,
  • where ρ_h is the robust Huber cost function and Σ is the covariance matrix measuring the uncertainty of each detected feature point.
  • the covariance is pre-computed according to the scale at which the feature point is detected; thus all feature points detected from the same scale share the same covariance.
  • using the CNN-learnt uncertainties instead to adaptively weight the re-projection errors is proposed.
  • a simple method can be used to convert the uncertainty value to a covariance metric (e.g., by scaling the covariance according to the predicted uncertainty).
  • Because visual-inertial SLAM system 102 may take a CNN-predicted depth as input and some received data may not be accurate (e.g., the CNN depth), the uncertainty-guided optimization formula provided herein can down-weight the contributions of those map points that potentially have high depth errors.
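  • A hedged sketch of an uncertainty-weighted re-projection cost consistent with the description above; the Huber threshold and the mapping from the learnt uncertainty to a covariance are illustrative assumptions rather than the exact formulas of the application.

```python
import numpy as np

def huber(e, delta=1.345):
    """Robust Huber cost applied to squared-error magnitudes e."""
    a = np.sqrt(e)
    return np.where(a <= delta, e, 2.0 * delta * a - delta ** 2)

def weighted_reprojection_cost(X_w, x_obs, R, t, K, uncertainty):
    """X_w: (N, 3) map points in world coordinates; x_obs: (N, 2) 2D observations in
    the frame; uncertainty: (N,) per-point CNN uncertainty used as an isotropic covariance."""
    X_c = X_w @ R.T + t                         # world -> camera
    proj = X_c[:, :2] / X_c[:, 2:3]             # normalized image coordinates
    x_hat = proj @ K[:2, :2].T + K[:2, 2]       # apply fx, fy, cx, cy
    r = x_obs - x_hat                           # re-projection residuals
    sigma2 = uncertainty ** 2 + 1e-6            # covariance derived from the uncertainty
    e = np.sum(r * r, axis=1) / sigma2          # covariance-weighted squared error
    return np.sum(huber(e))
```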
  • visual-inertial SLAM system 102 may use the depth and left features to generate feature points on a virtual right view.
  • Visual-inertial SLAM system 102 may compute the reprojection errors for the virtual right view so that the overall reprojection cost can correspond with the summation of the reprojection errors over the left view and the virtual right view.
  • User interface simulation engine 118 is configured to provide or otherwise transform the output from the visual-inertial SLAM system 102 to provide to one or more user devices 130 via a user interface.
  • the output of the visual-inertial SLAM system 102 may include map points, camera poses (e.g., in the metric scale), and/or steadily detected 3D planes.
  • the 3D plane output can be used when visual-inertial SLAM system 102 is placing a virtual object.
  • the camera pose output may be used to help place an anchor or static position in a virtual environment where a virtual object may be placed.
  • the dense depth map can also be used for occlusion reasoning when rendering virtual objects and overlaying them digitally onto a real-world environment.
  • autonomous vehicle systems, VR/AR, or robotics applications can utilize output from the visual-inertial SLAM system 102.
  • the user interface simulation may provide the virtual object or image along a plane.
  • the values corresponding with the plane data may be detected from the dense AI model depth values and sparse 3D map points and used to regularize optimization engine 116.
  • FIG. 2 is an illustrative process implemented by the visual-inertial SLAM system, in accordance with the embodiments disclosed herein.
  • Various embodiments present a robust RGB-D inertial SLAM system that combines conventional geometric SLAM and modern AI model-based depth and uncertainty predictions, and further enforces planar constraints to improve performance to generate a visual-inertial SLAM system, as illustrated in FIG. 1.
  • the process illustrated in FIG. 2 includes the entire process implemented internally within the disclosed visual-inertial SLAM system.
  • some embodiments of the visual-inertial SLAM system may implement the AI model and store various data internally to the visual-inertial SLAM system as illustrated in FIG. 1, rather than as separate components as illustrated between the AI model at block 220 and the visual-inertial SLAM system at block 240 in FIG. 2. Both implementations are supported by this disclosure.
  • the image I_t is received from one or more sensors.
  • the model may also be trained to predict the depth map D_t and its uncertainty map U_t from a single image I_t.
  • the image I_t is provided to the trained AI model.
  • the inertial measurements Y_t from the sensors and the output from the AI model, comprising a predicted depth map D_t and/or uncertainty map U_t, are obtained.
  • the image I_t, its predicted depth map D_t, uncertainty map U_t, and inertial measurements Y_t are determined using the AI model (e.g., CNN or other machine learning model or network).
  • planar constraints, determined independently of image I_t, are also determined.
  • the predicted depth map D_t may be provided as input to the visual-inertial SLAM system.
  • the inputs may help build the output generated by the uncertainty-guided RGB-D inertial SLAM system.
  • the visual-inertial SLAM system may generate one or more output values, comprising one or more map points, camera poses, 3D planes, or portions of one or more images. While in use, the planes may be initiated and tracked in the front-end interface corresponding with the user device and planar constraints may be enforced in the visual-inertial SLAM system using, for example, factor graph optimization.
  • the outputs of the system comprise map points and camera poses in the metric scale, and steadily tracked 3D planes.
  • FIG. 3 is an illustrative network structure, in accordance with the embodiments disclosed herein.
  • a network structure is provided for an AI model-based depth and uncertainty prediction.
  • the network can follow an auto-encoder structure with skip connections, and can take one image as input to predict its depth map and uncertainty map.
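  • A compact PyTorch sketch of such an auto-encoder with skip connections that outputs a depth map and an uncertainty map from one image; the layer sizes, activations, and output heads are illustrative assumptions, not the specific network of the figure. For example, `DepthUncertaintyNet()(torch.randn(1, 3, 192, 256))` returns a depth map and an uncertainty map at the input resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthUncertaintyNet(nn.Module):
    """Tiny encoder-decoder with skip connections predicting per-pixel depth
    and uncertainty from a single RGB image (input height/width divisible by 8)."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.dec3 = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.Conv2d(64 + 64, 32, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.Conv2d(32 + 32, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, 2, 3, padding=1)    # channel 0: depth, channel 1: uncertainty

    def forward(self, img):
        e1 = self.enc1(img)                            # 1/2 resolution
        e2 = self.enc2(e1)                             # 1/4 resolution
        e3 = self.enc3(e2)                             # 1/8 resolution
        d3 = F.interpolate(self.dec3(e3), scale_factor=2)
        d2 = F.interpolate(self.dec2(torch.cat([d3, e2], dim=1)), scale_factor=2)  # skip from e2
        d1 = F.interpolate(self.dec1(torch.cat([d2, e1], dim=1)), scale_factor=2)  # skip from e1
        out = self.head(d1)
        depth = F.softplus(out[:, :1])                 # positive depth
        uncertainty = F.softplus(out[:, 1:]) + 1e-3    # positive uncertainty
        return depth, uncertainty
```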
  • FIG. 4 illustrates an example computing component that may be used to implement the visual-inertial SLAM system in accordance with various embodiments.
  • visual-inertial SLAM system 102 illustrated in FIG. 1 may execute one or more instructions to perform the operations described herein.
  • the process may receive sensor data.
  • the sensor data may comprise an image and visual-inertial data.
  • the process may provide the image to a trained artificial intelligence (AI) model.
  • the output of the trained AI model may comprise at least a depth map for one or more portions of the image and an uncertainty map corresponding with the depth map of the one or more portions of the image.
  • the process may provide the visual-inertial data, the depth map, the uncertainty map, and planar constraints to a visual-inertial Simultaneous Localization and Mapping (SLAM) system (e.g., visual-inertial SLAM system 102 illustrated in FIG. 1).
  • the visual-inertial SLAM system may be configured to generate 3D data for a virtual environment.
  • the process may transmit the 3D data to a second user device for enhancing images in the 3D environment.
  • the 3D environment includes an autonomous vehicle system, virtual reality (VR), augmented reality (AR), or a robotics application at the second user device, as discussed throughout the disclosure.
  • FIG. 5 depicts a block diagram of an example computer system 500 in which various of the embodiments described herein may be implemented.
  • the computer system 500 includes a bus 502 or other communication mechanism for communicating information, one or more hardware processors 504 coupled with bus 502 for processing information.
  • Hardware processor(s) 504 may be, for example, one or more general purpose microprocessors.
  • the computer system 500 also includes a main memory 506, such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 502 for storing information and instructions to be executed by processor 504.
  • Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Such instructions, when stored in storage media accessible to processor 504, render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions.
  • the computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504.
  • a storage device 510 such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 502 for storing information and instructions.
  • the computer system 500 may be coupled via bus 502 to a display 512, such as a liquid crystal display (LCD) (or touch screen), for displaying information to a computer user.
  • An input device 514 is coupled to bus 502 for communicating information and command selections to processor 504.
  • Another type of user input device is a cursor control, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 504 and for controlling cursor movement on display 512.
  • the same direction information and command selections as cursor control may be implemented via receiving touches on a touch screen without a cursor.
  • the computing system 500 may include a user interface module to implement a GUI that may be stored in a mass storage device as executable software codes that are executed by the computing device(s).
  • This and other modules may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
  • the words “component,” “engine,” “system,” “database,” “data store,” and the like, as used herein, can refer to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, C or C++.
  • a software component may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software components may be callable from other components or from themselves, and/or may be invoked in response to detected events or interrupts.
  • Software components configured for execution on computing devices may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution).
  • Such software code may be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device.
  • Software instructions may be embedded in firmware, such as an EPROM.
  • hardware components may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors.
  • the computer system 500 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 500 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 500 in response to processor(s) 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another storage medium, such as storage device 510. Execution of the sequences of instructions contained in main memory 506 causes processor(s) 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
  • non-transitory media refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media.
  • Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510.
  • Volatile media includes dynamic memory, such as main memory 506.
  • non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
  • Non-transitory media is distinct from but may be used in conjunction with transmission media.
  • Transmission media participates in transferring information between non-transitory media.
  • transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 502.
  • transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
  • the computer system 500 also includes a communication interface 518 coupled to bus 502.
  • Communication interface 518 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks.
  • communication interface 518 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
  • communication interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or a WAN component to communicate with a WAN).
  • Wireless links may also be implemented.
  • communication interface 518 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • a network link typically provides data communication through one or more networks to other data devices.
  • a network link may provide a connection through local network to a host computer or to data equipment operated by an Internet Service Provider (ISP).
  • the ISP in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet.”
  • Local network and Internet both use electrical, electromagnetic or optical signals that carry digital data streams.
  • the signals through the various networks and the signals on network link and through communication interface 518, which carry the digital data to and from computer system 500, are example forms of transmission media.
  • the computer system 500 can send messages and receive data, including program code, through the network(s), network link and communication interface 518.
  • a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the communication interface 518.
  • the received code may be executed by processor 504 as it is received, and/or stored in storage device 510, or other non-volatile storage for later execution.
  • Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code components executed by one or more computer systems or computer processors comprising computer hardware.
  • the one or more computer systems or computer processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as “software as a service” (SaaS).
  • the processes and algorithms may be implemented partially or wholly in application-specific circuitry.
  • the various features and processes described above may be used independently of one another, or may be combined in various ways. Different combinations and sub-combinations are intended to fall within the scope of this disclosure, and certain method or process blocks may be omitted in some implementations.
  • the terms “circuit” and “component” might describe a given unit of functionality that can be performed in accordance with one or more embodiments of the present application.
  • a component might be implemented utilizing any form of hardware, software, or a combination thereof.
  • processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a component.
  • Various components described herein may be implemented as discrete components or described functions and features can be shared in part or in total among one or more components. In other words, as would be apparent to one of ordinary skill in the art after reading this description, the various features and functionality described herein may be implemented in any given application.
  • the terms “computer program medium” and “computer usable medium” are used to generally refer to transitory or non-transitory media. These and other various forms of computer program media or computer usable media may be involved in carrying one or more sequences of one or more instructions to a processing device for execution. Such instructions embodied on the medium are generally referred to as “computer program code” or a “computer program product” (which may be grouped in the form of computer programs or other groupings). When executed, such instructions might enable a computing component to perform features or functions of the present application as discussed herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Systems and methods are directed to a robust visual-inertial SLAM system that combines artificial intelligence (AI) and planar constraints. For example, a trained AI model may be used to predict the depth map in a physical environment, where the depth is the estimated depth or distance of each pixel from the viewing user device using the AI model. The depth map may be combined with an uncertainty map (e.g., a measurement of how certain or confident the AI model depth prediction is) for each input image (e.g., interchangeably using a confidence score or uncertainty score based on the training and output of the AI model) to create an input map of the environment that is provided to the visual-inertial SLAM system.
PCT/US2022/018460 2021-03-02 2022-03-02 Visual-inertial SLAM system Ceased WO2022183137A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202280017858.1A CN116940969A (zh) 2021-03-02 2022-03-02 Visual-inertial SLAM system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163155474P 2021-03-02 2021-03-02
US63/155,474 2021-03-02

Publications (1)

Publication Number Publication Date
WO2022183137A1 true WO2022183137A1 (fr) 2022-09-01

Family

ID=83048499

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/018460 Ceased WO2022183137A1 (fr) Visual-inertial SLAM system

Country Status (2)

Country Link
CN (1) CN116940969A (fr)
WO (1) WO2022183137A1 (fr)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180005393A1 (en) * 2016-06-30 2018-01-04 Daqri, Llc Deep Image Localization
WO2018182524A1 (fr) * 2017-03-29 2018-10-04 Agency For Science, Technology And Research Localisation robuste en temps réel par odométrie inertielle visuelle

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HOLYNSKI ET AL.: "Reducing drift in structure from motion using extended features", INTERNATIONAL CONFERENCE ON 3D VISION (3DV, 2020, pages 51 - 60, XP033880260, Retrieved from the Internet <URL:https://arxiv.org/pdf/2008.12295.pdf?ref=https://githubhelp.com> [retrieved on 20220528], DOI: 10.1109/3DV50981.2020.00015 *
LEE ET AL.: "Real-time depth estimation using recurrent CNN with sparse depth cues for SLAM system", INTERNATIONAL JOURNAL OF CONTROL, AUTOMATION AND SYSTEMS, 2019, pages 206 - 216, XP036979429, Retrieved from the Internet <URL:https://eowjd0512.github.io/[2019%20fJCAS]%20Reaf-time%20Depth%20Estimation%20using%20Recurrent%20CNN%20with%20Sparse%20Depth%20Cues%20for%20SLAM%20System.pdf> [retrieved on 20220528], DOI: 10.1007/s12555-019-0350-8 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119354189A (zh) * 2024-12-26 2025-01-24 杭州电子科技大学 Geomagnetic vector and INS fusion navigation method based on a multi-dimensional constraint factor graph

Also Published As

Publication number Publication date
CN116940969A (zh) 2023-10-24

Similar Documents

Publication Publication Date Title
US11941831B2 (en) Depth estimation
US11064178B2 (en) Deep virtual stereo odometry
US11049270B2 (en) Method and apparatus for calculating depth map based on reliability
US20190220992A1 (en) Object pose estimating method and apparatus
US20190141247A1 (en) Threshold determination in a ransac algorithm
US20230121534A1 (en) Method and electronic device for 3d object detection using neural networks
US12307608B2 (en) Efficient GPU/CPU pipeline for providing augmented reality content
WO2022187753A1 (fr) SLAM-guided monocular depth refinement system using self-supervised online learning
CN109300151B (zh) Image processing method and apparatus, and electronic device
US12045998B2 (en) Systems and methods for neural implicit scene representation with dense, uncertainty-aware monocular depth constraints
US11082633B2 (en) Method of estimating the speed of displacement of a camera
CN110910437A (zh) Depth prediction method for complex indoor scenes
US10909764B2 (en) Providing augmented reality target images in a web browser
WO2021027543A1 (fr) Monocular image-based model training method and apparatus, and data processing device
EP3185212B1 (fr) Dynamic parameterization of a particle filter
CN120236003A (zh) Three-dimensional modeling method and device
Ge et al. Vipose: Real-time visual-inertial 6d object pose tracking
WO2020092051A1 (fr) Rolling shutter rectification in images/videos using convolutional neural networks with applications to SfM/SLAM with rolling shutter images/videos
WO2024205880A1 (fr) Image generation for neural network training
WO2022183137A1 (fr) Visual-inertial SLAM system
Zhou et al. Resolution-sensitive self-supervised monocular absolute depth estimation
Yuan et al. Self-Supervised Monocular Depth Estimation with Depth-Motion Prior for pseudo-LiDAR
CN112116661B (zh) High-precision map construction method and device
Kostusiak et al. Enhancing Visual Odometry with Estimated Scene Depth: Leveraging RGB-D Data with Deep Learning.
Porzi et al. An automatic image-to-DEM alignment approach for annotating mountains pictures on a smartphone

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22760593

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202280017858.1

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22760593

Country of ref document: EP

Kind code of ref document: A1