
CN116630392A - Visual SLAM method for coupling multi-target tracking - Google Patents

Visual SLAM method for coupling multi-target tracking

Info

Publication number
CN116630392A
Authority
CN
China
Prior art keywords
target
pose
camera
information
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310262036.6A
Other languages
Chinese (zh)
Inventor
陈光柱
苟荣松
蒲鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Technology
Original Assignee
Chengdu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Technology
Priority to CN202310262036.6A
Publication of CN116630392A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/579 Depth or shape recovery from multiple images from motion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention proposes a visual SLAM method coupled with multi-target tracking. The method is divided into a coupled visual-odometry front end and a graph-optimization back end. First, an extended Kalman filter (EKF) tracks each target on the basis of the target detection information; then, the current features are classified according to the target motion information obtained by the EKF: static features are used to obtain the camera pose, and dynamic features are used to obtain the 6-DOF pose of each target; finally, a multivariate factor-graph optimization model is established to jointly optimize the camera poses, target poses, static 3D points, and dynamic 3D points on the moving targets. The invention addresses the low positioning accuracy and insufficient target information of SLAM systems in dynamic environments and studies the role that target information in the scene plays during the robot's localization and mapping. It enables a mobile device to obtain accurate camera pose information and target pose information in a dynamic environment and to track the targets in the scene.

Description

A visual SLAM method coupled with multi-target tracking

Technical Field

The invention belongs to the fields of computer vision and SLAM, and in particular relates to a visual SLAM method coupled with multi-target tracking.

Background Art

The development of artificial intelligence has greatly promoted the application of virtual-reality technologies in many fields, such as autonomous navigation of mobile robots, human-robot collaboration, and industrial scene reconstruction. Simultaneous localization and mapping (SLAM) enables a robot or sensor to perceive the scene while accurately localizing itself. Although SLAM has developed rapidly, it still has relatively few real-world applications. The main reason is that mainstream SLAM methods rely on the assumption that the rigid objects in the scene remain globally static, which is a very strict condition. Real scenes contain many moving objects, making them dynamic environments, so traditional SLAM methods are no longer applicable. In recent years, several excellent methods have focused on the problems SLAM faces in dynamic scenes. Most of these methods use prior information (such as semantic information or motion-structure information of the scene) to reject the moving objects as outliers during localization and mapping. The main goal of these methods is to greatly improve pose accuracy by discarding the dynamic objects; as a result, they lose the information carried by the dynamic objects in the scene.

In view of the shortcomings of the above research, a visual SLAM method coupled with multi-target tracking is proposed.

Summary of the Invention

In view of the above problems, the object of the present invention is to provide a visual SLAM method coupled with multi-target tracking.

A visual SLAM method coupled with multi-target tracking comprises the following steps:

Step 1: train the instance-segmentation network Yolact on the COCO dataset;

Step 2: design a two-dimensional target tracker to track the moving targets in the scene;

Step 3: extract features from the current image and classify them using the tracked target information into dynamic target features and static background features; the static features are used to obtain the camera pose;

Step 4: use the target features to obtain the pose of each target based on the minimum-reprojection-error principle;

Step 5: establish a multivariate factor-graph model to jointly optimize the camera poses, target poses, static 3D points, and dynamic 3D points on the targets.

Compared with the prior art, the present invention has the following beneficial effects:

1. Compared with existing SLAM methods, the camera pose estimation accuracy in dynamic environments is greatly improved;

2. The motion information of the targets in the scene can be obtained in real time.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the COCO dataset;

Figure 2 shows the instance-segmentation result and the feature classification;

Figure 3 shows the transformation relationships of the camera over consecutive frames;

Figure 4 shows the structure of the back-end multivariate factor-graph model;

Figure 5 shows the obtained target motion information and the camera trajectory.

Detailed Description of Embodiments

The technical solution of the present invention is described in detail below with reference to the accompanying drawings.

A visual SLAM method coupled with multi-target tracking specifically comprises the following steps:

Step 1: train the instance-segmentation network Yolact on the COCO dataset

Step 1.1: to obtain the instance-mask and detection-box information of the moving targets in the scene, Yolact is trained on the COCO instance-segmentation dataset; people and robots in the dataset are regarded as dynamic targets in the scene, and sample images from the dataset are shown in Figure 1;

Step 1.2: send the instance-mask and detection-box information output by Yolact to the SLAM side through ROS.

Step 2: design a two-dimensional target tracker to track the moving targets in the scene

Step 2.1: use the target detection information output by Yolact;

Step 2.2: estimate the target state. The state to be estimated is defined as $\hat{X}_i^j = [\hat{P}_i^j, \hat{V}_i^j]^T$, where $\hat{P}_i^j$ is the three-dimensional position of the center of the target mask and $\hat{V}_i^j$ is the velocity of the target along the three axes. The state estimation of the whole target can be expressed as

$\hat{X}_i^j = F X_{i-1}^j + w_{i-1}, \qquad \hat{\Sigma}_i^j = F\,\Sigma_{i-1}^j\,F^T + Q, \qquad Q = A\,\Sigma_v\,A^T, \qquad F = \begin{bmatrix} I_{3\times3} & \Delta t\, I_{3\times3} \\ 0 & I_{3\times3} \end{bmatrix},$

where $\hat{X}_i^j$ is the target state to be estimated, $F$ is the state-transition matrix, $w_{i-1}$ is the process noise of the state transition, $\hat{\Sigma}_i^j$ is the uncertainty of the state estimate, $Q$ is the covariance matrix of the process noise, $A$ is the distribution matrix of the process noise, $\Delta t$ is the observation interval, and $\Sigma_v$ is the covariance matrix of the target velocity model;

Step 2.3: update the target state. The update fuses the predicted state $\hat{X}_i^j$ with the measurement $Z_i^j$ through the Kalman gain coefficient $K_i$:

$X_i^j = \hat{X}_i^j + K_i\big(Z_i^j - h(\hat{X}_i^j)\big),$

where $H_i$ denotes the Jacobian matrix of the observation function; the observation function $h(\cdot) = [f_\alpha, f_\beta, f_\gamma, 1]$ realizes the transformation of the target from the estimated state to the measured state, and $[x_o, y_o, z_o]$ are the coordinates of the camera center;

Step 2.4: update the Kalman gain coefficient and the uncertainty:

$K_i = \hat{\Sigma}_i^j H_i^T \big( H_i \hat{\Sigma}_i^j H_i^T + R_i \big)^{-1}, \qquad \Sigma_i^j = (I - K_i H_i)\,\hat{\Sigma}_i^j,$

where $R_i$ is the covariance matrix of the observation noise.
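The following is a minimal sketch of the per-target EKF of Steps 2.2 to 2.4, written in Python with NumPy for illustration. It assumes a constant-velocity model and, for brevity, a simplified linear observation of the mask-center position only (the patent's observation function h(·) additionally involves the Euler angles of the target motion and the camera center); the class name, the noise values, and the frame rate are hypothetical.

```python
import numpy as np

class TargetEKF:
    """Sketch of the per-target extended Kalman filter (Steps 2.2-2.4).
    State x = [px, py, pz, vx, vy, vz]; the observation is the 3D mask center."""

    def __init__(self, p0, dt, sigma_v=0.5, sigma_z=0.05):
        self.x = np.hstack([p0, np.zeros(3)])                # initial state
        self.P = np.eye(6)                                   # state uncertainty
        self.F = np.eye(6)                                   # state-transition matrix
        self.F[:3, 3:] = dt * np.eye(3)
        A = np.vstack([0.5 * dt**2 * np.eye(3), dt * np.eye(3)])  # noise distribution matrix
        self.Q = A @ (sigma_v**2 * np.eye(3)) @ A.T          # process-noise covariance
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])    # Jacobian of the (simplified) observation
        self.R = sigma_z**2 * np.eye(3)                      # observation-noise covariance

    def predict(self):
        self.x = self.F @ self.x                             # state prediction
        self.P = self.F @ self.P @ self.F.T + self.Q         # uncertainty prediction
        return self.x

    def update(self, z):
        y = z - self.H @ self.x                              # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P           # uncertainty update
        return self.x

# Usage: one filter per tracked target, fed with the mask-center measurement
ekf = TargetEKF(p0=np.array([1.0, 0.2, 3.5]), dt=1 / 30)
ekf.predict()
state = ekf.update(np.array([1.02, 0.21, 3.48]))
```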

Step 3: extract features from the current image with the FAST method, classify them using the tracked target information, and use the static features to obtain the camera pose

Step 3.1: extract FAST features from the current image to obtain the feature set;

Step 3.2: using the target state obtained in Step 2, a first velocity estimate $V_{i,1}^j$ of the target is obtained from the velocity component of the EKF state. In addition, a second velocity estimate $V_{i,2}^j$ is obtained from the 6-DOF pose of the same target in the previous frame, as the displacement of the target pose over the observation interval. The final target velocity is obtained by the weighted fusion

$V_i^j = \alpha\, V_{i,1}^j + \beta\, V_{i,2}^j,$

where $\alpha$ and $\beta$ are set to 0.4 and 0.6 in the experiments. Finally, if $V_i^j \ge l$, the features on this target are regarded as dynamic features;

Step 3.3: remove the dynamic features from the feature set and solve the camera pose from the remaining static features with PnP.
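As an illustration of Steps 3.2 and 3.3, the sketch below fuses the two speed estimates, thresholds them to separate dynamic from static features, and recovers the camera pose from the static correspondences with OpenCV's PnP solver. The weights follow the values stated above (α = 0.4, β = 0.6), while the threshold value, the function names, and the data layout are assumptions.

```python
import numpy as np
import cv2

ALPHA, BETA, SPEED_THRESHOLD = 0.4, 0.6, 0.1   # threshold l: assumed value, in m/s

def fused_speed(ekf_velocity, prev_pose_t, curr_pose_t, dt):
    """Weighted fusion of the EKF speed and the pose-displacement speed (Step 3.2)."""
    v1 = np.linalg.norm(ekf_velocity)
    v2 = np.linalg.norm(curr_pose_t - prev_pose_t) / dt
    return ALPHA * v1 + BETA * v2

def camera_pose_from_static(points_3d, points_2d, dynamic_mask, K):
    """Solve the camera pose with PnP using only static correspondences (Step 3.3).
    dynamic_mask is a boolean array marking features that lie on fast-moving targets."""
    static = ~dynamic_mask
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d[static].astype(np.float64),
        points_2d[static].astype(np.float64),
        K, None)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)                  # rotation matrix of the camera pose
    return R, tvec
```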

Step 4: use the target features to obtain the pose of each target based on the minimum-reprojection-error principle

Step 4.1: establish the motion model of the camera and the targets as shown in Figure 3;

Step 4.2: establish the nonlinear transformation model of the 3D points and express the motion of the target between image frames:

${}^{w}P_i^{j,n} = {}_{i-1}^{\;i}H^{j}\,{}^{w}P_{i-1}^{j,n}, \qquad {}_{i-1}^{\;i}H^{j} = T_i^{j}\,\big(T_{i-1}^{j}\big)^{-1},$

where ${}^{w}P_i^{j,n}$ denotes a 3D point on the target, $T_{i-1}^{j}$ and $T_i^{j}$ denote the target pose at frames $i-1$ and $i$, and ${}_{i-1}^{\;i}H^{j}$ is the pose transformation between the two adjacent frames;

Step 4.3: according to the camera projection model, establish the reprojection error of the 3D points lying on the target:

$e_i^{j,n} = p_i^{j,n} - \pi\!\big(X_i,\; {}_{i-1}^{\;i}H^{j}\,{}^{w}P_{i-1}^{j,n}\big),$

where $p_i^{j,n}$ is the pixel corresponding to the matched feature point, $\pi(\cdot)$ is the projection function, and $X_i$ is the camera pose of frame $i$. For all 3D points on the target, the minimum reprojection error can be written as

${}_{i-1}^{\;i}H^{j*} = \arg\min_{H} \sum_{n=1}^{n_d} \big\| p_i^{j,n} - \pi\!\big(X_i,\; H\,{}^{w}P_{i-1}^{j,n}\big) \big\|^{2},$

where $n_d$ is the total number of features extracted from the target and ${}_{i-1}^{\;i}H^{j*}$ is the pose transformation to be obtained;

Step 4.4: map the above minimum-reprojection-error model into the Lie-algebra space:

$\xi^{*} = \arg\min_{\xi} \frac{1}{2} \sum_{n=1}^{n_d} \big\| p_i^{j,n} - \pi\!\big(X_i,\; \exp(\xi^{\wedge})\,{}^{w}P_{i-1}^{j,n}\big) \big\|^{2}, \qquad \xi = [\rho, \phi]^{T},$

where $\exp(\xi^{\wedge})$ is the exponential map of $\xi$ in the Lie-algebra space, $\rho$ is the translation vector of the target, and $\phi$ is the rotation vector of the target. Finally, the above expression is solved with the Gauss-Newton method.
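A sketch of the Gauss-Newton solution of Step 4.4 is given below. It parameterizes the object motion by ξ = [ρ, φ], uses Rodrigues' formula for the rotation and treats the translation part of the exponential map directly as ρ, and computes the Jacobian numerically for brevity; an analytic Jacobian and a proper SE(3) exponential would be used in a real implementation. All function and variable names are illustrative.

```python
import numpy as np
import cv2

def se3_exp(xi):
    """Approximate exponential map of xi = [rho, phi]: rotation via Rodrigues,
    translation taken directly as rho (a simplification for this sketch)."""
    rho, phi = xi[:3], xi[3:]
    R, _ = cv2.Rodrigues(phi)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, rho
    return T

def project(K, X_cam, P_w):
    """Project homogeneous world points P_w (Nx4) through the camera pose X_cam (4x4, world-to-camera)."""
    Pc = (X_cam @ P_w.T).T[:, :3]
    uv = (K @ Pc.T).T
    return uv[:, :2] / uv[:, 2:3]

def estimate_object_motion(K, X_cam, P_prev, p_curr, iters=10):
    """Gauss-Newton on the frame-to-frame object motion H (Step 4.4), numerical Jacobian."""
    xi = np.zeros(6)
    for _ in range(iters):
        H = se3_exp(xi)
        r = (p_curr - project(K, X_cam, (H @ P_prev.T).T)).ravel()   # stacked residuals
        J = np.zeros((r.size, 6))
        eps = 1e-6
        for k in range(6):                                            # numerical Jacobian column by column
            d = np.zeros(6); d[k] = eps
            rd = (p_curr - project(K, X_cam, (se3_exp(xi + d) @ P_prev.T).T)).ravel()
            J[:, k] = (rd - r) / eps
        dx = np.linalg.solve(J.T @ J + 1e-9 * np.eye(6), -J.T @ r)   # Gauss-Newton step
        xi = xi + dx
        if np.linalg.norm(dx) < 1e-8:
            break
    return se3_exp(xi)
```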

Step 5: establish the graph-optimization model and jointly optimize the camera poses, target poses, static 3D points, and dynamic 3D points on the targets

Step 5.1: establish the factor-graph model shown in Figure 4; the factor graph contains several kinds of variables to be optimized: target 3D points, static 3D points, camera pose information, and target pose information;

Step 5.2: establish the error terms corresponding to the factor-graph nodes, namely the projection error of the static 3D points, the projection error of the dynamic 3D points, and the motion error of the target, where $\omega$ denotes the parameter weighting the influence of the camera pose on the target pose; the measurement error of the camera pose follows the existing method;

Step 5.3: the above error models are combined into a joint cost over the set of variables to be optimized, and the Levenberg-Marquardt method is finally used to solve it.
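For illustration, the sketch below stacks the static and dynamic reprojection residuals of a single camera pose and a single target motion into one least-squares problem and hands it to SciPy's `least_squares` with a Huber loss, reusing the `se3_exp` and `project` helpers from the previous sketch. This is only a simplified stand-in for the multivariate factor graph: the target motion-constraint error, the camera measurement error taken from ORB_SLAM2, and the weight ω are omitted, and the variable-packing scheme is an assumption.

```python
import numpy as np
from scipy.optimize import least_squares

def pack(cam_xi, obj_xi, pts_s, pts_d):
    """Pack camera pose, object motion (each a 6-vector) and 3D points into one parameter vector chi."""
    return np.hstack([cam_xi, obj_xi, pts_s.ravel(), pts_d.ravel()])

def residuals(chi, K, obs_static, obs_dynamic, n_s, n_d):
    """Stacked reprojection residuals of static points and of dynamic points moved by the object motion."""
    cam_xi, obj_xi = chi[:6], chi[6:12]
    pts_s = chi[12:12 + 3 * n_s].reshape(n_s, 3)
    pts_d = chi[12 + 3 * n_s:].reshape(n_d, 3)
    X_cam, H_obj = se3_exp(cam_xi), se3_exp(obj_xi)
    hom = lambda P: np.hstack([P, np.ones((len(P), 1))])
    r_s = (obs_static - project(K, X_cam, hom(pts_s))).ravel()     # static reprojection error
    moved = (H_obj @ hom(pts_d).T).T                               # dynamic points after the object motion
    r_d = (obs_dynamic - project(K, X_cam, moved)).ravel()         # dynamic reprojection error
    return np.hstack([r_s, r_d])

# chi0 = pack(cam_xi0, obj_xi0, pts_s0, pts_d0)
# result = least_squares(residuals, chi0, loss="huber",
#                        args=(K, obs_s, obs_d, n_s, n_d))
# Note: method='lm' would mirror the patent's Levenberg-Marquardt choice,
# but SciPy's 'lm' does not support robust losses, so the default TRF solver is used here.
```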

The complete workflow of the entire system is described as follows:

Step 1: train the instance-segmentation network Yolact on the COCO dataset;

Step 2: design a two-dimensional target tracker to track the moving targets in the scene;

Step 3: extract features from the current image and classify them using the tracked target information into dynamic target features and static background features; the static features are used to obtain the camera pose;

Step 4: use the target features to obtain the pose of each target based on the minimum-reprojection-error principle;

Step 5: establish the graph-optimization model and jointly optimize the camera poses, target poses, static 3D points, and dynamic 3D points on the targets;

Step 6: obtain the target trajectories and the camera trajectory as shown in Figure 5.

Claims (1)

1. A visual SLAM method coupled with multi-target tracking, characterized by comprising the following steps:

Step 1: build the framework of the visual SLAM method coupled with multi-target tracking; the framework comprises a coupled-odometry front end and a graph-optimization back end, the coupled-odometry front end contains a two-dimensional target tracker, target pose estimation and camera pose estimation, and the graph-optimization back end designs a multivariate factor graph to optimize the target pose information and the camera pose information produced by the front end;

Step 2: design the two-dimensional target tracker; based on the detection information of the targets in the scene, an extended Kalman filter (EKF) is used to estimate the motion state of each target; at time $i$, the predicted motion state of target $j$ is expressed as

$\hat{X}_i^j = F X_{i-1}^j + w_{i-1},$

where $w_{i-1}$ is the process noise of the state transition, $X_i^j = [P_i^j, V_i^j]^T$ is the motion state of target $j$ to be tracked, $P_i^j$ is the three-dimensional position of the mask of target $j$, $V_i^j$ is the velocity of target $j$ along the $x$, $y$ and $z$ axes, $F$ is the state-transition matrix built from identity blocks and $\Delta t$, and $\Delta t$ is the interval at which the target motion state is estimated; the predicted uncertainty of the state estimate is expressed as

$\hat{\Sigma}_i^j = F\,\Sigma_{i-1}^j\,F^T + Q, \qquad Q = A\,\Sigma_v\,A^T,$

where $\Sigma_{i-1}^j$ is the uncertainty of target $j$ at time $i-1$, $Q$ is the noise term of the uncertainty estimation, $A$ is the distribution matrix of the process noise, and $\Sigma_v$ is the covariance matrix of the velocity model (the motion speeds of the different types of targets in the workshop); next, the motion state of target $j$ is updated by

$X_i^j = \hat{X}_i^j + K_i\big(Z_i^j - h(\hat{X}_i^j)\big), \qquad K_i = \hat{\Sigma}_i^j H_i^T\big(H_i \hat{\Sigma}_i^j H_i^T + R_i\big)^{-1}, \qquad \Sigma_i^j = (I - K_i H_i)\,\hat{\Sigma}_i^j,$

where $K_i$ is the Kalman gain coefficient at time $i$, $R_i$ is the covariance matrix of the observation noise whose diagonal entries are the variances of the three-dimensional coordinates of the mask center of target $j$, $Z_i^j$ is the observation of target $j$, i.e. the three-dimensional coordinates of its mask center, and $H_i$ is the Jacobian matrix of the observation function $h(\cdot) = [f_\alpha, f_\beta, f_\gamma, 1]$, in which $\alpha$, $\beta$, $\gamma$ are the Euler angles of the motion of target $j$; the observation function $h(\cdot)$ maps the observation of target $j$ to the motion-state estimate, and $[x_o, y_o, z_o]$ are the world coordinates of the camera center;

Step 3: extract features from the current image with the FAST method to obtain a feature set, which contains target features located on dynamic targets and static features belonging to the static background;

Step 4: target motion state estimation; the linear speed $V_i^j$ of target $j$ is obtained from the velocity component of its tracked motion state; setting $l$ as the speed threshold of moving targets in the scene, if $V_i^j > l$, target $j$ is regarded as a dynamic target in the scene and the features extracted from target $j$ are target features, and if $V_i^j < l$, target $j$ is regarded as static and the features extracted from target $j$ are static features;

Step 5: camera pose estimation; using the target mask information output by Yolact, the positions of all static features in the image at time $i$ are marked, and the static features are used with the PnP method to obtain the initial camera pose $X_i$;

Step 6: target pose estimation; for a 3D point on target $j$ at time $i$, its pixel value $p_i^{j,n}$ and its coordinates ${}^{o}P^{j,n}$ in the target's own reference frame can be obtained from the camera projection model, so that its reprojection error over consecutive frames is expressed as

$\zeta_i = p_i^{j,n} - \pi\!\big(X_i,\; {}_{i-1}^{\;i}H^{j}\,T_{i-1}^{j}\,{}^{o}P^{j,n}\big),$

where $X_i$ is the camera pose at frame $i$, $\pi(\cdot)$ is the projection function (camera intrinsic parameters), $T_{i-1}^{j}$ is the target pose information at time $i-1$, and ${}_{i-1}^{\;i}H^{j}$ is the SE(3) pose transformation of this 3D point over the two consecutive frames; then the reprojection error over all 3D points on target $j$ is expressed as

$e\big({}_{i-1}^{\;i}H^{j}\big) = \sum_{n=1}^{n_d} \big\| p_i^{j,n} - \pi\!\big(X_i,\; {}_{i-1}^{\;i}H^{j}\,T_{i-1}^{j}\,{}^{o}P^{j,n}\big) \big\|^{2},$

where ${}_{i-1}^{\;i}H^{j}$ is the pose transformation of the target between arbitrary images and $n_d$ is the total number of 3D points on target $j$; finally, the above expression is solved in the Lie-algebra space to obtain the pose of the target;

Step 7: establish the multivariate factor-graph model; the 3D projections belonging to the static features, the 3D projections belonging to the target features, the pose information of the dynamic targets and the pose information of the camera are used as the nodes of the multivariate factor graph;

Step 8: obtain the camera measurement error $e_i$; the camera measurement error of ORB_SLAM2 is adopted as the camera measurement error here;

Step 9: obtain the target-feature projection error; according to the pose information of target $j$ described in Step 6, the target-feature projection error of target $j$ is established;

Step 10: obtain the projection error of the static features; according to the camera pose information described in Step 5, the projection error of a static feature is expressed as

$e_i^{s,n} = p_i^{s,n} - \psi\!\big(X_i,\; P^{s,n}\big),$

where $\psi(\cdot)$ is the projection function, $X_i$ is the camera pose information, $P^{s,n}$ is the static 3D point and $p_i^{s,n}$ is its corresponding two-dimensional pixel coordinate;

Step 11: obtain the motion-constraint error of target $j$ between consecutive image frames; according to the target motion state described in Step 4, the motion error of the target between consecutive image frames is established;

Step 12: establish the joint error-constraint model; the above constraint models constitute a factor graph, with each constraint model as a node of the factor graph; the factor-graph optimization problem is the minimization, over the set of parameters to be optimized in the whole factor graph, of the sum of the above error terms weighted by Huber loss functions $l_i$ and the corresponding covariance matrices, where $M_c$ is the set of image frames with a covisibility relationship, $M_o$ is the total number of targets tracked in the current frame, $M_s$ is the static feature set extracted from the background, $M_d$ is the dynamic feature set extracted from the targets, and $\Sigma_{\Delta t}$ is the covariance matrix of the observation frequency; finally, the Levenberg-Marquardt method is used to solve it and obtain the optimal parameters.
CN202310262036.6A 2023-03-17 2023-03-17 Visual SLAM method for coupling multi-target tracking Pending CN116630392A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310262036.6A CN116630392A (en) 2023-03-17 2023-03-17 Visual SLAM method for coupling multi-target tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310262036.6A CN116630392A (en) 2023-03-17 2023-03-17 Visual SLAM method for coupling multi-target tracking

Publications (1)

Publication Number Publication Date
CN116630392A true CN116630392A (en) 2023-08-22

Family

ID=87640559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310262036.6A Pending CN116630392A (en) 2023-03-17 2023-03-17 Visual SLAM method for coupling multi-target tracking

Country Status (1)

Country Link
CN (1) CN116630392A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180161986A1 (en) * 2016-12-12 2018-06-14 The Charles Stark Draper Laboratory, Inc. System and method for semantic simultaneous localization and mapping of static and dynamic objects
CN115619824A (en) * 2022-09-29 2023-01-17 哈尔滨工业大学 A visual-inertial dynamic target tracking SLAM device, method, computer and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
RONGSONG GOU et al.: "A Visual SLAM With Tightly Coupled Integration of Multiobject Tracking for Production Workshop", IEEE Internet of Things Journal, vol. 11, no. 11, 22 February 2024 (2024-02-22), pages 19949-19962 *
朱文宇: "Design and Implementation of 3D Laser SLAM and Relocalization for an Autonomous Power-System Inspection Robot", CNKI China Master's Theses Full-text Database (Basic Sciences), no. 02, 15 February 2023 (2023-02-15), pages 005-652 *
苟荣松: "Visual SLAM Technology Coupled with Multi-Target Tracking for Production Workshops", CNKI China Master's Theses Full-text Database (Engineering Science and Technology II), no. 06, 15 June 2025 (2025-06-15), pages 028-613 *

Similar Documents

Publication Publication Date Title
CN111325843B (en) Real-time semantic map construction method based on semantic inverse depth filtering
CN110349250B (en) RGBD camera-based three-dimensional reconstruction method for indoor dynamic scene
Tian et al. Research on multi-sensor fusion SLAM algorithm based on improved gmapping
CN111288989B (en) Visual positioning method for small unmanned aerial vehicle
CN111899280B (en) Monocular Visual Odometry Method Using Deep Learning and Hybrid Pose Estimation
Jia et al. A Survey of simultaneous localization and mapping for robot
CN115355901B (en) Multi-machine joint mapping method integrating dynamic target perception
CN115420276B (en) A multi-robot collaborative localization and mapping method for outdoor scenes
CN107392964A (en) The indoor SLAM methods combined based on indoor characteristic point and structure lines
CN112525197B (en) Fusion Pose Estimation Method for Ultra-Broadband Inertial Navigation Based on Graph Optimization Algorithm
CN110533716A (en) A Semantic SLAM System and Method Based on 3D Constraints
CN106780484A (en) Robot interframe position and orientation estimation method based on convolutional neural networks Feature Descriptor
CN110032965A (en) Vision positioning method based on remote sensing images
CN116878501A (en) A high-precision positioning and mapping system and method based on multi-sensor fusion
CN115218889A (en) Multi-sensor indoor positioning method based on dotted line feature fusion
CN108053445A (en) The RGB-D camera motion methods of estimation of Fusion Features
CN112731503A (en) Pose estimation method and system based on front-end tight coupling
CN116045965A (en) Multi-sensor-integrated environment map construction method
CN115482282A (en) Dynamic SLAM method with multi-target tracking capability in autonomous driving scenarios
CN116310128A (en) Monocular Multi-Object SLAM Method Based on Instance Segmentation and 3D Reconstruction in Dynamic Environment
CN118225096A (en) Multi-sensor SLAM method based on dynamic feature point elimination and loop detection
CN118746293A (en) High-precision positioning method based on multi-sensor fusion SLAM
CN112432653B (en) Monocular visual inertial odometry method based on point and line features
CN117036408A (en) Object SLAM method combining multi-target tracking under dynamic environment
CN115930943B (en) SLAM method and system for fusing monocular vision and IMU based on graph optimization and EKF framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination