CN119055358A

CN119055358A - A surgical operation force feedback guidance method based on virtual marker tracking and instrument posture

Info

Publication number: CN119055358A
Application number: CN202411580088.9A
Authority: CN
Inventors: 莫杭杰; 方进; 杨善林; 李霄剑; 李玲; 肖夕林; 王昕�
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2024-11-07
Filing date: 2024-11-07
Publication date: 2024-12-03
Anticipated expiration: 2044-11-07
Also published as: CN119055358B

Abstract

The invention provides a surgical operation force feedback guiding method, a system, a storage medium and electronic equipment based on virtual mark tracking and instrument pose, and relates to the field of medical instruments. According to the invention, a real-time three-dimensional point cloud reconstruction is carried out on an in-vivo scene, a virtual mark dynamic tracking method based on key frame guidance is used for tracking a surgical path marked on a first image frame by a doctor, a three-dimensional surgical path is obtained based on the real-time three-dimensional point cloud, a 6D pose method of a surgical instrument based on 3D Gaussian tracking is combined, and finally a guiding type virtual clamp along the dynamic three-dimensional surgical path is established, so that the doctor can be effectively assisted in completing the operation along a preset surgical path through a force feedback mechanism, and man-machine co-fusion operation conforming to the operation intention of the doctor is realized.

Description

Surgical operation force feedback guiding method based on virtual mark tracking and instrument pose

Technical Field

The invention relates to the field of medical instruments, in particular to a surgical operation force feedback guiding method, a system, a storage medium and electronic equipment based on virtual mark tracking and instrument pose.

Background

Robotic surgery has gained widespread acceptance by surgeons as a minimally invasive surgical technique due to its high degree of precision. Among other things, force haptic feedback is a major challenge facing current surgical robots.

In the related art, patent CN202211328226.5 discloses a force feedback method and device for a surgical robot main manipulator, which comprises the steps of obtaining first force feedback data and scene data of the surgical robot main manipulator, generating a virtual protection domain corresponding to the surgical robot main manipulator based on the scene data, generating target force feedback data by using an impedance control algorithm based on the first force feedback data and the virtual protection domain, and sending a first instruction to a force feedback device controller, wherein the first instruction is used for instructing the force feedback device controller to control the surgical robot main manipulator based on the target force feedback data.

However, in the original scheme, aiming at the establishment of a tension field in a virtual protection domain, a marking path needs to be planned in advance, and the marking path is difficult to be accurately and real-time confirmed due to large intra-operative environmental change.

Disclosure of Invention

(One) solving the technical problems

Aiming at the defects of the prior art, the invention provides a surgical operation force feedback guiding method, a system, a storage medium and electronic equipment based on virtual mark tracking and instrument pose, which solve the technical problems that a mark path needs to be planned in advance for the establishment of a pull field in a virtual protection domain, and the mark path is difficult to be accurately and real-timely confirmed due to large intra-operative environmental change.

(II) technical scheme

In order to achieve the above purpose, the invention is realized by the following technical scheme:

a surgical operation force feedback guiding method based on virtual mark tracking and instrument pose, comprising:

acquiring a three-dimensional point cloud of an in-vivo scene based on depth estimation;

The method for estimating the 6D pose of the instrument based on the 3D Gaussian sputtering obtains the optimal 6D pose of the instrument of the current frame image so as to estimate the position of the tail end of the surgical instrument;

Tracking a surgical path marked on a first image frame by a doctor based on a virtual mark dynamic tracking method guided by a key frame to acquire a tracking path of a current frame image, wherein the tracking path is represented by a two-dimensional target point set;

Acquiring a three-dimensional operation path based on the three-dimensional point cloud and the two-dimensional target point set;

Searching a point cloud position closest to the tail end position of the surgical instrument on the three-dimensional surgical path, introducing an additional investigation radius, acquiring a manual vector of the tail end position of the surgical instrument, and establishing a guiding type virtual force field along the three-dimensional surgical path based on the manual vector so as to guide a doctor to operate the surgical robot system through a force feedback mechanism.

Preferably, the method for estimating the 6D pose of the instrument based on 3D Gaussian sputtering acquires the optimal 6D pose of the instrument of the current frame image so as to estimate the position of the tail end of the surgical instrument, and comprises the following steps:

The method comprises the steps of reading a current frame image, obtaining an instrument mask through a semantic segmentation algorithm, carrying out depth estimation on the current frame image by utilizing a binocular depth estimation algorithm, and respectively obtaining a tissue depth map and an instrument depth map by combining the instrument mask;

initializing the pose of the instrument, and performing 3D Gaussian sputtering by using a pre-reconstructed instrument Gaussian model to obtain a view, a projection mask and a depth map under the corresponding pose;

The method comprises the steps of constructing a total loss function according to a rendering result, a current frame image, an instrument mask and an instrument depth map thereof, updating and learning the instrument pose through a back propagation algorithm, minimizing the total loss function, and obtaining the optimal 6D pose of the instrument of the current frame image;

And determining the position of the instrument tail end corresponding to the Gaussian kernel based on the optimal 6D pose of the instrument of the current frame image, and taking the position as the position of the surgical instrument tail end.

Preferably, the initializing the instrument pose comprises:

For the first frame image, combining the camera model to obtain a rendering view of the instrument Gaussian model on an image plane, enabling the image to coincide with the area where the rendering view is located by moving the cavity mirror, and taking the instrument pose at the moment as an initial instrument pose of the first frame image;

for images other than the first frame, the instrument pose of the current frame image is initialized based on the optimal instrument pose of the previous frame image and optical flow information of two continuous frames.

Preferably, the reconstruction process of the Gaussian model of the instrument comprises the following steps:

Acquiring images of a plurality of known camera poses and including all view angles of the instrument so as to reconstruct sparse point clouds of the instrument;

initializing each point of the sparse point cloud into a Gaussian, and acquiring an initialized instrument Gaussian model;

And carrying out iterative optimization on the initialized instrument Gaussian model by combining the image sequences which are formed by the images with the known camera poses and comprise all view angles of the instrument and the corresponding camera poses, and obtaining the reconstructed instrument Gaussian model.

Preferably, the iterative optimization of the initialized instrument gaussian model to obtain the reconstructed instrument gaussian model by combining the image sequence composed of the images of the known camera poses and including all view angles of the instrument and the corresponding camera poses comprises:

The color consistency loss function between constructing the 3D gaussian rendering view and the original image retaining only the instrument portion is as follows:

Wherein, Respectively representing the center position, rotation and size of any Gaussian; L is the number of images included in the image sequence, L ₁ represents a norm loss; Representing the process of 3D Gaussian micro-rendering an image when the pose of the camera is R _l, G _i represents the ith Gaussian point, I is the number of gauss; representing an original image derived from the sequence of images leaving only the instrument portion;

Introducing a multi-region alignment loss function, wherein the projection shape of each hinged part on the reconstructed instrument Gaussian model on the image plane and an instrument mask M _lc on the original image are respectively aligned, and the multi-region alignment loss function is as follows:

Wherein, C represents the number of articulating components of the instrument; a process of performing 3D gaussian micro-renderable projection map when the camera pose is R _l;

the total loss function of the instrument Gaussian model reconstruction process is as follows:

wherein alpha represents a corresponding weight coefficient;

Obtaining the reconstructed instrument Gaussian model by minimizing the total loss function L _recons The following is shown:

Wherein, Representing the i-th reconstructed gaussian point; The value of the variable representing the smallest function value.

Preferably, the method comprises the steps of constructing a total loss function according to a rendering result, a current frame image, an instrument mask and an instrument depth map thereof, updating and learning the instrument pose through a back propagation algorithm, minimizing the total loss function, and obtaining an optimal 6D pose of the instrument of the current frame image, wherein the method comprises the following steps:

A color consistency loss function between the 3D gaussian rendered view and the current frame image is constructed as follows:

wherein T represents the current time, T _t represents the pose of the current frame instrument; Representing a rendered view; , Representing a process of performing a 3D Gaussian micro-renderable image, M _t representing an instrument mask for current image segmentation;

Constructing a region alignment loss function between the 3D gaussian micro-renderable projection map and an instrument mask of the current frame image, as follows:

Wherein, An instrument mask respectively representing the projection shape of the c-th hinge part on the instrument Gaussian model on the image plane and the current frame image;, a process for performing 3D gaussian micro-renderable projection maps; ;

a loss of depth consistency between the instrument depth map constructing the rendered depth map and the prior frame image is as follows:

Wherein, An instrument depth map representing the rendered depth map and the previous frame image, respectively;, representing a process of rendering a depth map of the instrument gaussian model;

The total loss function is constructed as:

Wherein, alpha ₁、α₂、α₃ respectively represents corresponding weight coefficients;

Then, the instrument pose T _t is iteratively updated through a back propagation algorithm to minimize L _pose, and an instrument optimal 6D pose estimation result of the current frame image is obtained The following is shown:

preferably, the method for dynamically tracking virtual marks based on key frame guidance tracks a surgical path marked on a first image frame by a doctor to acquire a tracking path of a current frame image, and comprises the following steps:

Receiving an operation path marked on a first image frame by a doctor, wherein the operation path is represented by adopting a two-dimensional target point set at a pixel level;

acquiring a target area where a two-dimensional target point set on a first image frame is located, predicting the track of each point by utilizing an optical flow algorithm, and acquiring the two-dimensional target point set on a plurality of later image frames and the target area where the two-dimensional target point set is located so as to initialize a key frame pool;

Traversing the key frame pool aiming at the current frame image, inquiring similar key frames by using a sparse matching method, and acquiring an optimal template based on the sparse matching result between the inquired similar key frames and the current frame image;

when the optimal template is not an empty set, acquiring a tracking area on the current frame image based on a target area on each similar key frame in the optimal template;

predicting the motion trail of each point by utilizing an optical flow algorithm based on the two-dimensional target point set in each target area to obtain the two-dimensional target point set on the corresponding tracking area;

And when the current frame image is judged to be a new key frame based on preset conditions, the current frame image is put into the key frame pool, wherein the preset conditions comprise:

(1) A complete tracking path is obtained compared to the surgical path;

(2) Acquiring a tracking area where the final two-dimensional target point set is located, wherein the center of the tracking area is always within a preset range of the current frame image;

(3) If the similar key frame in the optimal template is greater than one, placing the last new key frame into the key frame pool at a distance exceeding a first preset number of image frames;

If the similar key frame in the optimal template is equal to one, placing the last new key frame into the key frame pool, wherein the distance exceeds a second preset number of image frames, and the ratio of the length of the tracking path to the length of the operation path is consistent with the scale change of the image frames where the tracking path and the operation path are positioned;

(4) The maximum capacity of the keyframe pool is not exceeded.

Preferably, the initialization key frame pool specifically comprises:

By using Representing a key frame, wherein j is a key frame index in a key frame pool; Representing RGB values, W, H representing width and height, respectively; representing a target area; 、 respectively represent extraction by using descriptor-based sparse matching method Feature points and feature descriptions in (3); Representing a two-dimensional set of target points;

Designating the 0 th image frame F ₀ as a key frame K ₀ E K, defining a two-dimensional target point set thereon as ,Represents the coordinates of the s-th pixel point, n represents the number of the pixel points,Representing a real set;

To be used for A region being the centerGenerated by the following formula:

wherein the center point ,Is thatAn arithmetic average of the maximum and minimum values of the abscissa of all pixel points,Is thatThe arithmetic average value of the maximum value and the minimum value of the ordinate of all pixel points in the image, L is the side length of the target area, and L is not less thanDifferences between maximum and minimum values of the abscissa and the ordinate of all pixel points in (a);

Defining a target area Storing four vertices and center pointsExpressed asWhereinFrom the following componentsIt can be seen that the representationUpper left, upper right, lower left and lower right;

Predicting motion V _0→1 E of two later image frames F ₁ and F ₂ by optical flow algorithm ,V_0→2∈And calculates each pixel point on the operation path according to the optical flow resultTrajectories on F ₁ and F ₂:

wherein Interp (∙) represents a sampling process to obtain a corresponding set of target points AndAnd referring to formula (1), acquiring a target regionAnd;

The F ₁、F₂ correspondence is noted as key frame K ₁、K₂, together with K ₀, constitutes an initial key frame pool p= { K ₀,K₁,K₂ }.

Preferably, the step of traversing the key frame pool for the current frame image, performing similar key frame query by using a sparse matching method, and acquiring an optimal template based on a sparse matching result between the queried similar key frame and the current frame image comprises the following steps:

acquiring characteristic points and characteristic descriptions of the current frame image F _i;

Traversing each key frame K _j in the key frame pool, and acquiring a matching point between F _i and K _j by using a sparse matching method;

counting the number n _ji of the matching points of the target area on the K _j, selecting three key frames with the maximum n _ji as similar key frames K _simi={K_a,K_b,K_c } ⊆ P,

The following formula is adopted to obtain the optimal template:

Wherein K _j represents the j-th key frame in the key frame pool, j has the value of a, b or c, and n _ji represents the target area on K _j The number of matching points between the current frame image F _i and the current frame image F _i, wherein sigma is a first threshold value, and K _simi represents all queried similar key frames;

And/or

The method for acquiring the tracking area on the current frame image based on the target area on each similar key frame in the optimal template specifically comprises the following steps:

Based on the matching point between K _j⊆K_best and F _i, the accuracy of estimating homography matrix H _ji,H_ji by using RANSAC is determined by N _in is the number of inner points;

When (when) Not less than lambda, 0< lambda <1, lambda being a second threshold, for the target regionFour vertices ofHomography conversion is carried out to obtainCorresponding position on F _i :

Otherwise, carrying out vertex coordinate conversion by using a neighborhood similarity transformation method:

Extracting a set of matches of F _i and K _j from sparse matching results ,x_im∈And y _jm EIs a two-dimensional column vector representing the feature point locations;

Searching for target area vertices N neighbors of (a)Obtaining corresponding matching pointsTo obtain a dotCorresponding position on F _i :

Wherein the method comprises the steps ofW _t andAndThe distance between them is inversely proportional;

based on vertex coordinates An initial region is determined, expressed as:

Wherein, 、Respectively representCorresponding position on F _i U _min、u_max represents the maximum and minimum values of the abscissa of the four vertices, respectively, v _min、v_max represents the maximum and minimum values of the ordinate of the four vertices, respectively;

determining a tracking area based on coordinates of four vertices of the initial area Is the center point of (2);

Wherein, 、Respectively represent tracking areasIs the center point of (2)Is the abscissa and ordinate of (2);

Referring to formula (1), a tracking area on the current frame image is acquired 。

Preferably, the method comprises predicting motion trajectories of points by optical flow algorithm based on two-dimensional target point sets in each target area to obtain two-dimensional target point sets on corresponding tracking areas, merging the two-dimensional target point sets on each tracking area to obtain final two-dimensional target point sets, and specifically comprises:

Two-dimensional target point set Coordinates of each point p _j,s in the whole frame K _j and relative target areaCoordinates of (c)Has the following mapping relation:

Optical flow in target area And tracking areaIn the above, the optical flow prediction result is V _j→i, and the sparse matching result and the optical flow prediction result should be consistent, and the difference between the two is calculated by the following formula:

if d is less than or equal to gamma, and gamma is a third threshold value, the tracking area Target point set onP _ji,s is obtained by the following method:

Otherwise the first set of parameters is selected, Not participate inIs calculated;

Merging two-dimensional target point sets on each tracking area Obtaining a final two-dimensional target point setExpressed as:

wherein lambda _ji is the weight, and =1, Λ _ji is proportional to n _ji.

Preferably, the searching the point cloud position closest to the end position of the surgical instrument on the three-dimensional surgical path, introducing an additional investigation radius, and obtaining the artificial vector of the end position of the surgical instrument, comprises:

searching the end position of the surgical instrument on the three-dimensional surgical path by using a K-nearest neighbor algorithm Nearest point cloud locationObtaining the relative distance x _d between the two points;

Providing that all points on the three-dimensional surgical path are contained in a set of virtual marker points V _r for the surgical instrument tip position Distance on three-dimensional surgical pathPoint-closest point cloud locationFind the set of sub-points of V _r WhereinTo additionally investigate the radius;

The surgical instrument end position The corresponding artificial vector is expressed in the form:

The artificial vector length function is calculated as follows:

Wherein, Is the position of the tail end of the surgical instrumentThe distance between the virtual mark point v and the virtual mark point v, R and R are upper and lower boundaries of the distance respectively, when the distance is smaller than or equal to R, the maximum weight q _max is allocated to the virtual mark point v, when the distance is larger than R and smaller than R, the weight of the virtual mark point v is between q _max and 0, and when the distance is larger than or equal to R, the weight of the virtual mark point v is 0; The representation is composed of Vector pointing to point v, and card (∙) represents the cardinality or number of elements of the collection.

Preferably, the guided virtual force field is expressed as:

Wherein K _vf represents the proportionality coefficient of attraction force and artificial vector, D _vf(x_d) represents the damping coefficient Representation ofIs a function of the speed of the machine.

A surgical operative force feedback guidance system based on virtual marker tracking and instrument pose, comprising:

the depth estimation module is used for acquiring a three-dimensional point cloud of the in-vivo scene based on the depth estimation;

the pose estimation module is used for acquiring the optimal 6D pose of the instrument of the current frame image based on a 3D Gaussian sputtering instrument 6D pose estimation method so as to estimate the tail end position of the surgical instrument;

The mark tracking module is used for tracking the operation path marked on the first image frame by a doctor based on a virtual mark dynamic tracking method guided by the key frame to acquire a tracking path of the current frame image, wherein the tracking path is represented by a two-dimensional target point set;

The path conversion module is used for acquiring a three-dimensional operation path based on the three-dimensional point cloud and the two-dimensional target point set;

The force field establishment module is used for searching a point cloud position closest to the tail end position of the surgical instrument on the three-dimensional surgical path, introducing an additional investigation radius, acquiring an artificial vector of the tail end position of the surgical instrument, and establishing a guiding type virtual force field along the three-dimensional surgical path based on the artificial vector so as to guide a doctor to operate the surgical robot system through a force feedback mechanism.

A storage medium storing a computer program for surgical operation force feedback guidance based on virtual marker tracking and instrument pose, wherein the computer program causes a computer to execute the surgical operation force feedback guidance method as described above.

An electronic device, comprising:

The surgical force feedback guidance system comprises one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the surgical force feedback guidance method as described above.

(III) beneficial effects

The invention provides a surgical operation force feedback guiding method, a system, a storage medium and electronic equipment based on virtual mark tracking and instrument pose. Compared with the prior art, the method has the following beneficial effects:

According to the invention, a real-time three-dimensional point cloud reconstruction is carried out on an in-vivo scene, a virtual mark dynamic tracking method based on key frame guidance is used for tracking a surgical path marked on a first image frame by a doctor, a three-dimensional surgical path is obtained based on the real-time three-dimensional point cloud, a 6D pose method of a surgical instrument based on 3D Gaussian tracking is combined, and finally a guiding type virtual clamp along the dynamic three-dimensional surgical path is established, so that the doctor can be effectively assisted in completing the operation along a preset surgical path through a force feedback mechanism, and man-machine co-fusion operation conforming to the operation intention of the doctor is realized.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a block diagram of a surgical force feedback guidance method based on virtual marker tracking and instrument pose according to an embodiment of the present invention;

fig. 2 is an overall flowchart of an instrument 6D pose estimation method based on 3D gaussian sputtering according to an embodiment of the present invention;

FIG. 3 is a schematic view of a coordinate system for articulated motion according to an embodiment of the present invention;

fig. 4 is an overall flowchart of a dynamic tracking method of virtual marks based on key frame guidance according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The embodiment of the application solves the technical problems that a marking path needs to be planned in advance for the establishment of a pull field in a virtual protection domain, and the marking path is difficult to be accurately and real-timely confirmed due to large in-vivo environmental change in an operation by providing a surgical operation force feedback guiding method, a system, a storage medium and electronic equipment based on virtual marking tracking and instrument pose.

The technical scheme in the embodiment of the application aims to solve the technical problems, and the overall thought is as follows:

In the embodiment of the invention, the real-time three-dimensional point cloud reconstruction is carried out on the in-vivo scene, the virtual mark dynamic tracking method based on key frame guidance is used for dynamically tracking the operation path marked on the first image frame by a doctor, the three-dimensional operation path is obtained based on the real-time three-dimensional point cloud, and the guiding virtual clamp along the dynamic three-dimensional operation path is finally established by combining the 6D pose method of the operation instrument based on the 3D Gaussian tracking, so that the doctor can be effectively helped to complete the operation along the preset operation path by a force feedback mechanism, and the man-machine co-fusion operation conforming to the operation intention of the doctor is realized.

In particular, a binocular depth estimation-based instrument 6D pose estimation method is provided. Aiming at the difficulties of 6D pose estimation and tracking of the instrument caused by the characteristics of fast motion, low texture, serious reflection and the like of the surgical instrument and the problem of reduced tissue depth estimation precision caused by the existence of the instrument, the three-dimensional position of the instrument is accurately estimated, and the complete three-dimensional reconstruction of a surgical scene and the three-dimensional position estimation and tracking of the surgical instrument can be realized.

And a key frame guided virtual mark dynamic tracking method is provided, so that the two-dimensional virtual mark on the laparoscopic image is tracked with pixel-level precision, and the intuitiveness and the accuracy of information transmission are ensured. The two-stage tracking method adopted reduces error tracking. In addition, the diversity of the key frame pool is enriched by dynamically updating the key frame pool, so that the robustness and accuracy of dynamic tracking of the virtual mark are further improved.

In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.

Example 1:

as shown in fig. 1, an embodiment of the present invention provides a surgical operation force feedback guiding method based on virtual mark tracking and instrument pose, including:

s1, acquiring a three-dimensional point cloud of an in-vivo scene based on depth estimation;

S2, acquiring the optimal 6D pose of the instrument of the current frame image by using a 3D Gaussian sputtering-based instrument 6D pose estimation method so as to estimate the tail end position of the surgical instrument;

S3, tracking a surgical path marked on a first image frame by a doctor based on a virtual mark dynamic tracking method guided by a key frame to acquire a tracking path of a current frame image, wherein the tracking path is represented by a two-dimensional target point set;

s4, acquiring a three-dimensional operation path based on the three-dimensional point cloud and the two-dimensional target point set;

S5, searching a point cloud position closest to the tail end position of the surgical instrument on the three-dimensional surgical path, introducing an additional investigation radius, obtaining an artificial vector of the tail end position of the surgical instrument, and establishing a guiding type virtual force field along the three-dimensional surgical path based on the artificial vector so as to guide a doctor to operate the surgical robot system through a force feedback mechanism.

In the embodiment of the invention, the real-time three-dimensional point cloud reconstruction is carried out on the in-vivo scene, the virtual mark dynamic tracking method based on key frame guidance is used for tracking the operation path marked on the first image frame by a doctor, the three-dimensional operation path is obtained based on the real-time three-dimensional point cloud, the 6D pose method of the operation instrument based on 3D Gaussian tracking is combined, and finally, the guiding type virtual clamp along the dynamic three-dimensional operation path is established, so that the doctor can be effectively helped to complete the operation along the preset operation path through a force feedback mechanism, and the man-machine co-fusion operation conforming to the operation intention of the doctor is realized.

The steps of the above scheme will be described in detail as follows:

in step S1, a three-dimensional point cloud of an in-vivo scene is acquired based on the depth estimation.

In this step, a depth estimation algorithm may be used to obtain a three-dimensional point cloud V _d of the in-vivo scene based on a sequence of image frames acquired by laparoscopic surgery that show the in-vivo scene.

By way of example, a neural network may be used herein for depth estimation to present a three-dimensional field of view to a physician, creating a surgical experience that is intuitive to the physician.

In step S2, an instrument 6D pose estimation method based on 3D gaussian sputtering obtains an instrument optimal 6D pose of the current frame image to estimate a surgical instrument end position.

It can be understood that the estimation result of the instrument pose, the relative position of the instrument and the tissue and the interaction state can provide high-precision positioning and navigation information in real time, thereby being beneficial to improving the accuracy and safety of the minimally invasive surgery. Based on the above, the embodiment of the invention provides an instrument-tissue interaction distance estimation method based on binocular depth estimation, instrument 6D pose estimation and depth complementation, which specifically comprises the following steps:

S21, reading a current frame image, and acquiring an instrument mask through a semantic segmentation algorithm, carrying out depth estimation on the current frame image by utilizing a binocular depth estimation algorithm, and respectively acquiring a tissue depth map and an instrument depth map by combining the instrument mask;

S22, initializing the pose of the instrument, and performing 3D Gaussian sputtering by using a pre-reconstructed instrument Gaussian model to obtain a view, a projection mask and a depth map under the corresponding pose;

S23, constructing a total loss function according to the rendering result, the current frame image, an instrument mask and an instrument depth map thereof, updating and learning the instrument pose through a back propagation algorithm, minimizing the total loss function, and obtaining the optimal 6D pose of the instrument of the current frame image;

s24, determining the position of the instrument end corresponding to the Gaussian kernel based on the optimal 6D pose of the instrument of the current frame image, and taking the position as the position of the surgical instrument end.

According to the scheme, the instrument pose estimation is estimated by aligning the instrument Gaussian model and the image characteristics, so that the 6D pose of the instrument can be accurately and robustly recovered in a complex operation scene, and the instrument pose estimation is realized on the basis of a 3D Gaussian sputtering frame, so that the instrument pose can be rapidly recovered in a shorter iteration time and a small number of iteration times.

Fig. 2 shows an overall flowchart of the above scheme, and the method for estimating the pose and depth of the instrument 6D according to the embodiment of the present invention includes, but is not limited to, the pose estimation and the relative distance estimation of the instrument-tissue of the surgical instrument 6D used in laparoscopic surgery or robot-assisted surgery. The method mainly comprises depth estimation and instrument segmentation based on binocular laparoscopic images, 3D Gaussian model reconstruction of surgical instruments, 6D pose estimation of the surgical instruments based on 3D Gaussian optimization and the like.

The details of step S2 will be described in detail below in conjunction with fig. 2:

in step S21, reading the current frame image, obtaining an instrument mask through a semantic segmentation algorithm, carrying out depth estimation on the current frame image by utilizing a binocular depth estimation algorithm, and respectively obtaining a tissue depth map and an instrument depth map by combining the instrument mask;

in the step, the image is subjected to multi-value segmentation by utilizing a semantic segmentation algorithm (random forest, SAM, perSAM and the like) to obtain an instrument mask on the image Where K represents the number of articulations according to the instrument. And performing depth estimation on the endoscope image by using a binocular depth estimation method (MIOL algorithm, CREStereo algorithm and the like) to obtain a depth map D _t, and combining the instrument mask M _t to obtain a tissue depth map D _t(1-M_t of a non-instrument shielding region and an instrument depth map D _tM_t of an instrument visible part. Where t represents the current time.

In step S22, the pose of the instrument is initialized, and 3D gaussian sputtering is performed by using the pre-reconstructed instrument gaussian model, so as to obtain a view, a projection mask and a depth map under the corresponding pose.

Note that 3D gaussian sputter rendering is mainly used for new view synthesis, where the 3D position of an object represented by a 3D gaussian is represented by the center position of each gaussianRotatingSize of the productColor ofThe density sigma is determined jointly, and is not suitable for expressing the 6D pose transformation of a dynamic object.

Accordingly, the embodiment of the invention establishes a 3D Gaussian model of the surgical instrument (i.e. a reconstructed instrument Gaussian model) in advance, so that the 3D Gaussian optimization is converted into Gaussian pose optimization, i.e. the pose optimization of the surgical instrument.

Specifically, the reconstruction process of the instrument Gaussian model comprises the following steps:

Firstly, acquiring images of a plurality of known camera poses and including all view angles of the instrument so as to reconstruct sparse point clouds of the instrument, wherein the method specifically comprises the following steps:

a) Fixing the position of the surgical instrument, aligning the camera with the instrument and shooting the camera around the instrument to obtain a plurality of instrument images I _l containing all view angles, L is the number of images. However, the characteristics of the surgical instrument such as no texture, specular reflection and the like result in that the relative camera pose cannot be easily solved through the image features between adjacent frames. The feature points are provided by attaching marker points to the instrument or by fixing a checkerboard in the field of view.

B) Reconstructing a sparse point cloud of the photographed scene through sfm, and enabling the image to correspond to 2D-3D points of the sfm point cloud.

C) Scaling the sfm point cloud according to the real scale of the instrument, recovering the absolute scale of the sparse point cloud, and solving the absolute camera pose R _l of each image by using the 2D-3D corresponding points.

D) The segmentation algorithm is utilized to segment the instrument to obtain a segmentation mask of the instrument, which is used for removing the background outside the instrument in the image to obtain the image with the instrument only。

E) Image known from camera pose R _l The sparse point cloud M _sfm of the instrument was reconstructed using sfm algorithm for initialization of the 3D gaussian model reconstruction.

Each point of the sparse point cloud is then initialized to a gaussian G _i,Each Gaussian has the property that the center position of the Gaussian isRotatingSize of the productColor ofDensity σ, probability f of belonging to the foreground.

It should be noted that the parameters to be optimized in the reconstruction process of the Gaussian model of the surgical instrument are only the center position of each GaussianRotatingSize of the product。

In addition, the instrument Gaussian model { G _i } at this stage is initialized by initializing each point of the instrument point cloud M _sfm to a Gaussian G _i,I is the number of gauss, the position and the color of the point cloud are initialized to be the central position u and the color c of the gauss, the rotation q of the gauss is initialized to be {1,0 }, the distance from each point to the nearest neighbor is initialized to be the size of the gauss, the density sigma of the gauss is initialized to be 1, and the probability f of belonging to the foreground is initialized to be 1.

On the basis, the image sequence formed by combining the images of the known camera poses and including all view angles of the instrument and the corresponding camera poses thereof iteratively optimizes the initialized instrument Gaussian model to obtain the reconstructed instrument Gaussian model, and the method comprises the following steps:

Wherein L ₁ represents a norm loss; representing an original image derived from the sequence of images leaving only the instrument portion; The camera pose is represented as R _l.

In addition, in order to ensure that the reconstructed instrument Gaussian model is of a hinged structure, the embodiment of the invention introduces a multi-region alignment loss function, wherein the projection shape of each hinged part on the reconstructed instrument Gaussian model on an image plane and an instrument mask M _lc on an original image are constrained to be aligned respectively, and the multi-region alignment loss function is as follows:

Wherein, C represents the number of articulating components of the instrument; and (3) representing a process of performing 3D Gaussian micro-renderable projection graph when the pose of the camera is R _l.

The total loss function of the instrument gaussian model reconstruction process is:

wherein alpha represents a corresponding weight coefficient;

Optimized instrument Gaussian modelIs marked asI represents the index of the gaussian kernel.

In the process of acquiring instrument Gaussian modelOn the basis of the above, for the rigid movement T between the hinge parts of the surgical instrument, the embodiment of the invention defines the hinge movement of the instrument by DH parameter method, regards the rod part and the left clamp (the right clamp is symmetrical with the left clamp) of the instrument as a two-link structure, and defines the local coordinate system F ₁、F₂ for the rod part and the left clamp (the right clamp is symmetrical with the left clamp) of the instrument respectively as shown in fig. 3, and establishes the camera coordinate system F _cam. The instrument pose estimation solves the problem that the instrument is transformed into a matrix from a coordinate system of the instrument to a camera coordinate system, at the moment, a point X under a local coordinate F ₁、F₂ of the instrument is transformed into a formula under the camera coordinate system as follows:

Wherein, Representing the points under the local coordinate system F ₁、F₂, respectively, the point of the left (right) clip in fig. 3 is in the local coordinate system F ₂, the point of the instrument shaft is in the local coordinate system F ₁,Is a transformation matrix of the instrument bar coordinate system F ₁ to the camera coordinate system F _cam,Is a transformation matrix of the instrument left clamp coordinate system F ₂ to the instrument rod coordinate system F ₁, and the instrument left clamp coordinate system F ₂ must rotate around the z axis of the rod coordinate system F ₁, so that the Euler angle θ of one degree of freedom can represent the rotation between the local coordinate systems F ₁、F₂.

In summary, the stiffness transformation T of the articulating instrument can be expressed asWhereinR represents a 3D rotation matrix,Representing the 3D translation vector.

After defining the reconstruction process of the Gaussian model of the instrument and the rigid motion parameters corresponding to the Gaussian model, as shown in FIG. 2 and FIG. 3, the pose parameters of the instrument are optimized based on the defined Gaussian model to obtain the 6D pose of the articulated instrument. And giving a binocular image of the current frame, acquiring a depth image of the current frame through a binocular depth estimation algorithm, and obtaining a segmentation mask of the instrument through an instrument segmentation algorithm. The input image, the depth map and the instrument mask are used as monitoring information for subsequent pose learning. On the basis of the instrument initialization position, a view, a depth map and a projection mask (represented as rendered images, segmentation and depth information in fig. 2) under the corresponding pose are rendered through 3D Gaussian sputtering, and a loss function is respectively constructed by the view, the depth map, the projection mask, an original image, a semantic segmentation mask and a depth estimation result (represented as images, segmentation and depth information in fig. 2), so that a total loss function is constructed according to the subsequent steps, and the instrument pose is updated and learned through a back propagation algorithm, so that the optimal 6D pose of the instrument of the current frame image is obtained.

In the instrument pose optimization stage, the update of the instrument Gaussian model follows the following formula:

Where i denotes the index of the gaussian kernel, t denotes the current frame, Is an updated analogy formula (5) of the reconstructed instrument gaussian model, u _i,t, as follows:

when u _i,0 belongs to the point of the instrument shaft, the following formula is updated:

Wherein, Representing a transformation matrix converting u _i,0 into a local coordinate system F ₁; and determining when the Gaussian model of the instrument is reconstructed and corresponding pose parameters are defined.

When u _i,0 belongs to the point of the instrument left (right) clamp, the following formula is updated:

Wherein, Representing a transformation matrix converting u _i,0 into a local coordinate system F ₂; and determining when the Gaussian model of the instrument is reconstructed and corresponding pose parameters are defined.

Initialization of the instrument pose is mainly divided into two cases.

Firstly, for a first frame of image, combining a camera model to obtain a rendering view of an instrument Gaussian model on an image plane, enabling an image to coincide with an area where the rendering view is located by moving a cavity mirror, and taking the instrument pose at the moment as an initial instrument pose of the first frame of image;

second, for images other than the first frame, the instrument pose of the current frame image is initialized based on the optimal instrument pose of the previous frame image and optical flow information of two continuous frames.

Initial instrument pose of current frame image I _t 3D translation from previous frame instrument pose T _t-1 and coarse estimateObtaining:

Wherein, Is calculated as follows:

Wherein, Representing the mean of the predicted optical flow in the instrument region from the previous frame to the current frame,Representing the average depth value of the instrument area of the previous frame, f _x,f_y represents the focal length in the x, y direction of the camera, respectively.

In step S23, a total loss function is constructed according to the rendering result, the current frame image, the instrument mask and the instrument depth map thereof, and the instrument pose is updated and learned through a back propagation algorithm, so that the total loss function is minimized, and the optimal 6D pose of the instrument of the current frame image is obtained.

Furthermore, the invention implements the optimization of the pose of the setting instrument mainly by rendering the viewColor consistency loss L _im between original image I _t, rendered instrument projection maskRegion alignment loss L _region between the instrument segmentation mask M _t, rendered instrument depth mapAnd the depth consistency loss L _depth between the instrument depth estimation D _t.

Under the action of the iteratively updatable instrument pose T _t, a color consistency loss function between the 3D Gaussian rendering view and the current frame image is constructed as follows:

Thus, the total loss function is constructed as:

Wherein, alpha ₁、α₂、α₃ represents corresponding weight coefficients respectively.

Then, the instrument pose T _t is iteratively updated through a back propagation algorithm to minimize L _pose, and an instrument optimal 6D pose estimation result of the current frame image is obtainedThe following is shown:

In step S24, based on the optimal 6D pose of the instrument in the current frame image, the position of the end of the instrument corresponding to the gaussian kernel is determined and used as the position of the end of the surgical instrument.

Thus, given a binocular endoscopic image, embodiments of the present invention successfully provide the surgical instrument end position of the articulating instrument。

In step S3, a virtual mark dynamic tracking method based on key frame guidance tracks a surgical path marked on a first image frame by a doctor to obtain a tracking path of a current frame image, wherein the tracking path is represented by a two-dimensional target point set, and the method comprises the following steps:

S31, receiving a surgical path marked on a first image frame by a doctor, wherein the surgical path is represented by adopting a two-dimensional target point set at a pixel level;

S32, acquiring a target area where a two-dimensional target point set on a first image frame is located, predicting the track of each point by utilizing an optical flow algorithm, and acquiring the two-dimensional target point set on a plurality of later image frames and the target area where the two-dimensional target point set is located so as to initialize a key frame pool;

S33, traversing the key frame pool aiming at the current frame image, inquiring similar key frames by using a sparse matching method, and acquiring an optimal template based on the sparse matching result between the inquired similar key frames and the current frame image;

S34, when the optimal template is not an empty set, acquiring a tracking area on the current frame image based on a target area on each similar key frame in the optimal template;

S35, predicting the motion trail of each point by utilizing an optical flow algorithm based on the two-dimensional target point set in each target area to obtain the two-dimensional target point set on the corresponding tracking area;

S36, based on preset conditions, when the current frame image is judged to be a new key frame, the current frame image is put into the key frame pool, wherein the preset conditions comprise:

(1) A complete tracking path is obtained compared to the surgical path;

(4) The maximum capacity of the keyframe pool is not exceeded.

The scheme realizes the pixel-level precision tracking of the two-dimensional virtual mark on the laparoscopic image, and ensures the intuitiveness and accuracy of information transmission.

As shown in fig. 4, fig. 4 shows an overall flowchart of the virtual marker dynamic tracking method. The details of step S30 will be described in detail below with reference to fig. 4:

In step S31, receiving a surgical path marked by a doctor on a first image frame, wherein the surgical path is represented by adopting a two-dimensional target point set at a pixel level;

It will be appreciated that in minimally invasive surgery, virtual markers track the position of a dynamically positioned target, providing real-time feedback to the physician regarding the surgical location. The doctor selects the template frame as the 0 th image frame F ₀ epsilon Virtual marks such as planned operation tracks are marked on a display by using a tool such as a mouse or a touch screen pen, and W, H respectively represent the width and the height of F ₀.

The step receives the operation path and adopts a two-dimensional target point set at a pixel levelRepresenting, whereinRepresenting the virtual marker on frame 0, p _0,s epsilonRepresents the coordinates of the s-th pixel, n represents the number of pixels,Representing a set of real numbers.

Through the setting, the following movement along with the laparoscope is facilitated, and the frame F _i E is aimed at the ith frameThe tracking path on the frame is obtained using the proposed tracking methodWill track the resultsLabeling the current frame image F _i can provide visual guidance for doctors.

In step S32, a target area where a two-dimensional target point set on the first image frame is located is obtained, and the track of each point is predicted by using an optical flow algorithm, and the two-dimensional target point set on the next plurality of image frames and the target area where the two-dimensional target point set is located are obtained, so as to initialize a keyframe pool.

The key frame strategy has a core principle that it can identify and record key time instants as reference points for error correction. The key frames are utilized to reduce the computational complexity, improve the tracking efficiency and ensure the stability of long-time tracking.

Illustratively, this step requires determining the initial three key frames as templates for later tracking. Correspondingly, the specific steps of initializing the key frame pool are as follows:

It should be noted that in the embodiments of the present invention Representing a key frame, wherein j is a key frame index in a key frame pool; Representing RGB values, W, H representing width and height, respectively; representing a target area; 、 respectively represent extraction by using descriptor-based sparse matching method Feature points and feature descriptions in (3); representing a two-dimensional set of target points.

First, the 0 th color image frame F ₀ is designated as a key frame K ₀ E K, and a two-dimensional target point set is defined as,Represents the coordinates of the s-th pixel point, n represents the number of the pixel points,Representing a set of real numbers.

Second, byA region being the centerGenerated by the following formula:

wherein the center point ,Is thatAn arithmetic average of the maximum and minimum values of the abscissa of all pixel points,Is thatThe arithmetic average value of the maximum value and the minimum value of the ordinate of all pixel points in the image, L is the side length of the target area, and L is not less thanThe difference between the maximum and minimum values of the abscissa and the ordinate of all pixel points in (a).

Again, F ₀ is processed by using the existing sparse matching method based on descriptors, such as SIFT, lightglue and the like, and feature points and descriptors are extracted from the whole image and are positioned in the target areaThe internal feature points and descriptors are noted as。

Then, defining a target areaStoring four vertices and center pointsExpressed asWhereinFrom the following componentsIt can be seen that the representationUpper left, upper right, lower left, lower right.

Then, the motion V _0→1 E of the two latter image frames F ₁ and F ₂ is predicted by using an optical flow algorithm,V_0→2∈And calculates each pixel point on the virtual mark according to the optical flow resultTrajectories on F ₁ and F ₂:

wherein Interp (∙) represents a sampling process to obtain a corresponding set of target points AndAnd referring to formula (1), acquiring a target regionAnd。

Finally, the F ₁、F₂ correspondence is noted as key frame K ₁、K₂, which together with K ₀ constitutes the initial key frame pool p= { K ₀,K₁,K₂ }.

In step S33, for the current frame image, traversing the keyframe pool, performing similar keyframe query by using a sparse matching method, and obtaining an optimal template based on a sparse matching result between the queried similar keyframe and the current frame image, wherein the method specifically comprises:

The following formula is adopted to obtain the optimal template:

Wherein K _j represents the j-th key frame in the key frame pool, j has the value of a, b or c, and n _ji represents the target area on K _j The number of matching points with the current frame image F _i, sigma is a first threshold value, and K _simi represents all the queried similar key frames.

It should be noted that, as shown in fig. 3, based on the number of similar key frames card (K _best) in the best template, the current frame image F _i may be divided into two states:

(1) If the card (K _best) >0, namely the optimal template is not empty, the fact that the current frame has a region with high similarity with the target region on the key frame and a mark needing to be tracked exists, at the moment, the current frame is considered as a normal frame or a candidate key frame, the next step is carried out for tracking the target region, and accurate positioning is carried out to the pixel position of the operation path.

It will be appreciated that setting this condition identifies situations that may occur after the surgical path is lost.

(2) If card (K _best) =0, i.e. the best template is empty, it indicates that the similarity between the current frame and all key frames is not high, the mark has moved out of view of the surgical path, there is no mark to be traced, the current frame is considered as a frame not to be traced, and the next frame image is directly read.

Then, the embodiment of the invention adopts a two-stage tracking method, when tracking the operation path, adopts two-stage matching from thick to thin, firstly determines similar key frames through traversing a key frame pool and sparse matching, firstly determines a local area by utilizing a matching result, and then tracks the local area to reduce error tracking, and particularly, the invention refers to steps S34-S35:

In step S34, when the best template is not an empty set, a tracking area on the current frame image is obtained based on the target area on each similar key frame in the best template;

this step determines the similar region of the current frame image and the best template frame, noting that tracking is established on the region, not the entire image.

Correspondingly, acquiring the tracking area on the current frame image specifically comprises the following steps:

First, a target region is determined based on a homography transform or a neighborhood similarity transform Four vertices ofCorresponding position on F _i Comprising:

Wherein the method comprises the steps ofW _t andAndThe distance between them is inversely proportional.

Illustratively, N may be four herein.

Then, based on vertex coordinatesAn initial region is determined, expressed as:

Wherein, 、Respectively representCorresponding position on F _i U _min、u_max represents the maximum and minimum values of the abscissa of the four vertices, respectively, and v _min、v_max represents the maximum and minimum values of the ordinate of the four vertices, respectively.

Then, a tracking area is determined based on coordinates of four vertexes of the initial areaIs the center point of (2);

Wherein, 、Respectively represent tracking areasIs the center point of (2)And the abscissa and ordinate of (c).

Finally, referring to formula (1), a tracking area on the current frame image is obtained。

In step S35, based on the two-dimensional target point set in each target area, predicting the motion trail of each point by using an optical flow algorithm to obtain the two-dimensional target point set on the corresponding tracking area, merging the two-dimensional target point sets on each tracking area to obtain the final two-dimensional target point set, and taking the final two-dimensional target point set as the tracking path on the current frame image.

The step of carrying out mark tracking on the target area and the tracking area comprises the following steps:

Otherwise the first set of parameters is selected, Not participate inIs calculated by the computer.

Merging target point sets on each tracking areaObtaining a final target point setExpressed as:

wherein lambda _ji is the weight, and =1, Λ _ji is proportional to n _ji.

It should be noted that in practice, the embodiment of the present invention may obtain relatively accurate results by selecting one or two keyframes with the highest similarity. By judgingAnd if each pixel point in the image is on the image, covering the complete tracking path or the local tracking path on the current frame image, so that real-time guidance can be provided for doctors.

In step S36, when the current frame image is determined to be a new key frame based on a preset condition, the current frame image is put into the key frame pool, wherein the preset condition includes:

(1) A complete tracking path is obtained compared to the surgical path, expressed as The keyframes must contain the complete surgical path to have sufficient information.

(2) Acquiring the final two-dimensional target point setTracking area whereThe center of the tracking areaAlways within the preset range of the current frame image F _i, expressed as∈Where l is a fourth threshold. In order to eliminate the redundancy of the device,AndNeeds to meet the requirements ofΗ ₁ is the fifth threshold and η ₁ <1. This condition is intended to maintain a certain difference between the target area and the key frame, the target area on the key frame being as far as possible offset from the edge of the image.

(3) If the similar keyframes in the best template are greater than one, the distance places the last new keyframe into the keyframe pool, and the first preset number of image frames have been exceeded.

Illustratively, the first predetermined number here is ten, denoted as card (K _best) >1, i-f >10, where f is the index of the most recently added keyframe, inserted from the last keyframe to the now 10 frames.

The second preset number is here, for example, four, correspondingly denoted as card (K _best) =1, i-f >4, and Wherein η ₂、η₃ is a sixth threshold and a seventh threshold respectively, and the calculation formulas of η ₂<1<η₃, r are r=max { H ₀,h₁},h₀、h₁ respectively represent the first two diagonal elements of the homography matrix H _ji of F _i and K _j, and H ₀、h₁ respectively corresponds to the scaling factors of the picture in the horizontal direction and the vertical direction.

(4) The maximum capacity of the keyframe pool is not exceeded, and n _max,n_max, expressed as card (P). Ltoreq.n, is the maximum capacity of keyframe pool P, where n _max is sixteen, for example. To prevent the key frames from repeatedly adding unnecessary computation, the key frames are considered to have saved key information at various perspectives when a certain capacity is reached.

Therefore, the embodiment of the invention can also process the condition of target loss and re-tracking, and realize the robustness of tracking.

The embodiment of the invention successfully obtains the tracking path represented by the two-dimensional target point set.

In step S4, a three-dimensional surgical path is obtained based on the three-dimensional point cloud and the two-dimensional target point set.

In the step, a three-dimensional point cloud V _d based on the depth estimation of the laparoscopic image is used, and a two-dimensional target point set obtained by combining a virtual mark dynamic tracking method is usedA three-dimensional virtual marker point set V _r can be obtained.

Wherein, pi (∙) is a function of obtaining a three-dimensional virtual mark according to the three-dimensional point cloud and the two-dimensional target point set, V _r∈V_d.

In step S5, searching a point cloud position closest to the tail end position of the surgical instrument on the three-dimensional surgical path, introducing an additional investigation radius, obtaining an artificial vector of the tail end position of the surgical instrument, and establishing a guiding type virtual force field along the three-dimensional surgical path based on the artificial vector so as to guide a doctor to operate the surgical robot system through a force feedback mechanism, wherein the method comprises the following steps:

searching the end position of the surgical instrument on the three-dimensional surgical path by using a K-nearest neighbor algorithm Nearest point cloud locationThe relative distance x _d between the two points is obtained, thereby judging whether the surgical instrument deviates from the preset surgical path.

Providing that all points on the three-dimensional surgical path are contained in a set of virtual marker points V _r for the surgical instrument tip positionDistance on three-dimensional surgical pathPoint-closest point cloud locationFind the set of sub-points of V _r WhereinTo additionally investigate the radius;

The artificial vector length function is calculated as follows:

Then, based on the artificial vector, establishing a virtual force field containing attractive force and viscous resistance along the three-dimensional operation path, namely a guiding type virtual clamp, so as to guide a doctor to operate the operation robot system through a force feedback mechanism, wherein the guiding type virtual force field is expressed as:

Thus, the embodiment of the invention establishes a man-machine touch force interaction mechanism of the doctor and the robot through a force control technology, establishes a guiding type virtual force field in space, realizes accurate guiding of the movement track of the doctor, and ensures that the doctor can operate the surgical robot according to a preset track.

Example 2:

the embodiment of the invention provides a surgical operation force feedback guiding system based on virtual mark tracking and instrument pose, which comprises the following components:

Example 3:

Embodiments of the present invention provide a storage medium storing a computer program for surgical operation force feedback guidance based on virtual marker tracking and instrument pose, wherein the computer program causes a computer to execute the surgical operation force feedback guidance method as described above.

Example 4:

the embodiment of the invention provides electronic equipment, which comprises:

It may be understood that the surgical operation force feedback guiding system, the storage medium and the electronic device based on the virtual mark tracking and the instrument pose provided by the embodiment of the present invention correspond to the surgical operation force feedback guiding method based on the virtual mark tracking and the instrument pose provided by the embodiment of the present invention, and the explanation, the examples, the beneficial effects and other parts of the relevant content may refer to the corresponding parts in the surgical operation force feedback guiding method, which are not repeated herein.

In summary, compared with the prior art, the method has the following beneficial effects:

1. In the embodiment of the invention, the real-time three-dimensional point cloud reconstruction is carried out on the in-vivo scene, the virtual mark dynamic tracking method based on key frame guidance is used for tracking the operation path marked on the first image frame by a doctor, the three-dimensional operation path is obtained based on the real-time three-dimensional point cloud, the 6D pose method of the operation instrument based on 3D Gaussian tracking is combined, and finally, the guiding type virtual clamp along the dynamic three-dimensional operation path is established, so that the doctor can be effectively helped to complete the operation along the preset operation path through a force feedback mechanism, and the man-machine co-fusion operation conforming to the operation intention of the doctor is realized.

2. According to the embodiment of the invention, no additional equipment or data is needed, the complete three-dimensional reconstruction of the surgical scene is realized by only using the binocular endoscope image, and the surgical planning and navigation are assisted by the real-time estimated three-dimensional information of the instrument and the tissue. Based on the research thought of reconstruction-optimization, the pose estimation of the instrument is estimated by aligning the reconstructed instrument model and the image features, so that the 6D pose of the instrument can be accurately and robustly recovered in a complex operation scene. And the frame based on 3D Gaussian sputtering realizes the estimation of the instrument pose, and can realize the quick recovery of the instrument pose in a few iteration times

3. The embodiment of the invention realizes the pixel-level precision tracking of the two-dimensional operation path on the endoscope image, and ensures the intuitiveness and accuracy of information transmission. When the operation path is tracked, two-stage matching from thick to thin is adopted, similar key frames are determined through traversing of a key frame pool and sparse matching, a local area is determined by utilizing a matching result, and then the local area is tracked, so that error tracking is reduced. And the diversity of the key frame pool is enriched by dynamically updating the key frame pool, so that the robustness and accuracy of dynamic tracking of the virtual mark are further improved.

4. According to the embodiment of the invention, on the basis of improving the operation precision, a doctor of a main knife does not need to wear other equipment, so that the interference to the operation of the doctor is prevented.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.

The foregoing embodiments are merely for illustrating the technical solution of the present invention, but not for limiting the same, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that modifications may be made to the technical solution described in the foregoing embodiments or equivalents may be substituted for parts of the technical features thereof, and that such modifications or substitutions do not depart from the spirit and scope of the technical solution of the embodiments of the present invention in essence.

Claims

1. A surgical operation force feedback guidance method based on virtual marker tracking and instrument posture, characterized by comprising:

Acquire the three-dimensional point cloud of the internal scene based on depth estimation;

The 6D pose estimation method of the instrument based on 3D Gaussian sputtering obtains the optimal 6D pose of the instrument in the current frame image to estimate the end position of the surgical instrument;

Based on the key frame-guided virtual marker dynamic tracking method, the surgical path marked by the doctor on the first image frame is tracked to obtain the tracking path of the current frame image; wherein the tracking path is represented by a two-dimensional target point set;

Based on the three-dimensional point cloud and the two-dimensional target point set, obtaining a three-dimensional surgical path;

The point cloud position closest to the end position of the surgical instrument on the three-dimensional surgical path is searched, an additional investigation radius is introduced, an artificial vector of the end position of the surgical instrument is obtained, and based on the artificial vector, a guiding virtual force field along the three-dimensional surgical path is established to guide the doctor to control the surgical robot system through a force feedback mechanism.

2. The surgical operation force feedback guidance method according to claim 1, characterized in that the instrument 6D posture estimation method based on 3D Gaussian sputtering obtains the optimal 6D posture of the instrument in the current frame image to estimate the end position of the surgical instrument; comprising:

Reading the current frame image, obtaining the device mask through a semantic segmentation algorithm; and performing depth estimation on the current frame image using a binocular depth estimation algorithm, and respectively obtaining a tissue depth map and a device depth map in combination with the device mask;

Initialize the device pose, use the pre-reconstructed device Gaussian model to perform 3D Gaussian sputtering, and obtain the view, projection mask and depth map under the corresponding pose;

According to the rendering result, the current frame image, the device mask, and the device depth map, a total loss function is constructed; and the device posture is updated and learned through the back propagation algorithm to minimize the total loss function and obtain the optimal 6D posture of the device in the current frame image;

Based on the optimal 6D posture of the instrument in the current frame image, the position of the Gaussian kernel corresponding to the end of the instrument is determined and used as the end position of the surgical instrument.

3. The surgical operation force feedback guidance method according to claim 2, characterized in that:

The initialization device posture comprises:

For the first frame of image, a rendering view of the instrument Gaussian model on the image plane is obtained in combination with the camera model, the laparoscope is moved so that the image coincides with the area where the rendering view is located, and the instrument posture at this time is used as the initial instrument posture of the first frame of image;

For images other than the first frame, the device pose of the current frame is initialized based on the optimal device pose of the previous frame and the optical flow information of two consecutive frames;

and/or

The reconstruction process of the Gaussian model of the device includes:

Acquire several images with known camera poses and containing all views of the device to reconstruct a sparse point cloud of the device;

Initialize each point of the sparse point cloud as a Gaussian to obtain an initialized Gaussian model of the instrument;

Combining the aforementioned image sequence consisting of the images with known camera poses and containing all viewing angles of the instrument and their corresponding camera poses, iteratively optimizing the initialized instrument Gaussian model to obtain the reconstructed instrument Gaussian model;

and/or

The image sequence composed of the images of the aforementioned known camera poses and containing all viewing angles of the instrument and the corresponding camera poses is combined to iteratively optimize the initialized instrument Gaussian model to obtain the reconstructed instrument Gaussian model; comprising:

The color consistency loss function between the 3D Gaussian rendering view and the original image that only retains the instrument part is constructed as follows:

in, Respectively represent the center position, rotation, and size of any Gaussian; , L is the number of images in the image sequence; L ₁ represents a norm loss; represents the process of 3D Gaussian differentiable rendering of the image when the camera pose is R _l ; Gi represents the i- th _Gaussian point, , I is the number of Gaussians; It indicates the original image from the image sequence where only the instrument part is retained;

A multi-region alignment loss function is introduced to constrain the projection shape of each articulated part on the reconstructed instrument Gaussian model on the image plane and the instrument mask M _lc on the original image to be aligned respectively; the multi-region alignment loss function is as follows:

in, , C represents the number of articulated parts of the device; It represents the process of 3D Gaussian differentiable rendering projection when the camera pose is R _l ;

Among them, α represents the corresponding weight coefficient;

By minimizing the total loss function L _recons , the reconstructed instrument Gaussian model is obtained , as shown below:

in, represents the i- th reconstructed Gaussian point; Indicates the value of the variable when the function value is minimum;

and/or

The method constructs a total loss function based on the rendering result, the current frame image, its device mask, and the device depth map; and updates and learns the device posture through a back propagation algorithm to minimize the total loss function and obtain the optimal 6D posture of the device in the current frame image; including:

Construct the color consistency loss function between the 3D Gaussian rendering view and the current frame image as follows:

Where, t represents the current time; T _t represents the device posture in the current frame; Represents a rendered view; , represents the process of performing 3D Gaussian differentiable rendering of the image; M _t represents the instrument mask of the current image segmentation;

Construct the regional alignment loss function between the 3D Gaussian differentiable rendering projection image and the device mask of the current frame image as follows:

in, They respectively represent the projection shape of the cth hinged part on the Gaussian model of the device on the image plane and the device mask of the current frame image; , Represents the process of performing 3D Gaussian differentiable rendering projection; ;

Construct the depth consistency loss between the rendered depth map and the device depth map of the previous frame as follows:

in, Respectively represent the rendered depth map and the device depth map of the previous frame image; , Represents the process of rendering the depth map of the Gaussian model of the device;

The total loss function is constructed as:

Among them, α ₁ , α ₂ , and α ₃ represent the corresponding weight coefficients respectively;

Then, the device pose T _t is iteratively updated through the back propagation algorithm to minimize L _pose and obtain the optimal 6D pose estimation result of the device in the current frame image. , as shown below:

4. The surgical operation force feedback guidance method according to claim 1, characterized in that the virtual marker dynamic tracking method based on key frame guidance tracks the surgical path marked by the doctor on the first image frame to obtain the tracking path of the current frame image; comprising:

Receiving a surgical path marked by a doctor on a first image frame, wherein the surgical path is represented by a two-dimensional target point set at a pixel level;

Obtain the target area where the two-dimensional target point set on the first image frame is located, and use the optical flow algorithm to predict the trajectory of each point, obtain the two-dimensional target point set and the target area where it is located on the subsequent multiple image frames to initialize the key frame pool;

For the current frame image, the key frame pool is traversed, similar key frames are queried using a sparse matching method, and the best template is obtained based on the sparse matching results between the queried similar key frames and the current frame image;

When the best template is not an empty set, obtaining a tracking area on the current frame image based on the target area on each similar key frame in the best template;

Based on the two-dimensional target point set in each of the target areas, the motion trajectory of each point is predicted using an optical flow algorithm to obtain a two-dimensional target point set on the corresponding tracking area; the two-dimensional target point set on each of the tracking areas is merged to obtain a final two-dimensional target point set, and the final two-dimensional target point set is used as the tracking path on the current frame image;

When the current frame image is determined to be a new key frame based on a preset condition, it is placed in the key frame pool; wherein the preset condition includes:

(1) Compared with the surgical path, a complete tracking path was obtained;

(2) obtaining a tracking area where the final two-dimensional target point set is located, wherein the center of the tracking area is always within a preset range of the current frame image;

(3) if the number of similar key frames in the best template is greater than one, the number of image frames that have passed since the last new key frame was put into the key frame pool exceeds a first preset number;

If the similar keyframe in the best template is equal to one, more than a second preset number of image frames have passed since the last new keyframe was put into the keyframe pool, and the ratio of the length of the tracking path to the surgical path is consistent with the scale change of the respective image frames;

(4) The maximum capacity of the keyframe pool is not exceeded.

5. The surgical operation force feedback guidance method according to claim 4, characterized in that:

The initialization key frame pool specifically includes:

use represents a key frame, where j is the key frame index in the key frame pool; Represents RGB value, W and H represent width and height respectively; Indicates the target area; , They represent the extracted Feature points and feature descriptions in; represents a two-dimensional target point set;

The 0th image frame F ₀ is designated as the key frame K ₀ ∈ K , and the two-dimensional target point set on it is defined as , represents the coordinates of the sth pixel, n represents the number of pixels, represents the set of real numbers;

by The central area Generated by:

Among them, the center point , for The arithmetic mean of the maximum and minimum values of the horizontal coordinates of all pixels in , for The arithmetic mean of the maximum and minimum values of the ordinates of all pixels in the target area; L is the side length of the target area, L is not less than The difference between the maximum and minimum values of the horizontal and vertical coordinates of all pixels in;

Define the target area Store four vertices and the center point , expressed as ,in Depend on It can be seen that The four vertices of: upper left, upper right, lower left, lower right;

The optical flow algorithm is used to predict the motion V _0→1 ∈ in the next two image frames F ₁ and F ₂ , V _0→2 ∈ , and calculate each pixel point on the surgical path according to the optical flow result _Trajectory on F1 and F2 _:

Among them, Interp (∙) represents the sampling process, and the corresponding target point set is obtained and , and refer to formula (1) to obtain the target area and ;

F ₁ and F ₂ are recorded as key frames K ₁ and K ₂ respectively, and together with K _0, they constitute the initial key frame pool P = { K ₀ , K ₁ , K ₂ };

and/or

The method of traversing the key frame pool for the current frame image, performing a similar key frame query using a sparse matching method, and obtaining an optimal template based on a sparse matching result between the queried similar key frame and the current frame image, specifically includes:

Get the feature points and feature description of the current frame image Fi _;

Traversing each key frame K _j in the key frame pool, and obtaining matching points between F _i and K _j using a sparse matching method;

Count the number of matching points n _ji in the target area on K _j , _and select the three key frames with the largest n _ji as similar key frames K _simi ={ Ka , K _b , K _c }⊆ P ,

The best template is obtained using the following formula:

Where Kj represents the jth keyframe in the keyframe pool, j is a, b _or c; nji _represents the target area _on Kj The number of matching points between the current frame image F _i ; σ is the first threshold; K _simi represents all similar key frames queried;

and/or

The step of obtaining a tracking area on the current frame image based on the target area on each similar key frame in the best template specifically includes:

Based on the matching points between K _j ⊆ K _best and F _i , RANSAC is used to estimate the homography matrix H _ji . The accuracy of H _ji is given by We get, n _in is the number of interior points;

when ≥λ , 0<λ<1 , λ is the second threshold, then for the target area The four vertices on Perform homography transformation and get The corresponding position on F _i :

Otherwise, the vertex coordinates are transformed using the neighborhood similarity transformation method:

Extract a set of matches of F _i and K _j from the sparse matching results ， xi _m ∈ and y _jm ∈ is a two-dimensional column vector representing the position of the feature point;

Search target area vertices The N nearest neighbors of , get the corresponding matching point , to get the point The corresponding position on F _i :

in , w _t and and is inversely proportional to the distance between them;

Based on vertex coordinates , determine an initial region, expressed as:

in, , Respectively The corresponding position on F _i The horizontal and vertical coordinates of the four vertices are as follows: u _min and u _max represent the maximum and minimum values of the horizontal coordinates of the four vertices, respectively; v _min and v _max represent the maximum and minimum values of the vertical coordinates of the four vertices, respectively;

Based on the coordinates of the four vertices of the initial area, determine the tracking area The center point ;

in, , Respectively represent the tracking area The center point The horizontal and vertical coordinates of

Refer to formula (1) to obtain the tracking area on the current frame image ;

and/or

The method of predicting the motion trajectory of each point based on the two-dimensional target point set in each target area by using an optical flow algorithm to obtain a two-dimensional target point set on the corresponding tracking area; merging the two-dimensional target point set on each tracking area to obtain a final two-dimensional target point set specifically includes:

2D target point set The coordinates of each point pj _,s in _the entire frame Kj and the relative target area Coordinates The mapping relationship is as follows:

Optical flow in the target area and tracking area The optical flow prediction result is V _j→i . The sparse matching result should be consistent with the optical flow prediction result. The difference between the two is calculated using the following formula:

If d ≤ γ , γ is the third threshold, then the tracking area The target point set on Each point p _ji,s in is obtained by the following method:

otherwise, Not participating Calculation of

Merge the two-dimensional target point set on each tracking area , get the final two-dimensional target point set , expressed as:

Among them, λ _ji is the weight, and =1, λ _ji is proportional to n _ji .

6. The surgical operation force feedback guidance method according to claim 1, characterized in that the step of searching for the point cloud position closest to the end position of the surgical instrument on the three-dimensional surgical path, introducing an additional investigation radius, and obtaining an artificial vector of the end position of the surgical instrument comprises:

Using the K nearest neighbor algorithm, search for the distance from the end of the surgical instrument on the three-dimensional surgical path Nearest point cloud location , get the relative distance x _d between the two points;

Assume that all points on the three-dimensional surgical path are included in the virtual marker point set V _r . and the distance along the 3D surgical path Point cloud location closest to the point , find the sub-point set of V _r ;in For additional investigation radius;

The end position of the surgical instrument The corresponding artificial vector is expressed as follows:

is an artificial vector length function, calculated as follows:

in, End position of surgical instrument The distance between the virtual marker point v , r and R are the upper and lower bounds of the distance respectively. When the distance is less than or equal to r , the virtual marker point v is assigned the maximum weight q _max . When the distance is greater than r and less than R , the weight of the virtual marker point v is between q _max and 0. When the distance is greater than or equal to R , the weight of the virtual marker point v is 0. Indicated by A vector pointing to a point v ; card (∙) represents the cardinality or number of elements in a set.

7. The surgical operation force feedback guiding method according to claim 6, wherein the guiding virtual force field is represented as:

Where K _vf represents the proportionality coefficient between gravity and artificial vector, and D _vf ( x _d ) represents the damping coefficient express speed.

8. A surgical operation force feedback guidance system based on virtual marker tracking and instrument posture, characterized by comprising:

A depth estimation module, used to obtain a three-dimensional point cloud of an in-vivo scene based on depth estimation;

The pose estimation module is used for the instrument 6D pose estimation method based on 3D Gaussian sputtering to obtain the optimal 6D pose of the instrument in the current frame image to estimate the end position of the surgical instrument;

A marker tracking module is used to track the surgical path marked by the doctor on the first image frame based on a virtual marker dynamic tracking method guided by a key frame, and obtain the tracking path of the current frame image; wherein the tracking path is represented by a two-dimensional target point set;

A path conversion module, used for acquiring a three-dimensional surgical path based on the three-dimensional point cloud and the two-dimensional target point set;

The force field establishment module is used to search for the point cloud position on the three-dimensional surgical path that is closest to the end position of the surgical instrument, introduce an additional investigation radius, obtain an artificial vector of the end position of the surgical instrument, and establish a guiding virtual force field along the three-dimensional surgical path based on the artificial vector, so as to guide the doctor to control the surgical robot system through a force feedback mechanism.

9. A storage medium, characterized in that it stores a computer program for force feedback guidance of surgical operations based on virtual marker tracking and instrument posture, wherein the computer program enables a computer to execute the surgical operation force feedback guidance method as described in any one of claims 1 to 6.

10. An electronic device, comprising:

One or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors, and the programs include a method for executing a surgical operation force feedback guidance method as described in any one of claims 1 to 6.