WO2025174815A1 - Intraoperative surgical guidance using three-dimensional reconstruction - Google Patents
Intraoperative surgical guidance using three-dimensional reconstruction
- Publication number
- WO2025174815A1 (PCT/US2025/015501)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- surgical
- region
- video stream
- video
- fiducial marker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/00002—Operational features of endoscopes
- A61B1/00004—Operational features of endoscopes characterised by electronic signal processing
- A61B1/00009—Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope
- A61B1/000094—Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope extracting biological structures
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/00002—Operational features of endoscopes
- A61B1/00004—Operational features of endoscopes characterised by electronic signal processing
- A61B1/00009—Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope
- A61B1/000096—Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope using artificial intelligence
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/00002—Operational features of endoscopes
- A61B1/00043—Operational features of endoscopes provided with output arrangements
- A61B1/00045—Display arrangement
- A61B1/0005—Display arrangement combining images e.g. side-by-side, superimposed or tiled
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/00163—Optical arrangements
- A61B1/00194—Optical arrangements adapted for three-dimensional imaging
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/20—Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/313—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor for introducing through surgical openings, e.g. laparoscopes
- A61B1/3132—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor for introducing through surgical openings, e.g. laparoscopes for laparoscopy
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/313—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor for introducing through surgical openings, e.g. laparoscopes
- A61B1/317—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor for introducing through surgical openings, e.g. laparoscopes for bones or joints, e.g. osteoscopes, arthroscopes
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/10—Computer-aided planning, simulation or modelling of surgical operations
- A61B2034/101—Computer-aided simulation of surgical operations
- A61B2034/105—Modelling of the patient, e.g. for ligaments or bones
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/20—Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
- A61B2034/2046—Tracking techniques
- A61B2034/2065—Tracking using image or pattern recognition
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/25—User interfaces for surgical systems
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B90/00—Instruments, implements or accessories specially adapted for surgery or diagnosis and not covered by any of the groups A61B1/00 - A61B50/00, e.g. for luxation treatment or for protecting wound edges
- A61B90/36—Image-producing devices or illumination devices not otherwise provided for
- A61B90/361—Image-producing devices, e.g. surgical cameras
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B90/00—Instruments, implements or accessories specially adapted for surgery or diagnosis and not covered by any of the groups A61B1/00 - A61B50/00, e.g. for luxation treatment or for protecting wound edges
- A61B90/90—Identification means for patients or instruments, e.g. tags
- A61B90/94—Identification means for patients or instruments, e.g. tags coded with symbols, e.g. text
- A61B90/96—Identification means for patients or instruments, e.g. tags coded with symbols, e.g. text using barcodes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/41—Medical
Definitions
- the present disclosure relates generally to surgical procedures and more specifically to providing real-time assistance for surgical procedures.
- Many minimally invasive surgical procedures, such as arthroscopic joint surgical procedures, involve a surgical camera that provides real-time surgical video data.
- the surgical video data may be shown on a display and may be used to guide actions of the surgeon, doctor, or other clinician performing the surgery.
- the surgeon’s view is indirect and may offer only a limited field of view.
- the patient’s preoperative diagnostic radiological imaging may be used to plan surgeries.
- the planning could include deciding resection margins, estimating bone losses, etc.
- these preoperative images are not available in the surgical field of view.
- Providing real-time surgical guidance and three-dimensional views of the patient’s anatomy and surgical tools may be beneficial to the surgeon, increasing positive outcomes and reducing procedure times.
- the surgical guidance may include a display showing a reconstructed view of the surgical area.
- the reconstructed view may include three- dimensional (3D) models associated with the patient.
- the reconstructed view can include 3D models of anatomies, pathologies, and implants associated with the patient.
- the 3D models may be based, at least in part, on the patient's magnetic resonance imaging (MRI) and/or computerized tomography (CT) data.
- a real-time surgical video of an operation or procedure is received.
- the surgical video is processed, and any recognized objects are identified.
- a processor can execute one or more trained neural networks to identify anatomies, pathologies, implants, surgical tools, and the like that may be included within the surgical video.
- a computer-generated 3D model of any (or all) of the identified items may be included in a constructed 3D view provided to the surgeon as surgical guidance.
- the 3D view may be manipulated (rotated, any included joints virtually moved through a range of motion) to advantageously provide the surgeon access to views of the surgical area that may otherwise be occluded or hidden from the view of a surgical camera.
- Any of the methods described herein may be used to generate a 3D surgical view. Any of the methods may include receiving or obtaining a patient’s radiological data, segmenting one or more anatomical structures from the patient’s radiological data, and generating one or more three-dimensional (3D) models based on the one or more segmented anatomical structures.
- any of the methods may include receiving a live surgical video stream, identifying an anatomical structure within the live surgical video stream, and generating a reconstructed 3D model of the identified anatomical structure based at least in part on the one or more 3D models based on the one or more segmented anatomical structures.
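- The pipeline summarized above can be illustrated with a minimal sketch. The helper functions below (thresholding in place of trained segmentation and detection networks, a simple label match in place of registration) are stand-ins chosen for illustration and are not part of the disclosure.
```python
import numpy as np

def segment_volume(volume, threshold=0.5):
    # Stand-in segmentation: a trained network would produce per-structure masks.
    return {"structure": volume > threshold}

def detect_anatomy(frame, threshold=0.5):
    # Stand-in detection: returns (label, 2D mask) pairs for a live frame.
    return [("structure", frame > threshold)]

def reconstruct_view(volume, frame):
    models = segment_volume(volume)                # 3D models from radiological data
    scene = []
    for label, mask2d in detect_anatomy(frame):    # anatomy identified in the live stream
        if label in models:
            scene.append((label, models[label]))   # pair the detection with its 3D model
    return scene

volume = np.random.rand(32, 32, 32)   # stand-in for MRI/CT voxel data
frame = np.random.rand(480, 640)      # stand-in for one live video frame
print(len(reconstruct_view(volume, frame)))
```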
- Any of the methods described herein may also include identifying a surgical tool in the live surgical video stream, and superimposing a digital representation of the surgical tool onto the reconstructed 3D model.
- identifying the surgical tool may include executing a neural network trained to recognize the surgical tool within a video stream.
- Any of the methods described herein may further include identifying an implant in the live surgical video stream, and superimposing a digital representation of the implant onto the reconstructed 3D model.
- identifying the implant may include executing a neural network trained to recognize the implant within a video stream.
- Any of the methods described herein may include identifying an anatomical structure that includes a pathology, and highlighting the anatomical structure. Any of the methods described herein can include animating the 3D model to move in a realistic manner.
- Any of the methods described herein may include displaying the 3D model to replace the live surgical video stream. In some examples, the displayed 3D model is a frame-by-frame replacement of the live surgical video stream.
- the patient’s radiological data may include magnetic resonance imagery (MRI) data, computed tomography (CT) data, or a combination thereof.
- identifying the anatomical structure can include executing a neural network trained to recognize anatomical structures included within a video stream.
- identifying the anatomical structure within the live surgical video stream comprises estimating a camera angle based on the identified anatomical structure.
- generating the reconstructed 3D model of the identified anatomical structure can be based at least in part on the estimated camera angle.
- Any of the devices described herein may include one or more processors, and a memory configured to store instructions that, when executed by the one or more processors, cause the device to receive or obtain a patient’s radiological data, segment one or more anatomical structures from the patient’s radiological data, generate one or more three-dimensional (3D) models based on the segmented anatomical structures, receive a live surgical video stream, identify an anatomical structure within the live surgical video stream, and generate a reconstructed 3D model of the identified anatomical structure based at least in part on the one or more 3D models based on the one or more segmented anatomical structures.
- any of the non-transitory computer-readable storage mediums described herein may include instructions that, when executed by one or more processors of a device, cause the device to perform operations comprising receiving or obtaining a patient’s radiological data, segmenting one or more anatomical structures from the patient’s radiological data, generating one or more three-dimensional (3D) models based on the segmented anatomical structures, receiving a live surgical video stream, identifying an anatomical structure within the live surgical video stream, and generating a reconstructed 3D model of the identified anatomical structure based at least in part on the one or more 3D models based on the one or more segmented anatomical structures.
- Any of the methods described herein can generate an automatically annotated surgical video. Any of the methods may include receiving or obtaining a surgical video stream, identifying a region of interest within the surgical video stream, and generating an annotated view based on the identified region of interest.
- In any of the methods described herein, generating the annotated view can include rendering a bounding box around the region of interest. In any of the methods described herein, identifying the region of interest can include executing a neural network trained to recognize an anatomy within a video stream.
- identifying the region of interest can include executing a neural network trained to recognize a pathology within a video stream. In any of the methods described herein, identifying the region of interest may include executing a neural network trained to recognize a surgical tool within a video stream.
- generating the annotated view may include rendering labels near the region of interest.
- Any of the methods described herein may include identifying an anatomical region, and identifying specific anatomical structures within the anatomical region, by executing a neural network trained to detect specific anatomical features within the anatomical region.
- Any of the devices described herein may include one or more processors, and a memory configured to store instructions that, when executed by the one or more processors cause the device to receive or obtain a surgical video stream, identify a region of interest within the surgical video stream, and generate an annotated view based on the identified region of interest.
- Any of the non-transitory computer-readable storage mediums described herein may include instructions that, when executed by one or more processors of a device, cause the device to perform operations comprising receiving or obtaining a surgical video stream, identifying a region of interest within the surgical video stream, and generating an annotated view based on the identified region of interest.
- Any of the methods described herein may provide surgical guidance. Any of the methods may include associating a surgical instrument with a first fiducial marker, associating a surgical region with a second fiducial marker, determining a location of the surgical instrument with respect to the surgical region, and displaying surgical guidance based on the location of the surgical instrument and the location of the surgical region.
- determining the location of the surgical instrument with respect to the surgical region may include determining a location of the first fiducial marker and determining a location of the second fiducial marker.
- determining the location of the surgical instrument with respect to the surgical region can include viewing the first fiducial marker with a fixed camera.
- associating a surgical instrument with the first fiducial marker can include affixing the first fiducial marker to the surgical instrument.
- determining the location of the surgical instrument with respect to the surgical region can include viewing the second fiducial marker with a camera mechanically coupled to the surgical instrument.
- associating a surgical region with a second fiducial marker may include affixing the second fiducial marker onto a reference surface adjacent to a patient.
- determining the location of the surgical instrument with respect to the surgical region may include viewing the second fiducial marker with a fixed camera.
- displaying surgical guidance can include displaying a three-dimensional (3D) model of a patient’s anatomy.
- displaying surgical guidance includes displaying a 3D model of the surgical instrument.
- displaying the surgical guidance may include displaying an implant associated with a surgery.
- determining the location of the surgical instrument with respect to the surgical region may include viewing the first fiducial marker and the second fiducial marker with a plurality of cameras.
- displaying surgical guidance may include displaying a planned placement of implants and anchor points.
- displaying surgical guidance can include separately displaying a video stream from a camera associated with the surgical instrument.
- the surgical instrument can include an orthoscopic camera.
- Any of the non-transitory computer-readable storage mediums described herein may include instructions that, when executed by one or more processors of a device, cause the device to perform operations comprising associating a surgical instrument with a first fiducial marker, associating a surgical region with a second fiducial marker, determining a location of the surgical instrument with respect to the surgical region, and displaying surgical guidance based on the location of the surgical instrument and the location of the surgical region.
- FIG. 1 is a simplified block diagram of an example intraoperative surgical assistance system.
- FIG. 2 shows a block diagram of analysis modules.
- FIG. 3 is a flowchart showing an example method for generating an annotated view.
- FIG. 4 is a flowchart showing an example method for generating a three- dimensional view.
- FIG. 5 is a flowchart showing an example method for generating activity summaries.
- FIG. 6 shows an example three-dimensional surgical assistance system.
- FIG. 7 is a flowchart showing an example method for generating three- dimensional surgical guidance.
- FIG. 8 shows a block diagram of a device that may be one example of the intraoperative surgical assistance system of FIG. 1 and/or the three-dimensional surgical assistance system of FIG. 6.
- FIGS. 9A-9C illustrate examples of reconstructed 3D views showing surgical tools as described herein.
- the surgical guidance may include a navigable three-dimensional (3D) model of the patient’s anatomy and, in some cases, the patient’s pathology.
- the 3D model may be based on pre-operative radiological images including magnetic resonance images and/or computed tomography scan data.
- the 3D model may be constructed based on a real-time surgical video, such that elements of the real-time surgical video are replaced with equivalent 3D models.
- the 3D models may be displayed concurrently with the real-time surgical video.
- the surgical guidance may be enhanced by systems and methods for determining a location of a surgical camera with respect to an operating region.
- a surgical assistance system can simultaneously determine the location of the surgical camera and the location of the operating (surgical) region.
- the surgical assistance system can determine when the surgical camera is within the operating region and determine a 3D model representative of the view seen by the surgical camera.
- FIG. 1 is a simplified block diagram of an example intraoperative surgical assistance system 100.
- the surgical assistance system 100 may receive or obtain a real-time surgical video 110 and patient data 120 that may include magnetic resonance imagery (MRI) data, computed tomography (CT) data, a patient’s pre-operative plan, and the like.
- the surgical assistance system 100 may use the surgical video 110 and the patient data 120 to generate surgical assistance data.
- the surgical assistance data may include an annotated view 135, a three-dimensional (3D) view 145, an information view 175, and a database 170.
- the surgical assistance data may be displayed on one or more displays to provide a surgeon (or other technician) guidance regarding an ongoing surgical procedure.
- the guidance may include a 3D rendering of any feasible anatomy of the patient.
- the guidance may include a 3D rendering of any tools that may be used during a procedure or inserted into the patient and moved relative to the patient’s anatomy.
- the real-time surgical video 110 may be a real-time video stream from a surgical camera, such as an orthoscopic or arthroscopic surgical camera. In some examples, other cameras may be used to generate the real-time surgical video 110.
- the real-time video stream may be processed concurrently through any feasible number of processing pathways.
- a first processing pathway may use the real-time surgical video 110 to generate the annotated view 135.
- the real-time surgical video 110 may be received by an anatomy, tool, and pathology processing block 130.
- the anatomy, tool, and pathology processing block 130 can process a video stream in order to recognize any feasible anatomies, implants, surgical tools and/or pathologies that may be included in the video stream.
- the anatomy, tool, and pathology processing block 130 may include a processor that executes one or more neural networks trained to recognize anatomies, implants, surgical tools, and/or pathologies.
- the anatomy, tool, and pathology processing block 130 can generate labels associated with any recognized anatomies, implants, tools, and pathologies.
- the anatomy, tool, and pathology processing block 130 can render the generated labels over or near the associated recognized objects on the video stream.
- the anatomy, tool, and pathology processing block 130 can generate and render bounding boxes that can surround or encompass any recognized objects. In this manner, the anatomy, tool, and pathology processing block 130 can generate the annotated view 135.
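- As one illustration of the kind of overlay the annotated view 135 could contain, the sketch below draws bounding boxes and text labels on a frame with OpenCV. The detection tuples are hypothetical placeholders for the output of the recognition networks.
```python
import cv2
import numpy as np

def annotate_frame(frame, detections):
    """detections: list of (label, (x, y, w, h)) boxes in pixel coordinates."""
    out = frame.copy()
    for label, (x, y, w, h) in detections:
        cv2.rectangle(out, (x, y), (x + w, y + h), (0, 255, 0), 2)     # bounding box
        cv2.putText(out, label, (x, max(y - 8, 12)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)     # label near the box
    return out

frame = np.zeros((480, 640, 3), dtype=np.uint8)                        # stand-in video frame
annotated = annotate_frame(frame, [("meniscus", (200, 150, 120, 90)),
                                   ("shaver", (400, 300, 80, 60))])
```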
- a second processing pathway may use the real-time surgical video 110 to generate the 3D view 145.
- the 3D view 145 may be generated by a 3D scene construction block 140.
- the 3D scene construction block 140 can receive or obtain the patient data 120 and recognition data directly or indirectly (via a metadata stream) from the anatomy, tool, and pathology processing block 130.
- the anatomy, tool, and pathology processing block 130 may output recognition data into a message bus.
- the message bus, which may also be referred to as the metadata stream, can be a low-latency message bus that conveys data (video stream data, as well as any other feasible data from any feasible processing block) to other processing blocks or to the database 170.
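- The disclosure does not specify a particular transport for the metadata stream; the sketch below is a minimal in-process publish/subscribe stand-in that shows how recognition data could be conveyed to other processing blocks and persisted.
```python
import time
from collections import defaultdict

class MetadataStream:
    """Toy stand-in for the low-latency message bus / metadata stream."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        message = {"topic": topic, "timestamp": time.time(), "payload": payload}
        for callback in self._subscribers[topic]:
            callback(message)          # delivered synchronously in this sketch
        return message                 # could also be written to a time-series database

bus = MetadataStream()
bus.subscribe("recognition", lambda m: print(m["payload"]))
bus.publish("recognition", {"label": "femur", "bbox": [200, 150, 120, 90]})
```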
- the 3D scene construction block 140 can generate a 3D model of the patient’s anatomy.
- the 3D scene construction block 140 can segment the patient’s anatomy into separate parts (bones, tendons, muscle and the like). The segmentation may be guided or assisted by recognition data from the anatomy, tool, and pathology processing block 130. In some examples, the segmentation may be based on one or more neural network (artificial intelligence) algorithms. In some cases, the 3D model may be computed prior to a surgical procedure using the patient data 120.
- the 3D scene construction block 140 can receive the real-time surgical video 110 and/or video stream data from the metadata stream and adjust the position and orientation of a rendered 3D model to match and track the video stream (and/or metadata stream) data. In some cases, the 3D scene construction block 140 can estimate a camera position based on the real-time surgical video 110 and/or the video stream data from the metadata stream. In some examples, the 3D scene construction block 140 can determine a best fit between anatomical structures on the metadata stream and structures included within the patient data 120.
- The 3D scene construction block 140 can render digital representations of tools and implants detected in a video stream onto the 3D model. In some examples, the 3D scene construction block 140 can determine a relative position of a surgical tool against an anatomical structure. The determined relative position may be used to position the surgical tool within the rendering of the patient’s anatomy.
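- The disclosure does not name the algorithm used for the “best fit” between the rendered model and the observed structures; one common option, shown here only as an assumed illustration, is a rigid Kabsch (Procrustes) alignment over corresponding 3D points.
```python
import numpy as np

def best_fit_rigid(model_pts, observed_pts):
    """Return rotation R and translation t aligning model_pts to observed_pts."""
    mu_m, mu_o = model_pts.mean(0), observed_pts.mean(0)
    H = (model_pts - mu_m).T @ (observed_pts - mu_o)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_o - R @ mu_m
    return R, t

model = np.random.rand(50, 3)                      # points sampled from the 3D model
observed = model + np.array([1.0, 2.0, 3.0])       # stand-in for structures seen in the video
R, t = best_fit_rigid(model, observed)             # R ~ identity, t ~ [1, 2, 3]
```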
- the 3D scene construction block 140 can detect and output an instantaneous position of a recognized surgical tool to the database 170 through the metadata stream.
- the surgical tools may be specified or described in a configuration file.
- the 3D scene construction block 140 can output a digital representation of the surgical tool in the 3D view 145. Operations of the 3D scene construction block 140 are described in more detail with respect to FIG. 2.
- the 3D scene construction block 140 may respond to a user input (via a keyboard, mouse, trackball, or the like).
- the user input may be used to interact with the 3D model.
- the user can change a view of the 3D model by spinning or moving the 3D model with respect to the viewer’s point of view.
- the user can interact with the model by virtually moving an anatomical joint through a predicted range of motion.
- the predicted range of motion may be based on a priori knowledge of an anatomical model.
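- A minimal sketch of constraining a virtual joint to an a priori range of motion is shown below; the angle limits are illustrative placeholders, not clinical values.
```python
import numpy as np

KNEE_FLEXION_LIMITS_DEG = (0.0, 135.0)   # assumed a priori range used only for this example

def clamp_joint_angle(requested_deg, limits=KNEE_FLEXION_LIMITS_DEG):
    lo, hi = limits
    return float(np.clip(requested_deg, lo, hi))

def flexion_rotation(angle_deg):
    """Rotation matrix (about x) used to reposition the distal segment of the 3D model."""
    a = np.radians(clamp_joint_angle(angle_deg))
    return np.array([[1, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a),  np.cos(a)]])

print(clamp_joint_angle(150.0))   # user input beyond the predicted range is clamped to 135.0
```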
- the 3D view 145 can be a frame-by-frame rendered 3D view of the patient’s anatomy, surgical tools, and pathologies that may be included within a video stream, such as the real-time surgical video 110.
- the 3D view 145 generated by the 3D scene construction block 140 may have a real-time, frame-by-frame correspondence to the real-time surgical video 110 or any other feasible video stream.
- a third processing pathway may use the real-time surgical video 110 to generate scene descriptions and summarize surgical activities with a language and vision model 150 and a scene summary large language model 160.
- the scene descriptors and surgical summaries may be shown in an information view 175 or may be stored in the database 170.
- the third processing pathway may operate at a slower rate (slower frame rate) than the real-time surgical video 110.
- the language and vision model 150 may be implemented as a neural network that can be executed by one or more processors.
- the neural network can be trained using images as inputs.
- the images can be images from, or related to, any feasible surgery.
- the neural network may be trained to output a scene description in human-readable text.
- the training may be analogous to training a neural network for a large language model; however, in this example, an input phrase or sentence is replaced with an input video image, scene, and/or frame.
- the language and vision model 150 may be trained to provide human-readable phrases that describe the related video image, scene, etc.
- the output of the language and vision model 150 may be human-readable scene descriptors 155.
- the scene descriptors 155 may be output to the metadata stream and/or stored in the database 170.
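- The disclosure does not identify a specific vision-language network; as an assumed stand-in, the sketch below runs an off-the-shelf image-captioning model (BLIP via Hugging Face transformers) on a single frame to produce a timestamped, human-readable scene descriptor. A model trained on surgical imagery would be needed in practice.
```python
import time
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def describe_frame(frame: Image.Image) -> dict:
    inputs = processor(images=frame, return_tensors="pt")
    caption_ids = model.generate(**inputs, max_new_tokens=30)
    text = processor.decode(caption_ids[0], skip_special_tokens=True)
    return {"timestamp": time.time(), "description": text}   # one scene descriptor

descriptor = describe_frame(Image.new("RGB", (640, 480)))     # stand-in frame
```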
- the surgical assistance system 100 may segment the real-time surgical video based at least in part on visual/language-based neural networks.
- the language and vision model 150 can advantageously enable a user to ignore or de-prioritize less important information.
- the language and vision model 150 may not describe unimportant background information in the video stream. In this manner, the language and vision model 150 can naturally draw the user’s attention to objects or activities of interest.
- the scene summarization large language model (LLM) 160 may include a neural network trained to summarize surgical scenes that are included within the real-time surgical video 110.
- the scene summary LLM 160 may receive or obtain the scene descriptors 155 either from the metadata stream, the database 170, or in some cases directly from the language and vision model 150.
- the scene summary LLM 160 can examine multiple scene descriptors 155 and generate a human-readable activity summary 165 regarding surgical actions, recognized anatomies, implants, or pathologies, or the like that have occurred within or during one or more video scenes.
- the scene summary LLM 160 can operate on a “rolling window” of scene descriptors 155 to generate the activity summary 165.
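- The rolling-window mechanics can be sketched as below; the `summarize` callable is a trivial placeholder standing in for the scene summary LLM 160.
```python
from collections import deque

class RollingSummarizer:
    def __init__(self, summarize, window_size=30):
        self.summarize = summarize
        self.window = deque(maxlen=window_size)    # most recent scene descriptors

    def add_descriptor(self, descriptor: str) -> str:
        self.window.append(descriptor)
        return self.summarize(list(self.window))   # activity summary for the current window

summarizer = RollingSummarizer(summarize=lambda ds: f"{len(ds)} recent scene descriptions")
print(summarizer.add_descriptor("shaver enters the field of view near the meniscus"))
```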
- the scene summary LLM 160 may be trained with the scene descriptors 155.
- the activity summary 165 may be output to the metadata stream, stored in the database 170, and/or shown in the information view 175.
- the activity summary 165 may be used to summarize surgical procedures and simplify future surgical procedures and/or workflows. In some other examples, the activity summary 165 may be used to provide training or education for device representatives, doctors, support staff, families, patients, or the like. In some cases, a chat bot can be implemented to respond to inquiries regarding the activity summary 165.
- the scene summary LLM 160 can advantageously provide context to visual data.
- the output of the scene summary LLM 160 can be a natural language description and/or interpretation of scenes and actions.
- the arrangement of the language and vision model 150 and the scene summary LLM 160 may be hierarchical. That is, the language and vision model 150 may operate on a lower level to provide the scene descriptors 155 which may be a first level of abstraction.
- the scene summary LLM 160 may generate the activity summary 165 from the scene descriptors 155, where the activity summary 165 is a higher level of abstraction than the scene descriptors 155.
- a fourth processing pathway can include minimal (or no) processing.
- the real-time surgical video 110 may be displayed as an original view 115.
- the original view 115 may be shown in the operating room through a dedicated display.
- the surgical assistance system 100 can display the original view 115, the annotated view 135, the 3D view 145, and the information view 175 together or separately, on a single display (using picture-in-picture) or across multiple displays.
- the surgeon may rely primarily on the 3D view 145, which may include solely or mostly a reconstructed 3D view. That is, the 3D view 145 may exclude any real-time video elements, and not rely on blending of any other video streams with the reconstructed 3D view.
- the surgical assistance system 100 can be used to provide real-time surgical guidance through the annotated view 135 and/or the 3D view 145.
- the surgical assistance system 100 can provide a human-readable text summary that describes any feasible operation or procedure included in the real-time surgical video 110.
- the AI pipeline 206 can include an input video preprocessing module 210 to preprocess the surgical video 202.
- the input video preprocessing module 210 can center crop the surgical video 202.
- the input video preprocessing module 210 can remove black space surrounding one or more objects centrally located within a field of view (FOV) of the surgical video 202.
- the FOV of a video may refer to any areas that may be within a central portion of the surgical video 202.
- removing black space may include removing black space around the central portion of the view along with any non-important image content.
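- One way the preprocessing could be realized, shown here only as an illustrative sketch with arbitrary thresholds, is to estimate the non-black region of the endoscopic frame and crop to it.
```python
import cv2
import numpy as np

def center_crop_black_border(frame, threshold=10):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = gray > threshold                        # pixels that are not (near-)black
    if not mask.any():
        return frame
    ys, xs = np.where(mask)
    return frame[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[100:380, 180:460] = 128                      # stand-in for the illuminated field of view
cropped = center_crop_black_border(frame)          # crops away the surrounding black space
```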
- the view recognition module 245 may include one or more neural networks trained to recognize various scenes and also output corresponding text labels and timestamps.
- the implant recognition module 246 can indicate whether the surgical video 202 includes any feasible recognized implant. Implants can include any hard or soft implant, replacement bone structure or the like. The implant recognition module 246 can output a text label that includes the name of the recognized implant. The implant recognition module 246 may include one or more neural networks trained to recognize various implants and also output the corresponding text labels and associated timestamps.
- the pathology recognition module 248 can indicate whether the surgical video 202 includes any recognized pathology.
- a pathology can include any feasible disease or other physical malady.
- the pathology recognition module 248 may include one or more neural networks that have been trained to recognize any feasible pathology and output a corresponding text label and mask that outlines or highlights the recognized pathology.
- the pathology recognition module 248 can also output a video timestamp associated with the recognized pathology.
- the anatomy recognition module 249 can indicate whether the surgical video 202 includes any recognized anatomy.
- the recognized anatomy can include any feasible anatomy.
- the anatomy recognition module 249 may include one or more neural networks that have been trained to recognize any feasible anatomy and output a corresponding text label and a mask that outlines or highlights the recognized anatomy.
- the anatomy recognition module 249 may also output a video timestamp associated with the recognized anatomy.
- outputs from some or all of the primary AI modules 240 may be used as inputs for further AI processing.
- some or all of the outputs of any of the primary AI modules 240 may be intermediate data for further processing. Further AI processing is described below in conjunction with FIGS. 3-7.
- FIG. 3 is a flowchart showing an example method 300 for generating an annotated view.
- the method 300 can generate the annotated view 135 of FIG. 1.
- operations of the method 300 may be included in the anatomy, tool, and pathology processing block 130 of FIG. 1. Some examples may perform the operations described herein with additional operations, fewer operations, operations in a different order, operations in parallel, and some operations differently.
- the method 300 is described below with respect to the block diagram of FIG. 1, however, the method 300 may be performed by any suitable system or device.
- the method 300 begins in block 302 where a surgical video stream is obtained or received.
- a surgical video stream may be received.
- the surgical video stream may be from a surgical camera (such as but not limited to, an arthroscopic, laparoscopic, and/or endoscopic camera).
- the surgical video stream can be any feasible live video stream from any feasible source, including a network (“cloud-based”) source.
- anatomy, tool, and pathology recognition is performed.
- the anatomy, tool, and pathology recognition may be performed by the anatomy, tool, and pathology processing block 130 of FIG. 1.
- although anatomy, tools, and pathologies are mentioned here, any region of interest may be recognized or identified.
- the anatomy, tool, and pathology processing block can also recognize or identify implants.
- anatomy, tool, and pathology recognition may be performed by any feasible processor executing one or more neural networks as described in conjunction with FIG. 2.
- any recognized anatomy, tool, and/or pathology data may be provided to the metadata stream for use by other processing blocks or units.
- the recognition may be through a hierarchical arrangement of algorithms.
- a first recognition algorithm may broadly recognize an anatomical region, and a second algorithm may then be used to recognize a specific anatomical structure.
- pathology recognition models may be invoked on the anatomical structures segmented by recognition modules, thereby recognizing pathology specific to an anatomical structure.
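- The hierarchical arrangement can be sketched as a chain of models, each invoked on the output of the previous one; the classifiers below are trivial stand-ins for the trained networks and the labels are hypothetical.
```python
def recognize_region(frame):
    return "knee"                                             # coarse anatomical-region model

STRUCTURE_MODELS = {"knee": lambda f: ["femur", "meniscus"]}  # hypothetical per-region models
PATHOLOGY_MODELS = {"meniscus": lambda f: ["tear"]}           # hypothetical per-structure models

def hierarchical_recognition(frame):
    region = recognize_region(frame)
    structures = STRUCTURE_MODELS.get(region, lambda f: [])(frame)
    pathologies = {s: PATHOLOGY_MODELS.get(s, lambda f: [])(frame) for s in structures}
    return {"region": region, "structures": structures, "pathologies": pathologies}

print(hierarchical_recognition(frame=None))
```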
- an annotated view is generated.
- the annotated view 135 may be generated by the anatomy, tool, and pathology processing block 130.
- the recognized anatomy, tool, and pathology data (from block 304) may be used to generate labels that are rendered next to or over the associated item. That is, any identified items (identified in block 304) may be annotated or identified in the annotated view 135.
- the recognized anatomy, tool, and pathology data is used to determine (or render) a highlight, bounding box, or a label for any recognized item, particularly when the highlight, bounding box, or the label is shown near or over the surgical video stream.
- some or all of the recognized anatomy, tool, and pathology information may be output to a metadata stream.
- the metadata stream may be implemented as a low-latency message bus which is also output to a database optimized for handling time-series data. Any feasible module or processing block can “listen” to the metadata stream to receive any data output to the metadata stream by any other module or processing block.
- the recognition data may be output in a variety of formats.
- the recognized structures are isolated while preserving the scale and orientation.
- in some cases, regions of interest are identified in the form of bounding boxes; in other cases, regions of interest are identified with labels classifying the objects. These bounding boxes and/or labels may be displayed on the annotated view.
- FIG. 4 is a flowchart showing an example method 400 for generating a 3D view.
- the method 400 can generate the 3D view 145 of FIG. 1.
- operations of the method 400 may be included in the 3D scene construction block 140 of FIG. 1.
- the method 400 is described below with respect to the block diagram of FIG. 1, however, the method 400 may be performed by any suitable system or device.
- the method 400 begins in block 402 where a patient’s MRI and/or CT data is received or obtained.
- the MRI and CT data may have been collected as part of a preoperative procedure (or part of any radiological studies) to assess the patient and formulate a diagnosis and treatment.
- the MRI and CT data may be stored in a database such as the database 170.
- the MRI and CT data may be received through any feasible network.
- the MRI and/or CT data is segmented.
- the MRI and CT data may be segmented by an artificial intelligence (neural network) process or procedure.
- the segmented objects can include any feasible anatomical parts including individual joints, bones, tendons, muscles, appliances, and the like.
- a 3D model is generated based on the segmented MRI and/or CT data.
- a three-dimensional model may be generated (rendered) since the segmentation procedure of block 404 is applied to each slice of the radiological image.
- Any segmented output from block 404 can inherit any 3D spatial information associated with the MRI and CT data. In this manner, any individual 3D models may be generated based on MRI and/or CT data.
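- One common way to turn a per-slice segmentation into a 3D surface, shown here as an illustration rather than the claimed method, is marching cubes using the scan’s voxel spacing so the resulting mesh keeps the spatial scale of the MRI/CT data.
```python
import numpy as np
from skimage import measure

segmentation = np.zeros((64, 64, 64), dtype=np.uint8)
segmentation[20:44, 20:44, 20:44] = 1                  # stand-in for a segmented bone
voxel_spacing = (0.5, 0.5, 0.5)                        # mm, assumed from the scan header

verts, faces, normals, values = measure.marching_cubes(
    segmentation.astype(float), level=0.5, spacing=voxel_spacing)
print(verts.shape, faces.shape)                        # mesh usable in the reconstructed 3D view
```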
- a surgical video stream is received or obtained.
- the surgical video stream can be the real-time surgical video stream 110 of FIG. 1.
- the surgical stream may be provided by any feasible surgical camera, or in some cases may be streamed through any feasible wired or wireless network (including the Internet or any cellular network).
- recognized anatomy, tool, and/or pathology data is received or obtained.
- the received anatomy, tool, and pathology data may be provided by the anatomy, tool, and pathology processing block 130.
- the anatomy, tool, and pathology data may be received or obtained from the metadata stream.
- the recognized anatomy, tool, and/or pathology data indicates whether the video stream contains or includes any known (trained for) anatomies, implants, tools, or pathologies.
- the anatomy, tool, and pathology processing block 130 may include one or more neural networks that have been trained to recognize (identify) various anatomies, implants, surgical tools, and pathologies within video streams.
- the received anatomy, tool, and pathology data may be used to determine which elements or objects are included in the reconstructed 3D surgical view.
- the relative position of a recognized tool may be used to align the tool within the reconstructed 3D surgical view.
- any pathology data may be highlighted in the 3D surgical view.
- the reconstructed 3D surgical view can be generated based on an estimated camera position with respect to the received/obtained surgical video stream.
- the estimated camera position may be from the point of view of a surgical camera associated with/providing the video stream.
- the estimated camera position may be a virtual camera position.
- the surgical assistance system 100 may receive inputs (keyboard, mouse, or other inputs) from a user to rotate or otherwise move the reconstructed 3D surgical view.
- a position of a virtual camera (a camera pose) may shift or move in order to properly render the view.
- rotating the reconstructed 3D surgical view can allow the surgeon to better visualize various anatomical landmarks or parts of the anatomy that may otherwise be hidden.
- the user may manipulate an anatomical joint shown in the reconstructed 3D surgical view.
- the user can provide inputs to move a reconstructed 3D joint through a range of motion.
- the range of motion may be determined by a priori knowledge of the anatomical joint.
- Some or all of the data associated with the reconstructed 3D surgical view may be output into the metadata stream.
- the reconstructed 3D elements may include portions of segmented anatomy, recognized surgical tools, anatomical landmarks, etc.
- the output reconstructed 3D elements can correspond to any repositioning of the surgical view (rotation, manipulation, or other motion) performed by the surgeon, medical technician, or other operator.
- the data output to the metadata stream may be specified in a configuration file.
- the reconstructed 3D surgical view can include digital landmarks.
- the digital landmarks may be included with the received CT and/or MRI data.
- the digital landmarks can be transferred to the reconstructed 3D surgical view.
- a digital landmark can be determined from interaction with a surgical tool and an object included within a real-time surgical video.
- any feasible pre-operative annotations may be transferred from the MRI/CT data to the reconstructed 3D surgical view or 3D model.
- additional information associated with the digital landmarks or pathologies may be determined and/or displayed.
- various pathologies and anatomical structures may be automatically measured and the measurements reported on a display. Any of the measurements may be output to the metadata stream and stored in a database.
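- As an illustration of such automatic measurements, the sketch below reports the volume and bounding-box extent of a segmented structure using an assumed voxel spacing; the data are toy values.
```python
import numpy as np

def measure_structure(mask, spacing_mm=(0.5, 0.5, 0.5)):
    voxel_volume = float(np.prod(spacing_mm))
    idx = np.argwhere(mask)
    extent_mm = (idx.max(0) - idx.min(0) + 1) * np.array(spacing_mm)
    return {"volume_mm3": float(mask.sum()) * voxel_volume,
            "extent_mm": extent_mm.tolist()}

mask = np.zeros((64, 64, 64), dtype=bool)
mask[10:20, 10:30, 10:15] = True                   # stand-in segmented pathology
print(measure_structure(mask))                     # could be output to the metadata stream
```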
- FIG. 5 is a flowchart showing an example method 500 for generating activity summaries.
- the method 500 can generate the scene descriptors 155 and/or the activity summary 165 of FIG. 1.
- the method 500 is described below with respect to the block diagram of FIG. 1, however, the method 500 may be performed by any suitable system or device.
- the method 500 begins in block 502 where a surgical video stream is received or obtained.
- the surgical video stream can be the real-time surgical video stream 110 of FIG. 1.
- the surgical stream may be provided by any feasible surgical camera, or in some cases may be streamed through any feasible wired or wireless network (including the Internet or any cellular network).
- scene descriptors are generated based on a language and vision model.
- the scene descriptors may be the scene descriptors 155 generated by the language and vision model 150.
- the language and vision model 150 may be implemented as a neural network that can be executed by one or more processors as described above with respect to FIG. 1. That is, block 504 may be implemented as a neural network trained similarly to a large language model; however, in this example, inputs are a video image instead of words, phrases, or sentences.
- the generated scene descriptors 155 may be output to the metadata stream.
- the scene descriptors 155 may be stored in the database 170.
- the scene descriptors 155 may include human-readable phrases, or the like that describe one or more elements of a scene or frame of the real-time surgical video.
- the scene descriptors may include timestamps that may be associated with one or more video frames that are included in the real-time surgical video.
- the scene descriptors 155 may be stored in the database 170.
- an activity summary is generated based on the previously generated scene descriptors 155.
- the activity summary 165 may be generated by a large language model neural network trained to identify activities such as surgical activities, entry into an operating region, and the like, from the human-readable phrases that are included in the scene descriptors 155. Any feasible surgery or surgical activity may be summarized. For instance, a sequence of scene descriptions could indicate the arrival of a specific tool into the field of view, its activation as indicated by the appearance of the involved anatomical structure, and the withdrawal of the tool from the field of view. These sequences of frames describe a surgical activity, such as debridement. The system may summarize these scene descriptions in terms of the surgical activity. Similarly, a sequence of these summarizations may be abstracted into a higher-level activity, e.g., site preparation.
- the activity summary 165 may be stored in the database 170. In some other examples, the activity summary 165 may be shown on a display.
- the activity summary 165 may not be provided or updated at the surgical video frame rate.
- the activity summary 165 may be provided or updated at a rate slower than the surgical video frame rate.
- the slower rate of the activity summary 165 may be due to changes in anatomy position or surgical procedures occurring over the course of many video frames, in some cases over several seconds or minutes.
- a rolling window of scene descriptors may be used to determine the activity summary.
- the relationship between generating the scene descriptors and the activity summary can be hierarchical.
- the scene descriptor generation may operate with lower-level inputs, and the scene descriptors themselves may be used as inputs for higher-level activity summary generation.
- the activity summary 165 from many procedures may be reviewed with respect to surgical tools that have been recognized. For example, a typical level of bloodiness may be associated with a typical or normal use of the surgical tool. If within one particular activity summary 165 the level of bloodiness exceeds a typical level of bloodiness, then the activity summary 165 may also include a note indicating that the tool is producing more bleeding than normal and suggesting that the surgeon may need assistance using the tool.
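- The disclosure does not define how the level of bloodiness is computed; one plausible stand-in metric, used only for illustration, is the fraction of strongly red pixels per frame compared against an assumed baseline for the recognized tool.
```python
import numpy as np

def bloodiness(frame_bgr):
    b = frame_bgr[..., 0].astype(int)
    g = frame_bgr[..., 1].astype(int)
    r = frame_bgr[..., 2].astype(int)
    red_dominant = (r > 120) & (r > g + 40) & (r > b + 40)
    return float(red_dominant.mean())

TYPICAL_LEVEL = 0.15                                   # assumed baseline, not from the disclosure

def flag_excess_bleeding(frame_bgr):
    return bloodiness(frame_bgr) > 2 * TYPICAL_LEVEL   # would add a note to the activity summary

frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[..., 2] = 200                                    # strongly red toy frame
print(flag_excess_bleeding(frame))                     # True
```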
- the activity summary 165 does not necessarily correspond to text labels, image borders, or parameters. Instead, the activity summary 165 may be based on the natural language output of the language and vision model 150.
- FIG. 6 shows an example 3D surgical assistance system 600.
- a 3D surgical assistance system 600 can provide a surgeon or other medical technician real-time guidance regarding an ongoing surgical operation.
- the surgical guidance may include a reconstructed 3D surgical view as described with respect to FIG. 4. That is, the surgical guidance may include any recognized anatomies, implants, tools, or pathologies that are associated with a real-time video stream, such as a video stream from a surgical camera.
- the 3D surgical view may be displayed with respect to an orientation of an external device, tool, surgical camera, or the like.
- the 3D surgical assistance system 600 may include a 3D surgical assistance processor 610 coupled to a fixed video camera 630, a display 611, a non-fixed surgical camera 640, and a data store 620.
- the 3D surgical assistance processor 610 may determine or generate 3D models and images of anatomies, implants, tools, and/or pathologies.
- the 3D surgical assistance processor 610 may generate a 3D view of a knee joint of a patient 660. (Although the knee joint is described here as an example, the 3D surgical assistance processor 610 can generate a reconstructed 3D surgical view of any feasible anatomy of the patient 660.)
- the generated 3D model may be shown on the display 611.
- the reconstructed 3D view may be based on the patient’s previous MRI and CT data as well as real-time video stream data from the non-fixed surgical camera 640 as described with respect to FIG. 4.
- FIGS. 9A-9C illustrate examples of reconstructed 3D views showing surgical tools that may be used as described herein.
- the 3D surgical assistance processor 610 may include more than just a processing device.
- the 3D surgical assistance processor 610 may include one or more processors, memories, input/output interfaces, and the like.
- the 3D surgical assistance processor 610 may include state machines, programmable logic devices, microcontrollers, or any other feasible device or devices capable of performing operations of the 3D surgical assistance processor 610.
- the image from the imaging system includes the orientation of the external fiducial marker 901 and the tool 903.
- the tool’s position and external shape are shown, including relative to the anatomy to be operated on (e.g., the planned tunnel 905).
- the image also shows a plurality of sections (bottom left, middle and right) of the patient tissue, shown segmented.
- the orientation outside of the body may be tracked, as shown in FIG. 9B and 9C.
- the tool 903 is shown not aligned to the target (e.g., planned tunnel 905), and the tool may be marked (e.g., by changing the color) to show when it is not (FIG. 9B) and is (FIG. 9C) aligned.
- fiducial markers or labels may be disposed on the non-fixed surgical camera 640 as well as on an operating table 650 supporting the patient 660.
- a first fiducial marker 670 may be affixed to the non-fixed surgical camera 640 and a second fiducial marker 671 may be disposed on a flat (or relatively flat) surface near the patient 660, and preferably near the anatomy (surgical region) that is the subject of an ongoing surgical procedure.
- the second fiducial marker 671 need not be placed on the operating table 650, but may be placed on any feasible surface, preferably near the patient 660.
- the second fiducial marker 671 need not be placed on a flat surface.
- the 3D surgical assistance processor 610 can track the relative position of the non-fixed surgical camera 640 with respect to the patient 660 through the fixed camera 630 viewing the first and second fiducial markers 670 and 671, respectively.
- the surgical assistance processor 610 can use external coordinates of the first and second fiducial markers 670 and 671 to determine positions of the non-fixed surgical camera 640 and an operating region, respectively.
- the surgical assistance processor 610 can determine a physical orientation of an anatomical structure (associated with a fiducial marker) with respect to any feasible surgical tool (associated with another, separate fiducial marker).
- FIG. 7 is a flowchart showing an example method 700 for generating 3D surgical guidance.
- the method 700 is described with respect to the 3D surgical assistance system 600 of FIG. 6, however, the method 700 may be performed by any suitable system or device.
- the method 700 begins in block 702 where fiducial markers are affixed.
- a first fiducial marker may be affixed to a surgical tool, such as, but not limited to, a surgical camera, a drill guide, and the like.
- a second fiducial marker may be affixed to a reference surface associated with the patient.
- more than two fiducial markers may be used.
- the additional fiducial markers may provide an increase in accuracy with respect to determining tool and anatomy positions.
- one or more cameras view the fiducial markers.
- the fixed camera 630 may view the fiducial markers, although any camera with a field of view sufficient to capture the fiducial markers affixed in block 702 can be used.
- the fixed camera 630 can view both the fiducial marker of the surgical instrument and the fiducial marker associated with the patient.
- the location and/or position of the fixed camera 630 (with respect to local surroundings, operating room, and the like) may be provided to the surgical assistance processor 610.
- the fixed camera 630 may view a first fiducial marker and a second camera (such as the non-fixed surgical camera 640) can view a second fiducial marker.
- the fixed camera 630 can view the first fiducial marker 670 affixed to the non-fixed surgical camera 640 and the non-fixed surgical camera 640 can view the second fiducial marker 671. Viewing the second fiducial marker 671 with the non-fixed surgical camera 640 may advantageously provide the surgical assistance processor 610 position and orientation information associated with the non-fixed surgical camera 640.
- the surgical assistance processor 610 associates a first fiducial marker with a surgical instrument.
- the first fiducial marker may be one of the fiducial markers affixed in block 702.
- the surgical instrument can be any feasible surgical instrument including a surgical camera, or any feasible surgical tool.
- the surgical assistance processor 610 may also ascertain and/or establish a location that is associated with the first fiducial marker.
- the surgical assistance processor 610 determines the location of the surgical instrument with respect to the surgical region. In some examples, the surgical assistance processor 610 can determine the location of the surgical instrument and the location of the surgical region. Then, the surgical assistance processor 610 can determine if and how the surgical instrument interacts with the surgical region. In some examples, the surgical assistance processor 610 determines the location of surgical instruments and surgical regions by determining the location of the associated fiducial markers.
- the surgical assistance processor 610 may be programmed to understand dimensions and layout of the operating room. For example, details regarding the length and position of walls, operating table, and the like may be determined and provided to the surgical assistance processor 610. In addition, the position of the fixed camera 630 with respect to the operating room, as well as camera characteristics such as resolution, zoom capability, and the like can be provided to the surgical assistance processor 610. Thus, using the operating room data and the characteristics of the fixed camera 630, the surgical assistance processor 610 can determine the location of the surgical instrument based at least in part on the view of the first fiducial marker 670 through the fixed camera 630.
- the surgical assistance processor 610 can also determine the location of the surgical region.
- the fixed camera 630 can view the second fiducial marker 671. Then, using the predetermined dimensions and layout of the operating room, as well as information regarding the location of the second fiducial marker 671 with respect to the surgical region, the surgical assistance processor 610 can determine the location of the surgical region. In some cases, the surgical assistance processor 610 can determine a “best fit” between a computed geometry (e.g., a reconstructed 3D surgical view of the patient’s anatomy) and the observed geometry of the second fiducial marker 671.
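- The relative-pose computation in blocks 708 and 710 can be sketched as below. The marker poses (rvec, tvec) are assumed to come from a fiducial detector (for example, OpenCV’s ArUco module) viewing markers 670 and 671 with the fixed camera 630; the numeric values are placeholders.
```python
import cv2
import numpy as np

def pose_matrix(rvec, tvec):
    R, _ = cv2.Rodrigues(np.asarray(rvec, dtype=float).reshape(3, 1))
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(tvec, dtype=float).ravel()
    return T                                              # marker pose in the fixed-camera frame

def instrument_in_region_frame(rvec_tool, tvec_tool, rvec_region, tvec_region):
    T_cam_tool = pose_matrix(rvec_tool, tvec_tool)        # first fiducial marker 670
    T_cam_region = pose_matrix(rvec_region, tvec_region)  # second fiducial marker 671
    return np.linalg.inv(T_cam_region) @ T_cam_tool       # instrument pose relative to the region

T = instrument_in_region_frame([0, 0, 0], [0.10, 0.00, 0.50],
                               [0, 0, 0], [0.00, 0.00, 0.60])
print(T[:3, 3])   # instrument offset (meters) from the surgical-region marker
```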
- the surgical assistance processor 610 can determine relative positions and interactions of the surgical instrument with respect to the surgical region. [0134] Next, in block 712, the surgical assistance processor 610 can display surgical guidance.
- the surgical guidance can include an associated reconstructed 3D surgical view as described above with respect to FIG. 4, as well as relative positions of a surgical region and surgical instruments as described in blocks 702, 704, 706, 708, and 710.
- the surgical guidance may be displayed on the display 611 which may be visible to the surgeon or any other medical technician in the operating room.
- the surgical guidance may be transmitted through a network for viewing at locations separated from the operating room.
- the surgical guidance can include 3D models of any feasible surgical instruments, anatomies, pathologies, implants, anchors, anchor points, and the like.
- more than two fiducial markers may be used. Additional fiducial markers may increase the resolution or sensitivity of the surgical assistance processor 610 in determining the location of the surgical instrument or the location of the surgical region. For example, as the surgeon manipulates the surgical instrument, the view of the first fiducial marker by the fixed camera 630 may be impaired or occluded. Thus, additional fiducial markers placed on different areas of the surgical instrument may help ensure more continuous visibility regardless of surgical instrument movement. Continuous visibility may increase the resolution and/or sensitivity of the surgical assistance processor 610.
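- As an illustrative sketch (not the disclosed method): when several markers are mounted at known offsets on the same instrument, each visible marker independently implies an instrument pose, and the visible detections can be fused, here by simple averaging. The pose dictionaries and fixed marker-to-instrument transforms below are assumed inputs.

```python
import numpy as np

def fuse_instrument_position(marker_poses_cam, marker_to_instrument):
    """Estimate the instrument origin in the camera frame from visible markers.

    marker_poses_cam: dict marker_id -> 4x4 transform (marker frame -> camera frame).
    marker_to_instrument: dict marker_id -> 4x4 transform (instrument frame -> marker frame),
                          fixed by how each marker is mounted on the instrument.
    """
    estimates = []
    for marker_id, T_cam_marker in marker_poses_cam.items():
        T_marker_instr = np.asarray(marker_to_instrument[marker_id], dtype=float)
        T_cam_instr = np.asarray(T_cam_marker, dtype=float) @ T_marker_instr
        estimates.append(T_cam_instr[:3, 3])      # instrument origin in camera frame
    if not estimates:
        return None                               # all marker views occluded
    return np.mean(np.stack(estimates), axis=0)
```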
- multiple fixed cameras may be used to increase resolution or sensitivity.
- when the view of a single fixed camera is reduced or impeded, the surgical assistance processor 610 may have difficulty resolving the positions of the surgical instrument and the surgical region, and the accuracy of the resolved positions may be reduced.
- the reconstructed 3D surgical view displayed in the context of the method 700 may be based, at least in part, on pre-operative MRI or CT data.
- the surgical guidance displayed in block 712 can show anatomical structures even when the structures are only partially visible with an inserted surgical camera (surgical tool).
- the example shown in FIG. 6 illustrates external tool fixation.
- the methods and apparatuses described herein may be used with a printed code and/or pattern (e.g., a printed sticker) affixed to the tool; this code/pattern may be used as an identifying marker outside of the body, prior to insertion, to orient and/or normalize the position of the tool in space, relative to the body.
- external tool fixation may provide an indicator of the relative distance and dimensions of the tool outside of the body, which may be used by the system to scale in relation to the tool inside of the body.
- the printed sticker with the pattern may be imaged outside of the body by the same imaging platform.
- the system or method may identify the printed sticker pattern on the tool prior to insertion and the shape and/or orientation can be used to determine a plane, magnification, etc. of the imaging system, which may be used to orient the tool in the surgical space.
- the relative orientation and scaling may be used as a reference.
- inserting the tool and scope into the body (e.g., joint) after having the camera ‘see’ the tool (e.g., sticker) on the table outside of the body may provide a differential signal, e.g., an outside image, along with a relative position of the scope/camera with respect to the plane of the sticker (e.g., image).
- the captured images may be mapped/scaled to the original coordinates identified outside of the body.
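- One way to realize such a mapping, assuming the printed sticker is planar and its physical dimensions are known, is a homography fitted between the sticker corners as seen in the image and their positions on the sticker's physical plane. The sketch below uses OpenCV homography routines purely as an illustration; the corner detections and sticker size are assumed inputs rather than features disclosed here.

```python
import numpy as np
import cv2

def image_to_sticker_mapping(sticker_corners_px, sticker_size_mm):
    """Homography mapping image pixels onto the sticker's physical plane (mm).

    sticker_corners_px: (4, 2) detected sticker corners, ordered to match the
    physical corners (0,0), (w,0), (w,h), (0,h).
    """
    w, h = sticker_size_mm
    sticker_mm = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float64)
    H, _ = cv2.findHomography(np.asarray(sticker_corners_px, dtype=np.float64),
                              sticker_mm)
    return H

def map_points(H, points_px):
    """Apply the homography to (N, 2) pixel points, returning (N, 2) mm points."""
    pts = np.asarray(points_px, dtype=np.float64).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)
```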
- there may be two sets of fiducial markers (reference images, e.g., stickers) seen by the scope, e.g., one on the tool and one on a surface outside of the body (e.g., table, bed, patient, etc.).
- the camera may visualize these reference images, computing a relative position of the camera and the reference surface (e.g., table).
- the system may determine an alignment of the camera relative to the external reference, and this external reference may be used where the camera is inserted into the body, providing position/orientation in real space.
- This technique may use an external camera and an internal camera (scope) that is inserted into the body.
- when inserting the scope into the body, the scope may initially see/image the external sticker (e.g., on the table, bed, etc.) and may give an instant computation of the relative orientation.
- the markers may be printed in a unique shape and size so that the external camera can immediately compute a distance and alignment from the marker(s).
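- Because the marker's true size is known, its apparent size in the image gives a direct range estimate under a simple pinhole model, roughly Z ≈ f · W / w, where f is the focal length in pixels, W the printed width, and w the measured width in pixels. The snippet below is a minimal illustration of that relationship, with example numbers chosen arbitrarily.

```python
def marker_distance_mm(focal_length_px, marker_width_mm, marker_width_px):
    """Pinhole-model range estimate: the marker appears smaller as distance grows."""
    return focal_length_px * marker_width_mm / marker_width_px

# Example: 900 px focal length, 40 mm marker imaged at 60 px -> ~600 mm away.
print(marker_distance_mm(900.0, 40.0, 60.0))
```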
- the shape may be continuously sensed and processed.
- FIG. 8 shows a block diagram of a device 800 that may be one example of the intraoperative surgical assistance system 100 of FIG. 1 and/or the 3D surgical assistance system 600 of FIG. 6.
- the device 800 may include a communication interface 820, a processor 830, and a memory 840.
- the communication interface 820 which may be coupled to a network (such as network 812) and to the processor 830, may transmit signals to and receive signals from other wired or wireless devices, including remote (e.g., cloud-based) storage devices, cameras, processors, compute nodes, processing nodes, computers, mobile devices (e.g., cellular phones, tablet computers and the like) and/or displays.
- the communication interface 820 may include wired (e.g., serial, ethernet, or the like) and/or wireless (Bluetooth, Wi-Fi, cellular, or the like) transceivers that may communicate with any other feasible device through any feasible network.
- the communication interface 820 may receive video stream data from a camera 810.
- the processor 830, which is coupled to the communication interface 820 and the memory 840, may be any one or more suitable processors capable of executing scripts or instructions of one or more software programs stored in the device 800 (such as within the memory 840).
- the memory 840 may also include a non-transitory computer-readable storage medium (e.g., one or more nonvolatile memory elements, such as EPROM, EEPROM, Flash memory, a hard drive, etc.) that may store the following software modules:
- an anatomy, tool, and pathology module 842 to recognize anatomies, implants, surgical tools, and/or pathologies within a video stream or video data;
- a 3D scene construction module 843 to generate 3D models of objects recognized in a video stream;
- a language and vision model module 844 to generate human-readable scene descriptors from video frames;
- a scene summary LLM module 845 to generate human-readable activity summaries from scene descriptors;
- a location determination module 846 to determine the location of surgical tools, surgical regions, and the like; and
- a communication control module 847 to communicate with other devices.
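- Purely as an illustration of how such modules might be organized in software (the description does not mandate any particular structure), the modules can be modeled as callables that each consume a frame and publish their results to a shared metadata dictionary; the interfaces and names below are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical per-frame processing pipeline; module names mirror the
# description above, but the interfaces are illustrative assumptions.
Frame = Any          # e.g., a numpy image array
Metadata = Dict[str, Any]

@dataclass
class SurgicalAssistanceDevice:
    modules: List[Callable[[Frame, Metadata], None]] = field(default_factory=list)

    def process_frame(self, frame: Frame) -> Metadata:
        metadata: Metadata = {}
        for module in self.modules:
            module(frame, metadata)      # each module may read earlier results
        return metadata

def anatomy_tool_pathology_module(frame: Frame, md: Metadata) -> None:
    md["detections"] = []                # placeholder for recognized objects

def location_determination_module(frame: Frame, md: Metadata) -> None:
    md["instrument_location"] = None     # placeholder for fiducial-based location

device = SurgicalAssistanceDevice(modules=[anatomy_tool_pathology_module,
                                           location_determination_module])
print(device.process_frame(frame=None))
```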
- the processor 830 may execute the anatomy, tool, and pathology module 842 to determine and/or recognize any feasible anatomies, implants, surgical tools, and/or pathologies that may be included in a video stream.
- execution of the anatomy, tool, and pathology module 842 may execute one or more neural networks that have been trained to recognize and/or identify various anatomies, implants, surgical tools, and/or pathologies.
- execution of the anatomy, tool, and pathology module 842 may cause the execution of one or more of the operations or neural networks described with respect to FIG. 1 to identify anatomies, implants, surgical tools, and/or pathologies.
- a label may be placed next to the recognized anatomy, tool, or pathology in the source video stream.
- a bounding box may surround or highlight the recognized anatomy, tool, or pathology.
- the recognized anatomy, tool, and pathology data may be provided to the metadata stream. Execution of the anatomy, tool, and pathology module 842 may generate an annotated view, such as the annotated view 115 of FIG. 1.
- the processor 830 may execute the 3D scene construction module 843 to generate a 3D model of one or more objects that have been detected or recognized in a video stream. For example, execution of the 3D scene construction module 843 may generate a 3D model of a recognized anatomy, tool, or pathology. In some cases, the 3D scene construction module 843 may receive data indicating a recognized anatomy, tool, or pathology directly or indirectly from the anatomy, tool, and pathology module 842. Furthermore, the 3D scene construction module 843 can receive a patient’s MRI and/or CT data with which to construct associated 3D models.
- Execution of the 3D scene construction module 843 may generate a 3D image for each recognized anatomy, tool, or pathology.
- the generated 3D images may be composited together to compose a rendered (computer-based) scene including representative objects included in the real-time video stream.
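- Compositing can be as simple as placing each recognized object's geometry into a common scene frame using its estimated pose. The sketch below assumes each model is represented as a vertex array plus a 4x4 pose, which is one straightforward representation; a real renderer would additionally carry faces, materials, and so on.

```python
import numpy as np

def composite_scene(models):
    """Place each (vertices, pose) pair into a single scene-frame point set.

    models: list of (vertices, pose) where vertices is (N, 3) in the model's
            local frame and pose is a 4x4 transform (model frame -> scene frame).
    """
    placed = []
    for vertices, pose in models:
        v = np.asarray(vertices, dtype=float)
        T = np.asarray(pose, dtype=float)
        v_h = np.hstack([v, np.ones((v.shape[0], 1))])   # homogeneous coordinates
        placed.append((v_h @ T.T)[:, :3])                # transform into scene frame
    return np.vstack(placed) if placed else np.empty((0, 3))
```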
- the generated 3D images may be displayed in the 3D view 145.
- the processor 830 can execute the language and vision model module 844 to generate descriptive, human-readable phrases that describe objects and activities that have been recognized within a video stream, including any feasible real-time video stream.
- execution of the language and vision model module 844 may cause the execution of one or more neural networks trained to output a scene description in human-readable text (scene descriptors) based on an input of one or more frames of video as described with respect to FIG. 5.
- execution of the language and vision model module 844 may generate the human-readable scene descriptors 155.
- the human-readable scene descriptors 155 may be output to the metadata stream or stored in a database.
- the processor 830 may execute the scene summary LLM module 845 to generate a human-readable activity summary based at least in part on the human-readable scene descriptors generated by the language and vision model module 844.
- execution of the scene summary LLM module 845 may cause the execution of one or more large language model neural networks to operate on scene descriptors from multiple scenes (video frames) to determine or generate one or more activity summaries.
- the neural networks may be trained with the scene descriptors from the language and vision model module 844.
- execution of the scene summary LLM module 845 can operate on a “rolling window” of scene descriptors to generate an activity summary.
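- As a hedged sketch of the rolling-window idea: per-frame scene descriptors are buffered, and each time the window slides, the buffered text is handed to a summarization model. The `summarize_with_llm` call below is a hypothetical placeholder rather than a real API; any large language model interface could stand in for it.

```python
from collections import deque

def summarize_with_llm(descriptors):
    """Hypothetical LLM call; a real system would invoke its language model here."""
    return "Activity summary of: " + " | ".join(descriptors)

class RollingSceneSummarizer:
    def __init__(self, window_size=30):
        self.window = deque(maxlen=window_size)   # most recent scene descriptors

    def add_descriptor(self, descriptor: str) -> str:
        self.window.append(descriptor)
        # Re-summarize over the current window each time a descriptor arrives.
        return summarize_with_llm(list(self.window))

summarizer = RollingSceneSummarizer(window_size=3)
for text in ["probe enters joint", "probe approaches labrum", "anchor is placed"]:
    print(summarizer.add_descriptor(text))
```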
- the processor 830 may execute the location determination module 846 to determine the location of one or more objects visible by a camera, such as the camera 810. Execution of the location determination module 846 may process video data from one or more cameras (including the camera 810 and/or a surgical camera, not shown) to locate one or more fiducial markers. In some examples, the location determination module 846 can determine the location of objects (surgical cameras, and the like) and regions (surgical regions and the like) based on a view of the one or more fiducial markers. Execution of the location determination module 846 may cause the processor 830 to perform one or more operations described with respect to FIGS. 6 and 7.
- the processor 830 may execute the communication control module 847 to communicate with any other feasible devices.
- execution of the communication control module 847 may enable the device 800 to communicate through the communications interface 820 via networks 812 including cellular networks conforming to any of the LTE standards promulgated by the 3rd Generation Partnership Project (3GPP) working group, Wi-Fi networks conforming to any of the IEEE 802.11 standards, Bluetooth protocols set forth by the Bluetooth Special Interest Group (SIG), Ethernet protocols, or the like.
- execution of the communication control module 847 may enable the device 800 to communicate with one or more cameras 810, displays 811, or input devices such as keyboards, mice, touch pads, touch screens, or the like (not shown).
- execution of the communication control module 847 may implement encryption and/or decryption procedures.
- the methods and apparatuses described herein may use three or more video streams. In some examples, two or more of them may be linked. In some examples, one or more of them may have a lower frame rate. In particular, the video stream used for natural language interpretation by the language/vision module may have a lower frame rate. These methods and apparatuses may also include patient imaging data, e.g., MRI/CT data.
- the natural language module(s) may use 1-2 frames to describe the scene using human-understandable language (natural language). This language may be expressed in human-readable text output by the one or more (e.g., language) modules.
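- A lower-rate language stream can be produced by simply forwarding every Nth frame of the primary video to the language/vision module. The generator below is an illustrative sketch; the stride and the downstream `describe_frame` call are assumptions, not disclosed interfaces.

```python
def subsample_stream(frames, stride=30):
    """Yield every `stride`-th frame (e.g., ~1 fps from a 30 fps stream)."""
    for index, frame in enumerate(frames):
        if index % stride == 0:
            yield index, frame

def describe_frame(frame):
    """Hypothetical vision-language call returning a natural-language descriptor."""
    return f"scene descriptor for frame {frame}"

# Example with stand-in integer "frames": frames 0, 30, and 60 are described.
for idx, frame in subsample_stream(range(90), stride=30):
    print(idx, describe_frame(frame))
```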
- these methods and apparatuses may use scene descriptions and then scene summarizations, including determining what is occurring in the scene and describing the scene in natural language, in order to develop an understanding of what procedure is being performed, e.g., based on an understanding of the scene.
- These methods and apparatuses may include or access a database.
- the different data streams do not need to be and may not be merged.
- the text annotations may not be merged.
- the methods and apparatuses described herein may keep the video streams separate and distinct.
- multiple streams (e.g., four distinct streams) may be displayed; the display may be presented as a picture-in-picture and/or may be toggled.
- scene description using natural language may be used to describe the actions taken by the physician (e.g., surgeon) during a procedure; the scene description may be performed in real time and/or after the procedure and may be shown during the procedure, particularly to parties other than the surgeon, such as technicians, nurses, etc., and/or may be entered into the patient or hospital records.
- these methods and apparatuses may use a scene activity database.
- the scene description using natural language described herein may be used to provide an assistant (nurse, technician, etc.) with context and information about what may be coming next during the procedure, e.g., tools needed, etc., in order to improve the flow of the procedure and assist a surgeon.
- the use of scene summarization may provide natural language interpretation of the ongoing procedure.
- Any of these methods and apparatuses may include a database of videos (e.g., instruction videos, procedure videos, tool demos, etc.). These methods and apparatuses may be used or useful to suggest one or more tools and/or to provide a video of how a tool may be deployed, including in the context of the procedure being performed or to be performed.
- scene descriptions described herein may infer activities using natural language based on the scene.
- a scene is distinct from a frame.
- Scene description is not necessarily a comparison between different frames, but is an interpretation, typically in natural language, of the action occurring in the context of the images.
- a scene description, e.g., using video-to-natural-language conversion as described herein, may be part of any system, or may be implemented independently of the other data streams described herein, e.g., as an independent tool, apparatus, or method.
- the methods and apparatuses described herein may provide a natural language scene description, including a description and/or inference of the range over which a tool is moving.
- the scene description may provide a history and context of the content of a video stream using natural language.
- the language/vision (and scene summarization) modules may determine the context of the scenes and express them in natural language. This allows the scene summarization to provide a relatively high level of abstraction and summarization, describing not just the immediate characteristics of the scene (e.g., location, presence of particular tools, identification of landmarks, etc.) but also commenting on the context of these characteristics, including indicating the procedure type, possible complications or expectations, risks, etc., described in words.
- the use of the language and scene summarization modules may be performed (and/or in some examples, output) in parallel with the other video data streams, including the 3D stream, the unmodified/original stream, the Al stream, etc.
- the LLM data may be provided to, and/or may use information from, a metadata stream, as shown in FIG. 1.
- the LLM may be separated from the other streams.
- the frame rate for the LLM may be different, and in particular, may be slower than the other streams, including slower than video rate.
- the various streams, including but not limited to the LLM stream, may be unblended, so that they are not overlaid onto each other.
- the primary view may remain unblended.
- the LLM may provide a language-based (rather than pixel-based) segmentation of the video images/scenes.
- the LLM may identify components of the scene in natural language and include these ‘segment’ descriptions in the summary. This may provide more robust tracking, e.g., across images (scenes).
- the natural language segmentation, in sentences, may describe a region of the tissue (e.g., a lesion) from when it appears, its relative location (on a wall of the lumen, etc.), a diagnostic characteristic (discolored, inflamed, swollen, bleeding, etc.), changes during the scene(s), etc.
- the use of natural language (sentences) for the description of tools may provide significantly more context and information than simple visual segmentation.
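- To make the idea of language-based 'segments' concrete, one simple and purely illustrative approach is to keep a per-structure log of the sentences that mention it, so a region such as a lesion accumulates a narrative across scenes. The keyword matching below is an assumption for demonstration only, not the disclosed mechanism.

```python
from collections import defaultdict

class LanguageSegmentTracker:
    """Group natural-language scene sentences by the structure they describe."""

    def __init__(self, structures):
        self.structures = [s.lower() for s in structures]
        self.history = defaultdict(list)   # structure -> list of (scene_idx, sentence)

    def add_scene(self, scene_idx, sentences):
        for sentence in sentences:
            lowered = sentence.lower()
            for structure in self.structures:
                if structure in lowered:
                    self.history[structure].append((scene_idx, sentence))

tracker = LanguageSegmentTracker(["lesion", "probe"])
tracker.add_scene(0, ["A discolored lesion appears on the anterior wall."])
tracker.add_scene(5, ["The lesion is now bleeding slightly near the probe tip."])
print(tracker.history["lesion"])
```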
- the LLM modules may be trained machine learning (e.g., deep learning) agents. These machine learning agents may be trained using a language-based, rather than tag-based approach.
- the approach may be iterative. For example, an initial approach may provide frame recognition, e.g., describe a scene by initially identifying structures and the relative appearance of the structures (e.g., a first natural language module may describe the structures as being in the top left, describe the appearance of a tool, the removal/disappearance of the tool, etc.); based on this language, the language module(s) may infer the action being performed (based on the language-inferred actions describing what is going on).
- the module(s), such as the scene summarization module(s), may be trained purely on language. Images may be used to get the frame description. This description may then be integrated into a natural language narrative, e.g., summarizing the action(s) being performed. [0166] Although the use of the LLM stream is described herein in the medical/surgical context, it should be understood that these techniques may be applied more broadly, e.g., for narration of scenes not limited to the medical context.
- any of the methods (including user interfaces) described herein may be implemented as software, hardware, or firmware, and may be described as a non-transitory computer-readable storage medium storing a set of instructions capable of being executed by a processor (e.g., computer, tablet, smartphone, etc.), that when executed by the processor cause the processor to control and/or perform any of the steps, including but not limited to: displaying, communicating with the user, analyzing, modifying parameters (including timing, frequency, intensity, etc.), determining, alerting, or the like.
- any of the methods described herein may be performed, at least in part, by an apparatus including one or more processors having a memory storing a non-transitory computer-readable storage medium storing a set of instructions for the process(es) of the method.
- computing devices and systems described and/or illustrated herein broadly represent any type or form of computing device or system capable of executing computer-readable instructions, such as those contained within the modules described herein.
- these computing device(s) may each comprise at least one memory device and at least one physical processor.
- one or more of the devices described herein may transform data, physical devices, and/or representations of physical devices from one form to another. Additionally or alternatively, one or more of the modules recited herein may transform a processor, volatile memory, non-volatile memory, and/or any other portion of a physical computing device from one form of computing device to another form of computing device by executing on the computing device, storing data on the computing device, and/or otherwise interacting with the computing device.
- computer-readable medium generally refers to any form of device, carrier, or medium capable of storing or carrying computer-readable instructions.
- Examples of computer-readable media comprise, without limitation, transmission-type media, such as carrier waves, and non-transitory-type media, such as magnetic-storage media (e.g., hard disk drives, tape drives, and floppy disks), optical-storage media (e.g., Compact Disks (CDs), Digital Video Disks (DVDs), and BLU-RAY disks), electronic-storage media (e.g., solid-state drives and flash media), and other distribution systems.
- spatially relative terms such as “under”, “below”, “lower”, “over”, “upper” and the like, may be used herein for ease of description to describe one element or feature's relationship to another element(s) or feature(s) as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if a device in the figures is inverted, elements described as “under” or “beneath” other elements or features would then be oriented “over” the other elements or features. Thus, the exemplary term “under” can encompass both an orientation of over and under.
- first and second may be used herein to describe various features/elements (including steps), these features/elements should not be limited by these terms, unless the context indicates otherwise. These terms may be used to distinguish one feature/element from another feature/element. Thus, a first feature/element discussed below could be termed a second feature/element, and similarly, a second feature/element discussed below could be termed a first feature/element without departing from the teachings of the present invention.
- a numeric value may have a value that is +/- 0.1% of the stated value (or range of values), +/- 1% of the stated value (or range of values), +/- 2% of the stated value (or range of values), +/- 5% of the stated value (or range of values), +/- 10% of the stated value (or range of values), etc.
- Any numerical values given herein should also be understood to include about or approximately that value, unless the context indicates otherwise. For example, if the value "10" is disclosed, then "about 10" is also disclosed.
- any numerical range recited herein is intended to include all sub-ranges subsumed therein. It is also understood that when a value is disclosed, "less than or equal to" the value, "greater than or equal to" the value, and possible ranges between values are also disclosed, as appropriately understood by the skilled artisan. For example, if the value "X" is disclosed, then "less than or equal to X" as well as "greater than or equal to X" (e.g., where X is a numerical value) is also disclosed. It is also understood that throughout the application, data is provided in a number of different formats, and that this data represents endpoints and starting points, and ranges for any combination of the data points.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Surgery (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Veterinary Medicine (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Biomedical Technology (AREA)
- Heart & Thoracic Surgery (AREA)
- Animal Behavior & Ethology (AREA)
- Molecular Biology (AREA)
- Radiology & Medical Imaging (AREA)
- Optics & Photonics (AREA)
- Biophysics (AREA)
- Pathology (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- General Physics & Mathematics (AREA)
- Computer Graphics (AREA)
- Artificial Intelligence (AREA)
- Geometry (AREA)
- Robotics (AREA)
- Evolutionary Computation (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
Abstract
A method of providing three-dimensional surgical assistance is described. Three-dimensional models may be constructed from a patient's magnetic resonance imaging (MRI) and computed tomography (CT) data. Anatomies, surgical tools, pathologies, or other objects included in a real-time surgical video may be identified. The identified objects may be replaced with the three-dimensional models in a reconstructed view of the surgical region.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463552650P | 2024-02-12 | 2024-02-12 | |
| US63/552,650 | 2024-02-12 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025174815A1 (fr) | 2025-08-21 |
Family
ID=96773968
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2025/015501 Pending WO2025174815A1 (fr) | Guidage chirurgical peropératoire à l'aide d'une reconstruction tridimensionnelle (Intraoperative surgical guidance using three-dimensional reconstruction) |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025174815A1 (fr) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120289782A1 (en) * | 2011-05-13 | 2012-11-15 | Tyco Healthcare Group Lp | Twin camera endoscope |
| US20220079675A1 (en) * | 2018-11-16 | 2022-03-17 | Philipp K. Lang | Augmented Reality Guidance for Surgical Procedures with Adjustment of Scale, Convergence and Focal Plane or Focal Point of Virtual Data |
| US20220122263A1 (en) * | 2017-09-28 | 2022-04-21 | Shanghai United Imaging Healthcare Co., Ltd. | System and method for processing colon image data |
| US20220287676A1 (en) * | 2021-03-10 | 2022-09-15 | Onpoint Medical, Inc. | Augmented reality guidance for imaging systems |
| WO2022249190A1 (fr) * | 2021-05-26 | 2022-12-01 | Beyeonics Surgical Ltd. | Système et procédé de vérification de conversion d'emplacements entre des systèmes de coordonnées |
| US20230190136A1 (en) * | 2020-04-13 | 2023-06-22 | Kaliber Labs Inc. | Systems and methods for computer-assisted shape measurements in video |
- 2025-02-12: WO PCT/US2025/015501 patent/WO2025174815A1/fr, active, Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11786310B2 (en) | | Intermodal synchronization of surgical data |
| US11594002B2 (en) | | Overlay and manipulation of medical images in a virtual environment |
| JP6400793B2 (ja) | | Method for generating an image display (画像表示の生成方法) |
| JP7127785B2 (ja) | | Information processing system, endoscope system, trained model, information storage medium, and information processing method (情報処理システム、内視鏡システム、学習済みモデル、情報記憶媒体及び情報処理方法) |
| Navab et al. | | Action- and workflow-driven augmented reality for computer-aided medical procedures |
| JP6230708B2 (ja) | | Matching findings between imaging data sets (撮像データセットの間の所見のマッチング) |
| US20160004917A1 (en) | | Output control method, image processing apparatus, and information processing apparatus |
| JP7497930B2 (ja) | | Method and apparatus for generating clinical record data (臨床記録データ生成方法及び装置) |
| JP2023523560A (ja) | | Systems and methods for AI-assisted surgery (AI支援手術のためのシステムおよび方法) |
| US20230149092A1 (en) | | Systems and methods for compensating for obstructions in medical images |
| Arakaki et al. | | Artificial Intelligence in Minimally Invasive Surgery: Current State and Future Challenges |
| CN114945990A (zh) | | Systems and methods for providing surgical assistance based on operational context (用于基于操作情境提供外科手术辅助的系统和方法) |
| WO2025174815A1 (fr) | 2025-08-21 | Intraoperative surgical guidance using three-dimensional reconstruction (Guidage chirurgical peropératoire à l'aide d'une reconstruction tridimensionnelle) |
| CN208017582U (zh) | | Computer-assisted minimally invasive surgical device (计算机辅助微创手术装置) |
| US12475578B2 (en) | | Systems and methods for registering visual representations of a surgical space |
| De Paolis | | Advanced navigation and augmented visualization in minimally invasive surgery |
| De Paolis et al. | | An Augmented Reality Platform for the Enhancement of Surgical Decisions in Pediatric Laparoscopy |
| De Paolis et al. | | Visualization system to improve surgical performance during a laparoscopic procedure |
| De Paolis et al. | | A navigator of the patient's anatomy model for intra-operative surgical guidance |
| HK1216707B (en) | | Intramodal synchronization of surgical data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 25755523; Country of ref document: EP; Kind code of ref document: A1 |