
WO2025198964A1 - Machine learning based anatomical state tracking - Google Patents

Machine learning based anatomical state tracking

Info

Publication number
WO2025198964A1
Authority
WO
WIPO (PCT)
Prior art keywords
anatomical structure
state
branch
processors
models
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/020017
Other languages
French (fr)
Inventor
Benjamin Mueller
Rui Guo
Conor Perreault
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intuitive Surgical Operations Inc
Original Assignee
Intuitive Surgical Operations Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intuitive Surgical Operations Inc filed Critical Intuitive Surgical Operations Inc
Publication of WO2025198964A1
Legal status: Pending

Classifications

    • G16H 20/40: ICT specially adapted for therapies or health-improving plans relating to mechanical, radiation or invasive therapies, e.g. surgery, laser therapy, dialysis or acupuncture
    • A61B 1/000094: Operational features of endoscopes characterised by electronic signal processing of image signals during use of the endoscope, extracting biological structures
    • A61B 1/000096: Operational features of endoscopes characterised by electronic signal processing of image signals during use of the endoscope, using artificial intelligence
    • A61B 34/20: Surgical navigation systems; devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
    • A61B 90/37: Surgical systems with images on a monitor during operation
    • G06N 3/02: Neural networks
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06T 7/0016: Biomedical image inspection using an image reference approach involving temporal comparison
    • G06V 10/58: Extraction of image or video features relating to hyperspectral data
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G16H 30/40: ICT specially adapted for processing medical images, e.g. editing
    • G16H 40/63: ICT specially adapted for the operation of medical equipment or devices for local operation
    • G16H 50/20: ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H 50/70: ICT specially adapted for mining of medical data, e.g. analysing previous cases of other patients
    • A61B 18/1442: Probes having pivoting end effectors, e.g. forceps
    • A61B 2017/00119: Electrical control of surgical instruments with audible or visual output alarm; indicating an abnormal situation
    • A61B 2034/2055: Optical tracking systems
    • A61B 2034/2065: Tracking using image or pattern recognition
    • G06T 2207/10016: Video; image sequence
    • G06T 2207/10024: Color image
    • G06T 2207/10068: Endoscopic image
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06V 2201/03: Recognition of patterns in medical or anatomical images

Definitions

  • a robotic medical system can include an instrument for performing a medical session or procedure.
  • the instrument can be used to perform surgery, therapy, or a medical evaluation.
  • the robotic medical system can include an endoscope that captures a video of the medical procedure.
  • a computing system can include a multi-layered machine learning model to detect and assess anatomical states.
  • the model can include layers or branches built on top of a baseline model or base branch.
  • the base branch can use deep learning to perform anatomy segmentation to identify and track anatomical structures, such as organs, tissue, or bones.
  • Other branches can use the identified anatomical structures of the base branch to determine anatomical states for the structures.
  • the computing system can implement machine learning models on the outputs of the base anatomy segmentation model to determine anatomical states such as size, coloration, bleeding, burning, or transected state.
  • the computing system can implement a machine learning model to segment an anatomical structure into different parts, e.g., identify regions of bruising, charring, burning, fatty tissue, muscle tissue, or connective tissue.
  • the models can consume fewer computing resources (e.g., processor resources, memory resources, power resources) to be trained, stored, or executed to detect states.
  • because the multiple layers of the model can be combined into one model or model set and trained together, the processing resources required to train the model can be decreased, and the trained model can be stored in less memory and executed with fewer processing resources than multiple separate models trained and executed separately.
  • the higher-level model branches may be lighter or smaller models that require less training data or fewer input dimensions.
  • the system can include one or more processors, coupled with memory, to receive data of a medical procedure performed on a subject with a robotic medical system.
  • the one or more processors can identify, with one or more models trained with machine learning, an anatomical structure based on the data.
  • the one or more processors can detect, with the one or more models and based on the identified anatomical structure, a state of the anatomical structure.
  • the one or more processors can provide, based at least in part on the state of the anatomical structure, an indication of a performance of the medical procedure.
  • the one or more processors can execute the one or more models on a frame of a video of the medical procedure.
  • the one or more processors can generate, using the identified anatomical structure, a first mask of the frame for the anatomical structure.
  • the one or more processors can generate, using the detected state, a second mask of the frame for the state.
  • the one or more processors can execute the one or more models trained with the machine learning.
  • the one or more models can include a first branch including at least one first model, the first branch to identify the anatomical structure.
  • the one or more models can include a second branch including at least one second model, the second branch to detect the state of the anatomical structure using the anatomical structure identified by the first branch.
  • the one or more processors can receive a frame of a video of the medical procedure and an identification of the medical procedure from the robotic medical system.
  • the one or more processors can generate an encoding of the frame.
  • the one or more processors can generate an embedding of the identification of the medical procedure.
  • the one or more processors can execute a first branch of the one or more models on the encoding of the frame and the embedding of the identification to identify the anatomical structure.
  • the one or more processors can execute a second branch of the one or more models on the encoding of the frame, the embedding of the identification, and the identified anatomical structure to detect the state of the anatomical structure.
  • the one or more processors can generate, using the identified anatomical structure, a first mask of a frame for the anatomical structure.
  • the one or more processors can generate, using the detected state, a second mask of the frame for the state.
  • the one or more processors can determine a level of overlap between the first mask and the second mask.
  • the one or more processors can train the one or more models to maximize the level of overlap.
  • the one or more processors can determine a first loss of a first branch of the one or more models, the first branch to identify the anatomical structure.
  • the one or more processors can determine a second loss of a second branch of the one or more models, the second branch to detect the state of the anatomical structure.
  • the one or more processors can concurrently train the first branch and the second branch with the first loss and the second loss.
  • the one or more processors can receive a frame of a video of the medical procedure, hyperspectral data of the frame, and an identification of the medical procedure from the robotic medical system.
  • the one or more processors can execute a first branch of the one or more models on the frame and the identification of the medical procedure to identify the anatomical structure.
  • the one or more processors can execute a second branch of the one or more models on the frame, the identification of the medical procedure, and the identified anatomical structure to detect the state of the anatomical structure.
  • the one or more processors can execute a third branch of the one or more models on the hyperspectral data and at least one of the identified anatomical structure or the state of the anatomical structure to detect an oxygen saturation of the anatomical structure.
  • the one or more processors can execute a first model of the one or more models using the identified anatomical structure and the detected state to generate a first frame including a first mask that indicates the state.
  • the one or more processors can execute a second model of the one or more models using the identified anatomical structure to generate a second frame including a second mask that indicates the anatomical structure.
  • the one or more processors can detect, with the one or more models and based on the identified anatomical structure, the state of the anatomical structure at points in time using frames of a video of the medical procedure.
  • the one or more processors can track the state over time using the detected state from the frames.
  • the one or more processors can generate the indication of the performance of the medical procedure using changes to the state over time.
  • the one or more processors can search videos of medical procedures with the identified anatomical structure and the state of the anatomical structure to identify a subset of the videos.
  • the one or more processors can generate data to cause a graphical user interface to display indications of the subset of the videos.
  • the one or more processors can receive the data during the medical procedure.
  • the one or more processors can generate an alert during the medical procedure using the detected state of the anatomical structure by the one or more models.
  • the one or more processors can generate data to cause a graphical user interface to display the alert.
  • the one or more processors can generate a video of the medical procedure including masks to indicate the anatomical structure identified by the one or more models and the state of the anatomical structure detected by the one or more models.
  • the one or more processors can generate data to cause a graphical user interface to display the video including the masks.
  • At least one aspect of the present disclosure is directed to a method.
  • the method can include receiving, by one or more processors, coupled with memory, data of a medical procedure performed on a subject with a robotic medical system.
  • the method can include identifying, by the one or more processors, with one or more models trained with machine learning, an anatomical structure based on the data.
  • the method can include detecting, by the one or more processors, with the one or more models and based on the identified anatomical structure, a state of the anatomical structure.
  • the method can include providing, by the one or more processors, based at least in part on the state of the anatomical structure, an indication of a performance of the medical procedure.
  • the method can include executing, by the one or more processors, the one or more models trained with the machine learning.
  • the one or more models can include a first branch including at least one first model, the first branch to identify the anatomical structure.
  • the one or more models can include a second branch including at least one second model, the second branch to detect the state of the anatomical structure using the anatomical structure identified by the first branch.
  • the method can include receiving, by the one or more processors, a frame of a video of the medical procedure and an identification of the medical procedure from the robotic medical system.
  • the method can include generating, by the one or more processors, an encoding of the frame.
  • the method can include generating, by the one or more processors, an embedding of the identification of the medical procedure.
  • the method can include executing, by the one or more processors, a first branch of the one or more models on the encoding of the frame and the embedding of the identification to identify the anatomical structure.
  • the method can include executing, by the one or more processors, a second branch of the one or more models on the encoding of the frame, the embedding of the identification, and the identified anatomical structure to detect the state of the anatomical structure.
  • the method can include generating, by the one or more processors, using the identified anatomical structure, a first mask of a frame for the anatomical structure.
  • the method can include generating, by the one or more processors, using the detected state, a second mask of the frame for the state.
  • the method can include determining, by the one or more processors, a level of overlap between the first mask and the second mask.
  • the method can include training, by the one or more processors, the one or more models to maximize the level of overlap.
  • the method can include determining, by the one or more processors, a first loss of a first branch of the one or more models, the first branch to identify the anatomical structure.
  • the method can include determining, by the one or more processors, a second loss of a second branch of the one or more models, the second branch to detect the state of the anatomical structure.
  • the method can include concurrently training, by the one or more processors, the first branch and the second branch with the first loss and the second loss.
  • At least one aspect of the present disclosure is directed to a non-transitory computer readable medium.
  • the medium can store processor-executable instructions that, when executed by one or more processors, cause the one or more processors to receive data of a medical procedure performed on a subject with a robotic medical system.
  • the instructions can cause the one or more processors to identify, with one or more models trained with machine learning, an anatomical structure based on the data.
  • the instructions can cause the one or more processors to detect, with the one or more models and based on the identified anatomical structure, a state of the anatomical structure.
  • the instructions can cause the one or more processors to provide, based at least in part on the state of the anatomical structure, an indication of a performance of the medical procedure.
  • the instructions can cause the one or more processors to receive a frame of a video of the medical procedure and an identification of the medical procedure from the robotic medical system.
  • the instructions can cause the one or more processors to generate an encoding of the frame.
  • the instructions can cause the one or more processors to generate an embedding of the identification of the medical procedure.
  • the instructions can cause the one or more processors to execute a first branch of the one or more models on the encoding of the frame and the embedding of the identification to identify the anatomical structure.
  • the instructions can cause the one or more processors to execute a second branch of the one or more models on the encoding of the frame, the embedding of the identification, and the identified anatomical structure to detect the state of the anatomical structure.
  • the instructions can cause the one or more processors to generate, using the identified anatomical structure, a first mask of a frame for the anatomical structure.
  • the instructions can cause the one or more processors to generate, using the detected state, a second mask of the frame for the state.
  • the instructions can cause the one or more processors to determine a level of overlap between the first mask and the second mask.
  • the instructions can cause the one or more processors to train the one or more models to maximize the level of overlap.
  • FIG. 1 depicts an example computing system to identify an anatomical structure and detect a state of the anatomical structure.
  • FIG. 2 depicts an example model to identify an anatomical structure and detect a state of the anatomical structure.
  • FIG. 3 depicts an example method of identifying an anatomical structure and detecting a state of the anatomical structure.
  • FIG. 4 depicts an example method of training a model with a combined branch loss.
  • FIG. 5 depicts an example computing architecture of a computing system.
  • Performance indicators can track the performance of a medical procedure, such as a surgery.
  • the performance indicators can include manipulator distance traveled, energy consumption, number of clutches, etc.
  • a surgeon can quickly assess the state of an organ of a patient and take actions to ensure an effective and efficient treatment of a condition.
  • the actions that the surgeon takes can be linked to the performance indicators, or can be used to generate the performance indicators.
  • performance indicators may only quantify the actions the surgeon took, and may not provide a contextual insight regarding why the surgeon took the action, or the result that the action had on the patient.
  • a reliance on performance indicators can lead to surgeon confusion, because surgeons may not be able to tell the reason why a performance indicator has decreased or increased during or after a surgery.
  • a performance indicator for manipulator distance traveled might be low because of an enlarged or swollen liver, making navigation of robotic manipulators more difficult during surgery.
  • a performance indicator such as energy usage can be increased because of a large degree of bleeding, or a thick connective tissue, requiring additional surgical work.
  • a surgeon may manually record and report the states of various organs or other anatomical structures. A surgeon can also take pathology or biopsy samples or perform additional imaging during a surgery to determine the states of the anatomical structures. Taking samples or performing additional imaging can lead to longer and more invasive or intrusive surgeries, increasing surgeon workload, surgery duration, and power consumption of the robotic medical system. The extended duration of surgeries can lead to further wear on the robotic medical system, causing it to wear out faster and require more maintenance. Furthermore, the accuracy of the reports depends on the surgeon providing accurate information.
  • the reported states of various anatomical structures can be incorrect. Furthermore, if a surgeon reviews a video after a surgery, the surgeon-provided quantification of the states of the anatomical structures may be subjective, and different surgeons may conclude that anatomical structures have different states based on the same video.
  • a model can be trained to detect the states of anatomical structures from input frames.
  • the model training may need a large training dataset for a vast number of different types of anatomical structures, which may not be available.
  • Training a single model to detect a state of an anatomical structure from a frame with a large dataset can result in increased training time, increased model size, increased memory usage, or increased processing resource usage.
  • This can make model training difficult and resource intensive (e.g., consume large amounts of processing, memory, or storage resources), and the large size of the model can make real-time or intraoperative implementation of the model difficult.
  • the model may consume a large amount of processing resources, memory resources, or power resources.
  • An objective assessment of anatomical states can allow a surgeon to understand the success of a surgery and the final anatomical states of the surgery, allowing the surgeon to predict whether a patient will successfully heal after surgery.
  • An objective determination of anatomical states can help a surgeon identify the difficulty of a case, and respond to sudden changes or states indicating damage to anatomical structures, such as bleeding, burning, or tearing, which may need surgeon attention or changes in approach.
  • a computing system can implement a multi-layered machine learning model that incorporates multiple data sources to provide assessments of different categories of anatomical states in an efficient manner.
  • Each assessment can be a separate branch built on top of a baseline model or baseline branch.
  • the base branch can use machine learning or deep learning to perform anatomy segmentation to identify and track anatomical structures, such as organs, tissue, or bones.
  • Other branches can use the identified anatomical structures of the base branch to determine anatomical states or parts of the structures.
  • the computing system can implement machine learning models or deterministic computations on the outputs of the base anatomy segmentation model to determine anatomical states such as size, coloration, burning, bleeding, or transected state.
  • the computing system can implement a machine learning model to segment an anatomical structure into different parts, e.g., identify regions of bruising, charring, burning, fatty tissue, muscle tissue, connective tissue, etc.
  • the computing system can implement a machine learning model that incorporates additional data sources; e.g., for states such as oxygen (O2) saturation, a model can be implemented that incorporates hyperspectral image data.
  • Other branches could be implemented to recognize surgical equipment or instruments, such as sutures, grafts, or clips.
  • the model can consume fewer computing resources (e.g., processor resources, memory resources, power resources) to be trained, stored, or executed to detect states. Because the multiple layers of the model can be combined into one model or model set and trained together, the processing resources required to train the model can be decreased, and the trained model can be stored in less memory and executed with fewer processing resources than multiple separate models trained and executed separately. Furthermore, because a model or branch can execute on the determinations of an underlying base model or branch, the higher-level model branches may be lighter or smaller models that require less training data or fewer input dimensions. At least one data input into the model system can be medical robotic system data, such as medical procedure type, medical procedure phase, or medical procedure step.
  • the medical robotic system data, such as medical procedure type, medical procedure phase, or medical procedure step, can provide a latent state that focuses and accelerates the training of the model system.
  • the model system can be flexible and adaptable to new data modalities that become available to the computing system. For example, if hyperspectral image data becomes available to the computing system, an additional branch or model can be implemented over the base anatomy segmentation model.
  • the additional model can consume the hyperspectral image data in addition to the anatomy detection of the base anatomy segmentation model to detect states such as oxygen saturation of tissue.
  • the system can accurately describe anatomical structures and anatomical states in a video.
  • the states can be tracked over time to quantify the changes to the state of the anatomical structure (a minimal tracking sketch appears after this list).
  • These states can be outputs leveraged by other systems or applications, such as surgical quality quantification and assessment systems.
  • video review software can be improved by allowing a surgeon to quickly navigate to sections of a procedure video where the state of an anatomical structure needs to be reviewed or where there was a drastic change in state, such as transection or hemostasis.
  • historical states and procedure outcomes can be tracked, and used to make post-operative recommendations.
  • the computing system can use the states to make intra-operative recommendations based on real-time state assessments and comparisons.
  • the system 100 can include at least one computing system 110.
  • the computing system 110 can be a data processing system, a computing system, a computer system, a computer, a desktop computer, a laptop computer, a tablet, a control system, a console system, an embedded system, a cloud computing system, a server system, or any other type of computing system.
  • the computing system 110 can be an on-premises system or an off-premises system.
  • the computing system 110 can be a hybrid system, where some components of the computing system 110 are located on-premises, and some components of the computing system 110 are located off-premises.
  • the system 100 can include at least one robotic medical system 105.
  • the robotic medical system 105 can be a robotic system, apparatus, or assembly including at least one instrument.
  • the instrument can be or include a tip or end.
  • the tip or end can be installed with or to the instrument.
  • the tip can be removable or a permanent component of the instrument or the robotic medical system 105.
  • the tip can be a scalpel, scissors, a monopolar curved scissors (MCS), a cautery hook tip, a cautery spatula tip, a needle driver, a forceps, a tooth retractor, a drill, or a clip applier.
  • the instrument can be or include a robotic arm, a robotic appendage, a robotic snake, or any other motor-controlled member that can be articulated by the robotic medical system.
  • the instrument can include at least one actuator, such as a motor, servo, or other actuator device.
  • the instrument can be manipulated by motors, servos, actuators, or other devices to perform a medical procedure.
  • the robotic medical system 105 can perform a medical session or medical procedure.
  • the robotic medical system 105 can articulate the instrument to perform surgery, therapy, or a medical evaluation with the instrument.
  • the medical procedure can be performed on a subject, e.g., a human, an adult, a child, or an animal.
  • a medical practitioner such as a surgeon, technician, nurse, or other operator can provide input via a user device or input apparatus (e.g., joystick, buttons, touchpad, keyboard, steering apparatus, etc.) to manipulate the instrument to perform a medical procedure.
  • the robotic medical system 105 can include an endoscope, in some implementations.
  • the endoscope can be an instrument that is manipulated by the medical practitioner and controlled via a motor, servo, or other input device.
  • the computing system 110 can receive data of a medical procedure performed on a subject with the robotic medical system 105.
  • the computing system 110 can receive at least one image frame 115 from the robotic medical system 105.
  • the computing system 110 can receive system data 120 from the robotic medical system 105.
  • the image frame 115 can be an image of at least one anatomical structure of a patient (e.g., human, animal, or biological material) captured during a medical procedure.
  • the system data 120 can provide an indication, a name, or an identifier that specifies the type of the medical procedure that is being performed or was performed.
  • the system data 120 can be linked or tagged to the image frame 115 to indicate that the image frame 115 was captured for a particular type of medical procedure.
  • the system data 120 can indicate that a medical procedure is a polypectomy, cataract surgery, caesarean section, appendectomy, or any other type of medical or surgical procedure.
  • the robotic medical system 105 can generate, store, or produce at least one image frame 115. At least one endoscope of the robotic medical system 105 can capture at least one image frame 115 and provide the image frames 115 to the computing system 110.
  • the image frame 115 can be a part of a video captured by the endoscope of the robotic medical system 105.
  • the image frame 115 can include images, pictures, or pixels representing at least one anatomical structure of a subject or patient.
  • the robotic medical system 105 can transmit the image frame 115 to the computing system 110.
  • the computing system 110 can receive the image frame 115 from the robotic medical system 105.
  • the robotic medical system 105 can tag each image frame 115, or a set of image frames 115, with the system data 120 to indicate what type of medical procedure the image frames 115 were captured for.
  • the image frame 115 or the system data 120 can be streamed to the computing system 110 during the medical procedure.
  • the robotic medical system 105 can stream the image frame 115 or the system data 120 to the computing system 110.
  • the robotic medical system 105 can transmit the image frame 115 or the system data 120 in real-time, or as the image frame 115 or the system data 120 is generated, captured, or stored.
  • the robotic medical system 105 can provide the image frame 115 or the system data 120 intraoperatively, e.g., during a medical procedure.
  • the computing system 110 can implement a machine learning based model 125.
  • the computing system 110 can execute the one or more models 125 trained by machine learning to detect anatomical structures 145.
  • the anatomical structures 145 can be organs (e.g., a brain, a liver, a pancreas, a stomach, an intestine), tissue (e.g., muscle tissue, connective tissue, nervous tissue), or bones (e.g., a rib, a tibia, or a skull).
  • the computing system 110 can simultaneously identify multiple different anatomical structures 145 within a single image frame 115.
  • the computing system 110 can execute the model 125 to identify a single anatomical structure of interest, such as the anatomical structure that the robotic medical system 105 is operating on, an anatomical structure that is centered within the image frame 115, or an anatomical structure that is the focal point of the image frame 115.
  • the computing system 110 can execute the one or more models 125 trained by machine learning to detect states of the anatomical structures 145, or detect parts of the anatomical structures 145. Parts of an anatomical structure 145 can be sections, segments, or portions of an image that pertain to a particular state or status.
  • the model 125 can be a multiple level segmentation model 125.
  • the computing system 110 can execute a first branch 135 including at least one first model to identify the anatomical structure 145.
  • the computing system 110 can execute a second branch 140 including at least one second model to detect a state of the anatomical structure 145.
  • the second branch 140 can detect the state of the anatomical structure 145 using the anatomical structure 145 identified by the first branch 135.
  • the model 125 can be or include at least one neural network, such as a transformer network, an embedding network, an encoding network, a convolutional neural network, a feed forward neural network, or any other neural network topology.
  • the model 125 can include multiple branches.
  • the model 125 can include a first branch 135 and a second branch 140.
  • the branches 135 and 140 can each include at least one model or multiple models.
  • the first branch 135 can be a first portion of the model 125, a first model, or a first set of models.
  • the second branch 140 can be a second portion of the model 125, a second model, or a second set of models.
  • the computing system 110 can include a machine learning engine 130.
  • the machine learning engine 130 can train the model 125.
  • the machine learning engine 130 can train the model 125 to learn to recognize an anatomy of interest.
  • the model 125 can include at least one base branch or base level 135 to recognize the anatomy.
  • the model 125 can further segment states of the anatomical structure into parts or states (e.g., burns, bleeding, or other processed parts within the anatomy).
  • the model 125 can include at least one part level or state branch 140 to segment the states of the anatomical structure.
  • the anatomical structure 145 detected by the branch 135 can be encoded as a feature used as context to assist the second branch 140 to segment a part or state of the anatomical structure 145.
  • the branches 135 and 140 can be trained together by the machine learning engine 130. Because multiple branches of the model 125 can be trained together, the total amount of training data 150 and processing resources (e.g., processors, memory, or power) used by the machine learning engine 130 to segment an anatomical structure or anatomical state can be reduced.
  • the computing system 110 can use the system data 120 such as procedure type or step label as an input to the model 125.
  • the system data 120 can be a time series that can be encoded as an embedded feature which is associated or linked with the visual features of the image frame 115. The embedded features generated from the system data 120 can provide a latent label to narrow down and accelerate learning to detect the state.
  • the computing system 110 can identify, with at least one model 125 trained with machine learning, an anatomical structure 145.
  • the computing system 110 can store at least one model 125.
  • the computing system 110 can retrieve the model 125 from storage.
  • the computing system 110 can execute the retrieved model 125.
  • the computing system 110 can execute at least one model 125 on the image frame 115 or the system data 120 to identify the anatomical structure 145.
  • the computing system 110 can execute one or more models 125 on the frame 115 of a video of a medical procedure.
  • the computing system 110 can execute the first branch 135 using the image frame 115 and the system data 120 to determine the anatomical structure 145.
  • the computing system 110 can execute the model 125 to encode or embed the image frame 115 or the system data 120.
  • the computing system 110 can execute the first branch 135 on an encoding of the frame 115 and an embedding of the system data 120 to identify the anatomical structure 145.
  • Execution of the first branch 135 can generate the anatomical structure detection 145.
  • the computing system 110 can generate a first mask 175 of the image frame 115 for the anatomical structure 145.
  • the first branch 135 can output a first mask 175.
  • the first mask 175 can identify pixels in the image frame 115 corresponding to the anatomical structure 145.
  • the first mask 175 can identify pixels in the image frame 115 that do not correspond to the anatomical structure 145.
  • the first mask 175 can include a label, identifier, or number indicating that a pixel is associated with the anatomical structure 145, and a second label, identifier, or number indicating that a pixel is not associated with the anatomical structure 145.
  • the first branch 135 can generate the first mask 175 based on the detection or identification of the anatomical structure 145.
  • the computing system 110 can detect a state or part of the anatomical structure 145 based on the identified anatomical structure 145.
  • the computing system 110 can execute the model 125 based on the identified anatomical structure 145.
  • a state or part that the model 125 detects may be specific to the anatomical structure 145.
  • the state or part detected by the model 125 can be based at least in part on the anatomical structure 145.
  • the computing system 110 can execute the second branch 140 on the identified anatomical structure 145 to identify the part or state of the anatomical structure 145.
  • the identified anatomical structure 145 can be an input to the second branch 140.
  • the computing system 110 can execute the second branch 140 on the frame 115 (e.g., an encoding of the frame 115), the system data 120 (e.g., an embedding of the system data 120), and the identified anatomical structure 145 (e.g., a hidden or latent state of the model 125 representing the anatomical structure 145) to identify the state or part of the anatomical structure 145 (a minimal sketch of this two-branch arrangement appears after this list).
  • the state of the anatomical structure 145 determined by the second branch 140 can be a native state.
  • the native state can be a state of the anatomical structure 145 before a medical procedure is performed on the anatomical structure 145 or near the anatomical structure 145.
  • the native state could be inflammation, coloration, perfusion, or malformations.
  • the state of the anatomical structure 145 determined by the second branch 140 can be an altered state.
  • the altered state of the anatomical structure 145 can be a state of the anatomical structure after a medical procedure is performed on the anatomical structure 145 or near the anatomical structure 145.
  • the altered state can be removal, transection, exposure, stretching, or deformation.
  • the state of the anatomical structure 145 determined by the second branch 140 can be a reconstituted state.
  • the reconstituted state can be closure, grafting, or anastomosis.
  • the state of the anatomical structure 145 determined by the second branch 140 can be a damaged state.
  • the damaged state can include tears, burns, bruises, or cuts.
  • the state of the anatomical structure 145 determined by the second branch 140 can be a final state.
  • the final state could be a state of a final output or last frame 115 of the medical procedure that includes the anatomical structure 145.
  • the final state can provide an indication of the state of the anatomical structure 145 or the patient at an end or conclusion of the medical procedure.
  • the states can be Boolean states, e.g., indicating the presence or absence of a condition or state.
  • the states can be variables or degrees of a condition or state on a scale or in a range.
  • states can be broken down or combined into other states.
  • the model 125 can be updated to produce new or different states to provide greater detail and definition of the anatomical structure 145.
  • the nature of some of the states can be complex, with some states being interlinked and dependent on one another and others being independent of one another. To account for this in the output, each category and subcategory of states can be measured independently and quantified as a percentage relative to the base branch 135.
  • the computing system 110 can generate a second mask 180 of the image frame 115 for the state or part using the detected state or part.
  • the second branch 140 can generate a second mask 180.
  • the second mask 180 can identify the part or state of the anatomical structure 145.
  • the second mask 180 can identify pixels in the image frame 115 corresponding to the part or state of the anatomical structure 145.
  • the second mask 180 can identify pixels in the image frame 115 that do not correspond to the part or state of the anatomical structure 145.
  • the second mask 180 can include a label, identifier, or number indicating that a pixel is associated with the state of the anatomical structure 145, and a second label, identifier, or number indicating that a pixel is not associated with the state of the anatomical structure 145.
  • the second branch 140 can generate the second mask 180 based on the detection or identification of the anatomical structure 145 or the identified part or state of the anatomical structure 145.
  • the computing system 110 can determine states with a variety of levels of granularity.
  • the user via the client device 199 can provide an input identifying a level of granularity for the states determined by the second branch 140.
  • the level of granularity can be a Boolean indication, such as true or false.
  • the level of granularity can be a value in a range, e.g., a percentage value or a value between 0 and 10.
  • the model 125 can adjust the granularity of the state based on the user input. For example, for a Boolean level of granularity, the state could indicate that the anatomical structure 145 is burned or not burned.
  • the level can be a continuous level, such as degree of bruising.
  • the model 125 can include more than two branches.
  • the model 125 can be adaptable, and new branches can be added to the model 125 over time.
  • the branches can operate in a hierarchy, where determinations, detections, or identifications of one branch are used by another branch to produce more determinations, detections, or identifications.
  • a base branch (e.g., the first branch 135) can identify the anatomical structure 145, and higher level branches that operate on the base branch or on each other can detect information about the anatomical structure 145, or about the states or parts of the anatomical structure 145.
  • when a new data stream becomes available, the computing system 110 can load or train a new branch to execute on the new data stream.
  • for example, if the computing system 110 is coupled with a robotic medical system 105 that produces hyperspectral data, or a new hyperspectral data sensor is installed in an existing robotic medical system 105, the new hyperspectral data can be provided to a new branch of the model 125.
  • the computing system 110 and the architecture of the model 125 can support new products, robotic medical systems 105, or updates to the robotic medical systems 105.
  • hyperspectral capabilities can be added to the robotic medical system 105, and the computing system 110 can be updated to utilize the hyperspectral capabilities.
  • the computing system 110 can be updated with a software update, or an update to the model 125 adding a new branch to allow for real-time or near-real time assessment and tracking during a surgery or medical procedure with new data sources, as well as allowing the post-operative review of new imaging technologies through artificial intelligence suite products.
  • as new or future imaging techniques (e.g., perfusion imagery, white light firefly, or other advanced imaging solutions) become available, the model 125 can be updated to operate on the new data to allow for the analysis of more anatomical states, or for another method of analyzing an anatomical state.
  • the computing system 110 can provide identified anatomical structures 145 and states of anatomical structures 145 during a surgical procedure, and can be applied to intra-operative or postoperative surgical feeds. This can provide a sustainable system that can advance with changes in available imaging technologies.
  • the computing system 110 can receive hyperspectral data of the image frame 115.
  • the hyperspectral data can correspond to the image frame 115.
  • the computing system 110 can execute a first branch 135 of the one or more models 125 on the image frame 115 and the system data 120 including the identification of the medical procedure to identify the anatomical structure 145.
  • the computing system 110 can execute the second branch 140 on the image frame 115, the system data 120 including the identification of the medical procedure, and the identified anatomical structure 145 to detect a part or state of the anatomical structure.
  • the computing system 110 can execute a third branch of the one or more models 125 based on the hyperspectral data and at least one of the identified anatomical structure 145 or the state or part of the anatomical structure 145 (an illustrative sketch of such a hyperspectral branch appears after this list).
  • the third branch can detect or identify an oxygen saturation of the anatomical structure 145.
  • the third branch can generate a mask that indicates levels of oxygen saturation in the anatomical structure 145.
  • the model 125 can include additional branches that build directly on the base or first branch 135.
  • the model can include at least one additional branch that builds on the second branch 140.
  • the additional branches can operate on additional input data, such as hyperspectral data.
  • the model 125 can include an additional branch to perform post-processing.
  • a post-processing branch can determine color gradients across a segmented area of the image frame 115 identified as the anatomical structure 145.
  • the model 125 can include an additional branch to perform direct processing of a segmentation mask (e.g., the first mask 175 or the second mask 180).
  • the direct processing branch can detect, based on a segmentation of the anatomical structure 145, whether the anatomy has been transected or cut (a deterministic post-processing sketch appears after this list).
  • the computing system 110 can include at least one machine learning engine 130.
  • the machine learning engine 130 can train the model 125.
  • the machine learning engine 130 can train the model 125 using training data 150.
  • the machine learning engine 130 can train the model 125 based on the training data 150.
  • the machine learning engine 130 can include at least one of losses 155-170.
  • the machine learning engine 130 can generate, calculate, or determine a first branch loss 155 that indicates a loss of the first branch 135.
  • the machine learning engine 130 can train the first branch 135 to minimize, reduce, or lower the first branch loss 155.
  • the machine learning engine 130 can generate, calculate, or determine a second branch loss 160 that indicates a loss of the second branch 140.
  • the machine learning engine 130 can train the second branch 140 to minimize, reduce, or lower the second branch loss 160.
  • the machine learning engine 130 can determine a level of overlap between the first mask 175 and the second mask 180.
  • the machine learning engine 130 can train the one or more models 125 to maximize the level of overlap.
  • the computing system 110 can determine a combined loss 170.
  • the machine learning engine 130 can improve model performance with the combined loss 170 to maintain the consistency of the segmentation results.
  • the combined loss 170 can be a loss that takes into account the loss of both the first branch 135 and the second branch 140, e.g., multiple branches together.
  • the machine learning engine 130 can determine the combined loss 170 based on an overlap or intersection between an area of the image frame 115 identified as the anatomical structure 145 and an area of the image frame 115 identified as a part or state of the anatomical structure 145.
  • the computing system 110 can indicate a level of overlap between the first mask 175 and the second mask 180.
  • the machine learning engine 130 can minimize or reduce the combined loss 170 to maximize or increase an overlap between the first mask 175 and the second mask 180.
  • because the second mask 180 can represent the part or state of the anatomical structure represented by the first mask 175, the second mask 180 should be fully included or contained within the first mask 175. More specifically, the first mask 175 should completely encapsulate the second mask 180, as formalized in the illustrative expression below.
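One way to formalize this containment relationship, offered here only as an illustrative expression rather than the loss actually used by the system, is:

$$
L_{ap} \;=\; 1 \;-\; \frac{\lvert M_{180} \cap M_{175} \rvert}{\lvert M_{180} \rvert + \epsilon}
$$

where $M_{175}$ and $M_{180}$ denote the pixel sets (or summed soft values) of the first mask 175 and the second mask 180, and $\epsilon$ is a small constant that avoids division by zero. The expression is zero when the second mask 180 is completely contained in the first mask 175 and approaches one as more of the second mask 180 falls outside the first mask 175.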
  • the machine learning engine 130 can concurrently, simultaneously, or contemporaneously train multiple models, branches, or levels of the model 125.
  • the machine learning engine 130 can simultaneously train the first branch 135 and the second branch 140.
  • the machine learning engine 130 can concurrently train the first branch 135 and the second branch 140 with at least one of the first branch loss 155, the second branch loss 160, and the combined loss 170.
  • the machine learning engine 130 can concurrently train the first branch 135 and the second branch 140 with the first loss 155 and the second loss 160.
  • the machine learning engine 130 can concurrently train the first branch 135 and the second branch 140 with the first loss 155, the second loss 160, and the combined loss 170.
  • the training can generate, identify, or determine parameters or configurations of the model 125 that minimize or reduce the losses 155-170.
  • the machine learning engine 130 can perform backpropagation to train the model 125.
  • the machine learning engine 130 can execute a machine learning algorithm, such as gradient descent of the losses 155-170 or stochastic gradient descent of the losses 155-170 with respect to parameters of the model 125.
  • the machine learning algorithm can implement second order gradient descent, Newton's method, conjugate gradient, a quasi-Newton method, or the Levenberg-Marquardt algorithm to train the model 125.
  • the machine learning engine 130 can train the model 125 with the training data 150.
  • the training data 150 can include multiple image frames 115, corresponding system data 120, and corresponding masks.
  • the training data 150 can include system data 120 that indicates the type of the medical procedure that the image frame 115 was captured for.
  • the training data 150 can identify the first mask 175 identifying an anatomical structure 145 depicted in the image frame 115.
  • the training data 150 can identify a second mask 180 identifying a part or state of the anatomical structure 145 depicted in the image frame 115.
  • the interface manager 190 or the computing system 110 can generate or provide an indication of performance of the robotic medical system 105 or the medical procedure.
  • the interface manager 190 can provide, based at least in part on the state of the anatomical structure 145, an indication of the performance of the medical procedure.
  • the performance of the medical procedure can be an indication of the state or a characterization of the state determined by the model 125.
  • the interface manager 190 can cause the graphical user interface 197 to display an amount or level of bleeding of the anatomical structure 145.
  • the interface manager 190 can generate at least one metric that indicates the performance of the medical procedure.
  • the interface manager 190 can generate the metric based at least in part on the masked frame 115, the first mask 175, the second mask 180, the detected anatomical structure 145, or the detected state of the anatomical structure 145.
  • the metric can quantify the state of the anatomical structure 145.
  • the metric can be an objective measurement determined via tracking a state over time and determining changes to the state.
  • the metric can be determined by analyzing a change in patterns, such as from optical flow mapping, e.g., assigning directionality and magnitudes to pixel changes.
  • the metric can be determined from pixel area ratios, e.g., an area associated with a state versus an entire visible area.
  • the metric can be paired with kinematics data, such as energy data of an energy application.
  • the metrics can allow for incorporation of additional image data, e.g., hyperspectral data outputs of the robotic medical system 105.
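As a minimal sketch of the pixel-area-ratio metric described above (the function name and the boolean-mask format are assumptions), the metric could be computed from the two segmentation masks:

```python
import numpy as np

def state_area_ratio(state_mask: np.ndarray, anatomy_mask: np.ndarray) -> float:
    """Fraction of the visible anatomical structure that is in the detected
    state (e.g., bleeding or burned), computed from boolean masks."""
    anatomy_pixels = np.count_nonzero(anatomy_mask)
    if anatomy_pixels == 0:
        return 0.0
    state_pixels = np.count_nonzero(np.logical_and(state_mask, anatomy_mask))
    return state_pixels / anatomy_pixels
```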
  • the computing system 110 can generate a trend or series of values for a state of the anatomical structure 145 over time, e.g., a first level of the state at a first time, a second level of the state at a second time, and a third level of the state at a third time.
  • the changes in state can represent that bleeding has started, increased to a particular amount, and then subsided. This can provide a computing system or user with an understanding of how the state originates, changes, is mitigated, resolves, or is left unresolved during the medical procedure.
  • the interface manager 190 can generate an indication of the performance of the medical procedure using the trend.
  • the rate of change can refer to an amount of change, deterioration, degradation, or other type of change of the anatomical structure over a period of time.
  • a positive rate of change of state can correspond to healing of the anatomical structure.
  • a negative rate of change of state can correspond to a deterioration, injury, damage, or degradation of the anatomical structure.
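A hedged sketch of tracking such a trend (the names and the simple finite-difference rate are assumptions; other smoothing or windowing choices are equally plausible):

```python
from typing import Sequence, Tuple

def state_trend(levels: Sequence[float], timestamps: Sequence[float]) -> Tuple[float, float]:
    """Return the latest state level (e.g., an area ratio per frame) and its
    rate of change per unit time, given per-frame levels and timestamps."""
    if len(levels) < 2:
        return (levels[-1] if levels else 0.0), 0.0
    dt = timestamps[-1] - timestamps[-2]
    rate = (levels[-1] - levels[-2]) / dt if dt > 0 else 0.0
    return levels[-1], rate
```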
  • the model 125 can be executed in real-time or intraoperatively to determine the states of the anatomical structure 145 during the medical procedure.
  • the interface manager 190 can generate an alarm, alert, or message in response to a detection of a specific state or a detection of a state associated with a level that satisfies a threshold (e.g., a level that is greater than a threshold or a level that is less than a threshold).
  • the interface manager 190 can generate data to cause the graphical user interface 197 to display the alarm, alert, or message.
  • the alarm can include an indication of the anatomical structure 145, an indication of the state of the anatomical structure 145, an indication of a level of the state, or an indication of a rate of change of the level of the state.
  • the computing system 110 can prevent damage of the anatomical structure, or otherwise mitigate or reduce the amount or likelihood of an undesired change of state in the anatomical structure by providing the alert.
  • the computing system 110 can improve safety by providing an alert based on tracking the state of the anatomical structure (e.g., track the amount of bleeding or amount of burn over time), thereby preventing a damaged state of the anatomical structure based on analyzing the changes in the state or based on a sequence or trend in states of the anatomical structure.
  • the computing system 110 can automatically provide an alert based on an amount of the damaged state, such as the amount of bleeding or burn exceeding a threshold amount (e.g., percentage, absolute value, ratio relative to size on anatomical structure).
  • the computing system 110 can automatically reduce an amount of energy provided by an instrument or tool of the robotic medical system 105 to reduce, mitigate, or stop burning the anatomical structure, thereby slowing or stopping the rate of change of the anatomical structure.
  • the computing system 110 can detect a slow bleed that is not stopping, and can provide an alert, automatically control an instrument of the robotic medical system 105 to reduce the amount of bleeding, or provide feedback or an indication that the bleeding is not stopping.
  • the slow bleed may be indicative of an incomplete anastomosis, and the computing system 110 can automatically perform or cause to be performed an action to complete the anastomosis or cross-connection between anatomical structures or components of the subject.
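As an illustrative sketch only (the threshold values, field names, and dataclass are assumptions, not values from this disclosure), a threshold-based alert of the kind described above could be generated as follows:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StateAlert:
    structure: str   # e.g., "liver"
    state: str       # e.g., "bleeding"
    level: float     # e.g., fraction of the structure affected
    rate: float      # change in level per unit time

def maybe_alert(structure: str, state: str, level: float, rate: float,
                level_threshold: float = 0.2, rate_threshold: float = 0.05) -> Optional[StateAlert]:
    """Return an alert when the tracked state level or its rate of change
    satisfies a threshold; otherwise return None."""
    if level >= level_threshold or rate >= rate_threshold:
        return StateAlert(structure, state, level, rate)
    return None
```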
  • the interface manager 190 can generate a video based on a masked frame 115.
  • the interface manager 190 can generate a video of a medical procedure.
  • the video can include various masks 175 or 180 to indicate the anatomical structure 145 identified by the one or more models 125 and the state of the anatomical structure 145 detected by the one or more models 125.
  • the masked frame 115 can include the first mask 175 and the second mask 180.
  • the interface manager 190 can combine multiple masked frames 115 of a medical procedure into one video of the medical procedure.
  • the interface manager 190 can store a video of the masked frames 115 as a video file (e.g., WMV, AVI, FLV, or any other type of video file).
  • the interface manager 190 can generate data to cause the video that segments the anatomical structure 145 and the state of the anatomical structure 145 to be displayed in the graphical user interface 197.
  • the interface manager 190 can overlay state information on the video.
  • the interface manager 190 can overlay an indication of a damaged state on the anatomical structure, for example, the indication can be a symbol, color, metric, numeric value, or other indication that corresponds to an amount of the damage state (e.g., amount of burn).
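A minimal sketch of such an overlay (the blending weight and color are arbitrary assumptions) that blends a detected state region onto an RGB frame for display:

```python
import numpy as np

def overlay_mask(frame: np.ndarray, mask: np.ndarray,
                 color=(255, 0, 0), alpha: float = 0.4) -> np.ndarray:
    """Blend a colored, semi-transparent mask (e.g., a detected bleeding or
    burn region) onto an RGB frame (H, W, 3) for display."""
    out = frame.astype(np.float32).copy()
    color_arr = np.array(color, dtype=np.float32)
    m = mask.astype(bool)
    out[m] = (1.0 - alpha) * out[m] + alpha * color_arr
    return out.astype(np.uint8)
```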
  • a user via the client device 199 can cause the video to play, stop, pause, review, scan forward, scan backwards, or move to a specific state.
  • the interface manager 190 can navigate a medical procedure video with the identified anatomical structures 145 or identified anatomical states.
  • the interface manager 190 can provide state based navigation, where a user can navigate a medical procedure video based on state, such as state criticality.
  • for example, a user can select a particular state or request to view a most critical state or most critical anatomical structure via the client device 199, and the interface manager 190 can navigate the video to frames associated with the selected state, the most critical state, or the most critical anatomical structure.
  • the graphical user interface 197 can include a menu, selectable element, slider, or drop down that allows a user to select a specific type of anatomical structure 145 or specific state of an anatomical structure 145.
  • the interface manager 190 can display a portion or navigate to a particular image frame 115 of the video based on the selection of the user. For example, the interface manager 190 can navigate to an image frame 115 including the selected anatomical structure 145 or the selected anatomical structure 145 including a specific selected state.
  • the interface manager 190 can generate at least one timeline for a video of a medical procedure.
  • the timeline can be interactive, and allow a user to view or navigate to a specific section or frame of the video.
  • the timeline can identify anatomical structures.
  • the timeline can indicate names for anatomical structures displayed in different portions of the video or can display a color to identify the particular anatomical structure.
  • the timeline can indicate a start time when the anatomical structure becomes visible in the video, and an end time when the anatomical structure stops being visible.
  • the timeline can indicate multiple segments when an anatomical structure was visible, e.g., during two or more different segments of time of the video, a particular anatomical structure was visible.
  • the timeline can include similar indications of states, e.g., when states started, ended, or changed in level.
  • the timeline can indicate when an anatomical structure 145 began bleeding, the level of the bleeding over time, and when the bleeding stopped.
  • the interface manager 190 can include a first timeline to display when different anatomical structures were visible in the video, and another timeline to display when the states of the anatomical structures started or stopped.
  • the interface manager 190 can populate a timeline with major transitions in state.
  • the timeline can be marked with major state transitions, such as when a burn occurred, when an anatomical structure was cut, when the anatomical structure was sewn up, when an anatomical structure began bleeding, when the bleeding stopped, etc.
  • the timeline can include elements such as rectangular blocks, arrows, or segments spanning a time length that include different color gradients. The color gradient can indicate the state or level to provide a visual representation of state changing over time.
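As a hedged sketch (function names and data layout are assumptions), timeline segments indicating when a structure or state was visible could be collapsed from per-frame detections:

```python
from typing import List, Sequence, Tuple

def visibility_segments(visible: Sequence[bool], timestamps: Sequence[float]) -> List[Tuple[float, float]]:
    """Collapse a per-frame visibility flag (e.g., 'liver present' or
    'bleeding detected') into (start_time, end_time) timeline segments."""
    segments: List[Tuple[float, float]] = []
    start = None
    for flag, t in zip(visible, timestamps):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, timestamps[-1]))
    return segments
```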
  • the computing system 110 can utilize segmentations of the anatomical structure 145 or the states of the anatomical structure to search through collections or sets of videos or a database of videos of medical procedures.
  • the computing system 110 can navigate through recorded surgical or medical procedure videos by targeting the changes in state for key anatomical structures in a procedure, or by filtering libraries to find certain state values.
  • the computing system 110 can implement software programs or artificial intelligence programs to group data for research or review with the determinations of the model 125.
  • the database can store videos that are segmented according to anatomical structure 145 and the state of the anatomical structure.
  • the computing system 110 can use the masks 175 or 180 or the type of anatomical structure 145 or the type of state identified for the anatomical structure 145 to search through the database of videos.
  • the computing system 110 can search through multiple videos of medical procedures with the identified anatomical structure 145 and the state of the anatomical structure 145 to identify a subset of the plurality of videos.
  • the computing system 110 can identify which videos include an anatomical structure of the same type identified by the model 125. Then, the computing system 110 can determine whether the identified anatomical structure includes the state identified by the model 125.
  • the computing system 110 can generate a result set including the video of the same anatomical structure and state.
  • the interface manager 190 can cause videos of the result set to be displayed to a user via the graphical user interface 197 on the client device 199. The user can play or interact with various videos of the return video set. Because the model 125 includes branches that are modular, the exact architecture of models of each branch for segmentation can vary.
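As a hedged sketch of such a search (the index layout, keys, and names are assumptions), a library of segmented procedure videos could be filtered by anatomical structure type and state:

```python
from typing import Dict, List, Set

def search_videos(index: Dict[str, List[Dict[str, Set[str]]]],
                  structure: str, state: str) -> List[str]:
    """Given an index mapping video ids to per-frame annotations
    ({'structures': {...}, 'states': {...}}), return the ids of videos that
    contain the requested anatomical structure with the requested state."""
    results: List[str] = []
    for video_id, frames in index.items():
        for frame in frames:
            if structure in frame["structures"] and state in frame["states"]:
                results.append(video_id)
                break
    return results
```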
  • the computing system 110 can include at least one controller 195.
  • the controller 195 can control the robotic medical system 105 using the detection of the anatomical structure 145 by the model 125 and the detection of the state of the anatomical structure by the model 125.
  • the controller 195 can implement a rule based control algorithm, a plant based control algorithm, or any other control algorithm that can move or manipulate the instruments or endoscopes of the robotic medical system 105.
  • the controller 195 can detect that an instrument maneuver accidentally caused cutting or burning of an anatomical structure based on the state detections and anatomical structure detections of the model 125.
  • the controller 195 can manipulate the instrument to avoid causing further cutting or burning.
  • the controller 195 can automatically control the instruments based on the state or anatomical structure detected by the model 125 to perform a medical procedure, e.g., the state detections can guide the controller 195 to determine whether the robotic medical system 105 has properly transected an organ or properly sewn an organ back up.
  • the detected states and anatomical structures can be inputs or feedback to control algorithms for a fully-autonomous or semi-autonomous robotic medical system 105.
  • in some implementations, the controller 195 is located in the computing system 110, while in other implementations, the controller 195 is part of the robotic medical system 105. In some implementations, components of the controller 195 are disposed on both the computing system 110 and the robotic medical system 105.
  • an example model 125 to identify an anatomical structure and detect a state of the anatomical structure is shown.
  • the model 125 recognizes a liver and a burn state for the liver.
  • the model 125 can detect a variety of different types of anatomical structures and different states or parts for the anatomical structure.
  • the model 125 can be a pipeline.
  • the model 125 can be modularized into separate steps, pieces, components, executables, functions, segments, or equations.
  • the model 125 can receive the image frame 115 and the system data 120 as inputs.
  • the model 125 can discretize a surgical video into sequential image frames.
  • the system data 120, which can provide a type or step label, can be synchronized with the image 115. In this regard, each image 115 can have a corresponding type or label.
  • the model 125 can include at least one component to embed or encode the inputs.
  • the model 125 can include at least one image encoder 205.
  • the image encoder 205 can receive the frame 115 of a video of the medical procedure.
  • the image encoder 205 can generate an encoding of the frame 115.
  • the image encoder 205 can encode or embed the image 115.
  • the image encoder 205 can translate a high dimensional vector or matrix into a low dimensional vector or matrix.
  • the encoding can be a numerical vector or matrix representation of the image 115.
  • the image encoder 205 can generate a hidden internal state of the model 125 with the image 115.
  • the image encoder 205 can be a machine learning core or neural network that accepts image frames 115 as input.
  • the image encoder 205 can be a vision transformer, in some implementations.
  • the image encoder 205 can receive the image 115 as an input and convert the image 115 into a compact feature vector representing the input image 115.
  • the image encoder 205 can be a network that operates on spatial and temporal correlations.
  • the image encoder 205 can disentangle and extract the representative semantics of the images 115 by applying an attention mechanism.
  • the attention mechanism can be a selective weighting technology.
  • the attention mechanism can apply different weights to emphasize different parts of the image 115, which results in a compact representation of the image 115 that best fulfills the tasks or goals of the model 125.
  • the attention mechanism can be trained via machine learning based on the tasks or goals of the model 125.
  • the attention mechanism can apply weighting on spatial and temporal dimensions if needed.
  • the system data embedding 210 can receive an identification of the medical procedure from the robotic medical system 105, e.g., receive the system data 120.
  • the system data embedding 210 can embed the system data 120.
  • the system data embedding 210 can generate an embedding with the system data 120.
  • the system data embedding 210 can convert the system data 120 into a numerical vector.
  • the system data 120 can be time-series data that includes indications of the type of medical procedure being performed and corresponding time stamps.
  • the model 125 can include a module, software component, or operation 215.
  • the operation 215 can combine the encoding of the image 115 with the embedding of the system data 120.
  • the operation 215 can concatenate the numerical vector produced by the system data embedding 210 from the system data 120 with the visual features encoded from the image encoder 205 from the image 115.
  • the combined encoding of the image 115 and the embedding of the system data 120 can be provided to a part decoder 220 of the part branch 140.
  • the combined encoding of the image 115 and the embedding of the system data 120 can be provided to an anatomy decoder 225 of the base branch 135.
  • the base branch 135 can include at least one anatomy decoder 225.
  • the anatomy decoder 225 can be a neural network.
  • the anatomy decoder 225 can transform the hidden internal states produced by the image encoder 205 and the system data embedding 210 into F_a, a feature, identification, segmentation, or detection 145.
  • the anatomy decoder 225 can output a feature indicating or representing segmentation 145 of an anatomical structure.
  • the anatomy decoder 225 can provide anatomical segmentation with additional transformation layers.
  • the anatomy decoder 225 can output the anatomical structure detection 145.
  • the anatomical structure detection 145 can uniquely identify a type of anatomical structure with a tag, an indicator, a label, or a numeric value.
  • the base branch 135 can provide the anatomical structure detection 145 to the state or part branch 140.
  • the output anatomical structure detection 145 of the encoded features by the anatomy decoder 225 can be fed into the state or part branch 140 to formulate the multiple level hierarchy model 125. Because the anatomical segmentation feature 145 is provided to the state or part branch 140, model learning to detect the part or state can also be based on the anatomical structure detection 145.
  • the state branch 140 can include at least one state or part decoder 220.
  • the decoder 220 can decode the combined image encoding and system data embedding into F_p, a state or part feature, identification, segmentation, or detection 230.
  • the state or part decoder 220 can output a feature, indication, representation, or segmentation of a state or part 230 of the anatomical structure detected by the anatomy decoder 225.
  • the detection 230 can be a label, a name, a numeric value, or other identifying piece of data that identifies the part or state of the detected anatomical structure 145.
  • the state or part decoder 220 can generate a segmentation or mask 180 for the state (e.g., burned area or bleeding area) from the integrated or learned feature representations of the image 115 and the system data 120.
  • the model 125 can include at least one model 240.
  • the model 125 can include at least one model 245.
  • the models 240 and 245 can be feedforward neural networks or convolutional neural networks that produce an image 115 or a segmentation (e.g., the mask 180 or 175).
  • the computing system 110 can execute the model 240 to generate a frame 115 including the mask 180 or to generate the mask 180.
  • the mask 180 can include an indication or segmentation of the image 115 that identifies a portion of the image 115 corresponding to the state detection 230.
  • the model 240 can execute based on the detection 230 and the detection 145.
  • the detections 230 and 145 can be combined by a module, software component, or operation 235.
  • the operation 235 can combine or concatenate the detections 230 and 145.
  • the operation 235 can provide the combined detections 230 and 145 to an input of the model 240.
  • the model 240 can produce the image 115 including the mask 180 based on the combined detections 230 and 145.
  • the computing system 110 can execute the model 245 to generate a frame 115 including the mask 175 or to generate the mask 175.
  • the mask 175 can include an indication or segmentation of the image 115 that identifies the anatomical structure 145.
  • the model 245 can execute based on the detection 145.
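The following is an illustrative, non-authoritative sketch of how the hierarchy described above could be wired together (the PyTorch framing, layer sizes, and all names are assumptions; the disclosure does not prescribe a specific architecture). The shared image encoding and procedure embedding feed an anatomy decoder and a state/part decoder, and the anatomy feature is reused when producing the state mask:

```python
import torch
import torch.nn as nn

class TwoBranchSegmenter(nn.Module):
    """Hypothetical sketch of the hierarchy: a shared image encoding (205) and
    procedure embedding (210) feed an anatomy branch and a state/part branch,
    and the anatomy feature also feeds the state mask head."""

    def __init__(self, num_procedures: int = 8, feat: int = 64):
        super().__init__()
        self.image_encoder = nn.Sequential(                # stands in for encoder 205
            nn.Conv2d(3, feat, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.proc_embedding = nn.Embedding(num_procedures, feat)                     # embedding 210
        self.anatomy_decoder = nn.Conv2d(2 * feat, feat, kernel_size=3, padding=1)   # decoder 225 -> F_a
        self.state_decoder = nn.Conv2d(2 * feat, feat, kernel_size=3, padding=1)     # decoder 220 -> F_p
        self.anatomy_mask_head = nn.Conv2d(feat, 1, kernel_size=1)                   # model 245 -> mask 175
        self.state_mask_head = nn.Conv2d(2 * feat, 1, kernel_size=1)                 # model 240 -> mask 180

    def forward(self, frame: torch.Tensor, procedure_id: torch.Tensor):
        # frame: (B, 3, H, W); procedure_id: (B,) integer procedure or step label
        img = self.image_encoder(frame)
        b, _, h, w = img.shape
        proc = self.proc_embedding(procedure_id)[:, :, None, None].expand(-1, -1, h, w)
        fused = torch.cat([img, proc], dim=1)                                 # operation 215
        f_a = self.anatomy_decoder(fused)                                     # anatomy feature F_a
        f_p = self.state_decoder(fused)                                       # state/part feature F_p
        anatomy_mask = torch.sigmoid(self.anatomy_mask_head(f_a))             # mask 175
        state_mask = torch.sigmoid(
            self.state_mask_head(torch.cat([f_p, f_a], dim=1)))               # operation 235 -> mask 180
        return anatomy_mask, state_mask
```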
  • the machine learning engine 130 can train the model 125 via training data 150.
  • the machine learning engine 130 can compute, determine, or generate a loss L_p (loss 160), a loss L_ap (loss 170), or a loss L_a (loss 155).
  • the loss 160 can be a loss for the state or part branch 140.
  • the loss 155 can be a loss for the base branch 135.
  • the computing system 110 can execute a loss function to generate the loss 160 for the state or part branch 140.
  • Each branch can have its own loss function that measures a discrepancy between a predicted mask and a ground truth mask.
  • the computing system 110 can execute a loss function to generate the loss 155 for the base branch 135.
  • the loss function can generate a loss 155 based on a discrepancy between the mask 175 for the anatomical structure and a ground truth mask for the anatomical structure indicated by the training data 150.
  • the computing system 110 can execute a loss function to generate the loss 160.
  • the loss function can generate a loss 160 based on a discrepancy between the mask 180 for the state or part of the anatomical structure and a ground truth mask for the state or part of the anatomical structure indicated by the training data 150.
  • the machine learning engine 130 can determine the loss 170 using the amount of the part mask 180 that is outside the anatomical structure mask 175.
  • the machine learning engine 130 can determine the loss 170 using the amount of the part mask 180 that is within the anatomical structure mask 175.
  • the machine learning engine 130 can determine the loss 170 using a number of pixels of the part mask 180 that are outside the section or area of pixels defined by the anatomical structure mask 175. In this regard, training with the loss 170 can maximize an overlap between the masks 180 and 175, e.g., to maximize an amount of the mask 180 that is within a larger segmented area defined by the mask 175.
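As a hedged sketch of the combined loss 170 (the soft-mask formulation and names are assumptions), the fraction of the mask 180 falling outside the mask 175 could be computed as follows, so that minimizing the value pushes the state mask inside the anatomy mask:

```python
import torch

def combined_overlap_loss(state_mask: torch.Tensor, anatomy_mask: torch.Tensor,
                          eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical combined loss: the fraction of the predicted state/part mask
    (180) that lies outside the predicted anatomy mask (175), averaged over the
    batch. Masks are soft predictions in [0, 1] with shape (B, 1, H, W)."""
    outside = (state_mask * (1.0 - anatomy_mask)).sum(dim=(1, 2, 3))
    total = state_mask.sum(dim=(1, 2, 3)) + eps
    return (outside / total).mean()
```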
  • the machine learning engine 130 can train the model 125 to converge.
  • the machine learning engine 130 can execute a machine learning algorithm, such as a training algorithm like back propagation, to minimize the losses 160, 170, and 155.
  • the machine learning engine 130 can feed video frames 115, with annotations of anatomical masks 175 and part or state masks 180, to the model 125 during training.
  • the training data 150 can store the masks 175 and 180 for various frames 115 of medical or surgical procedures. Because the losses of the branches 140 and 135 are both defined independently (i.e., losses 160 and 155) and jointly (loss 170), the training of the branches 140 and 135 can be performed simultaneously.
  • the training of all the models included within the model 125 can be performed concurrently, simultaneously, or contemporaneously.
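A minimal sketch of one such joint training step (the binary cross-entropy choice for the branch losses, the loss weights, and all names are assumptions; the two-branch model follows the earlier sketch and returns an anatomy mask and a state mask):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, frame, procedure_id, gt_anatomy_mask, gt_state_mask,
               w_a: float = 1.0, w_p: float = 1.0, w_ap: float = 0.5, eps: float = 1e-6):
    """One hypothetical joint training step: the branch losses (155, 160) and
    the combined containment loss (170) are summed so that a single backward
    pass updates both branches simultaneously."""
    optimizer.zero_grad()
    anatomy_mask, state_mask = model(frame, procedure_id)
    loss_a = F.binary_cross_entropy(anatomy_mask, gt_anatomy_mask)    # loss 155 (base branch 135)
    loss_p = F.binary_cross_entropy(state_mask, gt_state_mask)        # loss 160 (state branch 140)
    outside = (state_mask * (1.0 - anatomy_mask)).sum()
    loss_ap = outside / (state_mask.sum() + eps)                      # loss 170 (containment)
    total = w_a * loss_a + w_p * loss_p + w_ap * loss_ap
    total.backward()                                                  # backpropagation through both branches
    optimizer.step()
    return float(total.item())
```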
  • the method 300 can include receiving, by the computing system 110, data of a medical procedure.
  • the method 300 can include receiving, by the computing system 110, the data from the robotic medical system 105.
  • the method 300 can include receiving, by the computing system 110, one, multiple, or a sequence of image frames 115.
  • the method 300 can include receiving, by the computing system 110, system data 120.
  • the system data 120 and the image frames 115 can correspond to one another, or be linked via timestamps.
  • the system data 120 can indicate the type of medical procedure, phase of the medical procedure, or the step of the medical procedure.
  • the method 300 can include receiving, by the computing system 110, advanced imaging data, such as hyperspectral image data.
  • the method 300 can include identifying, by the computing system 110, an anatomical structure.
  • the method 300 can include executing a model 125 to determine the anatomical structure 145.
  • the model 125 can include at least one branch.
  • the method 300 can include executing, by the computing system 110, a first branch 135 of the model 125 to identify the anatomical structure 145.
  • the method 300 can include executing, by the computing system 110, the first branch 135 of the model 125 to select a label, identifier, value, or tag that uniquely identifies a type of the anatomical structure 145.
  • the type of the structure could be a liver, a pancreas, a muscle, a specific muscle such as a bicep, a specific bone such as a tibia, etc.
  • the method 300 can include executing, by the computing system 110, an image encoder 205 to encode the frame 115.
  • the method 300 can include executing, by the computing system 110, a system data embedding 210 to embed the system data 120.
  • the method 300 can include combining, by the computing system 110, the encoding of the image 115 and the embedding of the system data 120, and providing the combined features to the state or part decoder 220 and the anatomy decoder 225.
  • the method 300 can include executing, by the computing system 110, the anatomy decoder 225.
  • the method 300 can include executing, by the computing system 110, the anatomy decoder 225 on the combined features to detect or output the anatomical structures 145.
  • the method 300 can include executing, by the computing system 110, the model 245 to generate the mask 175.
  • the method 400 can include training, by the machine learning engine 130, the model 125 to minimize the combined loss 170.
  • the method 400 can include training, by the machine learning engine 130, multiple branches of the model 125 simultaneously, concurrently, or contemporaneously.
  • the machine learning engine 130 can train the multiple branches of the model 125 in the same training process, training epoch, or training segment.
  • the machine learning engine 130 can tune, adjust, or change the values of both the first branch 135 and the second branch 140 with the combined loss 170.
  • the machine learning engine 130 can train the branches 135 and 140 with the training data 150. For example, the machine learning engine 130 can perform back propagation or another training algorithm to train the model 125 with the combined loss 170.
  • the machine learning engine 130 can train the model 125 with individual branch losses in addition to training with the combined loss 170.
  • the method 400 can include determining, by the machine learning engine 130, a first branch loss 155 for the first branch 135 and determining, by the machine learning engine 130, a second branch loss 160 for the second branch 140.
  • the machine learning engine 130 can train the model 125 with the first branch loss 155 and the second branch loss 160.
  • the machine learning engine 130 can use the branch loss 155 to train, tune, or adjust the parameters of the first branch 135 and the second branch 140.
  • the machine learning engine 130 can use the branch loss 160 to train, tune, or adjust the parameters of the first branch 135 and the second branch 140.
  • the computing system 110 can include or be used to implement a data processing system or its components.
  • the architecture described in FIG. 5 can be used to implement the computing system 110, the robotic medical system 105, or the client device 199.
  • the computing system 110 can include at least one bus 525 or other communication component for communicating information and at least one processor 530 or processing circuit coupled to the bus 525 for processing information.
  • the computing system 110 can include one or more processors 530 or processing circuits coupled to the bus 525 for processing information.
  • the computing system 110 can include at least one main memory 510, such as a random access memory (RAM) or other dynamic storage device, coupled to the bus 525 for storing information, and instructions to be executed by the processor 530.
  • the main memory 510 can be used for storing information during execution of instructions by the processor 530.
  • the computing system 110 can further include at least one read only memory (ROM) 515 or other static storage device coupled to the bus 525 for storing static information and instructions for the processor 530.
  • a storage device 520 such as a solid state device, magnetic disk or optical disk, can be coupled to the bus 525 to persistently store information and instructions.
  • the computing system 110 can be coupled via the bus 525 to a display 500, such as a liquid crystal display, or active matrix display.
  • the display 500 can display information to a user.
  • An input device 505, such as a keyboard or voice interface, can be coupled to the bus 525 for communicating information and commands to the processor 530.
  • the input device 505 can include a touch screen of the display 500.
  • the input device 505 can include a cursor control, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to the processor 530 and for controlling cursor movement on the display 500.
  • the processes, systems and methods described herein can be implemented by the computing system 110 in response to the processor 530 executing an arrangement of instructions contained in main memory 510. Such instructions can be read into main memory 510 from another computer-readable medium, such as the storage device 520. Execution of the arrangement of instructions contained in main memory 510 causes the computing system 110 to perform the illustrative processes described herein. One or more processors in a multiprocessing arrangement can be employed to execute the instructions contained in main memory 510. Hard-wired circuitry can be used in place of or in combination with software instructions together with the systems and methods described herein. Systems and methods described herein are not limited to any specific combination of hardware circuitry and software.
  • Modules can be implemented in hardware or as computer instructions on a non-transient computer readable storage medium, and modules can be distributed across various hardware or computer based components.
  • the systems described above can provide multiple ones of any or each of those components and these components can be provided on either a standalone system or on multiple instantiations in a distributed system.
  • the systems and methods described above can be provided as one or more computer-readable programs or executable instructions embodied on or in one or more articles of manufacture.
  • the article of manufacture can be cloud storage, a hard disk, a CD-ROM, a flash memory card, a PROM, a RAM, a ROM, or a magnetic tape.
  • the computer-readable programs can be implemented in any programming language, such as LISP, PERL, C, C++, C#, PROLOG, Python, or in any byte code language such as JAVA.
  • the software programs or executable instructions can be stored on or in one or more articles of manufacture as object code.
  • Example and non-limiting module implementation elements include sensors providing any value determined herein, sensors providing any value that is a precursor to a value determined herein, datalink or network hardware including communication chips, oscillating crystals, communication links, cables, twisted pair wiring, coaxial wiring, shielded wiring, transmitters, receivers, or transceivers, logic circuits, hard-wired logic circuits, reconfigurable logic circuits in a particular non-transient state configured according to the module specification, any actuator including at least an electrical, hydraulic, or pneumatic actuator, a solenoid, an op-amp, analog control elements (springs, filters, integrators, adders, dividers, gain elements), or digital control elements.
  • the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • the subject matter described in this specification can be implemented as one or more computer programs, e.g., one or more circuits of computer program instructions, encoded on one or more computer storage media for execution by, or to control the operation of, data processing apparatuses.
  • the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • a computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. While a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium can also be, or be included in, one or more separate components or media (e.g., multiple CDs, disks, or other storage devices including cloud storage).
  • the operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
  • the terms “computing device”, “component” or “data processing apparatus” or the like encompass various apparatuses, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations of the foregoing.
  • the apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • the apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them.
  • the apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
  • a computer program (also known as a program, software, software application, app, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment.
  • a computer program can correspond to a file in a file system.
  • a computer program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatuses can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • Devices suitable for storing computer program instructions and data can include non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • the subject matter described herein can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the subject matter described in this specification, or a combination of one or more such back end, middleware, or front end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network.
  • Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
  • references to implementations or elements or acts of the systems and methods herein referred to in the singular may also embrace implementations including a plurality of these elements, and any references in plural to any implementation or element or act herein may also embrace implementations including only a single element.
  • References in the singular or plural form are not intended to limit the presently disclosed systems or methods, their components, acts, or elements to single or plural configurations.
  • References to any act or element being based on any information, act or element may include implementations where the act or element is based at least in part on any information, act, or element.
  • any implementation disclosed herein may be combined with any other implementation or example, and references to “an implementation,” “some implementations,” “one implementation” or the like are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the implementation may be included in at least one implementation or example. Such terms as used herein are not necessarily all referring to the same implementation. Any implementation may be combined with any other implementation, inclusively or exclusively, in any manner consistent with the aspects and implementations disclosed herein.
  • references to “or” may be construed as inclusive so that any terms described using “or” may indicate any of a single, more than one, and all of the described terms. References to at least one of a conjunctive list of terms may be construed as an inclusive OR to indicate any of a single, more than one, and all of the described terms. For example, a reference to “at least one of ‘A’ and ‘B’” can include only ‘A’, only ‘B’, as well as both ‘A’ and ‘B’. Such references used in conjunction with “comprising” or other open terminology can include additional items.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Surgery (AREA)
  • Molecular Biology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Radiology & Medical Imaging (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Primary Health Care (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Veterinary Medicine (AREA)
  • General Engineering & Computer Science (AREA)
  • Animal Behavior & Ethology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Optics & Photonics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Gynecology & Obstetrics (AREA)
  • Robotics (AREA)
  • Business, Economics & Management (AREA)

Abstract

Machine learning based anatomical state tracking is described. A system can include one or more processors, coupled with memory, to receive data of a medical procedure performed on a subject with a robotic medical system. The one or more processors can identify, with one or more models trained with machine learning, an anatomical structure based on the data. The one or more processors can detect, with the one or more models and based on the identified anatomical structure, a state of the anatomical structure. The one or more processors can provide, based at least in part on the state of the anatomical structure, an indication of a performance of the medical procedure.

Description

MACHINE LEARNING BASED ANATOMICAL STATE TRACKING
CROSS-REFERENCE TO RELATED PATENT APPLICATION
[0001] This application claims the benefit of, and priority to, under 35 U.S.C. § 119, U.S. Provisional Patent Application No. 63/566,531, filed March 18, 2024, which is hereby incorporated by reference herein in its entirety.
BACKGROUND
[0002] A robotic medical system can include an instrument for performing a medical session or procedure. For example, the instrument can be used to perform surgery, therapy, or a medical evaluation. The robotic medical system can include an endoscope that captures a video of the medical procedure.
SUMMARY
[0003] Technical solutions disclosed herein can include machine learning based anatomical state tracking. A computing system can include a multi-layered machine learning model to detect and assess anatomical states. The model can include layers or branches built on top of a baseline model or baseline or base branch. The base branch can use deep learning to perform anatomy segmentation to identify and track anatomical structures, such as organs, tissue, or bones. Other branches can use the identified anatomical structures of the base branch to determine anatomical states for the structures. For example, the computing system can implement machine learning models on the outputs of the base anatomy segmentation model to determine anatomical states such as size, coloration, bleeding, burning, or transected state. The computing system can implement a machine learning model to segment an anatomical structure into different parts, e.g., identify regions of bruising, charring, burning, fatty tissue, muscle tissue, or connective tissue. With the layered approach, the models can consume less computing resources (e.g., processor resources, memory resources, power resources) to be trained, stored, or executed to detect states. Because the multiple layers of the model can be combined into one model or model set and trained together, the processing resources required to train the model can be decreased, and the trained model can be stored on less memory, and consume less processing resources to execute, as compared to multiple separate models that might be trained and executed separately. Furthermore, because a model or branch can execute on the determinations of an underlying base model or branch, the higher level model branches may be lighter or smaller models that do not require as much training data or number of dimensional inputs.
[0004] One aspect of the present disclosure is directed to a system. The system can include one or more processors, coupled with memory, to receive data of a medical procedure performed on a subject with a robotic medical system. The one or more processors can identify, with one or more models trained with machine learning, an anatomical structure based on the data. The one or more processors can detect, with the one or more models and based on the identified anatomical structure, a state of the anatomical structure. The one or more processors can provide, based at least in part on the state of the anatomical structure, an indication of a performance of the medical procedure.
[0005] The one or more processors can execute the one or more models on a frame of a video of the medical procedure. The one or more processors can generate, using the identified anatomical structure, a first mask of the frame for the anatomical structure. The one or more processors can generate, using the detected state, a second mask of the frame for the state.
[0006] The one or more processors can execute the one or more models trained with the machine learning. The one or more models can include a first branch including at least one first model, the first branch to identify the anatomical structure. The one or more models can include a second branch including at least one second model, the second branch to detect the state of the anatomical structure using the anatomical structure identified by the first branch.
[0007] The one or more processors can receive a frame of a video of the medical procedure and an identification of the medical procedure from the robotic medical system. The one or more processors can generate an encoding of the frame. The one or more processors can generate an embedding of the identification of the medical procedure. The one or more processors can execute a first branch of the one or more models on the encoding of the frame and the embedding of the identification to identify the anatomical structure. The one or more processors can execute a second branch of the one or more models on the encoding of the frame, the embedding of the identification, and the identified anatomical structure to detect the state of the anatomical structure.
[0008] The one or more processors can generate, using the identified anatomical structure, a first mask of a frame for the anatomical structure. The one or more processors can generate, using the detected state, a second mask of the frame for the state. The one or more processors can determine a level of overlap between the first mask and the second mask. The one or more processors can train the one or more models to maximize the level of overlap.
[0009] The one or more processors can determine a first loss of a first branch of the one or more models, the first branch to identify the anatomical structure. The one or more processors can determine a second loss of a second branch of the one or more models, the second branch to detect the state of the anatomical structure. The one or more processors can concurrently train the first branch and the second branch with the first loss and the second loss.
[0010] The one or more processors can receive a frame of a video of the medical procedure, hyperspectral data of the frame, and an identification of the medical procedure from the robotic medical system. The one or more processors can execute a first branch of the one or more models on the frame and the identification of the medical procedure to identify the anatomical structure. The one or more processors can execute a second branch of the one or more models on the frame, the identification of the medical procedure, and the identified anatomical structure to detect the state of the anatomical structure. The one or more processors can execute a third branch of the one or more models on the hyperspectral data and at least one of the identified anatomical structure or the state of the anatomical structure to detect an oxygen saturation of the anatomical structure.
[0011] The one or more processors can execute a first model of the one or more models using the identified anatomical structure and the detected state to generate a first frame including a first mask that indicates the state. The one or more processors can execute a second model of the one or more models using the identified anatomical structure to generate a second frame including a second mask that indicates the anatomical structure.
[0012] The one or more processors can detect, with the one or more models and based on the identified anatomical structure, the state of the anatomical structure at points in time using frames of a video of the medical procedure. The one or more processors can track the state over time using the detected state from the frames. The one or more processors can generate the indication of the performance of the medical procedure using changes to the state over time.
[0013] The one or more processors can search videos of medical procedures with the identified anatomical structure and the state of the anatomical structure to identify a subset of the videos. The one or more processors can generate data to cause a graphical user interface to display indications of the subset of the videos.
[0014] The one or more processors can receive the data during the medical procedure. The one or more processors can generate an alert during the medical procedure using the detected state of the anatomical structure by the one or more models. The one or more processors can generate data to cause a graphical user interface to display the alert.
[0015] The one or more processors can generate a video of the medical procedure including masks to indicate the anatomical structure identified by the one or more models and the state of the anatomical structure detected by the one or more models. The one or more processors can generate data to cause a graphical user interface to display the video including the masks.
[0016] At least one aspect of the present disclosure is directed to a method. The method can include receiving, by one or more processors, coupled with memory, data of a medical procedure performed on a subject with a robotic medical system. The method can include identifying, by the one or more processors, with one or more models trained with machine learning, an anatomical structure based on the data. The method can include detecting, by the one or more processors, with the one or more models and based on the identified anatomical structure, a state of the anatomical structure. The method can include providing, by the one or more processors, based at least in part on the state of the anatomical structure, an indication of a performance of the medical procedure.
[0017] The method can include executing, by the one or more processors, the one or more models trained with the machine learning. The one or more models can include a first branch including at least one first model, the first branch to identify the anatomical structure. The one or more models can include a second branch including at least one second model, the second branch to detect the state of the anatomical structure using the anatomical structure identified by the first branch.
[0018] The method can include receiving, by the one or more processors, a frame of a video of the medical procedure and an identification of the medical procedure from the robotic medical system. The method can include generating, by the one or more processors, an encoding of the frame. The method can include generating, by the one or more processors, an embedding of the identification of the medical procedure. The method can include executing, by the one or more processors, a first branch of the one or more models on the encoding of the frame and the embedding of the identification to identify the anatomical structure. The method can include executing, by the one or more processors, a second branch of the one or more models on the encoding of the frame, the embedding of the identification, and the identified anatomical structure to detect the state of the anatomical structure.
[0019] The method can include generating, by the one or more processors, using the identified anatomical structure, a first mask of a frame for the anatomical structure. The method can include generating, by the one or more processors, using the detected state, a second mask of the frame for the state. The method can include determining, by the one or more processors, a level of overlap between the first mask and the second mask. The method can include training, by the one or more processors, the one or more models to maximize the level of overlap.
[0020] The method can include determining, by the one or more processors, a first loss of a first branch of the one or more models, the first branch to identify the anatomical structure. The method can include determining, by the one or more processors, a second loss of a second branch of the one or more models, the second branch to detect the state of the anatomical structure. The method can include concurrently training, by the one or more processors, the first branch and the second branch with the first loss and the second loss.
[0021] At least one aspect of the present disclosure is directed to a non-transitory computer readable medium. The medium can store processor-executable instructions that, when executed by one or more processors, cause the one or more processors to receive data of a medical procedure performed on a subject with a robotic medical system. The instructions can cause the one or more processors to identify, with one or more models trained with machine learning, an anatomical structure based on the data. The instructions can cause the one or more processors to detect, with the one or more models and based on the identified anatomical structure, a state of the anatomical structure. The instructions can cause the one or more processors to provide, based at least in part on the state of the anatomical structure, an indication of a performance of the medical procedure.
[0022] The instructions can cause the one or more processors to receive a frame of a video of the medical procedure and an identification of the medical procedure from the robotic medical system. The instructions can cause the one or more processors to generate an encoding of the frame. The instructions can cause the one or more processors to generate an embedding of the identification of the medical procedure. The instructions can cause the one or more processors to execute a first branch of the one or more models on the encoding of the frame and the embedding of the identification to identify the anatomical structure. The instructions can cause the one or more processors to execute a second branch of the one or more models on the encoding of the frame, the embedding of the identification, and the identified anatomical structure to detect the state of the anatomical structure.
[0023] The instructions can cause the one or more processors to generate, using the identified anatomical structure, a first mask of a frame for the anatomical structure. The instructions can cause the one or more processors to generate, using the detected state, a second mask of the frame for the state. The instructions can cause the one or more processors to determine a level of overlap between the first mask and the second mask. The instructions can cause the one or more processors to train the one or more models to maximize the level of overlap.
[0024] These and other aspects and implementations are discussed in detail below. The foregoing information and the following detailed description include illustrative examples of various aspects and implementations, and provide an overview or framework for understanding the nature and character of the claimed aspects and implementations. The drawings provide illustration and a further understanding of the various aspects and implementations, and are incorporated in and constitute a part of this specification. The foregoing information and the following detailed description and drawings include illustrative examples and should not be considered as limiting.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] The accompanying drawings are not intended to be drawn to scale. Like reference numbers and designations in the various drawings indicate like elements. For purposes of clarity, not every component may be labeled in every drawing. In the drawings:
[0026] FIG. 1 depicts an example computing system to identify an anatomical structure and detect a state of the anatomical structure.
[0027] FIG. 2 depicts an example model to identify an anatomical structure and detect a state of the anatomical structure.
[0028] FIG. 3 depicts an example method of identifying an anatomical structure and detecting a state of the anatomical structure.
[0029] FIG. 4 depicts an example method of training a model with a combined branch loss.
[0030] FIG. 5 depicts an example computing architecture of a computing system.
DETAILED DESCRIPTION
[0031] Following below are more detailed descriptions of various concepts related to, and implementations of, methods, apparatuses, and systems for machine learning based anatomical state tracking. The various concepts introduced above and discussed in greater detail below may be implemented in any of numerous ways.
[0032] This disclosure is generally directed to a machine learning system to identify anatomical structures and states of the anatomical structures in frames of a video of a medical procedure performed by a robotic medical system. Performance indicators can track the performance of a medical procedure, such as a surgery. The performance indicators can include manipulator distance traveled, energy consumption, number of clutches, etc. During surgery, a surgeon can quickly assess the state of an organ of a patient and take actions to ensure an effective and efficient treatment of a condition. The actions that the surgeon takes can be linked to the performance indicators, or can be used to generate the performance indicators. However, simply using performance indicators to track the performance of a surgeon may only quantify the actions the surgeon took, and may not provide a contextual insight regarding why the surgeon took the action, or the result that the action had on the patient. A reliance on performance indicators can lead to surgeon confusion, because surgeons may not be able to tell the reason why a performance indicator has decreased or increased during or after a surgery. For example, a performance indicator for manipulator distance traveled might be low because of an enlarged or swollen liver, making navigation of robotic manipulators more difficult during surgery. Similarly, a performance indicator such as energy usage can be increased because of a large degree of bleeding, or a thick connective tissue, requiring additional surgical work. However, because this context regarding the anatomical structures that the surgeon is operating on is not factored into the calculation of the performance indicators, a surgeon may not have a clear picture regarding why their performance indicators may increase or decrease from operation to operation. Without an understanding of the states of the anatomical structures operated on, a surgeon may have difficulty quantifying their performance during surgery.
[0033] To assess anatomical states during and after surgery, a surgeon may manually record and report the states of various organs or other anatomical structures. Furthermore, a surgeon can take pathology or biopsy samples or perform additional imaging during a surgery to determine the states of the anatomical structures. Taking samples or performing additional imaging can lead to longer and more invasive or intrusive surgeries, and can increase surgeon workload, surgery duration, and power consumption of the robotic medical system. The extended duration of even routine surgeries can cause the robotic medical system to wear out faster and require more maintenance. Furthermore, the accuracy of the reports made by the surgeons may rely on the surgeon providing accurate information. If a surgeon makes an error and improperly indicates the region a sample was taken from, the reported states of various anatomical structures can be incorrect. Furthermore, if a surgeon reviews a video after a surgery, the surgeon-provided quantification of the states of the anatomical structures may be subjective, and different surgeons may conclude that anatomical structures have different states based on the same video.
[0034] A model can be trained to detect the states of anatomical structures from input frames. However, because anatomical states can appear different for different anatomical structures, the model training may need a large training dataset for a vast number of different types of anatomical structures, which may not be available. Training a single model to detect a state of an anatomical structure from a frame with a large dataset can result in increased training time, increased model size, increased memory usage, or increased processing resource usage. This can make model training difficult and resource intensive (e.g., consume large amounts of processing, memory, or storage resources), and the large size of the model can make real-time or intraoperative implementation of the model difficult. For postoperative implementations, the model may consume a large amount of processing resources, memory resources, or power resources.
[0035] Therefore, there is a need to efficiently assess the anatomical states of anatomical structures in an objective and quantifiable manner. An objective assessment of anatomical states can allow a surgeon to understand the success of a surgery and the final anatomical states of the surgery, allowing the surgeon to predict whether a patient will successfully heal after surgery. An objective determination of anatomical states can help a surgeon identify the difficulty of a case, and respond to sudden changes or states indicating damage to anatomical structures, such as bleeding, burning, or tearing, which may need surgeon attention or changes in approach.
[0036] To solve these and other technical problems, technical solutions of this disclosure can include a multi-layered machine learning model to detect and assess anatomical states. A computing system can implement a multi-layered machine learning model that incorporates multiple data sources to provide assessments of different categories of anatomical states in an efficient manner. Each assessment can be a separate branch built on top of a baseline model or baseline branch. The base branch can use machine learning or deep learning to perform anatomy segmentation to identify and track anatomical structures, such as organs, tissue, or bones. Other branches can use the identified anatomical structures of the base branch to determine anatomical states or parts of the structures. For example, the computing system can implement machine learning models or deterministic computations on the outputs of the base anatomy segmentation model to determine anatomical states such as size, coloration, burning, bleeding, or transected state. The computing system can implement a machine learning model to segment an anatomical structure into different parts, e.g., identify regions of bruising, charring, burning, fatty tissue, muscle tissue, connective tissue, etc. Furthermore, the computing system can implement a machine learning model that incorporates additional data sources, e.g., for states such as oxygen (O2) saturation, a model can be implemented to incorporate hyperspectral image data. Other branches could be implemented to recognize surgical equipment or instruments, such as sutures, grafts, or clips.
[0037] With the layered approach, the model can consume fewer computing resources (e.g., processor resources, memory resources, power resources) when being trained, stored, or executed to detect states. Because the multiple layers of the model can be combined into one model or model set and trained together, the processing resources required to train the model can be decreased, and the trained model can be stored in less memory and consume fewer processing resources to execute, as compared to multiple separate models that might be trained and executed separately. Furthermore, because a model or branch can execute on the determinations of an underlying base model or branch, the higher level model branches may be lighter or smaller models that do not require as much training data or as many input dimensions. At least one data input into the model system can be medical robotic system data, such as medical procedure type, medical procedure phase, or medical procedure step. The medical robotic system data can provide a latent state that focuses and accelerates the training of the model system. Furthermore, using a layered model approach, the model system can be flexible and adaptable to new data modalities that become available to the computing system. For example, if hyperspectral image data becomes available to the computing system, an additional branch or model can be implemented over the base anatomy segmentation model. The additional model can consume the hyperspectral image data in addition to the anatomy detection of the base anatomy segmentation model to detect states such as oxygen saturation of tissue.
[0038] With the multi-layer model approach, the system can accurately describe anatomical structures and anatomical states in a video. The states can be tracked over time to quantify the changes to the state of the anatomical structure. These states can be outputs leveraged by other systems or applications, such as surgical quality quantification and assessment systems. Furthermore, by detecting the states of the anatomical structures, video review software can be improved by allowing a surgeon to quickly navigate to sections of a procedure video where the state of an anatomical structure needs to be reviewed or where there was a drastic change in state, such as transection or hemostasis. Furthermore, historical states and procedure outcomes can be tracked, and used to make post-operative recommendations. Similarly, the computing system can use the states to make intra-operative recommendations based on real-time state assessments and comparisons.
[0039] Referring now to FIG. 1, among others, an example system 100 including a computing system 110 to identify an anatomical structure and detect a state of the anatomical structure is shown. The system 100 can include at least one computing system 110. The computing system 110 can be a data processing system, a computing system, a computer system, a computer, a desktop computer, a laptop computer, a tablet, a control system, a console system, an embedded system, a cloud computing system, a server system, or any other type of computing system. The computing system 110 can be an on-premises system or an off-premises system. The computing system 110 can be a hybrid system, where some components of the computing system 110 are located on-premises, and some components of the computing system 110 are located off-premises.
[0040] The system 100 can include at least one robotic medical system 105. The robotic medical system 105 can be a robotic system, apparatus, or assembly including at least one instrument. The instrument can be or include a tip or end. The tip or end can be installed with or to the instrument. The tip can be removable or a permanent component of the instrument or the robotic medical system 105. For example, the tip can be a scalpel, scissors, a monopolar curved scissors (MCS), a cautery hook tip, a cautery spatula tip, a needle driver, a forceps, a tooth retractor, a drill, or a clip applier. The instrument can be or include a robotic arm, a robotic appendage, a robotic snake, or any other motor-controlled member that can be articulated by the robotic medical system. The instrument can include at least one actuator, such as a motor, servo, or other actuator device. The instrument can be manipulated by motors, servos, actuators, or other devices to perform a medical procedure. The robotic medical system 105 can perform a medical session or medical procedure. For example, the robotic medical system 105 can articulate the instrument to perform surgery, therapy, or a medical evaluation with the instrument. The medical procedure can be performed on a subject, e.g., a human, an adult, a child, or an animal. A medical practitioner, such as a surgeon, technician, nurse, or other operator can provide input via a user device or input apparatus (e.g., joystick, buttons, touchpad, keyboard, steering apparatus, etc.) to manipulate the instrument to perform a medical procedure. The robotic medical system 105 can include an endoscope, in some implementations. The endoscope can be an instrument that is manipulated by the medical practitioner and controlled via a motor, servo, or other input device.
[0041] The computing system 110 can receive data of a medical procedure performed on a subject with the robotic medical system 105. The computing system 110 can receive at least one image frame 115 from the robotic medical system 105. The computing system 110 can receive system data 120 from the robotic medical system 105. The image frame 115 can be an image of at least one anatomical structure of a patient (e.g., human, animal, or biological material) captured during a medical procedure. The system data 120 can provide an indication, a name, or an identifier that specifies the type of the medical procedure that is being performed or was performed. The system data 120 can be linked or tagged to the image frame 115 to indicate that the image frame 115 was captured for a particular type of medical procedure. The system data 120 can indicate that a medical procedure is a polypectomy, cataract surgery, caesarean section, appendectomy, or any other type of medical or surgical procedure.
[0042] The robotic medical system 105 can generate, store, or produce at least one image frame 115. At least one endoscope of the robotic medical system 105 can capture at least one image frame 115 and provide the image frames 115 to the computing system 110. The image frame 115 can be a part of a video captured by the endoscope of the robotic medical system 105. The image frame 115 can include images, pictures, or pixels representing at least one anatomical structure of a subject or patient. The robotic medical system 105 can transmit the image frame 115 to the computing system 110. The computing system 110 can receive the image frame 115 from the robotic medical system 105. The robotic medical system 105 can tag each image frame 115, or a set of image frames 115, with the system data 120 to indicate what type of medical procedure the image frames 115 were captured for.
[0043] The image frame 115 or the system data 120 can be streamed to the computing system 110 during the medical procedure. For example, while the robotic medical system 105 performs the medical procedure, the robotic medical system 105 can stream the image frame 115 or the system data 120 to the computing system 110. For example, the robotic medical system 105 can transmit the image frame 115 or the system data 120 in real-time, or as the image frame 115 or the system data 120 is generated, captured, or stored. The robotic medical system 105 can provide the image frame 115 or the system data 120 intraoperatively, e.g., during a medical procedure.
[0044] The computing system 110 can implement a machine learning based model 125. The computing system 110 can execute the one or more models 125 trained by machine learning to detect anatomical structures 145. The anatomical structures 145 can be organs (e.g., a brain, a liver, a pancreas, a stomach, an intestine), tissue (e.g., muscle tissue, connective tissue, nervous tissue), or bones (e.g., a rib, a tibia, or a skull). In some implementations, the computing system 110 can simultaneously identify multiple different anatomical structures 145 within a single image frame 115. In some implementations, even if multiple anatomical structures are included within the single image frame, the computing system 110 can execute the model 125 to identify a single anatomical structure of interest, such as the anatomical structure that the robotic medical system 105 is operating on, an anatomical structure that is centered within the image frame 115, or an anatomical structure that is the focal point of the image frame 115.
[0045] The computing system 110 can execute the one or more models 125 trained by machine learning to detect states of the anatomical structures 145, or detect parts of the anatomical structures 145. Parts of an anatomical structure 145 can be sections, segments, or portions of an image that pertain to a particular state or status. The model 125 can be a multiple-level segmentation model 125. For example, the computing system 110 can execute a first branch 135 including at least one first model to identify the anatomical structure 145. The computing system 110 can execute a second branch 140 including at least one second model to detect a state of the anatomical structure 145. The second branch 140 can detect the state of the anatomical structure 145 using the anatomical structure 145 identified by the first branch 135.
[0046] The model 125 can be or include at least one neural network, such as a transformer network, an embedding network, an encoding network, a convolutional neural network, a feed forward neural network, or any other neural network topology. The model 125 can include multiple branches. The model 125 can include a first branch 135 and a second branch 140. The branches 135 and 140 can each include at least one model or multiple models. The first branch 135 can be a first portion of the model 125, a first model, or a first set of models. The second branch 140 can be a second portion of the model 125, a second model, or a second set of models. The computing system 110 can identify the anatomical structure 145 or the state of the anatomical structure 145 by executing the model 125 based on data. The data can be information, images, videos, data packages, time series information, data sets, or data structures. For example, the computing system 110 can identify the anatomical structure 145 and the state of the anatomical structure based on the image frame 115 or the system data 120.
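By way of a non-limiting illustration, the two-branch arrangement described above could be sketched as follows in a PyTorch-style implementation. The class and layer names (e.g., TwoBranchModel, anatomy_decoder, state_decoder) are hypothetical and chosen for readability; the actual architecture of each branch can vary.

```python
# Minimal two-branch segmentation sketch (illustrative; names are hypothetical).
import torch
import torch.nn as nn


class TwoBranchModel(nn.Module):
    """Base branch segments the anatomy; state branch reuses that output."""

    def __init__(self, in_channels: int = 3, embed_dim: int = 16):
        super().__init__()
        # Shared image encoder (stands in for an image encoder such as 205).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Embedding of the procedure identifier (stands in for system data 120).
        self.procedure_embed = nn.Embedding(num_embeddings=100, embedding_dim=embed_dim)
        # First branch (135): anatomy mask logits.
        self.anatomy_decoder = nn.Conv2d(64 + embed_dim, 1, kernel_size=1)
        # Second branch (140): state mask logits, conditioned on the anatomy output.
        self.state_decoder = nn.Conv2d(64 + embed_dim + 1, 1, kernel_size=1)

    def forward(self, frame: torch.Tensor, procedure_id: torch.Tensor):
        feats = self.encoder(frame)                           # (B, 64, H, W)
        emb = self.procedure_embed(procedure_id)              # (B, E)
        emb_map = emb[:, :, None, None].expand(-1, -1, *feats.shape[2:])
        fused = torch.cat([feats, emb_map], dim=1)            # image + system data
        anatomy_logits = self.anatomy_decoder(fused)          # first mask (175)
        state_logits = self.state_decoder(
            torch.cat([fused, anatomy_logits], dim=1))        # second mask (180)
        return anatomy_logits, state_logits


# Usage: one RGB frame and one procedure identifier.
model = TwoBranchModel()
frame = torch.rand(1, 3, 128, 128)
procedure_id = torch.tensor([7])
anatomy_logits, state_logits = model(frame, procedure_id)
```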
[0047] The computing system 110 can include a machine learning engine 130. The machine learning engine 130 can train the model 125. The machine learning engine 130 can train the model 125 to learn to recognize an anatomy of interest. The model 125 can include at least one base branch or base level 135 to recognize the anatomy. The model 125 can further segment the anatomical structure into parts or states (e.g., burns, bleeding, or other processed parts within the anatomy). The model 125 can include at least one part level or state branch 140 to segment the states of the anatomical structure. The anatomical structure 145 detected by the branch 135 can be encoded as a feature used as context to assist the second branch 140 to segment a part or state of the anatomical structure 145. This can improve the accuracy of the state or part detection by the second branch 140 and reduce training time, reduce processing or memory resources for training, or reduce the size of the training data 150. The branches 135 and 140 can be trained together by the machine learning engine 130. Because multiple branches of the model 125 can be trained together, this can reduce the total amount of training data 150 and processing resources (e.g., processors, memory, or power) used by the machine learning engine 130 to segment an anatomical structure or anatomical state. Furthermore, the computing system 110 can use the system data 120 such as procedure type or step label as an input to the model 125. The system data 120 can be a time series that can be encoded as an embedded feature which is associated or linked with the visual features of the image frame 115. The embedded features generated from the system data 120 can provide a latent label to narrow down and accelerate learning to detect the state.
[0048] The computing system 110 can identify, with at least one model 125 trained with machine learning, an anatomical structure 145. The computing system 110 can store at least one model 125. The computing system 110 can retrieve the model 125 from storage. The computing system 110 can execute the retrieved model 125. The computing system 110 can execute at least one model 125 on the image frame 115 or the system data 120 to identify the anatomical structure 145. The computing system 110 can execute one or more models 125 on the frame 115 of a video of a medical procedure.
[0049] The computing system 110 can execute the first branch 135 using the image frame 115 and the system data 120 to determine the anatomical structure 145. The computing system 110 can execute the model 125 to encode or embed the image frame 115 or the system data 120. The computing system 110 can execute the first branch 135 on an encoding of the frame 115 and an embedding of the system data 120 to identify the anatomical structure 145. Execution of the first branch 135 can generate the anatomical structure detection 145. The computing system 110 can generate a first mask 175 of the image frame 115 for the anatomical structure 145. The first branch 135 can output a first mask 175. The first mask 175 can identify pixels in the image frame 115 corresponding to the anatomical structure 145. The first mask 175 can identify pixels in the image frame 115 that do not correspond to the anatomical structure 145. For example, the first mask 175 can include a label, identifier, or number indicating that a pixel is associated with the anatomical structure 145, and a second label, identifier, or number indicating that a pixel is not associated with the anatomical structure 145. The first branch 135 can generate the first mask 175 based on the detection or identification of the anatomical structure 145.
[0050] The computing system 110 can detect a state or part of the anatomical structure 145 based on the identified anatomical structure 145. The computing system 110 can execute the model 125 based on the identified anatomical structure 145. For example, a state or part that the model 125 detects may be specific to the anatomical structure 145. For example, if the state or part is burned tissue, the burned tissue may appear different on skin versus on an internal organ, such as a liver. In this regard, the state or part detected by the model 125 can be based at least in part on the anatomical structure 145. The computing system 110 can execute the second branch 140 on the identified anatomical structure 145 to identify the part or state of the anatomical structure 145. For example, the identified anatomical structure 145 can be an input to the second branch 140. The computing system 110 can execute the second branch 140 on the frame 115 (e.g., an encoding of the frame 115), the system data 120 (e.g., an embedding of the system data 120), and the identified anatomical structure 145 (e.g., a hidden or latent state of the model 125 representing the anatomical structure 145) to identify the state or part of the anatomical structure 145.
[0051] The state of the anatomical structure 145 determined by the second branch 140 can be a native state. The native state can be a state of the anatomical structure 145 before a medical procedure is performed on the anatomical structure 145 or near the anatomical structure 145. The native state could be inflammation coloration, perfusion, or malformations. The state of the anatomical structure 145 determined by the second branch 140 can be an altered state. The altered state of the anatomical structure 145 can be a state of the anatomical structure after a medical procedure is performed on the anatomical structure 145 or near the anatomical structure 145. The altered state can be removal, transection, exposure, stretching, or deformation. The state of the anatomical structure 145 determined by the second branch 140 can be a reconstituted state. For example, the reconstituted state can be closure, grafting, or anastomosis. The state of the anatomical structure 145 determined by the second branch 140 can be a damaged state. The damaged state can include tears, burns, bruises, or cuts. The state of the anatomical structure 145 determined by the second branch 140 can be a final state. For example, the final state could be a state of a final output or last frame 115 of the medical procedure that includes the anatomical structure 145. The final state can provide an indication of the state of the anatomical structure 145 or the patient at an end or conclusion of the medical procedure. The states can be Boolean states, e.g., indicating the presence or absence of a condition or state. The states can be variables or degrees of a condition or state on a scale or in a range.
[0052] In some implementations, states can be broken down or combined into other states. Furthermore, as new branches or new data become available to the computing system 110, the model 125 can be updated to produce new or different states to provide greater detail and definition of the anatomical structure 145. The nature of some of the states can be complex, with some states being interlinked and dependent on one another, or alternatively independent of one another. To account for this in the output, each category and subcategory of states can be measured independently and can be quantified as a percentage relative to the base branch 135.
[0053] The computing system 110 can generate a second mask 180 of the image frame 115 for the state or part using the detected state or part. The second branch 140 can generate a second mask 180. The second mask 180 can identify the part or state of the anatomical structure 145. The second mask 180 can identify pixels in the image frame 115 corresponding to the part or state of the anatomical structure 145. The second mask 180 can identify pixels in the image frame 115 that do not correspond to the part or state of the anatomical structure 145. For example, the second mask 180 can include a label, identifier, or number indicating that a pixel is associated with the state of the anatomical structure 145, and a second label, identifier, or number indicating that a pixel is not associated with the state of the anatomical structure 145. The second branch 140 can generate the second mask 180 based on the detection or identification of the anatomical structure 145 or the identified part or state of the anatomical structure 145.
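As a minimal illustration of the pixel-level labeling described above, assuming binary labels and small hypothetical masks (the values and array sizes below are illustrative only):

```python
import numpy as np

# Hypothetical 4x4 frame: 1 marks pixels belonging to the anatomical structure
# (first mask 175), 0 marks pixels that do not.
anatomy_mask = np.array([
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
])

# Second mask (180): 1 marks pixels of the detected state (e.g., a burned region).
state_mask = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])

# The state region is expected to fall inside the anatomy region.
assert np.all(anatomy_mask[state_mask == 1] == 1)
```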
[0054] In some implementations, the computing system 110 can determine states with a variety of levels of granularity. In some implementations, the user via the client device 199 can provide an input identifying a level of granularity for the states determined by the second branch 140. The level of granularity can be a Boolean indication, such as true or false. The level of granularity can be a value in a range, e.g., a percentage value or a value between 0 and 10. The model 125 can adjust the granularity of the state based on the user input. For example, for a Boolean level of granularity, the state could indicate that the anatomical structure 145 is burned or not burned. The level can be a continuous level, such as a degree of bruising.
[0055] In some implementations, the model 125 can include more than two branches. In some implementations, the model 125 can be adaptable, and new branches can be added to the model 125 over time. The branches can operate in a hierarchy, where determinations, detections, or identifications of one branch are used by another branch to produce more determinations, detections, or identifications. For example, a base branch (e.g., the first branch 135) can detect the anatomical structure 145, while higher level branches that operate on the base branch or on each other can detect information about the anatomical structure 145, or about the states or parts of the anatomical structure 145. For example, as new data or new data streams become available to the computing system 110, the computing system 110 can load or train a new branch to execute on the new data stream.
[0056] For example, if the computing system 110 is coupled with a robotic medical system 105 that produces hyperspectral data, or a new hyperspectral data sensor is installed in an existing robotic medical system 105, the new hyperspectral data can be provided to a new branch of the model 125. The computing system 110 and the architecture of the model 125 can support new products, robotic medical systems 105, or updates to the robotic medical systems 105. For example, hyperspectral capabilities can be added to the robotic medical system 105, and the computing system 110 can be updated to utilize the hyperspectral capabilities.
[0057] The computing system 110 can be updated with a software update, or an update to the model 125 adding a new branch, to allow for real-time or near-real-time assessment and tracking during a surgery or medical procedure with new data sources, as well as allowing the post-operative review of new imaging technologies through artificial intelligence suite products. As new or future imaging techniques (e.g., perfusion imagery, white light firefly, and other advanced imaging solutions) are implemented in the robotic medical system 105, the model 125 can be updated to operate on the new data to allow for the analysis of more anatomical states, or for another method of analyzing an anatomical state. The computing system 110 can provide identified anatomical structures 145 and states of anatomical structures 145 during a surgical procedure, and the model 125 can be applied to intra-operative or post-operative surgical feeds. This can provide a sustainable system that can advance with changes in available imaging technologies.
[0058] For example, the computing system 110 can receive hyperspectral data of the image frame 115. The hyperspectral data can correspond to the image frame 115. The computing system 110 can execute a first branch 135 of the one or more models 125 on the image frame 115 and the system data 120 including the identification of the medical procedure to identify the anatomical structure 145. The computing system 110 can execute the second branch 140 on the image frame 115, the system data 120 including the identification of the medical procedure, and the identified anatomical structure 145 to detect a part or state of the anatomical structure 145. The computing system 110 can execute a third branch of the one or more models 125 based on the hyperspectral data, and at least one of the identified anatomical structure 145 or the state or part of the anatomical structure 145. The third branch can detect or identify an oxygen saturation of the anatomical structure 145. The third branch can generate a mask that indicates levels of oxygen saturation in the anatomical structure 145.
[0059] The model 125 can include additional branches that build directly on the base or first branch 135. The model can include at least one additional branch that builds on the second branch 140. The additional branches can operate on additional input data, such as hyperspectral data. Furthermore, the model 125 can include an additional branch to perform postprocessing. For example, a post processing branch can determine color gradients across a segmented area of the image frame 115 identified as the anatomical structure 145. The model 125 can include an additional branch to perform direct processing of a segmentation mask (e.g., the first mask 175 or the second mask 180). The direct processing branch can detect, based on a segmentation of the anatomical structure 145, whether the anatomy has been transected or cut.
[0060] The computing system 110 can include at least one machine learning engine 130. The machine learning engine 130 can train the model 125. For example, the machine learning engine 130 can train the model 125 using training data 150. The machine learning engine 130 can train the model 125 based on the training data 150. The machine learning engine 130 can include at least one of losses 155-170. The machine learning engine 130 can generate, calculate, or determine a first branch loss 155 that indicates a loss of the first branch 135. The machine learning engine 130 can train the first branch 135 to minimize, reduce, or lower the first branch loss 155. The machine learning engine 130 can generate, calculate, or determine a second branch loss 160 that indicates a loss of the second branch 140. The machine learning engine 130 can train the second branch 140 to minimize, reduce, or lower the second branch loss 160.
[0061] The machine learning engine 130 can determine a level of overlap between the first mask 175 and the second mask 180. The machine learning engine 130 can train the one or more models 125 to maximize the level of overlap. For example, the computing system 110 can determine a combined loss 170. The machine learning engine 130 can improve model performance with the combined loss 170 to maintain the consistency of the segmentation results. The combined loss 170 can be a loss that takes into account the loss of both the first branch 135 and the second branch 140, e.g., multiple branches together. The machine learning engine 130 can determine the combined loss 170 based on an overlap or intersection between an area of the image frame 115 identified as the anatomical structure 145 and an area of the image frame 115 identified as a part or state of the anatomical structure 145. For example, the combined loss 170 can indicate a level of overlap between the first mask 175 and the second mask 180. The machine learning engine 130 can minimize or reduce the combined loss 170 to maximize or increase an overlap between the first mask 175 and the second mask 180. For example, because the second mask 180 can represent the part or state of the anatomical structure represented by the first mask 175, the second mask 180 should be fully included or contained within the first mask 175. More specifically, the first mask 175 should completely encapsulate the second mask 180.
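One possible way to express such an overlap-based combined loss, assuming sigmoid outputs from each branch and a simple containment formulation (a sketch, not necessarily the formulation used in a given implementation), is shown below.

```python
import torch


def containment_loss(anatomy_probs: torch.Tensor, state_probs: torch.Tensor,
                     eps: float = 1e-6) -> torch.Tensor:
    """Penalize state probability mass that falls outside the anatomy region.

    The loss approaches 0 when the predicted state region is fully contained
    in the predicted anatomy region (maximum overlap), and grows toward 1 as
    the state region leaks outside it.
    """
    overlap = (anatomy_probs * state_probs).sum()
    total_state = state_probs.sum() + eps
    return 1.0 - overlap / total_state


# Usage with sigmoid outputs of the two branches.
anatomy_probs = torch.sigmoid(torch.randn(1, 1, 128, 128))
state_probs = torch.sigmoid(torch.randn(1, 1, 128, 128))
combined = containment_loss(anatomy_probs, state_probs)
```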
[0062] The machine learning engine 130 can concurrently, simultaneously, or contemporaneously train multiple models, branches, or levels of the model 125. For example, the machine learning engine 130 can simultaneously train the first branch 135 and the second branch 140. The machine learning engine 130 can concurrently train the first branch 135 and the second branch 140 with at least one of the first branch loss 155, the second branch loss 160, and the combined loss 170. For example, the machine learning engine 130 can concurrently train the first branch 135 and the second branch 140 with the first loss 155 and the second loss 160. For example, the machine learning engine 130 can concurrently train the first branch 135 and the second branch 140 with the first loss 155, the second loss 160, and the combined loss 170. For example, the training can generate, identify, or determine parameters or configurations of the model 125 that minimize or reduce the losses 155-170. The machine learning engine 130 can perform backpropagation to train the model 125. For example, the machine learning engine 130 can execute a machine learning algorithm, such as gradient descent of the losses 155-170 or stochastic gradient descent of the losses 155-170 with respect to parameters of the model 125. The machine learning algorithm can implement second-order gradient descent, Newton's method, conjugate gradient, a quasi-Newton method, or the Levenberg-Marquardt algorithm to train the model 125.
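A hedged sketch of one such joint training step, assuming a model with the two-branch interface sketched earlier, standard binary cross-entropy losses for each branch, and an illustrative weighting of the combined loss, could look like the following.

```python
import torch
import torch.nn.functional as F


def training_step(model, optimizer, frame, procedure_id,
                  anatomy_target, state_target, containment_weight=0.1):
    """One joint update of both branches using first, second, and combined losses."""
    optimizer.zero_grad()
    anatomy_logits, state_logits = model(frame, procedure_id)
    # First branch loss (155) and second branch loss (160).
    first_loss = F.binary_cross_entropy_with_logits(anatomy_logits, anatomy_target)
    second_loss = F.binary_cross_entropy_with_logits(state_logits, state_target)
    # Combined loss (170): encourage the state region to stay inside the anatomy region.
    anatomy_probs = torch.sigmoid(anatomy_logits)
    state_probs = torch.sigmoid(state_logits)
    combined_loss = 1.0 - (anatomy_probs * state_probs).sum() / (state_probs.sum() + 1e-6)
    total = first_loss + second_loss + containment_weight * combined_loss
    total.backward()   # backpropagation through both branches at once
    optimizer.step()
    return total.item()
```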
[0063] The machine learning engine 130 can train the model 125 with the training data 150. The training data 150 can include multiple image frames 115, corresponding system data 120, and corresponding masks. For example, for each image frame 115, the training data 150 can include system data 120 that indicates the type of the medical procedure that the image frame 115 was captured for. Furthermore, for each image frame 115, the training data 150 can identify the first mask 175 identifying an anatomical structure 145 depicted in the image frame 115. Furthermore, for each image frame 115, the training data 150 can identify a second mask 180 identifying a part or state of the anatomical structure 145 depicted in the image frame 115.
[0064] The computing system 110 can include at least one interface manager 190. The interface manager 190 can generate at least one graphical user interface 197. The interface manager 190 can generate data to cause at least one user or client device 199 to display the graphical user interface 197. The graphical user interface 197 can be displayed on at least one display of the client device 199. The client device 199 can be a smartphone, a laptop computer, a desktop computer, a console, a tablet, or any other computing system or device. The client device 199 can be integrated with the computing system 110, or can be a separate device or system. The client device 199 can be a mobile device or a stationary device. The client device 199 can be communicably coupled with the computing system 110. The client device 199 can communicate with the computing system 110 via at least one network, such as the Internet, a local area network (LAN), a wide area network (WAN), or a Wi-Fi network.
[0065] The interface manager 190 or the computing system 110 can generate or provide an indication of performance of the robotic medical system 105 or the medical procedure. For example, the interface manager 190 can provide, based at least in part on the state of the anatomical structure 145, an indication of the performance of the medical procedure. For example, the indication of the performance of the medical procedure can include an indication or a characterization of the state determined by the model 125. For example, if the state is bleeding, the interface manager 190 can cause the graphical user interface 197 to display an amount or level of bleeding of the anatomical structure 145.
[0066] The interface manager 190 can generate at least one metric that indicates the performance of the medical procedure. The interface manager 190 can generate the metric based at least in part on the masked frame 115, the first mask 175, the second mask 180, the detected anatomical structure 145, or the detected state of the anatomical structure 145. The metric can quantify the state of the anatomical structure 145. The metric can be an objective measurement determined via tracking a state over time and determining changes to the state. The metric can be determined by analyzing a change in patterns, such as from optical flow mapping, e.g., assigning directionality and magnitudes to pixel changes. The metric can be determined from pixel area ratios, e.g., an area associated with a state versus an entire visible area. The metric can be paired with kinematics data, such as energy data of an energy application. The metrics can allow for incorporation of additional image data, e.g., hyperspectral data outputs of the robotic medical system 105.
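For example, a pixel-area-ratio metric could be computed along the following lines, assuming binary masks; the function name is illustrative.

```python
import numpy as np


def state_area_ratio(anatomy_mask: np.ndarray, state_mask: np.ndarray) -> float:
    """Fraction of the visible anatomical structure that exhibits the state.

    A simple, objective metric: pixels labeled with the state (second mask)
    divided by pixels labeled as the structure (first mask).
    """
    structure_pixels = int((anatomy_mask > 0).sum())
    if structure_pixels == 0:
        return 0.0
    state_pixels_on_structure = int(((state_mask > 0) & (anatomy_mask > 0)).sum())
    return state_pixels_on_structure / structure_pixels
```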
[0067] The metrics or performance indicators generated by the computing system 110 can be used for research or evaluation. For example, the metrics or performance indicators can be used to improve clinical outcomes, and can allow for improved outcome assessments or enable new groups of medical practitioners (e.g., pathologists or post-operative care teams) to use the metrics or performance indicators to care for or treat patients. The metrics or performance indicators can provide information on surgical skill, e.g., the metrics can be linked to results or outcomes of a medical procedure. The metrics linked to the clinical results can reflect the skill of the surgeon to handle different situations or types of surgeries. The metrics can anatomically illustrate how different surgeon skills affect an anatomical state. Recorded outputs can be organized such that they can be added to postoperative reporting to explain the condition of the patient following the surgery.
[0068] In some implementations, the computing system 110 can track a state determined by the model 125 of the anatomical structure 145. For example, the computing system 110 can record the state of the anatomical structure 145 at multiple points in time, e.g., for multiple different image frames 115. For example, the computing system 110 can detect, with the one or more models 125 and based on the identified anatomical structure 145, the state of the anatomical structure at a plurality of points in time using multiple frames 115 of a video of the medical procedure. The computing system 110 can track the state throughout the medical procedure. For example, the computing system 110 can generate a trend or series of values for a state of the anatomical structure 145 over time, e.g., a first level of the state at a first time, a second level of the state at a second time, and a third level of the state at a third time. For example, the changes in state can represent that bleeding has started, increased to a particular amount, and then subsided. This can provide a computing system or user with an understanding of how the state originates, changes, is mitigated, resolves, or is left unresolved during the medical procedure. The interface manager 190 can generate an indication of the performance of the medical procedure using the trend.
[0069] For example, a trend that indicates that a bleeding level of an anatomical structure 145 has decreased over time can indicate that the anatomical structure 145 is healing. A trend that indicates that a bleeding level of an anatomical structure 145 has increased over time can indicate that the anatomical structure 145 is deteriorating. The computing system 110 can generate the performance indication of the medical procedure with the changes to the state of the anatomical structure 145 over time. The interface manager 190 can display an indication of a rate of change of the state, or can indicate whether the state has increased or decreased over a time window. The computing system 110 can generate a metric indicative of the trend or rate of change of a state of the anatomical structure. The rate of change can refer to an amount of change, deterioration, degradation, or other type of change of the anatomical structure over a period of time. For example, a positive rate of change of state can correspond to healing of the anatomical structure, whereas a negative rate of change of state can correspond to a deterioration, injury, damage, or degradation of the anatomical structure.
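One simple way to quantify such a trend, assuming the state level has been sampled at several points in time, is a least-squares slope; the example values below are illustrative only.

```python
import numpy as np


def state_rate_of_change(timestamps_s, state_levels) -> float:
    """Estimate how quickly a tracked state level changes (level per second).

    For a bleeding-level metric, a negative slope suggests the bleeding is
    subsiding, while a positive slope suggests it is worsening.
    """
    slope, _intercept = np.polyfit(np.asarray(timestamps_s, dtype=float),
                                   np.asarray(state_levels, dtype=float), deg=1)
    return float(slope)


# Usage: a bleeding-area ratio sampled at three points in time.
print(state_rate_of_change([0.0, 30.0, 60.0], [0.12, 0.08, 0.03]))  # negative: subsiding
```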
[0070] In some implementations, the model 125 can be executed in real-time or intraoperatively to determine the states of the anatomical structure 145 during the medical procedure. The interface manager 190 can generate an alarm, alert, or message in response to a detection of a specific state or a detection of a state associated with a level that satisfies a threshold (e.g., a level that is greater than a threshold or a level that is less than a threshold). The interface manager 190 can generate data to cause the graphical user interface 197 to display the alarm, alert, or message. The alarm can include an indication of the anatomical structure 145, an indication of the state of the anatomical structure 145, an indication of a level of the state, or an indication of a rate of change of the level of the state.
[0071] The computing system 110 (e.g., via interface manager 190) can prevent damage of the anatomical structure, or otherwise mitigate or reduce the amount or likelihood of an undesired change of state in the anatomical structure by providing the alert. For example, the computing system 110 can improve safety by providing an alert based on tracking the state of the anatomical structure (e.g., track the amount of bleeding or amount of burn over time), thereby preventing a damaged state of the anatomical structure based on analyzing the changes in the state or based on a sequence or trend in states of the anatomical structure. The computing system 110 can automatically provide an alert based on an amount of the damaged state, such as the amount of bleeding or burn exceeding a threshold amount (e.g., a percentage, an absolute value, or a ratio relative to the size of the anatomical structure). In some cases, upon exceeding a threshold for a burn state (e.g., an amount of burn) of the anatomical structure, the computing system 110 can automatically reduce an amount of energy provided by an instrument or tool of the robotic medical system 105 to reduce, mitigate, or stop burning the anatomical structure, thereby slowing or stopping the rate of change of the anatomical structure. In some cases, the computing system 110 can detect a slow bleed that is not stopping, and provide an alert or automatically control an instrument of the robotic medical system 105 to reduce the amount of bleeding or provide feedback or an indication that the bleeding is not stopping. The slow bleed may be indicative of an incomplete anastomosis, and the computing system 110 can automatically perform or cause to be performed an action to complete the anastomosis or cross-connection between anatomical structures or components of the subject.
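A minimal sketch of such threshold-based alerting, assuming a scalar state level (e.g., the area ratio described above) and a hypothetical alert payload, is shown below.

```python
def check_state_alert(state_level: float, threshold: float,
                      structure: str, state_name: str):
    """Return an alert payload when a tracked state level crosses a threshold."""
    if state_level < threshold:
        return None
    return {
        "structure": structure,
        "state": state_name,
        "level": state_level,
        "message": f"{state_name} on {structure} at {state_level:.0%} exceeds "
                   f"threshold {threshold:.0%}",
    }


# Usage: a burn covering 22% of the liver against a 15% threshold.
alert = check_state_alert(0.22, 0.15, structure="liver", state_name="burn")
```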
[0072] The interface manager 190 can generate a video based on a masked frame 115. For example, the interface manager 190 can generate a video of a medical procedure. The video can include various masks 175 or 180 to indicate the anatomical structure 145 identified by the one or more models 125 and the state of the anatomical structure 145 detected by the one or more models 125. For example, the masked frame 115 can include the first mask 175 and the second mask 180. The interface manager 190 can combine multiple masked frames 115 of a medical procedure into one video of the medical procedure. The interface manager 190 can store a video of the masked frames 115 as a video file (e.g., WMV, AVI, FLV, or any other type of video file). The interface manager 190 can generate data to cause the video that segments the anatomical structure 145 and the state of the anatomical structure 145 to be displayed in the graphical user interface 197. The interface manager 190 can overlay state information on the video. For example, the interface manager 190 can overlay an indication of a damaged state on the anatomical structure; for example, the indication can be a symbol, color, metric, numeric value, or other indication that corresponds to an amount of the damage state (e.g., amount of burn). A user via the client device 199 can cause the video to play, stop, pause, review, scan forward, scan backwards, or move to a specific state.
[0073] In some implementations, the interface manager 190 can navigate a medical procedure video with the identified anatomical structures 145 or identified anatomical states. For example, the interface manager 190 can provide state based navigation, where a user can navigate a medical procedure video based on state, such as state criticality. A user can select a particular state or request to view a most critical state or most critical anatomical structure via the client device 199, and the interface manager 190 can navigate the video to frames associated with the selected state, the most critical state, or the most critical anatomical structure. For example, the graphical user interface 197 can include a menu, selectable element, slider, or drop down that allows a user to select a specific type of anatomical structure 145 or specific state of an anatomical structure 145. The interface manager 190 can display a portion or navigate to a particular image frame 115 of the video based on the selection of the user. For example, the interface manager 190 can navigate to an image frame 115 including the selected anatomical structure 145 or the selected anatomical structure 145 including a specific selected state.
[0074] In some implementations, the interface manager 190 can generate at least one timeline for a video of a medical procedure. The timeline can be interactive, and allow a user to view or navigate to a specific section or frame of the video. The timeline can identify anatomical structures. For example, the timeline can indicate names for anatomical structures displayed in different portions of the video or can display a color to identify the particular anatomical structure. The timeline can indicate a start time when the anatomical structure becomes visible in the video, and an end time when the anatomical structure stops being visible. The timeline can indicate multiple segments when an anatomical structure was visible, e.g., during two or more different segments of time of the video, a particular anatomical structure was visible. The timeline can include similar indications of states, e.g., when states started, ended, or changed in level. For example, the timeline can indicate when an anatomical structure 145 began bleeding, the level of the bleeding over time, and when the bleeding stopped. The interface manager 190 can include a first timeline to display when different anatomical structures were visible in the video, and another timeline to display when the states of the anatomical structures started or stopped. The interface manager 190 can populate a timeline with major transitions in state. For example, the timeline can be marked with major state transitions, such as when a burn occurred, when an anatomical structure was cut, when the anatomical structure was sewn up, when an anatomical structure began bleeding, when the bleeding stopped, etc. The timeline can include elements such as rectangular blocks, arrows, or segments spanning a time length that include different color gradients. The color gradient can indicate the state or level to provide a visual representation of the state changing over time.
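For example, per-frame visibility flags produced by the model could be collapsed into timeline segments as sketched below; the function and variable names are hypothetical.

```python
def visibility_segments(frame_times_s, visible_flags):
    """Collapse per-frame visibility flags into (start, end) timeline segments."""
    segments, start = [], None
    for t, visible in zip(frame_times_s, visible_flags):
        if visible and start is None:
            start = t                      # structure (or state) becomes visible
        elif not visible and start is not None:
            segments.append((start, t))    # structure stops being visible
            start = None
    if start is not None:
        segments.append((start, frame_times_s[-1]))
    return segments


# Usage: a structure visible during two separate spans of the video.
print(visibility_segments([0, 1, 2, 3, 4, 5], [False, True, True, False, True, True]))
# [(1, 3), (4, 5)]
```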
[0075] The computing system 110 can utilize segmentations of the anatomical structure 145 or the states of the anatomical structure to search through collections or sets of videos or a database of videos of medical procedures. The computing system 110 can navigate through recorded surgical or medical procedure videos via targeting the changes in state for key anatomical structures in a procedure, or filtering libraries to find certain state values. The computing system 110 can implement software programs or artificial intelligence programs to group data for research or review with the determinations of the model 125.
[0076] For example, the database can store videos that are segmented according to anatomical structure 145 and the state of the anatomical structure. The computing system 110 can use the masks 175 or 180, the type of anatomical structure 145, or the type of state identified for the anatomical structure 145 to search through the database of videos. The computing system 110 can search through multiple videos of medical procedures with the identified anatomical structure 145 and the state of the anatomical structure 145 to identify a subset of the videos. For example, the computing system 110 can identify which videos include an anatomical structure of the same type identified by the model 125. Then, the computing system 110 can determine whether the identified anatomical structure includes the state identified by the model 125. If the video includes both the type of anatomical structure and state detected by the model 125, the computing system 110 can generate a result set including the video of the same anatomical structure and state. The interface manager 190 can cause videos of the result set to be displayed to a user via the graphical user interface 197 on the client device 199. The user can play or interact with various videos of the result set. Because the model 125 includes branches that are modular, the exact architecture of the models of each branch for segmentation can vary.
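A minimal sketch of such a search, assuming a simple in-memory index mapping each video to its detected (structure, state) pairs, is shown below; a production system could instead query a database.

```python
def search_videos(video_index, structure: str, state: str):
    """Filter an index of segmented procedure videos by structure and state.

    `video_index` is assumed to map a video id to the set of
    (structure, state) pairs detected in that video.
    """
    return [video_id for video_id, detections in video_index.items()
            if (structure, state) in detections]


# Usage with a hypothetical index.
index = {
    "case_001": {("liver", "burn"), ("stomach", "bleeding")},
    "case_002": {("liver", "bleeding")},
}
print(search_videos(index, "liver", "burn"))  # ['case_001']
```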
[0077] The computing system 110 can include at least one controller 195. The controller 195 can control the robotic medical system 105 using the detection of the anatomical structure 145 by the model 125 and the detection of the state of the anatomical structure by the model 125. The controller 195 can implement a rule based control algorithm, a plant based control algorithm, or any other control algorithm that can move or manipulate the instruments or endoscopes of the robotic medical system 105. The controller 195 can detect that an instrument maneuver accidentally caused cutting or burning of an anatomical structure based on the state detections and anatomical structure detections of the model 125. The controller 195 can manipulate the instrument to avoid causing further cutting or burning. The controller 195 can automatically control the instruments based on the state or anatomical structure detected by the model 125 to perform a medical procedure, e.g., the state detections can guide the controller 195 to determine whether the robotic medical system 105 has properly transected an organ or properly sewn an organ back up. The detected states and anatomical structures can be inputs or feedback to control algorithms for a fully-autonomous or semi-autonomous robotic medical system 105. In some implementations, the controller 195 is located in the computing system 110, while in other implementations, the controller 195 is part of the robotic medical system 105. In some implementations, components of the controller 195 are disposed on both the computing system 110 and the robotic medical system 105.
[0078] Referring now to FIG. 2, among others, an example model 125 to identify an anatomical structure and detect a state of the anatomical structure is shown. In FIG. 2, the model 125 recognizes a liver and a burn state for the liver. However, the model 125 can detect a variety of different types of anatomical structures and different states or parts of the anatomical structure. The model 125 can be a pipeline. The model 125 can be modularized into separate steps, pieces, components, executables, functions, segments, or equations. The model 125 can receive the image frame 115 and the system data 120 as inputs. The model 125 can discretize a surgical video into a sequence of image frames. The system data 120, which can provide a type or step label, can be synchronized with the image 115. In this regard, each image 115 can have a corresponding type or label.
[0079] The model 125 can include at least one component to embed or encode the inputs. For example, the model 125 can include at least one image encoder 205. The image encoder 205 can receive the frame 115 of a video of the medical procedure. The image encoder 205 can generate an encoding of the frame 115. The image encoder 205 can encode or embed the image 115. The image encoder 205 can translate a high dimensional vector or matrix into a low dimensional vector or matrix. The encoding can be a numerical vector or matrix representation of the image 115. The image encoder 205 can generate a hidden internal state of the model 125 with the image 115. The image encoder 205 can be a machine learning core or neural network that accepts image frames 115. The image encoder 205 can be a vision transformer, in some implementations. The image encoder 205 can receive the image 115 as an input and convert the image 115 into a compact feature vector representing the input image 115. The image encoder 205 can be a network that operates on spatial and temporal correlations. The image encoder 205 can extract the representative semantics of the images 115 by applying an attention mechanism. The attention mechanism can be a selective weighting technique. The attention mechanism can apply different weights to emphasize different parts of the image 115, which yields a compact representation of the image 115 that fulfills the tasks or goals of the model 125. The attention mechanism can be trained via machine learning based on the tasks or goals of the model 125. The attention mechanism can apply weighting on spatial and temporal dimensions if needed.
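For illustration only, the sketch below shows a vision-transformer-style image encoder in PyTorch that patchifies a frame, applies self-attention, and pools to a compact feature vector; the patch size, embedding width, and depth are illustrative assumptions rather than parameters specified for the image encoder 205.

```python
# A minimal vision-transformer-style encoder sketch, assuming square RGB frames.
import torch
from torch import nn

class ImageEncoder(nn.Module):
    def __init__(self, patch=16, dim=256, depth=4, heads=8, img_size=224):
        super().__init__()
        # Patchify with a strided convolution, then flatten to a token sequence.
        self.to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, frames):                                   # frames: (B, 3, H, W)
        tokens = self.to_patches(frames).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.attn(tokens + self.pos)                    # attention weights patches
        return tokens.mean(dim=1)                                # compact feature (B, dim)

features = ImageEncoder()(torch.randn(1, 3, 224, 224))
print(features.shape)  # torch.Size([1, 256])
```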
[0080] The system data embedding 210 can receive an identification of the medical procedure from the robotic medical system 105, e.g., receive the system data 120. The system data embedding 210 can embed the system data 120. The system data embedding 210 can generate an embedding with the system data 120. The system data embedding 210 can convert the system data 120 into a numerical vector. The system data 120 can be time-series data that includes indications of the type of medical procedure being performed and corresponding time stamps.
[0081] The model 125 can include a module, software component, or operation 215. The operation 215 can combine the encoding of the image 115 with the embedding of the system data 120. The operation 215 can concatenate the numerical vector produced by the system data embedding 210 from the system data 120 with the visual features encoded by the image encoder 205 from the image 115. The combined encoding of the image 115 and the embedding of the system data 120 can be provided to a part decoder 220 of the part branch 140. The combined encoding of the image 115 and the embedding of the system data 120 can be provided to an anatomy decoder 225 of the base branch 135.
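A minimal sketch of the system data embedding 210 and the concatenation performed by operation 215 is shown below, assuming the procedure or step identifier is an integer index into a learned embedding table; the vocabulary size and dimensions are illustrative assumptions.

```python
# Sketch: embed the procedure identifier and concatenate it with image features.
import torch
from torch import nn

class SystemDataEmbedding(nn.Module):
    def __init__(self, num_procedures=32, dim=64):
        super().__init__()
        self.table = nn.Embedding(num_procedures, dim)  # procedure/step id -> vector

    def forward(self, procedure_ids):                   # (B,) integer ids
        return self.table(procedure_ids)                # (B, dim)

image_features = torch.randn(1, 256)                    # e.g., from the image encoder
system_features = SystemDataEmbedding()(torch.tensor([3]))
fused = torch.cat([image_features, system_features], dim=-1)  # (1, 320)
print(fused.shape)
```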
[0082] The base branch 135 can include at least one anatomy decoder 225. The anatomy decoder 225 can be a neural network. The anatomy decoder 225 can transform the hidden internal states produced by the image encoder 205 and the system data embedding 210 into Fa, a feature, identification, segmentation, or detection 145. For example, the anatomy decoder 225 can output a feature indicating or representing a segmentation 145 of an anatomical structure. The anatomy decoder 225 can provide anatomical segmentation with additional transformation layers. The anatomy decoder 225 can output the anatomical structure detection 145. The anatomical structure detection 145 can uniquely identify a type of anatomical structure with a tag, an indicator, a label, or a numeric value. The base branch 135 can provide the anatomical structure detection 145 to the state or part branch 140. For example, the anatomical structure detection 145 output from the encoded features by the anatomy decoder 225 can be fed into the state or part branch 140 to form the multi-level hierarchical model 125. Because the anatomical segmentation feature 145 is provided to the state or part branch 140, the model's learning to detect the part or state can also be based on the anatomical structure detection 145.
[0083] The state branch 140 can include at least one state or part decoder 220. The decoder 220 can decode the combined image encoding and system data embedding into Fp, a state or part feature, identification, segmentation, or detection 230. For example, the state or part decoder 220 can output a feature, indication, representation, or segmentation of a state or part 230 of the anatomical structure detected by the anatomy decoder 225. The detection 230 can be a label, a name, a numeric value, or other identifying piece of data that identifies the part or state of the detected anatomical structure 145. The state or part decoder 220 can generate a segmentation or mask 180 for the state (e.g., burned area or bleeding area) from the integrated or learned feature representations of the image 115 and the system data 120.
[0084] The model 125 can include at least one model 240. The model 125 can include at least one model 245. The models 240 and 245 can be feedforward neural networks or convolutional neural networks that produce an image 115 or a segmentation (e.g., the mask 180 or 175). The computing system 110 can execute the model 240 to generate a frame 115 including the first mask 180 or to generate the first mask 180. The first mask 180 can include an indication or segmentation of the image 115 that identifies a portion of the image 115 corresponding to the state detection 230. The model 240 can execute based on the detection 230 and the detection 145. The detections 230 and 145 can be combined by a module, software component, or operation 235. The operation 235 can combine or concatenate the detections 230 and 145. The operation 235 can provide the combined detections 230 and 145 to an input of the model 240. The model 240 can produce the image 115 including the mask 180 based on the combined detections 230 and 145. The computing system 110 can execute the model 245 to generate a frame 115 including the second mask 175 or to generate the second mask 175. The second mask 175 can include an indication or segmentation of the image 115 that identifies the anatomical structure. The model 245 can execute based on the detection 145.
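The sketch below illustrates, under simplifying assumptions, the hierarchy described above: a shared fused feature feeds an anatomy decoder and a part decoder, the anatomy feature Fa is concatenated with the part feature Fp (operation 235) before the state mask head (model 240), and a separate head (model 245) produces the anatomy mask from Fa alone. Layer sizes and the 28x28 mask resolution are illustrative only.

```python
# Minimal two-branch hierarchy sketch: shared features, two decoders, two mask heads.
import torch
from torch import nn

class TwoBranchModel(nn.Module):
    def __init__(self, fused_dim=320, feat_dim=128, mask_hw=28):
        super().__init__()
        self.anatomy_decoder = nn.Sequential(nn.Linear(fused_dim, feat_dim), nn.ReLU())  # -> Fa
        self.part_decoder = nn.Sequential(nn.Linear(fused_dim, feat_dim), nn.ReLU())     # -> Fp
        self.anatomy_mask_head = nn.Linear(feat_dim, mask_hw * mask_hw)                  # model 245
        self.state_mask_head = nn.Linear(2 * feat_dim, mask_hw * mask_hw)                # model 240
        self.mask_hw = mask_hw

    def forward(self, fused):
        fa = self.anatomy_decoder(fused)                                 # anatomy feature
        fp = self.part_decoder(fused)                                    # state/part feature
        anatomy_mask = self.anatomy_mask_head(fa)                        # from Fa only
        state_mask = self.state_mask_head(torch.cat([fp, fa], dim=-1))   # operation 235
        hw = self.mask_hw
        return (torch.sigmoid(anatomy_mask).view(-1, hw, hw),
                torch.sigmoid(state_mask).view(-1, hw, hw))

anatomy_mask, state_mask = TwoBranchModel()(torch.randn(1, 320))
print(anatomy_mask.shape, state_mask.shape)  # (1, 28, 28) each
```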
[0085] The machine learning engine 130 can train the model 125 via the training data 150. The machine learning engine 130 can compute, determine, or generate a loss Lp (loss 160), a loss Lap (loss 170), or a loss La (loss 155). The loss 160 can be a loss for the state or part branch 140. The loss 155 can be a loss for the base branch 135. Each branch can have its own loss function that measures a discrepancy between a predicted mask and a ground truth mask. The computing system 110 can execute a loss function to generate the loss 155 for the base branch 135. The loss function can generate the loss 155 based on a discrepancy between the mask 175 for the anatomical structure and a ground truth mask for the anatomical structure indicated by the training data 150. The computing system 110 can execute a loss function to generate the loss 160 for the state or part branch 140. The loss function can generate the loss 160 based on a discrepancy between the mask 180 for the state or part of the anatomical structure and a ground truth mask for the state or part of the anatomical structure indicated by the training data 150.
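The disclosure does not prescribe the form of the per-branch loss functions; a soft Dice loss, shown below, is merely one common choice for measuring the discrepancy between a predicted mask and a ground truth mask.

```python
# Illustrative per-branch mask loss (soft Dice), not a required formulation.
import torch

def dice_loss(pred, target, eps=1e-6):
    """pred, target: (B, H, W) tensors with values in [0, 1]."""
    inter = (pred * target).sum(dim=(1, 2))
    denom = pred.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).mean()

pred = torch.rand(2, 28, 28)
target = (torch.rand(2, 28, 28) > 0.5).float()
print(float(dice_loss(pred, target)))
```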
[0086] The machine learning engine 130 can include a loss function that maintains consistency between the masks 175 and 180. The machine learning engine 130 can execute the loss function to determine consistency between the mask 175 and the mask 180. The consistency can be that the part mask 180 is contained (e.g., partially or fully contained) within the anatomy mask 175. The machine learning engine 130 can execute the loss function to determine a loss Lap (loss 170). The loss 170 can measure how much overlap there is between the masks 180 and 175. For example, the loss 170 can be zero or a low value if the part mask 180 is completely contained within the anatomical structure mask 175. The more of the part mask 180 that lies outside the anatomical structure mask 175, the higher the loss 170 can be. The machine learning engine 130 can determine the loss 170 using the amount of the part mask 180 that is outside the anatomical structure mask 175. The machine learning engine 130 can determine the loss 170 using the amount of the part mask 180 that is within the anatomical structure mask 175. The machine learning engine 130 can determine the loss 170 using a number of pixels of the part mask 180 that are outside the section or area of pixels defined by the anatomical structure mask 175. In this regard, training with the loss 170 can maximize an overlap between the masks 180 and 175, e.g., maximize the amount of the mask 180 that is within the larger segmented area defined by the mask 175.
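One way to express such a consistency loss, shown below as an illustrative sketch rather than a required formulation, is to penalize the fraction of the part mask that falls outside the anatomy mask, which is zero when the part mask is fully contained.

```python
# Illustrative containment/consistency loss between the part and anatomy masks.
import torch

def containment_loss(part_mask, anatomy_mask, eps=1e-6):
    """part_mask, anatomy_mask: (B, H, W) soft masks in [0, 1].
    Zero when the part mask lies entirely inside the anatomy mask."""
    outside = (part_mask * (1.0 - anatomy_mask)).sum(dim=(1, 2))
    total = part_mask.sum(dim=(1, 2)) + eps
    return (outside / total).mean()

part = torch.zeros(1, 28, 28); part[:, 5:10, 5:10] = 1.0
anatomy = torch.zeros(1, 28, 28); anatomy[:, 0:20, 0:20] = 1.0
print(float(containment_loss(part, anatomy)))  # ~0.0: part is fully contained
```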
[0087] The machine learning engine 130 can train the model 125 to converge. For example, the machine learning engine 130 can execute a machine learning algorithm, such as a training algorithm like back propagation, to minimize the losses 160, 170, and 155. During a training phase of the model 125, the machine learning engine 130 can feed the model 125 video frames 115 with annotations of anatomical masks 175 and part or state masks 180. The training data 150 can store the masks 175 and 180 for various frames 115 of medical or surgical procedures. Because the losses of the branches 140 and 135 are defined both independently (i.e., losses 160 and 155) and jointly (loss 170), the training of the branches 140 and 135 can be performed simultaneously. The training of all the models included within the model 125 can be performed concurrently, simultaneously, or contemporaneously.
[0088] Referring now to FIG. 3, among others, a method 300 of identifying an anatomical structure and detecting a state of the anatomical structure is shown. The computing system 110, the robotic medical system 105, or the client device 199 can perform at least a portion of the method 300. Any type of computing system, data processing system, processing system, or processing circuit can execute instructions to perform at least a portion of the method 300. The method 300 can be executed as instructions or performed as a hardware circuit or logic circuit. The method 300 can include an ACT 305 of receiving data of a medical procedure. The method 300 can include an ACT 310 of identifying an anatomical structure. The method 300 can include an ACT 315 of detecting a state. The method 300 can include an ACT 320 of providing a performance indication.
[0089] At ACT 305, the method 300 can include receiving, by the computing system 110, data of a medical procedure. The method 300 can include receiving, by the computing system 110, the data from the robotic medical system 105. The method 300 can include receiving, by the computing system 110, one, multiple, or a sequence of image frames 115. The method 300 can include receiving, by the computing system 110, system data 120. The system data 120 and the image frames 115 can correspond to one another, or be linked via timestamps. For example, for a particular image frame 115, the system data 120 can indicate the type of medical procedure, phase of the medical procedure, or the step of the medical procedure. In some implementations, the method 300 can include receiving, by the computing system 110, advanced imaging data, such as hyperspectral image data.
[0090] At ACT 310, the method 300 can include identifying, by the computing system 110, an anatomical structure. The method 300 can include executing a model 125 to determine the anatomical structure 145. The model 125 can include at least one branch. For example, the method 300 can include executing, by the computing system 110, a first branch 135 of the model 125 to identify the anatomical structure 145. The method 300 can include executing, by the computing system 110, the first branch 135 of the model 125 to select a label, identifier, value, or tag that uniquely identifies a type of the anatomical structure 145. For example, the type of the structure could be a liver, a pancreas, a muscle, a specific muscle such as a bicep, a specific bone such as a tibia, etc.
[0091] The method 300 can include executing, by the computing system 110, an image encoder 205 to encode the frame 115. The method 300 can include executing, by the computing system 110, a system data embedding 210 to embed the system data 120. The method 300 can include combining, by the computing system 110, the encoding of the image 115 and the embedding of the system data 120, and providing the combined features to the state or part decoder 220 and the anatomy decoder 225. The method 300 can include executing, by the computing system 110, the anatomy decoder 225. The method 300 can include executing, by the computing system 110, the anatomy decoder 225 on the combined features to detect or output the anatomical structure 145. The method 300 can include executing, by the computing system 110, the model 245 to generate the mask 175.
[0092] At ACT 315, the method 300 can include detecting, by the computing system 110, a state. The method 300 can include executing the state or part branch 140 using the detection 145 by the base branch 135 to generate the mask 180 for the state. For example, the method 300 can include decoding, by the state or part decoder 220, the encoded image 115 and the embedded system data 120 into the state or part detection 230. The method 300 can include combining, by the operation 235, the detection 230 and the detection 145. The method 300 can include generating, by the model 240, a mask 180 using the combined detection 230 and detection 145.
[0093] At ACT 320, the method 300 can include providing, by the computing system 110, a performance indication. The method 300 can include generating at least one metric that indicates the performance of the medical procedure performed by an operator of the robotic medical system 105. The method 300 can include generating a metric based on the detection of the anatomical structure 145 by the first branch 135 and the detection of the state of the anatomical structure by the second branch 140. The method 300 can include generating metrics that indicate a likelihood that the medical procedure will be successful. The method 300 can include generating metrics that indicate how much unnecessary bleeding, burning, or cutting occurred during the medical procedure. The method 300 can include generating metrics that indicate how well an anatomical structure was repaired after cutting. The metrics can quantify the states determined by the model 125 or changes to those states over time. For example, if a bleeding state changes over time indicating that bleeding is slowing or stopping, the method 300 can include generating a metric that quantifies how the bleeding has decreased and whether the bleeding will stop.
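As a hypothetical example of such a metric, the sketch below fits a linear trend to the bleeding-mask area across frames; the threshold and trend-fit choices are assumptions for illustration, not prescribed metrics.

```python
# Illustrative metric: bleeding-area trend derived from per-frame state masks.
import numpy as np

def bleeding_trend(state_masks, fps=1.0):
    """state_masks: list of (H, W) binary masks for the 'bleeding' state."""
    areas = np.array([mask.sum() for mask in state_masks], dtype=float)
    times = np.arange(len(areas)) / fps
    slope = np.polyfit(times, areas, deg=1)[0] if len(areas) > 1 else 0.0
    return {
        "final_area_px": float(areas[-1]),
        "area_slope_px_per_s": float(slope),   # negative slope: bleeding is slowing
        "likely_stopping": bool(slope < 0 and areas[-1] < 0.5 * areas.max()),
    }

masks = [np.ones((28, 28)) * (t < 10) for t in range(20)]  # toy: bleeding stops at t=10
print(bleeding_trend(masks))
```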
[0094] Referring now to FIG. 4, among others, a method 400 of training a model 125 with a combined branch loss 170 is shown. The computing system 110, the robotic medical system 105, or the client device 199 can perform at least a portion of the method 400. Any type of computing system, data processing system, processing system, or processing circuit can execute instructions to perform at least a portion of the method 400. The method 400 can include at least one ACT 405. The method 400 can include at least one ACT 410 of training a model with a combined branch loss. The method 400 can include at least one ACT 415 of deploying a model.
[0095] At ACT 405, the method 400 can include receiving, by the computing system 110, a model 125 including a first branch 135 and a second branch 140. The method 400 can include receiving the model 125 from an internal component of the computing system 110, from a storage system or device of the computing system 110, or from an external system. For example, the model 125 can be developed or designed on an external system, and then sent to the computing system 110 by the external system. In some embodiments, the model 125 is developed or designed on the computing system 110. The method 400 can include storing the model 125 on at least one storage or memory device of the computing system 110. The method 400 can include receiving, by the machine learning engine 130, the model 125. The method 400 can include storing, by the machine learning engine 130, the model 125 on at least one storage device, memory device, or memory apparatus.
[0096] At ACT 410, the method 400 can include training, by the computing system 110, the model 125 with a combined branch loss. The method 400 can include determining, by the computing system 110, the combined loss 170 using determinations of the first branch 135 and the second branch 140. The method 400 can include determining, by the computing system 110, a combined loss 170 that takes into account losses in the first branch 135 and the second branch 140 simultaneously, concurrently, or contemporaneously. The method 400 can include executing, by the computing system 110, a loss function to determine the combined loss 170. The method 400 can include executing the loss function to determine the combined loss 170 based on an overlap between the mask 180 and the mask 175. The combined loss 170 can be higher when pixels or sections of the masks 175 and 180 do not overlap. For example, the greater the mutually exclusive area, or the number of pixels in the mutually exclusive area, the higher the combined loss 170 can be.
[0097] The method 400 can include training, by the machine learning engine 130, the model 125 to minimize the combined loss 170. The method 400 can include training, by the machine learning engine 130, multiple branches of the model 125 simultaneously, concurrently, or contemporaneously. The machine learning engine 130 can train the multiple branches of the model 125 in the same training process, training epoch, or training segment. The machine learning engine 130 can tune, adjust, or change the values of both the first branch 135 and the second branch 140 with the combined loss 170. The machine learning engine 130 can train the branches 135 and 140 with the training data 150. For example, the machine learning engine 130 can perform back propagation or another training algorithm to train the model 125 with the combined loss 170. The machine learning engine 130 can train the model 125 with individual branch losses in addition to training with the combined loss 170. For example, the method 400 can include determining, by the machine learning engine 130, a first branch loss 155 for the first branch 135 and determining, by the machine learning engine 130, a second branch loss 160 for the second branch 140. The machine learning engine 130 can train the model 125 with the first branch loss 155 and the second branch loss 160. The machine learning engine 130 can use the branch loss 155 to train, tune, or adjust the parameters of the first branch 135 and the second branch 140. The machine learning engine 130 can use the branch loss 160 to train, tune, or adjust the parameters of the first branch 135 and the second branch 140.
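A sketch of one joint training step is shown below; it reuses the TwoBranchModel, dice_loss, and containment_loss sketches above, and the loss weights are illustrative assumptions rather than values specified by this disclosure.

```python
# Illustrative joint training step combining per-branch losses and the combined loss.
import torch

def training_step(model, optimizer, fused, gt_anatomy, gt_state, w=(1.0, 1.0, 0.5)):
    anatomy_mask, state_mask = model(fused)
    loss_a = dice_loss(anatomy_mask, gt_anatomy)           # base-branch loss (155)
    loss_p = dice_loss(state_mask, gt_state)               # state-branch loss (160)
    loss_ap = containment_loss(state_mask, anatomy_mask)   # combined branch loss (170)
    loss = w[0] * loss_a + w[1] * loss_p + w[2] * loss_ap
    optimizer.zero_grad()
    loss.backward()                                        # updates both branches at once
    optimizer.step()
    return float(loss)

model = TwoBranchModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
step_loss = training_step(model, opt, torch.randn(4, 320),
                          (torch.rand(4, 28, 28) > 0.5).float(),
                          (torch.rand(4, 28, 28) > 0.5).float())
print(step_loss)
```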
[0098] Referring now to FIG. 5, among others, an example block diagram of a computing system 110 is shown. The computing system 110 can include or be used to implement a data processing system or its components. The architecture described in FIG. 5 can be used to implement the computing system 110, the robotic medical system 105, or the client device 199. The computing system 110 can include at least one bus 525 or other communication component for communicating information and at least one processor 530 or processing circuit coupled to the bus 525 for processing information. The computing system 110 can include one or more processors 530 or processing circuits coupled to the bus 525 for processing information. The computing system 110 can include at least one main memory 510, such as a random access memory (RAM) or other dynamic storage device, coupled to the bus 525 for storing information, and instructions to be executed by the processor 530. The main memory 510 can be used for storing information during execution of instructions by the processor 530. The computing system 110 can further include at least one read only memory (ROM) 515 or other static storage device coupled to the bus 525 for storing static information and instructions for the processor 530. A storage device 520, such as a solid state device, magnetic disk or optical disk, can be coupled to the bus 525 to persistently store information and instructions.
[0099] The computing system 110 can be coupled via the bus 525 to a display 500, such as a liquid crystal display or an active matrix display. The display 500 can display information to a user. An input device 505, such as a keyboard or voice interface, can be coupled to the bus 525 for communicating information and commands to the processor 530. The input device 505 can include a touch screen of the display 500. The input device 505 can include a cursor control, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to the processor 530 and for controlling cursor movement on the display 500.
[00100] The processes, systems and methods described herein can be implemented by the computing system 110 in response to the processor 530 executing an arrangement of instructions contained in main memory 510. Such instructions can be read into main memory 510 from another computer-readable medium, such as the storage device 520. Execution of the arrangement of instructions contained in main memory 510 causes the computing system 110 to perform the illustrative processes described herein. One or more processors in a multiprocessing arrangement can be employed to execute the instructions contained in main memory 510. Hard-wired circuitry can be used in place of or in combination with software instructions together with the systems and methods described herein. Systems and methods described herein are not limited to any specific combination of hardware circuitry and software.
[00101] Although an example computing system has been described in FIG. 5, the subject matter including the operations described in this specification can be implemented in other types of digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
[00102] Some of the description herein emphasizes the structural independence of the aspects of the system components or groupings of operations and responsibilities of these system components. Other groupings that execute similar overall operations are within the scope of the present application. Modules can be implemented in hardware or as computer instructions on a non-transient computer readable storage medium, and modules can be distributed across various hardware or computer based components.
[00103] The systems described above can provide multiple ones of any or each of those components and these components can be provided on either a standalone system or on multiple instantiations in a distributed system. In addition, the systems and methods described above can be provided as one or more computer-readable programs or executable instructions embodied on or in one or more articles of manufacture. The article of manufacture can be cloud storage, a hard disk, a CD-ROM, a flash memory card, a PROM, a RAM, a ROM, or a magnetic tape. In general, the computer-readable programs can be implemented in any programming language, such as LISP, PERL, C, C++, C#, PROLOG, Python, or in any byte code language such as JAVA. The software programs or executable instructions can be stored on or in one or more articles of manufacture as object code.
[00104] Example and non-limiting module implementation elements include sensors providing any value determined herein, sensors providing any value that is a precursor to a value determined herein, datalink or network hardware including communication chips, oscillating crystals, communication links, cables, twisted pair wiring, coaxial wiring, shielded wiring, transmitters, receivers, or transceivers, logic circuits, hard-wired logic circuits, reconfigurable logic circuits in a particular non-transient state configured according to the module specification, any actuator including at least an electrical, hydraulic, or pneumatic actuator, a solenoid, an op-amp, analog control elements (springs, filters, integrators, adders, dividers, gain elements), or digital control elements.
[00105] The subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. The subject matter described in this specification can be implemented as one or more computer programs, e.g., one or more circuits of computer program instructions, encoded on one or more computer storage media for execution by, or to control the operation of, data processing apparatuses. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. While a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium can also be, or be included in, one or more separate components or media (e.g., multiple CDs, disks, or other storage devices including cloud storage). The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
[00106] The terms “computing device”, “component” or “data processing apparatus” or the like encompass various apparatuses, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
[00107] A computer program (also known as a program, software, software application, app, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program can correspond to a file in a file system. A computer program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
[00108] The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatuses can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). Devices suitable for storing computer program instructions and data can include non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
[00109] The subject matter described herein can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the subject matter described in this specification, or a combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
[00110] While operations are depicted in the drawings in a particular order, such operations are not required to be performed in the particular order shown or in sequential order, and all illustrated operations are not required to be performed. Actions described herein can be performed in a different order.
[00111] Having now described some illustrative implementations, it is apparent that the foregoing is illustrative and not limiting, having been presented by way of example. In particular, although many of the examples presented herein involve specific combinations of method acts or system elements, those acts and those elements may be combined in other ways to accomplish the same objectives. ACTs, elements and features discussed in connection with one implementation are not intended to be excluded from a similar role in other implementations.
[00112] The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including” “comprising” “having” “containing” “involving” “characterized by” “characterized in that” and variations thereof herein, is meant to encompass the items listed thereafter, equivalents thereof, and additional items, as well as alternate implementations consisting of the items listed thereafter exclusively. In one implementation, the systems and methods described herein consist of one, each combination of more than one, or all of the described elements, acts, or components.
[00113] Any references to implementations or elements or acts of the systems and methods herein referred to in the singular may also embrace implementations including a plurality of these elements, and any references in plural to any implementation or element or act herein may also embrace implementations including only a single element. References in the singular or plural form are not intended to limit the presently disclosed systems or methods, their components, acts, or elements to single or plural configurations. References to any ACT or element being based on any information, act or element may include implementations where the act or element is based at least in part on any information, act, or element.
[00114] Any implementation disclosed herein may be combined with any other implementation or example, and references to “an implementation,” “some implementations,” “one implementation” or the like are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the implementation may be included in at least one implementation or example. Such terms as used herein are not necessarily all referring to the same implementation. Any implementation may be combined with any other implementation, inclusively or exclusively, in any manner consistent with the aspects and implementations disclosed herein.
[00115] References to “or” may be construed as inclusive so that any terms described using “or” may indicate any of a single, more than one, and all of the described terms. References to at least one of a conjunctive list of terms may be construed as an inclusive OR to indicate any of a single, more than one, and all of the described terms. For example, a reference to “at least one of ‘A’ and ‘B’” can include only ‘A’, only ‘B’, as well as both ‘A’ and ‘B’. Such references used in conjunction with “comprising” or other open terminology can include additional items.
[00116] Where technical features in the drawings, detailed description or any claim are followed by reference signs, the reference signs have been included to increase the intelligibility of the drawings, detailed description, and claims. Accordingly, neither the reference signs nor their absence have any limiting effect on the scope of any claim elements.
[00117] Modifications of described elements and acts such as variations in sizes, dimensions, structures, shapes and proportions of the various elements, values of parameters, mounting arrangements, use of materials, colors, orientations can occur without materially departing from the teachings and advantages of the subject matter disclosed herein. For example, elements shown as integrally formed can be constructed of multiple parts or elements, the position of elements can be reversed or otherwise varied, and the nature or number of discrete elements or positions can be altered or varied. Other substitutions, modifications, changes and omissions can also be made in the design, operating conditions and arrangement of the disclosed elements and operations without departing from the scope of the present disclosure.

CLAIMS

What is claimed is:
1. A system, comprising: one or more processors, coupled with memory, to: receive data of a medical procedure performed on a subject with a robotic medical system; identify, with one or more models trained with machine learning, an anatomical structure based on the data; detect, with the one or more models and based on the identified anatomical structure, a state of the anatomical structure; and provide, based at least in part on the state of the anatomical structure, an indication of a performance of the medical procedure.
2. The system of claim 1, comprising: the one or more processors to: execute the one or more models on a frame of a video of the medical procedure; generate, using the identified anatomical structure, a first mask of the frame for the anatomical structure; and generate, using the detected state, a second mask of the frame for the state.
3. The system of claim 1, comprising: the one or more processors to: execute the one or more models trained with the machine learning, the one or more models comprising: a first branch comprising at least one first model, the first branch to identify the anatomical structure; and a second branch comprising at least one second model, the second branch to detect the state of the anatomical structure using the anatomical structure identified by the first branch.
4. The system of claim 1, comprising: the one or more processors to: receive a frame of a video of the medical procedure and an identification of the medical procedure from the robotic medical system; generate an encoding of the frame; generate an embedding of the identification of the medical procedure; execute a first branch of the one or more models on the encoding of the frame and the embedding of the identification to identify the anatomical structure; and execute a second branch of the one or more models on the encoding of the frame, the embedding of the identification, and the identified anatomical structure to detect the state of the anatomical structure.
5. The system of claim 1, comprising: the one or more processors to: generate, using the identified anatomical structure, a first mask of a frame for the anatomical structure; generate, using the detected state, a second mask of the frame for the state; determine a level of overlap between the first mask and the second mask; and train the one or more models to maximize the level of overlap.
6. The system of claim 1, comprising: the one or more processors to: determine a first loss of a first branch of the one or more models, the first branch to identify the anatomical structure; determine a second loss of a second branch of the one or more models, the second branch to detect the state of the anatomical structure; and concurrently train the first branch and the second branch with the first loss and the second loss.
7. The system of claim 1, comprising: the one or more processors to: receive a frame of a video of the medical procedure, hyperspectral data of the frame, and an identification of the medical procedure from the robotic medical system; execute a first branch of the one or more models on the frame and the identification of the medical procedure to identify the anatomical structure; execute a second branch of the one or more models on the frame, the identification of the medical procedure, and the identified anatomical structure to detect the state of the anatomical structure; and execute a third branch of the one or more models on the hyperspectral data and at least one of the identified anatomical structure or the state of the anatomical structure to detect an oxygen saturation of the anatomical structure.
8. The system of claim 1, comprising: the one or more processors to: execute a first model of the one or more models using the identified anatomical structure and the detected state to generate a first frame comprising a first mask that indicates the state; and execute a second model of the one or more models using the identified anatomical structure to generate a second frame comprising a second mask that indicates the anatomical structure.
9. The system of claim 1, comprising: the one or more processors to: detect, with the one or more models and based on the identified anatomical structure, the state of the anatomical structure at a plurality of points in time using a plurality of frames of a video of the medical procedure; track the state over time using the detected state from the plurality of frames; and generate the indication of the performance of the medical procedure using changes to the state over time.
10. The system of claim 1, comprising: the one or more processors to: search a plurality of videos of medical procedures with the identified anatomical structure and the state of the anatomical structure to identify a subset of the plurality of videos; and generate data to cause a graphical user interface to display indications of the subset of the plurality of videos.
11. The system of claim 1, comprising: the one or more processors to: receive the data during the medical procedure; generate an alert during the medical procedure using the detected state of the anatomical structure by the one or more models; and generate data to cause a graphical user interface to display the alert.
12. The system of claim 1, comprising: the one or more processors to: generate a video of the medical procedure comprising a plurality of masks to indicate the anatomical structure identified by the one or more models and the state of the anatomical structure detected by the one or more models; and generate data to cause a graphical user interface to display the video comprising the plurality of masks.
13. A method, comprising: receiving, by one or more processors, coupled with memory, data of a medical procedure performed on a subject with a robotic medical system; identifying, by the one or more processors, with one or more models trained with machine learning, an anatomical structure based on the data; detecting, by the one or more processors, with the one or more models and based on the identified anatomical structure, a state of the anatomical structure; and providing, by the one or more processors, based at least in part on the state of the anatomical structure, an indication of a performance of the medical procedure.
14. The method of claim 13, comprising: executing, by the one or more processors, the one or more models trained with the machine learning, the one or more models comprising: a first branch comprising at least one first model, the first branch to identify the anatomical structure; and a second branch comprising at least one second model, the second branch to detect the state of the anatomical structure using the anatomical structure identified by the first branch.
15. The method of claim 13, comprising: receiving, by the one or more processors, a frame of a video of the medical procedure and an identification of the medical procedure from the robotic medical system; generating, by the one or more processors, an encoding of the frame; generating, by the one or more processors, an embedding of the identification of the medical procedure; executing, by the one or more processors, a first branch of the one or more models on the encoding of the frame and the embedding of the identification to identify the anatomical structure; and executing, by the one or more processors, a second branch of the one or more models on the encoding of the frame, the embedding of the identification, and the identified anatomical structure to detect the state of the anatomical structure.
16. The method of claim 13, comprising: generating, by the one or more processors, using the identified anatomical structure, a first mask of a frame for the anatomical structure; generating, by the one or more processors, using the detected state, a second mask of the frame for the state; determining, by the one or more processors, a level of overlap between the first mask and the second mask; and training, by the one or more processors, the one or more models to maximize the level of overlap.
17. The method of claim 13, comprising: determining, by the one or more processors, a first loss of a first branch of the one or more models, the first branch to identify the anatomical structure; determining, by the one or more processors, a second loss of a second branch of the one or more models, the second branch to detect the state of the anatomical structure; and concurrently training, by the one or more processors, the first branch and the second branch with the first loss and the second loss.
18. A non-transitory computer-readable medium storing processor-executable instructions that, when executed by one or more processors, cause the one or more processors to: receive data of a medical procedure performed on a subject with a robotic medical system; identify, with one or more models trained with machine learning, an anatomical structure based on the data; detect, with the one or more models and based on the identified anatomical structure, a state of the anatomical structure; and provide, based at least in part on the state of the anatomical structure, an indication of a performance of the medical procedure.
19. The non-transitory computer-readable medium of claim 18, wherein the instructions cause the one or more processors to: receive a frame of a video of the medical procedure and an identification of the medical procedure from the robotic medical system; generate an encoding of the frame; generate an embedding of the identification of the medical procedure; execute a first branch of the one or more models on the encoding of the frame and the embedding of the identification to identify the anatomical structure; and execute a second branch of the one or more models on the encoding of the frame, the embedding of the identification, and the identified anatomical structure to detect the state of the anatomical structure.
20. The non-transitory computer-readable medium of claim 18, wherein the instructions cause the one or more processors to: generate using the identified anatomical structure, a first mask of a frame for the anatomical structure; generate using the detected state, a second mask of the frame for the state; determine a level of overlap between the first mask and the second mask; and train the one or more models to maximize the level of overlap.