
WO2025199083A1 - Extended reality contextual switching for robotic medical systems - Google Patents

Extended reality contextual switching for robotic medical systems

Info

Publication number
WO2025199083A1
Authority
WO
WIPO (PCT)
Prior art keywords
medical
mode
processors
context
objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/020341
Other languages
French (fr)
Inventor
Joseph NATEL
Govinda PAYYAVULA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intuitive Surgical Operations Inc
Original Assignee
Intuitive Surgical Operations Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intuitive Surgical Operations Inc filed Critical Intuitive Surgical Operations Inc
Publication of WO2025199083A1
Legal status: Pending


Classifications

    • G16H 20/40: ICT specially adapted for therapies or health-improving plans relating to mechanical, radiation or invasive therapies, e.g. surgery, laser therapy, dialysis or acupuncture
    • G06F 3/011: Input arrangements for interaction between user and computer; arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G16H 40/63: ICT specially adapted for the management or operation of medical equipment or devices; for the operation of medical equipment or devices for local operation
    • G16H 40/67: ICT specially adapted for the management or operation of medical equipment or devices; for the operation of medical equipment or devices for remote operation
    • A61B 2017/00203: Electrical control of surgical instruments with speech control or speech recognition
    • A61B 2017/00207: Electrical control of surgical instruments with hand gesture control or hand gesture recognition
    • A61B 2017/00216: Electrical control of surgical instruments with eye tracking or head position tracking control
    • A61B 2034/2048: Surgical navigation systems; tracking techniques using an accelerometer or inertia sensor
    • A61B 2034/2065: Surgical navigation systems; tracking using image or pattern recognition
    • A61B 2090/365: Correlation of different images or relation of image positions in respect to the body; augmented reality, i.e. correlating a live optical image with another image
    • A61B 2090/502: Supports for surgical instruments; headgear, e.g. helmet, spectacles
    • A61B 34/30: Surgical robots
    • A61B 90/361: Image-producing devices, e.g. surgical cameras

Definitions

  • Medical procedures can be performed in a medical environment, such as an operating room. As the amount and variety of equipment in the operating room increases and medical procedures grow in complexity, it can be challenging to keep the equipment operating efficiently, reliably, and without incident.
  • the technical solutions of this disclosure provide extended reality (XR) based context switching capabilities for efficient and reliable just-in-time, context-based assistance to users of robotic medical systems.
  • Setting up robotic medical systems for surgeries can be challenging due to the specialized skills and knowledge required to configure surgical robots and understand the intricacies of planned surgical operations.
  • Achieving consistent and reliable setup across diverse medical environments is further complicated because addressing faults and emergencies in ongoing medical sessions can be difficult when operators lack complete awareness of the risks or issues affecting their performance.
  • the technical solutions presented in this disclosure can overcome these, or other, challenges by introducing context-switching capabilities that seamlessly transition between live feed video data, virtual reality (VR), and augmented reality (AR) content to provide just-in-time alerts, instructions, and corrective actions for both the setup of robotic medical systems and support in ongoing medical sessions.
  • the technical solutions facilitate XR content-based guidance to users in various stages of robotic surgeries.
  • the system can include one or more processors coupled with memory.
  • the one or more processors can be configured to receive a data stream of a medical session with a robotic medical system.
  • the data stream can be captured by one or more sensors in a medical environment.
  • the one or more processors can determine, based on the data stream and via one or more context-identification models trained with machine learning, a context of the medical session.
  • the one or more processors can determine, based on the context determined, an extended reality content to be presented to a user.
  • the one or more processors can detect, based on the data stream and via one or more object-identification models, one or more objects located in the medical environment.
  • the one or more processors can establish, based on the one or more objects detected, an interaction field in the medical environment in which to interact with the extended reality content.
  • the one or more processors can present the extended reality content to the user for interaction in the interaction field.
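The flow described in the bullets above (receive a data stream, determine a context via a context-identification model, choose XR content, detect objects, establish an interaction field, and present the content) can be sketched in Python as below. All class, function, and label names here (ContextIdentifier, select_xr_content, the example contexts) are illustrative assumptions, not identifiers from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins for the trained models described above.
ContextIdentifier = Callable[[dict], str]        # data stream frame -> context label
ObjectDetector = Callable[[dict], list[dict]]    # data stream frame -> detected objects

@dataclass
class XrContent:
    kind: str            # e.g. "alert", "instruction", "corrective_action"
    payload: str

def select_xr_content(context: str) -> XrContent:
    """Map a detected session context to XR content to present (assumed mapping)."""
    catalog = {
        "cart_misplaced": XrContent("instruction", "Move patient side cart to marked location"),
        "instrument_fault": XrContent("alert", "Instrument fault detected; see corrective steps"),
    }
    return catalog.get(context, XrContent("instruction", "No guidance needed"))

def establish_interaction_field(objects: list[dict]) -> dict:
    """Bound the interaction field by the axis-aligned extent of detected objects."""
    xs = [o["x"] for o in objects]
    ys = [o["y"] for o in objects]
    return {"x_min": min(xs), "x_max": max(xs), "y_min": min(ys), "y_max": max(ys)}

def process_frame(frame: dict, identify_context: ContextIdentifier,
                  detect_objects: ObjectDetector) -> dict:
    context = identify_context(frame)
    content = select_xr_content(context)
    objects = detect_objects(frame)
    field = establish_interaction_field(objects)
    # In the described system this would be rendered via the HMD; here we just return it.
    return {"context": context, "content": content, "interaction_field": field}

if __name__ == "__main__":
    frame = {"video": "...", "kinematics": "..."}
    result = process_frame(
        frame,
        identify_context=lambda f: "cart_misplaced",
        detect_objects=lambda f: [{"name": "patient_table", "x": 0.0, "y": 0.0},
                                  {"name": "patient_side_cart", "x": 2.5, "y": 1.2}],
    )
    print(result)
```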
  • the one or more processors can generate, in an augmented reality (AR) mode of a head-mounted display (HMD) and using one or more locations of the one or more objects, a sterile space within a portion of the interaction field.
  • the sterile space can be marked with one or more AR objects indicating one or more boundaries of the sterile space.
  • the one or more processors can overlay, within the interaction field presented via the HMD, the one or more AR objects indicative of the one or more boundaries of the sterile space.
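One plausible realization of the sterile-space marking above is to derive a boundary from the detected object locations and emit AR corner markers for the HMD to overlay. The rectangular margin and data shapes in this sketch are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class ArBoundaryMarker:
    x: float
    y: float
    label: str = "sterile-boundary"

def sterile_space_markers(object_locations: list[tuple[float, float]],
                          margin_m: float = 0.5) -> list[ArBoundaryMarker]:
    """Compute corner markers of a rectangular sterile space that encloses the
    detected objects with an added margin (assumed 0.5 m)."""
    xs, ys = zip(*object_locations)
    x0, x1 = min(xs) - margin_m, max(xs) + margin_m
    y0, y1 = min(ys) - margin_m, max(ys) + margin_m
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return [ArBoundaryMarker(x, y) for x, y in corners]

# Example: a patient table and a side cart define the sterile region to be marked in AR.
for marker in sterile_space_markers([(0.0, 0.0), (2.0, 1.0)]):
    print(marker)
```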
  • the one or more processors can generate, in a virtual reality (VR) mode, a simulation of a virtual robotic medical system corresponding to the robotic medical system.
  • the virtual robotic medical system can be in a virtual medical environment corresponding to the medical environment.
  • a location of the virtual robotic medical system within the virtual medical environment can correspond to a location of the robotic medical system within the medical environment.
  • the one or more processors can display, for selection in the VR mode, a virtual control to move the virtual robotic medical system within the virtual medical environment.
  • the virtual control can correspond to a control to move the robotic medical system via a live-view mode of operation in the medical environment.
  • the one or more processors can generate an augmented reality (AR) object to control a performance of a task of the medical session via the robotic medical system in an AR mode of operation.
  • the one or more processors can receive an input to switch from one of a live-view mode of operation or a virtual reality (VR) mode of operation to the AR mode of operation.
  • the one or more processors can switch, responsive to the input, from the one of the live-view mode or the VR mode to the AR mode to overlay the AR object over the live-view of at least a portion of the medical environment within the interaction field.
  • the one or more processors can generate a virtual reality (VR) object to control a performance of a simulation of a task of the medical session in a VR mode of operation using a virtual robotic medical system corresponding to the robotic medical system in a virtual medical environment corresponding to the medical environment.
  • the one or more processors can receive an input to switch from one of a live-view mode of operation or an augmented reality (AR) mode of operation to the VR mode of operation.
  • the one or more processors can switch, responsive to the input, from the one of the live-view mode or the AR mode to the VR mode to display the simulation in the VR mode.
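The mode switching described in the preceding bullets (from live view or VR into AR, and from live view or AR into VR) can be modeled as a small state machine over the three operating modes. This sketch is illustrative; the enum names and print messages are assumptions.

```python
from enum import Enum, auto

class Mode(Enum):
    LIVE_VIEW = auto()
    AR = auto()
    VR = auto()

def switch_mode(current: Mode, requested: Mode) -> Mode:
    """Switch between live-view, AR, and VR modes in response to a user input.

    In the described system, entering AR overlays AR objects on the live view,
    while entering VR displays a simulation of the virtual robotic medical system.
    """
    if requested == current:
        return current
    if requested == Mode.AR:
        print("Overlaying AR object over live view within the interaction field")
    elif requested == Mode.VR:
        print("Displaying VR simulation of the virtual robotic medical system")
    else:
        print("Returning to live-view video of the medical environment")
    return requested

mode = Mode.LIVE_VIEW
mode = switch_mode(mode, Mode.AR)   # e.g. triggered by a hand gesture or voice input
mode = switch_mode(mode, Mode.VR)
```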
  • the one or more processors can receive, from a head-mounted device (HMD) configured to operate in an augmented reality (AR) mode of operation and a virtual reality (VR) mode of operation, a signal corresponding to one of a hand gesture, an eye movement or a voice input.
  • the one or more processors can select, via the HMD, responsive to the signal, a task to perform in at least one of the AR mode or the VR mode.
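A minimal sketch of the HMD signal handling above: a hand gesture, eye movement, or voice input is mapped to a task to perform in the AR or VR mode. The signal names and task identifiers are hypothetical.

```python
# Hypothetical task registries: signal pattern -> task identifier.
GESTURE_TASKS = {"pinch": "select_instrument", "swipe_left": "previous_step"}
VOICE_TASKS = {"switch to ar": "enter_ar_mode", "show checklist": "open_checklist"}

def select_task(signal_type: str, signal_value: str,
                gaze_target: str | None = None) -> str | None:
    """Map an HMD signal (hand gesture, eye movement, or voice input) to a task."""
    if signal_type == "hand_gesture":
        return GESTURE_TASKS.get(signal_value)
    if signal_type == "voice":
        return VOICE_TASKS.get(signal_value.lower())
    if signal_type == "eye_movement":
        # An eye movement selects whatever AR/VR element the gaze currently rests on.
        return f"activate:{gaze_target}" if gaze_target else None
    return None

print(select_task("hand_gesture", "pinch"))
print(select_task("voice", "Switch to AR"))
print(select_task("eye_movement", "dwell", gaze_target="virtual_control_move_cart"))
```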
  • the one or more processors can determine, using one or more depth sensors, a location of a medical instrument based at least on one or more locations of the one or more objects in the medical environment.
  • the one or more processors can overlay, in an augmented reality (AR) mode of operation, an AR object indicative of an anatomy of a patient, the AR object positioned according to the at least one or more locations of the one or more objects.
  • the one or more processors can modify, in the interaction field, at least one of the medical instrument or the AR object based at least on a movement of the medical instrument in the interaction field.
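The depth-sensor step above can be sketched as projecting a depth reading to a 3D instrument location, positioning the AR anatomy object relative to a detected object, and tracking the instrument's distance so the overlay can be modified as the instrument moves. The geometry and field names are assumptions.

```python
import math

def instrument_location(depth_reading_m: float,
                        sensor_origin: tuple[float, float, float],
                        direction: tuple[float, float, float]) -> tuple[float, float, float]:
    """Project a depth reading along the sensor's viewing direction to a 3D point."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = tuple(c / norm for c in direction)
    return tuple(o + depth_reading_m * u for o, u in zip(sensor_origin, unit))

def place_anatomy_overlay(instrument_pos: tuple[float, float, float],
                          patient_table_pos: tuple[float, float, float]) -> dict:
    """Position the AR anatomy object relative to the detected patient table and
    record its distance to the instrument so the overlay can be updated as the
    instrument moves within the interaction field."""
    distance = math.dist(instrument_pos, patient_table_pos)
    return {"anchor": patient_table_pos, "instrument_distance_m": round(distance, 3)}

pos = instrument_location(0.8, sensor_origin=(0, 0, 2.0), direction=(0, 0, -1))
print(place_anatomy_overlay(pos, patient_table_pos=(0.1, 0.0, 1.0)))
```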
  • the one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session indicates at least one of a location of a patient side cart or a location of the robotic medical system positioned away from a desired location within the medical environment.
  • the one or more processors can generate the extended reality content indicative of the desired location for the at least one of the patient side cart or the robotic medical system within the interaction field.
  • the one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session is indicative of a desired adjustment to one of tilt, rotation or height of a patient side cart in the medical environment.
  • the one or more processors can generate, in the interaction field, the extended reality content comprising an instruction to implement the desired adjustment to the one of tilt, rotation or height of the patient side cart.
  • the one or more processors can identify, based at least on one or more portions of the data stream input into the one or more context-identification models, that the context of the medical session corresponds to a current mode of operation of a first one of a live-view mode of operation, an augmented reality (AR) mode of operation or a virtual reality (VR) mode of operation.
  • the one or more processors can identify, based at least on one or more portions of the data stream input into the one or more context-identification models, an event in a second one of the live-view mode, the AR mode or the VR mode.
  • the one or more processors can initiate, responsive to the current mode of operation and the event, a switch from the first one of the live-view mode, the AR mode or the VR mode to the second one of the live-view mode, the AR mode or the VR mode.
  • the event can include one of a failure of a medical instrument, a collision involving a medical instrument or a collision involving an anatomy of a patient.
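An illustrative reduction of the event-driven switching above: an event identified in the data stream (instrument failure, instrument collision, or anatomy collision) pulls the session from its current mode into a target mode. The event-to-mode policy below is an assumption, not the disclosure's mapping.

```python
from enum import Enum, auto

class Mode(Enum):
    LIVE_VIEW = auto()
    AR = auto()
    VR = auto()

# Assumed policy: which mode an event should pull the user into.
EVENT_TARGET_MODE = {
    "instrument_failure": Mode.AR,        # overlay corrective instructions on live view
    "instrument_collision": Mode.AR,      # highlight the colliding components
    "anatomy_collision": Mode.LIVE_VIEW,  # drop all overlays, show the raw scene
}

def handle_event(current_mode: Mode, event: str) -> Mode:
    """Initiate a mode switch when an event is identified in another mode."""
    target = EVENT_TARGET_MODE.get(event)
    if target is None or target == current_mode:
        return current_mode
    print(f"Event '{event}': switching from {current_mode.name} to {target.name}")
    return target

mode = handle_event(Mode.VR, "instrument_collision")
```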
  • the one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session is indicative of one of a desired tilt, rotation or height adjustment of a patient side cart within the medical environment.
  • the one or more processors can generate, in the interaction field, the extended reality content indicative of an instruction to adjust the one of the desired tilt, rotation or height adjustment of the patient side cart.
  • the one or more processors can identify, based on at least a portion of the data stream, that an account associated with the medical session corresponds to a profile of a user having a level of skill for the medical session that is below a threshold level of skill for the medical session.
  • the one or more processors can generate, in the interaction field responsive to the identification, the extended reality content including a guideline to perform one or more tasks of the medical session.
  • the one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that a value of a performance indicator of an account associated with the medical session is below a threshold level for the performance indicator.
  • the one or more processors can generate, in the interaction field responsive to the identification, the extended reality content including an instruction to perform one or more tasks of the medical session.
  • the one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that a likelihood of a fault in the medical session exceeds a threshold for a likelihood of fault.
  • the one or more processors can activate a clutch for the medical instrument, responsive to the identification.
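The threshold checks in the preceding bullets (skill level below a threshold, performance indicator below a threshold, fault likelihood above a threshold) amount to simple comparisons that gate which XR content or safety action is produced. The numeric thresholds and field names in this sketch are assumptions.

```python
from dataclasses import dataclass

@dataclass
class SessionProfile:
    skill_level: float            # e.g. 0.0 - 1.0, from the account profile
    performance_indicator: float
    fault_likelihood: float       # output of a context-identification model

# Assumed thresholds; the disclosure leaves the actual values unspecified.
SKILL_THRESHOLD = 0.6
PERFORMANCE_THRESHOLD = 0.5
FAULT_THRESHOLD = 0.8

def guidance_actions(profile: SessionProfile) -> list[str]:
    actions = []
    if profile.skill_level < SKILL_THRESHOLD:
        actions.append("present_guideline_for_tasks")       # XR guideline content
    if profile.performance_indicator < PERFORMANCE_THRESHOLD:
        actions.append("present_task_instructions")         # XR instruction content
    if profile.fault_likelihood > FAULT_THRESHOLD:
        actions.append("activate_instrument_clutch")        # safety action
    return actions

print(guidance_actions(SessionProfile(0.4, 0.7, 0.9)))
# -> ['present_guideline_for_tasks', 'activate_instrument_clutch']
```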
  • the method can include receiving, by one or more processors coupled with memory, a data stream of a medical session with a robotic medical system captured by one or more sensors in a medical environment.
  • the method can include determining, by the one or more processors, based on the data stream and via one or more context-identification models trained with machine learning, a context of the medical session and an extended reality content to be presented to a user.
  • the method can include detecting, by the one or more processors, based on the data stream and via one or more object-identification models, one or more objects located in the medical environment.
  • the method can include establishing, by the one or more processors, based on the one or more objects detected, an interaction field in the medical environment in which to interact with the extended reality content.
  • the method can include presenting, by the one or more processors, the extended reality content to the user for interaction in the interaction field.
  • the method can include generating, by the one or more processors, an augmented reality (AR) object to control a performance of a task of the medical session via the robotic medical system in an AR mode of operation.
  • the method can include receiving, by the one or more processors, an input to switch from one of a live-view mode of operation or a virtual reality (VR) mode of operation to the AR mode of operation.
  • the method can include switching, by the one or more processors, responsive to the input, from the one of the live-view mode or the VR mode to the AR mode to overlay the AR object over the live-view of at least a portion of the medical environment within the interaction field.
  • the method can include generating, by the one or more processors, a virtual reality (VR) object to control a performance of a simulation of a task of the medical session in a VR mode of operation using a virtual robotic medical system that corresponds to the robotic medical system.
  • the virtual robotic medical system can be in a virtual medical environment that corresponds to the medical environment.
  • the method can include receiving, by the one or more processors, an input to switch from one of a live-view mode of operation or an augmented reality (AR) mode of operation to the VR mode of operation.
  • the method can include switching, by the one or more processors, responsive to the input, from the one of the live-view mode or the AR mode to the VR mode to display the simulation in the VR mode.
  • the method can include determining, by the one or more processors, using one or more depth sensors, a location of a medical instrument based at least on one or more locations of the one or more objects in the medical environment.
  • the method can include overlaying, by the one or more processors, in an augmented reality (AR) mode of operation, an AR object indicative of an anatomy of a patient, the AR object positioned according to the at least one or more locations of the one or more objects.
  • the method can include modifying, by the one or more processors, in the interaction field, at least one of the medical instrument or the AR object based at least on a movement of the medical instrument in the interaction field.
  • FIG. 1 depicts an example system for an extended reality contextual switching in robotic systems.
  • FIG. 2 illustrates an example of a surgical system, in accordance with some aspects of the technical solutions.
  • FIG. 5 illustrates an example of a flow diagram of a method for providing an extended reality contextual switching for a robotic system.
  • the technical solutions of the present disclosure can overcome these, or other, challenges by providing context switching capabilities between two or more of live feed video data, virtual reality (VR) and augmented reality (AR) content, allowing for real-time alerts, instructions and corrective actions for both robotic medical system setup and support during an ongoing medical session.
  • the technical solutions can utilize live-view video operations for inserting AR or VR objects into displayed content, thereby providing extended reality (XR) content-based alerts, instructions and suggestions to assist the operator in the medical session.
  • This approach allows for natural context switching, offering guidance to the user (e.g., a surgeon) during a medical session.
  • the system can highlight obscured features in live-view mode, present a suitable procedure flow, identify and maintain sterile fields, and even offer a virtual sandbox environment for practicing maneuvers before their execution in the real medical environment.
  • FIG. 1 depicts an example system 100 for an extended reality (XR) contextual switching in robotic systems, such as robotic medical systems used in robot-assisted surgeries.
  • Example system 100 can include a combination of hardware and software for providing augmented reality (AR) content overlay over a live-view video stream as well as virtual reality (VR) objects and simulations of procedures implemented using robots, such as medical session procedures implemented using a robotic medical system.
  • Example system 100 can include a medical environment 102, including one or more RMSs 120 communicatively coupled with one or more head mounted devices (HMDs) 122 and one or more data processing systems (DPSs) 130 via one or more networks 101.
  • Medical environment 102 can include one or more sensors 104, objects 106, data capture devices 110, medical instruments 112, visualization tools 114, displays 116 and robotic medical systems (RMS) 120.
  • Medical environment 102 can include an interaction field 108 covering any portion of a medical environment 102 (e.g., any one or more of sensors 104, objects 106, data capture devices 110, medical instruments 112, visualization tools 114, displays 116 and any portion of RMS 120).
  • RMS 120 can receive, generate, include or process one or more data streams 162, including kinematics data 172, sensor data 174 and events data 176.
  • Head-mounted device (HMD) 122 can include one or more displays 116 and sensors 104. HMD 122 can include one or more eye trackers 124, hand trackers 126 and voice controllers 128.
  • a DPS 130 can include one or more machine learning (ML) trainers 132 that can be used to generate one or more context ML models 134 and object ML models 136.
  • DPS 130 can include one or more extended reality (XR) environment functions 140 that can include, generate or execute one or more AR objects 142, VR objects 144 and VR simulations 146.
  • DPS 130 can include a data repository storing any number of data streams 162, including kinematics data 172, sensor data 174 and events data 176.
  • DPS 130 can include one or more operating modes controllers 180 having, controlling or executing one or more live view modes 182, augmented reality (AR) modes 184, virtual reality (VR) modes 186 and interaction fields 108.
  • Operating modes controller 180 can include one or more overlay functions 196 for context switching and taking action across AR, VR and live view modes.
  • DPS 130 can include or execute one or more indicator functions 188 for providing alerts or indications and one or more performance analyzers 190 generating or providing performance metrics 192 and thresholds 194.
  • the HMD 122 can be integrated with a grounded virtual portal (GVP) to improve the precision and focus of the surgeon during robotic medical procedures.
  • the GVP can include a grounded head-mounted display (HMD) 122 which can have a fixed reference frame (e.g., a fixed location) to which the HMD 122 is attached.
  • Such attachment of the HMD 122 to a fixed frame of reference provides improved visual, audio (e.g., voice-based) and control referencing when the surgeon's head is placed into the stationary or fixed-location HMD 122.
  • the fixed head position of the GVP-configured HMD 122 can also allow for more controlled operation across various operating modes, such as by allowing the surgeon to seamlessly switch between different operating modes while maintaining the same view (e.g., view angle) of the live view, AR or VR version of the operating scene.
  • the HMD 122 configured as a GVP can incorporate force sensors on the boundaries of the grounded HMD 122, allowing the surgeon to control any virtual cameras by applying pressure to the mounted headset.
  • the HMD 122 with a GVP configuration can utilize grounded controllers 118 to provide improved control or inputs during surgical procedures.
  • Grounded controllers 118 can include any controllers or handlers allowing a surgeon to control movements or operation of RMS 120 components, such as medical instruments, in any operating modes (e.g., live view, AR or VR modes).
  • the grounded controllers 118 can include objects (e.g., handlers) that are physically grounded, attached or connected to a physical reference point. For instance, a grounded controller 118 can be grounded to the same reference point (e.g., component or part) as the grounded HMD 122.
  • the grounded controllers 118 can be connected to the reference point via mounting brackets or one or more connecting components, such as spring-loaded arms, that maintain Cartesian degrees of freedom while allowing for physical tracking of the controller movements.
  • the grounded controllers 118 can include buttons for input and force sensing resistors integrated into wearable flex circuits to sense grip force. This grip force can be replicated through the instrument, providing user-mediated haptics that allow the surgeon to feel whether manipulated anatomy components are hard or soft, tight or loose.
  • the grounded controllers 118 of a GVP-configured HMD 122 can allow for seamless shifting of depictions of medical instruments between live view, AR and VR modes, allowing the depicted medical instruments to remain in the same locations when the operating modes shift.
  • the HMD 122 with the GVP configuration can improve the system's ability to more seamlessly switch and operate across AR, VR, or live modes based on the context of the medical session.
  • an operating modes controller 180 can utilize context ML models 134 to detect shifts or changes in context and adjust the level of AR/VR immersion accordingly, while leveraging the fixed location to allow for seamless continuation of movements across the modes.
  • a surgeon can start a movement in live view mode, such as making an incision, and then seamlessly transition to AR mode where the same live-view scene is replicated with overlaid AR objects providing additional guidance. The surgeon can then continue the movement with the controls and view positioned exactly where the surgeon left them, allowing for continuity and precision across the operating modes.
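The continuity described above (controls and view staying where the surgeon left them when modes switch) can be captured by keeping the controller and view poses outside the mode-specific renderers, as in this illustrative sketch; the pose representation is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Pose:
    position: tuple[float, float, float] = (0.0, 0.0, 0.0)
    orientation: tuple[float, float, float] = (0.0, 0.0, 0.0)  # roll, pitch, yaw

@dataclass
class GroundedSession:
    """Pose state shared across live-view, AR, and VR renderers so a mode switch
    resumes at exactly the controller and view positions the surgeon left off."""
    controller_pose: Pose = field(default_factory=Pose)
    view_pose: Pose = field(default_factory=Pose)
    mode: str = "live_view"

    def move_controller(self, new_pose: Pose) -> None:
        self.controller_pose = new_pose

    def switch_mode(self, new_mode: str) -> None:
        # Only the rendering mode changes; the poses are carried over untouched.
        self.mode = new_mode

session = GroundedSession()
session.move_controller(Pose(position=(0.12, -0.03, 0.40)))
session.switch_mode("ar")
print(session.mode, session.controller_pose.position)  # pose persists across the switch
```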
  • Robotic medical system 120 can be deployed in any medical environment 102.
  • Medical environment 102 can include any space or facility for performing medical procedures, maneuvers or tasks in medical sessions, including for example a surgical facility, or an operating room.
  • Medical environment 102 can include medical instruments 112 (e.g., surgical tools used for various surgical maneuvers or tasks) which the RMS 120 can facilitate or utilize for performing surgical patient procedures, whether invasive, non-invasive, or any in-patient or out-patient procedures.
  • Robotic medical system 120 can be centralized or distributed across a plurality of components, computing devices or systems, such as computing devices 300 (e.g., used on servers, network devices or cloud computing products) to implement various functionalities of the RMS 120, including network communication or processing of data streams 162 across various devices over the network 101.
  • the medical environment 102 can include one or more data capture devices 110 (e.g., optical devices, such as cameras) as well as sensors 104 (e.g., detectors or sensing devices) for making measurements and capturing data streams 162.
  • Data streams 162 can include any sensor data 174, such as images or videos of a surgery, measurements of distances (e.g., depth), temperature, stress (e.g., pressure or vibration), light, motion, humidity, velocity, acceleration, force or material (e.g., gas) concentration.
  • Data streams 162 can include kinematics data 172 on any movement of medical instruments 112, users (e.g., medical staff), or any devices in the medical environment 102.
  • Data streams 162 can include any events data 176, such as instances of a failure of a medical instrument 112, a collision involving a medical instrument or a collision involving an anatomy of a patient, as well as occurrences of installation, configuration or selection of any devices in the medical environment 102.
  • the medical environment 102 can include one or more visualization tools 114 to gather the captured data streams 162 and process the data for display to the user (e.g., a surgeon, a medical professional or an engineer or a technician configuring the RMS) via one or more (e.g., touchscreen) displays 116 or displays of an HMD 122.
  • a display 116 can present data stream 162 (e.g., sensor data 174, kinematics data 172 or events data 176) during the course of a medical procedure (e.g., a surgery) performed via an RMS 120, which can be used to handle, manipulate, hold or utilize medical instruments 112 to perform surgical tasks at the surgical site.
  • RMS 120 can include system configurations based on which the RMS 120 operates, the functionality of which can affect the flow of the data streams 162.
  • System 100 can include any number of sensors 104 dispersed throughout the medical environment 102.
  • Sensors 104 can include electronic or electrical components, a combination of electronic and software components, mechanical or electromechanical components, or any combination thereof.
  • Various sensors 104 can include devices, systems or components detecting, measuring and/or monitoring a variety of signals in a variety of applications, such as temperature, pressure, location, proximity, light, humidity, motion, acceleration, velocity, distance, depth, magnetic field, electric field, pressure or movement of gas, images (e.g., camera), sounds (e.g., microphone), force, touch, moisture, radiation, pH, humidity or vital signs of a person.
  • a sensor 104 can include a global positioning system sensor or transceiver for location, one or more image sensors for capturing images, one or more accelerometers, one or more gyroscopes, one or more magnetometers, or another suitable form of sensor that detects motion and/or location.
  • Sensor 104 can include a camera or a device measuring a distance between the sensor 104 and an object 106 (e.g., a patient side cart, patient table or a bed, or any other device or component of the medical environment 102).
  • Sensors 104 can include one or more gyroscopes that can detect rotational movement (e.g., pitch, yaw, roll) and one or more accelerometers that can measure translational movement (e.g., forward/back, up/down, left/right). Sensor 104 can detect, determine, measure or quantify a motion of a physical object, such as a hand gesture or other user gesture, the contour of the hand, a movement of an eye (e.g., the location of an iris), a user interaction, a voice command or any other user action.
  • Medical environment 102 can include various types of objects 106.
  • An object 106 can include any device, system, component or a thing that can be positioned, deployed, located, provided or used in a medical environment 102.
  • Object 106 can include a patient table, a medical instrument 112 or a data capture device 110.
  • Object 106 can include a display 116 or a visualization tool 114.
  • Object 106 can include a wall of an operating room, a picture, an artifact or a feature on a wall, or a corner of a wall.
  • Object 106 can include an RMS 120, or any component of an RMS 120.
  • Object 106 can include a patient side cart (PSC), such as a cart or a robotic arm system (e.g., an RMS 120) that can be positioned next to a patient during a patient surgery.
  • Data processing system 130 can include any combination of hardware and software for providing extended reality content or contextual switching between AR, VR and live-view video operating modes in a system utilizing an RMS 120.
  • DPS 130 can include any computing device (e.g., computing device 300) and can include one or more servers, virtual machines or can be part of, or include a cloud computing environment.
  • the data processing system 130 can be provided via a centralized computing device (e.g., 300), or can be provided via distributed computing components, such as including multiple, logically grouped servers and facilitating distributed computing techniques.
  • the logical group of servers may be referred to as a data center, server farm or a machine farm.
  • the servers, which can include virtual machines, can also be geographically distributed or dispersed.
  • a data center or machine farm may be administered as a single entity, or the machine farm can include a plurality of machine farms.
  • the servers within each machine farm can be heterogeneous: one or more of the servers or machines can operate according to one or more types of operating system platforms.
  • the data processing system 130, or components thereof can include a physical or virtual computer system operatively coupled, or associated with, the medical environment 102.
  • the data processing system 130, or components thereof, can be coupled, or associated with, the medical environment 102 via a network 101, either directly or indirectly through an intermediate computing device or system.
  • the network 101 can be any type or form of network.
  • the geographical scope of the network can vary widely and can include a body area network (BAN), a personal area network (PAN), a local-area network (LAN) (e.g., Intranet), a metropolitan area network (MAN), a wide area network (WAN), or the Internet.
  • Network 101 can include wireless communication links between an HMD 122 and any combination of devices or components of a medical environment 102 or any portion of a DPS 130.
  • the topology of the network 101 can assume any form such as point-to-point, bus, star, ring, mesh, tree, etc.
  • the network 101 can utilize different techniques and layers or stacks of protocols, including, for example, the Ethernet protocol, the internet protocol suite (TCP/IP), the ATM (Asynchronous Transfer Mode) technique, the SONET (Synchronous Optical Networking) protocol, the SDH (Synchronous Digital Hierarchy) protocol, etc.
  • the TCP/IP internet protocol suite can include application layer, transport layer, internet layer (including, e.g., IPv6), or the link layer.
  • the network 101 can be a type of a broadcast network, a telecommunications network, a data communication network, a computer network, Wireless Fidelity (Wi-Fi) network, a Bluetooth network, a cellular network (e.g., 4G, 5G or 6G network), or other types of wired or wireless networks.
  • the data processing system 130 can be located at least partially at the location of the surgical facility associated with the medical environment 102 or remotely therefrom. Elements of the data processing system 130, or components thereof can be accessible via portable devices such as laptops, mobile devices, wearable smart devices, etc.
  • the data processing system 130, or components thereof can include other or additional elements that can be considered desirable to have in performing the functions described herein.
  • the data processing system 130, or components thereof, can include, or be associated with, one or more components or functionality of a computing device, including, for example, one or more processors coupled with memory that can store instructions, data or commands for implementing the functionalities of the DPS 130 discussed herein.
  • Data repository 160 of the DPS 130 can include one or more data streams 162.
  • Data repository 160 can include or be implemented in a storage device 325.
  • Data streams 162 can include any series of data packets of a particular type or a form, which can be generated by a particular device (e.g., a sensor, a camera, a robotic device, a detector or an event detection function).
  • data stream 162 can include a stream of data packets of sensor data 174, including one or more video frames or video images from a video camera or an endoscopic device.
  • Data stream 162 can include measurements from sensors 104, including timestamped data corresponding to force, torque or biometric data, haptic feedback data, endoscopic images or data, ultrasound images or videos and any other sensor data.
  • Data stream 162 can include a stream of kinematics data 172, including any data indicative of temporal positional coordinates of a device, or indicative of movement of a medical instrument 112 on an RMS 120.
  • Data stream 162 can include a stream of data packets corresponding to events, such as events indicative of, or corresponding to, errors in operation or faults, collisions of a medical instrument 112, occurrences of installation, uninstallation, engagement or disengagement, setting or unsetting of any medical instrument 112 on an RMS 120.
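The stream types listed above (sensor data, kinematics data, events data) can be pictured as typed, timestamped packets multiplexed into one data stream 162, as in this sketch; the packet fields are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class SensorPacket:
    timestamp: float
    source: str          # e.g. "endoscope", "depth_sensor"
    values: dict         # e.g. {"force_n": 1.2} or {"frame_id": 1041}

@dataclass
class KinematicsPacket:
    timestamp: float
    instrument: str
    position: tuple[float, float, float]

@dataclass
class EventPacket:
    timestamp: float
    event: str           # e.g. "instrument_installed", "instrument_fault", "collision"

DataPacket = Union[SensorPacket, KinematicsPacket, EventPacket]

def route_packet(packet: DataPacket) -> str:
    """Dispatch a packet to the handler for its stream type."""
    if isinstance(packet, KinematicsPacket):
        return f"kinematics: {packet.instrument} at {packet.position}"
    if isinstance(packet, EventPacket):
        return f"event: {packet.event}"
    return f"sensor[{packet.source}]: {packet.values}"

stream = [
    KinematicsPacket(0.01, "needle_driver", (0.10, 0.02, 0.35)),
    EventPacket(0.02, "instrument_installed"),
    SensorPacket(0.03, "endoscope", {"frame_id": 1}),
]
for p in stream:
    print(route_packet(p))
```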
  • the system 100 can include one or more data capture devices 110 (e.g., video cameras, sensors or detectors) for collecting any data stream 162, that can be used for machine learning, including detection of objects 106 from sensor data 174 (e.g., video frames or force or feedback data), detection of particular events (e.g., user interface selection of, or a surgeon’s engaging of, a medical instrument 112) or detection of kinematics (e.g., movements of the medical instrument 112).
  • Data capture devices 110 can include cameras or other image capture devices for capturing videos or images from a particular viewpoint within the medical environment 102.
  • the data capture devices 110 can be positioned, mounted, or otherwise located to capture content from any viewpoint that facilitates the data processing system capturing various surgical tasks or actions.
  • Data capture devices 110 can be used to detect or recognize user gestures or actions, and determine distances to, from and between one or more objects 106 of a medical environment 102.
  • Data capture devices 110 can include any of a variety of sensors 104, such as detectors, cameras, video imaging devices, infrared imaging devices, visible light imaging devices, intensity imaging devices, including black, color and grayscale imaging devices, depth imaging devices, (e.g., stereoscopic imaging devices, time-of-flight imaging devices, etc.), medical imaging devices such as endoscopic imaging devices, ultrasound imaging devices, etc., non-visible light imaging devices, any combination or sub-combination of the above mentioned imaging devices, or any other type of imaging devices that can be suitable for the purposes described herein.
  • Data capture devices 110 can include cameras that a surgeon can use to perform a surgery and observe manipulation components within a field of view suitable for performing the given task.
  • Data capture devices can output any type of data streams 162, including data streams 162 of kinematics data 172 (e.g., kinematics data stream), data streams 162 of events data 176 (e.g., events data stream) and data streams 162 of sensor data 174 (e.g., sensors data stream).
  • data capture devices 110 can capture, detect, or acquire sensor data 174, such as videos or images, including for example, still images, video images, vector images, bitmap images, other types of images, or combinations thereof.
  • the data capture devices 110 can capture the images at any suitable predetermined capture rate or frequency.
  • Settings, such as zoom settings or resolution, of each of the data capture devices 110 can vary as desired to capture suitable images from any viewpoint.
  • data capture devices 110 can have fixed viewpoints, locations, positions, or orientations.
  • the data capture devices 110 can be portable, or otherwise configured to change orientation or telescope in various directions.
  • the data capture devices 110 can be part of a multi-sensor architecture including multiple sensors, with each sensor being configured to detect, measure, or otherwise capture a particular parameter (e.g., sound, images, or pressure).
  • Data capture devices 110 can generate sensor data 174 from any type and form of a sensor, such as a positioning sensor, a biometric sensor, a velocity sensor, an acceleration sensor, a vibration sensor, a motion sensor, a pressure sensor, a light sensor, a distance sensor, a current sensor, a focus sensor, a temperature or pressure sensor or any other type and form of sensor used for providing data on medical instruments or tools 112, or data capture devices (e.g., optical devices).
  • a data capture device 110 can include a location sensor, a distance sensor or a positioning sensor providing coordinate locations of a medical tool 112 (e.g., kinematics data 172).
  • Data capture device 110 can include a sensor providing information or data on a location, position or spatial orientation of an object (e.g., medical tool 112 or a lens of data capture device 110) with respect to a reference point for kinematics data 172.
  • the reference point can include any fixed, defined location used as the starting point for measuring distances and positions in a specific direction, serving as the origin from which all other points or locations can be determined.
  • Display 116 can show, illustrate or play data stream 162, such as a video stream, in which medical tools 112 at or near surgical sites are shown.
  • display 116 can display a rectangular image of a surgical site along with at least a portion of medical tools 112 (e.g., instruments) being used to perform surgical tasks.
  • Display 116 can provide compiled or composite images generated by the visualization tool 114 from a plurality of data capture devices 110 to provide a visual feedback from one or more points of view.
  • Display 116 can include a display used in an HMD 122 for delivering visual content in live view mode 182, AR mode 184 or VR mode 186.
  • display 116 of an HMD 122 can include or utilize an OLED or LCD panel to provide high-resolution visuals with low latency.
  • the display's refresh rate can be at least 60 Hz, 90 Hz or 120 Hz, or more than 120 Hz, and the display can provide a wide field of view, such as covering an entire field of view of a user wearing the HMD 122.
  • the visualization tool 114 can be configured or designed to receive any number of different data streams 162 from any number of data capture devices 110 and combine them into a single data stream displayed on a display 116.
  • the visualization tool 114 can be configured to receive a plurality of data stream components and combine the plurality of data stream components into a single data stream 162.
  • the visualization tool 114 can receive a visual sensor data from one or more medical tools 112, sensors or cameras with respect to a surgical site or an area in which a surgery is performed.
  • the visualization tool 114 can incorporate, combine or utilize multiple types of data (e.g., positioning data of a medical tool 112 along with sensor readings of pressure, temperature, vibration or any other data) to generate an output to present on a display 116.
  • Visualization tool 114 can present locations of medical tools 112 along with locations of any reference points or surgical sites, including locations of anatomical parts of the patient (e.g., organs, glands or bones).
  • Interaction field 108 can include any portion of a medical environment 102 that can be used for overlaying extended reality (XR) content, such as AR objects 142 or VR objects 144.
  • Interaction field 108 can include any portion of a medical environment 102, such as a portion of an operating room within which a sterile region is maintained during the medical session.
  • Interaction field 108 can include any portion of an RMS 120, including medical instruments, displays 116, any objects 106 or sensors 104, any data capture devices 110 or visualization tools 114.
  • Interaction field 108 can include a region or an area around a patient table, including a patient side cart (PSC) or any other device or a system in a medical environment 102.
  • Interaction field 108 can include a space on a display 116 corresponding to a view of a portion of the medical environment 102 in which the user can interact with extended reality content, such as based on detected objects 106.
  • Interaction field 108 can include any area or volume in which user gestures, actions or selections can be used or detected to facilitate interaction with overlaid or displayed AR objects 142 or VR objects 144.
  • Interaction field 108 can include, or correspond to, a portion of the medical environment 102 in which AR objects 142 or VR objects 144 can be overlaid or displayed.
  • Interaction field 108 can be displayed or shown on a display 116 in the medical environment 102, facilitating user interaction via a touch screen.
  • Interaction field 108 can be presented on a display 116 of an HMD 122, facilitating user interaction through eye movements or eye tracking.
  • Interaction field 108 can be provided via a video feed corresponding to an area of a displayed view in which the user's hand gestures can be used to select or unselect AR objects 142 or VR objects 144 to control operations or actions.
  • Interaction field 108 can include an area or a space within which VR simulations 146 can be implemented.
  • the system can establish, based on the one or more detected AR objects 142 or VR objects 144, an interaction field in the medical environment to allow the operator to interact with the XR content.
  • the XR environment functions 140 can present the XR content to the user for interaction in the interaction field 108 in an AR mode 184 or VR mode 186.
  • Interaction field 108 can be used or monitored by the operating modes controller 180 or the overlay function 196 to identify actions to be taken based on developments across different modes (e.g., 182, 184 and 186).
  • interaction field 108 can include movement of devices or systems in a live view mode 182, and, based on AR objects 142 of an AR mode 184, a collision between devices (e.g., arms of an RMS 120) can be detected, allowing the operating modes controller 180 to take action and suspend the motion of the device to prevent the collision within the interaction field 108.
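The collision example above can be reduced to a clearance check between tracked arm positions within the interaction field, with motion suspended when the predicted separation falls below a threshold. The clearance value below is an assumption.

```python
import math

# Assumed minimum clearance between robotic arms before motion is suspended.
MIN_CLEARANCE_M = 0.05

def check_and_suspend(arm_positions: dict[str, tuple[float, float, float]]) -> list[str]:
    """Return the arms whose motion should be suspended because their predicted
    positions within the interaction field fall below the clearance threshold."""
    suspended = []
    names = list(arm_positions)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(arm_positions[a], arm_positions[b]) < MIN_CLEARANCE_M:
                suspended.extend([a, b])
    return sorted(set(suspended))

positions = {
    "arm_1": (0.40, 0.10, 0.30),
    "arm_2": (0.42, 0.10, 0.30),   # 2 cm from arm_1 -> below clearance
    "arm_3": (0.90, 0.50, 0.30),
}
print(check_and_suspend(positions))  # ['arm_1', 'arm_2']
```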
  • Head-Mounted Display (HMD) 122 can include any combination of hardware and software that can be worn on a user’s head to provide a user with a virtual, augmented, live-view or mixed reality experiences.
  • HMD 122 can include a device, which can include a pair of glasses or a helmet, which can be worn over the user’s eyes.
  • HMD 122 can include one or more displays 116 that can be positioned close to the eyes, providing the user with a visual interface that replaces or augments the user’s perception of the surrounding environment.
  • HMD 122 can include any number of sensors 104 to track head movements, or eye trackers 124, so that the virtual or augmented content can respond dynamically to the user's perspective and user inputs or selections can be received.
  • HMD 122 can be utilized by users, such as medical professionals, to participate in medical procedures remotely.
  • HMD 122 can be worn on a user’s head or can be fixed to a specific location, providing stability, and display extended reality (XR) that includes augmented, virtual, live-view or mixed realities, depicting various features of the medical environment 102.
  • HMD 122 can include sensors 104 providing information about the user's location, orientation, and gaze direction, allowing the HMD 122 to generate corresponding views and images.
  • HMD 122 can include sensors 104, a communication interface, and a visualization tool 114 to detect its location, orientation, and the user's gaze direction. Using this information, the HMD 122 can render images representing the current state of the medical environment 102 and integrate real-time data from different capture devices 110. HMD 122 can include the functionality to deliver extended or mixed reality content, including by rendering images, videos, or audio based on user interactions. HMD 122 can be included in a medical environment 102 and can be considered an object 106 to be simulated, rendered or displayed as an AR object 142, VR object 144 or a VR simulation 146.
  • Eye tracker 124 can include any combination of hardware and software for measuring, detecting, monitoring and analyzing a user's gaze, including eye movements and locations. Eye tracker 124 can include the functionality to use the user's gaze (e.g., movements and locations of the eyes) to determine various user commands or actions, such as selections of displayed features. Eye tracker 124 can utilize sensor data (e.g., camera images) or infrared sensors to track the user's eye movements in real time, or gestures indicated by eye movements or movements of any other part of the user's body, allowing for accurate detection of where the user is looking with respect to the features displayed in the extended reality on the display 116. Eye tracker 124 can interpret the user's visual focus and trigger actions such as selecting options, navigating through menus, activating controls, or even initiating specific commands without the need for physical input devices.
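A common way to turn the tracked gaze into selections, consistent with the description above, is dwell-time activation: when the gaze rests on one displayed element longer than a threshold, that element's action is triggered. The dwell threshold and sample format below are assumptions.

```python
# Gaze samples as (timestamp_s, element_id) pairs; element_id is the XR element under the gaze.
DWELL_THRESHOLD_S = 0.8   # assumed dwell time required to trigger a selection

def dwell_selections(samples: list[tuple[float, str]]) -> list[str]:
    """Return the element ids selected by the user's gaze dwelling on them."""
    selections = []
    current, start = None, 0.0
    for t, element in samples:
        if element != current:
            current, start = element, t
        elif t - start >= DWELL_THRESHOLD_S and (not selections or selections[-1] != element):
            selections.append(element)
    return selections

samples = [(0.0, "menu_open"), (0.3, "menu_open"), (0.9, "menu_open"),
           (1.0, "switch_to_vr"), (1.4, "switch_to_vr")]
print(dwell_selections(samples))  # ['menu_open']
```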
  • Hand tracker 126 can include any combination of hardware and software that uses data from various sensors 104 to monitor and analyze the movements of any part of a user's body, such as the user's hands, arms, head or any other portion of the user's body.
  • Hand tracker 126 can include or utilize computer vision techniques (e.g., object ML model 136) to use images or videos of user’s hands to detect or discern various gestures that the user makes.
  • Hand tracker 126 can track movements of the user's hands, arms, head, torso, legs or any other portion of the user's body. Body or body-part movements can be tracked individually (e.g., movements of hands, arms, eyes) or collectively (e.g., a plurality of body parts moving together).
  • Hand tracker 126 can operate in real time and allow the user to seamlessly interact with the DPS 130. For instance, the user can use the hand tracker 126 (e.g., via sensors 104) to control medical tools 112 in the medical environment 102. Hand tracker 126 can detect gestures, such as finger movements for scrolling or selecting items, hand waving for navigation or moving of objects, pinching or expanding fingers for zooming in or out, and making specific hand movements or shapes to trigger specific commands or actions.
  • Voice controller 128 can include any combination of hardware and software that uses voice for controlling devices.
  • Voice controller 128 can include a microphone to detect a user’s verbal command or an instruction.
  • Voice controller 128 can detect statements, such as code words, to trigger a command for the user to control RMS 120 via HMD 122.
  • a voice controller 128 can allow the user to provide a verbal instruction to an HMD 122 to implement an action, such as a switch between different modes of operation (e.g., 182, 184 or 186).
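The code-word detection above can be sketched as a keyword match over transcribed speech that triggers the corresponding HMD action, such as an operating-mode switch; the phrases and action names are illustrative assumptions.

```python
# Assumed code words mapped to HMD actions (e.g., operating-mode switches).
CODE_WORDS = {
    "live view": "switch_to_live_view_mode",
    "augment": "switch_to_ar_mode",
    "simulate": "switch_to_vr_mode",
    "clutch": "activate_instrument_clutch",
}

def handle_utterance(transcript: str) -> list[str]:
    """Return the actions triggered by code words found in a transcribed utterance."""
    text = transcript.lower()
    return [action for phrase, action in CODE_WORDS.items() if phrase in text]

print(handle_utterance("Simulate the next step before I do it"))
# -> ['switch_to_vr_mode']
```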
  • the data repository 160 can include one or more data files, data structures, arrays, values, or other information that facilitates operation of the data processing system 130.
  • the data repository 160 can include one or more local or distributed databases and can include a database management system.
  • the data repository 160 can include, maintain, or manage one or more data streams 162.
  • the data stream 162 can include or be formed from one or more of a video stream, image stream, stream of sensor measurements, event stream, or kinematics stream.
  • the data stream 162 can include data collected by one or more data capture devices 110, such as a set of 3D sensors from a variety of angles or vantage points with respect to the procedure activity (e.g., point or area of surgery).
  • Data stream 162 can include any stream of data.
  • Data stream 162 can include a video stream, including a series of video frames organized into video fragments, such as video fragments of about 1, 2, 3, 4, 5, 10 or 15 seconds of a video. The video can include, for example, 30, 45, 60, 90 or 120 video frames per second. Video fragments can be used to form a composite video to be used for determining performance metrics 192.
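Grouping a video stream into short fixed-duration fragments could look roughly like the following sketch. The 60 fps and 5 s values are examples drawn from the ranges above, and the VideoFragment structure is a hypothetical container, not a format defined by this disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class VideoFragment:
    start_time: float           # seconds from the start of the stream
    frames: list = field(default_factory=list)

def fragment_stream(frames, fps=60, fragment_seconds=5):
    """Group a flat sequence of frames into fixed-duration fragments.

    `frames` can be any sequence of frame objects (e.g., decoded images); the
    fragment length and frame rate are illustrative values from the ranges
    described above (1-15 s fragments, 30-120 fps).
    """
    frames_per_fragment = fps * fragment_seconds
    fragments = []
    for i in range(0, len(frames), frames_per_fragment):
        fragments.append(VideoFragment(start_time=i / fps,
                                       frames=list(frames[i:i + frames_per_fragment])))
    return fragments

# Example: 12 seconds of synthetic frames at 60 fps -> three fragments (5 s, 5 s, 2 s).
fragments = fragment_stream(list(range(12 * 60)))
print(len(fragments), [len(f.frames) for f in fragments])   # 3 [300, 300, 120]
```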
  • Data streams 162 can include an event stream which can include a stream of event data 176 or information, such as packets, which identify or convey a state of the robotic medical system 120 or an event that occurred in association with the robotic medical system 120.
  • data stream 162 can include any portion of system configuration, including information on operations on data streams 162, data on installation, uninstallation, calibration, set up, attachment, detachment or any other action performed by or on an RMS 120 with respect to medical instruments 112.
  • Data stream 162 can include data about an event, such as a state of the robotic medical system 120 indicating whether the medical tool or instrument 112 is calibrated, adjusted or includes a manipulator arm installed on a robotic medical system 120.
  • Stream of event data 176 (e.g., event data stream) can include data on whether a robotic medical system 120 was fully functional (e.g., without errors) during the procedure. For example, when a medical instrument 112 is installed on a manipulator arm of the robotic medical system 120, a signal or data packet(s) can be generated indicating that the medical instrument 112 has been installed on the manipulator arm of the robotic medical system 120.
  • Data stream 162 can include a stream of kinematics data 172 which can refer to or include data associated with one or more of the manipulator arms or medical tools 112 (e.g., instruments) attached to the manipulator arms, such as arm locations or positioning.
  • Data corresponding to medical tools 112 can be captured or detected by one or more displacement transducers, orientational sensors, positional sensors, or other types of sensors and devices to measure parameters or generate kinematics information.
  • the kinematics data 172 can include sensor data along with time stamps and an indication of the medical tool 112 or type of medical tool 112 associated with the data stream 162.
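One plausible shape for a kinematics record that carries sensor values together with a timestamp and an indication of the associated medical tool is sketched below; the field names and units are assumptions made for illustration, not a wire format defined by this disclosure.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class KinematicsSample:
    """One kinematics measurement: sensor values plus a timestamp and an
    indication of the associated medical tool (field names are illustrative)."""
    timestamp: float                  # seconds since the start of the medical session
    tool_id: str                      # e.g., "arm_2/instrument"
    tool_type: str                    # e.g., "needle_driver", "endoscope"
    joint_positions: Sequence[float]  # per-joint values (radians or meters)
    tip_position: Sequence[float]     # (x, y, z) of the instrument tip, meters

sample = KinematicsSample(
    timestamp=125.40,
    tool_id="arm_2/instrument",
    tool_type="needle_driver",
    joint_positions=(0.12, -0.53, 1.07, 0.00),
    tip_position=(0.031, -0.004, 0.118),
)
print(sample.tool_type, sample.timestamp)
```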
  • Data repository 160 can store sensor data 174 having video frames that can include one or more static images or frames extracted from a sequence of images of a video file.
  • A video frame can represent a specific moment in time and can be identified by metadata including a timestamp.
  • A video frame can display the visual content of the video of a medical procedure being analyzed by performance analyzer 190 to determine performance metrics 192 indicative of the performance of the surgeon performing the procedure. For example, in a video file capturing a robotic surgical procedure, a video frame can depict a snapshot of the surgical task, illustrating a movement or usage of a medical instrument 112 such as a robotic arm manipulating a surgical tool within the patient's body.
  • Extended reality (XR) environment functions 140 can include any combination of hardware and software for providing extended reality content.
  • XR environment functions 140 can include any functions, computer code, or combination of circuits and instructions to generate, implement, or display AR objects 142 or VR objects 144 via one or more displays 116, such as an OLED display 116 on an HMD 122.
  • XR environment functions 140 can include, provide or facilitate VR simulations 146, such as simulations of a medical environment (e.g., providing a 3D virtual rendering of a medical environment 102 with all of its components (e.g., 104-116)).
  • XR environment functions 140 can generate any AR objects 142, VR objects 144 or VR simulations 146, responsive to determinations of context by the context identifying ML models 134 and object identifying ML models 136.
  • XR content can include any combination of AR or VR content.
  • XR content such as the AR objects 142 and VR objects 144 can include, provide or indicate any content generated and superimposed on a display image or video.
  • AR objects 142 and VR objects 144 can include illustrated, opaque, transparent or semi-transparent features, alerts, indications, texts, images, illustrations or notifications concerning medical procedure tasks, phases or sessions, devices or systems, patient anatomies or information.
  • AR objects 142 and VR objects 144 can represent, include, provide or indicate augmented reality illustrations, such as illustrations or representations of an organ of a patient, or the location of a device, such as a medical instrument 112, within a patient's body.
  • AR objects 142 can include any augmented reality content.
  • AR objects 142 can include objects provided (e.g., illustrated, displayed or rendered) in an augmented reality mode 184.
  • AR objects 142 can include or correspond to any representation of medical instruments 112, alerts, indications or instructions for specific actions, guidance or instructions for users (e.g., surgeons) to undergo particular actions or tasks, and provide directives for surgical maneuvers.
  • AR objects 142 can facilitate resetting or reconfiguring devices like RMS 120, or other objects 106, and highlighting interaction fields 108 within a medical environment 102.
  • AR objects 142 can include illustrated or superimposed real-time representations of medical tools or instruments 112, data capture devices 110, sensors 104, displays 116, or any other RMS 120 components, such as manipulation arms for handling medical instruments 112.
  • AR objects 142 can aid users in a visualization and decision-making during medical procedures.
  • VR objects 144 can include any virtual reality content, including any illustration, depiction, sketch, representation or a visualization of a feature, object, patient or a system in a virtual reality mode 186.
  • VR objects 144 can include features representing a simulated version of a medical environment, robotic medical system or any component within a medical environment (e.g., 104-120).
  • VR object 144 can include a virtual medical environment with a 3D illustration of RMS, virtual objects within an operating room, or simulations of specific maneuvers or tasks, providing users with immersive and realistic illustration, animation or rendering of a medical task or process, that a user can utilize for training, practicing or planning a particular task or an execution.
  • VR simulations 146 can include any combination of hardware and software for simulating any portion of a medical environment 102.
  • VR simulation 146 can include a virtual representation of an RMS 120, objects 106, displays 116, medical instruments 112, sensors 104, data capture device 110, visualization tools 114 or HMDs 122.
  • VR simulation 146 can include a creation and representation of VR objects 144 in a computer-generated environment. In the context of medical simulations, these VR objects 144 can include virtualized RMS or a virtualized patient along with any anatomies. VR simulation 146 can further incorporate virtualized movements of manipulator arms of the RMS holding virtual medical instruments 112, allowing users to interact with and manipulate these elements in a virtual space.
  • VR simulations 146 can provide simulated medical procedures, maneuvers or tasks, training scenarios, or the recreation of specific surgical environments, facilitating the user’s practice in a risk-free virtual environment.
  • Operating modes controller 180 can include any combination of hardware and software for controlling the modes of operations. Operating modes controller 180 can include the functionality to switch between a live view mode 182, AR mode 184 and VR mode 186. Operating modes controller 180 can facilitate the provision, management, and transition between any of the live-view mode 182, an AR mode 184, and VR mode 186, such as in response to context determinations made by context ML models 134 assessing the context of a medical session. For instance, operating modes controller 180 can trigger the overlay of an AR object 142 when operating in AR mode 184 or display a VR object 144 during a VR simulation 146 that can be implemented in a VR mode 186.
  • the operating modes controller 180 can identify, based on at least a portion of the data stream input into the one or more context-identification models (e.g., context ML models 134), that the context of the medical session corresponds to a current mode of operation of a first one of a live-view mode 182 of operation, an AR mode 184 of operation or a VR mode 186 of operation.
  • the operating modes controller 180 can identify or detect, based at least on one or more portions of the data stream input into the one or more context ML models 134, an event in a second one (e.g., a different mode from the one in which the user operates) of the live-view mode 182, the AR mode 184 or the VR mode 186.
  • the operating modes controller 180 can initiate, responsive to the current mode of operation and the event, a switch from the first one of the live-view mode 182, the AR mode 184 or the VR mode 186 to the second one of the live-view mode 182, the AR mode 184 or the VR mode 186.
  • the operating modes controller 180 can utilize context ML models 134 to detect shifts or changes in context of the medical procedure. For instance, the operating modes controller 180 can determine, based on the data stream and via one or more context ML models 134 trained with machine learning to detect various medical session contexts, a particular context of the medical session.
  • the contexts can include recognized or detected phases of a surgery, tasks or maneuvers implemented during a medical procedure, such as an incision, a dissection, an act of imaging or an act of suturing.
  • the context ML models 134 can identify specific events, such as a collision involving a medical instrument or a failure of a medical instrument, which may trigger a change in the context and lead to a shift between the operating modes.
  • the ML models 134 can assess the skill level of the user and provide real-time guidance or alerts to maintain the safety and efficacy of the medical procedure.
  • the operating modes controller 180 can adjust the controls based on the detected context or changes in the context in the medical session. For instance, when the context ML models 134 identify a shift from a live-view mode 182 to an AR mode 184, the operating modes controller 180 can activate AR-specific controls, such as overlaying AR objects 142 to guide the surgeon's actions. For instance, if the operating modes controller 180 detects that a context has shifted to a VR mode 186, the operating modes controller 180 can provide, generate or produce VR-specific controls, such as simulating the movement of medical instruments in a virtual environment.
  • the operating modes controller 180 can modify the interaction field 108 to accommodate the new operating mode, provide a particular or expected AR object 142 or a VR object 144 and allow for the controls to remain intuitive and responsive.
  • the controls may include real-time alerts and instructions, while in VR mode 186, the controls may focus on providing a simulated environment for practice and training.
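A highly simplified sketch of context-driven mode switching is given below. The transition table (which events move the controller between live-view, AR and VR modes) is purely illustrative; in the disclosed system the triggers come from the context ML models 134 as described above, not from a fixed mapping.

```python
from enum import Enum, auto

class Mode(Enum):
    LIVE_VIEW = auto()   # e.g., mode 182
    AR = auto()          # e.g., mode 184
    VR = auto()          # e.g., mode 186

class OperatingModesControllerSketch:
    """Minimal sketch of mode switching driven by context events.

    The mapping below is illustrative: a collision risk detected in live view
    pulls up AR guidance, an unresolved interarm collision suggests VR
    rehearsal, a ready plan returns to AR guidance, and task completion
    returns the user to live view.
    """

    TRANSITIONS = {
        (Mode.LIVE_VIEW, "collision_risk"): Mode.AR,
        (Mode.AR, "interarm_collision"): Mode.VR,
        (Mode.VR, "plan_ready"): Mode.AR,
        (Mode.AR, "task_complete"): Mode.LIVE_VIEW,
    }

    def __init__(self, mode=Mode.LIVE_VIEW):
        self.mode = mode

    def on_context_event(self, event):
        """Switch modes when the (current mode, event) pair has a defined target."""
        self.mode = self.TRANSITIONS.get((self.mode, event), self.mode)
        return self.mode

controller = OperatingModesControllerSketch()
print(controller.on_context_event("collision_risk"))      # Mode.AR
print(controller.on_context_event("interarm_collision"))  # Mode.VR
print(controller.on_context_event("plan_ready"))          # Mode.AR
print(controller.on_context_event("task_complete"))       # Mode.LIVE_VIEW
```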
  • Live view mode 182 can include any mode of operation in which a live or a real-time video feed is provided, absent any AR objects 142 or VR objects 144 inserted into the display or view. Live view mode 182 can allow the user to observe the state of the medical environment 102 and the medical session in real time and without any XR content inserted into the displayed content.
  • Augmented reality (AR) mode 184 can include any mode of operation in which one or more AR objects 142 are overlaid or inserted into the video stream (e.g., real-time video) of the medical session or a procedure.
  • AR mode 184 can include a mode in which an AR object 142 can include an instruction or a warning overlaid over a portion of a video stream, providing instructions or guiding the user in a particular way of performing a task or a maneuver.
  • An AR object 142 can include a representation of a medical instrument 112 inserted into an anatomy of a patient, facilitating the surgeon's ability to visualize the location of the instrument with respect to the patient's body.
  • Virtual reality (VR) mode 186 can include any mode of operation in which one or more VR objects 144 or one or more simulations 146 are provided. VR mode 186 can allow the user to simulate a task or a phase of a medical procedure or a session. VR mode 186 can include VR objects 144 representing the virtual medical environment along with virtual representation of all of components of the medical environment 102, including any one or more of 104-122, in any arrangement, physical position, movement or action.
  • Overlay function 196 can include any combination of hardware and software for determining and providing context and for taking cross-context actions based on events or occurrences within the live view mode 182, AR mode 184 and VR mode 186. Overlay function 196 can monitor modes 182, 184 and 186 and keep track of movements of users, objects, systems or devices. As users may utilize AR modes 184 or VR modes 186 for various tasks, overlaps of objects may occur, which in some instances can lead to collisions. Overlay function 196 can detect such anticipated collisions and provide alarms or take action to suspend the motion of the colliding devices, thereby preventing the collision from occurring.
  • Overlay function 196 can include the functionality to maintain the correlation between locations and movements of users, objects, systems or any features within the medical environment or VR or AR environment. Overlay function 196 can monitor and determine likelihood of an action taking place and take action in response to the likelihood exceeding a threshold. For instance, overlay function 196 can detect that a likelihood of a collision involving an RMS 120 exceeds a threshold. Responsive to the threshold for the likelihood of the collision being exceeded, the overlay function 196 can utilize the operating modes controller 180 to issue an instruction to the RMS 120 to take action, such as engage a clutching mechanism to stop any further movement of the RMS 120 to prevent the collision.
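The likelihood-threshold behavior described above might be sketched as follows. The 0.8 threshold and the alarm/clutch callbacks are placeholders standing in for commands that would be sent to the RMS 120 and the HMD 122; nothing here reflects an actual interface of the system.

```python
def check_collision_risk(likelihood,
                         threshold=0.8,
                         engage_clutch=lambda: print("clutch engaged: arm motion stopped"),
                         raise_alarm=lambda: print("collision warning issued")):
    """Act on a collision likelihood produced upstream (e.g., by a context model).

    The threshold value and callbacks are placeholders; in a real deployment the
    callbacks would be commands issued to the robotic medical system and the HMD.
    """
    if likelihood > threshold:
        raise_alarm()
        engage_clutch()
        return True
    return False

check_collision_risk(0.92)   # exceeds the threshold -> alarm + clutch
check_collision_risk(0.35)   # below the threshold -> no action
```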
  • Overlay function 196 can identify particular courses of action in a particular mode (e.g., AR mode 184) which can be applied in live view mode 182. For instance, overlay function 196 can identify a path through which a system or a device (e.g., PSC) can be moved from a starting location within a medical environment 102 to a destination within the medical environment 102 without any collisions along the way. Overlay function 196 can, responsive to such a path being identified, display AR objects 142 on the user's HMD 122, highlighting or overlaying the path through which the user can move the PSC to the intended destination.
  • Overlay function 196 can provide the functionality for switching between different modes, as needed.
  • the overlay function 196 can utilize the interaction field 108 to provide context switching actions across AR, VR and live view modes of operation.
  • Overlay function 196 can monitor the interaction field 108 for various AR objects 142, VR objects 144 or movements of users, systems or devices within the interaction field 108 and trigger actions responsive to anticipated events.
  • overlay function 196 can detect an anticipated collision between two devices (e.g., two instruments, arms of RMS 120 or an instrument 112 and a PSC) across different modes 182, 184 or 186 and can trigger actions to prevent such collisions or similar events, as detected within the interaction field 108.
  • the contexts can include any characteristics or features of the medical session or a procedure.
  • context can include recognized or detected phase of a surgery, or a task or a maneuver implemented during a medical procedure, such as an incision, a dissection, an act of imaging or an act of suturing.
  • the context can include identified surgical instruments 112, a localized anatomical region under operation, a categorized procedure type, a medical procedure phase, quantified procedural progress or surgical performance.
  • the context can include occurrences or events, such as abnormalities in a medical procedure (e.g., actions that are not identified by the model to be implemented during a given procedure phase or a task), a collision between a medical device (e.g., RMS 120 or its manipulator arm) and another object 106 or a layout of AR features or objects.
  • Context can include detection that a threshold for performance has been reached or exceeded, or a threshold for adherence to safety protocols.
  • Context can include assessment of a team collaboration during a procedure, assessment of patient vital signs, or identification of potential divergence between the established flow of a medical procedure and the detected series of tasks performed by the surgeon. These context categories can be used to provide an understanding of the medical session and can be used as triggers for providing AR objects 142, VR objects 144, VR simulations 146 and switching between any of the modes 182, 184 or 186.
  • Object ML models 136 can include any machine learning models for detecting or identifying objects 106, AR objects 142 or VR objects 144 based on any data in data streams 162.
  • Object ML models 136 can be combined with, or be used together with, one or more context ML models 134 to determine occurrences involving objects 106, AR objects 142 or VR objects 144 that can be indicative of a situation in which a change in the operating mode (e.g., between 182, 184 or 186) can be implemented or determined.
  • Object ML models 136 can include neural network models, including RNN, LSTM, GAN, GRU, FNN, convolutional neural network, autoencoder, capsule networks or any other type of machine learning models.
  • Object ML models 136 can utilize any data stream 162 (e.g., sensor data 174, kinematics data 172 or events data 176) to detect or recognize any one or more objects 106, AR objects 142 or VR objects 144.
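Downstream handling of object-model output could resemble the following sketch, which simply filters raw detections by confidence. The labels, confidence threshold and tuple format are assumptions for illustration rather than an interface defined by the object ML models 136.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # e.g., "manipulator_arm", "patient_side_cart", "hmd"
    confidence: float   # 0..1
    box: tuple          # (x_min, y_min, x_max, y_max) in image coordinates

def filter_detections(raw_detections, min_confidence=0.6):
    """Keep only confident detections from whatever object model is used.

    `raw_detections` is assumed to be an iterable of (label, confidence, box)
    tuples emitted by an object-identification model; labels and the threshold
    are illustrative.
    """
    return [Detection(label, conf, box)
            for label, conf, box in raw_detections
            if conf >= min_confidence]

raw = [("manipulator_arm", 0.93, (120, 40, 380, 460)),
       ("patient_side_cart", 0.41, (500, 100, 900, 620)),
       ("hmd", 0.77, (610, 20, 700, 90))]
print([d.label for d in filter_detections(raw)])   # ['manipulator_arm', 'hmd']
```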
  • Indicator function 188 can include any combination of hardware and software for generating indications, alerts, messages or textual output. Indicator function 188 can generate instructions or guidelines to a surgeon to perform a particular task in a particular way or using a particular maneuver. Indicator function 188 can generate and provide one or more indications or alerts, conveying information to users, including options and controls (e.g., AR objects 142 or VR objects 144 providing control buttons for system control via AR control or VR simulated control). Indications can include alerts notifying users of a system status, alerting on a specific event or change, providing guidance on procedural steps, or signaling the completion of tasks.
  • Performance analyzer 190 can include any combination of hardware and software for analyzing the performance of a surgeon with respect to a medical procedure captured by the data streams 162. Performance analyzer 190 can generate metrics 192 for various surgical tasks, phases or procedures, based on the performance of the surgeon with respect to such tasks. Performance analyzer 190 can utilize data streams 162 to determine the performance metrics 192. Performance analyzer 190 can generate scores indicative of the level of confidence in the performance metrics 192. Performance analyzer 190 can generate the performance metrics 192 based on the operation and feedback from context ML models 134 or object ML models 136.
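As a toy example of deriving metrics from kinematics data, the sketch below computes path length and mean tip speed from a short sequence of instrument tip positions. These stand in for, and are far simpler than, the performance metrics 192 described above; the values and function name are illustrative only.

```python
import math

def motion_metrics(tip_positions, timestamps):
    """Compute simple example metrics (path length and mean tip speed) from a
    sequence of (x, y, z) instrument tip positions and matching timestamps."""
    if len(tip_positions) < 2:
        return {"path_length_m": 0.0, "mean_speed_m_s": 0.0}
    path = sum(math.dist(a, b) for a, b in zip(tip_positions, tip_positions[1:]))
    duration = timestamps[-1] - timestamps[0]
    return {"path_length_m": path,
            "mean_speed_m_s": path / duration if duration > 0 else 0.0}

positions = [(0.00, 0.00, 0.10), (0.01, 0.00, 0.10), (0.01, 0.02, 0.11)]
times = [0.0, 0.5, 1.0]
print(motion_metrics(positions, times))
```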
  • FIG. 2 depicts a surgical system 200, in accordance with some aspects of the technical solutions.
  • the surgical system 200 can be an example of a medical environment 102.
  • the surgical system 200 may include a robotic medical system 205 (e.g., the robotic medical system 120), a user control system 210, and an auxiliary system 215 communicatively coupled one to another.
  • a visualization tool 220 (e.g., the visualization tool 114) may be connected to the auxiliary system 215, which in turn may be connected to the robotic medical system 205.
  • the visualization tool may be considered connected to the robotic medical system.
  • the visualization tool 220 may additionally or alternatively be directly connected to the robotic medical system 205.
  • the surgical system 200 may be used to perform a computer-assisted medical procedure on a patient 225.
  • surgical team may include a surgeon 230A and additional medical personnel 230B-230D, such as a medical assistant, nurse, anesthesiologist, and other suitable team members who may assist with the surgical procedure or medical session.
  • the medical session may include the surgical procedure being performed on the patient 225, as well as any pre-operative (e.g., which may include setup of the surgical system 200, including preparation of the patient 225 for the procedure), and post-operative (e.g., which may include clean up or post care of the patient), or other processes during the medical session.
  • the surgical system 200 may be implemented in a non-surgical procedure, or other types of medical procedures or diagnostics that may benefit from the accuracy and convenience of the surgical system.
  • the robotic medical system 205 can include a plurality of manipulator arms 235A-235D to which a plurality of medical tools (e.g., the medical tool 112) can be coupled or installed.
  • Each medical tool can be any suitable surgical tool (e.g., a tool having tissue-interaction functions), imaging device (e.g., an endoscope, an ultrasound tool, etc.), sensing instrument (e.g., a force-sensing surgical instrument), diagnostic instrument, or other suitable instrument that can be used for a computer-assisted surgical procedure on the patient 225 (e.g., by being at least partially inserted into the patient and manipulated to perform a computer-assisted surgical procedure on the patient).
  • While the robotic medical system 205 is shown as including four manipulator arms (e.g., the manipulator arms 235A-235D), in other embodiments the robotic medical system can include more or fewer than four manipulator arms. Further, not all manipulator arms need have a medical tool installed thereto at all times of the medical session. Moreover, in some embodiments, a medical tool installed on a manipulator arm can be replaced with another medical tool as suitable.
  • One or more of the manipulator arms 235A-235D and/or the medical tools attached to manipulator arms can include one or more displacement transducers, orientational sensors, positional sensors, and/or other types of sensors and devices to measure parameters and/or generate kinematics information.
  • One or more components of the surgical system 200 can be configured to use the measured parameters and/or the kinematics information to track (e.g., determine poses of) and/or control the medical tools, as well as anything connected to the medical tools and/or the manipulator arms 235A-235D.
  • the user control system 210 can be used by the surgeon 230A to control (e.g., move) one or more of the manipulator arms 235A-235D and/or the medical tools connected to the manipulator arms.
  • the user control system 210 can include a display (e.g., the display 116 or 1130) that can provide the surgeon 230A with imagery (e.g., high-definition 3D imagery) of a surgical site associated with the patient 225 as captured by a medical tool (e.g., the medical tool 112, which can be an endoscope) installed to one of the manipulator arms 235A-235D.
  • the user control system 210 can include a stereo viewer having two or more displays where stereoscopic images of a surgical site associated with the patient 225 and generated by a stereoscopic imaging system can be viewed by the surgeon 230A. In some embodiments, the user control system 210 can also receive images from the auxiliary system 215 and the visualization tool 220.
  • the surgeon 230A can use the imagery displayed by the user control system 210 to perform one or more procedures with one or more medical tools attached to the manipulator arms 235A-235D.
  • the user control system 210 can include a set of controls. These controls can be manipulated by the surgeon 230A to control movement of the manipulator arms 235A-235D and/or the medical tools installed thereto.
  • the controls can be configured to detect a wide variety of hand, wrist, and finger movements by the surgeon 230A to allow the surgeon to intuitively perform a procedure on the patient 225 using one or more medical tools installed to the manipulator arms 235A-235D.
  • the auxiliary system 215 can include one or more computing devices configured to perform processing operations within the surgical system 200.
  • the one or more computing devices can control and/or coordinate operations performed by various other components (e.g., the robotic medical system 205, the user control system 210) of the surgical system 200.
  • a computing device included in the user control system 210 can transmit instructions to the robotic medical system 205 by way of the one or more computing devices of the auxiliary system 215.
  • the auxiliary system 215 can receive and process image data representative of imagery captured by one or more imaging devices (e.g., medical tools) attached to the robotic medical system 205, as well as other data stream sources received from the visualization tool.
  • one or more image capture devices can be located within the surgical system 200. These image capture devices can capture images from various viewpoints within the surgical system 200. These images (e.g., video streams) can be transmitted to the visualization tool 220, which can then passthrough those images to the auxiliary system 215 as a single combined data stream. The auxiliary system 215 can then transmit the single video stream (including any data stream received from the medical tool(s) of the robotic medical system 205) to present on a display (e.g., the display 116) of the user control system 210.
  • the auxiliary system 215 can be configured to present visual content (e.g., the single combined data stream) to other team members (e.g., the medical personnel 230B-230D) who might not have access to the user control system 210.
  • the auxiliary system 215 can include a display 240 configured to display one or more user interfaces, such as images of the surgical site, information associated with the patient 225 and/or the surgical procedure, and/or any other visual content (e.g., the single combined data stream).
  • display 240 can be a touchscreen display and/or include other features to allow the medical personnel 230A-230D to interact with the auxiliary system 215.
  • the robotic medical system 205, the user control system 210, and the auxiliary system 215 can be communicatively coupled one to another in any suitable manner.
  • the robotic medical system 205, the user control system 210, and the auxiliary system 215 can be communicatively coupled by way of control lines 245, which can represent any wired or wireless communication link that can serve a particular implementation.
  • the robotic medical system 205, the user control system 210, and the auxiliary system 215 can each include one or more wired or wireless communication interfaces, such as one or more local area network interfaces, Wi-Fi network interfaces, cellular interfaces, etc.
  • the surgical system 200 can include other or additional components or elements that can be needed or considered desirable to have for the medical session for which the surgical system is being used.
  • FIG. 3 illustrates an example 300 of an extended reality content used to instruct a user to rearrange the locations of one or more objects 106 in a medical environment 102.
  • Example 300 includes a top-down view of an arrangement of a medical environment 102 in which a plurality of objects 106 are arranged to facilitate a medical session, as described in connection with FIG. 2.
  • a plurality of AR objects 142 are overlaid over the displayed view of the medical environment 102 to provide an instruction to the user to move the auxiliary system 215 from its current location to a new suggested location.
  • the AR objects 142 include a first AR object 142A that corresponds to a first arrow pointing from the current location of the auxiliary system 215 towards the end of a second arrow indicated as an AR object 142B.
  • the second arrow, AR object 142B continues to point further away from the current location of the auxiliary system 215 towards a rectangular space, marked as AR object 142C.
  • the space marked as the AR object 142C has a footprint that is similar to or the same as that of the current location of the auxiliary system 215.
  • example 300 illustrates extended reality content provided using an aspect of the technical solutions in which an AR mode 184 can be used to suggest to a user to move an object 106 (e.g., an auxiliary system 215) along a path established by the AR objects 142, so as to avoid a collision with another object 106 (e.g., RMS 205).
  • FIG. 4 depicts an example block diagram of an example computer system 400, in accordance with some embodiments.
  • the computer system 400 can be any computing device used herein and can include or be used to implement a data processing system 130 or its components.
  • the computer system 400 includes at least one bus 405 or other communication component or interface for communicating information between various elements of the computer system.
  • the computer system further includes at least one processor 410 or processing circuit coupled to the bus 405 for processing information.
  • the computer system 400 also includes at least one main memory 415, such as a random-access memory (RAM) or other dynamic storage device, coupled to the bus 405 for storing information, and instructions to be executed by the processor 410.
  • the main memory 415 can be used for storing information during execution of instructions by the processor 410.
  • the computer system 400 can further include at least one read only memory (ROM) 420 or other static storage device coupled to the bus 405 for storing static information and instructions for the processor 410.
  • a storage device 425 such as a solid-state device, magnetic disk or optical disk, can be coupled to the bus 405 to persistently store information and instructions.
  • the computer system 400 can be coupled via the bus 405 to a display 430, such as a liquid crystal display, or active-matrix display, for displaying information.
  • An input device 435 such as a keyboard or voice interface can be coupled to the bus 405 for communicating information and commands to the processor 410.
  • the input device 435 can include a touch screen display (e.g., the display 430).
  • the input device 435 can also include a cursor control, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to the processor 410 and for controlling cursor movement on the display 430.
  • the processes, systems and methods described herein can be implemented by the computer system 400 in response to the processor 410 executing an arrangement of instructions contained in the main memory 415. Such instructions can be read into the main memory 415 from another computer-readable medium, such as the storage device 425. Execution of the arrangement of instructions contained in the main memory 415 causes the computer system 400 to perform the illustrative processes described herein. One or more processors in a multiprocessing arrangement can also be employed to execute the instructions contained in the main memory 415. Hard-wired circuitry can be used in place of or in combination with software instructions together with the systems and methods described herein. Systems and methods described herein are not limited to any specific combination of hardware circuitry and software.
  • System 100 can include any features or characteristics of example systems 100-400.
  • System 100 can include a centralized implementation, such as when the system 100 is deployed on a single computing system 400 of a server, or a distributed implementation, such as a Software as a Service function executing the functionalities of the DPS 130 on a cloud.
  • System 100 can include one or more processors 410 coupled with one or more memories 415 and configured to implement any functionalities or characteristics of the technical solutions.
  • system 100 can include one or more processors 410 that are configured to perform functionalities of the technical solutions by executing instructions stored in the memory (e.g., 415, 420 or 425).
  • System 100 can include a non-transitory computer-readable medium (e.g., 415, 420 or 425) storing processor executable instructions that can be executed implementing various aspects of the technical solutions.
  • the one or more instructions when executed by one or more processors 410, can cause the one or more processors 410 to receive a data stream 162 of a medical session with an RMS 120.
  • the data stream 162 can be captured by one or more sensors 104 in a medical environment 102.
  • the data stream can include any combination of sensor data 174 from any type and form of a sensor 104, kinematics data 172 corresponding to movement of RMS 120 components (e.g., medical instruments 112 handled by arms of the RMS 120), or events data 176 identifying events or occurrences at the RMS 120 or the medical environment 102.
  • the one or more instructions can cause the one or more processors 410 to determine a context of the medical session.
  • the one or more processors can determine the context, based on the data stream 162 and via one or more context-identification models (e.g., 134) trained with machine learning.
  • the context can include any characteristic or a feature of a medical session or a medical procedure that can be used to trigger a change in the mode of operation, such as a change between a live view mode 182 into an AR mode 184 and back, or from live view mode 182 to VR mode 186 and back.
  • the context detected by the context-identification models 134 can include a recognized or detected phase of a surgery, or a detected task or a maneuver implemented during a medical procedure, such as an incision, a dissection, an act of imaging or an act of suturing.
  • the context can include identified surgical instruments 112, a localized anatomical region under operation, a categorized procedure type, a procedure phase, quantified procedural progress or surgical performance, or an occurrence of an event, such as identification of an abnormality in a medical procedure.
  • the context can include an action by a surgeon that is not identified by the context ML model 134 to be included in a given procedure phase or a task.
  • the context can include a detected collision between a medical device (e.g., RMS 120 or its manipulator arm) and another object 106 in the medical environment 102, or a likelihood of a possible collision exceeding a threshold.
  • Context can include detection that a performance metric 192 for a surgeon is below a threshold for the performance metrics, or that a threshold for adherence to safety protocols or processes has been exceeded.
  • Context can include assessment of a team collaboration during a procedure, assessment of patient vital signs, or identification of potential divergence between the established flow of a medical procedure and the detected series of tasks performed by the surgeon.
  • the one or more instructions can cause the one or more processors 410 to determine, based on the context determined, an extended reality content to be presented to a user.
  • the extended reality content can include one or more AR objects 142, one or more VR objects 144 and one or more VR simulations 146.
  • the extended reality content can be implemented responsive to the mode of operation switching to an augmented reality mode 184 or a virtual reality mode 186.
  • the extended reality content can be provided as an overlay of one or more AR objects 142 displayed over a live-stream or a real-time video feed of a medical environment 102.
  • the one or more AR objects 142 can be provided or displayed within an interaction field 108 that can be displayed within a portion of a display 116.
  • the one or more instructions can cause the one or more processors 410 to detect, based on the data stream 162 and via one or more object-identification models (e.g., 136), one or more objects 106 located in the medical environment 102.
  • the one or more objects can include sensors 104, RMS 120, manipulator arms of a surgical robot, medical instruments 112, displays 116, auxiliary systems (e.g., 215), or users (e.g., medical professionals operating within the medical environment 102).
  • Detected objects 106 can include an HMD 122, data capture devices 110 or any other device, system or a component within the medical environment 102.
  • the one or more processors 410 can identify, based on at least a portion of the data stream 162 input into the one or more context-identification models (e.g., 134), that a value of a performance indicator (e.g., performance metric 192) of an account associated with the medical session is below a threshold level for the performance indicator (e.g., 194).
  • the one or more processors 410 can generate, in the interaction field 108 responsive to the identification, the extended reality content (e.g., 142, 144) including an instruction to perform one or more tasks of the medical session.
  • the one or more processors 410 can determine, using the one or more context-identification models, a type of the medical procedure or a type of phase within a medical procedure, and generate, based at least on the type of the medical procedure, at least one of a size or a location of the interaction field.
  • the one or more processors 410 can present the interaction field according to at least one of the size or the location.
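Selecting an interaction-field size and location from a detected procedure type could, in the simplest case, be a lookup like the sketch below. The procedure names, dimensions and anchor labels are assumptions made only for illustration, not values defined by this disclosure.

```python
# Illustrative lookup from procedure type to interaction-field geometry.
INTERACTION_FIELD_PRESETS = {
    "laparoscopic": {"width_m": 0.8, "height_m": 0.5, "anchor": "above_patient_table"},
    "setup_phase":  {"width_m": 1.5, "height_m": 1.0, "anchor": "room_center"},
    "default":      {"width_m": 1.0, "height_m": 0.7, "anchor": "in_front_of_user"},
}

def interaction_field_for(procedure_type):
    """Return the size and location preset for the detected procedure type."""
    return INTERACTION_FIELD_PRESETS.get(procedure_type,
                                         INTERACTION_FIELD_PRESETS["default"])

print(interaction_field_for("laparoscopic"))
print(interaction_field_for("unknown_procedure"))   # falls back to the default preset
```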
  • the one or more processors 410 can detect, using the one or more context-identification models, that a likelihood of a collision in the medical environment exceeds a threshold and present an indication in the interaction field, responsive to the detection.
  • the method can receive data.
  • the method can include one or more processors coupled with memory receiving a data stream from a robotic medical system or one or more objects in a medical environment.
  • the method can include the one or more processors receiving a data stream of a medical session with a robotic medical system captured by one or more sensors in a medical environment.
  • the method can include the one or more processors receiving a stream of data from one or more depth sensors, motion sensors, cameras or machine learning models trained to provide an output of detected and identified movements or gestures of a robotic medical system.
  • the method can receive the data from the one or more sensors, data capture systems or the robotic medical systems.
  • the data can include any combination of sensor data, events data and kinematics data.
  • the method can include generating, in a virtual reality (VR) mode, a simulation of a virtual robotic medical system corresponding to the robotic medical system.
  • the virtual robotic medical system can be simulated or illustrated in a virtual medical environment that can correspond to the medical environment.
  • a location of the virtual robotic medical system within the virtual medical environment can correspond to a location of the robotic medical system within the medical environment.
  • the locations of other objects of the medical environment can correspond to locations of virtual or simulated objects within the virtual medical environment.
  • the one or more processors can overlay, in an augmented reality (AR) mode of operation, an AR object indicative of an anatomy of a patient.
  • the AR object can be positioned according to the at least one or more locations of the one or more objects.
  • the one or more processors can modify, in the interaction field, at least one of the medical instrument or the AR object based at least on a movement of the medical instrument in the interaction field.
  • the method can include the one or more processors generating, in the interaction field, the extended reality content comprising an instruction to implement a desired adjustment to one of tilt, rotation or height of the patient side cart.
  • the one or more processors can generate, in the interaction field, the extended reality content indicative of an instruction to adjust the desired tilt, rotation or height of the patient side cart.
  • the one or more processors can generate, in the interaction field responsive to the identification of a context or an object, the extended reality content including a guideline to perform one or more tasks of the medical session.
  • the method can receive user selections.
  • the method can include the one or more processors receiving inputs from the XR content in the interaction field, responsive to selections by the user.
  • the one or more processors can display, for selection in the VR mode, a virtual control to move the virtual robotic medical system within the virtual medical environment.
  • the virtual control can correspond to a control to move the robotic medical system via a live-view mode of operation in the medical environment.
  • the data processing system can receive the signals corresponding to selections of the virtual control by the user, such as via hand gestures, touch screen selections or eye movements.
  • An aspect of the technical solutions is directed to a system that can be run or executed on a Head-Mounted Device (HMD) that can be configured to operate in an augmented reality (AR) mode or a virtual reality (VR) mode, which can be collectively referred to as a mixed reality (XR) mode.
  • the HMD can use information gathered from a surgical robot, including data from various sensors providing measurements from the medical environment of an operating room or from the HMD, to facilitate just-in-time guidance to the user (e.g., surgeon performing a medical operation) as well as transitions between live-view and XR (e.g., AR or VR) modes of operation.
  • the technical solutions can include pass-through optical devices and facilitate a live-view operation through one or more views of the operating environment via one or more video cameras.
  • the technical solutions can implement natural context switching, including providing a most suitable procedure or flow of tasks to perform during a medical session, and highlighting or displaying features or objects that are obstructed or difficult to see in a live-view mode of operation, including objects or features (e.g., anatomies or implants) inside of a patient's anatomy.
  • the technical solutions can identify, mark and facilitate maintaining a sterile field.
  • the technical solutions can include a sandbox environment in which the user (e.g., surgeon) can practice a maneuver or a task in a virtual environment prior to it being performed in a medical environment. Based on the performance indicators determined by one or more context-identification ML models, the user can determine to attempt the maneuver or the task on the patient.
  • the AR and VR modes of operation can be linked in terms of their content.
  • the VR simulation can include objects, features and positioning of medical instruments according to their state in the live-view mode or the AR mode of operation.
  • the VR and AR modes can identify sterile regions of an interaction field having objects that can be handled to maintain sterility as well as identify objects which cannot be handled.
  • an AR mode can be triggered to provide a display with an indication or a warning that a medical instrument is at a risk of contacting an object that is not sterile (e.g., outside of the sterile region).
  • technical solutions can facilitate training of one or more maneuvers or tasks for users using a simulated medical environment in a VR mode or using one or more AR objects in an AR mode of operation.
  • Users such as surgeons, first assistants to surgeons, second assistant to surgeons, anesthesiologists, nurses and others, can utilize training functions for implementing tasks or maneuvers for various phases or portions of medical procedures in AR or VR modes.
  • a user can provide inputs, such as inputs or data indicative of the type of medical session implemented.
  • the ML models of the DPS can identify the workflow or rate of tasks and maneuvers to be performed, allowing for the ML models to identify and monitor (e.g., using the sensor data) the type of procedure or task being performed.
  • the application of the HMD can utilize this data to identify one or more objects that can be used to orient the location of the HMD and other objects in the medical environment, such as medical instruments, RMS or patient side carts.
  • a simulator can implement a virtual medical environment in a VR mode, creating a digital twin of the medical environment.
  • the simulator can be used to provide user training in various training scenarios for various medical maneuvers or tasks in various simulated medical sessions.
  • Tasks performed or practiced in a VR mode simulator can be then translated into AR mode to guide the user using AR objects and step-by-step instructions to perform the tasks in the order in which they are simulated in the VR simulator.
  • the technical solutions can facilitate prevention of faults, such as accidental collisions.
  • the HMD application can determine that there is a danger of a collision. In response to such a determination, the HMD application can provide an indication in the AR mode to warn the user of the potential collision.
  • VR simulation of a task or a maneuver can be displayed on one or more monitors or displays, such as a 3D display, indicating to the user how to perform a particular task.
  • the technical solutions can include an AR mode that can provide digital information overlaid on top of real-world objects, showing the user which buttons to press, or what path to take to implement a task, such as a surgical maneuver, implementation of a setup of the RMS, or a resolution of a fault.
  • a VR mode can provide an interactable virtual simulation of the robotic medical system in the same position and state as the RMS in the medical environment. This virtual RMS can be placed within the virtual space where the user can use the virtual controls and move around the virtual RMS without reaching above or near the physical patient and without breaking sterility.
  • the system can monitor and keep track of locations of objects in the room.
  • the location of the sterile field can be marked in the system relative to these key objects.
  • the system can maintain the state of the location of the user relative to at least one of these key objects.
  • the limits of the sterile field can be represented as virtual walls, preventing the user from breaking sterility while in the VR mode.
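A minimal way to model the virtual walls of a sterile field is an axis-aligned volume with a boundary check, as sketched below. The box representation, coordinates and warning callback are assumptions made only for illustration; a fuller system would track the field relative to the key objects described above.

```python
def inside_sterile_field(position, field_min, field_max):
    """Check whether a tracked (x, y, z) position lies inside an axis-aligned
    sterile-field volume (an illustrative stand-in for the virtual walls)."""
    return all(lo <= p <= hi for p, lo, hi in zip(position, field_min, field_max))

def enforce_virtual_wall(user_position, field_min, field_max,
                         warn=lambda: print("warning: leaving sterile field")):
    """Warn (or, in a fuller system, constrain VR motion) when the tracked
    user position crosses the sterile-field boundary."""
    if not inside_sterile_field(user_position, field_min, field_max):
        warn()
        return False
    return True

# Example: a sterile field spanning 2 m x 1 m x 2 m around the patient table.
FIELD_MIN, FIELD_MAX = (0.0, 0.0, 0.0), (2.0, 1.0, 2.0)
enforce_virtual_wall((1.0, 0.5, 1.0), FIELD_MIN, FIELD_MAX)   # inside, no warning
enforce_virtual_wall((2.5, 0.5, 1.0), FIELD_MIN, FIELD_MAX)   # outside, warning
```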
  • the system can present an AR overlay guiding the user through the tasks (e.g., RMS setup or surgical tasks).
  • AR mode can include a context to activate an AR mode when one or more actions or events take place, such as when a patient side cart (PSC) has not been deployed or when it is in an incorrect position.
  • An AR overlay can show the path the PSC can travel to reach its desired destination (e.g., patient table). This can be shown as a line with arrows on the ground to indicate travel direction.
  • a box representing the final position of the PSC can be shown at the end of the line. The final position can be such that the PSC is the correct distance from the patient table after PSC deployment. This operation can be deactivated when the PSC is at this indicated final position.
  • the operation can be deactivated or a motion can be stopped when the context is changed between different modes (e.g., live view, AR or VR modes), such as when a collision is detected.
  • This path can be generated based on one or more factors, such as the current position and direction of the PSC, the current position and direction of the patient’s table, the turning radius of the PSC and the size and location of other obstacles in the room.
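A deliberately simplified version of that path computation is sketched below: it samples a straight line between the PSC and its target and rejects the path if any sample comes too close to an obstacle. Headings and the PSC turning radius, which the item above lists as inputs, are omitted to keep the example short; the clearance and step values are illustrative assumptions.

```python
import math

def straight_line_path(psc_position, target_position, obstacles,
                       clearance_m=0.5, step_m=0.25):
    """Very simplified path check for the AR travel overlay.

    Samples points along the straight line from the PSC to its target and
    returns None if any sample comes within `clearance_m` of an obstacle
    center; otherwise returns the waypoint list to be drawn on the floor.
    """
    dx = target_position[0] - psc_position[0]
    dy = target_position[1] - psc_position[1]
    length = math.hypot(dx, dy)
    steps = max(1, int(length / step_m))
    waypoints = [(psc_position[0] + dx * i / steps,
                  psc_position[1] + dy * i / steps) for i in range(steps + 1)]
    for wx, wy in waypoints:
        for ox, oy in obstacles:
            if math.hypot(wx - ox, wy - oy) < clearance_m:
                return None   # blocked; a real planner would search for a detour
    return waypoints

path = straight_line_path((0.0, 0.0), (4.0, 0.0), obstacles=[(2.0, 2.0)])
print("clear" if path else "blocked", len(path) if path else 0)
```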
  • AR overlay can show the final deployment position of the PSC for the current procedure and the current patient’s position. This can allow the user (e.g., surgeon) to observe the digital PSC from any angle by walking around the PSC. This can assist the user to determine, using the AR mode, if the PSC or another object will collide with another object during the deployment, if the PSC or another object will have enough space for staff to work around the patient and if the PSC position allows for a sufficient amount of space for table motion.
  • AR overlay can facilitate the setup guidance. For instance, the HMD application can determine that the PSC has not been docked and that the user is near the patient table. AR overlay can show a recommended first port placement position for the procedure, including, for example, the representation of the cannula in the body. The overlay can conform to the patient's body shape by using SLAM to identify the shape of the patient body. An overlay of a human anatomy can be overlaid over the patient's body. This anatomy overlay can be shown as transparent, or semi-transparent, around the user's hands to facilitate the user having a clear view of the patient where they are working.
  • a digital anatomical overlay can help with a first port placement without covering the user's working area. This can allow the user to make an informed decision on which cannula to use based on the digital representation of the cannula, before the first port placement. This can allow the user to visualize the placement of other ports based on the first port placement.
  • AR overlay can show a recommended first port placement, including the representation of the cannula in the body. The overlay can conform to the patient's body shape by using SLAM to identify the patient's shape.
  • a patient's anatomy can be overlaid over the patient's body. This anatomy overlay can become transparent or semi-transparent around the user's hands to ensure the user has a clear view of the patient where they are working.
  • the technical solutions can provide a digital anatomical overlay to help assist with first port placement without covering the user's working area. This can allow the user to make an informed decision on which cannula to use based on the digital representation of the cannula, before the first port placement. This can allow the user to visualize the placement of other ports based on the first port placement.
  • the technical solutions can provide guidance for resolving collisions using a VR mode.
  • This mode can be activated when the PSC encounters an interarm collision with installed instruments, and no other system faults are detected. Deactivation can occur once the interarm collision is resolved on the PSC, such that the arms won't self-collide within the current working area.
  • VR mode can be used by the user to create an environment for experimentation or practice in resolving collision scenarios. For example, a user can assess a potential arm collision by testing the range of motion of instruments.
  • the technical solutions can identify such collisions using events data.
  • the technical solutions can prompt the user to enter a VR mode. In the VR mode, the user can address the collision and test the instrument’s motion to reduce the likelihood of future collisions.
  • the system can transition to an AR mode.
  • the AR mode can provide step-by-step instructions on implementing the solution developed in the virtual environment to address the collision.
  • any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable,” to each other to achieve the desired functionality.
  • operably couplable include but are not limited to physically mateable or physically interacting components or wirelessly interactable or wirelessly interacting components or logically interacting or logically interactable components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Surgery (AREA)
  • Urology & Nephrology (AREA)
  • Manipulator (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Technical solutions provide extended reality based capabilities for just-in-time context-based assistance to robotic medical system users. A processor can receive a data stream of a medical session with a robotic medical system. The data stream can be captured by sensors in a medical environment. The processor can determine, based on the data stream and via a context-identification model trained with machine learning, a context of the medical session. The processor can determine, based on the context, an extended reality content to be presented to a user. The processor can detect, based on the data stream and via one or more object-identification models, one or more objects of the medical environment. The processor can establish, based on the objects detected, an interaction field in the medical environment in which to interact with the extended reality content and present the extended reality content to the user for interaction in the interaction field.

Description

EXTENDED REALITY CONTEXTUAL SWITCHING FOR ROBOTIC MEDICAL SYSTEMS
CROSS-REFERENCES TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority under 35 U.S.C. § 119 to U.S. Provisional Patent Application No. 63/567,186, filed March 19, 2024, which is hereby incorporated by reference herein in its entirety.
BACKGROUND
[0002] Medical procedures can be performed in a medical environment, such as an operating room. As the amount and variety of equipment in the operating room increases and medical procedures increase in complexity, it can be challenging to maintain the equipment operating efficiently, reliably, or without incident.
SUMMARY
[0003] The technical solutions of this disclosure provide extended reality (XR) based context switching capabilities for an efficient and reliable just-in-time context-based assistance to the users of robotic medical systems. Setting up robotic medical systems for surgeries can be challenging due to the specialized skills and knowledge used in configuring surgical robots and in understanding the intricacies of planned surgical operations. Achieving a consistent and reliable setup across diverse medical environments is further complicated, and addressing faults and emergencies in ongoing medical sessions can be difficult because operators may not have complete awareness of risks or issues affecting their performance. The technical solutions presented in this disclosure can overcome these, or other, challenges by introducing context-switching capabilities that seamlessly transition between live feed video data, virtual reality (VR), and augmented reality (AR) content to provide just-in-time alerts, instructions, and corrective actions for both the setup of robotic medical systems and support in ongoing medical sessions. By incorporating live-view video operations and inserting AR or VR objects into displayed content, the technical solutions facilitate XR content-based guidance to users in various stages of robotic surgeries.
[0004] At least one aspect of the technical solutions relates to a system. The system can include one or more processors coupled with memory. The one or more processors can be configured to receive a data stream of a medical session with a robotic medical system. The data stream can be captured by one or more sensors in a medical environment. The one or more processors can determine, based on the data stream and via one or more context-identification models trained with machine learning, a context of the medical session. The one or more processors can determine, based on the context determined, an extended reality content to be presented to a user. The one or more processors can detect, based on the data stream and via one or more object-identification models, one or more objects located in the medical environment. The one or more processors can establish, based on the one or more objects detected, an interaction field in the medical environment in which to interact with the extended reality content. The one or more processors can present the extended reality content to the user for interaction in the interaction field.
[0005] The one or more processors can generate, in an augmented reality (AR) mode of a head-mounted display (HMD) and using one or more locations of the one or more objects, a sterile space within a portion of the interaction field. The sterile space can be marked with one or more AR objects indicating one or more boundaries of the sterile space. The one or more processors can overlay, within the interaction field presented via the HMD, the one or more AR objects indicative of the one or more boundaries of the sterile space.
[0006] The one or more processors can generate, in a virtual reality (VR) mode, a simulation of a virtual robotic medical system corresponding to the robotic medical system. The virtual robotic medical system can be in a virtual medical environment corresponding to the medical environment. A location of the virtual robotic medical system within the virtual medical environment can correspond to a location of the robotic medical system within the medical environment. The one or more processors can display, for selection in the VR mode, a virtual control to move the virtual robotic medical system within the virtual medical environment. The virtual control can correspond to a control to move the robotic medical system via a live-view mode of operation in the medical environment.
[0007] The one or more processors can generate an augmented reality (AR) object to control a performance of a task of the medical session via the robotic medical system in an AR mode of operation. The one or more processors can receive an input to switch from one of a live-view mode of operation or a virtual reality (VR) mode of operation to the AR mode of operation. The one or more processors can switch, responsive to the input, from the one of the live-view mode or the VR mode to the AR mode to overlay the AR object over the live-view of at least a portion of the medical environment within the interaction field.
[0008] The one or more processors can generate a virtual reality (VR) object to control a performance of a simulation of a task of the medical session in a VR mode of operation using a virtual robotic medical system corresponding to the robotic medical system in a virtual medical environment corresponding to the medical environment. The one or more processors can receive an input to switch from one of a live-view mode of operation or an augmented reality (AR) mode of operation to the VR mode of operation. The one or more processors can switch, responsive to the input, from the one of the live-view mode or the AR mode to the VR mode to display the simulation in the VR mode.
[0009] The one or more processors can receive, from a head-mounted device (HMD) configured to operate in an augmented reality (AR) mode of operation and a virtual reality (VR) mode of operation, a signal corresponding to one of a hand gesture, an eye movement or a voice input. The one or more processors can select, via the HMD, responsive to the signal, a task to perform in at least one of the AR mode or the VR mode.
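A simple, assumed mapping from HMD input signals to task selections could look like the following sketch; the signal names, the task registry, and the voice-payload convention are illustrative only and are not taken from the disclosure.

    # Sketch: dispatch an HMD input signal (hand gesture, eye movement, or voice)
    # to a task selection in AR or VR mode. Signal and task names are illustrative.
    from typing import Optional

    TASK_REGISTRY = {
        "pinch": "select_instrument",
        "gaze_dwell": "open_menu",
        "voice:switch mode": "toggle_ar_vr_mode",
    }

    def select_task(signal_type: str, payload: Optional[str] = None) -> Optional[str]:
        """Return the task to perform for a given HMD signal, or None if unmapped."""
        key = f"voice:{payload}" if signal_type == "voice" else signal_type
        return TASK_REGISTRY.get(key)

    # Example: select_task("voice", "switch mode") -> "toggle_ar_vr_mode"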
[0010] The one or more processors can determine, using one or more depth sensors, a location of a medical instrument based at least on one or more locations of the one or more objects in the medical environment. The one or more processors can overlay, in an augmented reality (AR) mode of operation, an AR object indicative of an anatomy of a patient, the AR object positioned according to at least the one or more locations of the one or more objects. The one or more processors can modify, in the interaction field, at least one of the medical instrument or the AR object based at least on a movement of the medical instrument in the interaction field.
[0011] The one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session indicates at least one of a location of a patient side cart or a location of the robotic medical system positioned away from a desired location within the medical environment. The one or more processors can generate the extended reality content indicative of the desired location for the at least one of the patient side cart or the robotic medical system within the interaction field.
[0012] The one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session is indicative of a desired adjustment to one of tilt, rotation or height of a patient side cart in the medical environment. The one or more processors can generate, in the interaction field, the extended reality content comprising an instruction to implement the desired adjustment to the one of tilt, rotation or height of the patient side cart.
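The tilt, rotation and height guidance could, for example, be expressed as deltas between the current and desired cart pose, as in this sketch; the pose field names and the one-unit tolerances are assumptions for illustration.

    # Sketch: turn a desired patient-side-cart pose into step-by-step AR instructions.
    # Field names and the 1-degree / 1-cm tolerances are illustrative assumptions.
    def psc_adjustment_instructions(current, desired):
        """current/desired: dicts with 'tilt_deg', 'rotation_deg', 'height_cm'."""
        steps = []
        if abs(desired["tilt_deg"] - current["tilt_deg"]) > 1.0:
            steps.append(f"Adjust tilt by {desired['tilt_deg'] - current['tilt_deg']:+.1f} deg")
        if abs(desired["rotation_deg"] - current["rotation_deg"]) > 1.0:
            steps.append(f"Rotate cart by {desired['rotation_deg'] - current['rotation_deg']:+.1f} deg")
        if abs(desired["height_cm"] - current["height_cm"]) > 1.0:
            steps.append(f"Change height by {desired['height_cm'] - current['height_cm']:+.1f} cm")
        return steps  # each step can be rendered as AR text in the interaction field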
[0013] The one or more processors can identify, based at least on one or more portions of the data stream input into the one or more context-identification models, that the context of the medical session corresponds to a current mode of operation of a first one of a live-view mode of operation, an augmented reality (AR) mode of operation or a virtual reality (VR) mode of operation. The one or more processors can identify, based at least on one or more portions of the data stream input into the one or more context-identification models, an event in a second one of the live-view mode, the AR mode or the VR mode. The one or more processors can initiate, responsive to the current mode of operation and the event, a switch from the first one of the live-view mode, the AR mode or the VR mode to the second one of the live-view mode, the AR mode or the VR mode.
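One possible switching policy combining the current mode and a detected event is sketched below; the event labels and the policy itself are assumptions made for illustration and do not reflect an actual event vocabulary of the system.

    # Sketch: choose a target operating mode from the current mode and a detected event.
    # The event names and switching rules below are illustrative assumptions.
    LIVE, AR, VR = "live_view", "ar", "vr"

    def target_mode(current_mode: str, event: str) -> str:
        if event == "interarm_collision" and current_mode != VR:
            return VR          # rehearse a resolution in a virtual sandbox
        if event == "collision_resolved_in_vr":
            return AR          # apply the rehearsed steps with AR guidance
        if event == "instrument_failure":
            return AR          # overlay corrective instructions on the live view
        return current_mode    # no switch needed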
[0014] The event can include one of a failure of a medical instrument, a collision involving a medical instrument or a collision involving an anatomy of a patient. The one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session is indicative of one of a desired tilt, rotation or height adjustment of a patient side cart within the medical environment. The one or more processors can generate, in the interaction field, the extended reality content indicative of an instruction to adjust the one of the desired tilt, rotation or height adjustment of the patient side cart.
[0015] The one or more processors can identify, based on at least a portion of the data stream, that an account associated with the medical session corresponds to a profile of a user having a level of skill for the medical session that is below a threshold level of skill for the medical session. The one or more processors can generate, in the interaction field responsive to the identification, the extended reality content including a guideline to perform one or more tasks of the medical session.
[0016] The one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that a value of a performance indicator of an account associated with the medical session is below a threshold level for the performance indicator. The one or more processors can generate, in the interaction field responsive to the identification, the extended reality content including an instruction to perform one or more tasks of the medical session. The one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that a likelihood of a fault in the medical session exceeds a threshold for a likelihood of fault. The one or more processors can activate a clutch for the medical instrument, responsive to the identification.
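The threshold checks described above might be expressed as in the following sketch; the metric names, threshold values and the clutch interface are illustrative assumptions rather than the disclosed implementation.

    # Sketch: threshold checks on a performance indicator and a fault likelihood.
    # Threshold values and the clutch interface are illustrative assumptions.
    def review_session(performance_value, performance_threshold,
                       fault_likelihood, fault_threshold, clutch):
        actions = []
        if performance_value < performance_threshold:
            actions.append("show_task_instructions")   # XR guidance in the interaction field
        if fault_likelihood > fault_threshold:
            clutch.activate()                           # disengage the instrument
            actions.append("clutch_activated")
        return actions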
[0017] At least one aspect of the technical solutions relates to a method. The method can include receiving, by one or more processors coupled with memory, a data stream of a medical session with a robotic medical system captured by one or more sensors in a medical environment. The method can include determining, by the one or more processors, based on the data stream and via one or more context-identification models trained with machine learning, a context of the medical session and an extended reality content to be presented to a user. The method can include detecting, by the one or more processors, based on the data stream and via one or more object-identification models, one or more objects located in the medical environment. The method can include establishing, by the one or more processors, based on the one or more objects detected, an interaction field in the medical environment in which to interact with the extended reality content. The method can include presenting, by the one or more processors, the extended reality content to the user for interaction in the interaction field.
[0018] The method can include generating, by the one or more processors, an augmented reality (AR) object to control a performance of a task of the medical session via the robotic medical system in an AR mode of operation. The method can include receiving, by the one or more processors, an input to switch from one of a live-view mode of operation or a virtual reality (VR) mode of operation to the AR mode of operation. The method can include switching, by the one or more processors, responsive to the input, from the one of the live-view mode or the VR mode to the AR mode to overlay the AR object over the live-view of at least a portion of the medical environment within the interaction field.
[0019] The method can include generating, by the one or more processors, a virtual reality (VR) object to control a performance of a simulation of a task of the medical session in a VR mode of operation using a virtual robotic medical system that corresponds to the robotic medical system. The virtual robotic medical system can be in a virtual medical environment that corresponds to the medical environment. The method can include receiving, by the one or more processors, an input to switch from one of a live-view mode of operation or an augmented reality (AR) mode of operation to the VR mode of operation. The method can include switching, by the one or more processors, responsive to the input, from the one of the live-view mode or the AR mode to the VR mode to display the simulation in the VR mode.
[0020] The method can include determining, by the one or more processors, using one or more depth sensors, a location of a medical instrument based at least on one or more locations of the one or more objects in the medical environment. The method can include overlaying, by the one or more processors, in an augmented reality (AR) mode of operation, an AR object indicative of an anatomy of a patient, the AR object positioned according to at least the one or more locations of the one or more objects. The method can include modifying, by the one or more processors, in the interaction field, at least one of the medical instrument or the AR object based at least on a movement of the medical instrument in the interaction field.
[0021] An aspect of the technical solutions relates to a non-transitory computer-readable medium storing processor executable instructions. When executed by one or more processors, the instructions can cause the one or more processors to receive a data stream of a medical session with a robotic medical system. The data stream can be captured by one or more sensors in a medical environment. When executed by one or more processors, the instructions can cause the one or more processors to determine, based on the data stream and via one or more context-identification models trained with machine learning, a context of the medical session. When executed by one or more processors, the instructions can cause the one or more processors to determine, based on the context determined, an extended reality content to be presented to a user. When executed by one or more processors, the instructions can cause the one or more processors to detect, based on the data stream and via one or more object-identification models, one or more objects located in the medical environment. When executed by one or more processors, the instructions can cause the one or more processors to establish, based on the one or more objects detected, an interaction field in the medical environment in which to interact with the extended reality content. When executed by one or more processors, the instructions can cause the one or more processors to present the extended reality content to the user for interaction in the interaction field.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] The accompanying drawings are not intended to be drawn to scale. Like reference numbers and designations in the various drawings indicate like elements. For purposes of clarity, not every component can be labeled in every drawing. In the drawings:
[0023] FIG. 1 depicts an example system for an extended reality contextual switching in robotic systems.
[0024] FIG. 2 illustrates an example of a surgical system, in accordance with some aspects of the technical solutions.
[0025] FIG. 3 illustrates an example of an extended reality content used to instruct a user to rearrange the locations of one or more objects in a medical environment.
[0026] FIG. 4 depicts an example block diagram of an example computer system, in accordance with some embodiments.
[0027] FIG. 5 illustrates an example of a flow diagram of a method for providing an extended reality contextual switching for a robotic system.
DETAILED DESCRIPTION
[0028] Following below are more detailed descriptions of various concepts related to, and implementations of, systems, methods, apparatuses for extended reality contextual switching for robotic medical systems. The various concepts introduced above and discussed in greater detail below can be implemented in any of numerous ways.
[0029] Although the present disclosure is discussed in the context of a surgical procedure, in various aspects, the technical solutions of this disclosure can be applicable to other medical treatments, sessions, environments or activities, as well as non-medical activities where extended reality functionality is utilized in robotics. For instance, technical solutions can be applied in any environment, application or industry in which activities, operations, processes or acts are performed with tools or instruments that can be captured on video and for which ML modeling can be used to identify or recognize the tools or instruments utilized by the robotic system using various streams of data.
[0030] Setting up robotic medical systems for implementation of robotic surgeries can be difficult. Specialized skills and knowledge used to configure a surgical robot, along with intricate details of the surgical operations to be performed, can make it challenging to consistently and reliably set up and configure a robotic medical system in various medical environments involving different spatial and systemic arrangements. In addition, addressing faults, operational errors or emergencies during the course of a medical session can be difficult as an operator of the surgical medical system may not be aware of all the ongoing risks, issues or challenges affecting the performance. These challenges can create a risk for a systemic error, such as a collision of robotic components or a suboptimal performance of the system.
[0031] The technical solutions of the present disclosure can overcome these, or other, challenges by providing context switching capabilities between two or more of live feed video data, virtual reality (VR) and augmented reality (AR) content, allowing for real-time alerts, instructions and corrective actions for both robotic medical system setup and support during an ongoing medical session. The technical solutions can utilize live-view video operations for inserting AR or VR objects into displayed content, thereby providing extended reality (XR) content based alerts, instructions and suggestions to assist the operator in the medical session. This approach allows for natural context switching, offering guidance to the user (e.g., a surgeon) during a medical session. The system can highlight obscured features in live-view mode, present a suitable procedure flow, identify and maintain sterile fields, and even offer a virtual sandbox environment for practicing maneuvers before their execution in the real medical environment.
[0032] FIG. 1 depicts an example system 100 for an extended reality (XR) contextual switching in robotic systems, such as robotic medical systems used in robot-assisted surgeries. Example system 100 can include a combination of hardware and software for providing augmented reality (AR) content overlay over a live-view video stream as well as virtual reality (VR) objects and simulations of procedures implemented using robots, such as medical session procedures implemented using a robotic medical system.
[0033] Example system 100 can include a medical environment 102, including one or more RMSs 120 communicatively coupled with one or more head mounted devices (HMDs) 122 and one or more data processing systems (DPSs) 130 via one or more networks 101. Medical environment 102 can include one or more sensors 104, objects 106, data capture devices 110, medical instruments 112, visualization tools 114, displays 116 and robotic medical systems (RMS) 120. Medical environment 102 can include an interaction field 108 covering any portion of a medical environment 102 (e.g., any one or more of sensors 104, objects 106, data capture devices 110, medical instruments 112, visualization tools 114, displays 116 and any portion of RMS 120). RMS 120 can receive, generate, include or process one or more data streams 162, including kinematics data 172, sensor data 174 and events data 176.
[0034] Head-mounted device (HMD) 122 can include one or more displays 116 and sensors 104. HMD 122 can include one or more eye trackers 124, hand trackers 126 and voice controllers 128. Across the network 101, a DPS 130 can include one or more machine learning (ML) trainers 132 that can be used to generate one or more context ML models 134 and object ML models 136. DPS 130 can include one or more extended reality (XR) environment functions 140 that can include, generate or execute one or more AR objects 142, VR objects 144 and VR simulations 146. DPS 130 can include a data repository storing any number of data streams 162, including kinematics data 172, sensor data 174 and events data 176. DPS 130 can include one or more operating modes controllers 180 having, controlling or executing one or more live view modes 182, augmented reality (AR) modes 184, virtual reality (VR) modes 186 and interaction fields 108. Operating modes controller 180 can include one or more overlay functions 196 for context switching and taking action across AR, VR and live view modes. DPS 130 can include or execute one or more indicator functions 188 for performing alerts or indications and one or more performance analyzers 190 generating or providing performance metrics 192 and thresholds 194.
[0035] The HMD 122 can be integrated with a grounded virtual portal (GVP) to improve the precision and focus of the surgeon during robotic medical procedures. The GVP can include a grounded head-mounted display (HMD) 122 which can have a fixed reference frame (e.g., a fixed location) to which the HMD 122 is attached. Such HMD 122 attachment to a fixed frame of reference provides improved visual, audio (e.g., voice-based) and control referencing when the surgeon's head is placed into the stationary or fixed-location HMD 122. The fixed head position of the GVP-configured HMD 122 can also allow for more controlled operation across various operating modes, such as by allowing the surgeon to seamlessly switch between different operating modes while maintaining the same view (e.g., view angle) of the live view, AR or VR version of the operating scene. The HMD 122 configured as a GVP can incorporate force sensors on the boundaries of the grounded HMD 122, allowing the surgeon to control any virtual cameras by applying pressure to the mounted headset.
[0036] The HMD 122 with a GVP configuration can utilize grounded controllers 118 to provide improved control or inputs during surgical procedures. Grounded controllers 118 can include any controllers or handlers allowing a surgeon to control movements or operation of RMS 120 components, such as medical instruments, in any operating modes (e.g., live view, AR or VR modes). The grounded controllers 118 can include objects (e.g., handlers) that are physically grounded, attached or connected to a physical reference point. For instance, a grounded controller 118 can be grounded to the same reference point (e.g., component or part) as the grounded HMD 122. The grounded controllers 118 can be connected to the reference point via mounting brackets, one or more connected components, such as spring-loaded arms, maintaining Cartesian degrees of freedom, while allowing for physical tracking of the controller movements. The grounded controllers 118 can include buttons for input and force sensing resistors integrated into wearable flex circuits to sense grip force. This grip force can be replicated through the instrument, providing user-mediated haptics that allow the surgeon to feel whether manipulated anatomy components are hard or soft, tight or loose. The grounded controllers 118 of a GVP-configured HMD 122 can allow for seamless shifting of depictions of medical instruments between live view and AR and VR modes, allowing the medical instruments to remain in the same locations when the operating modes shift.
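As a rough sketch of the user-mediated haptics described above, grip force sensed at a grounded controller could drive instrument jaw closure while resistance measured at the instrument is returned as haptic intensity; the sensor ranges and the linear scaling are assumptions made only for illustration.

    # Sketch: grounded-controller grip force drives the instrument jaws, and the
    # resistance measured at the instrument is returned as haptic feedback.
    # Sensor ranges, scaling, and the normalization are illustrative assumptions.
    def grip_control_loop(grip_sensor_n: float, tissue_resistance_n: float,
                          max_grip_n: float = 20.0, max_resistance_n: float = 5.0):
        """Return (jaw_closure, haptic_intensity), each normalized to [0, 1]."""
        jaw_closure = min(max(grip_sensor_n / max_grip_n, 0.0), 1.0)
        haptic_intensity = min(max(tissue_resistance_n / max_resistance_n, 0.0), 1.0)
        return jaw_closure, haptic_intensity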
[0037] The HMD 122 with the GVP configuration can improve the system's ability to more seamlessly switch and operate across AR, VR, or live modes based on the context of the medical session. For instance, an operating modes controller 180 can utilize context ML models 134 to detect shifts or changes in context and adjust the level of AR/VR immersion accordingly, while leveraging the fixed location to allow for seamless continuation of movements across the modes. For example, a surgeon can start a movement in live view mode, such as making an incision, and then seamlessly transition to AR mode where the same live-view scene is replicated with overlaid AR objects providing additional guidance. The surgeon can then continue the movement with the controls and view repositioned at the same positions at which the surgeon left them, allowing for continuity and precision across the operation modes.
[0038] Robotic medical system 120, also referred to as an RMS 120, can be deployed in any medical environment 102. Medical environment 102 can include any space or facility for performing medical procedures, maneuvers or tasks in medical sessions, including for example a surgical facility, or an operating room. Medical environment 102 can include medical instruments 112 (e.g., surgical tools used for various surgical maneuvers or tasks) which the RMS 120 can facilitate or utilize for performing surgical patient procedures, whether invasive, non-invasive, or any in-patient or out-patient procedures. Robotic medical system 120 can be centralized or distributed across a plurality of components, computing devices or systems, such as computing devices 300 (e.g., used on servers, network devices or cloud computing products) to implement various functionalities of the RMS 120, including network communication or processing of data streams 162 across various devices over the network 101.
[0039] The medical environment 102 can include one or more data capture devices 110 (e.g., optical devices, such as cameras) as well as sensors 104 (e.g., detectors or sensing devices) for making measurements and capturing data streams 162. Data streams 162 can include any sensor data 174, such as images or videos of a surgery, measurements of distances (e.g., depth), temperature, stress (e.g., pressure or vibration), light, motion, humidity, velocity, acceleration, force or material (e.g., gas) concentration. Data streams 162 can include kinematics data 172 on any movement of medical instruments 112, users (e.g., medical staff), or any devices in the medical environment 102. Data streams 162 can include any events data 176, such as instances of a failure of a medical instrument 112, a collision involving a medical instrument or a collision involving an anatomy of a patient, as well as occurrences of installation, configuration or selection of any devices in the medical environment 102.
[0040] The medical environment 102 can include one or more visualization tools 114 to gather the captured data streams 162 and process the data for display to the user (e.g., a surgeon, a medical professional or an engineer or a technician configuring the RMS) via one or more (e.g., touchscreen) displays 116 or displays of an HMD 122. A display 116 can present data stream 162 (e.g., sensor data 174, kinematics data 172 or events data 176) during the course of a medical procedure (e.g., a surgery) performed via an RMS 120, which can be used to handle, manipulate, hold or utilize medical instruments 112 to perform surgical tasks at the surgical site. RMS 120 can include system configurations based on which RMS 120 can operate, and the functionality of which can impact the data flow of the data streams 162.
[0041] System 100 can include any number of sensors 104 dispersed throughout the medical environment 102. Sensors 104 can include electronic or electrical components, a combination of electronic and software components, mechanical or electromechanical components, or any combination thereof. Various sensors 104 can include devices, systems or components detecting, measuring and/or monitoring a variety of signals in a variety of applications, such as temperature, pressure, location, proximity, light, humidity, motion, acceleration, velocity, distance, depth, magnetic field, electric field, pressure or movement of gas, images (e.g., camera), sounds (e.g., microphone), force, touch, moisture, radiation, pH, or vital signs of a person. For instance, a sensor 104 can include a global positioning system sensor or transceiver for location, one or more image sensors for capturing images, one or more accelerometers, one or more gyroscopes, one or more magnetometers, or another suitable form of sensor that detects motion and/or location. Sensor 104 can include a camera or a device measuring a distance between the sensor 104 and an object 106 (e.g., a patient side cart, patient table or a bed, or any other device or component of the medical environment 102). Sensors 104 can include one or more gyroscopes that can detect rotational movement (e.g., pitch, yaw, roll) and one or more accelerometers that can measure translational movement (e.g., forward/back, up/down, left/right). Sensor 104 can detect, determine, measure or quantify a motion, such as a motion of a physical object or a user gesture, such as a hand gesture, the contour of the hand, a movement of an eye (e.g., location of an iris), a user interaction, a voice command or any other user action.
[0042] Medical environment 102 can include various types of objects 106. An object 106 can include any device, system, component or a thing that can be positioned, deployed, located, provided or used in a medical environment 102. Object 106 can include a patient table, a medical instrument 112 or a data capture device 110. Object 106 can include a display 116 or a visualization tool 114. Object 106 can include a wall of an operating room, a picture, an artifact or a feature on a wall, or a corner of a wall. Object 106 can include an RMS 120, or any component of an RMS 120. Object 106 can include a patient side cart (PSC), such as a cart or a robotic arm system (e.g., an RMS 120) that can be positioned next to a patient during a patient surgery.
[0043] Data processing system 130 can include any combination of hardware and software for providing extended reality content or contextual switching between AR, VR and live-view video operating modes in a system utilizing an RMS 120. DPS 130 can include any computing device (e.g., computing device 300) and can include one or more servers, virtual machines or can be part of, or include, a cloud computing environment. The data processing system 130 can be provided via a centralized computing device (e.g., 300), or can be provided via distributed computing components, such as including multiple, logically grouped servers and facilitating distributed computing techniques. The logical group of servers may be referred to as a data center, server farm or a machine farm. The servers, which can include virtual machines, can also be geographically distributed or dispersed. A data center or machine farm may be administered as a single entity, or the machine farm can include a plurality of machine farms. The servers within each machine farm can be heterogeneous - one or more of the servers or machines can operate according to one or more types of operating system platform.
[0044] The data processing system 130, or components thereof can include a physical or virtual computer system operatively coupled, or associated with, the medical environment 102. In some embodiments, the data processing system 130, or components thereof can be coupled, or associated with, the medical environment 102 via a network 101, either directly or indirectly through an intermediate computing device or system. The network 101 can be any type or form of network. The geographical scope of the network can vary widely and can include a body area network (BAN), a personal area network (PAN), a local-area network (LAN) (e.g., Intranet), a metropolitan area network (MAN), a wide area network (WAN), or the Internet. Network 101 can include wireless communication links between an HMD 122 and any combination of devices or components of a medical environment 102 or any portion of a DPS 130. The topology of the network 101 can assume any form such as point-to-point, bus, star, ring, mesh, tree, etc. The network 101 can utilize different techniques and layers or stacks of protocols, including, for example, the Ethernet protocol, the internet protocol suite (TCP/IP), the ATM (Asynchronous Transfer Mode) technique, the SONET (Synchronous Optical Networking) protocol, the SDH (Synchronous Digital Hierarchy) protocol, etc. The TCP/IP internet protocol suite can include application layer, transport layer, internet layer (including, e.g., IPv6), or the link layer. The network 101 can be a type of a broadcast network, a telecommunications network, a data communication network, a computer network, Wireless Fidelity (Wi-Fi) network, a Bluetooth network, a cellular network (e.g., 4G, 5G or 6G network), or other types of wired or wireless networks.
[0045] The data processing system 130, or components thereof, can be located at least partially at the location of the surgical facility associated with the medical environment 102 or remotely therefrom. Elements of the data processing system 130, or components thereof can be accessible via portable devices such as laptops, mobile devices, wearable smart devices, etc. The data processing system 130, or components thereof, can include other or additional elements that can be considered desirable to have in performing the functions described herein. The data processing system 130, or components thereof, can include, or be associated with, one or more components or functionality of a computing device including, for example, one or more processors coupled with memory that can store instructions, data or commands for implementing the functionalities of the DPS 130 discussed herein.
[0046] Data repository 160 of the DPS 130 can include one or more data streams 162. Data repository 160 can include or be implemented in a storage device 325. Data streams 162 can include any series of data packets of a particular type or a form, which can be generated by a particular device (e.g., a sensor, a camera, a robotic device, a detector or an event detection function). For instance, data stream 162 can include a stream of data packets of sensor data 174, including one or more video frames or video images of a video camera or an endoscopic device. Data stream 162 can include measurements from sensors 104, including timestamped data corresponding to force, torque or biometric data, haptic feedback data, endoscopic images or data, ultrasound images or videos and any other sensor data. Data stream 162 can include a stream of kinematics data 172, including any data indicative of temporal positional coordinates of a device, or indicative of movement of a medical instrument 112 on an RMS 120. Data stream 162 can include a stream of data packets corresponding to events, such as events indicative of, or corresponding to, errors in operation or faults, collisions of a medical instrument 112, occurrences of installation, uninstallation, engagement or disengagement, setting or unsetting of any medical instrument 112 on an RMS 120.
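One possible packet layout for the kinematics, sensor and event streams described above is sketched below; the field names and types are assumptions, since the disclosure does not fix a data format.

    # Sketch: one possible packet layout for the three stream types described above.
    # Field names are illustrative; the disclosure does not specify a wire format.
    from dataclasses import dataclass, field
    from typing import Any, Dict

    @dataclass
    class StreamPacket:
        stream_type: str               # "kinematics" | "sensor" | "event"
        timestamp: float               # seconds since session start
        source: str                    # e.g., "manipulator_arm_2", "endoscope"
        payload: Dict[str, Any] = field(default_factory=dict)

    # Examples:
    # StreamPacket("kinematics", 12.4, "manipulator_arm_2", {"joints": [0.1, 0.7, -0.3]})
    # StreamPacket("event", 13.0, "rms", {"type": "instrument_installed", "slot": 3})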
[0047] The system 100 can include one or more data capture devices 110 (e.g., video cameras, sensors or detectors) for collecting any data stream 162, that can be used for machine learning, including detection of objects 106 from sensor data 174 (e.g., video frames or force or feedback data), detection of particular events (e.g., user interface selection of, or a surgeon’s engaging of, a medical instrument 112) or detection of kinematics (e.g., movements of the medical instrument 112). Data capture devices 110 can include cameras or other image capture devices for capturing videos or images from a particular viewpoint within the medical environment 102. The data capture devices 110 can be positioned, mounted, or otherwise located to capture content from any viewpoint that facilitates the data processing system capturing various surgical tasks or actions. Data capture devices 110 can be used to detect or recognize user gestures or actions, and determine distances to, from and between one or more objects 106 of a medical environment 102.
[0048] Data capture devices 110 can include any of a variety of sensors 104, such as detectors, cameras, video imaging devices, infrared imaging devices, visible light imaging devices, intensity imaging devices, including black, color and grayscale imaging devices, depth imaging devices (e.g., stereoscopic imaging devices, time-of-flight imaging devices, etc.), medical imaging devices such as endoscopic imaging devices, ultrasound imaging devices, etc., non-visible light imaging devices, any combination or sub-combination of the above mentioned imaging devices, or any other type of imaging devices that can be suitable for the purposes described herein. Data capture devices 110 can include cameras that a surgeon can use to perform a surgery and observe manipulation components within a field of view suitable for the given task performance. Data capture devices 110 can output any type of data streams 162, including data streams 162 of kinematics data 172 (e.g., kinematics data stream), data streams 162 of events data 176 (e.g., events data stream) and data streams 162 of sensor data 174 (e.g., sensors data stream).
[0049] For instance, data capture devices 110 can capture, detect, or acquire sensor data 174, such as videos or images, including for example, still images, video images, vector images, bitmap images, other types of images, or combinations thereof. The data capture devices 110 can capture the images at any suitable predetermined capture rate or frequency. Settings, such as zoom settings or resolution, of each of the data capture devices 110 can vary as desired to capture suitable images from any viewpoint. For instance, data capture devices 110 can have fixed viewpoints, locations, positions, or orientations. The data capture devices 110 can be portable, or otherwise configured to change orientation or telescope in various directions. The data capture devices 110 can be part of a multi-sensor architecture including multiple sensors, with each sensor being configured to detect, measure, or otherwise capture a particular parameter (e.g., sound, images, or pressure).
[0050] Data capture devices 110 can generate sensor data 174 from any type and form of a sensor, such as a positioning sensor, a biometric sensor, a velocity sensor, an acceleration sensor, a vibration sensor, a motion sensor, a pressure sensor, a light sensor, a distance sensor, a current sensor, a focus sensor, a temperature or pressure sensor or any other type and form of sensor used for providing data on medical instruments or tools 112, or data capture devices (e.g., optical devices). For example, a data capture device 110 can include a location sensor, a distance sensor or a positioning sensor providing coordinate locations of a medical tool 112 (e.g., kinematics data 172). Data capture device 110 can include a sensor providing information or data on a location, position or spatial orientation of an object (e.g., medical tool 112 or a lens of data capture device 110) with respect to a reference point for kinematics data 172. The reference point can include any fixed, defined location used as the starting point for measuring distances and positions in a specific direction, serving as the origin from which all other points or locations can be determined.
[0051] Display 116 can show, illustrate or play data stream 162, such as a video stream, in which medical tools 112 at or near surgical sites are shown. For example, display 116 can display a rectangular image of a surgical site along with at least a portion of medical tools 112 (e.g., instruments) being used to perform surgical tasks. Display 116 can provide compiled or composite images generated by the visualization tool 114 from a plurality of data capture devices 110 to provide a visual feedback from one or more points of view. Display 116 can include a display used in an HMD 122, for delivering visual content in live view mode 182, AR mode 184 or VR mode 186. Positioned close to the user's eyes, display 116 of an HMD 122 can include or utilize an OLED or LCD panel to provide high-resolution visuals with low latency. The display's refresh rate can be 60 Hz, 90 Hz, 120 Hz or higher, and the display can provide a wide field of view, such as covering an entire field of view of a user wearing the HMD 122.
[0052] The visualization tool 114 can be configured or designed to receive any number of different data streams 162 from any number of data capture devices 110 and combine them into a single data stream displayed on a display 116. The visualization tool 114 can be configured to receive a plurality of data stream components and combine the plurality of data stream components into a single data stream 162. For instance, the visualization tool 114 can receive visual sensor data from one or more medical tools 112, sensors or cameras with respect to a surgical site or an area in which a surgery is performed. The visualization tool 114 can incorporate, combine or utilize multiple types of data (e.g., positioning data of a medical tool 112 along with sensor readings of pressure, temperature, vibration or any other data) to generate an output to present on a display 116. Visualization tool 114 can present locations of medical tools 112 along with locations of any reference points or surgical sites, including locations of anatomical parts of the patient (e.g., organs, glands or bones).
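At its simplest, combining several capture-device streams into a single displayed stream could be a timestamp-ordered merge, as in the following sketch (assuming packets carry a timestamp attribute as in the packet sketch above; this is an illustration, not the disclosed combining method).

    # Sketch: merge several capture-device streams into one time-ordered composite
    # stream for display. Packet shape follows the StreamPacket sketch above.
    import heapq

    def merge_streams(*streams):
        """Yield packets from all input iterables in timestamp order."""
        yield from heapq.merge(*streams, key=lambda packet: packet.timestamp)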
[0053] Medical tools 112 can be any type and form of tool or instrument used for surgery or medical procedures, or any tool used in an operating room or environment. Medical tool 112 can be imaged by, associated with or include an image capture device. For instance, a medical tool 112 can be a tool for making incisions, a tool for suturing a wound, an endoscope for visualizing organs or tissues, an imaging device, a needle and a thread for stitching a wound, a surgical scalpel, forceps, scissors, retractors, graspers, or any other tool or instrument to be used during a surgery. Medical tools 112 can include hemostats, trocars, surgical drills, suction devices or any instruments for use during a surgery. The medical tool 112 can include other or additional types of therapeutic or diagnostic medical imaging implements. The medical tool 112 can be configured to be installed in, coupled with, or manipulated by an RMS 120, such as by manipulator arms or other components for holding, using and manipulating the medical instruments or tools 112.
[0054] RMS 120 can be a computer-assisted system configured to perform a surgical or medical procedure or activity on a patient via or using or with the assistance of one or more robotic components or medical tools 112. RMS 120 can include any number of manipulator arms for grasping, holding or manipulating various medical tools 112 and performing computer-assisted medical tasks using medical tools 112 controlled by the manipulator arms.
[0055] Data streams 162 can be generated by the RMS 120. For instance, sensor data 174 can include images (e.g., video images) captured by a medical tool 112 that can be sent to the visualization tool 114. For instance, a touchscreen display 116 can be used by a surgeon to select, engage or configure a particular medical instrument 112, thereby triggering an event that can be indicated or included in data packets of a data stream 162 of events data 176. RMS 120 can include one or more input ports to receive direct or indirect connection of one or more auxiliary devices. For example, the visualization tool 114 can be connected to the RMS 120 to receive the images from the medical tool when the medical tool is installed in the RMS 120 (e.g., on a manipulator arm for handling medical instruments 112). For example, data stream 162 can include data indicative of positioning and movement of medical instruments 112 that can be captured or identified by data packets of kinematics data 172. The visualization tool 114 can combine the data stream components from the data capture devices 110 and the medical tool 112 into a single combined data stream 162 including multiple streams for kinematics data 172, sensor data 174 and events data 176, any of which can be indicated or presented on a display 116.
[0056] Interaction field 108 can include any portion of a medical environment 102 that can be used for overlaying extended reality (XR) content, such as AR objects 142 or VR objects 144. Interaction field 108 can include any portion of a medical environment 102, such as a portion of an operating room within which a sterile region is maintained during the medical session. Interaction field 108 can include any portion of an RMS 120, including medical instruments, displays 116, any objects 106 or sensors 104, any data capture devices 110 or visualization tools 114. Interaction field 108 can include a region or an area around a patient table, including a patient side cart (PSC) or any other device or a system in a medical environment 102.
[0057] Interaction field 108 can include a space on a display 116 corresponding to a view of a portion of the medical environment 102 in which the user can interact with extended reality content, such as based on detected objects 106. Interaction field 108 can include any area or volume in which user gestures, actions or selections can be used or detected to facilitate interaction with overlaid or displayed AR objects 142 or VR objects 144. Interaction field 108 can include, or correspond to, a portion of the medical environment 102 in which AR objects 142 or VR objects 144 can be overlaid or displayed. Interaction field 108 can be displayed or shown on a display 116 in the medical environment 102, facilitating user interaction via a touch screen. Interaction field 108 can be presented on a display 116 of an HMD 122, facilitating user interaction through eye movements or eye tracking. Interaction field 108 can be provided via a video feed corresponding to an area of a displayed view in which a user's hand gestures can be used to select or unselect AR objects 142 or VR objects 144 to control operations or actions.
Interaction field 108 can include an area or a space within which VR simulations 146 can be implemented. For instance, the system can establish, based on the one or more detected AR objects 142 or VR objects 144, an interaction field in the medical environment to allow the operator to interact with the XR content. For instance, the XR environment functions 140 can present the XR content to the user for interaction in the interaction field 108 in an AR mode 184 or VR mode 186.
[0058] Interaction field 108 can be used or monitored by operating modes controller 180 or the overlay function 196 to identify actions to be taken based on developments across different modes (e.g., 182, 184 and 186). For instance, interaction field 108 can include movement of devices or systems in a live view mode 182 and, based on AR objects 142 of an AR mode 184, a collision between devices (e.g., arms of an RMS 120) can be detected, allowing the operating modes controller 180 to take action and suspend the motion of the device to prevent the collision within the interaction field 108.
Head-Mounted Display (HMD) 122 can include any combination of hardware and software that can be worn on a user's head to provide a user with virtual, augmented, live-view or mixed reality experiences. HMD 122 can include a device, which can include a pair of glasses or a helmet, which can be worn over the user's eyes. HMD 122 can include one or more displays 116 that can be positioned close to the eyes, providing the user with a visual interface that replaces or augments the user's perception of the surrounding environment.
[0059] HMD 122 can include any number of sensors 104 to track head movements or eye trackers 124, to facilitate the virtual or augmented content to respond dynamically to the user's perspective as well as receive user inputs or selections. HMD 122 can be utilized by users, such as medical professionals, to participate in medical procedures remotely. HMD 122 can be worn on a user’s head or can be fixed to a specific location, providing stability, and display extended reality (XR) that includes augmented, virtual, live-view or mixed realities, depicting various features of the medical environment 102. HMD 122 can include sensors 104 providing information about the user's location, orientation, and gaze direction, allowing the HMD 122 to generate corresponding views and images.
[0060] HMD 122 can include sensors 104, a communication interface, and a visualization tool 114 to detect its location, orientation, and the user's gaze direction. Using this information, the HMD 122 can render images representing the current state of the medical environment 102 and integrate real-time data from different capture devices 110. HMD 122 can include the functionality to deliver extended or mixed reality content, including by rendering images, videos, or audio based on user interactions. HMD 122 can be included in a medical environment 102 and can be considered an object 106 to be simulated, rendered or displayed as an AR object 142, VR object 144 or a VR simulation 146.
[0061] Eye tracker 124 can include any combination of hardware and software for measuring, detecting, monitoring and analyzing a user's gaze, including eye movements and locations. Eye tracker 124 can include the functionality to use the user's gaze (e.g., movements and locations of the eyes) to determine various user commands or actions, such as selections of features displayed. Eye tracker 124 can utilize sensor data (e.g., camera images) or infrared sensors to track the user's eye movements in real-time, or gestures indicated by eye movements or movement of any other part of the user's body, allowing for accurate detection of where the user is looking with respect to the features displayed in the extended reality on the display 116. Eye tracker 124 can interpret the user's visual focus and trigger actions such as selecting options, navigating through menus, activating controls, or even initiating specific commands without the need for physical input devices.
[0062] Hand tracker 126 can include any combination of hardware and software that uses data from various sensors 104 to monitor and analyze the movements of any part of a user's body, such as the user's hands, arms, head or any other portion of the user's body. Hand tracker 126 can include or utilize computer vision techniques (e.g., object ML model 136) to use images or videos of the user's hands to detect or discern various gestures that the user makes. Hand tracker 126 can track movements of the user's hands, arms, head, torso, legs or any other portion of the body of the user. Body or body part movements can be tracked individually (e.g., movements of hands, arms, eyes) or collectively (e.g., a plurality of body parts moving together). Hand tracker 126 can operate in real time and allow the user to seamlessly interact with DPS 130. For instance, the user can use the hand tracker 126 (e.g., via sensors 104) to control medical tools 112 in the medical environment 102. Hand tracker 126 can detect gestures, such as finger movements for scrolling or selecting items, hand waving for navigation or moving of objects, pinching or expanding fingers for zooming in or out, and making specific hand movements or shapes to trigger specific commands or actions.
[0063] Voice controller 128 can include any combination of hardware and software that uses voice for controlling devices. Voice controller 128 can include a microphone to detect a user's verbal command or an instruction. Voice controller 128 can detect statements, such as code words, to trigger a command for the user to control RMS 120 via HMD 122. For example, a voice controller 128 can enable the user to provide a verbal instruction to an HMD 122 to implement an action, such as a switch between different modes of operation (e.g., 182, 184 or 186).
[0064] The data repository 160 can include one or more data files, data structures, arrays, values, or other information that facilitates operation of the data processing system 130. The data repository 160 can include one or more local or distributed databases and can include a database management system. The data repository 160 can include, maintain, or manage one or more data streams 162. The data stream 162 can include or be formed from one or more of a video stream, image stream, stream of sensor measurements, event stream, or kinematics stream. The data stream 162 can include data collected by one or more data capture devices 110, such as a set of 3D sensors from a variety of angles or vantage points with respect to the procedure activity (e.g., point or area of surgery).
[0065] Data stream 162 can include any stream of data. Data stream 162 can include a video stream, including a series of video frames organized into video fragments, such as video fragments of about 1, 2, 3, 4, 5, 10 or 15 seconds of a video. The video can include, for example, 30, 45, 60, 90 or 120 video frames per second. Video fragments can be used to form a composite video to be used for determining performance metrics 192. Data streams 162 can include an event stream which can include a stream of event data 176 or information, such as packets, which identify or convey a state of the robotic medical system 120 or an event that occurred in association with the robotic medical system 120. For example, data stream 162 can include any portion of system configuration, including information on operations on data streams 162, data on installation, uninstallation, calibration, set up, attachment, detachment or any other action performed by or on an RMS 120 with respect to medical instruments 112.
[0066] Data stream 162 can include data about an event, such as a state of the robotic medical system 120 indicating whether the medical tool or instrument 112 is calibrated, adjusted or includes a manipulator arm installed on a robotic medical system 120. Stream of event data 176 (e.g., event data stream) can include data on whether a robotic medical system 120 was fully functional (e.g., without errors) during the procedure. For example, when a medical instrument 112 is installed on a manipulator arm of the robotic medical system 120, a signal or data packet(s) can be generated indicating that the medical instrument 112 has been installed on the manipulator arm of the robotic medical system 120.
[0067] Data stream 162 can include a stream of kinematics data 172 which can refer to or include data associated with one or more of the manipulator arms or medical tools 112 (e.g., instruments) attached to the manipulator arms, such as arm locations or positioning. Data corresponding to medical tools 112 can be captured or detected by one or more displacement transducers, orientational sensors, positional sensors, or other types of sensors and devices to measure parameters or generate kinematics information. The kinematics data 172 can include sensor data along with time stamps and an indication of the medical tool 112 or type of medical tool 112 associated with the data stream 162.
[0068] Data repository 160 can store sensor data 174 having video frames that can include one or more static images or frames extracted from a sequence of images of a video file. A video frame can represent a specific moment in time and can be identified by metadata including a timestamp. A video frame can display the visual content of the video of a medical procedure being analyzed by performance analyzer 190 to determine performance metrics 192 indicative of the performance of the surgeon performing the procedure. For example, in a video file capturing a robotic surgical procedure, a video frame can depict a snapshot of the surgical task, illustrating a movement or usage of a medical instrument 112 such as a robotic arm manipulating a surgical tool within the patient's body.
[0069] Extended reality (XR) environment functions 140 can include any combination of hardware and software for providing extended reality content. XR environment functions 140 can include any functions, computer code, or combination of circuits and instructions to generate, implement, or display AR objects 142 or VR objects 144 via one or more displays 116, such as an OLED display 116 on an HMD 122. XR environment functions 140 can include, provide or facilitate VR simulations 146, such as simulations of a medical environment (e.g., a 3D virtual rendering of a medical environment 102 with all of its components (e.g., 104-116)). XR environment functions 140 can generate any AR objects 142, VR objects 144 or VR simulations 146, responsive to determinations of context by the context identifying ML models 134 and object identifying ML models 136.
[0070] XR content can include any combination of AR or VR content. XR content, such as the AR objects 142 and VR objects 144, can include, provide or indicate any content generated and superimposed on a display image or video. AR objects 142 and VR objects 144 can include illustrated, opaque, transparent or semi-transparent features, alerts, indications, texts, images, illustrations or notifications concerning medical procedure tasks, phases or sessions, devices or systems, or patient anatomies or information. AR objects 142 and VR objects 144 can represent, include, provide or indicate augmented reality illustrations, such as illustrations or representations of an organ of a patient, or the location of a device, such as a medical instrument 112, within a patient's body.

[0071] AR objects 142 can include any augmented reality content. AR objects 142 can include objects provided (e.g., illustrated, displayed or rendered) in an augmented reality mode 184. AR objects 142 can include or correspond to any representation of medical instruments 112, alerts, indications or instructions for specific actions, guidance or instructions for users (e.g., surgeons) to undertake particular actions or tasks, and directives for surgical maneuvers. AR objects 142 can facilitate resetting or reconfiguring devices, such as RMS 120 or other objects 106, and can highlight interaction fields 108 within a medical environment 102. AR objects 142 can include illustrated or superimposed real-time representations of medical tools or instruments 112, data capture devices 110, sensors 104, displays 116, or any other RMS 120 components, such as manipulation arms for handling medical instruments 112. AR objects 142 can aid users in visualization and decision-making during medical procedures.
[0072] VR objects 144 can include any virtual reality content, including any illustration, depiction, sketch, representation or visualization of a feature, object, patient or system in a virtual reality mode 186. VR objects 144 can include features representing a simulated version of a medical environment, a robotic medical system or any component within a medical environment (e.g., 104-120). A VR object 144 can include a virtual medical environment with a 3D illustration of an RMS, virtual objects within an operating room, or simulations of specific maneuvers or tasks, providing users with an immersive and realistic illustration, animation or rendering of a medical task or process that a user can utilize for training, practicing or planning a particular task or execution.
[0073] VR simulations 146 can include any combination of hardware and software for simulating any portion of a medical environment 102. VR simulation 146 can include a virtual representation of an RMS 120, objects 106, displays 116, medical instruments 112, sensors 104, data capture devices 110, visualization tools 114 or HMDs 122. VR simulation 146 can include the creation and representation of VR objects 144 in a computer-generated environment. In the context of medical simulations, these VR objects 144 can include a virtualized RMS or a virtualized patient along with any anatomies. VR simulation 146 can further incorporate virtualized movements of manipulator arms of the RMS holding virtual medical instruments 112, allowing users to interact with and manipulate these elements in a virtual space. VR simulations 146 can provide simulated medical procedures, maneuvers or tasks, training scenarios, or the recreation of specific surgical environments, facilitating the user's practice in a risk-free virtual environment.

[0074] Operating modes controller 180 can include any combination of hardware and software for controlling the modes of operation. Operating modes controller 180 can include the functionality to switch between a live view mode 182, AR mode 184 and VR mode 186. Operating modes controller 180 can facilitate the provision, management, and transition between any of the live-view mode 182, the AR mode 184, and the VR mode 186, such as in response to context determinations made by context ML models 134 assessing the context of a medical session. For instance, operating modes controller 180 can trigger the overlay of an AR object 142 when operating in AR mode 184, or display a VR object 144 during a VR simulation 146 implemented in a VR mode 186.
[0075] For instance, the operating modes controller 180 can identify, based on at least a portion of the data stream input into the one or more context-identification models (e.g., context ML models 134), that the context of the medical session corresponds to a current mode of operation of a first one of a live-view mode 182 of operation, an AR mode 184 of operation or a VR mode 186 of operation. The operating modes controller 180 can identify or detect, based at least on one or more portions of the data stream input into the one or more context ML models 134, an event in a second one (e.g., a different mode from the one in which the user operates) of the live-view mode 182, the AR mode 184 or the VR mode 186. The operating modes controller 180 can initiate, responsive to the current mode of operation and the event, a switch from the first one of the live-view mode 182, the AR mode 184 or the VR mode 186 to the second one of the live-view mode 182, the AR mode 184 or the VR mode 186.
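A minimal sketch of the cross-mode switching decision described above is shown below; the class and method names are hypothetical, and the mode identifiers simply reuse the reference numerals 182, 184 and 186 from this description.

```python
LIVE_VIEW_MODE, AR_MODE, VR_MODE = 182, 184, 186

class OperatingModesController:
    """Illustrative sketch of cross-mode switching driven by detected events."""

    def __init__(self, current_mode=LIVE_VIEW_MODE):
        self.current_mode = current_mode

    def on_event(self, event_mode: int, event_type: str) -> None:
        # If the context models report an event associated with a mode other
        # than the one the user is currently operating in, initiate a switch.
        if event_mode != self.current_mode:
            self.switch_to(event_mode, reason=event_type)

    def switch_to(self, target_mode: int, reason: str) -> None:
        previous = self.current_mode
        self.current_mode = target_mode
        print(f"Switching {previous} -> {target_mode} (reason: {reason})")
```

For example, calling `on_event(VR_MODE, "unfamiliar maneuver")` while the controller is in the live-view mode would initiate a switch into the VR mode in this sketch.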
[0076] The operating modes controller 180 can utilize context ML models 134 to detect shifts or changes in the context of the medical procedure. For instance, the operating modes controller 180 can determine, based on the data stream and via one or more context ML models 134 trained with machine learning to detect various medical session contexts, a particular context of the medical session. The contexts can include recognized or detected phases of a surgery, or tasks or maneuvers implemented during a medical procedure, such as an incision, a dissection, an act of imaging or an act of suturing. The context ML models 134 can identify specific events, such as a collision involving a medical instrument or a failure of a medical instrument, which may trigger a change in the context and lead to a shift between the operating modes. The ML models 134 can assess the skill level of the user and provide real-time guidance or alerts to maintain the safety and efficacy of the medical procedure.
[0077] The operating modes controller 180 can adjust the controls based on the detected context or changes in the context in the medical session. For instance, when the context ML models 134 identify a shift from a live-view mode 182 to an AR mode 184, the operating modes controller 180 can activate AR-specific controls, such as overlaying AR objects 142 to guide the surgeon's actions. For instance, if the operating modes controller 180 detects that a context has shifted to a VR mode 186, the operating modes controller 180 can provide, generate or produce VR-specific controls, such as simulating the movement of medical instruments in a virtual environment. For example, the operating modes controller 180 can modify the interaction field 108 to accommodate the new operating mode, provide a particular or expected AR object 142 or a VR object 144 and allow for the controls to remain intuitive and responsive. For instance, in AR mode 184, the controls may include real-time alerts and instructions, while in VR mode 186, the controls may focus on providing a simulated environment for practice and training.
[0078] Live view mode 182 can include any mode of operation in which a live or a real-time video feed is provided, absent any AR objects 142 or VR objects 144 inserted into the display or view. Live view mode 182 can allow the user to observe the state of the medical environment 102 and the medical session in real time and without any XR content inserted into the displayed content.
[0079] Augmented reality (AR) mode 184 can include any mode of operation in which one or more AR objects 142 are overlaid or inserted into the video stream (e.g., real-time video) of the medical session or a procedure. AR mode 184 can include a mode in which an AR object 142 includes an instruction or a warning overlaid over a portion of a video stream, providing instructions or guiding the user toward a particular way of performing a task or a maneuver. An AR object 142 can include a representation of a medical instrument 112 inserted into an anatomy of a patient, facilitating the surgeon's ability to visualize the location of the instrument with respect to the patient's body.
[0080] Virtual reality (VR) mode 186 can include any mode of operation in which one or more VR objects 144 or one or more simulations 146 are provided. VR mode 186 can allow the user to simulate a task or a phase of a medical procedure or a session. VR mode 186 can include VR objects 144 representing the virtual medical environment along with virtual representations of all of the components of the medical environment 102, including any one or more of 104-122, in any arrangement, physical position, movement or action.
[0081] Overlay function 196 can include any combination of hardware and software for determining and providing context and for taking cross-context actions based on events or occurrences within the live view mode 182, AR mode 184 and VR mode 186. Overlay function 196 can monitor modes 182, 184 and 186 and keep track of movements of users, objects, systems or devices. As users may utilize AR mode 184 or VR mode 186 for various tasks, overlaps of objects may occur, which in some instances can lead to collisions. Overlay function 196 can detect such anticipated collisions and provide alarms or take action to suspend the motion of the colliding devices, thereby preventing the collision from occurring. Overlay function 196 can be used by, used with, or work together with one or more context ML models 134 to detect context changes and take actions, such as engaging a clutch or stopping movement of an RMS 120 component (e.g., a manipulator arm holding a medical instrument 112), in response to detecting a collision or determining that a likelihood of a collision exceeds a collision threshold.
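As one hedged illustration of the threshold-based response described above, the sketch below engages a clutch when a predicted collision likelihood exceeds a threshold; the threshold value and the `engage_clutch()` interface are assumptions made for this example.

```python
COLLISION_THRESHOLD = 0.8   # illustrative threshold value; not a prescribed setting

def check_collision_and_clutch(collision_likelihood: float, rms) -> bool:
    """Engage the clutch when a predicted collision likelihood exceeds the threshold.

    `rms` is assumed to expose an `engage_clutch()` call that suspends
    manipulator-arm motion; the interface is hypothetical.
    """
    if collision_likelihood > COLLISION_THRESHOLD:
        rms.engage_clutch()        # stop further movement of the colliding component
        return True
    return False
```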
[0082] Overlay function 196 can include the functionality to maintain the correlation between locations and movements of users, objects, systems or any features within the medical environment or the VR or AR environment. Overlay function 196 can monitor and determine the likelihood of an action taking place and take action in response to the likelihood exceeding a threshold. For instance, overlay function 196 can detect that a likelihood of a collision involving an RMS 120 exceeds a threshold. Responsive to the threshold for the likelihood of the collision being exceeded, the overlay function 196 can utilize the operating modes controller 180 to issue an instruction to the RMS 120 to take action, such as engaging a clutching mechanism to stop any further movement of the RMS 120 to prevent the collision. Overlay function 196 can identify particular courses of action in a particular mode (e.g., AR mode 184) which can be applied in live view mode 182. For instance, overlay function 196 can identify a path through which a system or a device (e.g., PSC) can be moved from a starting location within a medical environment 102 to a destination within the medical environment 102 without any collisions along the way. Overlay function 196 can, responsive to such a path being identified, display AR objects 142 on the user's HMD 122, highlighting or overlaying the path through which the user can move the PSC to the intended destination.
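One simple way to identify a collision-free path of the kind described above is a breadth-first search over an occupancy grid of the medical environment; the grid representation, cell size, and function name below are assumptions made for this sketch.

```python
from collections import deque

def find_clear_path(grid, start, goal):
    """Breadth-first search over a 2D occupancy grid (True = blocked cell).

    Returns a list of (row, col) cells from start to goal that avoids occupied
    cells, or None if no collision-free path exists. The floor-plan grid is an
    illustrative stand-in for the medical environment layout.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    came_from = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:          # walk back from goal to start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and not grid[nr][nc] and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

Cells on a returned path could then be rendered as AR objects 142 (e.g., arrows or highlights) on the user's HMD 122, in the manner described above.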
[0083] Overlay function 196 can provide the functionality for switching between different modes, as needed. For instance, the overlay function 196 can utilize the interaction field 108 to provide context switching actions across AR, VR and live view modes of operation. Overlay function 196 can monitor the interaction field 108 for various AR objects 142, VR objects 144 or movements of users, systems or devices within the interaction field 108 and trigger actions responsive to anticipated events. For instance, overlay function 196 can detect an anticipated collision between two devices (e.g., two instruments, arms of RMS 120, or an instrument 112 and a PSC) across different modes 182, 184 or 186 and can trigger actions to prevent such collisions or similar events, as detected within the interaction field 108. Interaction field 108 can include paths in the medical environment 102 along which the XR environment functions 140 can determine that a particular system or tool (e.g., PSC) can be moved. Overlay function 196 can determine, generate or provide AR objects 142 or VR objects 144 to indicate in the AR mode 184 or VR mode 186 the path along which the particular system or tool can be moved or located.
[0084] The ML trainer 132 can include any combination of hardware and software for training any ML models, such as context ML models 134 or object ML models 136. The ML trainer 132 can train any type and form of ML models, including ML models for determining the performance metrics 192 of a user (e.g., a surgeon), using any data from the data streams 162 as input (e.g., video data, sensor data 174, kinematics data 172 or events data 176). ML trainer 132 can include a framework or functionality for training any machine learning models, such as neural network models identifying or detecting the context of a medical procedure or a medical session being implemented using an RMS 120. ML trainer 132 can train a context ML model 134 to utilize attention mechanisms for cross-referencing features between data streams 162 (e.g., kinematics data stream, sensor data stream and events data stream) to detect contextual data from the medical session.
[0085] For instance, ML trainer 132 can train one or more object ML models 136 and context ML models 134 to utilize attention mechanisms to correlate data streams 162 (e.g., 172, 174 or 176) to detect events or occurrences for making a switch between live-view mode 182, AR mode 184 and VR mode 186. For instance, ML trainer 132 can train an object ML model 136 to detect or recognize any objects 106, AR objects 142 or VR objects 144, including, for example, any sensors 104, medical instruments 112, displays 116, data capture devices 110 or any component of an RMS 120 in a medical environment 102, via data in a data stream 162. ML trainer 132 can train a context ML model 134 to determine or detect a trigger for a context switch, such as when a collision is about to occur (e.g., between a medical instrument 112 and another object 106), when an object 106 is set up incorrectly (e.g., a patient side cart is not in a position that facilitates a full range of motion for manipulator arms handling a medical instrument 112), when a patient side cart has not been docked, or when a system fault or an error has occurred.

[0086] ML trainer 132 can include any combination of hardware and software for training or generating object ML models 136 for detecting or recognizing objects 106, AR objects 142 or VR objects 144. ML trainer 132 can include the functionality to train ML models 134 or 136 using supervised or unsupervised techniques, including, for example, self-supervision techniques applied to neural network models, such as feedforward neural networks (FNN), long short-term memory networks (LSTM), gated recurrent units (GRU), generative adversarial networks (GAN) and transformers. ML trainer 132 can utilize training datasets to train any of the ML models 134 or 136. ML trainer 132 can use the training dataset for label-based or non-label-based (e.g., self-supervision) training and can develop or establish encoders and classifiers (e.g., classification blocks) for any of the ML models 134 or 136, utilizing weights to adjust and tune loss functions for different ML models 134 or 136. For instance, ML trainer 132 can improve the performance of the ML models 134 or 136 by applying weights to one or more embedding layers or attention mechanisms, depending on the implementation, so that the determinations or predictions of the ML models 134 or 136 become more accurate.
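The following sketch illustrates one possible attention-based fusion of a kinematics stream and an event stream into a context classifier, assuming PyTorch is available; the feature dimensions, number of context classes, and training hyperparameters are arbitrary assumptions for this example and not part of this disclosure.

```python
import torch
import torch.nn as nn

class ContextFusionModel(nn.Module):
    """Illustrative context classifier that cross-references kinematics and
    event-stream features with a cross-attention mechanism (dimensions arbitrary)."""

    def __init__(self, feat_dim=64, num_contexts=8):
        super().__init__()
        self.kin_proj = nn.Linear(16, feat_dim)     # project kinematics features
        self.evt_proj = nn.Linear(32, feat_dim)     # project event features
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(feat_dim, num_contexts)

    def forward(self, kin_seq, evt_seq):
        q = self.kin_proj(kin_seq)                  # (batch, T_k, feat_dim)
        kv = self.evt_proj(evt_seq)                 # (batch, T_e, feat_dim)
        fused, _ = self.cross_attn(q, kv, kv)       # kinematics attends over events
        return self.classifier(fused.mean(dim=1))   # pooled context logits

# Minimal supervised training step; data loaders and labels are placeholders.
model = ContextFusionModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(kin_batch, evt_batch, labels):
    optimizer.zero_grad()
    logits = model(kin_batch, evt_batch)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```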
[0087] Context ML models 134, also referred to as context-identification ML models 134, can include any machine learning models for detecting or identifying events, occurrences or situations in which an operation mode can be switched between a live view mode 182, augmented reality mode 184 and virtual reality mode 186. Context ML models 134 can include neural network models, including RNNs, LSTMs, GANs, GRUs, FNNs, convolutional neural networks, autoencoders, capsule networks or any other type of machine learning model. Context ML models 134 can utilize any data stream 162 (e.g., sensor data 174, kinematics data 172 or events data 176) to determine or infer an occasion at which a mode of operation can be switched (e.g., between any of the modes 182, 184 or 186).
[0088] Context-identification ML model 134 can be trained to recognize, identify or detect any one or more contexts of an ongoing medical procedure or medical session, based on the data from the data stream 162. Context ML model 134 can use data stream 162 as input and provide as output a context of the medical procedure. The context can include any feature or characteristic of the medical procedure that can be used to trigger a change in the operating mode between the live view mode 182, AR mode 184 and VR mode 186, or to provide one or more AR objects 142 or VR objects 144 (e.g., individually or as a part of a VR simulation 146) for display.
[0089] The contexts, such as outputs of the context-identification ML model 134, can include any characteristics or features of the medical session or a procedure. For example, a context can include a recognized or detected phase of a surgery, or a task or a maneuver implemented during a medical procedure, such as an incision, a dissection, an act of imaging or an act of suturing. The context can include identified surgical instruments 112, a localized anatomical region under operation, a categorized procedure type, a medical procedure phase, quantified procedural progress or surgical performance. The context can include occurrences or events, such as abnormalities in a medical procedure (e.g., actions that are not identified by the model to be implemented during a given procedure phase or task), a collision between a medical device (e.g., RMS 120 or its manipulator arm) and another object 106, or a layout of AR features or objects. Context can include detection that a threshold for performance has been reached or exceeded, or a threshold for adherence to safety protocols. Context can include assessment of team collaboration during a procedure, assessment of patient vital signs, or identification of potential divergence between the established flow of a medical procedure and the detected series of tasks performed by the surgeon. These context categories can be used to provide an understanding of the medical session and can be used as triggers for providing AR objects 142, VR objects 144 or VR simulations 146, and for switching between any of the modes 182, 184 or 186.
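Purely for illustration, the mapping below shows how detected context categories could be associated with mode switches and XR content; the category names, actions, and interfaces are assumptions made for this sketch.

```python
# Illustrative mapping from detected context categories to system responses.
# The category labels, mode numerals (182/184/186), and actions are assumptions;
# an implementation may define different categories and responses.
CONTEXT_ACTIONS = {
    "suturing_phase":      {"mode": 184, "content": "overlay suturing guidance"},
    "predicted_collision": {"mode": 182, "content": "alert and engage clutch"},
    "unfamiliar_maneuver": {"mode": 186, "content": "launch VR practice simulation"},
    "psc_misplaced":       {"mode": 184, "content": "overlay repositioning path"},
}

def respond_to_context(context_label: str, modes_controller, xr_functions):
    """Look up the detected context and trigger the associated mode switch and content."""
    action = CONTEXT_ACTIONS.get(context_label)
    if action is None:
        return                                   # no trigger for this context
    modes_controller.switch_to(action["mode"], reason=context_label)
    xr_functions.present(action["content"])      # hypothetical XR content call
```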
[0090] Object ML models 136, also referred to as object identification ML models 136, can include any machine learning models for detecting or identifying objects 106, AR objects 142 or VR objects 144 based on any data in data streams 162. Object ML models 136 can be combined with, or be used together with, one or more context ML models 134 to determine occurrences involving objects 106, AR objects 142 or VR objects 144 that can be indicative of a situation in which a change in the operating mode (e.g., between 182, 184 or 186) can be implemented or determined. Object ML models 136 can include neural network models, including RNNs, LSTMs, GANs, GRUs, FNNs, convolutional neural networks, autoencoders, capsule networks or any other type of machine learning model. Object ML models 136 can utilize any data stream 162 (e.g., sensor data 174, kinematics data 172 or events data 176) to detect or recognize any one or more objects 106, AR objects 142 or VR objects 144.
[0091] Indicator function 188 can include any combination of hardware and software for generating indications, alerts, messages or textual output. Indicator function 188 can generate instructions or guidelines for a surgeon to perform a particular task in a particular way or using a particular maneuver. Indicator function 188 can generate and provide one or more indications or alerts conveying information to users, including options or controls (e.g., AR objects 142 or VR objects 144 providing control buttons for system control via AR control or VR simulated control). Indications and alerts can include notifying users of a system status, alerting on a specific event or change, providing guidance on procedural steps, or signaling the completion of tasks.
[0092] Performance analyzer 190 can include any combination of hardware and software for analyzing the performance of a surgeon with respect to a medical procedure captured by the data streams 162. Performance analyzer 190 can generate metrics 192 for various surgical tasks, phases or procedures, based on the performance of the surgeon with respect to such tasks. Performance analyzer 190 can utilize data streams 162 to determine the performance metrics 192. Performance analyzer 190 can generate scores indicative of the level of confidence in the performance metrics 192. Performance analyzer 190 can generate the performance metrics 192 based on the operation and feedback from context ML models 134 or object ML models 136.
[0093] FIG. 2 depicts a surgical system 200, in accordance with some aspects of the technical solutions. The surgical system 200 can be an example of a medical environment 102. The surgical system 200 may include a robotic medical system 205 (e.g., the robotic medical system 120), a user control system 210, and an auxiliary system 215 communicatively coupled one to another. A visualization tool 220 (e.g., the visualization tool 114) may be connected to the auxiliary system 215, which in turn may be connected to the robotic medical system 205. Thus, when the visualization tool 220 is connected to the auxiliary system 215 and this auxiliary system is connected to the robotic medical system 205, the visualization tool may be considered connected to the robotic medical system. In some embodiments, the visualization tool 220 may additionally or alternatively be directly connected to the robotic medical system 205.
[0094] The surgical system 200 may be used to perform a computer-assisted medical procedure on a patient 225. In some embodiments, the surgical team may include a surgeon 230A and additional medical personnel 230B-230D, such as a medical assistant, a nurse, an anesthesiologist, and other suitable team members who may assist with the surgical procedure or medical session. The medical session may include the surgical procedure being performed on the patient 225, as well as any pre-operative processes (e.g., which may include setup of the surgical system 200, including preparation of the patient 225 for the procedure), post-operative processes (e.g., which may include clean up or post care of the patient), or other processes during the medical session. Although described in the context of a surgical procedure, the surgical system 200 may be implemented in a non-surgical procedure, or other types of medical procedures or diagnostics that may benefit from the accuracy and convenience of the surgical system.
[0095] The robotic medical system 205 can include a plurality of manipulator arms 235A-235D to which a plurality of medical tools (e.g., the medical tool 112) can be coupled or installed. Each medical tool can be any suitable surgical tool (e.g., a tool having tissue-interaction functions), imaging device (e.g., an endoscope, an ultrasound tool, etc.), sensing instrument (e.g., a force-sensing surgical instrument), diagnostic instrument, or other suitable instrument that can be used for a computer-assisted surgical procedure on the patient 225 (e.g., by being at least partially inserted into the patient and manipulated to perform a computer-assisted surgical procedure on the patient). Although the robotic medical system 205 is shown as including four manipulator arms (e.g., the manipulator arms 235A-235D), in other embodiments, the robotic medical system can include greater than or fewer than four manipulator arms. Further, not all manipulator arms can have a medical tool installed thereto at all times of the medical session. Moreover, in some embodiments, a medical tool installed on a manipulator arm can be replaced with another medical tool as suitable.
[0096] One or more of the manipulator arms 235A-235D and/or the medical tools attached to manipulator arms can include one or more displacement transducers, orientational sensors, positional sensors, and/or other types of sensors and devices to measure parameters and/or generate kinematics information. One or more components of the surgical system 200 can be configured to use the measured parameters and/or the kinematics information to track (e.g., determine poses of) and/or control the medical tools, as well as anything connected to the medical tools and/or the manipulator arms 235A-235D.
[0097] The user control system 210 can be used by the surgeon 230A to control (e.g., move) one or more of the manipulator arms 235A-235D and/or the medical tools connected to the manipulator arms. To facilitate control of the manipulator arms 235A-235D and track progression of the medical session, the user control system 210 can include a display (e.g., the display 116 or 1130) that can provide the surgeon 230A with imagery (e.g., high-definition 3D imagery) of a surgical site associated with the patient 225 as captured by a medical tool (e.g., the medical tool 112, which can be an endoscope) installed to one of the manipulator arms 235A-235D. The user control system 210 can include a stereo viewer having two or more displays where stereoscopic images of a surgical site associated with the patient 225 and generated by a stereoscopic imaging system can be viewed by the surgeon 230A. In some embodiments, the user control system 210 can also receive images from the auxiliary system 215 and the visualization tool 220.
[0098] The surgeon 230A can use the imagery displayed by the user control system 210 to perform one or more procedures with one or more medical tools attached to the manipulator arms 235A-235D. To facilitate control of the manipulator arms 235A-235D and/or the medical tools installed thereto, the user control system 210 can include a set of controls. These controls can be manipulated by the surgeon 230A to control movement of the manipulator arms 235A-235D and/or the medical tools installed thereto. The controls can be configured to detect a wide variety of hand, wrist, and finger movements by the surgeon 230A to allow the surgeon to intuitively perform a procedure on the patient 225 using one or more medical tools installed to the manipulator arms 235A-235D.
[0099] The auxiliary system 215 can include one or more computing devices configured to perform processing operations within the surgical system 200. For example, the one or more computing devices can control and/or coordinate operations performed by various other components (e.g., the robotic medical system 205, the user control system 210) of the surgical system 200. A computing device included in the user control system 210 can transmit instructions to the robotic medical system 205 by way of the one or more computing devices of the auxiliary system 215. The auxiliary system 215 can receive and process image data representative of imagery captured by one or more imaging devices (e.g., medical tools) attached to the robotic medical system 205, as well as other data stream sources received from the visualization tool. For example, one or more image capture devices (e.g., the image capture devices 110) can be located within the surgical system 200. These image capture devices can capture images from various viewpoints within the surgical system 200. These images (e.g., video streams) can be transmitted to the visualization tool 220, which can then pass through those images to the auxiliary system 215 as a single combined data stream. The auxiliary system 215 can then transmit the single video stream (including any data stream received from the medical tool(s) of the robotic medical system 205) to present on a display (e.g., the display 116) of the user control system 210.
[00100] In some embodiments, the auxiliary system 215 can be configured to present visual content (e.g., the single combined data stream) to other team members (e.g., the medical personnel 230B-230D) who might not have access to the user control system 210. Thus, the auxiliary system 215 can include a display 240 configured to display one or more user interfaces, such as images of the surgical site, information associated with the patient 225 and/or the surgical procedure, and/or any other visual content (e.g., the single combined data stream). In some embodiments, display 240 can be a touchscreen display and/or include other features to allow the medical personnel 230A-230D to interact with the auxiliary system 215.
[00101] The robotic medical system 205, the user control system 210, and the auxiliary system 215 can be communicatively coupled one to another in any suitable manner. For example, in some embodiments, the robotic medical system 205, the user control system 210, and the auxiliary system 215 can be communicatively coupled by way of control lines 245, which can represent any wired or wireless communication link that can serve a particular implementation. Thus, the robotic medical system 205, the user control system 210, and the auxiliary system 215 can each include one or more wired or wireless communication interfaces, such as one or more local area network interfaces, Wi-Fi network interfaces, cellular interfaces, etc. It is to be understood that the surgical system 200 can include other or additional components or elements that can be needed or considered desirable to have for the medical session for which the surgical system is being used.
[00102] FIG. 3 illustrates an example 300 of an extended reality content used to instruct a user to rearrange the locations of one or more objects 106 in a medical environment 102. Example 300 includes a top-down view of an arrangement of a medical environment 102 in which a plurality of objects 106 are arranged to facilitate a medical session, as described in connection with FIG. 2. However, in example 300, a plurality of AR objects 142 are overlaid over the displayed view of the medical environment 102 to provide an instruction to the user to move the auxiliary system 215 from its current location to a new suggested location.
[00103] The AR objects 142 include a first AR object 142A that corresponds to a first arrow pointing from the current location of the auxiliary system 215 towards the end of a second arrow indicated as an AR object 142B. The second arrow, AR object 142B, continues to point further away from the current location of the auxiliary system 215 towards a rectangular space, marked as AR object 142C. The space marked as the AR object 142C has a footprint that is similar to or the same as that of the current location of the auxiliary system 215. The arrows marked as AR objects 142A-B provide a path from the auxiliary system 215, away from the robotic medical system 205, towards the new location (e.g., AR object 142C), such that a collision between the auxiliary system 215 and the robotic medical system 205 is avoided. Accordingly, example 300 illustrates an example of extended reality content provided using an aspect of the current solutions, in which an AR mode 184 can be used to suggest to a user to move an object 106 (e.g., an auxiliary system 215) along a path established by the AR objects 142, so as to avoid a collision with another object 106 (e.g., RMS 205).
[00104] FIG. 4 depicts an example block diagram of an example computer system 400, in accordance with some embodiments. The computer system 400 can be any computing device used herein and can include or be used to implement a data processing system 130 or its components. The computer system 400 includes at least one bus 405 or other communication component or interface for communicating information between various elements of the computer system. The computer system further includes at least one processor 410 or processing circuit coupled to the bus 405 for processing information. The computer system 400 also includes at least one main memory 415, such as a random-access memory (RAM) or other dynamic storage device, coupled to the bus 405 for storing information and instructions to be executed by the processor 410. The main memory 415 can be used for storing information during execution of instructions by the processor 410. The computer system 400 can further include at least one read only memory (ROM) 420 or other static storage device coupled to the bus 405 for storing static information and instructions for the processor 410. A storage device 425, such as a solid-state device, magnetic disk or optical disk, can be coupled to the bus 405 to persistently store information and instructions.
[00105] The computer system 400 can be coupled via the bus 405 to a display 430, such as a liquid crystal display, or active-matrix display, for displaying information. An input device 435, such as a keyboard or voice interface can be coupled to the bus 405 for communicating information and commands to the processor 410. The input device 435 can include a touch screen display (e.g., the display 430). The input device 435 can also include a cursor control, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to the processor 410 and for controlling cursor movement on the display 430.
[00106] The processes, systems and methods described herein can be implemented by the computer system 400 in response to the processor 410 executing an arrangement of instructions contained in the main memory 415. Such instructions can be read into the main memory 415 from another computer-readable medium, such as the storage device 425. Execution of the arrangement of instructions contained in the main memory 415 causes the computer system 400 to perform the illustrative processes described herein. One or more processors in a multiprocessing arrangement can also be employed to execute the instructions contained in the main memory 415. Hard-wired circuitry can be used in place of or in combination with software instructions together with the systems and methods described herein. Systems and methods described herein are not limited to any specific combination of hardware circuitry and software.
[00107] Although an example computing system has been described in FIG. 4, the subject matter including the operations described in this specification can be implemented in other types of digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
[00108] The technical solutions can be directed to a system 100, which can include any features or characteristics of example systems 100-400. System 100 can include a centralized implementation, such as when the system 100 is deployed on a single computing system 400 of a server, or a distributed implementation, such as a Software as a Service function executing the functionalities of the DPS 130 on a cloud. System 100 can include one or more processors 410 coupled with one or more memories 415 and configured to implement any functionalities or characteristics of the technical solutions. For instance, system 100 can include one or more processors 410 that are configured to perform functionalities of the technical solutions by executing instructions stored in the memory (e.g., 415, 420 or 425). System 100 can include a non-transitory computer-readable medium (e.g., 415, 420 or 425) storing processor-executable instructions that can be executed to implement various aspects of the technical solutions.
[00109] For instance, when executed by one or more processors 410, the one or more instructions can cause the one or more processors 410 to receive a data stream 162 of a medical session with an RMS 120. The data stream 162 can be captured by one or more sensors 104 in a medical environment 102. The data stream can include any combination of sensor data 174 from any type and form of a sensor 104, kinematics data 172 corresponding to movement of RMS 120 components (e.g., medical instruments 112 handled by arms of the RMS 120), or events data 176 identifying events or occurrences at the RMS 120 or the medical environment 102.
[00110] The one or more instructions can cause the one or more processors 410 to determine a context of the medical session. The one or more processors can determine the context based on the data stream 162 and via one or more context-identification models (e.g., 134) trained with machine learning. The context can include any characteristic or feature of a medical session or a medical procedure that can be used to trigger a change in the mode of operation, such as a change between a live view mode 182 and an AR mode 184 and back, or from live view mode 182 to VR mode 186 and back. The context detected by the context-identification models 134 can include a recognized or detected phase of a surgery, or a detected task or maneuver implemented during a medical procedure, such as an incision, a dissection, an act of imaging or an act of suturing. The context can include identified surgical instruments 112, a localized anatomical region under operation, a categorized procedure type, a procedure phase, quantified procedural progress or surgical performance, or an occurrence of an event, such as identification of an abnormality in a medical procedure. For instance, the context can include an action by a surgeon that is not identified by the context ML model 134 to be included in a given procedure phase or task. The context can include a detected collision between a medical device (e.g., RMS 120 or its manipulator arm) and another object 106 in the medical environment 102, or a likelihood of a possible collision exceeding a threshold. Context can include detection that a performance metric 192 for a surgeon is below a threshold for the performance metrics, or that a threshold for adherence to safety protocols or processes has been exceeded. Context can include assessment of team collaboration during a procedure, assessment of patient vital signs, or identification of potential divergence between the established flow of a medical procedure and the detected series of tasks performed by the surgeon.
[00111] The one or more instructions can cause the one or more processors 410 to determine, based on the context determined, an extended reality content to be presented to a user. The extended reality content can include one or more AR objects 142, one or more VR objects 144 and one or more VR simulations 146. The extended reality content can be implemented responsive to the mode of operation switching to an augmented reality mode 184 or a virtual reality mode 186. The extended reality content can be provided as an overlay of one or more AR objects 142 displayed over a live-stream or a real-time video feed of a medical environment 102. The one or more AR objects 142 can be provided or displayed within an interaction field 108 that can be displayed within a portion of a display 116.
[00112] The one or more instructions can cause the one or more processors 410 to detect, based on the data stream 162 and via one or more object-identification models (e.g., 136), one or more objects 106 located in the medical environment 102. The one or more objects can include sensors 104, RMS 120, manipulator arms of a surgical robot, medical instruments 112, displays 116, auxiliary systems (e.g., 215), or users (e.g., medical professionals operating within the medical environment 102). Detected objects 106 can include an HMD 122, data capture devices 110 or any other device, system or a component within the medical environment 102.
[00113] The one or more instructions can cause the one or more processors 410 to establish, based on the one or more objects 106 detected, an interaction field 108 in the medical environment 102 in which to interact with the extended reality content. The interaction field 108 can include an area or a volume within which user gestures or selections can be made to AR objects 142 or VR objects 144 displayed. The interaction field 108 can include a portion of the area within a medical environment 102 corresponding to which, or within which, AR objects 142 or VR objects 144 can be overlaid or displayed. The interaction field 108 can be provided on a display 116 on a display screen in a medical environment 102, allowing for the user interaction via a touch screen. The interaction field 108 can be provided on a display 116 of an HMD 122 allowing for a user interaction using eye movements or eye tracking, or via hand gestures.
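As a simplified illustration of establishing an interaction field from detected object locations, the sketch below computes an axis-aligned bounding volume around the detected objects and tests whether a gesture point falls inside it; the margin value and coordinate convention are assumptions made for this example.

```python
def interaction_field_bounds(object_positions, margin=0.25):
    """Compute an axis-aligned bounding volume around detected objects.

    `object_positions` is an iterable of (x, y, z) coordinates, in meters, for
    objects detected by the object-identification models; `margin` pads the
    field so gestures near the boundary still register. Illustrative only.
    """
    xs, ys, zs = zip(*object_positions)
    lower = (min(xs) - margin, min(ys) - margin, min(zs) - margin)
    upper = (max(xs) + margin, max(ys) + margin, max(zs) + margin)
    return lower, upper

def inside_field(point, bounds):
    """Return True when a gesture or selection point lies within the field."""
    (lx, ly, lz), (ux, uy, uz) = bounds
    x, y, z = point
    return lx <= x <= ux and ly <= y <= uy and lz <= z <= uz
```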
[00114] The one or more instructions can cause the one or more processors 410 to present the extended reality content (e.g., AR objects 142, VR objects 144 or VR simulations 146) to the user for interaction in the interaction field. For instance, XR content can be provided, overlaid or displayed to the user via a display 116 of an HMD 122 or a desktop display 116 of a medical environment 102. The XR content can be provided in the form of a VR simulation 146 in which the user can practice certain maneuvers or tasks in a VR space (e.g., using a simulated or a virtual RMS in a simulated or virtual medical environment), prior to attempting the performance of such tasks or maneuvers on a patient’s body.
[00115] For instance, the one or more processors 410 can generate, in an augmented reality (AR) mode 184 facilitated or provided in a head-mounted display (HMD) 122 and using one or more locations (e.g., coordinates) of the one or more objects 106, a sterile space within a portion of the interaction field 108. The sterile space can be provided as an overlaid, highlighted, or partially transparent AR object 142 corresponding to a space, volume or an area. The sterile space can be marked with one or more AR objects indicating one or more boundaries of the sterile space. The one or more processors 410 can overlay, within the interaction field presented via the HMD, the one or more AR objects 142 indicative of the one or more boundaries of the sterile space.
[00116] The one or more processors 410 can generate, in a virtual reality (VR) mode 186, a VR simulation 146 of a virtual robotic medical system corresponding to the RMS 120. The virtual robotic medical system can be in a virtual medical environment (e.g., 144) that corresponds to the medical environment 102. A location of the virtual robotic medical system (e.g., 144) within the virtual medical environment (e.g., 144) can correspond to a location of the RMS 120 within the medical environment 102. The one or more processors 410 can display, for selection in the VR mode 186, a virtual control (e.g., VR object 144) to move the virtual robotic medical system within the virtual medical environment. The virtual control (e.g., VR object 144) can correspond to a control (e.g., via a control device for the RMS 120) to move the RMS 120 via a live-view mode 182 of operation in the medical environment 102.
[00117] The one or more processors 410 can generate an augmented reality (AR) object 142 to control a performance of a task of the medical session via the RMS 120 in an AR mode 184 of operation. The one or more processors 410 can receive an input to switch from one of a live-view mode 182 of operation or a virtual reality (VR) mode 186 of operation to the AR mode 184 of operation. The one or more processors 410 can switch, responsive to the input, from the one of the live-view mode 182 or the VR mode 186 to the AR mode 184 to overlay the AR object 142 over the live-view of at least a portion of the medical environment 102 within the interaction field 108.
[00118] The one or more processors 410 can generate a virtual reality (VR) object 144 to control a performance (e.g., 192) of a VR simulation 146 of a task of the medical session in a VR mode 186 of operation using a virtual robotic medical system (e.g., 144) corresponding to the RMS 120 in a virtual medical environment (e.g., 144) corresponding to the medical environment 102. The one or more processors 410 can receive an input to switch from one of a live-view mode 182 of operation or an augmented reality (AR) mode 184 of operation to the VR mode 186 of operation. The one or more processors 410 can utilize smart switching between modes 182, 184 and 186 based on detected events in any one of the modes. For instance, the one or more processors can take action to suspend or stop a motion of a medical instrument 112 in a first of the three modes (e.g., a first one of the live view mode 182, AR mode 184 or VR mode 186) in response to detecting that the medical instrument 112 is on a collision course with another object (e.g., a tool or a system) in another one of the three modes (e.g., a second one of the live view mode 182, AR mode 184 or VR mode 186). The one or more processors 410 can switch, responsive to the input, from the one of the live-view mode 182 or the AR mode 184 to the VR mode 186 to display the simulation in the VR mode 186.
[00119] The one or more processors 410 can receive, from a head-mounted device (HMD) 122 configured to operate in an augmented reality (AR) mode 184 of operation and a virtual reality (VR) mode 186 of operation, a signal corresponding to one of a hand gesture, an eye movement or a voice input. For example, a hand tracker 126 can detect or capture a hand signal or a gesture indicative of a selection of an AR object 142 or a VR object 144. For example, an eye tracker 124 can detect or capture an eye signal or an eye movement indicative of a signal or action, such as a selection of an AR object 142 or a VR object 144. The one or more processors 410 can select, via the HMD 122, responsive to the signal (e.g., detected via eye tracker 124 or hand tracker 126), a task to perform in at least one of the AR mode 184 or the VR mode 186.
[00120] The one or more processors 410 can determine, using one or more depth sensors 104, a location of a medical instrument 112 based at least on one or more locations of the one or more objects 106 in the medical environment 102. The one or more processors 410 can overlay, in an augmented reality (AR) mode 184 of operation, an AR object 142 indicative of an anatomy of a patient. The AR object 142 can be positioned according to the at least one or more locations of the one or more objects 106. For instance, the positioning coordinates (e.g., in x-axis, y-axis or z-axis) of the AR object 142 can be determined based on the positioning coordinates of one or more objects 106, determined based on data of a depth sensor 104. The one or more processors 410 can modify, in the interaction field 108, at least one of the medical instrument 112 or the AR object 142 based at least on a movement of the medical instrument 112 in the interaction field 108.
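A minimal sketch of positioning an AR anatomy overlay relative to a depth-sensed instrument location is shown below; the offset, motion threshold, and `move_to()` call are hypothetical and are used only to illustrate updating the overlay as the instrument moves within the interaction field 108.

```python
def position_ar_anatomy_overlay(instrument_xyz, anatomy_offset_xyz):
    """Place an AR anatomy overlay relative to a depth-sensed instrument location.

    Both inputs are (x, y, z) tuples in the medical-environment frame; the
    overlay is anchored at a fixed offset from the tracked instrument tip.
    The offset and frame convention are assumptions for this sketch.
    """
    return tuple(i + o for i, o in zip(instrument_xyz, anatomy_offset_xyz))

def update_overlay_on_motion(prev_xyz, new_xyz, overlay, min_motion=0.005):
    """Move the AR object only when the instrument moves more than ~5 mm."""
    delta = sum((n - p) ** 2 for n, p in zip(new_xyz, prev_xyz)) ** 0.5
    if delta >= min_motion:
        overlay.move_to(new_xyz)    # hypothetical overlay interface
    return delta
```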
[00121] The one or more processors 410 can identify, based on at least a portion of the data stream 162 input into the one or more context-identification models (e.g., 134), that the context of the medical session indicates at least one of a location of a patient side cart (e.g., 142) or a location of the RMS 120 (e.g., 142) positioned away from a desired location (e.g., 142) within the medical environment 102. The one or more processors 410 can generate the extended reality content indicative of the desired location (e.g., 142) for the at least one of the patient side cart (e.g., 142) or the RMS 120 within the interaction field 108.
[00122] The one or more processors 410 can identify, based on at least a portion of the data stream 162 input into the one or more context-identification models (e.g., 134), that the context of the medical session is indicative of a desired adjustment to one of tilt, rotation or height of a patient side cart in the medical environment 102. The one or more processors 410 can generate, in the interaction field, the extended reality content comprising an instruction or an indication (e.g., 142) to implement the desired adjustment to the one of tilt, rotation or height of the patient side cart (e.g., 142).

[00123] The one or more processors 410 can identify, based at least on one or more portions of the data stream 162 input into the one or more context-identification models (e.g., 134), that the context of the medical session corresponds to a current mode of operation of a first one of a live-view mode 182 of operation, an augmented reality (AR) mode 184 of operation or a virtual reality (VR) mode 186 of operation. The one or more processors 410 can identify, based at least on one or more portions of the data stream 162 input into the one or more context-identification models (e.g., 134), an event (e.g., context) in a second one of the live-view mode 182, the AR mode 184 or the VR mode 186. The one or more processors 410 can initiate, responsive to the current mode of operation and the event, a switch from the first one of the live-view mode 182, the AR mode 184 or the VR mode 186 to the second one of the live-view mode 182, the AR mode 184 or the VR mode 186.
[00124] The event or the context detected can include one of a failure of a medical instrument, a collision involving a medical instrument 112 or a collision involving an anatomy of a patient (e.g., 142). The one or more processors 410 can identify, based on at least a portion of the data stream 162 input into the one or more context-identification models (e.g., 134), that the context of the medical session is indicative of one of a desired tilt, rotation or height adjustment of a patient side cart (e.g., 142) within the medical environment 102. The one or more processors 410 can generate, in the interaction field, the extended reality content (e.g., 142) indicative of an instruction to adjust the one of the desired tilt, rotation or height adjustment of the patient side cart (e.g., 142).
[00125] The one or more processors 410 can identify, based on at least a portion of the data stream 162, that an account associated with the medical session corresponds to a profile of a user having a level of skill for the medical session that is below a threshold level of skill for the medical session. The one or more processors 410 can generate, in the interaction field responsive to the identification, the extended reality content including a guideline to perform one or more tasks of the medical session.
[00126] The one or more processors 410 can identify, based on at least a portion of the data stream 162 input into the one or more context-identification models (e.g., 134), that a value of a performance indicator (e.g., performance metric 192) of an account associated with the medical session is below a threshold level for the performance indicator (e.g., 194). The one or more processors 410 can generate, in the interaction field 108 responsive to the identification, the extended reality content (e.g., 142, 144) including an instruction to perform one or more tasks of the medical session. The one or more processors 410 can identify, based on at least a portion of the data stream 162 input into the one or more context-identification models (e.g., 134), that a likelihood of a fault in the medical session exceeds a threshold (e.g., 194) for a likelihood of fault.
[00127] The one or more processors 410 can activate a clutch for the medical instrument 112, responsive to the identification or detection of a context, such as a context that the user account associated with the current session has never been exposed to a particular medical maneuver. Activating the clutch can disengage the user’s (e.g., surgeon’s) side input controller from the instrument such that further movements made by the user do not result in further physical movements of the instrument. For instance, when a user operates manipulator arms of the RMS 120, DPS 130 can activate the clutch to disconnect the user’s controls from the manipulator arms of the RMS 120, thereby preventing further motion of the arms. By disengaging the controller from the physical instrument, and then switching to a VR mode 186 of operation, the DPS 130 can allow the user to practice a particular maneuver or a task in a VR simulation 146 in which the user can monitor simulated performance metrics 192 for the given task until the user achieves a particular performance metrics 192 score that exceeds the threshold, indicating that the user has become familiar or comfortable with the given maneuver. For instance, the user can practice a given maneuver or a task in a VR mode until the performance metrics 192 for that task satisfy the threshold, after which the system can automatically switch back to the AR mode 184 or a live view mode 182 to re-engage the instrument (e.g., de-activate the clutch). At that point, the user can continue with the task on the patient, having been exposed to it and having received training on it, in a VR environment.
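The clutch-and-practice flow described above could be sketched as follows; the threshold value and the `rms`, `modes_controller`, and `vr_simulation` interfaces are assumptions standing in for the clutch control, the operating modes controller 180, and the VR simulation 146.

```python
PERFORMANCE_THRESHOLD = 0.75   # illustrative threshold for the performance metrics

def practice_until_proficient(rms, modes_controller, vr_simulation):
    """Disengage the instrument, rehearse the maneuver in VR, then re-engage.

    All three arguments are hypothetical interfaces; mode numerals 184 and 186
    reuse the reference numerals from this description.
    """
    rms.engage_clutch()                       # user controls no longer drive the instrument
    modes_controller.switch_to(186, reason="unfamiliar maneuver")    # enter VR mode

    score = 0.0
    while score < PERFORMANCE_THRESHOLD:
        score = vr_simulation.run_trial()     # returns a simulated performance metric

    modes_controller.switch_to(184, reason="proficiency reached")    # back to AR mode
    rms.release_clutch()                      # re-engage the instrument
    return score
```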
[00128] The one or more processors 410 can provide the interaction field 108 such that the interaction field 108 includes one or more objects presented in the extended reality content. The one or more objects can include an arm of the robotic medical system, a link of two or more parts of the robotic medical system, a joint between two or more parts of the robotic medical system, a clutch feature of the robotic medical system, a medical instrument of the robotic medical system or a patient associated with the medical session. The one or more processors 410 can determine, using one or more depth sensors, a location of the user and overlay, in an augmented reality (AR) mode of operation, an AR object indicative of the user according to the location. The one or more processors 410 can determine, using the one or more context-identification models, a type of the medical procedure or a type of phase within a medical procedure, and generate, based at least on the type of the medical procedure, at least one of a size or a location of the interaction field. The one or more processors 410 can present the interaction field according to at least one of the size or the location. The one or more processors 410 can detect, using the one or more context-identification models, that a likelihood of a collision in the medical environment exceeds a threshold and present an indication in the interaction field, responsive to the detection.
[00129] Turning now to FIG. 5, an example flow diagram of a method 500 for providing extended reality contextual switching for a robotic medical system is illustrated. The method 500 can be performed by a system having one or more processors executing computer-readable instructions stored on a memory. The method 500 can be performed, for example, by any combination of features discussed in connection with example systems 100-400 and FIGs. 1-4. For instance, the method 500 can be implemented by one or more processors 410 of a computing system 400 executing non-transitory computer-readable instructions stored on a memory (e.g., the memory 415, 420 or 425) and using data from a data repository 160 to implement the functionalities of a system 100.
[00130] The method 500 can be used to provide extended reality contextual switching for a robotic system, such as a robotic medical system. At operation 505, the method can receive data. At operation 510, the method can detect a context of a medical procedure. At operation 515, the method can determine whether the context triggers extended reality (XR) content. At operation 520, the method can establish an interaction field. At operation 525, the method can provide the XR content. At operation 530, the method can receive user selections.
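For illustration, the sketch below mirrors operations 505-530 of method 500 as a single pass; all of the interfaces it calls are hypothetical placeholders for the data capture devices, the context-identification and object-identification models, the XR environment functions 140, and the display or HMD.

```python
def run_contextual_switching(sensors, context_model, object_model, xr, ui):
    """Illustrative end-to-end pass mirroring operations 505-530 of method 500.

    All five arguments are hypothetical interfaces; none of the calls below is a
    required API of this disclosure.
    """
    data = sensors.read()                              # 505: receive data stream
    context = context_model.predict(data)              # 510: detect context of the procedure
    content = xr.content_for(context)                  # 515: does the context trigger XR content?
    if content is None:
        return None
    objects = object_model.detect(data)
    field = xr.establish_interaction_field(objects)    # 520: establish interaction field
    ui.render(content, field)                          # 525: provide the XR content
    return ui.poll_selection(field)                    # 530: receive user selections
```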
[00131] At operation 505, the method can receive data. The method can include one or more processors coupled with memory receiving a data stream from a robotic medical system or one or more objects in a medical environment. The method can include the one or more processors receiving a data stream of a medical session with a robotic medical system captured by one or more sensors in a medical environment. The method can include the one or more processors receiving a stream of data from one or more depth sensors, motion sensors, cameras or machine learning models trained to provide an output of detected and identified movements or gestures of a robotic medical system. The method can receive the data from the one or more sensors, data capture systems or the robotic medical systems. The data can include any combination of sensor data, events data and kinematics data.
[00132] At operation 510, the method can detect a context of a medical procedure. The context of the medical procedure can include a characteristic or a feature of a medical session that can be used to trigger the display or provision of extended reality (XR) content. The context can be detected by the context-identification model, such as a context ML model trained to provide the context as an output responsive to an input data stream including any combination of sensor data, events data and kinematics data. The context can include a feature or a characteristic of a data stream indicative of an event or an occurrence triggering a change in the mode of operation, such as a change between a live view mode, an AR mode and a VR mode of operation.
[00133] The detected context can include a recognized or a detected phase of a surgery, or a detected task or a maneuver implemented during a medical procedure. For example, the context can include a detection or recognition of a surgical maneuver, such as an incision, a dissection, an act of imaging or an act of suturing. The context can include an identified surgical instrument, a feature of an anatomical region of a patient in operation, a categorized procedure type, a categorized phase of the procedure, a quantified procedural progress or a surgical performance. The context can include an occurrence of an event or an action, such as identification of an abnormal action or movement in a medical procedure. For instance, the context can include an action by the RMS that is not identified by the context ML model as a part of the procedure phase, or a task determined by the ML model to be performed during the ongoing medical session. The context can include a detected collision between a medical device, such as an RMS, a manipulator arm or a medical instrument, and another object in a medical environment. The context can include an exceeded threshold for a likelihood of a collision between a medical instrument or a surgical robot and another object. The context can include a detection that a performance metric for a medical session is below a threshold for the performance metric determined by an ML model based on the data stream. The context can include a detection that a threshold for adherence to a safety protocol or process has been exceeded. The context can include an assessment of team collaboration during a procedure, an assessment of a patient vital sign, or an identification of a divergence between the established flow of a medical procedure and the detected series of tasks performed during the medical session.
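As an illustration only, the Python sketch below shows one way a context-identification model could map a combined sensor/events/kinematics feature vector to a context label. The feature dimensions, labels, model choice and synthetic training data are assumptions made solely so the example executes; they do not describe the context ML models of the DPS.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    CONTEXT_LABELS = ["incision", "dissection", "imaging", "suturing", "fault"]

    def build_feature_vector(sensor, events, kinematics):
        # Concatenate the three data types into a single input row for the model.
        return np.concatenate([sensor, events, kinematics])

    # Synthetic placeholder data, used only so the example runs end to end.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 12))
    y = rng.integers(0, len(CONTEXT_LABELS), size=200)
    model = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

    def detect_context(sensor, events, kinematics):
        row = build_feature_vector(sensor, events, kinematics).reshape(1, -1)
        return CONTEXT_LABELS[int(model.predict(row)[0])]

    # Example call: 4 sensor features, 4 events features, 4 kinematics features.
    print(detect_context(rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)))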
[00134] The method can include the one or more processors determining, based on the data stream and via one or more context-identification models trained with machine learning, a context of the medical session and an extended reality content to be presented to a user. The one or more processors can determine, using one or more sensors, such as depth sensors, a location of a medical instrument in a medical environment. The location of the medical instrument can be established based at least on one or more locations of the one or more objects in the medical environment.
[00135] The method can include identifying, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session indicates at least one of a location of a patient side cart or a location of the robotic medical system positioned away from a desired location within the medical environment. For instance, the one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session is indicative of a desired adjustment. The desired adjustment can be any adjustment to any object in a medical environment, such as one of tilt, rotation or height of a patient side cart in the medical environment or a patient table.
[00136] The method can include identifying, based at least on one or more portions of the data stream input into the one or more context-identification models, that the context of the medical session corresponds to a current mode of operation of a first one of a live-view mode of operation, an augmented reality (AR) mode of operation or a virtual reality (VR) mode of operation. For instance, the one or more processors can identify, based on at least a portion of the data stream, that an account associated with the medical session corresponds to a profile of a user having a level of skill for the medical session that is below a threshold level of skill for the medical session. The one or more processors can identify that a value of a performance indicator of an account associated with the medical session is below a threshold level for the performance indicator. Identification can be implemented based on at least a portion of the data stream input into the one or more context-identification models.
[00137] At operation 515, the method can determine if the context triggers extended reality (XR) content. The method can include the one or more processors detecting one or more objects located in the medical environment based on the data stream and via one or more object-identification models. The one or more object-identification models can include object ML models trained to detect, recognize and identify any one or more objects in a medical environment, including a robotic medical system, a manipulator arm, a medical instrument, a patient side cart, a patient table or any other object.
[00138] The data processing system can utilize context-identification ML models to detect the context and determine if the context is to trigger XR content. For instance, a context ML model can continuously identify and monitor various contexts responsive to the data stream periodically input into the context ML model. In response to detecting a context that is configured to trigger an XR content response (e.g., one or more AR objects, VR objects or a VR simulation), the context ML model can send an instruction to generate the XR content. For instance, the context ML model can send an API call or a request to an XR environment function to generate one or more AR objects, VR objects or VR simulations, responsive to the detected context.
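A minimal sketch of this trigger-and-dispatch behavior follows; the context-to-mode mapping and the generate_xr_content callable are hypothetical placeholders rather than the actual interface between the context ML model and the XR environment function.

    XR_TRIGGERS = {
        "collision_risk": "AR",       # overlay a warning object
        "unfamiliar_maneuver": "VR",  # launch a practice simulation
        "port_placement": "AR",       # overlay a recommended port position
    }

    def on_context_detected(context, generate_xr_content):
        # Issue a request for XR content only when the context is configured to trigger it.
        mode = XR_TRIGGERS.get(context)
        if mode is not None:
            return generate_xr_content(context=context, mode=mode)
        return None

    # Example with a stub XR-content generator.
    result = on_context_detected("collision_risk",
                                 lambda context, mode: f"{mode} content for {context}")
    print(result)  # -> "AR content for collision_risk"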
[00139] The method can include the one or more processors identifying an event in one of a live-view mode, an AR mode or a VR mode. The identification can be implemented based at least on one or more portions of the data stream input into the one or more context-identification models. The one or more processors can initiate, responsive to the current mode of operation and the event, a switch from the first one of the live-view mode, the AR mode or the VR mode to the second one of the live-view mode, the AR mode or the VR mode.
[00140] Context can include an event that can indicate or include at least one of a failure of a medical instrument, a collision involving a medical instrument or a collision involving an anatomy of a patient. The one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session is indicative of one of a desired tilt, rotation or height adjustment of a patient side cart within the medical environment. The context ML model can, in response to such identifications of contexts, determine, individually or in conjunction with an XR environment function, to generate XR content (e.g., one or more AR objects or VR objects).
[00141] The one or more processors can identify, based on at least a portion of the data stream input into the one or more context-identification models, that a likelihood of a fault in the medical session exceeds a threshold for a likelihood of fault. The one or more processors can activate a clutch for the medical instrument, responsive to the identification or send an AR object indicative of an alert or an indication to utilize a clutch.
[00142] At operation 520, the method can establish an interaction field. The one or more processors can establish or provide a space, volume or an area of a display that can correspond to a portion of the medical environment in which extended reality (XR) content can be provided. For example, the method can provide the interaction field as a portion of a display corresponding to a portion of a user interface in which the user can interact with extended reality content, such as based on detected objects. The method can establish an interaction field having an area, space or a volume in which user gestures, eye motions, or any other actions can be used for selection and interaction by the user. The interaction field can include a space in which AR objects, VR objects or VR simulations can be displayed or provided. The interaction field can be displayed or shown on a display in the medical environment and facilitate user interaction via a touch screen or using eye movements or hand tracking. The interaction field can be provided in a portion of a live video feed corresponding to an area of the medical environment.
[00143] The method can include the one or more processors establishing an interaction field in the medical environment in which to interact with the extended reality content, based on the one or more objects detected in the medical environment. For instance, the interaction field can be generated based at least on an established sterility zone overlaying a portion of the medical environment in which sterility is to be maintained during the medical session. The one or more processors can generate the extended reality content indicative of the desired location for an object, such as a patient side cart, patient table, a robotic medical system, a sensor, a display or any other object within, or relative to, the interaction field or the medical environment displayed on a display screen.
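For illustration, the sketch below computes an interaction field as the padded bounding box of the detected object locations, clamped to a sterility zone. The geometric simplifications (point objects, axis-aligned boxes) are assumptions for the example and are not the disclosed method of establishing the interaction field.

    def interaction_field(object_locations, sterility_zone, padding=0.25):
        # Bounding box of the detected objects, expanded by a padding margin (meters).
        xs, ys, zs = zip(*object_locations)
        lo = (min(xs) - padding, min(ys) - padding, min(zs) - padding)
        hi = (max(xs) + padding, max(ys) + padding, max(zs) + padding)
        # Clamp the field so it stays inside the sterility zone.
        zone_lo, zone_hi = sterility_zone
        lo = tuple(max(a, b) for a, b in zip(lo, zone_lo))
        hi = tuple(min(a, b) for a, b in zip(hi, zone_hi))
        return lo, hi

    # Example: a cart, a table edge and an instrument tip inside a 4 m x 4 m x 3 m zone.
    print(interaction_field([(1.0, 2.0, 0.0), (2.5, 2.0, 1.0), (2.0, 1.5, 1.2)],
                            ((0, 0, 0), (4, 4, 3))))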
[00144] At operation 525, the method can provide the XR content. The method can include the one or more processors generating and providing for display any type and form of XR content, including any AR content, VR content or combination of AR and VR content. The one or more processors can generate and provide VR simulation content, including a simulated or virtual medical environment (e.g., an operating room), a simulated or virtual RMS, medical instrument, patient or any other feature of the medical session. XR content can be provided responsive to a particular type of context. For example, when the context indicates that a user diverges from a medical procedure, the XR content can address the divergence between the tasks implemented in the medical session and the procedural flow for the medical session. For example, when the context indicates that a likelihood of a collision with a portion of the RMS is greater than a threshold, the XR content can correspond to the likelihood of the collision.
[00145] The one or more processors can generate and provide, in the interaction field responsive to the identification of the context or one or more objects, the extended reality content including an instruction to perform one or more tasks of the medical session. For example, the one or more processors can generate, in an augmented reality (AR) mode of a head-mounted display (HMD) and using one or more locations of the one or more objects, a sterile space within a portion of the interaction field. The sterile space can be marked with one or more AR objects indicating one or more boundaries of the sterile space. The one or more processors can overlay, within the interaction field presented via the HMD, the one or more AR objects indicative of the one or more boundaries of the sterile space.
[00146] The method can include generating, in a virtual reality (VR) mode, a simulation of a virtual robotic medical system corresponding to the robotic medical system. The virtual robotic medical system can be simulated or illustrated in a virtual medical environment that can correspond to the medical environment. A location of the virtual robotic medical system within the virtual medical environment can correspond to a location of the robotic medical system within the medical environment. The locations of other objects of the medical environment can correspond to locations of virtual or simulated objects within the virtual medical environment.
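The following sketch illustrates the real-to-virtual correspondence described above: virtual objects are placed at poses derived from their real counterparts through a fixed frame transform. The identity transform and the object list are assumptions for the example, not the disclosed simulation.

    import numpy as np

    WORLD_TO_VR = np.eye(4)  # assumed transform between the real and virtual frames

    def mirror_pose(real_pose_4x4):
        # Map a real-world homogeneous pose into the virtual medical environment.
        return WORLD_TO_VR @ real_pose_4x4

    real_objects = {
        "rms_base": np.eye(4),
        "patient_table": np.array([[1.0, 0.0, 0.0, 1.5],
                                   [0.0, 1.0, 0.0, 0.0],
                                   [0.0, 0.0, 1.0, 0.8],
                                   [0.0, 0.0, 0.0, 1.0]]),
    }
    virtual_scene = {name: mirror_pose(pose) for name, pose in real_objects.items()}
    print(virtual_scene["patient_table"][:3, 3])  # virtual table position mirrors the real one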
[00147] The method can include the one or more processors presenting the extended reality content to the user for interaction in the interaction field. The one or more processors can generate an augmented reality (AR) object to control a performance of a task of the medical session via the robotic medical system in an AR mode of operation. The one or more processors can generate a virtual reality (VR) object to control a performance of a simulation of a task of the medical session in a VR mode of operation using a virtual robotic medical system corresponding to the robotic medical system in a virtual medical environment corresponding to the medical environment.
[00148] The one or more processors can overlay, in an augmented reality (AR) mode of operation, an AR object indicative of an anatomy of a patient. The AR object can be positioned according to the at least one or more locations of the one or more objects. The one or more processors can modify, in the interaction field, at least one of the medical instrument or the AR object based at least on a movement of the medical instrument in the interaction field.
[00149] The method can include the one or more processors generating, in the interaction field, the extended reality content comprising an instruction to implement the desired adjustment to the one of tilt, rotation or height of the patient side cart. The one or more processors can generate, in the interaction field, the extended reality content indicative of an instruction to adjust the one of the desired tilt, rotation or height adjustment of the patient side cart. The one or more processors can generate, in the interaction field responsive to the identification of a context or an object, the extended reality content including a guideline to perform one or more tasks of the medical session.
[00150] At operation 530, the method can receive user selections. The method can include the one or more processors receiving inputs from the XR content in the interaction field, responsive to selections by the user. For example, the one or more processors can display, for selection in the VR mode, a virtual control to move the virtual robotic medical system within the virtual medical environment. The virtual control can correspond to a control to move the robotic medical system via a live-view mode of operation in the medical environment. The data processing system can receive the signals corresponding to selections of the virtual control by the user, such as via hand gestures, touch screen selections or eye movements.
[00151] The one or more processors can receive an input to switch from one of a live-view mode of operation or a virtual reality (VR) mode of operation to the AR mode of operation. The one or more processors can switch, responsive to the input, from the one of the live-view mode or the VR mode to the AR mode to overlay the AR object over the live-view of at least a portion of the medical environment within the interaction field. The one or more processors can receive an input to switch from one of a live-view mode of operation or an augmented reality (AR) mode of operation to the VR mode of operation. The one or more processors can switch, responsive to the input, from the one of the live-view mode or the AR mode to the VR mode to display the simulation in the VR mode. The one or more processors can receive, from a head-mounted device (HMD) configured to operate in an augmented reality (AR) mode of operation and a virtual reality (VR) mode of operation, a signal corresponding to one of a hand gesture, an eye movement or a voice input. The one or more processors can select, via the HMD, responsive to the signal, a task to perform in at least one of the AR mode or the VR mode.
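A minimal sketch of the mode switching described above is shown below as a small state machine over the three modes, driven by user input signals. The input vocabulary and class interface are hypothetical.

    MODES = ("live", "AR", "VR")

    class ModeController:
        def __init__(self):
            self.mode = "live"                      # start in the live-view mode of operation

        def handle_input(self, requested_mode):
            # Switch modes only on a valid request that differs from the current mode.
            if requested_mode in MODES and requested_mode != self.mode:
                previous, self.mode = self.mode, requested_mode
                return f"switched {previous} -> {self.mode}"
            return "no change"

    ctrl = ModeController()
    print(ctrl.handle_input("VR"))   # e.g., a hand gesture selecting the VR simulation
    print(ctrl.handle_input("AR"))   # e.g., a voice input to overlay AR objects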
[00152] An aspect of the technical solutions is directed to a system that can be run or executed on a Head-Mounted Device (HMD) that can be configured to operate in an augmented reality (AR) mode or a virtual reality (VR) mode, which can be collectively referred to as an extended reality (XR) mode. The HMD can use information gathered from a surgical robot, including data from various sensors providing measurements from the medical environment of an operating room or from the HMD, to facilitate just-in-time guidance to the user (e.g., surgeon performing a medical operation) as well as transitions between live-view and XR (e.g., AR or VR) modes of operation.
[00153] Setting up robotic surgery can involve a specialized skill and knowledge of the specific surgical robot and the surgical operations to be performed. As a result, it may be challenging for the users in the medical environment (e.g., the operating room) to have all the knowledge or skills to consistently and repeatedly set up the surgical medical system and address every fault or emergency. It may also be challenging and undesirable to build a system for the surgical robot to set up itself autonomously as the surgical medical system may be missing context that a human operator would be aware of during the setup.
[00154] The technical solutions can provide context switching utilizing both VR and AR functionalities to provide guidance and support to the user, such as a surgeon performing a medical operation during the medical session. The technical solutions can include pass-through optical devices and facilitate a live-view operation through one or more views of the operating environment via one or more video cameras. The technical solutions can implement natural context switching, including providing a most suitable procedure or flow of tasks to perform during a medical session, and highlighting or displaying features or objects that are obstructed or difficult to see in a live-view mode of operation, including objects or features (e.g., anatomies or implants) inside of a patient's anatomy.
[00155] The technical solutions can identify, mark and facilitate maintaining of a sterile field. The technical solutions can include a sandbox environment in which the user (e.g., surgeon) can practice a maneuver or a task in a virtual environment prior to it being performed in a medical environment. Based on the performance indicators determined by one or more context-identification ML models, the user can determine to attempt the maneuver or the task on the patient.
[00156] The AR and VR modes of operation can be linked in terms of their content. For instance, the VR simulation can include objects, features and positioning of medical instruments according to their state in the live-view mode or the AR mode of operation. The VR and AR modes can identify sterile regions of an interaction field having objects that can be handled to maintain sterility as well as identify objects which cannot be handled. For instance, an AR mode can be triggered to provide a display with an indication or a warning that a medical instrument is at a risk of contacting an object that is not sterile (e.g., outside of the sterile region). The technical solutions can allow the modes of operation to switch between the live-view mode of operation (e.g., real-time video view absent any AR or VR objects), AR mode of operation and VR mode of operation. For example, a manipulating arm of a robotic medical system (e.g., surgical robot) can be used to manipulate a medical instrument in a medical environment (e.g., the real world). The technical solutions can utilize a data stream corresponding to the manipulating arm movement and simulate the same movement (e.g., same direction and same positioning) of a virtual manipulating arm in a VR mode. Conversely, a movement of a virtual manipulating arm in a VR mode can be implemented in the medical environment.
[00157] For example, the technical solutions can facilitate training of one or more maneuvers or tasks for users using a simulated medical environment in a VR mode or using one or more AR objects in an AR mode of operation. Users, such as surgeons, first assistants to surgeons, second assistants to surgeons, anesthesiologists, nurses and others, can utilize training functions for implementing tasks or maneuvers for various phases or portions of medical procedures in AR or VR modes.
[00158] The technical solutions can include an application executed on an XR-configured HMD receiving and using sensor data gathered from a surgical robot. Sensor data can include data from depth sensors positioned in various locations of the medical environment (e.g., operating room) allowing the HMD to monitor and understand the medical environment and the state of all objects (e.g., medical instruments, data capture devices, visualization tools, displays or the RMSs) in the medical environment. The application can assist the user of the RMS by providing indications, alerts and suggestions regarding various tasks or maneuvers, including via just-in-time guidance to facilitate improved user performance according to best practices.
[00159] For example, sensors can capture data on the location, position, size and shape of the patient's body, and such data can be used to generate AR objects of the patient's anatomy for the purposes of overlaying AR objects in the AR mode during the medical session. If a surgical session corresponds to a procedure on a respiratory portion of the patient's anatomy, the sensor data can focus on the respiratory anatomy to increase the accuracy of the size, shape and location of the AR objects corresponding to the respiratory anatomy of the patient to be presented to the user in the XR mode.
[00160] A user can provide inputs, such as inputs or data indicative of the type of medical session implemented. The ML models of the DPS can identify the workflow or rate of tasks and maneuvers to be performed, allowing the ML models to identify and monitor (e.g., using the sensor data) the type of procedure or task being performed. The application of the HMD can utilize this data to identify one or more objects that can be used to orient the location of the HMD and other objects in the medical environment, such as medical instruments, RMS or patient side carts. Using this data, a simulator can implement a virtual medical environment in a VR mode, creating a digital twin of the medical environment. The simulator can be used to provide user training in various training scenarios for various medical maneuvers or tasks in various simulated medical sessions. Tasks performed or practiced in a VR mode simulator can then be translated into the AR mode to guide the user using AR objects and step-by-step instructions to perform the tasks in the order in which they are simulated in the VR simulator.
[00161] The technical solutions can facilitate prevention of faults, such as accidental collisions. Based on a simulation of the virtual medical environment, and given the depth-sensor-measured or determined locations of the RMS, manipulator arms or medical instruments and a determined range of these components with respect to other objects in the medical environment, the HMD application can determine that there is a danger of a collision. In response to such a determination, the HMD application can provide an indication in the AR mode to warn the user of the potential collision. A VR simulation of a task or a maneuver can be displayed on one or more monitors or displays, such as a 3D display, indicating to the user how to perform a particular task.
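As a simplified illustration of the collision warning described above, the sketch below reduces each object to a center point and bounding radius and flags any pair whose predicted clearance falls below a margin. The real system's collision model is not specified at this level of detail.

    import numpy as np

    def collision_warnings(arm_position, arm_radius, obstacles, margin=0.10):
        # Return (name, clearance) pairs for obstacles closer than the safety margin.
        warnings = []
        for name, (position, radius) in obstacles.items():
            clearance = (np.linalg.norm(np.asarray(arm_position) - np.asarray(position))
                         - arm_radius - radius)
            if clearance < margin:
                warnings.append((name, round(float(clearance), 3)))
        return warnings  # presented to the user as AR indications

    obstacles = {"patient_table": ((0.6, 0.0, 0.8), 0.50),
                 "second_arm": ((0.2, 0.1, 1.0), 0.15)}
    print(collision_warnings((0.3, 0.0, 1.0), 0.12, obstacles))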
[00162] The technical solutions can include an AR mode that can provide digital information overlaid on top of real-world objects, showing the user which buttons to press, or what path to take to implement a task, such as a surgical maneuver, an implementation of a setup of the RMS, or a resolution of a fault. A VR mode can provide an interactable virtual simulation of the robotic medical system in the same position and state as the RMS in the medical environment. This virtual RMS can be placed within the virtual space where the user can use the virtual controls and move around the virtual RMS without reaching above or near the physical patient and without breaking sterility.
[00163] Using sensors in the operating room, the system can monitor and keep track of locations of objects in the room. The location of the sterile field can be marked in the system relative to these key objects. By using the sensors on the HMD, the system can maintain the state of the location of the user relative to at least one of these key objects. Using this information in VR, the limits of the sterile field can be represented as virtual walls, preventing the user from breaking sterility while in the VR mode.
[00164] Using the knowledge of the preferred medical session flow in the medical environment and based on the locations of the objects in the medical environment, as well as the current state of the PSC (e.g., the patient), the system can present an AR overlay guiding the user through the tasks (e.g., RMS setup or surgical tasks).
[00165] For instance, the application at the HMD can facilitate a setup guidance using the AR or VR mode of operation. For example, the AR mode can include a context to activate an AR mode when one or more actions or events take place, such as when a patient side cart (PSC) has not been deployed or when it is in an incorrect position.
[00166] An AR overlay can show the path the PSC can travel to reach its desired destination (e.g., the patient table). This can be shown as a line with arrows on the ground to indicate travel direction. A box representing the final position of the PSC can be shown at the end of the line. The final position can be such that the PSC is the correct distance from the patient table after PSC deployment. This operation can be deactivated when the PSC is at this indicated final position. The operation can be deactivated or a motion can be stopped when the context is changed between different modes (e.g., live view, AR or VR modes), such as when a collision is detected. This path can be generated based on one or more factors, such as the current position and direction of the PSC, the current position and direction of the patient's table, the turning radius of the PSC and the size and location of other obstacles in the room.
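The sketch below illustrates one simplified way such a path and final position could be computed: a straight-line approach in the floor plane, sampled into waypoints, with the final PSC position a fixed docking distance from the patient table. Turning radius and obstacle avoidance, which the text identifies as factors, are deliberately omitted from this assumption-laden example.

    import numpy as np

    def psc_path(psc_xy, table_xy, docking_distance=0.9, n_waypoints=10):
        # Straight-line waypoints from the current PSC position to the docking position.
        psc, table = np.asarray(psc_xy, float), np.asarray(table_xy, float)
        direction = (table - psc) / np.linalg.norm(table - psc)
        goal = table - direction * docking_distance   # final PSC position near the table
        t = np.linspace(0.0, 1.0, n_waypoints)[:, None]
        return psc + t * (goal - psc)                 # waypoints rendered as AR arrows on the floor

    print(psc_path((0.0, 0.0), (3.0, 4.0)).round(2))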
[00167] AR overlay can show the final deployment position of the PSC for the current procedure and the current patient’s position. This can allow the user (e.g., surgeon) to observe the digital PSC from any angle by walking around the PSC. This can assist the user to determine, using the AR mode, if the PSC or another object will collide with another object during the deployment, if the PSC or another object will have enough space for staff to work around the patient and if the PSC position allows for a sufficient amount of space for table motion.
[00168] AR overlay can facilitate the setup guidance. For instance, the HMD application can determine that the PSC has not been docked and that the user is near the patient table. AR overlay can show a recommended first port placement position for the procedure, including, for example, a representation of the cannula in the body. The overlay can conform to the patient's body shape by using SLAM to identify the shape of the patient's body. An overlay of a human anatomy can be overlaid over the patient's body. This anatomy overlay can be shown as transparent, or semi-transparent, around the user's hands to facilitate the user having a clear view of the patient where they are working.
[00169] This will assist the user by showing the user the recommended first port placement position based on best practices. A digital anatomical overlay can help assist with a first port placement without covering the user's working area. This can allow the user to make an informed decision on which cannula to use based on the digital representation of the cannula, before the first port placement. This can allow the user to visualize the placement of other ports based on the first port placement.
[00170] AR overlay can show a recommended first port placement, including a representation of the cannula in the body. The overlay can conform to the patient's body shape by using SLAM to identify the patient's shape. A patient's anatomy can be overlaid over the patient's body. This anatomy overlay can become transparent or semi-transparent around the user's hands to ensure the user has a clear view of the patient where they are working.
[00171] The technical solutions can provide a digital anatomical overlay to help assist with first port placement without covering the user's working area. This can allow the user to make an informed decision on which cannula to use based on the digital representation of the cannula, before the first port placement. This can allow the user to visualize the placement of other ports based on the first port placement.
[00172] The technical solutions can provide guidance for resolving collisions using a VR mode. This mode can be activated when the PSC encounters an interarm collision with installed instruments, and no other system faults are detected. Deactivation can occur once the interarm collision is resolved on the PSC, such that the arms will not self-collide within the current working area. The VR mode can be used by the user to create an environment for experimentation or practice in resolving collision scenarios. For example, a user can assess a potential arm collision by testing the range of motion of instruments.
[00173] During a medical session, when an arm collision occurs, it can be a challenge to identify a solution to prevent further similar incidents. The technical solutions can identify such collisions using events data. The technical solutions can prompt the user to enter a VR mode. In the VR mode, the user can address the collision and test the instrument’s motion to reduce the likelihood of future collisions. Once collisions are resolved in the virtual environment, the system can transition to an AR mode. The AR mode can provide step-by-step instructions on implementing the solution developed in the virtual environment to address the collision.
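The sketch below illustrates the VR-to-AR handoff described above: steps recorded while a collision is resolved in the virtual environment are replayed as step-by-step AR instructions. The data structures and the example instruction strings are hypothetical.

    def resolve_in_vr(collision):
        # Placeholder: in practice the user manipulates the virtual arms to resolve the collision.
        return [f"retract the instrument on arm {collision['arm_a']}",
                f"rotate a joint of arm {collision['arm_b']} away from arm {collision['arm_a']}"]

    def to_ar_instructions(steps):
        return [{"step": i + 1, "overlay_text": text} for i, text in enumerate(steps)]

    steps = resolve_in_vr({"arm_a": 1, "arm_b": 2})
    for item in to_ar_instructions(steps):
        print(item["step"], item["overlay_text"])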
[00174] The herein described subject matter sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are illustrative, and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable,” to each other to achieve the desired functionality. Specific examples of operably couplable include but are not limited to physically mateable or physically interacting components or wirelessly interactable or wirelessly interacting components or logically interacting or logically interactable components.
[00175] With respect to the use of plural or singular terms herein, those having skill in the art can translate from the plural to the singular or from the singular to the plural as is appropriate to the context or application. The various singular/plural permutations can be expressly set forth herein for sake of clarity.
[00176] It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.).
[00177] Although the figures and description can illustrate a specific order of method steps, the order of such steps can differ from what is depicted and described, unless specified differently above. Also, two or more steps can be performed concurrently or with partial concurrence, unless specified differently above. Such variation can depend, for example, on the software and hardware systems chosen and on designer choice. All such variations are within the scope of the disclosure. Likewise, software implementations of the described methods can be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various connection steps, processing steps, comparison steps, and decision steps.
[00178] It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation, no such intent is present. For example, as an aid to understanding, the following appended claims can contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to inventions containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” or “an” should typically be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should typically be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, typically means at least two recitations, or two or more recitations).
[00179] Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general, such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”
[00180] Further, unless otherwise noted, the use of the words “approximate,” “about,” “around,” “substantially,” etc., mean plus or minus ten percent.
[00181] The foregoing description of illustrative implementations has been presented for purposes of illustration and of description. It is not intended to be exhaustive or limiting with respect to the precise form disclosed, and modifications and variations are possible in light of the above teachings or can be acquired from practice of the disclosed implementations. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.

Claims

CLAIMS What is claimed is:
1. A system, comprising:
    one or more processors, coupled with memory, to:
        receive a data stream of a medical session with a robotic medical system, the data stream captured by one or more sensors in a medical environment;
        determine, based on the data stream and via one or more context-identification models trained with machine learning, a context of the medical session;
        determine, based on the context determined, an extended reality content to be presented to a user;
        detect, based on the data stream and via one or more object-identification models, one or more objects located in the medical environment;
        establish, based on the one or more objects detected, an interaction field in the medical environment in which to interact with the extended reality content; and
        present the extended reality content to the user for interaction in the interaction field.
2. The system of claim 1, wherein the interaction field includes the one or more objects presented in the extended reality content.
3. The system of claim 2, wherein the one or more objects include at least one of an arm of the robotic medical system, a link of two or more parts of the robotic medical system, a joint between two or more parts of the robotic medical system, a clutch feature of the robotic medical system, a medical instrument of the robotic medical system or a patient associated with the medical session.
4. The system of claim 1, comprising the one or more processors to: determine, using one or more depth sensors, a location of the user; and overlay, in an augmented reality (AR) mode of operation, an AR object indicative of the user according to the location.
5. The system of claim 1, comprising the one or more processors to: determine, using the one or more context-identification models, a type of medical procedure; generate, based at least on the type of medical procedure, at least one of a size or a location of the interaction field; and present the interaction field according to at least one of the size or the location.
6. The system of claim 1, comprising the one or more processors to: detect, using the one or more context-identification models, that a likelihood of a collision in the medical environment exceeds a threshold; and present an indication in the interaction field, responsive to detecting that a likelihood of a collision in the medical environment exceeds a threshold.
7. The system of claim 1, comprising the one or more processors to: generate, in an augmented reality (AR) mode of a head-mounted display (HMD) and using one or more locations of the one or more objects, a sterile space within a portion of the interaction field, the sterile space marked with one or more AR objects indicating one or more boundaries of the sterile space; and overlay, within the interaction field presented via the HMD, the one or more AR objects indicative of the one or more boundaries of the sterile space.
8. The system of claim 1, comprising the one or more processors to: generate, in a virtual reality (VR) mode, a simulation of a virtual robotic medical system corresponding to the robotic medical system in a virtual medical environment corresponding to the medical environment, wherein a location of the virtual robotic medical system within the virtual medical environment corresponds to a location of the robotic medical system within the medical environment; and display, for selection in the VR mode, a virtual control to move the virtual robotic medical system within the virtual medical environment, the virtual control corresponding to a control to move the robotic medical system via a live-view mode of operation in the medical environment.
9. The system of claim 1, comprising the one or more processors to: generate an augmented reality (AR) object to control a performance of a task of the medical session via the robotic medical system in an AR mode of operation; receive an input to switch from one of a live-view mode of operation or a virtual reality (VR) mode of operation to the AR mode of operation; and switch, responsive to the input, from the one of the live-view mode or the VR mode to the AR mode to overlay the AR object over the live-view of at least a portion of the medical environment within the interaction field.
10. The system of claim 1, comprising the one or more processors to: generate a virtual reality (VR) object to control a performance of a simulation of a task of the medical session in a VR mode of operation using a virtual robotic medical system corresponding to the robotic medical system in a virtual medical environment corresponding to the medical environment; receive an input to switch from one of a live-view mode of operation or an augmented reality (AR) mode of operation to the VR mode of operation; and switch, responsive to the input, from the one of the live-view mode or the AR mode to the VR mode to display the simulation in the VR mode.
11. The system of claim 1, comprising the one or more processors to: receive, from a head-mounted device (HMD) configured to operate in an augmented reality (AR) mode of operation and a virtual reality (VR) mode of operation, a signal corresponding to one of a hand gesture, an eye movement or a voice input; and select, via the HMD, responsive to the signal, a task to perform in at least one of the AR mode or the VR mode.
12. The system of claim 1, comprising the one or more processors to: identify a signal indicative of a movement of a portion of a body of the user in a live-view mode; and update a simulation in a virtual reality (VR) mode of the one or more objects based at least on the movement.
13. The system of claim 1, comprising the one or more processors to: identify a signal indicative of a movement of a portion of a body of the user in a live-view mode; and update a representation of the one or more objects in an augmented reality (AR) mode based at least on the movement.
14. The system of claim 1, comprising the one or more processors to: determine, using one or more depth sensors, a location of a medical instrument based on at least one or more locations of the one or more objects in the medical environment; overlay, in an augmented reality (AR) mode of operation, an AR object indicative of an anatomy of a patient, the AR object positioned according to the at least one or more locations of the one or more objects; and modify, in the interaction field, at least one of the medical instrument or the AR object based at least on a movement of the medical instrument in the interaction field.
15. The system of claim 1, comprising the one or more processors to: identify, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session indicates at least one of a location of a patient side cart or a location of the robotic medical system positioned away from a desired location within the medical environment; and generate the extended reality content indicative of the desired location for the at least one of the patient side cart or the robotic medical system within the interaction field.
16. The system of claim 1, comprising the one or more processors to: identify, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session is indicative of a desired adjustment to one of tilt, rotation or height of a patient side cart in the medical environment; and generate, in the interaction field, the extended reality content comprising an instruction to implement the desired adjustment to the one of tilt, rotation or height of the patient side cart.
17. The system of claim 1, comprising the one or more processors to: identify, based at least on one or more portions of the data stream input into the one or more context-identification models, that the context of the medical session corresponds to a current mode of operation of a first one of a live-view mode of operation, an augmented reality (AR) mode of operation or a virtual reality (VR) mode of operation; identify, based at least on one or more portions of the data stream input into the one or more context-identification models, an event in a second one of the live-view mode, the AR mode or the VR mode; and initiate, responsive to the current mode of operation and the event, a switch from the first one of the live-view mode, the AR mode or the VR mode to the second one of the live-view mode, the AR mode or the VR mode.
18. The system of claim 17, wherein the event includes one of a failure of a medical instrument or a collision involving a medical instrument or a collision involving an anatomy of a patient.
19. The system of claim 1, comprising the one or more processors to: identify, based on at least a portion of the data stream input into the one or more context-identification models, that the context of the medical session is indicative of one of a desired tilt, rotation or height adjustment of a patient side cart within the medical environment; and generate, in the interaction field, the extended reality content indicative of an instruction to adjust the one of the desired tilt, rotation or height adjustment of the patient side cart.
20. The system of claim 1, comprising the one or more processors to: identify, based on at least a portion of the data stream, that an account associated with the medical session corresponds to a profile of a user having a level of skill for the medical session that is below a threshold level of skill for the medical session; and generate, in the interaction field responsive to the identification of the account, the extended reality content including a guideline to perform one or more tasks of the medical session.
21. The system of claim 1, comprising the one or more processors to: identify, based on at least a portion of the data stream input into the one or more context-identification models, that a value of a performance indicator of an account associated with the medical session is below a threshold level for the performance indicator; and generate, in the interaction field responsive to the identification, the extended reality content including an instruction to perform one or more tasks of the medical session.
22. The system of claim 1, comprising the one or more processors to: identify, based on at least a portion of the data stream input into the one or more context-identification models, that a likelihood of a fault in the medical session exceeds a threshold for a likelihood of fault; and activate a clutch for a medical instrument, responsive to the identification that the likelihood of the fault in the medical session exceeds the threshold for the likelihood of fault.
23. The system of claim 1, comprising the one or more processors to: determine, using the extended reality content, that a likelihood of a collision involving at least a portion of the robotic medical system in the medical environment exceeds a threshold; and take action to prevent the collision responsive to the likelihood exceeding the threshold.
24. The system of claim 23, wherein the action includes engaging a clutch of the robotic medical system to stop a movement of the at least the portion of the robotic medical system.
25. A method, comprising:
    receiving, by one or more processors coupled with memory, a data stream of a medical session with a robotic medical system captured by one or more sensors in a medical environment;
    determining, by the one or more processors, based on the data stream and via one or more context-identification models trained with machine learning, a context of the medical session and an extended reality content to be presented to a user;
    detecting, by the one or more processors, based on the data stream and via one or more object-identification models, one or more objects located in the medical environment;
    establishing, by the one or more processors, based on the one or more objects detected, an interaction field in the medical environment in which to interact with the extended reality content; and
    presenting, by the one or more processors, the extended reality content to the user for interaction in the interaction field.
26. The method of claim 25, comprising: generating, by the one or more processors, an augmented reality (AR) object to control a performance of a task of the medical session via the robotic medical system in an AR mode of operation; receiving, by the one or more processors, an input to switch from one of a live-view mode of operation or a virtual reality (VR) mode of operation to the AR mode of operation; and switching, by the one or more processors, responsive to the input, from the one of the live-view mode or the VR mode to the AR mode to overlay the AR object over the live-view of at least a portion of the medical environment within the interaction field.
27. The method of claim 25, comprising: generating, by the one or more processors, a virtual reality (VR) object to control a performance of a simulation of a task of the medical session in a VR mode of operation using a virtual robotic medical system corresponding to the robotic medical system in a virtual medical environment corresponding to the medical environment.
28. The method of claim 25, comprising: receiving, by the one or more processors, an input to switch from one of a live-view mode of operation or an augmented reality (AR) mode of operation to the virtual reality (VR) mode of operation; and switching, by the one or more processors, responsive to the input, from the one of the live-view mode or the AR mode to the VR mode to display the simulation in the VR mode.
29. The method of claim 25, comprising: determining, by the one or more processors, using one or more depth sensors, a location of a medical instrument based at least on one or more locations of the one or more objects in the medical environment.
30. The method of claim 25, comprising: overlaying, by the one or more processors, in an augmented reality (AR) mode of operation, an AR object indicative of an anatomy of a patient, the AR object positioned according to at least one or more locations of the one or more objects; and modifying, by the one or more processors, in the interaction field, at least one of a medical instrument or the AR object based at least on a movement of the medical instrument in the interaction field.
31. A non-transitory computer-readable medium storing processor-executable instructions that, when executed by one or more processors, cause the one or more processors to:
    receive a data stream of a medical session with a robotic medical system, the data stream captured by one or more sensors in a medical environment;
    determine, based on the data stream and via one or more context-identification models trained with machine learning, a context of the medical session;
    determine, based on the context determined, an extended reality content to be presented to a user;
    detect, based on the data stream and via one or more object-identification models, one or more objects located in the medical environment;
    establish, based on the one or more objects detected, an interaction field in the medical environment in which to interact with the extended reality content; and
    present the extended reality content to the user for interaction in the interaction field.
PCT/US2025/020341 2024-03-19 2025-03-18 Extended reality contextual switching for robotic medical systems Pending WO2025199083A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463567186P 2024-03-19 2024-03-19
US63/567,186 2024-03-19

Publications (1)

Publication Number Publication Date
WO2025199083A1 true WO2025199083A1 (en) 2025-09-25

Family

ID=95309972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/020341 Pending WO2025199083A1 (en) 2024-03-19 2025-03-18 Extended reality contextual switching for robotic medical systems

Country Status (1)

Country Link
WO (1) WO2025199083A1 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022070015A1 (en) * 2020-09-30 2022-04-07 Verb Surgical Inc. Augmented reality headset for a surgical robot
WO2023114348A1 (en) * 2021-12-17 2023-06-22 Intuitive Surgical Operations, Inc. Methods and systems for coordinating content presentation for computer-assisted systems

Similar Documents

Publication Publication Date Title
CN109791801B (en) Virtual reality training, simulation and collaboration in robotic surgical systems
US11944401B2 (en) Emulation of robotic arms and control thereof in a virtual reality environment
US11013559B2 (en) Virtual reality laparoscopic tools
US11270601B2 (en) Virtual reality system for simulating a robotic surgical environment
CN115315729A (en) Method and system for facilitating remote presentation or interaction
Zinchenko et al. Virtual reality control of a robotic camera holder for minimally invasive surgery
US20250134610A1 (en) Systems and methods for remote mentoring in a robot assisted medical system
WO2025199083A1 (en) Extended reality contextual switching for robotic medical systems
WO2023117155A1 (en) Systems and methods for clinical workspace simulation
WO2025136980A1 (en) Surgical performance management via triangulation adjustment
WO2025184378A1 (en) Updating a user interface based on force applied by an instrument during teleoperation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25717514

Country of ref document: EP

Kind code of ref document: A1