
HK1175002B - Shared memory between child and parent partitions


Info

Publication number: HK1175002B
Authority: HK (Hong Kong)
Prior art keywords: virtual machine, partition, memory, machine partition, child
Application number: HK13102141.9A
Other languages: Chinese (zh)
Other versions: HK1175002A1 (en)
Inventors: B.S. Post, E. Cox
Original assignee: Microsoft Technology Licensing, LLC (微软技术许可有限责任公司)
Priority claimed from: US 12/894,896 (US8463980B2)
Application filed by Microsoft Technology Licensing, LLC
Publication of HK1175002A1
Publication of HK1175002B

Description

Shared memory between child and parent partitions
Technical Field
The invention relates to shared memory between a child partition and a parent partition.
Background
One increasingly popular form of networking is generally referred to as remote presentation systems, which can use protocols such as Remote Desktop Protocol (RDP) and Independent Computing Architecture (ICA) to share desktops and other applications executing on a server with remote clients. Such computing systems typically communicate keyboard presses and mouse clicks or selections from the client to the server, and relay screen updates back in the other direction over a network connection (e.g., the Internet). Thus, even though only images of the desktop or application as they appear on the server side are actually sent to the client device, the user has the experience of a machine operating entirely locally.
User graphics and video may be rendered at the server for each user. The resulting bitmap is then sent to the client for display and interaction. In some systems, a graphics accelerator (such as a GPU) may also be virtualized. For example, rather than modeling a complete hardware GPU, the GPU may be virtualized and thereby provide an abstract software-only GPU that presents a different software interface than that of the underlying hardware. By providing a virtualized GPU, the virtual machine may enable a rich user experience with, for example, accelerated 3D rendering and multimedia without the need to associate the virtual machine with a particular GPU product.
In some cases, a virtualized device, such as a virtualized GPU, on a child partition may transfer large amounts of data to the parent partition in order to emulate a video-capable graphics card. Transferring such large amounts of data may burden system design and performance due to the limitations of standard virtual machine bus mechanisms.
Disclosure of Invention
In various embodiments, memory may be allocated in a child partition and a mapping may be created for it using a virtualization system API. The mapping may then be transferred to the parent partition, where additional virtualization system APIs may facilitate mapping the memory into the parent partition's user space. Additional synchronization mechanisms and read/write APIs may allow applications on both the child and parent partitions to read and write data in the shared memory. Furthermore, the ability to map any region of memory, whether kernel or user space, from a child partition to a parent partition may also be provided.
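As a rough illustration of this flow, the following C++ sketch is offered. The entire API surface (VmAllocateSharedRegion, VmExportMapping, VmMapToUserSpace) and the MappingHandle layout are invented for the example and do not correspond to any real hypervisor interface:

```cpp
#include <cstddef>
#include <cstdint>

// A handle describing a range of child-partition memory (assumed format).
struct MappingHandle {
    uint64_t gpaBase;  // guest physical address of the shared region
    size_t   length;   // size of the region in bytes
};

// Invented API surface, declared here so the sketch is self-contained.
void*         VmAllocateSharedRegion(size_t bytes);
MappingHandle VmExportMapping(void* region, size_t bytes);
void*         VmMapToUserSpace(uint64_t gpaBase, size_t length);

// Child partition: allocate memory and create a mapping for it.
MappingHandle CreateSharedRegion(size_t bytes) {
    void* region = VmAllocateSharedRegion(bytes);  // allocate/pin guest pages
    return VmExportMapping(region, bytes);         // describe them for the parent
}

// Parent partition: map the exported region into user space for applications.
void* ImportSharedRegion(const MappingHandle& handle) {
    return VmMapToUserSpace(handle.gpaBase, handle.length);
}
```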
Drawings
The systems, methods, and computer-readable media for transferring data between virtual machine partitions according to the present specification are further described with reference to the accompanying drawings in which:
FIGS. 1 and 2 depict example computer systems in which aspects of the present invention may be implemented.
FIG. 3 depicts an operational environment for practicing aspects of the present disclosure.
FIG. 4 depicts an operational environment for practicing aspects of the present disclosure.
FIG. 5 illustrates a computer system including circuitry for implementing remote desktop services.
FIG. 6 illustrates a computer system including circuitry for implementing remote services.
FIG. 7 illustrates an exemplary abstraction layer for a virtualized GPU.
FIG. 8 illustrates an example architecture that incorporates aspects of the methods disclosed herein.
FIG. 9 illustrates an example architecture that incorporates aspects of the methods disclosed herein.
FIG. 10 illustrates an example architecture that incorporates aspects of the methods disclosed herein.
FIG. 11 illustrates an example of an operational procedure for transferring data between virtual machine partitions.
FIG. 12 illustrates an example system for transferring data between virtual machine partitions.
Detailed Description
Generalized computing environment
Certain specific details are set forth in the following description and figures to provide a thorough understanding of various embodiments of the invention. Certain well-known details often associated with computing and software technology are not set forth in the following disclosure to avoid unnecessarily obscuring the various embodiments of the invention. Furthermore, those of ordinary skill in the relevant art will understand that they can practice other embodiments of the invention without one or more of the details described below. Finally, although the various methods are described in the following disclosure with reference to steps and sequences, the description as such is for providing a clear implementation of embodiments of the invention, and the steps and sequences of steps should not be taken as required to practice this invention.
It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs may implement or utilize the processes described in connection with the present invention, e.g., through the use of an API, reusable controls, or the like. Such programs are preferably implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, the programs can also be implemented in assembly or machine language, if desired. In either case, the language may be a compiled or interpreted language, and combined with hardware implementations.
A remote desktop system is a computer system that maintains applications that can be remotely executed by client computer systems. Input is entered at a client computer system and communicated over a network (e.g., using a protocol based on the International Telecommunications Union (ITU) T.120 family of protocols, such as the Remote Desktop Protocol (RDP)) to an application on a terminal server. The application processes the input as if the input were entered at the terminal server. The application generates output in response to the received input, and the output is transmitted over the network to the client.
Embodiments may execute on one or more computers. FIGS. 1 and 2 and the following discussion are intended to provide a brief general description of a suitable computing environment in which the invention may be implemented. Those skilled in the art will appreciate that computer systems 200, 300 may have some or all of the components described with reference to computer 100 of FIGS. 1 and 2.
The term circuitry used throughout the invention may include hardware components such as hardware interrupt controllers, hard drives, network adapters, graphics processors, hardware-based video/audio codecs, and the firmware/software used to operate such hardware. The term circuitry may also include a microprocessor, or one or more logical processors, e.g., one or more cores of a multi-core general processing unit, configured to perform functions in a particular manner, either through firmware or through a set of switches. The logical processor in this example may be configured by software instructions embodying logic operable to perform functions loaded from memory, e.g., RAM, ROM, firmware, and/or virtual memory. In an example embodiment where circuitry includes a combination of hardware and software, an implementer may write source code embodying logic that is subsequently compiled into machine readable code that can be executed by a logical processor. Because those skilled in the art will appreciate that the state of the art has evolved to the point where there is little difference between hardware, software, or a combination of hardware/software, the selection of hardware versus software to implement functionality is merely a design choice. Thus, since one skilled in the art can appreciate that a software process can be transformed into an equivalent hardware structure, and a hardware structure itself can be transformed into an equivalent software process, the choice of a hardware implementation or a software implementation is trivial and left to the implementer.
FIG. 1 depicts an example of a computing system configured in accordance with aspects of the present invention. The computing system may include, among other things, a computer 20, a system memory 22, and a system bus 23 that couples various system components including the system memory to the processing unit 21. The system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes read-only memory (ROM) 24 and random access memory (RAM) 25. A basic input/output system 26 (BIOS), containing the basic routines that help to transfer information between elements within the computer 20, such as during start-up, is stored in ROM 24. The computer 20 may also include a hard disk drive 27 for reading from and writing to a hard disk (not shown), a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD-ROM or other optical media. In some example embodiments, computer-executable instructions that implement aspects of the present invention may be stored in ROM 24, the hard disk (not shown), RAM 25, the removable magnetic disk 29, the optical disk 31, and/or the cache memory of processing unit 21. The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules, and other data for the computer 20. Although the environment described herein employs a hard disk, a removable magnetic disk 29, and a removable optical disk 31, it should be appreciated by those skilled in the art that other types of computer-readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read-only memories (ROMs), and the like, may also be used in the operating environment.
A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24, and/or RAM 25, including an operating system 35, one or more application programs 36, other program modules 37, and program data 38. A user may enter commands and information into the computer 20 through input devices such as a keyboard 40 and a pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, a game port, or a universal serial bus (USB). A display 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the display 47, computers typically include other peripheral output devices (not shown), such as speakers and printers. The system of FIG. 1 also includes a host adapter 55, a Small Computer System Interface (SCSI) bus 56, and an external storage device 62 connected to the SCSI bus 56.
The computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be another computer, a server, a router, a network PC, a peer device or other common network node, or a virtual machine, and typically includes many or all of the elements described above relative to the computer 20, although only a memory storage device 50 has been illustrated in FIG. 1. The logical connections depicted in FIG. 1 can include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.

When used in a LAN networking environment, the computer 20 is connected to the LAN 51 through a network interface or adapter 53. When used in a WAN networking environment, the computer 20 typically includes a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, may be connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are examples and other means of establishing a communications link between the computers may be used. Furthermore, while it is contemplated that many embodiments of the present invention are particularly well-suited for use with computer systems, nothing in this disclosure is intended to limit the invention to such embodiments.
Referring now to FIG. 2, another embodiment of an exemplary computing system 100 is depicted. Computer system 100 may include a logical processor 102, such as an execution core. Although one logical processor 102 is shown, in other embodiments the computer system 100 may have multiple logical processors, e.g., multiple execution cores per processor substrate, and/or multiple processor substrates that may each have multiple execution cores. As shown, the various computer-readable storage media 110 may be interconnected by one or more system buses that couple the various system components to the logical processor 102. The system bus may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. In an example embodiment, computer-readable storage media 110 may include, for example, random access memory (RAM) 104, storage 106 (e.g., an electromechanical hard drive, a solid state hard drive, etc.), firmware 108 (e.g., flash RAM or ROM), and removable storage 118 (e.g., a CD-ROM, a floppy disk, a DVD, a flash drive, an external storage device, etc.). It should be appreciated by those skilled in the art that other types of computer-readable storage media can be used, such as magnetic cassettes, flash memory cards, digital video disks, and Bernoulli cartridges.
Computer-readable storage media provide non-volatile storage of processor-executable instructions 122, data structures, program modules, and other data for the computer 100. A basic input/output system (BIOS) 120, containing the basic routines that help to transfer information between elements within computer system 100, such as during start-up, may be stored in firmware 108. A number of programs, including an operating system and/or application programs, may be stored on firmware 108, storage device 106, RAM 104, and/or removable storage device 118 and executed by logical processor 102.
Commands and information may be received by computer 100 through input devices 116 that may include, but are not limited to, a keyboard and pointing device. Other input devices may include a microphone, joystick, game pad, scanner, or the like. These and other input devices are often connected to the logical processor 102 through a serial port interface that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port, or Universal Serial Bus (USB). A display or other type of display device is also connected to the system bus via an interface, such as a video adapter, which may be part of the graphics processor 112 or may be connected to the graphics processor 112. In addition to the display, computers typically include other peripheral output devices (not shown), such as speakers and printers. The exemplary system of FIG. 1 can also include a host adapter, Small Computer System Interface (SCSI) bus, and an external storage device connected to the SCSI bus.
The computer system 100 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer. The remote computer may be another computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer system 100.
When used in a LAN or WAN networking environment, computer system 100 can be connected to the LAN or WAN through a network interface card (NIC) 114. The NIC 114, which may be internal or external, may be connected to the system bus. In a networked environment, program modules depicted relative to the computer system 100, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections described are exemplary and other means of establishing a communications link between the computers may be used. Further, while it is contemplated that many embodiments of the present invention are particularly well-suited for computerized systems, nothing in this description is meant to limit the invention to those embodiments.
A remote desktop system is a computer system that maintains applications that can be remotely executed by client computer systems. Input is entered at a client computer system and communicated over a network (e.g., using a protocol based on the International Telecommunications Union (ITU) T.120 family of protocols, such as the Remote Desktop Protocol (RDP)) to an application on a terminal server. The application processes the input as if the input were entered at the terminal server. The application generates output in response to the received input, and the output is transmitted over the network to the client computer system. The client computer system renders the output data. Thus, input is received and output is rendered at the client computer system, while processing actually occurs at the terminal server. A session may include a shell and a user interface such as a desktop, a subsystem that tracks mouse movements within the desktop, a subsystem that translates mouse clicks on an icon into commands that effectuate an instance of a program, and so forth. In another example embodiment, the session may include an application. In this example, while an application is rendered, a desktop environment may still be generated and hidden from the user. It should be understood that the foregoing discussion is exemplary and that the presently disclosed subject matter can be implemented in a variety of client/server environments and is not limited to a particular terminal services product.
In most, if not all, remote desktop environments, input data (entered at the client computer system) typically includes mouse and keyboard data representing commands to an application, and output data (generated by the application at the terminal server) typically includes video data for display on a video output device. Many remote desktop environments also include functionality that extends to the transfer of other types of data.
The communication channel may be used to extend the RDP protocol by allowing plug-ins to transfer data over an RDP connection. Many such extensions exist. Features such as printer redirection, clipboard redirection, and port redirection use communication channel technology. Thus, in addition to the input and output data, there may be many communication channels that need to transfer data, and there may accordingly be requests to transmit output data and one or more channel requests to transmit other data contending for available network bandwidth.
Referring now to FIGS. 3 and 4, depicted are high-level block diagrams of computer systems configured to implement virtual machines. As shown, computer system 100 may include the elements described in FIGS. 1 and 2, as well as components that may be used to implement virtual machines. One such component is a hypervisor 202, which may also be referred to in the art as a virtual machine monitor. The hypervisor 202 in the depicted embodiment can be configured to control and arbitrate access to the hardware of computer system 100. Broadly, the hypervisor 202 can generate execution environments called partitions, such as child partition 1 through child partition N (where N is an integer greater than or equal to 1). In various embodiments, a child partition may be considered the basic unit of isolation supported by the hypervisor 202; that is, each child partition may be mapped to a set of hardware resources, e.g., memory, devices, logical processor cycles, etc., that is under the control of the hypervisor 202 and/or the parent partition, and the hypervisor 202 may isolate one partition from accessing the resources of another partition. In various embodiments, the hypervisor 202 can be a stand-alone software product, a part of an operating system, embedded within the firmware of a motherboard, a specialized integrated circuit, or a combination thereof.
In the above example, computer system 100 includes a parent partition 204, which may also be thought of as domain 0 in the open source community. The parent partition 204 may be configured to provide resources to guest operating systems executing in the child partitions 1 through N by using virtualization service providers 228 (VSPs), also referred to as back-end drivers in the open source community. In this example architecture, parent partition 204 may gate access to the underlying hardware. The VSPs 228 can be used to multiplex interfaces to the hardware resources by way of virtualization service clients (VSCs), also referred to as front-end drivers in the open source community. Each child partition may include one or more virtual processors, such as virtual processors 230 through 232, on which guest operating systems 220 through 222 may manage and schedule threads to execute. Generally, the virtual processors 230 through 232 are executable instructions and associated state information that provide a representation of a physical processor with a particular architecture. For example, one virtual machine may have a virtual processor with characteristics of an Intel x86 processor, while another virtual machine's virtual processor may have the characteristics of a PowerPC processor. The virtual processors in this example may be mapped to logical processors of the computer system such that the instructions implementing the virtual processors will be backed by the logical processors. As such, in these example embodiments, multiple virtual processors may be executing simultaneously while, for example, another logical processor is executing hypervisor instructions. In general, and as shown, the combination of virtual processors, various VSCs, and memory in a partition can be considered a virtual machine, such as virtual machine 240 or 242.
In general, guest operating systems 220 through 222 can include any operating system, such as, for example, operating systems from Microsoft, Apple, the open source community, etc. The guest operating systems may include user/kernel modes of operation and may have kernels that can include schedulers, memory managers, and the like. Kernel mode may include an execution mode in a logical processor that grants access to at least privileged processor instructions. Each guest operating system 220 through 222 may have an associated file system on which applications such as terminal servers, e-commerce servers, e-mail servers, etc., as well as the guest operating systems themselves, are stored. Guest operating systems 220 and 222 can schedule threads to execute on virtual processors 230 and 232, and instances of such applications can be effectuated.
Referring now to FIG. 4, an alternative architecture that may be used to implement virtual machines is shown. FIG. 4 depicts components similar to those of FIG. 3, but in this example embodiment hypervisor 202 can include virtualization service provider 228 and device driver 224, and parent partition 204 can contain configuration utility 236. In this architecture, hypervisor 202 may perform the same or similar functions as hypervisor 202 of FIG. 3. Hypervisor 202 of FIG. 4 can be a stand-alone software product, a part of an operating system, embedded within the firmware of a motherboard, or a portion of hypervisor 202 can be implemented by an application-specific integrated circuit. In this example, parent partition 204 may have instructions available to configure hypervisor 202; however, hardware access requests may be handled by hypervisor 202 rather than passed to parent partition 204.
Referring now to FIG. 5, computer 100 may include circuitry configured to provide remote desktop services to connecting clients. In an example embodiment, the depicted operating system 400 may execute directly on the hardware, or a guest operating system 220 or 222 may be implemented by a virtual machine such as VM 216 or VM 218. The underlying hardware 208, 210, 234, 212, and 214 is indicated by dashed lines to identify that the hardware can be virtualized.
The remote service may be provided to at least one client, such as client 401 (although one client is depicted, the remote service may be provided to more clients). Example client 401 may comprise a computer terminal implemented by hardware configured to direct user input to a remote server session and display user interface information generated by the session. In another embodiment, client 401 may be implemented by a computer that includes similar elements as those in computer 100 of FIG. 1. In this embodiment, the client 401 may include circuitry configured to implement an operating system and circuitry configured to emulate the functionality of a terminal (e.g., a remote desktop client application executable by one or more logical processors 102). Those skilled in the art will appreciate that circuitry configured to implement an operating system may also include circuitry configured to emulate a terminal.
Each connecting client may have a session (e.g., session 404) that allows the client to access data and applications stored on computer 100. In general, applications and certain operating system components may be loaded into a region of memory allocated to a session. Thus, in some cases, some OS components may be spawned N times (where N represents the current number of sessions). These various OS components may request services from the operating system kernel 418, which can, for example, manage memory, facilitate disk reads/writes, and configure threads from each session to execute on logical processor 102. Some example subsystems that may be loaded into the session space include a subsystem that generates a desktop environment, a subsystem that tracks mouse movements within the desktop, a subsystem that translates mouse clicks on an icon into commands that effectuate an instance of a program, and so forth. A process that effectuates a service, e.g., tracking mouse movements, is tagged with an identifier associated with the session and loaded into the memory area allocated to the session.
The session may be generated by a session manager 416, such as a process. For example, session manager 416 may initialize and manage each remote session by: generating a session identifier for a session space; allocating memory to the session space; and generating instances of system environment variables and subsystem processes in memory allocated to the session space. Session manager 416 may be invoked when operating system 400 receives a request for a remote desktop session.
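The three initialization steps above might be outlined in pseudocode as follows; every helper name here is invented for illustration, since the real session manager is internal to the operating system:

```cpp
#include <cstddef>
#include <cstdint>

struct Session {
    uint32_t id;            // session identifier
    void*    sessionSpace;  // memory allocated to the session
};

// Invented helpers standing in for internal operating-system services.
uint32_t GenerateSessionId();
void*    AllocateSessionSpace(uint32_t id, size_t bytes);
void     InstantiateEnvironment(uint32_t id);   // system environment variables
void     StartSubsystemProcesses(uint32_t id);  // desktop, input tracking, etc.

Session CreateRemoteSession(size_t sessionSpaceBytes) {
    Session s{};
    s.id = GenerateSessionId();                                      // step 1
    s.sessionSpace = AllocateSessionSpace(s.id, sessionSpaceBytes);  // step 2
    InstantiateEnvironment(s.id);                                    // step 3
    StartSubsystemProcesses(s.id);
    return s;
}
```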
A connection request may first be handled by a transport stack 410, such as a Remote Desktop Protocol (RDP) stack. Transport stack 410 may instruct logical processor 102 to listen for connection messages on a particular port and forward them to session manager 416. As sessions are generated, transport stack 410 may instantiate a remote desktop protocol stack instance for each session. Stack instance 414 is an example stack instance that may be generated for session 404. In general, each remote desktop protocol stack instance may be configured to route output to an associated client and to route client input to environment subsystem 444 for the appropriate remote session.
As shown, in an embodiment, an application 448 (although one is shown, others may also execute) may execute and generate an array of bits. The array may be processed by a graphics interface 446, which in turn may render a bitmap, e.g., an array of pixel values, that can be stored in memory. As shown, a remote display subsystem 420 can be instantiated, which can capture rendering calls and send the calls over the network to client 401 via the stack instance 414 for the session.
In addition to remoting graphics and audio, a plug-and-play redirector 458 may also be instantiated in order to remote different devices such as printers, mp3 players, client file systems, CD-ROM drives, etc. Plug-and-play redirector 458 may receive information from a client-side component that identifies the peripheral devices coupled to client 401. Plug-and-play redirector 458 may then configure operating system 400 to load redirecting device drivers for the peripheral devices of client 401. The redirecting device drivers may receive calls from operating system 400 to access the peripherals and send the calls over the network to client 401.
As discussed above, clients may use a protocol for providing remote presentation services, such as Remote Desktop Protocol (RDP), to connect to resources using terminal services. When a remote desktop client connects to a terminal server via a terminal server gateway, the gateway may open a socket connection with the terminal server and redirect client traffic on the remote presentation port or a port dedicated to remote access services. The gateway may also perform certain gateway-specific exchanges with the client using a terminal server gateway protocol transmitted over HTTPS.
Turning to FIG. 6, depicted is a computer system 100 that includes circuitry for implementing remote services and incorporates aspects of the present invention. As shown, in an embodiment, computer system 100 may include components similar to those depicted in FIGS. 2 and 5 and may effectuate a remote presentation session. In an embodiment of the invention, a remote presentation session may include aspects of a console session, e.g., a session spawned for a user using the computer system, and a remote session. Similar to the above, session manager 416 may initialize and manage the remote presentation session by enabling/disabling components in order to effectuate the remote presentation session.
One set of components that may be loaded in a remote presentation session are the console components that enable high-fidelity remoting, namely the components that take advantage of 3D graphics and 2D graphics rendered by 3D hardware.
The 3D/2D graphics rendered by the 3D hardware may be accessed using a driver model that includes a user mode driver 522, an API 520, a graphics kernel 524, and a kernel mode driver 530. An application 448 (or any other process such as a user interface that generates 3D graphics) may generate and send API constructs, such as those of the DirectX and Direct3D APIs from Microsoft Corporation, and the like. The API 520 in turn may communicate with the user mode driver 522, which may generate primitives, e.g., the fundamental geometric shapes used in computer graphics, represented as vertices and constants that are used as building blocks for other shapes, and store them in buffers such as pages of memory. In one embodiment, the application 448 may declare how it is going to use the buffer, e.g., what type of data it is going to store in the buffer. An application such as a video game may use a dynamic buffer to store primitives for an avatar and a static buffer to store data that will not change often, such as data that represents a building or a forest.
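As a concrete illustration of declaring buffer usage up front, the following Direct3D 9 style sketch creates one dynamic and one static vertex buffer; the split into "avatar" and "scenery" data mirrors the video-game example above and is otherwise hypothetical:

```cpp
#include <d3d9.h>

// Dynamic buffer: vertices rewritten frequently (e.g., an avatar).
IDirect3DVertexBuffer9* CreateDynamicBuffer(IDirect3DDevice9* dev, UINT bytes) {
    IDirect3DVertexBuffer9* vb = nullptr;
    // D3DUSAGE_DYNAMIC declares frequent CPU writes; it requires D3DPOOL_DEFAULT.
    dev->CreateVertexBuffer(bytes, D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY,
                            D3DFVF_XYZ, D3DPOOL_DEFAULT, &vb, nullptr);
    return vb;
}

// Static buffer: vertices that rarely change (e.g., buildings or a forest).
IDirect3DVertexBuffer9* CreateStaticBuffer(IDirect3DDevice9* dev, UINT bytes) {
    IDirect3DVertexBuffer9* vb = nullptr;
    dev->CreateVertexBuffer(bytes, D3DUSAGE_WRITEONLY,
                            D3DFVF_XYZ, D3DPOOL_MANAGED, &vb, nullptr);
    return vb;
}
```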
Continuing with the description of the driver model, the application may fill the buffers with primitives and issue execute commands. When the application issues an execute command, the buffer may be appended to a run list by the kernel mode driver 530 and scheduled by the graphics kernel scheduler 528. Each graphics source, e.g., an application or user interface, may have a context and its own run list. The graphics kernel 524 may be configured to schedule the various contexts to execute on the graphics processing unit 112. The GPU scheduler 528 may be executed by logical processor 102, and the scheduler 528 may issue commands to the kernel mode driver 530 to render the contents of the buffer. Stack instance 414 may be configured to receive the commands and send the contents of the buffer over the network to client 401, where the buffer can be processed by the GPU of the client.
An example of the operation of a virtualized GPU as used in conjunction with an application that calls remote presentation services is now described. Referring to FIG. 6, in an embodiment, a virtual machine session may be generated by computer 100. For example, session manager 416 may be executed by logical processor 102, and a remote session that includes certain remote components may be initialized. In this example, the spawned session may include a kernel 418, a graphics kernel 524, a user mode display driver 522, and a kernel mode display driver 530. The user mode driver 522 may generate primitives that may be stored in memory. For example, the API 520 may include an interface that may be exposed to processes such as a user interface for the operating system 400 or an application 448. A process may send high-level API commands to the API 520, such as PointList, LineList, LineStrip, TriangleList, TriangleStrip, or TriangleFan. The API 520 may receive these commands and convert them into commands for the user mode driver 522, which may then generate vertices and store them in one or more buffers. The GPU scheduler 528 may run and determine to render the contents of the buffers. In this example, the commands to the graphics processing unit 112 of the server may be captured and the contents of the buffers (the primitives) may be sent to client 401 via network interface card 114. In one embodiment, an API may be exposed by the session manager 416 that components can interface with in order to determine whether a virtual GPU is available.
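For reference, the listed commands correspond to the Direct3D 9 primitive types, and a minimal draw call might look like the following sketch (the buffer contents and vertex format are assumed):

```cpp
#include <d3d9.h>

// Draw a triangle strip from a buffer of XYZ vertices previously filled by
// the application via the user mode driver.
void DrawStrip(IDirect3DDevice9* dev, IDirect3DVertexBuffer9* vb,
               UINT vertexCount) {
    dev->SetStreamSource(0, vb, 0, 3 * sizeof(float));  // stride of one XYZ vertex
    dev->SetFVF(D3DFVF_XYZ);
    // D3DPT_TRIANGLESTRIP: N vertices yield N - 2 triangles. The other listed
    // commands map to D3DPT_POINTLIST, D3DPT_LINELIST, D3DPT_LINESTRIP,
    // D3DPT_TRIANGLELIST, and D3DPT_TRIANGLEFAN.
    dev->DrawPrimitive(D3DPT_TRIANGLESTRIP, 0, vertexCount - 2);
}
```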
In an embodiment, a virtual machine such as virtual machine 240 of FIG. 3 or FIG. 4 may be instantiated, and the virtual machine may serve as a platform for execution of operating system 400. In this example, guest operating system 220 can embody operating system 400. The virtual machine may be instantiated when a connection request is received over the network. For example, parent partition 204 may include an instance of transport stack 410 and may be configured to receive connection requests. Parent partition 204 may initialize the virtual machine in response to a connection request, along with a guest operating system including the capability to effectuate remote sessions. The connection request may then be passed to transport stack 410 of guest operating system 220. In this example, each remote session may be instantiated on an operating system that is executed by its own virtual machine.
In one embodiment, a virtual machine may be instantiated and may execute a guest operating system 220 that embodies operating system 400. Similar to the above, the virtual machine may be instantiated when a connection request is received over the network. The remote session may be generated by the operating system. Session manager 416 may be configured to determine that the request is for a session that supports 3D graphics rendering, and session manager 416 may load a console session. In addition to loading the console session, session manager 416 can load a stack instance 414' for the session and configure the system to capture primitives generated by the user mode display driver 522.
User mode driver 522 may generate primitives that may be captured and stored in buffers accessible to transport stack 410. Kernel mode driver 530 may append the buffers to the run list of the application, and GPU scheduler 528 may run and determine when to issue render commands for the buffers. When scheduler 528 issues a render command, the command may be captured by, for example, kernel mode driver 530 and sent to client 401 via stack instance 414'.

GPU scheduler 528 may execute and determine to issue an instruction to render the contents of the buffer. In this example, the primitives associated with the instruction to render may be sent to client 401 via network interface card 114.
In an embodiment, at least one kernel-mode process may be executed by at least one logical processor 102, and at least one logical processor 102 may render vertices stored in different buffers simultaneously. For example, graphics processing scheduler 528, which may operate similarly to an operating system scheduler, may schedule GPU operations. GPU scheduler 528 may merge the separate vertex buffers into the correct execution order so that the graphics processing unit of client 401 executes the commands in an order that allows them to be rendered correctly.
One or more threads of a process, such as a video game, may map multiple buffers and each thread may issue a drawing command. The identification information for the vertices, e.g., information generated for each buffer, each vertex, or each batch of vertices in the buffer, may be sent to GPU scheduler 528. The information may be stored in a table with identification information associated with the vertices from the same or other processes and used to synchronize the renderings of the various buffers.
An application, such as a word processing program, may execute and declare, for example, two buffers — one for storing vertices used to generate a 3D menu and the other for storing commands to generate letters that will populate the menu. The application may map the buffer and issue a draw command. The GPU scheduler 528 may determine the order in which to execute the two buffers so that the menu is rendered in a visually pleasing manner along with the letters. For example, other processes may issue draw commands at the same or substantially similar times, and if the vertices are not synchronized, the vertices from different threads of different processes may be rendered asynchronously on client 401, causing the final image displayed to appear chaotic or mixed.
A batch compressor 450 may be used to compress the primitives before the data stream is sent to client 401. In an embodiment, the batch compressor 450 may be a user mode (not shown) or kernel mode component of stack instance 414 and may be configured to look for similar patterns within the data stream that is being sent to client 401. In this embodiment, because the batch compressor 450 receives streams of vertices from multiple applications rather than multiple API constructs, the batch compressor 450 has a larger set of vertex data to sift through in order to find compression opportunities. That is, because vertices for multiple processes are remoted rather than different API calls, there is a greater chance that the batch compressor 450 can find similar patterns in a given stream.
In one embodiment, the GPU 112 may be configured to use virtual addressing instead of physical addresses for memory. Thus, the pages of memory used as buffers may be paged from video memory to system RAM or to disk. The stack instance 414' may be configured to obtain the virtual addresses of the buffers and send the contents from the virtual addresses when a render command from GPU scheduler 528 is captured.
Operating system 400 may be configured, for example, to load various subsystems and drivers to capture and send primitives to a remote computer, such as client 401. Similar to the above, the session manager 416 may be executed by the logical processor 102 and a session including a particular remote component may be initialized. In this example, the derived session may include the kernel 418, the graphics kernel 524, the user mode display driver 522, and the kernel mode display driver 530.
The graphics kernel may schedule GPU operations. GPU scheduler 528 may merge the separate vertex buffers into the correct execution order so that the graphics processing units of client 401 execute the commands in an order that allows them to be rendered correctly.
All of these schemes for implementing the above-mentioned partitions are merely exemplary implementations, and nothing herein should be construed as limiting the disclosure to any particular virtualization aspect.
GPU in virtualized environment
A graphics processing unit or GPU is a dedicated processor that offloads the burden of 3D graphics rendering from a microprocessor. GPUs can provide efficient processing of mathematical operations commonly used in graphics rendering by implementing various primitive operations. The GPU may provide faster graphics processing than the main CPU. The GPU may also be referred to as a graphics accelerator.
Graphics applications may use application programming interfaces (APIs) to configure the graphics processing pipeline and provide shader programs that perform application-specific vertex and pixel processing on the GPU. Many graphics applications interface with the GPU using APIs such as Microsoft's DirectX or the OpenGL standard.
As described above, virtualization operates by multiplexing physical hardware by presenting each virtual machine with a virtual device and combining its respective operations in a hypervisor or virtual machine monitor, such that the hardware resources are used while maintaining the perception that each virtual machine has completely independent hardware resources. As discussed, a Virtual Machine Monitor (VMM) or hypervisor is a software system that can partition a single physical machine into multiple virtual machines.
The virtual machine may render to a virtual device via a virtual GPU device driver. The actual rendering may be done by using a single GPU or multiple GPU controllers, in another virtual machine (the parent virtual machine) or on a remote machine acting as a graphics server, that are shared by many guest virtual machines to accelerate rendering. An image capture component on the parent virtual machine may retrieve snapshots of the desktop images. The captured images may optionally be compressed and encoded before transmission to the client. The compression and encoding may occur on the parent virtual machine or on the child or guest virtual machine. A remote presentation protocol, such as the Remote Desktop Protocol (RDP), may be used to connect to the virtual machine from a remote client and to transfer the desktop images. In this way, a remote user may experience a graphical user interface such as Windows Aero and execute 3D applications and multimedia over a remote connection.
The virtualization scheme may be based on one or both of two modes. In one embodiment, the user mode driver may provide a higher virtualization boundary in the graphics stack and the kernel mode driver may provide a lower virtualization boundary in the graphics stack. In one embodiment, the virtual GPU subsystem may include a display driver further comprising: user mode and kernel mode components executing on a virtual machine; and a rendering component of a rendering/capturing/compressing process performed on the parent partition. In one embodiment, the display driver may be a Windows Display Driver Model (WDDM) driver.
FIG. 10 illustrates an exemplary embodiment of a virtual machine scenario for implementing a virtual GPU as a component in a VDI scenario. In this example, the VDI can provide 3D graphics capabilities for each child virtual machine 1010 instantiated by a hypervisor 1020 on the server platform. Each child virtual machine 1010 may load a virtual GPU driver 1040. The system may be populated with GPU accelerators 1030 accessible from the parent or root partition 1000. The physical GPUs 1030 on the parent or root partition 1000 (also known as the GVM, or graphics virtual machine) may be shared by the different child virtual machines 1010 to perform graphics rendering operations.
The virtual machine GPU subsystem may virtualize the physical GPU and provide accelerated rendering capabilities for the virtual machines. In one embodiment, the virtual GPU driver may be a WDDM driver 1040. The driver may remote the corresponding commands and data to the parent partition for rendering. A render process, which may be part of the render/capture/compress subsystem 1050, may perform the corresponding rendering on the GPU. For each virtual machine, a respective render/capture/compress component 1050 may be provided on the main or parent partition 1000. The WDDM driver model allows video memory to be virtualized, with video data paged between video memory and system RAM on a per-page basis.
The render/capture/compress subsystem may return compressed or uncompressed screen updates as appropriate, as requested by the graphics source subsystem running on the child virtual machine. These screen updates may be based on the size and content of changed rectangles. The virtual GPU driver may support common operating systems such as Windows Vista and Windows 7.
As discussed, some embodiments may incorporate a WDDM driver. The WDDM driver model operates as if the GPU were a device configured to draw pixels in video memory based on commands stored in a direct memory access (DMA) buffer. The DMA buffer information may be sent to the GPU, which processes the data asynchronously in the order of submission. As each buffer is completed, the runtime is notified and another buffer is submitted. Through this processing loop, video images can be processed and ultimately rendered on the user's screen. Those skilled in the art will recognize that the disclosed subject matter may be implemented in systems using OpenGL or other products.
DMA buffer scheduling may be driven by a GPU scheduler component in kernel mode. The GPU scheduler may determine which DMA buffers to send to the GPU, and in what order.
The user mode driver may be configured to convert graphics commands issued by the 3D runtime API into hardware-specific commands and store the commands in a command buffer. The command buffer is then submitted to the runtime, which in turn calls the kernel mode driver. The kernel mode driver may then construct a DMA buffer based on the contents of the command buffer. When the DMA buffer is to be processed, the GPU scheduler may invoke a kernel mode driver that handles all the details of the current submission of the buffer to the GPU hardware.
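A hedged sketch of that submission pipeline follows; all of the type and function names (CommandBuffer, DmaBuffer, GpuScheduler, SubmitToHardware) are invented for illustration and are not the actual WDDM driver interfaces:

```cpp
#include <cstdint>
#include <queue>
#include <vector>

using CommandBuffer = std::vector<uint8_t>;  // hardware-specific commands (user mode)
using DmaBuffer     = std::vector<uint8_t>;  // GPU-consumable buffer (kernel mode)

// Kernel mode driver: translate a command buffer into a DMA buffer.
DmaBuffer BuildDmaBuffer(const CommandBuffer& cmd) {
    return DmaBuffer(cmd.begin(), cmd.end());  // real drivers validate/patch here
}

// GPU scheduler: decides which DMA buffers reach the hardware, and in what order.
class GpuScheduler {
public:
    void Submit(DmaBuffer buf) { pending_.push(std::move(buf)); }
    void RunOnce() {
        if (pending_.empty()) return;
        SubmitToHardware(pending_.front());  // invented hardware hand-off
        pending_.pop();                      // completion is signaled asynchronously
    }
private:
    void SubmitToHardware(const DmaBuffer&) { /* program the hardware */ }
    std::queue<DmaBuffer> pending_;
};
```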
The kernel mode driver may interface with the physical hardware of the display device. The user mode driver includes hardware specific knowledge and can build hardware specific command buffers. However, user mode drivers do not interface directly with the hardware and may rely on kernel mode drivers for this task. The kernel mode driver may program the display hardware and cause the display hardware to execute commands in the DMA buffer.
In one embodiment, all interactions with the main or parent partition may be handled through the kernel mode driver. The kernel mode driver may send the DMA buffer information to the parent partition and may make the necessary callbacks into the kernel mode API runtime when a DMA buffer has been processed. When the runtime creates a graphics device context, it may call a function to create the graphics device context, which holds the collection of rendering state. In one embodiment, a single kernel-mode connection to the parent partition may be created when the first virtual graphics device is created. Subsequent graphics devices may be created in coordination with user mode devices, and the connections to the parent partition for these devices may be handled by the user mode devices.
In another embodiment, a connection to the main or parent partition may be created each time the kernel mode driver creates a new device. A connection context may be created and stored in a per-device data structure. The connection context may typically include sockets and I/O buffers. Since all communication with the GVM goes through the kernel mode driver, this per-device connection context can help ensure that commands are routed to the correct device on the main or parent partition.
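The per-device connection context described above might be organized along the following lines; the field layout and names are illustrative assumptions only:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// One connection context per graphics device (illustrative field layout).
struct ConnectionContext {
    int socketFd;                      // channel to the main/parent partition
    std::vector<uint8_t> sendBuffer;   // outbound commands and data
    std::vector<uint8_t> recvBuffer;   // inbound replies and completions
};

// Per-device table maintained by the kernel mode driver: every submission
// looks up its device here, so commands reach the right device on the parent.
std::map<uint64_t /*deviceId*/, ConnectionContext> g_deviceConnections;
```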
In one embodiment, a separate thread may be provided on the main partition or parent partition for each running instance of the user mode device. The thread may be created when the application creates a virtual device on the child partition. Additional rendering threads may be provided to process commands originating from kernel mode on the child partition (e.g., kernel mode rendering and mouse pointer activity).
In one embodiment, the number of rendering threads on the parent partition may be kept to a minimum to match the number of CPU cores.
Additional tasks may be performed when managing the GPU. For example, in addition to providing primitives, a hardware context for the GPU may be maintained. Pixel shaders, vertex shaders, clipping planes, clipping rectangles, and other settings that affect the graphics pipeline may be configured. The user mode driver may also determine the logical values of these settings and how to translate these values into physical settings.
In one embodiment, the user mode driver may be responsible for constructing the hardware context and command buffer. The kernel mode driver may be configured to convert the command buffer into a DMA buffer and provide this information to the GPU when scheduled by the GPU scheduler.
Virtual GPUs can be implemented within the context of several user-mode and kernel-mode components. In one embodiment, Virtual Machine Transport (VMT) may be used as a protocol to send and receive requests across all of these components. The VMT may provide communication between modules across two or more partitions. Since there are multiple components in each partition that communicate within the scope of the partition, a common transport may be defined between these components.
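One way such a common transport could frame its messages is sketched below; the field layout is an illustrative guess, not the actual VMT format:

```cpp
#include <cstdint>

// Every VMT message, regardless of the sending component, might begin with a
// common header so a single channel can carry traffic between modules and
// across partitions.
#pragma pack(push, 1)
struct VmtHeader {
    uint32_t magic;         // constant tag marking a VMT message
    uint16_t sourceModule;  // id of the sending component
    uint16_t targetModule;  // id of the receiving component
    uint32_t opcode;        // request type
    uint32_t payloadBytes;  // length of the body following this header
};
#pragma pack(pop)
```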
FIG. 7 depicts the abstraction layers in a traditional driver and in an exemplary embodiment of a virtual GPU driver. Like a conventional GPU 700, the parent partition 600 may be viewed as sitting at the bottom of the driver stack 710. The parent partition 600 represents the graphics hardware, abstracting the interface of a traditional GPU 700 as though the GPU were present in the virtual machine. The virtual GPU driver thus provides access to the parent partition within the constraints of the driver model.
The display driver 740 may receive GPU-specific commands 725, and may be written to be hardware-specific and control the GPU700 through a hardware interface. Display driver 740 may program the I/O ports, access memory-mapped registers, and otherwise interface with low-level operations of the GPU device. Virtual GPU driver 750 may receive parent partition-specific commands 735 and may be written to a particular interface exposed by parent partition 600. In one embodiment, the parent partition may be a Direct3D application running on a different machine, and the parent partition may act as a GPU that natively executes Direct3D commands. In this embodiment, commands received by user mode display driver 730 from Direct3D runtime 705 may be sent to parent partition 600 in an unmodified form.
As shown in FIG. 8, in one embodiment, the Direct3D command on child partition 800 may be encoded in user mode driver 820 and kernel mode driver 830 and sent to parent partition 810 along with data parameters. On the parent partition 810, the component may render the graphics by using a hardware GPU.
In another embodiment shown in FIG. 9, the Direct3D command on child partition 800 may be sent to user mode driver 820 and kernel mode driver 830. These commands may be interpreted/adjusted in kernel mode driver 830 and placed in DMA buffers in kernel mode. The parent partition 810 may provide virtual GPU functionality and the command buffer may be constructed by a user mode driver 820. The command buffer information may be sent to kernel mode driver 830, where they may be converted to DMA buffers and submitted to parent partition 810 for execution. On the parent partition, the component may render the command on the hardware GPU.
When an application requests execution of a graphics processing function, the corresponding commands and video data may be made available to a command interpretation function. For example, a hardware-independent pixel shader program may be converted into a hardware-specific program. The converted commands and video data may be placed on a parent partition work queue. The queue may then be processed and the pending DMA buffers sent to the parent partition for execution. When the parent partition receives the commands and data, it may use the Direct3D API to convert the commands/data into a form specific to the parent partition's graphics hardware.
Thus, a GPU driver may be provided in the child partition that each virtual machine conceptually sees as a real graphics driver, but that in reality routes the virtual machine's commands to the parent partition. On the parent partition, the images may be rendered using the real GPU hardware.
In one illustrative embodiment shown in FIG. 10, a synthetic 3D video device may be exposed to the virtual machine, and the virtual machine may search for a driver that matches the video device. A virtual graphics display driver that matches the device may be provided, which the virtual machine may find and load. Once it is loaded, the virtual machine can determine that it can perform 3D tasks and expose the device capabilities to the operating system, which can then use the functions of the virtualized device.
Commands received by the virtual machine may invoke the virtual device driver interfaces. A translation mechanism may translate the device driver commands into DirectX commands. The virtual machine thus believes it has access to a real GPU and calls the DDI and device driver accordingly. The incoming device driver calls are received and translated, the data is received, and on the parent partition side the DDI commands can be recreated back into DirectX API calls to render what would otherwise have been rendered on the virtual machine. In some instances, the conversion of DDI commands to DirectX API commands may be inefficient. In other embodiments, the DirectX API may be bypassed and the DDI commands may be converted directly into DDI commands on the main partition. In such an embodiment, the DirectX subsystem may be configured to allow this bypass.
In another embodiment, only one connection to the parent partition may be established, and the communications with the graphics device contexts may be multiplexed over the one communication channel. While there is typically a one-to-one mapping of graphics devices between child and parent partitions, in this embodiment the communication channel is not associated with any particular graphics device. A "select device" token may be sent before commands are sent to a particular device. The "select device" token indicates that all subsequent commands are to be routed to the indicated graphics device. When graphics commands are to be sent to a different device, a subsequent "select device" token may be sent.
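The token scheme might be framed on the wire as in the following sketch; the token values and the length-prefixed encoding are assumptions made for the example:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Token : uint8_t { SelectDevice = 0x01, GpuCommand = 0x02 };

static void AppendU32(std::vector<uint8_t>& out, uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>(v >> (8 * i)));  // little-endian
}

// Switch the routing target: everything after this token goes to deviceId
// until the next SelectDevice token appears on the channel.
void AppendSelectDevice(std::vector<uint8_t>& channel, uint32_t deviceId) {
    channel.push_back(static_cast<uint8_t>(Token::SelectDevice));
    AppendU32(channel, deviceId);
}

// Append one command for the currently selected device.
void AppendCommand(std::vector<uint8_t>& channel, const uint8_t* cmd, size_t len) {
    channel.push_back(static_cast<uint8_t>(Token::GpuCommand));
    AppendU32(channel, static_cast<uint32_t>(len));  // length-prefixed payload
    channel.insert(channel.end(), cmd, cmd + len);
}
```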
Alternatively, in another embodiment, only one graphics device may be available on the parent partition. Here, a many-to-one mapping of devices from child partitions onto the device on the parent partition may be implemented. The correct GPU state may be sent before sending commands associated with a particular graphics device; in this scenario, the GPU state is maintained by the child partition rather than the parent partition. This creates the illusion that multiple graphics device contexts exist on a child partition, while in reality all of them are handled by one graphics device context on the parent partition, which receives the correct GPU state before processing the commands associated with a given child partition graphics device context.
Thus, in various embodiments, the GPU may be abstracted, and device driver calls on the virtual machines may be sent to the parent partition, where the commands are translated into the graphics server's API. Before being sent to the parent partition, the device driver calls may be translated into intermediate commands and data, which are then translated into application-level API calls on the parent partition. The intermediate stages may be implementation specific and depend on the particular hardware used.
Using the techniques described above, a stable virtual GPU can be synthesized, and a given virtual machine need not be concerned with the underlying device as long as the hardware at the lower level meets minimum requirements. For example, in one case the parent partition may use an NVIDIA GPU, and in another case the parent partition may use an ATI device. In either case, the same set of virtual capabilities may be exposed as long as the underlying GPU provides a predetermined minimum set of capabilities. An application running on the virtual machine runs as if the WDDM driver had a stable set of characteristics, and the virtual machine may be saved and migrated to another system using a different GPU without affecting the applications that use the GPU services.
In the GPU scenarios described above, 3D graphics may be used. 3D graphics uses wire-frame models of three-dimensional objects that can be displayed as two-dimensional images using various 3D rendering techniques. Such techniques may, for example, represent a 3D object using a set of points in 3D space connected by geometric entities such as triangles. When creating a picture in a video application, the various virtual objects, the viewer's viewpoint, colors, and illumination may be taken into account when generating a still image or animation. Typically, the vertices of the 3D model are colored, and the colors are then interpolated across the surface of the model during rendering. One method for adding color information to a 3D model is to apply a 2D texture image to the surface of the model using texture mapping. Texture can add detail, surface structure, or color to a computer-generated graphic or 3D model. The vertex geometry information (the vertex buffer) may include texture coordinates that indicate how the points of the texture map onto the surface of the 3D model. Textures can be mapped onto surfaces of shapes such as triangles, which are commonly used in 3D modeling. In addition, shaders may perform complex computations that sample any number of textures from arbitrary locations.
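As an illustration of how texture coordinates travel with geometry, the following minimal C sketch shows a vertex layout carrying both a 3D position and (u, v) texture coordinates. The layout is a generic illustration for this discussion, not a specific Direct3D vertex declaration:

```c
/* A vertex carries both its position in model space and the (u, v)
 * texture coordinates that say where it samples the 2D texture image. */
typedef struct {
    float x, y, z;   /* position in 3D model space */
    float u, v;      /* texture coordinates, typically in [0, 1] */
} Vertex;

/* One textured triangle: during rendering, u and v are interpolated
 * across the triangle's surface to look up texels. */
static const Vertex triangle[3] = {
    {  0.0f,  1.0f, 0.0f,   0.5f, 0.0f },
    {  1.0f, -1.0f, 0.0f,   1.0f, 1.0f },
    { -1.0f, -1.0f, 0.0f,   0.0f, 1.0f },
};
```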
3D applications typically require a large amount of texture data to produce good-quality pictures. This texture data in turn requires a large amount of space in memory and on storage media such as hard disks or optical disks.
Memory region shared between child and parent partitions
As described above, texture and other data may require a large amount of video memory when graphics hardware is virtualized. For example, when running a 3D application in a virtual machine, a large amount of graphics data may be pushed to the video driver. Furthermore, the actual rendering occurs on the parent partition, so the graphics data must be transferred from the child partition to the parent partition.
As an additional factor, it is typically not sufficient to transfer graphics data from kernel mode in a child partition to kernel mode on the parent partition, because the child partition's rendering is performed by a user mode application on the parent partition. Thus, a user mode application on the parent partition needs to send commands to kernel mode in the child partition, and kernel mode in the child partition in turn needs to transfer data to that user mode application in the parent partition.
Graphics data is an illustrative example of a scenario in which it is desirable to transfer large amounts of data between virtual machine partitions. The embodiments disclosed herein are applicable to any type of data and are not limited to the transmission of graphics data.
The above need may be addressed by opening a shared memory channel between kernel mode in the guest and user mode on the host. Such a transfer crosses multiple boundaries at which shared memory would otherwise have to be remapped and reallocated in order for a user mode application to access the data that kernel mode in the guest seeks to render.
However, some solutions can be very inefficient. For example, some systems provide a custom kernel-mode service on the host that talks to a kernel-mode service on the guest. The host's custom kernel-mode service must then translate kernel-mode memory into user-mode memory and talk to the user mode application on the host, which can be expensive and cumbersome. For example, some systems first copy memory between kernel mode in the child partition and kernel mode in the parent partition; the data must then be copied again from kernel mode in the parent partition to a user mode region.
In various embodiments, a mechanism is disclosed for creating a memory aperture shared between modes in a parent partition and a child partition. The shared memory aperture may be created between any memory mode in the guest and any memory mode in the host. For example, a shared memory aperture may be created between kernel mode on a child partition and user mode on a parent partition.
By using such a mechanism, a direct bridge may be provided, for example, from kernel mode in a child partition to user mode in a parent partition. Such a bridge may provide the shortest path for data to be passed from kernel or user mode in a child partition to a desired destination mode (kernel or user) in a parent partition, and vice versa.
As will be described in further detail below, in some embodiments, when creating a shared memory aperture, the size of the aperture may be agreed upon. Furthermore, in some embodiments, the aperture may be established by leveraging existing APIs.
A virtual machine bus (VMBus) may be a set of libraries and drivers that provide a mechanism for establishing a channel between a Virtual Service Provider (VSP) and a Virtual Service Consumer (VSC) endpoint. The VMBus allows VSPs to publish synthetic virtual devices to virtual machines running within child partitions. In a child partition, the VMBus provides a bus driver that exposes each synthetic virtual device published into the virtual machine. The guest OS running in the virtual machine then identifies and installs the appropriate device driver stack for the synthetic device. The VMBus library also provides a mechanism for sharing memory between the child and parent partitions and facilitates virtual interrupt delivery across partitions.
The VSP may provide a direct interface to virtualization system (e.g., Hyper-V) components and establish VMBus channels to synthetic virtual devices running in the virtual machines. The VSP is a kernel mode driver that runs on the host OS in the parent partition. The VSP offers a VMBus channel into the virtual machine, and if a synthetic device VSC component accepts the channel offer, the channel is created. The VSP and VSC serve as the endpoints of the VMBus channel and communicate using an upstream ring buffer and a downstream ring buffer. These ring buffers may be used to exchange control packets and small data packets; if a larger amount of data needs to be transferred, a shared memory buffer may be established. A VSP instance running in the parent partition may manage all of the channels established with the VSC components in the various virtual machines running a particular synthetic device.
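Conceptually, each direction of such a channel can be pictured as a single-producer, single-consumer ring buffer. The C sketch below is a simplified illustration; the actual VMBus ring buffer format includes additional control fields, and a real implementation would need memory barriers between the data copy and the index update:

```c
#include <stdint.h>

#define RING_SIZE 4096u   /* power of two, so masking replaces modulo */

/* One direction of the channel; a VSP/VSC pair shares one such ring
 * for upstream traffic and one for downstream traffic. */
typedef struct {
    volatile uint32_t write_index;  /* advanced only by the producer */
    volatile uint32_t read_index;   /* advanced only by the consumer */
    uint8_t data[RING_SIZE];
} ring_buffer;

/* Copy a small control packet into the ring; returns 0 if there is
 * not enough free space. Indices run freely and are masked on use,
 * which is correct for power-of-two sizes with unsigned arithmetic. */
int ring_put(ring_buffer *rb, const void *pkt, uint32_t len)
{
    uint32_t used = rb->write_index - rb->read_index;
    if (len > RING_SIZE - used)
        return 0;
    for (uint32_t i = 0; i < len; i++)
        rb->data[(rb->write_index + i) & (RING_SIZE - 1)] =
            ((const uint8_t *)pkt)[i];
    rb->write_index += len;   /* publish only after the copy completes */
    return 1;
}
```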
The VSC may be part of a synthetic virtual device running in a virtual machine and implements the interface between the synthetic device and the other Hyper-V components. The VSC accepts the VMBus channel offer from its corresponding VSP and uses that channel to exchange configuration and state management packets. The VSC may establish a shared memory channel to the corresponding synthetic device component running in the parent partition.
In one embodiment, the shared memory aperture may be created by a library. The library may have multiple APIs that allow shared memory apertures to be created and deleted, and read and written. The library may be built into any component that wishes to use a shared memory aperture, and may work with corresponding synthetic device drivers on the parent and child partitions, communicating over the VMBus (or any child partition communication system) to establish the connection.
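The API surface of such a library might look like the following sketch. All names and signatures here are hypothetical illustrations of the create/delete/read/write functionality described above, not a published interface:

```c
#include <stddef.h>

typedef struct aperture aperture_t;   /* opaque handle */

/* Create a shared memory aperture of the given size and connect it to
 * the peer partition; returns NULL on failure. The size would be
 * agreed upon with the peer, per the negotiation described above. */
aperture_t *aperture_create(size_t size);

/* Tear down an aperture: unlock, unmap, and release its pages. */
void aperture_delete(aperture_t *ap);

/* Read or write the shared region at a given offset; both the child
 * and the parent side use the same calls. Return 0 on success. */
int aperture_read(aperture_t *ap, size_t off, void *buf, size_t len);
int aperture_write(aperture_t *ap, size_t off, const void *buf, size_t len);
```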
In an embodiment, after the child partition boots, the child partition may initiate a request to the parent partition by allocating memory, creating a mapping, and requesting a connection to the parent partition. The parent partition receives the connection request, validates it, and validates the mapping. After this verification is complete, the parent partition maps the child partition memory for which the connection was requested into its own memory.
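The handshake just described might be sketched as follows from the child's side. Every helper name here (guest_alloc_pages, guest_create_mapping, vmbus_request_connection) is an invented stand-in for the corresponding step, not a real virtualization-system API:

```c
#include <stddef.h>

typedef struct {
    void  *base;   /* guest virtual address of the allocated region */
    size_t size;   /* length in bytes; page list elided for brevity */
} mapping_t;

extern void *guest_alloc_pages(size_t size);
extern int   guest_create_mapping(void *base, size_t size, mapping_t *out);
extern int   vmbus_request_connection(const mapping_t *map);

/* Child side of the handshake: allocate memory, describe its pages,
 * then ask the parent for a connection. The parent validates the
 * request and the mapping, then maps the same physical pages into its
 * own address space. Returns 0 on success. */
int child_establish_aperture(size_t size, mapping_t *out)
{
    void *base = guest_alloc_pages(size);
    if (base == NULL)
        return -1;
    if (!guest_create_mapping(base, size, out))
        return -1;
    return vmbus_request_connection(out) ? 0 : -1;
}
```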
The parent partition and child partitions may read and write memory regions using read and write APIs from the library. Any size memory region (limited by parent and child memories) may be created.
In the context of some of the illustrative virtualization systems described above, an instance of a desktop window manager process may be spawned by a desktop window manager agent when a new virtual machine starts up. The desktop window manager process contains a rendering component that receives D3D commands and data streams from the virtualized GPU components in the respective virtual machine. The desktop window manager process renders the D3D stream using the host OS Direct3D software components and the physical GPU managed by the host OS. A capture component of the desktop window manager determines which portions of each rendered frame have changed and encodes the frame differences. The capture component then passes the encoded data to a graphics source module of the user mode terminal services component.
The user-mode terminal services component runs in the guest OS of the virtual machine in the child partition. It accepts single-user logins from client computers using telepresence connections, creates a user session, and handles redirection of individual devices between the virtual desktop and the client computer. A shared memory channel may be used to receive the encoded frame-difference display data for the remote virtual desktop and applications, which is then sent over the telepresence connection to the terminal services client software on the client computer, where the data is decoded and displayed.
The desktop window manager process renders the D3D commands and data streams received from the device kernel-mode and user-mode driver components.
The D3D command and data streams sent from the virtualized GPU driver to the rendering module in the desktop window manager process, and the encoded frame difference data sent by the capture module to the graphics source in the user-mode terminal services component, require a high performance transmission channel. This need is addressed by a shared memory transfer channel library that provides a shared memory transfer mechanism. The shared memory transfer channel library provides an API that allows shared memory transfer channels to be created, and interacts with the VSC and VSP components to map the allocated buffers so that they are shared between components running in the child and root partitions. Kernel mode and user mode versions of the library may be provided.
An Open function may be used to create a channel between a software component running in the guest OS in the child partition and a desktop window manager process instance running in the host OS. Once the channel has been created, data can be sent in either direction using the Send and Receive functions. The channel may be closed using a Close function.
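Putting those functions together, a caller's view of the channel might look like the sketch below. The Open/Send/Receive/Close names come from the description above, but their exact signatures and the channel name are assumptions made for illustration:

```c
#include <stddef.h>

typedef void *CHANNEL;   /* opaque channel handle returned by Open */

#define RECV_BLOCKING 0

extern CHANNEL Open(const char *name, int receive_mode);
extern int     Send(CHANNEL ch, const void *buf, size_t len);
extern int     Receive(CHANNEL ch, void *buf, size_t len);
extern void    Close(CHANNEL ch);

int example_exchange(void)
{
    char reply[256];

    /* Opening on the guest side allocates the shared buffers. */
    CHANNEL ch = Open("d3d-render-channel", RECV_BLOCKING);
    if (ch == NULL)
        return -1;

    Send(ch, "frame-data", 10);         /* data flows in either direction */
    Receive(ch, reply, sizeof(reply));  /* blocking mode: waits for data  */

    Close(ch);   /* unmaps and releases the shared memory */
    return 0;
}
```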
When the Open function is called, the shared memory transfer channel library determines whether it is running on the guest or the host. If it is running on the guest, the Open function allocates a memory buffer that holds the two circular buffers and the control state in shared memory. The library opens the VSC device object and sends a DeviceIoControl message to the VSC driver requesting that the user-mode or kernel-mode buffer allocated in the child partition be shared with the buffer reserved in the root partition by the shared memory transfer channel library linked into the desktop window manager process.
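The guest-side portion of that path can be sketched in user mode as follows. VirtualAlloc, CreateFileW, and DeviceIoControl are standard Win32 calls, but the VSC device name and the IOCTL code are hypothetical placeholders:

```c
#include <windows.h>
#include <winioctl.h>

/* Hypothetical IOCTL asking the VSC driver to share this buffer. */
#define IOCTL_SHARE_BUFFER \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)

typedef struct {
    void  *base;   /* guest user-mode address of the buffer */
    SIZE_T size;   /* length in bytes */
} SHARE_REQUEST;

void *open_guest_channel(HANDLE *vsc_out, SIZE_T size)
{
    /* Buffer that will hold the two circular buffers and control state. */
    void *buf = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE,
                             PAGE_READWRITE);
    if (buf == NULL)
        return NULL;

    /* Device name is a hypothetical placeholder for the VSC object. */
    HANDLE vsc = CreateFileW(L"\\\\.\\SynthGpuVsc",
                             GENERIC_READ | GENERIC_WRITE, 0, NULL,
                             OPEN_EXISTING, 0, NULL);
    if (vsc == INVALID_HANDLE_VALUE) {
        VirtualFree(buf, 0, MEM_RELEASE);
        return NULL;
    }

    /* Ask the VSC driver to lock these pages and share them with the
     * corresponding component in the root partition. */
    SHARE_REQUEST req = { buf, size };
    DWORD bytes;
    if (!DeviceIoControl(vsc, IOCTL_SHARE_BUFFER, &req, sizeof(req),
                         NULL, 0, &bytes, NULL)) {
        CloseHandle(vsc);
        VirtualFree(buf, 0, MEM_RELEASE);
        return NULL;
    }

    *vsc_out = vsc;
    return buf;
}
```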
The VSC driver obtains a Memory Descriptor List (MDL) for the buffer and locks its pages into the guest OS memory space. The VSC driver translates the MDL into a Guest Physical Address Descriptor List (GPADL) that describes the actual system physical pages the hypervisor has allocated to the memory buffer. The VSC then sends a packet to the VSP over the VMBus channel requesting that the VSP complete the shared memory mapping operation.
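The MDL step might look like the following kernel-mode sketch, which uses documented WDK routines to describe and pin the buffer's pages; the subsequent GPADL construction is Hyper-V specific and is only indicated by a comment:

```c
#include <ntddk.h>

/* Runs in the context of the requesting process, since 'buffer' is a
 * user-mode address in the child partition's guest OS. */
NTSTATUS lock_shared_buffer(PVOID buffer, ULONG length, PMDL *mdl_out)
{
    PMDL mdl = IoAllocateMdl(buffer, length, FALSE, FALSE, NULL);
    if (mdl == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    __try {
        /* Pin the pages so they cannot be paged out while the parent
         * partition holds a mapping onto them. */
        MmProbeAndLockPages(mdl, UserMode, IoWriteAccess);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        IoFreeMdl(mdl);
        return GetExceptionCode();
    }

    /* The physical page numbers recorded in the MDL would next be
     * translated into a GPADL and sent to the VSP over the VMBus
     * channel (Hyper-V specific; not shown). */
    *mdl_out = mdl;
    return STATUS_SUCCESS;
}
```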
The VSP driver converts the GPADL it receives from the VSC driver into a host OS MDL and locks the pages in system memory. The rendering and capture modules in the desktop window manager process make an Open function call into the shared memory transfer channel library linked into that process. The library determines that it is running on the host, reserves a buffer of the same size as the buffer allocated on the guest, and sends a DeviceIoControl to wait for a channel open request from the guest. The VSP shares the memory by taking the virtual addresses of the memory buffers that the shared memory transfer channel library has reserved using the VirtualAlloc function and mapping those reserved pages onto the system pages that have been allocated and locked for the guest partition. The VSP uses a unique identifier to match the channel open request from the guest to the appropriate listening module in the desktop window manager process; the virtual machine partition ID is used to determine the correspondence between the guest virtual machine and the appropriate desktop window manager instance.
Send and Receive calls transfer data through the shared memory buffers, with these details transparent to the caller. The Receive function may run in a blocking or non-blocking mode, based on the receive mode parameter of the Open function. Alternatively, an EventSelect function may be used to register events that will be signaled when data is received, so that the caller can wait on a single event or on multiple events and be woken when one of them occurs.
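The EventSelect pattern allows one thread to service several channels. In the sketch below, the EventSelect and Receive signatures are assumptions; the waiting itself uses standard Win32 event APIs:

```c
#include <windows.h>

typedef void *CHANNEL;
extern int Receive(CHANNEL ch, void *buf, SIZE_T len);
extern int EventSelect(CHANNEL ch, HANDLE event, SIZE_T threshold);

void pump_two_channels(CHANNEL a, CHANNEL b)
{
    /* Auto-reset events, signaled by the library when data arrives. */
    HANDLE ev[2] = { CreateEventW(NULL, FALSE, FALSE, NULL),
                     CreateEventW(NULL, FALSE, FALSE, NULL) };
    char buf[4096];

    EventSelect(a, ev[0], 1);   /* signal on any amount of data */
    EventSelect(b, ev[1], 1);

    for (;;) {
        /* Sleep until either channel has data, then drain it. */
        DWORD which = WaitForMultipleObjects(2, ev, FALSE, INFINITE);
        if (which == WAIT_OBJECT_0)
            Receive(a, buf, sizeof(buf));
        else if (which == WAIT_OBJECT_0 + 1)
            Receive(b, buf, sizeof(buf));
        else
            break;   /* wait failed; stop pumping */
    }
}
```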
When a Close operation is invoked, the VSC and VSP are notified and the shared memory is unlocked, unmapped, and released. A channel-closed return status is returned from any subsequent calls made on the opposite endpoint; that endpoint then invokes Close to complete the close operation.
The circular buffers and control memory used by the shared memory transfer channel library are allocated by the library itself. Library users have no access to this internal memory except indirectly through the Send and Receive functions. These functions take client buffer pointer and byte count parameters: the library copies data from the transmit buffer into its internal shared memory buffer and then copies the data into the destination receive buffer. If the transmit buffer is larger than the internal buffer, multiple transfers may be performed to transmit all of the data. The sender must guarantee that the bytes specified by the transmit buffer pointer and byte count are present in its transmit buffer. The receiver will receive the amount of data specified by the byte count in each call to Receive, and the receiving process is responsible for ensuring that its buffer pointer and byte count do not cause the receive buffer to overflow.
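The chunking behavior described above might be implemented as in the following sketch, where internal_put is a hypothetical stand-in for the copy into the internal shared circular buffer:

```c
#include <stddef.h>

#define INTERNAL_BUF_SIZE 65536u

/* Stand-in for the copy into the internal shared circular buffer;
 * returns the number of bytes it could accept. */
extern size_t internal_put(const void *chunk, size_t len);

/* Copy 'count' bytes from the caller's transmit buffer through the
 * internal buffer, in multiple pieces when the transmit buffer is
 * larger than the internal one. A real implementation would block on
 * a consumer event when the ring is full rather than spin. */
void send_copy(const char *tx_buf, size_t count)
{
    size_t sent = 0;
    while (sent < count) {
        size_t chunk = count - sent;
        if (chunk > INTERNAL_BUF_SIZE)
            chunk = INTERNAL_BUF_SIZE;
        sent += internal_put(tx_buf + sent, chunk);
    }
}
```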
The two channel endpoints control the format of the data they transmit over the channel; the shared memory transfer channel library does not examine the contents of the transmitted data. The two endpoints are responsible for exchanging version packets when the channel is opened and for verifying that they can suitably interoperate with each other. If the check fails, an endpoint may call the Close function with a status indicating an incompatible version, and the desktop window manager process may report a failure status. The VSP notifies the system of the failure, and virtual machine setup will fail.
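A version handshake of this kind might look like the sketch below. The packet layout and version constants are invented for illustration, and the Close-with-status behavior from the text is simplified here:

```c
#include <stddef.h>
#include <stdint.h>

typedef void *CHANNEL;
extern int  Send(CHANNEL ch, const void *buf, size_t len);
extern int  Receive(CHANNEL ch, void *buf, size_t len);
extern void Close(CHANNEL ch);   /* the text's status argument is elided */

/* Invented version packet: each endpoint sends one when the channel
 * opens and checks the peer's before any other traffic. */
typedef struct { uint16_t major, minor; } version_pkt;

#define MY_MAJOR 1
#define MY_MINOR 0

int negotiate_version(CHANNEL ch)
{
    version_pkt mine = { MY_MAJOR, MY_MINOR }, theirs;
    Send(ch, &mine, sizeof(mine));
    Receive(ch, &theirs, sizeof(theirs));
    if (theirs.major != MY_MAJOR) {
        Close(ch);   /* incompatible version: tear the channel down */
        return 0;
    }
    return 1;        /* endpoints can interoperate */
}
```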
Thus, a shared-memory-based transport channel API surface is provided that allows various components to transfer data between components running in the child virtual machine partition and corresponding instances of the desktop window manager process running in the root partition. The shared memory transfer channel library provides a higher performance transport solution that can replace current socket transports.
The shared memory transfer channel can also be used by other VDI components to enhance inter-component data transfer performance. It provides bidirectional data transfer capability using memory pages shared between virtual machines. The internal channel design provides upstream and downstream circular buffering between channel endpoints and manages the streaming of data between sending and receiving endpoints. The channel endpoints originate in the child virtual machine partition and terminate at the corresponding desktop window manager process running in the root partition. The child virtual machine endpoints may originate from the kernel mode driver, the user mode driver, and the graphics source module in the user mode terminal services component.
User mode components may use the shared memory transfer channel library to create channels and transmit data through instances of the shared memory transfer channel. Exemplary functions of the shared memory transfer channel library are described below.
These components may open a channel using the Open function. The channel originates in the guest virtual machine so that the shared memory buffer can be allocated in the guest virtual machine. If the component invoking Open is running in the guest virtual machine, the shared memory transfer channel library allocates a memory buffer and sends a DeviceIoControl to the VSC driver to establish the shared memory buffer. If the Open function is called by the desktop window manager process corresponding to a virtual machine running in the child partition, it listens for a channel to be opened by a component in the guest virtual machine: the desktop window manager process reserves the buffer and sends a DeviceIoControl to the VSP to listen for channel creation. When the guest virtual machine component opens the channel, the VSC sends a message to the VSP over the VMBus channel; the VSP maps the guest virtual machine buffer and the host buffer onto the same system physical pages and responds to the desktop window manager process's listening DeviceIoControl for the channel. The Open function returns a handle that is used to specify the particular channel instance when calling the other shared memory transfer channel library functions. The Close function closes the channel, which causes all shared memory to be unmapped and released.
A Send function may be used to send a specified number of bytes to the opposite endpoint, and a Receive function may be used to receive the data. The Receive function may run in either blocking or non-blocking mode, as specified by the receive mode parameter of the Open function. The EventSelect function may be used to specify an event to be signaled when a specified amount, or any amount, of data has been received; this allows the channel owner to wait for a receive event signaling the presence of data. A Flush function may be used to block until all data that has been sent has been received by the opposite endpoint.
The shared memory transfer channel may be used to create multiple persistent channels between software components running in the child partition and the corresponding desktop window manager process running in the root partition. Components running in the child partition open their channel instances when they start. The desktop window manager process has a thread that waits to establish a new channel connection whenever a child component starts. Once created, a channel remains persistent until the component closes it or the virtual machine is shut down.
The shared memory transfer channel library monitors the channel and detects whether the channel has gone out of service or either party has closed it. If the guest virtual machine component closes the channel, the shared memory transfer channel library notifies the opposite endpoint and sends a close-channel DeviceIoControl to the VSC driver to unmap and release the memory. When notified that the channel is closed, the desktop window manager process clears its state and invokes the Close function, which sends a close-channel DeviceIoControl to the VSP driver. The VSP coordinates with the VSC in the guest virtual machine to unmap and release the memory. A close operation initiated by the guest virtual machine component is considered a normal channel close.
If the desktop window manager process initiates the channel closure, this may be treated as a channel-lost condition. This can occur when the desktop window manager process is terminated for any reason. In this case, the channel-lost status is communicated to the guest virtual machine components that had opened channels to the terminated desktop window manager process instance. Those guest virtual machine components are now in an inconsistent state and must reset themselves and reestablish their channel connections once the desktop window manager process is restarted by the proxy process.
A similar situation can arise when a virtual machine resumes execution after being saved. The desktop window manager process may not save all of the current D3D state during the save-virtual-machine operation. When the virtual machine is restored, the desktop window manager process is restarted and listens for channels to be created from the guest virtual machine. Because the guest virtual machine is saved and restored in its entirety, the components must reset themselves and reestablish their channels to the desktop window manager process. The shared memory transfer channel library calls the error handler specified in the Open function or returns the CT_VMT_CHANNEL_LOST return status. When a component receives a channel-lost notification, it must reset itself and reestablish the channel.
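The reset-and-reconnect behavior might be structured as in the following sketch. The CT_VMT_CHANNEL_LOST status is named in the text; the Open/Receive signatures, the status value, the retry delay, and reset_component_state are assumptions:

```c
#include <stddef.h>
#include <windows.h>   /* Sleep */

#define CT_VMT_CHANNEL_LOST (-2)   /* named in the text; value assumed */

typedef void *CHANNEL;
extern CHANNEL Open(const char *name, int receive_mode);
extern int     Receive(CHANNEL ch, void *buf, size_t len);
extern void    reset_component_state(void);

void receive_loop(const char *name)
{
    CHANNEL ch = Open(name, 0);
    char buf[4096];

    for (;;) {
        int n = Receive(ch, buf, sizeof(buf));
        if (n == CT_VMT_CHANNEL_LOST) {
            /* Peer desktop window manager process terminated, or the
             * VM was saved and restored: drop all channel state and
             * reopen once the process has been restarted. */
            reset_component_state();
            do {
                Sleep(100);
                ch = Open(name, 0);
            } while (ch == NULL);
            continue;
        }
        /* ... process n bytes of received data ... */
    }
}
```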
FIG. 11 depicts an exemplary operational procedure for transferring data between virtual machine partitions, including operations 1100, 1102, 1104, 1106, and 1108. Referring to FIG. 11, operation 1100 begins the operational procedure. Operation 1102 illustrates sending, by a first virtual machine partition, a request to a second virtual machine partition for a memory allocation, where the allocation indicates a maximum buffer size for the transferred data. Operation 1104 illustrates mapping the requested memory allocation. Operation 1106 illustrates allocating the requested memory in a shared memory aperture between modes in the first virtual machine partition and the second virtual machine partition. Operation 1108 illustrates transferring data between the first virtual machine partition and the second virtual machine partition using the shared memory aperture.
FIG. 12 depicts an exemplary system for transferring data between virtual machine partitions as described above. Referring to FIG. 12, system 1200 includes a processor 1210 and a memory 1220. Memory 1220 further includes computer instructions configured to transfer data between virtual machine partitions. Block 1222 illustrates sending, by a first virtual machine partition, a request to a second virtual machine partition for a memory allocation, where the allocation indicates a maximum buffer size for the transferred data. Block 1224 illustrates mapping the requested memory allocation. Block 1226 illustrates allocating the requested memory in a shared memory aperture between modes in the first virtual machine partition and the second virtual machine partition. Block 1228 illustrates transferring data between the first virtual machine partition and the second virtual machine partition using the shared memory aperture.
Any of the above-mentioned aspects may be implemented as a method, system, computer-readable medium, or any type of article of manufacture. For example, a computer-readable medium may store thereon computer-executable instructions for transferring data between virtual machine partitions. Such media may include: a first subset of instructions for sending, by a child virtual machine partition, a request to a parent virtual machine partition for a memory allocation, wherein the allocation indicates a maximum buffer size for transmitted data; a second subset of instructions for mapping the requested memory allocation; a third subset of instructions for allocating the requested memory in a shared memory aperture between modes in the child virtual machine partition and the parent virtual machine partition; and a fourth subset of instructions for transferring data between the child virtual machine partition and the parent virtual machine partition using the shared memory aperture. Those skilled in the art will appreciate that additional instruction sets may be used to capture various other aspects disclosed herein, and that the four presently disclosed subsets of instructions may differ in detail in accordance with the present invention.
The foregoing detailed description has set forth various embodiments of the systems and/or processes via examples and/or operational diagrams. To the extent that such block diagrams and/or examples contain one or more functions and/or operations, those skilled in the art will appreciate that each function and/or operation in such block diagrams or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof.
It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs may implement or utilize the processes described in connection with the present invention, e.g., through the use of an API, reusable controls, or the like. Such programs are preferably implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and combined with hardware implementations.
While the invention has been particularly shown and described with reference to a preferred embodiment thereof, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the scope of the invention as set forth in the following claims. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

Claims (10)

1. A method for transferring data between virtual machine partitions, the method comprising:
sending (1102), by a first virtual machine partition, a request for a memory allocation to a second virtual machine partition, wherein the allocation indicates a maximum buffer size for transmitted data;
mapping (1104) the requested memory allocation;
allocating (1106) requested memory in a shared memory aperture between modes in a first virtual machine partition and a second virtual machine partition, the shared memory aperture providing a direct bridge from one mode in the first virtual machine partition to another mode in the second virtual machine partition; and
data is transferred (1108) between the first virtual machine partition and the second virtual machine partition using the shared memory aperture.
2. The method of claim 1, wherein the first virtual machine partition is a child partition (800) and the second virtual machine partition is a parent partition (810).
3. The method of claim 2, wherein the mode comprises either a kernel mode (830) or a user mode (820).
4. The method of claim 1, wherein the maximum buffer size is agreed upon between a first virtual machine partition and a second virtual machine partition.
5. The method of claim 3, further comprising: opening a virtual service consumer device object and sending a control message to a virtual service consumer driver requesting that the user-mode or kernel-mode buffer allocated in the child partition be shared with a parent partition buffer.
6. The method of claim 5, wherein the virtual service consumer driver creates a list of memory descriptors for buffers allocated in the child partition guest OS memory space.
7. The method of claim 6, wherein the virtual service consumer driver translates the memory descriptor list into a guest physical address descriptor list describing system pages that have been allocated to the memory buffer by the hypervisor.
8. The method of claim 1, further comprising: closing the shared memory aperture by unlocking, unmapping, and releasing the shared memory aperture.
9. A system (100) for transferring data between virtual machine partitions, comprising:
a computing device comprising at least one processor;
a memory communicatively coupled to the processor when the system is operational, the memory having stored therein computer instructions that, when executed by the at least one processor, cause:
sending (1222), by a first virtual machine partition, a request for a memory allocation to a second virtual machine partition, wherein the allocation indicates a maximum buffer size for transmitted data;
mapping (1224) the requested memory allocation;
allocating (1226) the requested memory in a shared memory aperture between the modes in the first virtual machine partition and the second virtual machine partition, the shared memory aperture providing a direct bridge from one mode in the first virtual machine partition to another mode in the second virtual machine partition; and
data is transferred (1228) between the first virtual machine partition and the second virtual machine partition using the shared memory aperture.
10. A system for transferring data between virtual machine partitions, comprising:
means for sending, by the child virtual machine partition, a request to the parent virtual machine partition for a memory allocation, wherein the allocation indicates a maximum buffer size for the transferred data;
means for mapping the requested memory allocation;
means for allocating requested memory in a shared memory aperture between modes in a child virtual machine partition and a parent virtual machine partition, the shared memory aperture providing a direct bridge from one mode in the child virtual machine partition to another mode in the parent virtual machine partition; and
means for transferring data between the child virtual machine partition and the parent virtual machine partition using the shared memory aperture.
HK13102141.9A 2010-09-30 2013-02-20 Shared memory between child and parent partitions HK1175002B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/894,896 2010-09-30
US12/894,896 US8463980B2 (en) 2010-09-30 2010-09-30 Shared memory between child and parent partitions

Publications (2)

Publication Number Publication Date
HK1175002A1 HK1175002A1 (en) 2013-06-21
HK1175002B true HK1175002B (en) 2016-08-19
