CN117609115A - A method and system for lightweight virtualization based on RDMA technology - Google Patents

Info

Publication number
CN117609115A
CN117609115A CN202311617099.5A CN202311617099A CN117609115A CN 117609115 A CN117609115 A CN 117609115A CN 202311617099 A CN202311617099 A CN 202311617099A CN 117609115 A CN117609115 A CN 117609115A
Authority
CN
China
Prior art keywords
rdma
virtualization
configuration
virtual machine
container
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311617099.5A
Other languages
Chinese (zh)
Other versions
CN117609115B (en
Inventor
于震江
郭兴
黄明亮
鄢贵海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yusur Technology Co ltd
Original Assignee
Yusur Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yusur Technology Co ltd
Priority to CN202311617099.5A
Publication of CN117609115A
Application granted
Publication of CN117609115B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/10 Program control for peripheral devices
    • G06F13/105 Program control for peripheral devices where the programme performs an input/output emulation function
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4063 Device-to-bus coupling
    • G06F13/4068 Electrical coupling
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45583 Memory management, e.g. access or allocation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Stored Programmes (AREA)

Abstract

The present invention provides a method and system for implementing lightweight virtualization based on RDMA technology. The method includes: establishing a connection between an RDMA device based on RDMA technology and the host side; allocating one RDMA device interface to each container or virtual machine created on the host side; receiving, at the RDMA device interface, a virtualization configuration from the host-side driver, wherein the virtualization configuration includes a queue priority configuration; managing, by each RDMA device interface, the queues in the RDMA device based on the received virtualization configuration; and mapping, by the container or virtual machine, to the RDMA device by means of the MMIO registers of the RDMA device interface, so that the container or virtual machine can access the RDMA device through the RDMA device interface, thereby implementing virtualization based on RDMA technology. The invention reduces system complexity, saves memory space and configuration space, and lowers cost.

Description

A method and system for lightweight virtualization based on RDMA technology

Technical field

The present invention relates to the technical field of I/O virtualization for RDMA devices, and in particular to a method and system for implementing lightweight virtualization based on RDMA technology.

Background art

In traditional TCP/IP processing, each data packet passes through the operating system and other software layers, which consumes a large amount of server resources and memory bus bandwidth. Data is copied back and forth between system memory, the processor cache and the network controller cache, placing a heavy burden on the server's CPU and memory. In particular, the serious mismatch among network bandwidth, processor speed and memory bandwidth further aggravates network latency.

RDMA technology, by contrast, uses stack bypass and zero-copy techniques to achieve low latency, which reduces CPU usage, relieves memory bandwidth bottlenecks and improves bandwidth utilization. Its advantages are very low latency, high throughput, high efficiency and very low CPU usage. RDMA provides an I/O-based channel that allows an application to read and write remote virtual memory directly through an RDMA device.

Hardware-based RDMA I/O virtualization currently relies mainly on the PCIe SR-IOV standard. SR-IOV allows a physical function (PF) to create multiple virtual functions (VFs), which can be passed through to virtual machines (VMs) so that multiple VMs share the same physical network card. Under the SR-IOV standard, the newly created devices allow a VM to connect directly to the I/O device, bypassing the hypervisor and the virtual switch layer, which yields low latency and near line-rate speed. A single I/O resource can be shared by multiple VMs. The shared device provides dedicated resources and also uses shared common resources, so that each VM has access to its own unique resources. A PCIe device with SR-IOV enabled and with appropriate hardware and operating system support can therefore appear as multiple separate physical devices, each with its own PCIe configuration space. Compared with emulated I/O virtualization, SR-IOV-based virtualization reduces I/O latency and CPU utilization, and data does not need to be forwarded through the host, so SR-IOV improves I/O performance and data security.

However, RDMA I/O virtualization based on the SR-IOV standard has some drawbacks. SR-IOV virtualizes VF (Virtual Function) devices in hardware; each VF occupies its own configuration space and memory-mapped I/O (MMIO) space, and each virtual device has its own interrupt vector configuration. As a result the system consumes more memory space and configuration space, system complexity rises, and implementation cost is high.

Summary of the invention

In view of this, embodiments of the present invention provide a method and system for implementing lightweight virtualization based on RDMA technology, so as to eliminate or mitigate one or more defects of the prior art.

One aspect of the present invention provides a method for implementing lightweight virtualization based on RDMA technology, comprising the following steps:

an RDMA device based on RDMA technology establishes a connection with the host side (for example, a server connected over PCIe);

for each container or virtual machine created on the host side, one RDMA device interface is allocated, wherein the host-side operating system creates multiple containers or virtual machines based on its virtualization function, and each RDMA device interface contains MMIO registers organized in units of physical pages;

the RDMA device interface receives a virtualization configuration from the host-side driver, wherein the virtualization configuration includes a queue priority configuration;

each RDMA device interface manages the queues in the RDMA device based on the received virtualization configuration;

the container or virtual machine maps to the RDMA device by means of the MMIO registers, so that the container or virtual machine can access the RDMA device through the RDMA device interface, thereby implementing virtualization based on RDMA technology.
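The five steps above can be pictured with a small in-memory model. This is only an illustrative sketch: the class and method names (`RdmaDevice`, `DeviceInterface`, `receive_config`) and the shape of the queue priority configuration (a mapping from queue name to priority, higher first) are assumptions made for the example, not anything defined by the patent.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceInterface:
    """One RDMA device interface, dedicated to a single container or VM."""
    index: int
    config: dict = field(default_factory=dict)
    queues: list = field(default_factory=list)


class RdmaDevice:
    """Toy stand-in for the RDMA device; not a real driver API."""

    def __init__(self, max_interfaces):
        self.max_interfaces = max_interfaces
        self.connected = False
        self.interfaces = {}  # guest id -> DeviceInterface

    def connect(self):
        # Step 1: the device establishes a connection with the host side.
        self.connected = True

    def allocate_interface(self, guest_id):
        # Step 2: one interface per container or virtual machine.
        if not self.connected:
            raise RuntimeError("device is not connected to the host")
        if guest_id in self.interfaces:
            raise ValueError(f"{guest_id} already has an interface")
        if len(self.interfaces) >= self.max_interfaces:
            raise RuntimeError("no free device interface")
        iface = DeviceInterface(index=len(self.interfaces))
        self.interfaces[guest_id] = iface
        return iface

    def receive_config(self, guest_id, config):
        # Step 3: the interface receives the configuration from the host driver.
        iface = self.interfaces[guest_id]
        iface.config = dict(config)
        # Step 4: the interface orders its queues by the configured priority
        # (assumed convention: larger number means higher priority).
        priorities = config["queue_priority"]
        iface.queues = sorted(priorities, key=priorities.get, reverse=True)

    def access(self, guest_id):
        # Step 5: a guest reaches the device only through its own interface.
        return self.interfaces[guest_id]
```

In this model a guest that was never allocated an interface simply has no entry in `interfaces`, which mirrors the idea that each container or virtual machine sees only its own interface.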

In some embodiments of the present invention, the types of RDMA device include RDMA network cards and DPU accelerator cards that support RDMA.

In some embodiments of the present invention, the method further includes: in each container or virtual machine created on the host side, using virtualization device operating software to manage the RDMA device interface and the host-side driver, so that the RDMA device is virtualized into virtual RDMA devices that serve each container or virtual machine separately.

In some embodiments of the present invention, the step of using the virtualization device operating software to manage the RDMA device interface and the host-side driver includes: on the host side, the virtualization device operating software sends a configuration instruction to the host-side driver, so that the host-side driver generates a virtualization configuration for each RDMA device interface and sends it to the RDMA device.

In some embodiments of the present invention, the step of using the virtualization device operating software to manage the RDMA device interface and the host-side driver further includes: the virtualization device operating software manages the resources on the RDMA side by mapping the physical page in which the RDMA device interface resides, thereby abstracting one RDMA device for each container or virtual machine.

In some embodiments of the present invention, the virtualization configuration further includes several of the following: scheduling configuration, virtual device configuration, software interface configuration, flow control configuration, MAC address configuration and routing configuration.
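One way to picture such a configuration record is a plain data class with the queue priority configuration as the only mandatory part and the other kinds optional. The field names, their types and the MAC address check below are hypothetical choices for illustration; the patent names the configuration kinds but does not define their format.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Accepts colon-separated MAC addresses such as 02:00:00:aa:bb:01.
_MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


@dataclass
class VirtualizationConfig:
    """Hypothetical layout of the per-interface virtualization configuration."""
    queue_priority: dict                       # required: queue name -> priority
    scheduling: Optional[str] = None           # e.g. a scheduling policy name
    virtual_device: Optional[dict] = None
    software_interface: Optional[dict] = None
    flow_control: Optional[dict] = None
    mac_address: Optional[str] = None
    routing: list = field(default_factory=list)

    def validate(self):
        """Raise ValueError if the record is unusable; return None otherwise."""
        if not self.queue_priority:
            raise ValueError("queue priority configuration is required")
        if self.mac_address is not None:
            if not _MAC_RE.match(self.mac_address.lower()):
                raise ValueError(f"malformed MAC address: {self.mac_address!r}")
```

A host-side driver in this sketch would build one such record per RDMA device interface and validate it before sending it to the device.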

In some embodiments of the present invention, the container or virtual machine contains a virtual memory region for storing the data sent and received and a queue pair for sending and receiving data, and each RDMA device interface in the RDMA device manages a completion queue used for sending and receiving data; direct access to remote virtual memory is implemented on the basis of the virtual memory region, the queue pair and the completion queue.

In some embodiments of the present invention, in the step in which each RDMA device interface manages the queues in the RDMA device based on the received virtualization configuration, the method further includes: each container or virtual machine configures a queue as exclusive or shared through its independently configured RDMA device interface.
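The exclusive-or-shared choice can be sketched as a small ownership table. The rule used here (an exclusive queue has exactly one attached interface, and a queue with other users cannot be turned exclusive) is an assumption about what "exclusive" and "shared" would mean in practice; `QueueTable` and its methods are hypothetical names.

```python
class QueueTable:
    """Tracks which device interfaces are attached to which device queue."""

    MODES = ("exclusive", "shared")

    def __init__(self):
        self._mode = {}    # queue id -> "exclusive" or "shared"
        self._owners = {}  # queue id -> set of interface ids

    def attach(self, queue_id, interface_id, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        owners = self._owners.setdefault(queue_id, set())
        others = owners - {interface_id}
        current = self._mode.get(queue_id)
        # Refuse if another interface already uses the queue and either side
        # asks for, or already holds, exclusive use.
        if others and (mode == "exclusive" or current == "exclusive"):
            raise PermissionError(f"queue {queue_id} is not available as {mode}")
        self._mode[queue_id] = mode
        owners.add(interface_id)

    def owners(self, queue_id):
        return frozenset(self._owners.get(queue_id, ()))
```

For example, two interfaces may both attach to queue 1 as shared, while a second interface attaching to a queue that the first one holds exclusively is rejected.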

Another aspect of the present invention provides a system for implementing lightweight virtualization based on RDMA technology, including a processor and a memory. Computer instructions are stored in the memory, and the processor executes them; when the computer instructions are executed by the processor, the system implements the steps of the method described in any one of the above embodiments.

Another aspect of the present invention provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the steps of the method described in any one of the above embodiments are implemented.

The method and system proposed by the present invention use an RDMA device based on RDMA technology to help build a virtualized I/O processing environment and relieve pressure on the host side. On the one hand, the RDMA device is configured uniformly through the host-side operating system driver; on the other hand, independent access is achieved through the RDMA device interface allocated separately to each container or virtual machine. This reduces system complexity, saves memory space and configuration space, lowers cost, and reduces the load on the host-side CPU.

Additional advantages, objects and features of the invention will be set forth in part in the description that follows, and in part will become apparent to those of ordinary skill in the art upon examination of the following, or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and attained by the structure particularly pointed out in the specification and drawings.

Those skilled in the art will understand that the objects and advantages achievable with the present invention are not limited to those specifically described above, and that the above and other objects will be more clearly understood from the following detailed description.

Description of the drawings

The drawings described here provide a further understanding of the present invention and constitute a part of this application; they do not limit the present invention. In the drawings:

Figure 1 is a flow chart of a method for implementing lightweight virtualization based on RDMA technology in an embodiment of the present invention.

Figure 2 is a schematic structural diagram of a system for implementing lightweight virtualization based on RDMA technology in an embodiment of the present invention.

Figure 3 is a schematic structural diagram of a system for implementing virtualized I/O based on an RDMA device in an embodiment of the present invention.

Figure 4 is a diagram of the RDMA network card virtualization configuration relationships in an embodiment of the present invention.

Detailed description

To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the embodiments and drawings. The illustrative embodiments and their descriptions are used to explain the present invention and do not limit it.

It should also be noted that, to avoid obscuring the present invention with unnecessary detail, the drawings show only the structures and/or processing steps closely related to the solution according to the present invention and omit other details of little relevance.

It should be emphasized that the term "comprising/including", as used herein, refers to the presence of features, elements, steps or components and does not exclude the presence or addition of one or more other features, elements, steps or components.

It should also be noted that, unless otherwise specified, the term "connection" herein may refer either to a direct connection or to an indirect connection through an intermediary.

Hereinafter, embodiments of the present invention are described with reference to the drawings. In the drawings, the same reference numerals denote the same or similar components, or the same or similar steps.

The aim is to provide, on the basis of the prior art, a new virtualization solution that occupies as few host-side CPU and memory resources as possible while performing I/O virtualization flexibly. To this end, the present invention proposes a method for implementing lightweight virtualization based on RDMA technology. The solution reduces hardware complexity and lowers development and operating costs, and its scalability and flexible configuration allow it to meet changing customer requirements. Compared with the SR-IOV specification, lightweight RDMA-based I/O virtualization occupies less memory, simplifies the hardware implementation, and eliminates the impact that managing and configuring VFs under SR-IOV has on other users of the same RDMA device.

Remote Direct Memory Access (RDMA) technology was developed to address the latency of server-side data processing in network transfers. RDMA transfers data over the network directly into a computer's memory, moving it quickly from one system into the memory of a remote system without involving the operating system; the advantage is that it saves CPU computing resources and reduces power consumption. RDMA eliminates the CPU overhead of external memory copies and context switches and can substantially improve application performance. It is a technique for accessing data in a remote host's memory while bypassing that host's operating system kernel: data is transferred directly from the memory of one computer to another without the intervention of either operating system, which enables low-latency network communication.

A virtual machine (VM) is a virtual environment that emulates a complete physical computer system so that different operating systems and applications can run in it.

Container technology packages an application together with its dependencies and runs it in an isolated environment. It is similar to virtual machine technology, but it uses the virtualization capabilities of the operating system to create containers rather than creating complete virtual machines. This makes containers more lightweight and efficient, and easier to deploy and manage. The concept of containers dates back to UNIX chroot in 1979, a system call and command-line tool on UNIX that changes the root directory of a process and its child processes to a new location in the file system, so that these processes can only access that directory. The idea was to give each process its own isolated disk space. In 1982 it was added to BSD (a branch of the UNIX operating system).

The Single Root I/O Virtualization (SR-IOV) standard used in the prior art is an extension of the PCIe specification published by PCI-SIG. Its purpose is to provide a standard way of giving virtual machines independent memory space, interrupts and DMA data streams. SR-IOV defines a standard mechanism for PCIe device virtualization and is one implementation of I/O virtualization: it divides one physical PCIe device into multiple virtual PCIe devices, each of which has its own PCIe configuration space and serves upper-layer software such as virtual machines just as a physical PCIe device would. The virtual PCIe devices can be passed through to one or more virtual machines, so that the physical PCIe device is partitioned and shared among virtual machines. Because the partitioning is implemented in hardware, with independent memory space, interrupts and DMA data streams, a virtual machine can obtain I/O performance comparable to native performance.

The PF (Physical Function) is the full-featured function of an SR-IOV device, with its own PCI device ID, address space and interrupts; it manages the creation and deletion of VFs (Virtual Functions) and the communication between them. A VF is derived from the PF and offers the same functionality with limited resources; VFs can be assigned to virtual machines or containers.

The SR-IOV standard also has the following problems: (1) Each VF device virtualized in hardware under SR-IOV has its own configuration space and PCIe BAR space, which consumes more memory space; because hardware resource requirements grow linearly with the number of VFs, the number of VF devices that can be created is limited. (2) Creating or deleting a hardware-virtualized VF device affects the other VF devices on the same physical card, so the configuration policy has to be fixed at system deployment time. This leaves too little room for adjustment: in practice user requirements change at any time, and such a scheme cannot accommodate flexible application scenarios.

Figure 1 is a flow chart of a method for implementing lightweight virtualization based on RDMA technology in an embodiment of the present invention. The method includes the following steps:

Step S110: An RDMA device based on RDMA technology establishes a connection with the host side.

In a specific implementation, the RDMA device may be a DPU (Data Processing Unit) that supports RDMA technology, and the host side may be a server; the RDMA device uses virtualization to take over part of the CPU processing load of the server. A DPU that supports RDMA technology is typically connected to the host side through a PCIe interface.

Step S120: For each container or virtual machine created on the host side, one RDMA device interface is allocated, wherein the host-side operating system creates multiple containers or virtual machines based on its virtualization function, and each RDMA device interface contains MMIO (memory-mapped I/O) registers organized in units of physical pages.

The RDMA device interface (dev interface) is located in the BAR (Base Address Register) space of the RDMA device. The base address registers reside in the configuration space; they are used to determine the size of the memory space required by the different containers or virtual machines and to provide the base address at which the function's memory space is mapped. A base address register can map to memory space or to I/O space.
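Because the interface registers are laid out in units of physical pages inside the BAR, the address window of a given interface follows from simple arithmetic. The sketch below assumes one 4 KiB page per interface and a contiguous layout starting at the BAR base; both are assumptions made for the example, since the text specifies neither the page size nor the number of pages per interface.

```python
PAGE_SIZE = 4096  # assumed host page size in bytes


def interface_window(bar_base, bar_size, index, page_size=PAGE_SIZE):
    """Return (start, end) physical addresses of interface `index` in the BAR.

    `end` is exclusive. Assumes one page per interface, laid out back to back.
    """
    if bar_base % page_size:
        raise ValueError("BAR base must be page aligned")
    start = bar_base + index * page_size
    end = start + page_size
    if index < 0 or end > bar_base + bar_size:
        raise IndexError("interface does not fit in the BAR")
    return start, end
```

With a 64 KiB BAR at `0xF0000000`, interface 3 would occupy `0xF0003000` up to `0xF0004000` under these assumptions. Page alignment is what lets each window be mapped into a different container or virtual machine without exposing a neighbour's registers.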

Step S130: The RDMA device interface receives a virtualization configuration from the host-side driver, wherein the virtualization configuration includes a queue priority configuration.

Step S140: Each RDMA device interface manages the queues in the RDMA device based on the received virtualization configuration.

In a specific implementation, the queues that the RDMA device interface manages in the RDMA device include completion queues, while the virtual machine or container holds the queue pairs. A queue pair (QP) consists of a send queue (SQ) and a receive queue (RQ). Together, queue pairs and completion queues make RDMA-based virtualized I/O possible, which reduces the CPU load on the host side.
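The division of labour between the guest-side queue pair and the device-side completion queue can be illustrated with plain Python queues. This is a conceptual model only: it does not use any RDMA library, and the names `QueuePair`, `Completion` and `DeviceInterfaceModel` as well as the `(wr_id, payload)` work request format are invented for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """Guest side: a send queue and a receive queue of work requests."""
    send_queue: deque = field(default_factory=deque)  # items: (wr_id, payload)
    recv_queue: deque = field(default_factory=deque)  # items: (wr_id, buffer)


@dataclass
class Completion:
    wr_id: int
    opcode: str
    status: str = "success"


class DeviceInterfaceModel:
    """Device side: owns the completion queue for one container or VM."""

    def __init__(self):
        self.completion_queue = deque()

    def process_sends(self, qp):
        # Drain the send queue; a real device would DMA each payload out here.
        while qp.send_queue:
            wr_id, _payload = qp.send_queue.popleft()
            self.completion_queue.append(Completion(wr_id, "send"))

    def deliver(self, qp, payload):
        # Incoming data consumes one posted receive buffer.
        if not qp.recv_queue:
            raise BufferError("no receive buffer posted")
        wr_id, buffer = qp.recv_queue[0]
        if len(payload) > len(buffer):
            raise BufferError("posted receive buffer is too small")
        qp.recv_queue.popleft()
        buffer[:len(payload)] = payload
        self.completion_queue.append(Completion(wr_id, "recv"))
```

The guest posts work to its queue pair and later polls the completion queue, so the host CPU is not involved in moving the payload itself.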

Step S150: The container or virtual machine maps the RDMA device through the MMIO registers, so that it can access the RDMA device via its RDMA device interface, thereby implementing virtualization based on RDMA technology.

With the method and system for lightweight virtualization based on RDMA technology proposed by the present invention, an RDMA device can be used to help build a virtualized I/O processing environment and relieve pressure on the host side. On the one hand, the RDMA device is configured uniformly through the host-side operating system driver; on the other hand, each container or virtual machine gains independent access through its own separately allocated RDMA device interface. This reduces system complexity, saves memory space and configuration space, lowers cost, and reduces the load on the host-side CPU.

In some embodiments of the present invention, the types of RDMA device include RDMA network cards and DPU accelerator cards that support RDMA. An RDMA device is a device that supports the RDMA protocol, a high-performance network protocol that lets two systems access each other's memory directly without involving the CPU. RDMA devices fall into the following categories: (1) RDMA network cards, which are network interface cards that support the RDMA protocol, have an independent DMA controller, and can access memory directly; (2) RDMA storage devices, which have a high-performance network interface and can communicate directly with RDMA network cards; (3) RDMA switches, which connect RDMA devices to form an RDMA network; (4) DPU accelerator cards that support RDMA. These device types are only examples; in practice, any RDMA device that supports the RDMA protocol and can be installed on the host side can apply this solution after appropriate preconfiguration.

This embodiment broadens the scope of application of the solution. Compared with the existing SR-IOV standard, using an RDMA device reduces the pressure on the host-side CPU, and the computational complexity of the solution is also significantly lower than that of virtualization solutions based on the SR-IOV standard.

In some embodiments of the present invention, the method further includes: in each container or virtual machine created on the host side, using virtualization device operating software to manage the RDMA device interface and the host-side driver, so that the RDMA device is virtualized into virtual RDMA devices that serve each container or virtual machine separately. In a specific implementation, the virtualization device operating software is referred to as VDEV. VDEV is a software program installed in the virtual machine environment. Through VDEV, the host-side driver can be configured, and the host-side driver in turn configures the RDMA device; VDEV can also access the RDMA device interface assigned to each container or virtual machine, and through it the RDMA device.

With this embodiment, the RDMA device can be configured and the RDMA device interface accessed in a straightforward way from within the container or virtual machine environment, so that I/O tasks in that environment are carried out through the RDMA device, for example packet forwarding when the host is a server.

Further, in some embodiments of the present invention, the step of using the virtualization device operating software to manage the RDMA device interface and the host-side driver includes: on the host side, the virtualization device operating software sends configuration instructions to the host-side driver, so that the host-side driver generates a virtualization configuration for each RDMA device interface and sends it to the RDMA device. Specifically, multiple VDEV instances can communicate with one another to access and share the resources of the RDMA device; sharing is achieved by configuring queues as shared or exclusive.
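The configuration path described above (VDEV to host-side driver to RDMA device) can be sketched as three cooperating objects. Everything here is hypothetical: the class names, the instruction fields `priority` and `queue_mode`, and their defaults are assumptions chosen for illustration, since the patent does not define a concrete message format.

```python
class RdmaDevice:
    def __init__(self):
        self.interface_config = {}  # interface id -> applied configuration

    def apply(self, interface_id, config):
        self.interface_config[interface_id] = dict(config)


class HostDriver:
    def __init__(self, device):
        self.device = device

    def handle_instruction(self, interface_id, instruction):
        # Turn a VDEV request into a per-interface virtualization configuration.
        config = {
            "queue_priority": instruction.get("priority", 0),
            "queue_mode": instruction.get("queue_mode", "exclusive"),
        }
        self.device.apply(interface_id, config)
        return config


class Vdev:
    def __init__(self, interface_id, driver):
        self.interface_id = interface_id
        self.driver = driver

    def configure(self, **instruction):
        return self.driver.handle_instruction(self.interface_id, instruction)


device = RdmaDevice()
driver = HostDriver(device)
Vdev(0, driver).configure(priority=5)
Vdev(1, driver).configure(priority=1, queue_mode="shared")
print(device.interface_config)
```

Keeping the driver as the single writer to the device matches the text's point that configuration is centralized on the host side while each VDEV only issues requests for its own interface.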

In some embodiments of the present invention, the step of using the virtualization device operating software to manage the RDMA device interface and the host-side driver further includes: the virtualization device operating software manages the resources on the RDMA side by mapping the physical page where the RDMA device interface resides, thereby abstracting one RDMA device for each container or virtual machine.

In some embodiments of the present invention, the virtualization configuration further includes several of the following: scheduling configuration, virtual device configuration, software interface configuration, flow control configuration, MAC address configuration, and routing configuration. The virtual device configuration is set according to the RDMA standard; the flow control, MAC address, and routing configurations are set according to network transmission conditions on the host (or server) side; and the software interface configuration refers to the settings of the RDMA device interface. By configuring the RDMA device, the host-side driver can configure the RDMA device interfaces in the BAR space, traffic management, and queue priority, where the queues include the completion queues (CQ) in the RDMA device.
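One way to picture such a virtualization configuration is as a small record with a required queue priority part and optional extras. The field names, types, default scheduling policy, and validation rules below are assumptions made for this sketch; the patent lists the configuration categories but not their encoding.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


@dataclass
class VirtualizationConfig:
    queue_priority: Dict[int, int]            # queue id -> priority (always present)
    scheduling: str = "round-robin"           # hypothetical policy name
    flow_control_mbps: Optional[int] = None   # hypothetical rate limit
    mac_address: Optional[str] = None
    routes: List[str] = field(default_factory=list)

    def validate(self):
        if not self.queue_priority:
            raise ValueError("queue priority configuration is required")
        if self.mac_address and not MAC_RE.match(self.mac_address.lower()):
            raise ValueError(f"malformed MAC address: {self.mac_address}")
        return self


cfg = VirtualizationConfig(
    queue_priority={0: 7, 1: 3},
    flow_control_mbps=10_000,
    mac_address="02:00:00:00:00:01",
    routes=["10.0.0.0/24"],
).validate()
print(cfg)
```

Making the queue priority mandatory and the rest optional mirrors the text: step S130 always carries a queue priority configuration, while the other categories appear only in some embodiments.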

With this embodiment, the RDMA device can be configured comprehensively, which enables virtualized I/O based on the RDMA device, relieves pressure on the server side, and reduces system complexity compared with the existing SR-IOV standard.

In some embodiments of the present invention, the container and the virtual machine contain a virtual memory region for storing sent and received data and queue pairs for sending and receiving data; each RDMA device interface in the RDMA device manages the completion queues used for sending and receiving data; and direct access to remote virtual memory is implemented on the basis of the virtual memory region, the queue pairs, and the completion queues.

The RDMA standard already specifies how RDMA transfers are carried out using queues, so that mechanism is not described in further detail here.

In some embodiments of the present invention, in the step in which each RDMA device interface manages the queues in the RDMA device based on the received virtualization configuration, the method further includes: each container or virtual machine configures queues as exclusive or shared through its independently configured RDMA device interface.
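The exclusive-versus-shared distinction can be sketched as a small ownership table. The `QueueTable` class and its methods are hypothetical, and the rule that an exclusive queue accepts only its first user is an assumption about what "exclusive" means here.

```python
from enum import Enum


class QueueMode(Enum):
    EXCLUSIVE = "exclusive"
    SHARED = "shared"


class QueueTable:
    def __init__(self):
        self._modes = {}  # queue id -> QueueMode
        self._users = {}  # queue id -> set of owner names

    def configure(self, queue_id, mode):
        self._modes[queue_id] = mode
        self._users.setdefault(queue_id, set())

    def attach(self, queue_id, owner):
        users = self._users[queue_id]
        exclusive = self._modes[queue_id] is QueueMode.EXCLUSIVE
        if exclusive and users and owner not in users:
            raise PermissionError(f"queue {queue_id} is exclusive")
        users.add(owner)

    def users(self, queue_id):
        return set(self._users[queue_id])


table = QueueTable()
table.configure(0, QueueMode.EXCLUSIVE)
table.configure(1, QueueMode.SHARED)
table.attach(0, "vm-a")
table.attach(1, "vm-a")
table.attach(1, "container-b")  # allowed: queue 1 is shared
print(table.users(0), table.users(1))
```

A second owner attaching to queue 0 would be rejected, while queue 1 accepts any number of users.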

With this embodiment, queue-based I/O processing becomes more efficient and flexible than under the existing SR-IOV standard.

Figure 2 is a schematic structural diagram of a system that implements lightweight virtualization based on RDMA technology in an embodiment of the present invention. The system consists of containers (or virtual machines), a host, and an RDMA device. The host-side driver configures the RDMA device and performs management configuration, traffic configuration, and priority configuration for the RDMA device interface (RDMA dev interface). A container (or virtual machine) can access the RDMA device directly through its RDMA device interface, achieving the same I/O performance as a virtualization solution based on the SR-IOV standard. The RDMA device interface is similar to the Virtual Function (VF) in the SR-IOV standard. It contains MMIO registers in units of physical pages, and a container (or virtual machine) maps it page by page, which provides memory isolation and protects data. The host-side driver (host driver) implements RDMA device interface configuration, traffic management configuration, and queue priority management. The RDMA device interface is a region of physical memory that also defines a number of function registers, which are not discussed further here.
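Mapping by whole pages can be illustrated with Python's `mmap` module, using a temporary file as a stand-in for the BAR. This is an analogy only: a real container or virtual machine would map device memory, and isolation would be enforced by the page tables rather than by Python's bounds check. The helper `map_interface` and the three-interface layout are hypothetical.

```python
import mmap
import tempfile

PAGE = mmap.ALLOCATIONGRANULARITY  # mapping offsets must be multiples of this


def map_interface(bar_file, index):
    """Map only the page-sized window that belongs to interface `index`."""
    return mmap.mmap(bar_file.fileno(), PAGE, offset=index * PAGE)


with tempfile.TemporaryFile() as bar:
    bar.write(bytes(3 * PAGE))  # stand-in for a BAR holding three interfaces
    bar.flush()

    win0 = map_interface(bar, 0)
    win1 = map_interface(bar, 1)
    win1[0:4] = b"\x01\x02\x03\x04"  # "register write" through interface 1

    first_words = (bytes(win0[0:4]), bytes(win1[0:4]))
    window_len = len(win0)
    try:
        win0[PAGE]  # one byte past the window of interface 0
        out_of_window = False
    except IndexError:
        out_of_window = True

    win0.close()
    win1.close()

print(first_words, window_len, out_of_window)
```

The write through the second window never appears in the first, and the first window cannot be indexed beyond its own page, which is the property the page-granular layout is meant to provide.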

The virtual machine monitor (VMM) shown in Figure 2 is a software program that virtualizes one physical machine into multiple virtual machines, each running its own operating system and applications. The VMM creates, runs, and manages virtual machines: creating a virtual machine includes allocating resources to it, such as CPU, memory, storage, and network; running a virtual machine includes providing its runtime environment; and managing virtual machines includes starting, stopping, restarting, and migrating them.

Figure 3 is a schematic structural diagram of a system that implements virtualized I/O based on an RDMA device in an embodiment of the present invention. The VDEV in Figure 3 is the resource-management software located in the container (or virtual machine); it is produced by mapping the physical page where the RDMA device interface resides. A VDEV can act as the equivalent of an RDMA device by creating queue pairs, contexts, and completion queues. The RDMA device interface is the manager that maintains the queue pair, context, and completion queue resources between hardware and software, and it can implement shared configurations by managing the queues in the RDMA device. VDEV is short for virtual device; it is a logical concept in virtualization software used to represent a virtual machine's storage device. A VDEV may be composed of multiple physical storage devices, or one physical storage device may contain multiple VDEVs; each VDEV is a logical storage device used for virtual machine storage.

In addition, Figure 3 shows the management domain PD (Physical Disk). A PD number is physically unique but may be repeated logically. The PD is the basis of virtual machine storage: the virtual machine's operating system and data are both stored on the PD. Queue pairs (QP) are the resources used for I/O communication. A different number of queue pairs can be configured in each domain, and the RDMA device interface can allocate queue pair resources flexibly according to user needs. An MR (Memory Region) is an area of memory set aside by the RDMA software layer to hold sent and received data. Under the IB (InfiniBand) protocol, after allocating a memory area for data, the user must register it as an MR by calling the API provided by the IB framework before the RDMA network card can access it. The host-side driver (host driver), as opposed to the driver inside the virtual machine, manages and configures the dev interface, traffic, and queue priority. The virtual machine can also reach the host-side driver through VDEV and configure the RDMA device through it, which gives the system its flexibility.

By introducing VDEV into the container or virtual machine, users can directly map the partitioned device resources and operate and control them independently, which effectively improves I/O performance and data security. A VDEV is a virtual device in the ZFS file system, made up of a group of physical disks. A VDEV can be used to create RAID arrays, mirrors, or simple logical volumes.

Figure 4 is a diagram of the virtualization configuration relationships of an RDMA network card in an embodiment of the present invention. It shows the resources of the RDMA device that can be configured through the VDEV of Figure 3. The RDMA device provides a set of hardware resources for high-performance network communication between computers. These hardware resources can be partitioned through configuration and provided to multiple containers (or virtual machines), so the network card hardware design must support the following resource configurations. MAC address configuration and routing configuration: the network card is virtualized into multiple distinct network card resources, each with the user's private MAC address and routing configuration. Traffic configuration and queue priority configuration: traffic and priority queues are configured per user. Virtual device configuration and software interface configuration: interfaces and resources are configured for virtual machines and containers, and each virtual machine has exclusive access to its resources. Scheduling configuration: sets the task scheduling flow during I/O, which relieves I/O pressure on the host side.

The method and system for lightweight virtualization based on RDMA technology proposed by the present invention use an RDMA device to help build a virtualized I/O processing environment and relieve pressure on the host side. On the one hand, the RDMA device is configured uniformly through the host-side operating system driver; on the other hand, each container or virtual machine gains independent access through its own separately allocated RDMA device interface. This reduces system complexity, saves memory space and configuration space, lowers cost, and reduces the load on the host-side CPU.

Under the SR-IOV standard, every virtual device needs its own configuration space, its own memory-mapped I/O (MMIO) space, and its own interrupt vector configuration. By contrast, the present invention lowers the demands on memory space and configuration space. Configuring the RDMA device uniformly through the host-side operating system driver reduces system complexity, saves memory space and configuration space, lowers cost, and reduces the load on the host-side CPU.

Specifically, a DPU device that supports RDMA can be connected to the host side (server) through a PCIe interface. Combining a DPU with RDMA produces a synergistic effect that improves the overall performance of the data center. For example, the DPU can accelerate RDMA transfers and reduce the load on the host-side CPU, while RDMA can reduce the load on the DPU.

With the virtualization method proposed by the present invention, the host-side driver can be configured flexibly through VDEV, and the entire RDMA device is then configured through the host-side driver. While configuring the RDMA device, the RDMA device interfaces for different containers or virtual machines can be adjusted individually, and users of the same RDMA device do not affect one another, because mapping in units of physical pages provides memory isolation. In practice, the configuration of an RDMA device interface can therefore be adjusted as user requirements change, to suit varied application scenarios.

Corresponding to the above method, the present invention also provides a system for implementing lightweight virtualization based on RDMA technology. The system includes a computer device comprising a processor and a memory. The memory stores computer instructions, and the processor is configured to execute them; when the computer instructions are executed by the processor, the system implements the steps of the method described above.

Embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the method described above. The computer-readable storage medium may be a tangible storage medium such as random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a floppy disk, a hard disk, a removable storage disk, a CD-ROM, or any other form of storage medium known in the art.

Those of ordinary skill in the art will understand that the exemplary components, systems, and methods described in connection with the embodiments disclosed herein can be implemented in hardware, software, or a combination of both. Whether a function is performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the present invention. When implemented in hardware, the implementation may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, or a function card. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave.

It should be made clear that the present invention is not limited to the specific configurations and processes described above and shown in the figures. For brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method of the present invention is not limited to the specific steps described and shown; those skilled in the art, having grasped the spirit of the invention, may make various changes, modifications, and additions, or change the order of the steps.

In the present invention, features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, and/or may be combined with or substituted for features of other embodiments.

The above are only preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may make various modifications and changes to the embodiments. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for implementing lightweight virtualization based on RDMA technology, the method comprising the steps of:
establishing a connection between an RDMA device and a host based on RDMA technology;
allocating an RDMA device interface to each container or virtual machine created on the host side, wherein the host-side operating system creates a plurality of containers or virtual machines based on a virtualization function, and the RDMA device interface comprises MMIO registers in units of physical pages;
the RDMA device interface receives a virtualization configuration from a host side driver; wherein the virtualization configuration comprises a queue priority configuration;
each RDMA device interface managing queues in RDMA devices based on the received virtualization configuration;
the container or virtual machine maps to the RDMA device using the MMIO register to enable the container or virtual machine to access the RDMA device based on the RDMA device interface to implement virtualization based on RDMA technology.
2. The method of claim 1, wherein the types of RDMA devices include an RDMA network card and an RDMA-capable DPU acceleration card.
3. The method according to claim 1, characterized in that the method further comprises:
in each container or virtual machine created at the host side, the RDMA device interface and host side driver are managed using virtualization device operating software to virtualize the RDMA device as a virtual RDMA device that services each container or virtual machine separately.
4. The method of claim 3, wherein managing RDMA device interfaces and host side drivers using virtualization device operating software comprises:
on the host side, the virtualization device operating software sends configuration instructions to the host side driver, causing the host side driver to generate and send virtualization configurations for the respective RDMA device interfaces to the RDMA devices.
5. The method of claim 3, wherein managing RDMA device interfaces and host side drivers using virtualization device operating software further comprises:
the virtualization device operating software manages the RDMA side resources by mapping the physical pages where the RDMA device interfaces are located, thereby abstracting one RDMA device for each container or virtual machine.
6. The method of any of claims 1-5, wherein the virtualization configuration further comprises a plurality of: a scheduling configuration, a virtual device configuration, a software interface configuration, a flow control configuration, a MAC address configuration, and a routing configuration.
7. The method of claim 1, wherein the container and the virtual machine contain a virtual memory region for storing sent and received data and a queue pair for sending and receiving data, wherein each RDMA device interface in the RDMA device manages a completion queue for sending and receiving data, and wherein direct access to remote virtual memory is achieved based on the virtual memory region, the queue pair, and the completion queue.
8. The method of claim 1, wherein in the step of each RDMA device interface managing queues in an RDMA device based on the received virtualization configuration, the method further comprises:
each container or virtual machine configures the queue to be exclusive or shared through an independently configured RDMA device interface.
9. A system for implementing lightweight virtualization based on RDMA technology, comprising a processor and a memory, wherein the memory stores computer instructions and the processor is configured to execute the computer instructions stored in the memory, and wherein the system implements the steps of the method according to any of claims 1 to 8 when the computer instructions are executed by the processor.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
Application CN202311617099.5A, priority date 2023-11-29, filing date 2023-11-29: Method and system for realizing lightweight virtualization based on RDMA technology. Status: Active. Granted as CN117609115B (en).

Publications (2)

Publication Number Publication Date
CN117609115A true CN117609115A (en) 2024-02-27
CN117609115B CN117609115B (en) 2025-01-10

Family

ID=89953049

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110858157A (en) * 2018-08-22 2020-03-03 英特尔公司 Live migration of virtual devices in an extensible I/O virtualization (S-IOV) architecture
CN115344521A (en) * 2018-08-22 2022-11-15 英特尔公司 Virtual device composition in extensible input/output (I/O) virtualization (S-IOV) architecture
US20210117360A1 (en) * 2020-05-08 2021-04-22 Intel Corporation Network and edge acceleration tile (next) architecture
CN111966446A (en) * 2020-07-06 2020-11-20 复旦大学 RDMA virtualization method in container environment
US20210117246A1 (en) * 2020-09-25 2021-04-22 Intel Corporation Disaggregated computing for distributed confidential computing environment
CN113296884A (en) * 2021-02-26 2021-08-24 阿里巴巴集团控股有限公司 Virtualization method, virtualization device, electronic equipment, virtualization medium and resource virtualization system
WO2023184203A1 (en) * 2022-03-30 2023-10-05 Intel Corporation Techniques to implement confidential computing with a remote device via use of trust domains

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李超等: "基于SR-IOV的IO虚拟化技术", 电脑与信息技术, no. 05, 15 October 2010 (2010-10-15) *
王展等: "基于单根I/O虚拟化的多根I/O资源池化方法", 计算机研究与发展, no. 01, 15 January 2015 (2015-01-15) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118152075A (en) * 2024-04-09 2024-06-07 Yusur Technology Co ltd Lightweight RDMA virtualization method, system, device, electronic equipment and medium
CN118381818A (en) * 2024-06-24 2024-07-23 Jinan Inspur Data Technology Co., Ltd. Data interaction method, computer device, storage medium, and program product
CN118963920A (en) * 2024-08-12 2024-11-15 Yusur Technology Co ltd A VirtIO device ring queue dynamic management method and device for DPU
CN119292969A (en) * 2024-08-30 2025-01-10 Xidian University A hardware implementation method for single-root virtualization of RDMA network cards supporting Ethernet
CN119292969B (en) * 2024-08-30 2025-09-16 Xidian University A hardware implementation method for single-root virtualization of RDMA network cards supporting Ethernet

Also Published As

Publication number Publication date
CN117609115B (en) 2025-01-10

Similar Documents

Publication Publication Date Title
CN114996185B (en) Bridging across address space
CN112148421B (en) Virtual machine migration method and device
US9529773B2 (en) Systems and methods for enabling access to extensible remote storage over a network as local storage via a logical storage controller
US10417174B2 (en) Remote direct memory access in a virtualized computing environment
US7552298B2 (en) Method and system for deferred pinning of host memory for stateful network interfaces
CN108243118B (en) Method and physical host for forwarding packets
CN117609115A (en) A method and system for lightweight virtualization based on RDMA technology
EP3798835B1 (en) Method, device, and system for implementing hardware acceleration processing
US7941812B2 (en) Input/output virtualization through offload techniques
US10176007B2 (en) Guest code emulation by virtual machine function
EP4004751B1 (en) Pinned physical memory supporting direct memory access for virtual memory backed containers
US8225332B2 (en) Method and system for protocol offload in paravirtualized systems
CN102497434B (en) Establishing method of kernel state virtual network equipment and packet transmitting and receiving methods thereof
CN103384551B (en) A PCIe network-based virtual machine communication method, server and system
CN105389199B (en) A Xen-based FPGA accelerator virtualization platform and its application
US20220391341A1 (en) Cross bus memory mapping
US12105648B2 (en) Data processing method, apparatus, and device
CN103763173A (en) Data transmission method and computing node
Ngoc et al. Flexible NVMe request routing for virtual machines
US20240348562A1 (en) Multi-host isolation in a shared networking pipeline
US20250147886A1 (en) I/o cache partitioning
Nanos et al. Xen2MX: towards high-performance communication in the cloud
CN119781904A (en) High-performance FPGA heterogeneous computing virtualization method and system based on vhost-user
CN117520215A (en) Page fault processing method and related equipment
CN120631827A (en) Unified memory and data transmission system and management method of FPGA heterogeneous platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant