CN105550085A

CN105550085A - Testing method based on GPUDirect RDMA (remote direct memory access)

Info

Publication number: CN105550085A
Application number: CN201510915330.8A
Authority: CN
Inventors: 潘霖
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: IEIT Systems Co Ltd
Priority date: 2015-12-10
Filing date: 2015-12-10
Publication date: 2016-05-04

Abstract

The invention provides a testing method based on GPUDirect RDMA, involving an HCA card, a GPU card, the Nvidia driver, the Nvidia CUDA toolkit, the MLNX_OFED driver, and the nv_peer_mem package for communication between the GPU and the IB card. The method accesses GPU memory directly, avoiding the unnecessary system memory copies and CPU overhead incurred when going through pinned CUDA host memory, which speeds up communication with network and storage devices. It also lets one GPU directly access another GPU in the same system using direct high-speed DMA transfer. The test method is simple to operate, highly automated and practical, saves manpower, effectively ensures the stability of server performance, and is a very effective way to verify the product quality of GPU servers.

Description

A testing method based on GPUDirect RDMA

Technical field

The invention relates to the field of GPU server testing, and in particular to a testing method based on GPUDirect RDMA.

Background

With the continuous development of technology in the IT field, traditional information services and increasingly powerful cloud computing services place ever higher demands on servers, and technology generations turn over ever faster. General-purpose CPUs have run into unprecedented difficulties in optimization, whether in frequency, memory bandwidth, core count, or even process technology and instruction set. GPU servers, however, have opened a door in the field of heterogeneous computing. More and more supercomputing centers, enterprises and research institutions are building computing resource pools with coprocessors at their core, and are developing and optimizing application layers adapted to heterogeneous platforms. Customer demand for GPU servers keeps growing in order to meet the increasing need for computing power.

Summary of the invention

The technical task of the present invention is to address the deficiencies of the prior art by providing a testing method based on GPUDirect RDMA. The method both tests GPU server performance effectively and provides important performance data for customers' performance requirements on GPU servers.

The technical solution adopted by the present invention to solve its technical problem is:

A testing method based on GPUDirect RDMA accesses GPU memory directly, avoiding the unnecessary system memory copies and CPU overhead incurred when going through pinned CUDA host memory, and thereby accelerates communication with network and storage devices. One GPU can directly access another GPU in the same system using direct high-speed DMA transfer, which adds P2P memory access. This genuinely frees host CPU resources: unnecessary frequent data transfers through the host CPU are eliminated, and the host CPU takes no part at all in incoming RDMA operations. The method involves an HCA card, a GPU card, the Nvidia driver and Nvidia CUDA toolkit required by the GPU, the MLNX_OFED driver required by InfiniBand, and the nv_peer_mem package for communication between the GPU and the IB card.

The HCA card is a Mellanox ConnectX or later product, and the GPU card is a K20 or later product.

The GPUDirect RDMA test method is as follows:

1. Test tools

a. cuda_6.5.14_linux_64.run

b. nvidia_peer_memory-1.0-0.tar.gz

c. mvapich2-gdr-cuda6.5-gnu-2.1-0.1.a.el6.x86_64.rpm

d. MLNX_OFED_LINUX-2.4-1.0.0-rhel6.2-x86_64.iso
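Before starting the installation it can help to confirm that all four packages listed above are present. The following is a minimal sketch, not part of the patented method; the function name missing_tools and the idea of keeping the files in one directory are assumptions made for illustration.

```shell
# Hypothetical helper: print the name of each required installer that is
# missing from the given directory. The file names are the four test tools
# listed above; the function name and directory layout are assumptions.
missing_tools() {
    dir=$1
    for f in \
        cuda_6.5.14_linux_64.run \
        nvidia_peer_memory-1.0-0.tar.gz \
        mvapich2-gdr-cuda6.5-gnu-2.1-0.1.a.el6.x86_64.rpm \
        MLNX_OFED_LINUX-2.4-1.0.0-rhel6.2-x86_64.iso
    do
        # Report only the files that do not exist as regular files.
        [ -f "$dir/$f" ] || echo "$f"
    done
}
```

Calling missing_tools /root/rdma (the directory is only an example) prints nothing when every package is in place.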

2. Test method

a. HCA driver installation

mount -o ro,loop MLNX_OFED_LINUX-2.4-1.0.0-rhel6.2-x86_64.iso /mnt

cd /mnt

./mlnxofedinstall
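The patent does not describe how to verify the HCA after the driver is installed. One plausible check, sketched here as an assumption, is to count the ports that the ibstat utility reports as "State: Active". The function name active_port_count is hypothetical, and the sketch reads the ibstat text from standard input so that it has no side effects.

```shell
# Hypothetical check (not part of the patent): count HCA ports whose logical
# state is Active, reading ibstat-style output from standard input.
# "Physical state:" lines are ignored because they do not begin with "State:".
active_port_count() {
    awk '/^[[:space:]]*State:[[:space:]]*Active/ { n++ } END { print n + 0 }'
}

# Example use on a node with the driver installed:
#   ibstat | active_port_count
```

A result of 0 would suggest that the link or the subnet manager needs attention before the bandwidth and latency tests are run.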

b. Graphics card driver installation

chmod 777 cuda_6.5.14_linux_64.run

./cuda_6.5.14_linux_64.run --extract=/root/rdma

./NVIDIA-Linux-x86_64-340.29.run

c. CUDA installation

./cuda-linux64-rel-6.5.14-18749181.run

d. Environment variable settings

vi ~/.bashrc

Add at the end:

export PATH=/usr/local/cuda-6.5/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH

source ~/.bashrc

vi /etc/ld.so.conf

Add at the end: /usr/local/cuda-6.5/lib64

ldconfig
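The edits in this step are made by hand in vi. If the step is scripted instead, the lines should not be appended twice when the script is re-run. The following is a minimal sketch of such an idempotent append; the function name append_once is an assumption, while the paths and export lines are the ones given in this step.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not
# already present, so repeated runs leave the file unchanged.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines, -F treats the text literally (no regex).
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example use (commented out so that loading this file changes nothing):
#   append_once ~/.bashrc 'export PATH=/usr/local/cuda-6.5/bin:$PATH'
#   append_once ~/.bashrc 'export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH'
#   append_once /etc/ld.so.conf '/usr/local/cuda-6.5/lib64'
```

The single quotes in the example keep $PATH and $LD_LIBRARY_PATH unexpanded, so the variables are resolved when ~/.bashrc is sourced rather than when the line is written.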

e. nv_peer_mem installation

tar -zxf ../nvidia_peer_memory-1.0-0.tar.gz

rpmbuild --rebuild nvidia_peer_memory-1.0-0.src.rpm

rpm -ivh /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-0.x86_64.rpm

/etc/init.d/nv_peer_mem start (starts the nv_peer_mem service)
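After the service is started, a quick way to confirm that the kernel module is actually loaded is to look for it in the lsmod listing. This is a sketch under the assumption that the module is named nv_peer_mem, like the service; the function name module_loaded is hypothetical. It reads the listing from standard input and reports the result through its exit status.

```shell
# Hypothetical check: succeed (exit status 0) if the named module appears in
# the first column of lsmod-style output read from standard input.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Example use:
#   lsmod | module_loaded nv_peer_mem && echo "nv_peer_mem is loaded"
```

Matching on the first column only avoids a false positive from the "Used by" column, where nv_peer_mem can appear as a dependent of another module.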

f. mvapich2 installation

rpm -Uvh --nodeps mvapich2-gdr-cuda6.5-gnu-2.1-0.1.a.el6.x86_64.rpm

g. GPUDirect RDMA bandwidth test

/opt/mvapich2/gdr/2.1/cuda6.5/gnu/bin/mpirun_rsh -np 2 c1 c2 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 /opt/mvapich2/gdr/2.1/cuda6.5/gnu/libexec/mvapich2/osu_bw -d cuda D D
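The bandwidth test prints one row per message size, and for a pass/fail check usually only the peak value matters. The sketch below extracts that peak, assuming the usual osu_bw layout: header lines that start with "#", followed by rows of message size and bandwidth in MB/s. The function name peak_bandwidth is hypothetical, and the patent itself does not specify how results are evaluated.

```shell
# Hypothetical post-processing: read osu_bw output from standard input and
# print the largest value in the second column (bandwidth), with two decimals.
# Lines starting with '#' are treated as headers and skipped.
peak_bandwidth() {
    awk '!/^#/ && NF >= 2 && $2 + 0 > max { max = $2 + 0 }
         END { printf "%.2f\n", max + 0 }'
}

# Example use: pipe the output of the bandwidth test command into the function,
#   <bandwidth test command> | peak_bandwidth
```

Comparing the peak against a threshold chosen for the installed HCA and GPU would turn the measurement into an automated acceptance check.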

h. GPUDirect RDMA latency test

/opt/mvapich2/gdr/2.1/cuda6.5/gnu/bin/mpirun_rsh -np 2 c1 c2 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 /opt/mvapich2/gdr/2.1/cuda6.5/gnu/libexec/mvapich2/osu_latency -d cuda D D
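Steps g and h differ only in the benchmark binary, so the two command lines can be generated from one template. The sketch below only prints the command rather than running it. The function name gdr_cmd and the variable MV2_HOME are assumptions, and c1 and c2 are taken to be the hostnames of the two nodes under test.

```shell
# Installation prefix used by the commands in steps g and h.
MV2_HOME=/opt/mvapich2/gdr/2.1/cuda6.5/gnu

# Hypothetical helper: print the mpirun_rsh command line for a benchmark
# (osu_bw or osu_latency) between two hosts, with both buffers on the GPU (D D).
gdr_cmd() {
    bench=$1
    host1=$2
    host2=$3
    printf '%s\n' "$MV2_HOME/bin/mpirun_rsh -np 2 $host1 $host2 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 $MV2_HOME/libexec/mvapich2/$bench -d cuda D D"
}

# Example use:
#   gdr_cmd osu_bw c1 c2
#   gdr_cmd osu_latency c1 c2
```

Printing the command first makes it easy to review or log before execution; a wrapper script could then pass the printed line to sh -c on a node where MVAPICH2-GDR is installed.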

Compared with the prior art, the testing method based on GPUDirect RDMA of the present invention has the following beneficial effects. The invention accesses GPU memory directly, avoiding the unnecessary system memory copies and CPU overhead incurred when going through pinned CUDA host memory, and thereby accelerates communication with network and storage devices. One GPU can directly access another GPU in the same system using direct high-speed DMA transfer, which adds P2P memory access. This genuinely frees host CPU resources: unnecessary frequent data transfers through the host CPU are eliminated, and the host CPU takes no part at all in incoming RDMA operations.

The GPUDirect RDMA test method both tests GPU server performance effectively and provides important performance data for customers' performance requirements on GPU servers. The test method is simple to operate, highly automated and practical, saves manpower, effectively ensures the stability of server performance, and is a very effective way to verify the product quality of GPU servers.

Description of the drawings

Figure 1 is a schematic diagram of the testing method based on GPUDirect RDMA.

Detailed description

The testing method based on GPUDirect RDMA of the present invention is described in detail below with reference to the accompanying drawing.

The GPUDirect RDMA test method is as follows:

1. Test tools

a. cuda_6.5.14_linux_64.run

b. nvidia_peer_memory-1.0-0.tar.gz

c. mvapich2-gdr-cuda6.5-gnu-2.1-0.1.a.el6.x86_64.rpm

d. MLNX_OFED_LINUX-2.4-1.0.0-rhel6.2-x86_64.iso

2. Test method

a. HCA driver installation

mount -o ro,loop MLNX_OFED_LINUX-2.4-1.0.0-rhel6.2-x86_64.iso /mnt

cd /mnt

./mlnxofedinstall

b. Graphics card driver installation

chmod 777 cuda_6.5.14_linux_64.run

./cuda_6.5.14_linux_64.run --extract=/root/rdma

./NVIDIA-Linux-x86_64-340.29.run

c. CUDA installation

./cuda-linux64-rel-6.5.14-18749181.run

d. Environment variable settings

vi ~/.bashrc

Add at the end:

export PATH=/usr/local/cuda-6.5/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH

source ~/.bashrc

vi /etc/ld.so.conf

Add at the end: /usr/local/cuda-6.5/lib64

ldconfig

e. nv_peer_mem installation

tar -zxf ../nvidia_peer_memory-1.0-0.tar.gz

rpmbuild --rebuild nvidia_peer_memory-1.0-0.src.rpm

rpm -ivh /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-0.x86_64.rpm

/etc/init.d/nv_peer_mem start (starts the nv_peer_mem service)

f. mvapich2 installation

rpm -Uvh --nodeps mvapich2-gdr-cuda6.5-gnu-2.1-0.1.a.el6.x86_64.rpm

g. GPUDirect RDMA bandwidth test

/opt/mvapich2/gdr/2.1/cuda6.5/gnu/bin/mpirun_rsh -np 2 c1 c2 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 /opt/mvapich2/gdr/2.1/cuda6.5/gnu/libexec/mvapich2/osu_bw -d cuda D D

h. GPUDirect RDMA latency test

/opt/mvapich2/gdr/2.1/cuda6.5/gnu/bin/mpirun_rsh -np 2 c1 c2 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 /opt/mvapich2/gdr/2.1/cuda6.5/gnu/libexec/mvapich2/osu_latency -d cuda D D

As traditional information services and increasingly powerful cloud computing services place ever higher demands on servers, customer demand for GPU servers keeps increasing. The GPUDirect RDMA test method both tests GPU server performance effectively and provides important performance data for customers' performance requirements on GPU servers. The test method is simple to operate, highly automated and practical, saves manpower, effectively ensures the stability of server performance, and is a very effective way to verify the product quality of GPU servers.

Claims

1. A testing method based on GPUDirect RDMA, characterized in that it accesses GPU memory directly, avoiding the unnecessary system memory copies and CPU overhead incurred when going through pinned CUDA host memory, and accelerates communication with network and storage devices; one GPU directly accesses another GPU in the same system using direct high-speed DMA transfer, which adds P2P memory access, genuinely frees host CPU resources, eliminates unnecessary frequent data transfers through the host CPU, and leaves the host CPU entirely uninvolved in incoming RDMA operations; the method involves an HCA card, a GPU card, the Nvidia driver, the Nvidia CUDA toolkit, the MLNX_OFED driver, and the nv_peer_mem package for communication between the GPU and the IB card.

2. The testing method based on GPUDirect RDMA according to claim 1, characterized in that the HCA card is a Mellanox ConnectX card.