CN105550085A - RDMA (remote direct memory Access) testing method based on GPUDerict - Google Patents

Info

Publication number
CN105550085A
Authority
CN
China
Prior art keywords
gpu
card
memory
cuda
method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510915330.8A
Other languages
Chinese (zh)
Inventor
潘霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IEIT Systems Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/2236 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test CPU or processors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G06F13/282 Cycle stealing DMA

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a test method based on GPUDirect RDMA. The test setup comprises an HCA card, a GPU card, the Nvidia driver, the Nvidia CUDA toolkit, the MLNX_OFED driver, and the nv_peer_mem package for communication between the GPU and the IB card. GPU memory is accessed directly, which avoids the unnecessary system memory copies and CPU overhead incurred when going through pinned CUDA host memory and speeds up communication with network and storage devices; one GPU can also directly access another GPU in the same system using direct high-speed DMA transfers. The test method is simple to operate, highly automated, and practical; it saves manpower, helps ensure stable server performance, and is an effective way to verify the product quality of GPU servers.

Description

A test method based on GPUDirect RDMA

Technical field

The invention relates to the field of GPU server testing, and in particular to a test method based on GPUDirect RDMA.

Background

As IT technology continues to develop, traditional information services and increasingly powerful cloud computing services place ever higher demands on servers, and technology generations turn over faster and faster. General-purpose CPUs have run into unprecedented difficulty in optimizing frequency, memory bandwidth, core count, and even process technology and instruction sets. GPU servers, however, have opened a door in heterogeneous computing: more and more supercomputing centers, enterprises, and research institutions are building compute resource pools centered on coprocessors and are developing and optimizing application layers adapted to heterogeneous platforms. Customer demand for GPU servers is growing accordingly, driven by the increasing need for computing power.

Summary of the invention

The technical task of the invention is to address the deficiencies of the prior art by providing a test method based on GPUDirect RDMA. The method both tests GPU server performance effectively and supplies important performance data for customers' GPU server performance requirements.

The technical solution adopted by the invention to solve this technical problem is as follows:

A test method based on GPUDirect RDMA. GPU memory is accessed directly, which avoids the unnecessary system memory copies and CPU overhead incurred when going through pinned CUDA host memory and speeds up communication with network and storage devices. One GPU can directly access another GPU in the same system using direct high-speed DMA transfers, which adds P2P memory access. This frees host CPU resources and eliminates unnecessary, frequent data transfers through the host CPU, which takes no part at all in incoming RDMA operations. The setup comprises an HCA card, a GPU card, the Nvidia driver and Nvidia CUDA toolkit required by the GPU, the MLNX_OFED driver required by InfiniBand, and the nv_peer_mem package for communication between the GPU and the IB card.

The HCA card is a Mellanox ConnectX or later product, and the GPU card is a K20 or later product.

The GPUDirect RDMA test method is as follows:

1. Test tools

a. cuda_6.5.14_linux_64.run

b. nvidia_peer_memory-1.0-0.tar.gz

c. mvapich2-gdr-cuda6.5-gnu-2.1-0.1.a.el6.x86_64.rpm

d. MLNX_OFED_LINUX-2.4-1.0.0-rhel6.2-x86_64.iso
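
Before starting the installation it can help to confirm that all four files are actually present. The following is a minimal sketch, not part of the patented method: the function name `missing_tools` and the idea of a single staging directory are assumptions made for illustration, while the file names are taken from the list above.

```shell
# Sketch: print the name of every test-tool file that is missing from a directory.
# The staging directory (default: current directory) is an assumption.
missing_tools() {
    dir=${1:-.}
    for f in \
        cuda_6.5.14_linux_64.run \
        nvidia_peer_memory-1.0-0.tar.gz \
        mvapich2-gdr-cuda6.5-gnu-2.1-0.1.a.el6.x86_64.rpm \
        MLNX_OFED_LINUX-2.4-1.0.0-rhel6.2-x86_64.iso
    do
        # Print only the files that do not exist as regular files.
        [ -f "$dir/$f" ] || printf '%s\n' "$f"
    done
}
```

Calling `missing_tools /root/rdma` (the directory is only an example) prints nothing when every file is in place, so an empty result can be used as the go-ahead for the steps that follow.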

2. Test method

a. HCA driver installation

mount -o ro,loop MLNX_OFED_LINUX-2.4-1.0.0-rhel6.2-x86_64.iso /mnt

cd /mnt

./mlnxofedinstall

b. Graphics card driver installation

chmod 777 cuda_6.5.14_linux_64.run

./cuda_6.5.14_linux_64.run --extract=/root/rdma

./NVIDIA-Linux-x86_64-340.29.run

c. CUDA installation

./cuda-linux64-rel-6.5.14-18749181.run

d. Environment variable settings

vi ~/.bashrc

Add at the end:

export PATH=/usr/local/cuda-6.5/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH

source ~/.bashrc

vi /etc/ld.so.conf

Add at the end: /usr/local/cuda-6.5/lib64

ldconfig
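
The edits in step d are made by hand in vi. When the procedure is scripted, the same lines can be appended without an editor. The sketch below is one possible way to do that and is not taken from the source: `append_once` is a hypothetical helper that adds a line only if that exact line is not already in the file, so the step can be rerun safely.

```shell
# Sketch: append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name, not part of any tool named above.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x matches whole lines, -F treats the pattern as a fixed string.
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Intended use, with the paths from step d:
#   append_once 'export PATH=/usr/local/cuda-6.5/bin:$PATH' ~/.bashrc
#   append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH' ~/.bashrc
#   append_once '/usr/local/cuda-6.5/lib64' /etc/ld.so.conf && ldconfig
```

The single quotes in the usage lines keep `$PATH` and `$LD_LIBRARY_PATH` unexpanded, so the literal text lands in the file and is evaluated at login time.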

e. nv_peer_mem installation

tar -zxf ../nvidia_peer_memory-1.0-0.tar.gz

rpmbuild --rebuild nvidia_peer_memory-1.0-0.src.rpm

rpm -ivh /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-0.x86_64.rpm

/etc/init.d/nv_peer_mem start   # starts the nv_peer_mem service
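
The source does not say how to confirm that the service actually started. One plausible check, offered here as an assumption, is to look for the kernel module in the list of loaded modules. The sketch below assumes the module is named `nv_peer_mem`, like the service, and reads a `/proc/modules`-style listing; `module_loaded` is a hypothetical helper name.

```shell
# Sketch: succeed if a kernel module name appears in the first column of a
# /proc/modules-style listing. The second argument (listing file) is optional.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example:
#   module_loaded nv_peer_mem && echo "nv_peer_mem is loaded"
```

The optional second argument exists only so the check can be tried against a saved listing on a machine without the hardware.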

f. mvapich2 installation

rpm -Uvh --nodeps mvapich2-gdr-cuda6.5-gnu-2.1-0.1.a.el6.x86_64.rpm

g. GPUDirect RDMA bandwidth test

/opt/mvapich2/gdr/2.1/cuda6.5/gnu/bin/mpirun_rsh -np 2 c1 c2 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 /opt/mvapich2/gdr/2.1/cuda6.5/gnu/libexec/mvapich2/osu_bw -d cuda D D

h. GPUDirect RDMA latency test

/opt/mvapich2/gdr/2.1/cuda6.5/gnu/bin/mpirun_rsh -np 2 c1 c2 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 /opt/mvapich2/gdr/2.1/cuda6.5/gnu/libexec/mvapich2/osu_latency -d cuda D D
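
The bandwidth and latency commands in steps g and h differ only in the benchmark binary. A small wrapper can make that explicit and keep the two host names in one place. This is a sketch under assumptions: `osu_cmd` and `MV2_HOME` are names introduced here for illustration, and `c1` and `c2` are read as the host names of the two nodes under test, which the source does not state outright.

```shell
# Sketch: print the mpirun_rsh command line for one OSU benchmark between two hosts.
# MV2_HOME is the install prefix that appears in steps g and h.
MV2_HOME=/opt/mvapich2/gdr/2.1/cuda6.5/gnu

osu_cmd() {
    # $1 = benchmark name (osu_bw or osu_latency), $2 and $3 = host names
    printf '%s\n' "$MV2_HOME/bin/mpirun_rsh -np 2 $2 $3 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 $MV2_HOME/libexec/mvapich2/$1 -d cuda D D"
}

# Print both commands for review:
#   osu_cmd osu_bw c1 c2
#   osu_cmd osu_latency c1 c2
```

The function only prints the command, so it can be inspected or logged first; piping the output to `sh` on a configured cluster would run it.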

Compared with the prior art, the test method based on GPUDirect RDMA of the invention has the following beneficial effects. GPU memory is accessed directly, which avoids the unnecessary system memory copies and CPU overhead incurred when going through pinned CUDA host memory and speeds up communication with network and storage devices. One GPU can directly access another GPU in the same system using direct high-speed DMA transfers, which adds P2P memory access. This frees host CPU resources and eliminates unnecessary, frequent data transfers through the host CPU, which takes no part at all in incoming RDMA operations.

The GPUDirect RDMA test method both tests GPU server performance effectively and supplies important performance data for customers' GPU server performance requirements. The test method is simple to operate, highly automated, and practical; it saves manpower, helps ensure stable server performance, and is an effective way to verify the product quality of GPU servers.
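
The source does not describe how the performance data is collected from the benchmark output. As one hedged illustration, the sketch below extracts the largest value from the second column of two-column benchmark output, skipping header lines that begin with `#`. That output layout (a message size followed by a measured value) is an assumption about the OSU benchmarks, and `peak_value` is a hypothetical helper name.

```shell
# Sketch: print the largest second-column value from two-column benchmark output,
# ignoring comment lines that start with '#'. Prints nothing if no data lines exist.
peak_value() {
    awk '!/^#/ && NF >= 2 {
             # Compare numerically, but remember the original text for printing.
             if (!seen || $2 + 0 > max) { max = $2 + 0; text = $2; seen = 1 }
         }
         END { if (seen) print text }'
}

# Example:
#   <bandwidth test command> | peak_value
```

For the bandwidth test the peak is the figure of interest; for the latency test the smallest value would usually matter more, so the comparison would need to be reversed.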

Description of drawings

Figure 1 is a schematic diagram of the test method based on GPUDirect RDMA.

Detailed description

The test method based on GPUDirect RDMA of the invention is described in detail below with reference to the accompanying drawing.

A test method based on GPUDirect RDMA. GPU memory is accessed directly, which avoids the unnecessary system memory copies and CPU overhead incurred when going through pinned CUDA host memory and speeds up communication with network and storage devices. One GPU can directly access another GPU in the same system using direct high-speed DMA transfers, which adds P2P memory access. This frees host CPU resources and eliminates unnecessary, frequent data transfers through the host CPU, which takes no part at all in incoming RDMA operations. The setup comprises an HCA card, a GPU card, the Nvidia driver and Nvidia CUDA toolkit required by the GPU, the MLNX_OFED driver required by InfiniBand, and the nv_peer_mem package for communication between the GPU and the IB card.

The HCA card is a Mellanox ConnectX or later product, and the GPU card is a K20 or later product.

The GPUDirect RDMA test method is as follows:

1. Test tools

a. cuda_6.5.14_linux_64.run

b. nvidia_peer_memory-1.0-0.tar.gz

c. mvapich2-gdr-cuda6.5-gnu-2.1-0.1.a.el6.x86_64.rpm

d. MLNX_OFED_LINUX-2.4-1.0.0-rhel6.2-x86_64.iso

2. Test method

a. HCA driver installation

mount -o ro,loop MLNX_OFED_LINUX-2.4-1.0.0-rhel6.2-x86_64.iso /mnt

cd /mnt

./mlnxofedinstall

b. Graphics card driver installation

chmod 777 cuda_6.5.14_linux_64.run

./cuda_6.5.14_linux_64.run --extract=/root/rdma

./NVIDIA-Linux-x86_64-340.29.run

c. CUDA installation

./cuda-linux64-rel-6.5.14-18749181.run

d. Environment variable settings

vi ~/.bashrc

Add at the end:

export PATH=/usr/local/cuda-6.5/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH

source ~/.bashrc

vi /etc/ld.so.conf

Add at the end: /usr/local/cuda-6.5/lib64

ldconfig

e. nv_peer_mem installation

tar -zxf ../nvidia_peer_memory-1.0-0.tar.gz

rpmbuild --rebuild nvidia_peer_memory-1.0-0.src.rpm

rpm -ivh /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-0.x86_64.rpm

/etc/init.d/nv_peer_mem start   # starts the nv_peer_mem service

f. mvapich2 installation

rpm -Uvh --nodeps mvapich2-gdr-cuda6.5-gnu-2.1-0.1.a.el6.x86_64.rpm

g. GPUDirect RDMA bandwidth test

/opt/mvapich2/gdr/2.1/cuda6.5/gnu/bin/mpirun_rsh -np 2 c1 c2 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 /opt/mvapich2/gdr/2.1/cuda6.5/gnu/libexec/mvapich2/osu_bw -d cuda D D

h. GPUDirect RDMA latency test

/opt/mvapich2/gdr/2.1/cuda6.5/gnu/bin/mpirun_rsh -np 2 c1 c2 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 /opt/mvapich2/gdr/2.1/cuda6.5/gnu/libexec/mvapich2/osu_latency -d cuda D D

As traditional information services and increasingly powerful cloud computing services place ever higher demands on servers, customer demand for GPU servers keeps growing. The GPUDirect RDMA test method both tests GPU server performance effectively and supplies important performance data for customers' GPU server performance requirements. The test method is simple to operate, highly automated, and practical; it saves manpower, helps ensure stable server performance, and is an effective way to verify the product quality of GPU servers.

Claims (2)

1. A test method based on GPUDirect RDMA, characterized in that GPU memory is accessed directly, which avoids the unnecessary system memory copies and CPU overhead incurred when going through pinned CUDA host memory and speeds up communication with network and storage devices; one GPU can directly access another GPU in the same system using direct high-speed DMA transfers, which adds P2P memory access, frees host CPU resources, and eliminates unnecessary, frequent data transfers through the host CPU, which takes no part at all in incoming RDMA operations; the setup comprises an HCA card, a GPU card, the Nvidia driver, the Nvidia CUDA toolkit, the MLNX_OFED driver, and the nv_peer_mem package for communication between the GPU and the IB card.

2. The test method based on GPUDirect RDMA according to claim 1, characterized in that the HCA card is a Mellanox ConnectX.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510915330.8A CN105550085A (en) 2015-12-10 2015-12-10 RDMA (remote direct memory Access) testing method based on GPUDerict

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510915330.8A CN105550085A (en) 2015-12-10 2015-12-10 RDMA (remote direct memory Access) testing method based on GPUDerict

Publications (1)

Publication Number Publication Date
CN105550085A (en) 2016-05-04

Family

ID=55829281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510915330.8A Pending CN105550085A (en) 2015-12-10 2015-12-10 RDMA (remote direct memory Access) testing method based on GPUDerict

Country Status (1)

Country Link
CN (1) CN105550085A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161939A1 (en) * 2005-04-19 2010-06-24 Stmicroelectronics S.R.L. Parallel processing method and system, for instance for supporting embedded cluster platforms, computer program product therefor
CN103345382A (en) * 2013-07-15 2013-10-09 郑州师范学院 CPU+GPU group nuclear supercomputer system and SIFT feature matching parallel computing method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KHALED HAMIDOUCHE ET AL.: "Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters", 《IEEE》 *
SREERAM POTLURI ET AL.: "Efficient Inter-node MPI Communication using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs", 《IEEE》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062929A (en) * 2018-06-11 2018-12-21 上海交通大学 A kind of query task communication means and system
CN109062929B (en) * 2018-06-11 2020-11-06 上海交通大学 A query task communication method and system
CN112162890A (en) * 2020-09-24 2021-01-01 深圳市航顺芯片技术研发有限公司 DMA pressure test method and device of MCU and storage medium
CN112162890B (en) * 2020-09-24 2021-09-21 深圳市航顺芯片技术研发有限公司 DMA pressure test method and device of MCU and storage medium
CN113395359A (en) * 2021-08-17 2021-09-14 苏州浪潮智能科技有限公司 File currency cluster data transmission method and system based on remote direct memory access
CN119127624A (en) * 2024-11-14 2024-12-13 之江实验室 Automated testing system and method for direct communication between heterogeneous GPUs
CN119127624B (en) * 2024-11-14 2025-03-14 之江实验室 Automated testing system and method for direct communication between heterogeneous GPUs
CN119718676A (en) * 2025-02-25 2025-03-28 山东大学 Heterogeneous GPU system and data transmission method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160504