
TWI753421B - Method, apparatus, and GPU node for executing an application program, and computing device and machine-readable storage medium therefor - Google Patents


Info

Publication number
TWI753421B
Authority
TW
Taiwan
Prior art keywords
gpu
server
application
client
version information
Prior art date
Application number
TW109114799A
Other languages
Chinese (zh)
Other versions
TW202115564A (en)
Inventor
趙軍平 (Zhao Junping)
Original Assignee
Alipay (Hangzhou) Information Technology Co., Ltd. (大陸商支付寶(杭州)信息技術有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay (Hangzhou) Information Technology Co., Ltd.
Publication of TW202115564A
Application granted granted Critical
Publication of TWI753421B

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment
    • G06F8/65Updates

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

Embodiments of this specification provide a method, an apparatus, and a GPU node for executing an application program. The GPU node has a server, at least one client, and at least one GPU hardware device. After an application is launched on the client, the client obtains first version information of the API interface specified in the dynamic link library that the application needs at execution time, includes the first version information in an API instruction execution request, and sends the request to the server. The server performs API interface adaptation using the first version information and second version information from the locally installed driver, accesses the GPU hardware through the adapted API interface to execute the API instruction, and then returns the execution result of the API instruction to the client.

Description

Method, apparatus, and GPU node for executing an application program, and computing device and machine-readable storage medium therefor

The embodiments of this specification generally relate to the field of computers and, more particularly, to methods, apparatuses, and GPU nodes for executing application programs.

AI (Artificial Intelligence), and deep learning (DL) in particular, is now widely used in scenarios such as payment (face recognition), loss assessment (image recognition), and interaction and customer service (speech recognition, content filtering), with remarkable results. Typical DL tasks require substantial computing power, so the vast majority of such tasks currently run on accelerators such as GPUs deployed in GPU nodes. The GPU (Graphics Processing Unit) is a high-performance computing accelerator that is widely used for AI and deep-learning training and for online services.

In practice, the GPU hardware in a GPU node is typically replaced quickly; for example, high-performance GPU vendors release a new product generation almost every year, with significant gains in performance and efficiency. Each hardware generation requires installing new GPU drivers and upgrading the software libraries. For many businesses, however, verifying and upgrading the underlying software touches many components and is handled cautiously, and DL applications (for example, GPU applications) often keep using old GPU drivers and software libraries for a long time. As a result, those DL applications cannot execute on the new GPU hardware and cannot benefit from the functional and performance improvements it brings.

In view of the above problems, the embodiments of this specification provide a method, an apparatus, and a GPU node for executing an application program. With this method and apparatus, an application program can execute on new GPU hardware without being modified or recompiled.

According to one aspect of the embodiments of this specification, an apparatus for executing an application program is provided. The apparatus is applied to a server in a GPU node in which at least one GPU hardware device is deployed, and includes: an execution request receiving unit that receives an API instruction execution request from a client, the request including first version information of the API interface specified in the dynamic link library required by the application at execution time, the first version information having been obtained in response to the client detecting that the application was launched; an adaptation processing unit that performs API interface adaptation processing according to the first version information and second version information of the API interface based on an API interface adaptation policy, the second version information being the version information of the API interface in the driver of the at least one GPU hardware device installed on the server; an application execution unit that uses the adapted API interface to access the at least one GPU hardware device to execute the API instruction; and an execution result sending unit that sends the execution result of the API instruction to the client.

Optionally, in one example of the above aspect, the apparatus may further include: a hardware discovery unit that discovers the GPU hardware in the GPU node; and an adaptation policy creation unit that creates the API interface adaptation policy based on an API interface compatibility list of the discovered GPU hardware.

Optionally, in one example of the above aspect, the apparatus may further include: a GPU execution resource isolation unit that allocates isolated resources for the GPU hardware to execute the API instructions; and an instruction priority management unit that manages the priority with which the API instructions are executed on the GPU hardware.

Optionally, in one example of the above aspect, the apparatus may further include a GPU execution optimization unit that performs execution optimization on the GPU hardware.

Optionally, in one example of the above aspect, the client and the server are located in the same device, and communication between them is implemented through an inter-process communication mechanism.

Optionally, in one example of the above aspect, the client and the server are located in different devices, and communication between them is implemented through a network protocol.

Optionally, in one example of the above aspect, the client and the server are located in the same GPU node, or the client and the server are located in different GPU nodes.

Optionally, in one example of the above aspect, the application execution request includes application scheduling information that specifies the target GPU hardware the application needs to access when it executes, the target GPU hardware being part or all of the at least one GPU hardware.

According to another aspect of the embodiments of this specification, an apparatus for executing an application program is provided. The apparatus is applied to a client in a GPU node in which at least one GPU hardware device is deployed, and includes: a version information acquisition unit that, in response to detecting that the application has been launched, obtains first version information of the API interface specified in the dynamic link library required by the application at execution time; an execution request sending unit that sends an API instruction execution request containing the first version information to the server in the GPU node, so that API interface adaptation processing and application execution processing are performed on the server; and an execution result receiving unit that receives the execution result of the API instruction from the server. The API interface adaptation processing is performed according to the first version information and second version information of the API interface based on an API interface adaptation policy, and the application execution processing uses the adapted API interface to access the at least one GPU hardware device to execute the API instruction, the second version information being the version information of the API interface in the driver of the at least one GPU hardware device installed on the server.

According to another aspect of the embodiments of this specification, a GPU node is provided, including: a server that includes the apparatus for executing an application program described above; at least one client, each client including the apparatus for executing an application program described above; and at least one GPU hardware device.

According to another aspect of the embodiments of this specification, a method for executing an application program is provided. The method is applied to a server in a GPU node in which at least one GPU hardware device is deployed, and includes: receiving an API instruction execution request from a client, the request including first version information of the API interface specified in the dynamic link library required by the application at execution time, the first version information having been obtained in response to the client detecting that the application was launched; performing API interface adaptation processing according to the first version information and second version information of the API interface based on an API interface adaptation policy, the second version information being the version information of the API interface in the driver of the at least one GPU hardware device installed on the server; using the adapted API interface to access the at least one GPU hardware device to execute the API instruction; and sending the execution result of the API instruction to the client.

Optionally, in one example of the above aspect, the API interface adaptation policy may be created based on an API interface compatibility list of the GPU hardware.

Optionally, in one example of the above aspect, the client and the server are located in the same device, and communication between them is implemented through an inter-process communication mechanism.

Optionally, in one example of the above aspect, the client and the server are located in different devices, and communication between them is implemented through a network protocol.

According to another aspect of the embodiments of this specification, a method for executing an application program is provided. The method is applied to a client in a GPU node in which at least one GPU hardware device is deployed, and includes: in response to detecting that the application has been launched, obtaining first version information of the API interface specified in the dynamic link library required by the application at execution time; sending an API instruction execution request containing the first version information to the server in the GPU node, so that API interface adaptation processing and application execution processing are performed on the server; and receiving the execution result of the API instruction from the server. The API interface adaptation processing is performed according to the first version information and second version information of the API interface based on an API interface adaptation policy, and the application execution processing uses the adapted API interface to access the at least one GPU hardware device to execute the API instruction, the second version information being the version information of the API interface in the driver of the at least one GPU hardware device installed on the server.

According to another aspect of the embodiments of this specification, a computing device is provided, including one or more processors and storage coupled to the one or more processors, the storage storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the method for executing an application program applied to the server as described above.

According to another aspect of the embodiments of this specification, a machine-readable storage medium is provided that stores executable instructions that, when executed, cause the machine to perform the method for executing an application program applied to the server as described above.

According to another aspect of the embodiments of this specification, a computing device is provided, including one or more processors and storage coupled to the one or more processors, the storage storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the method for executing an application program applied to the client as described above.

According to another aspect of the embodiments of this specification, a machine-readable storage medium is provided that stores executable instructions that, when executed, cause the machine to perform the method for executing an application program applied to the client as described above.

Using the application execution method and apparatus provided by the embodiments of this specification, an application execution mechanism with a client-server architecture is provided: the application's calls to API instructions are made on the client, while API interface adaptation is performed on the server, which accesses the GPU hardware through the adapted API interface to execute the API instructions. This decouples the application's invocation of API instructions from the access to the GPU hardware that actually executes them. A GPU application can thus load its existing API instructions on the client, and the server completes the API interface adaptation so that the adapted API interface can access new GPU hardware to execute the instructions. As a result, the application can execute on new GPU hardware without modification or recompilation.

The subject matter described herein will now be discussed with reference to example implementations. It should be understood that these implementations are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and not to limit the protection scope, applicability, or examples set forth in the claims. Changes may be made to the function and arrangement of the elements discussed without departing from the protection scope of the embodiments of this specification. Various examples may omit, substitute, or add procedures or components as needed. For example, the described methods may be performed in an order different from that described, and steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.

As used herein, the term "including" and its variants are open-ended terms meaning "including but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second", and the like may refer to different or identical objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly dictates otherwise, the definition of a term is consistent throughout the specification.

As used herein, a "GPU node" may be any entity with GPU processing capability, for example, a single GPU device or a GPU system. In addition, "application" and "GPU application" are used interchangeably herein; both describe an application program that can execute on a GPU device.

FIG. 1 shows a schematic diagram of a GPU system architecture 100 for executing applications. As shown in FIG. 1, the GPU system architecture 100 may include, from bottom to top, GPU underlying hardware 110, a GPU hardware driver 120, an AI framework layer 130, and an application layer 140.

In the embodiments of this specification, the GPU underlying hardware 110 may be, for example, an NVIDIA enterprise-class P100, V100, or T4, or a consumer-class GTX 1080, although it is not limited to these examples. The GPU underlying hardware 110 includes the GPU hardware itself and the GPU resources required to implement GPU functions. Specifically, the GPU resources of each GPU underlying hardware 110 may include, for example, GPU graphics memory, compute queues, compute task handles, and the like. In this specification, the GPU underlying hardware 110 may include one or more GPU hardware entities.
The GPU hardware driver 120 drives the GPU underlying hardware 110 so that the GPU hardware entities can work. The GPU hardware driver 120 contains version information of the API interface, and an API interface whose version is lower than the version contained in the GPU hardware driver 120 cannot access the GPU hardware entity. For example, if the version of the API interface in the GPU hardware driver 120 is CUDA10, an application that calls CUDA9 cannot access the GPU hardware and therefore cannot execute on it. Here, CUDA is the collective name of the SDK released by the GPU vendor NVIDIA; corresponding open interfaces, such as OpenCL, also exist.

FIG. 1 shows the hardware entities P100/P40/P4, V100, and T4. The API interface version in the driver of P100/P40/P4 is 384.111, that of V100 is 396.26, and that of T4 is 410 or 418. The version 410 or 418 of T4 is the highest, and an API interface of that version can access the hardware entities P100/P40/P4, V100, and T4. The version 396.26 in the driver of V100 is higher than the version 384.111 in the driver of P100/P40/P4, so an API interface of version 396.26 can access P100/P40/P4 and V100. The version 384.111 in the driver of P100/P40/P4 is the lowest, and an API interface of that version can access only P100/P40/P4.

The AI framework layer 130 provides the various API interfaces supported by the system, for example CUDA10, CUDA9, and CUDA8, for use in building applications. At compile time, the AI framework layer 130 typically binds a specific dynamic link library (for example, CUDA8 or CUDA10) to produce an executable program. At execution time, when a GPU application based on the AI framework layer 130 starts, the operating system searches for the required dynamic link library (such as CUDA8) and loads it into memory. The AI framework layer 130 may include frameworks such as TensorFlow, PyTorch, and Caffe2. All known AI frameworks support GPU execution; that is, as long as the version of the API interface provided by the AI framework layer 130 is not lower than the version of the API interface in the driver of the GPU hardware, the API interface provided by the AI framework layer 130 can access the GPU hardware entity.

When an application is successfully installed in the system, it is stored in the application layer 140. When the application is launched, it is allowed to use the loaded version of the API interface to access the GPU hardware and execute using the GPU resources that the hardware provides. In this specification, an application may be, for example, a user model. The user model may be any of the following, without being limited to these examples: a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory network), a GAN (Generative Adversarial Network), or another model.
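As a minimal sketch of the version gating illustrated by FIG. 1 (not part of the patent itself): a GPU hardware entity is reachable only if the loaded API interface version is not lower than the version in that hardware's driver. The table values come from FIG. 1; the numeric comparison is an assumption made for clarity.

    # Driver API versions per hardware entity, taken from FIG. 1.
    DRIVER_API_VERSION = {"P100/P40/P4": 384.111, "V100": 396.26, "T4": 410.0}

    def reachable_hardware(loaded_api_version):
        # A loaded API interface can access hardware whose driver version
        # it meets or exceeds (the constraint described above).
        return [gpu for gpu, required in DRIVER_API_VERSION.items()
                if loaded_api_version >= required]

    # reachable_hardware(396.26) -> ["P100/P40/P4", "V100"]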
However, when the version of the loaded API interface is lower than the version of the API interface in the driver of the GPU hardware, access fails. As shown in FIG. 2, when a GPU application calls CUDA9, the version of the called API interface is 384.111, which is lower than version 410/418; the GPU application therefore cannot use the called API interface to access the GPU hardware T4 for execution.

In practice, the GPU hardware in a GPU node is typically replaced quickly; for example, high-performance GPU vendors release a new product generation almost every year, with significant gains in performance and efficiency. Each hardware generation requires installing new GPU drivers and upgrading the software libraries. For many businesses, however, verifying and upgrading the underlying software touches many components and is handled cautiously, and DL applications (for example, GPU applications) often keep using old GPU drivers and software libraries (dynamic link libraries) for a long time. In that case, if the GPU application cannot use its existing API interface to access the upgraded GPU hardware, it cannot enjoy the functional and performance improvements the new hardware brings.

In view of the above, the embodiments of this specification provide an application execution mechanism with a client-server architecture. In this mechanism, the application's calls to API instructions are made on the client, while API interface adaptation is performed on the server, which accesses the GPU hardware through the adapted API interface to execute the API instructions. This decouples the application's invocation of API instructions from the access to the GPU hardware that actually executes them: a GPU application can load its existing API instructions on the client, and the server completes the API interface adaptation so that the adapted API interface can access new GPU hardware to execute the instructions. The application can thus execute on new GPU hardware without modification or recompilation.

A method, an apparatus, and a GPU node for executing an application program according to embodiments of this specification are described below with reference to FIGS. 3 to 11.

FIG. 3 shows a schematic architecture diagram of a GPU node 300 for executing applications according to an embodiment of this specification. As shown in FIG. 3, the GPU node 300 includes at least one client 310 (for example, the clients 310-1 and 310-2 shown in FIG. 3), a server 320, and at least one GPU hardware device 330.

Each client 310 includes an application layer 311, an AI framework layer 312, and an application execution apparatus 313. When an application (for example, a GPU application) is successfully installed in the system, it is stored in the application layer 311. The AI framework layer 312 provides the various API interfaces supported by the system for building applications. More specifically, the AI framework layer 312 typically binds multiple dynamic link libraries (GPU dynamic link libraries), such as cuBLAS, cuFFT, cuSPARSE, and cuDNN, each of which specifies the API interface version information it supports.
The application execution apparatus 313 is configured to, when the application is launched, obtain first version information of the API interface specified in the dynamic link library required by the application at execution time, send an API instruction execution request containing the first version information to the server 320, and receive the API instruction execution result from the server 320.

Specifically, after a GPU application starts executing, the background system loads for each GPU application the dynamic link library version specified at compile time; for example, libA.8.0.so is loaded for GPU application 1, libA.9.0.so for GPU application 2, and libA.10.0.so for GPU application 3. The application execution apparatus 313 then extracts the version of the GPU link library actually loaded by the current GPU application. Specifically, it can scan the program stack or the program's shared-memory mapping area (for example, on Linux, dynamic link libraries are mapped as files into the process address space), search for the file name of the dynamic link library, and extract the version information from the file name (for example, CUDA8/9/10 in the example above). The extracted version information is then included in an API instruction execution request sent to the server 320 to execute the API instruction, and the API instruction execution result is received from the server 320.

FIG. 4 shows a block diagram of the application execution apparatus 313 applied to the client according to an embodiment of this specification. As shown in FIG. 4, the application execution apparatus 313 includes a version information acquisition unit 3131, an execution request sending unit 3133, and an execution result receiving unit 3135. The version information acquisition unit 3131 is configured to obtain, in response to detecting that the application has been launched, the first version information of the API interface specified in the dynamic link library required by the application at execution time. The execution request sending unit 3133 then sends an API instruction execution request containing the first version information to the server 320 in the GPU node 300, so that API interface adaptation processing and application execution processing are performed on the server 320. The execution result receiving unit 3135 is configured to receive the execution result of the API instruction from the server 320.

Each client 310 communicates with the server 320 through IPC (Inter-Process Communication) or a network protocol, sending API instruction execution requests to the server 320 and receiving API instruction execution results from it.

The server 320 runs on top of the GPU hardware driver 340. The server 320 is a long-running daemon that executes in the background of the system. In the embodiments of this specification, one service instance may be deployed on a server 320, and the service instance may be packaged and executed in a docker container. The server 320 manages one or more GPU hardware entities 330. One server 320 may serve multiple clients 310, and one GPU hardware entity 330 may likewise serve multiple clients 310. The server 320 includes an application execution apparatus 321.
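As a sketch of the extraction step described above, assuming Linux, the /proc/self/maps layout, and the library names from the example (libA.*.so and the CUDA runtime's libcudart.so.*), the client side might pull the loaded library version out of the process's shared-memory map like this:

    import re

    # Matches both "libcudart.so.9.0"-style and "libA.9.0.so"-style file
    # names and captures the version embedded in the name.
    _SO_VERSION = re.compile(
        r"lib(?:cudart|A)(?:\.so\.(\d+(?:\.\d+)?)|\.(\d+(?:\.\d+)?)\.so)")

    def loaded_api_version(maps_path="/proc/self/maps"):
        # Scan this process's memory map for the GPU dynamic link library
        # and return the version found in its file name (the first version
        # information sent to the server).
        with open(maps_path) as maps:
            for line in maps:
                match = _SO_VERSION.search(line)
                if match:
                    return match.group(1) or match.group(2)
        return None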
The application execution apparatus 321 is configured to, after receiving an API instruction execution request from a client 310, perform API interface adaptation according to an API interface adaptation policy, based on the first version information of the API interface contained in the received request and the API interface version information in the locally installed driver (that is, the second version information), and then use the adapted API interface to access the GPU hardware to execute the API instruction. The server 320 also returns the API instruction execution result to the client 310.

FIG. 5 is a schematic structural diagram of the application execution apparatus 321 applied to the server according to an embodiment of this specification. As shown in FIG. 5, the application execution apparatus 321 includes an execution request receiving unit 3211, an adaptation processing unit 3213, an application execution unit 3215, and an execution result sending unit 3217.

The execution request receiving unit 3211 is configured to receive an API instruction execution request from a client 310, the request including the first version information of the API interface specified in the dynamic link library required by the application at execution time. It should be noted that the server 320 may receive API instruction execution requests from a client 310 located in the same GPU node or from a client 310 located in a different GPU node.

The adaptation processing unit 3213 is configured to perform API interface adaptation processing according to the first version information and the second version information based on the API interface adaptation policy. Here, the second version information is the version information of the API interface in the driver of the GPU hardware installed on the server 320; for example, for the GPU hardware T4, the second version information is 410 or 418.

The API interface adaptation policy may be created based on an API interface compatibility list of the GPU hardware; for example, it may be created in advance based on the API interface compatibility list provided by the GPU hardware manufacturer. Alternatively, in one example, the application execution apparatus 321 may include a hardware discovery unit 322 and an adaptation policy creation unit 323. The hardware discovery unit 322 is configured to discover the GPU hardware 330 in the GPU node 300, and the adaptation policy creation unit 323 is configured to create the API interface adaptation policy based on the API interface compatibility list of the discovered GPU hardware 330.

FIG. 6 shows an example of an API interface adaptation policy according to an embodiment of this specification. FIG. 6 shows two clients (client 1 and client 2) and one server; the API interface version of client 1 is CUDA10, the API interface version of client 2 is CUDA9, and the API interface version of the server is CUDA10.

As shown in FIG. 6, for API1 and API3, no parameter change occurred in the API interface version of the server 320; during adaptation, the parameters of API1 and API3 are therefore kept unchanged for the API instruction execution requests sent by both clients 1 and 2. For API2, a parameter change occurred in the API interface version of the server 320; during adaptation, the parameters of API2 are kept unchanged for requests sent by client 1, while parameter conversion is performed on API2 for requests sent by client 2. API4 was deprecated in the API interface version of the server 320; during adaptation, no special handling is performed for requests sent by client 1, while for requests sent by client 2 execution is skipped and a success message is returned.
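A minimal sketch of such a policy follows, assuming the FIG. 6 entries and an invented convert_params helper; neither the table contents nor the helper are spelled out in the patent.

    # Actions a policy entry can prescribe for an incoming request.
    PASS, CONVERT, SKIP = "pass-through", "convert-params", "skip-and-ack"

    # (api_name, client_version) -> action, for a server whose version is CUDA10.
    ADAPTATION_POLICY = {
        ("API1", "CUDA9"): PASS,     # no parameter change between the versions
        ("API2", "CUDA9"): CONVERT,  # parameters changed; translate them
        ("API4", "CUDA9"): SKIP,     # deprecated; skip and report success
    }

    def convert_params(api_name, params):
        # Hypothetical per-API translation from the CUDA9 parameter layout
        # to the server's CUDA10 layout (here, a simple key rename).
        converted = dict(params)
        if "streamId" in converted:
            converted["stream"] = converted.pop("streamId")
        return converted

    def adapt(api_name, client_version, params):
        action = ADAPTATION_POLICY.get((api_name, client_version), PASS)
        if action == SKIP:
            return None, None  # caller returns a success message immediately
        if action == CONVERT:
            return api_name, convert_params(api_name, params)
        return api_name, params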
Returning to FIG. 5, after the API interface has been adapted, the application execution unit 3215 uses the adapted API interface to access the GPU hardware and execute the API instruction. The execution result sending unit 3217 then sends the execution result of the API instruction to the client 310.

It should be noted that, in one example, the client 310 and the server 320 may be located in the same device. In this case, communication between them may be implemented through IPC (for example, UNIX sockets, pipes, or shared memory). In another example, the client 310 and the server 320 may be located in different devices. In this case, communication between them may be implemented through a network protocol, for example TCP (Transmission Control Protocol), IP, or RDMA (Remote Direct Memory Access).

The server 320 may also receive API instruction execution requests from multiple clients 310 at the same time. In this case, the application execution apparatus 321 may further include a GPU execution resource isolation unit 324 and an instruction priority management unit 325. The GPU execution resource isolation unit 324 allocates isolated resources for the GPU hardware to execute API instructions, and the instruction priority management unit 325 is configured to manage the priority with which API instructions are executed on the GPU hardware.

In addition, to optimize GPU execution efficiency on the server 320, the application execution apparatus 321 may further include a GPU execution optimization unit 326, which is configured to perform execution optimization on the GPU hardware. The optimization may include, for example, GPU graphics memory optimization, GPU performance optimization, and/or GPU scalability optimization.

Furthermore, when multiple GPU nodes 300 form a GPU computing cluster, the cluster may also include a cluster scheduler. The cluster scheduler communicates with the clients 310 in the GPU nodes 300 through the network or through IPC and is responsible for scheduling GPU resources across the cluster. The server 320 reports its GPU resources (physical and/or virtual GPU resources) to the cluster scheduler, for example through the device plugin "nvidia device plugin", so that the cluster scheduler controls the allocation of all GPU resources within the cluster. Each client 310 requests GPU resources from the cluster scheduler.
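As an illustration of the same-device IPC case, a client-side request carrying the first version information could be sent over a UNIX domain socket as sketched below; the socket path and the JSON message layout are assumptions made for illustration, not part of the patent.

    import json
    import socket

    def send_api_request(api_name, params, first_version,
                         path="/var/run/gpu-server.sock"):
        # Package the API instruction together with the first version
        # information and send it to the server over a UNIX domain socket.
        request = {"api": api_name, "params": params, "version": first_version}
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
            conn.connect(path)
            conn.sendall(json.dumps(request).encode() + b"\n")
            return json.loads(conn.makefile().readline())  # execution result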
The cluster scheduler schedules and executes the allocation of all GPU resources, for example by launching an instance on a target pod so that the target (virtual) GPU resources are allocated to the corresponding client 310. Specifically, the cluster scheduler may include, but is not limited to, the K8S (Kubernetes) scheduler (kube-scheduler) or the Kubemaker scheduler.

In the case of a GPU computing cluster, the application execution request may further include application scheduling information, which specifies the target GPU hardware that the application needs to access when it executes (that is, the GPU hardware on which the application runs). The target GPU hardware may be part or all of the at least one GPU hardware.

FIG. 7 shows a flowchart of a method 700 for executing an application according to an embodiment of this specification. As shown in FIG. 7, in step 710, in response to detecting that the application has been launched, the client 310 obtains the first version information of the API interface specified in the dynamic link library required by the application at execution time. In step 720, the client 310 sends an API instruction execution request containing the first version information to the server 320. After receiving the request, in step 730 the server 320 performs API interface adaptation processing according to the first version information and the second version information based on the API interface adaptation policy; here, the second version information is the version information of the API interface in the driver of the at least one GPU hardware device installed on the server 320. In step 740, the server 320 uses the adapted API interface to access the GPU hardware and execute the API instruction. In step 750, the execution result of the API instruction is sent to the client 310, completing the API instruction execution process.
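A hypothetical sketch of the server-side portion of method 700 (steps 730 to 750), reusing the adapt() helper sketched earlier: execute_on_gpu stands in for the driver call that actually runs the adapted instruction on the GPU hardware and is an assumption, not an API from the patent.

    def execute_on_gpu(api_name, params):
        raise NotImplementedError("stand-in for the adapted driver call")

    def handle_request(request):
        # Step 730: adapt the API interface using the policy and the
        # first version information carried in the request.
        api_name, params = adapt(request["api"], request["version"],
                                 request["params"])
        if api_name is None:  # deprecated API: execution skipped (see FIG. 6)
            return {"status": "ok", "result": None}
        # Steps 740-750: execute on the GPU hardware and return the result.
        return {"status": "ok", "result": execute_on_gpu(api_name, params)}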
The application execution method and system architecture according to the embodiments of this specification have been described above with reference to FIGS. 3 to 7. This system architecture can be deployed in a bare-metal environment, a container environment, or a VM (virtual machine) environment, as shown in FIGS. 8A to 8C. Note that the client shown in FIGS. 8A to 8C is the client body of the client 310 disclosed in the embodiments of this specification; the client body includes the application execution apparatus 313 but not the application layer 311 or the AI framework layer 312.

If the system architecture is deployed in a bare-metal environment, as shown in FIG. 8A, both the server and the client body run on the host operating system (for example, both run on Linux). The server takes over all access to GPU resources through the GPU driver. If the client body and the server are on the same machine, they may communicate through IPC; if they are not on the same machine, they communicate through TCP, IP, RDMA, or a similar protocol.

If the system architecture is deployed in a container environment, as shown in FIG. 8B, the server runs in containerized form and manages the GPU resources. The client body (for example, a K8S pod) and the server run on the same physical machine, and communication between them may be implemented through IPC (for example, UNIX sockets, pipes, or shared memory) or through a network protocol.

If the system architecture is deployed in a virtual-machine environment, as shown in FIG. 8C, the GPU resources are assigned to a specific physical machine, and the server or client body is started inside the VM guest OS, which is equivalent to the bare-metal case. The system architecture can therefore be deployed on bare metal, in containers, and in virtual machines at the same time, making deployment very flexible.

FIG. 9 shows an example of deploying a new GPU node in a GPU computing cluster 900 according to an embodiment of this specification. When a new GPU node (for example, GPU node 3 with model GPU-B and driver B) is to be deployed in an existing GPU computing cluster (for example, the cluster formed by GPU node 1 and GPU node 2 of model GPU-A in FIG. 9), the client 310 and the server 320 according to the embodiments of this specification are first deployed on the new GPU node 3. The deployed server 320 supports the newer GPU hardware driver version B, and the API adaptation table is updated (that is, the API interface version of the server 320 is updated to version B). GPU node 3 is then added to the existing cluster so that its server and client can communicate with the servers and clients of the other GPU nodes in the cluster (that is, GPU nodes 1 and 2), forming a heterogeneous computing cluster in which multiple GPU models coexist.

In this way, although the API interface version of the newly added GPU node 3 is newer than those of GPU nodes 1 and 2, the server 320 deployed in GPU node 3 supports the newer hardware and driver. Using the API interface version extraction-negotiation-adaptation mechanism provided in this specification, the server 320 can adapt the older API versions and execute on their behalf, so that existing GPU applications can be scheduled onto and executed on the newly added GPU node 3 without modification. This breaks the original restriction that a specific GPU application must execute on a specific GPU model and improves the flexibility of application deployment and scheduling across the cluster: for example, a GPU application instance (for example, packaged in a container) can be started on the new GPU node, or a GPU application instance executing on another GPU node can be live-migrated to the new node (for example, for load balancing or system maintenance). When the heterogeneous cluster later needs to deploy other GPU hardware (for example, GPU-C), the above procedure can be repeated.

According to the embodiments of this specification, the API interface version extraction-negotiation-adaptation mechanism keeps the interface consistent upward (toward the AI framework, application models, and so on) while shielding, downward, the implementation differences of the physical hardware and drivers. GPU resources can thus be better abstracted, encapsulated, and managed in isolation, which helps improve GPU resource utilization, increase scheduling flexibility, support live migration, and achieve transparent management.
As described above with reference to FIGS. 3 to 9, embodiments of a method and an apparatus for executing an application program according to the embodiments of the present specification have been presented. The above application program execution apparatus may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 10 is a structural block diagram of a computing device 1000, applied to a client, for executing an application program according to an embodiment of the present specification. As shown in FIG. 10, the computing device 1000 may include at least one processor 1010, storage (for example, non-volatile storage) 1020, memory 1030, a communication interface 1040, and an internal bus 1060, and the at least one processor 1010, the storage 1020, the memory 1030, and the communication interface 1040 are connected together via the bus 1060. The at least one processor 1010 executes at least one computer-readable instruction (that is, the above-described elements implemented in software) stored or encoded in a computer-readable storage medium. In one embodiment, computer-executable instructions are stored in the memory which, when executed, cause the at least one processor 1010 to: in response to detecting that the application program has been launched, obtain the first version information of the application program interface specified in the dynamic link library required for executing the application program; send an application program interface instruction execution request to the server in the GPU node, the request including the first version information so that the server can perform application program interface adaptation processing and application program execution processing; and receive the execution result of the application program interface instruction from the server. Here, the application program interface adaptation processing is performed, based on the application program interface adaptation policy, according to the first version information and the second version information of the application program interface; the application program execution processing uses the adapted application program interface to access the at least one GPU hardware to execute the application program interface instruction; and the second version information is the version information of the application program interface in the driver, installed on the server, of the at least one GPU hardware. It should be understood that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 1010 to perform the operations and functions described above in connection with FIGS. 3 to 9 in the embodiments of this specification.

FIG. 11 is a structural block diagram of a computing device 1100, applied to a server, for executing an application program according to an embodiment of the present specification. As shown in FIG. 11, the computing device 1100 may include at least one processor 1110, storage (for example, non-volatile storage) 1120, memory 1130, a communication interface 1140, and an internal bus 1160, and the at least one processor 1110, the storage 1120, the memory 1130, and the communication interface 1140 are connected together via the bus 1160. The at least one processor 1110 executes at least one computer-readable instruction (that is, the above-described elements implemented in software) stored or encoded in a computer-readable storage medium.
In one embodiment, computer-executable instructions are stored in the memory which, when executed, cause the at least one processor 1110 to: receive an application program interface instruction execution request from a client, the request including the first version information of the application program interface specified in the dynamic link library required for executing an application program, the first version information having been obtained in response to the client detecting that the application program was launched; perform, based on the application program interface adaptation policy, application program interface adaptation processing according to the first version information and the second version information of the application program interface, the second version information being the version information of the application program interface in the driver, installed on the server, of the at least one GPU hardware; use the adapted application program interface to access the at least one GPU hardware to execute the application program interface instruction; and send the execution result of the application program interface instruction to the client. It should be understood that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 1110 to perform the operations and functions described above in connection with FIGS. 3 to 9 in the embodiments of this specification.

According to one embodiment, a program product, for example a non-transitory machine-readable medium, is provided. The non-transitory machine-readable medium may carry instructions (that is, the above-described elements implemented in software) which, when executed by a machine, cause the machine to perform the operations and functions described above in connection with FIGS. 3 to 9 in the embodiments of this specification. Specifically, a system or an apparatus equipped with a readable storage medium may be provided, where software code implementing the functions of any of the above-described embodiments is stored on the readable storage medium, and a computer or a processor of the system or apparatus reads and executes the instructions stored in the readable storage medium. In this case, the program code itself read from the readable medium implements the functions of any of the above-described embodiments, so the machine-readable code and the readable storage medium storing the machine-readable code constitute a part of the present invention. Examples of the readable storage medium include floppy disks, hard disks, magneto-optical disks, optical disks (for example, CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, and DVD-RW), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, the code can be downloaded from a server computer or from the cloud over a communication network.

Those skilled in the art should understand that various variations and modifications may be made to the embodiments disclosed above without departing from the spirit of the invention. Therefore, the protection scope of the present invention should be defined by the appended claims. It should be noted that not all of the steps and units in the above-mentioned flows and system structure diagrams are necessary, and some steps or units may be omitted according to actual needs. The execution order of the steps is not fixed and may be determined as required.
The apparatus structures described in the above embodiments may be physical structures or logical structures; that is, some units may be implemented by the same physical entity, some units may be implemented separately by multiple physical entities, or some units may be implemented jointly by components in separate devices. In the above embodiments, a hardware unit or module may be implemented mechanically or electrically. For example, a hardware unit, module, or processor may include permanently dedicated circuits or logic (such as a dedicated processor, an FPGA, or an ASIC) to perform the corresponding operations. A hardware unit or processor may also include programmable logic or circuits (such as a general-purpose processor or another programmable processor), which may be temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, dedicated permanent circuit, or temporarily configured circuit) can be decided based on cost and time considerations.

The detailed description set forth above in connection with the accompanying drawings describes exemplary embodiments, but does not represent all embodiments that may be implemented or that fall within the scope of the claims. The term "exemplary" as used throughout this specification means "serving as an example, instance, or illustration" and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology; however, these techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.

The above description of the disclosure is provided to enable any person of ordinary skill in the art to make or use the disclosure. Various modifications to this disclosure will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

100: GPU system architecture
110: Underlying hardware
120: Hardware driver
130: Framework layer
140: Application layer
300: GPU node
310, 310-1, 310-2: Client
311, 311-1, 311-2: Application layer
312, 312-1, 312-2: AI framework layer
313, 313-1, 313-2: Application program execution device
320: Server
321: Application program execution device
322: Hardware discovery unit
323: Adaptation policy creation unit
324: GPU execution resource isolation unit
325: Instruction priority management unit
326: GPU execution optimization unit
330: GPU
340: Hardware driver
3131: Version information acquisition unit
3133: Execution request sending unit
3135: Execution result receiving unit
3211: Execution request receiving unit
3213: Adaptation processing unit
3215: Application program execution unit
3217: Execution result sending unit
710: Step
720: Step
730: Step
740: Step
750: Step
900: Computing cluster
1000: Computing device
1010: Processor
1020: Storage
1030: Memory
1040: Communication interface
1060: Bus
1100: Computing device
1110: Processor
1120: Storage
1130: Memory
1140: Communication interface
1160: Bus

A further understanding of the nature and advantages of the embodiments of this specification may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.

[FIG. 1] shows a schematic diagram of an existing application execution architecture;
[FIG. 2] shows a schematic diagram of the execution of an application on different GPU hardware;
[FIG. 3] shows a schematic architectural diagram of a GPU node for executing an application according to an embodiment of the present specification;
[FIG. 4] shows a block diagram of an application program execution device applied to a client according to an embodiment of the present specification;
[FIG. 5] shows a schematic structural diagram of an application program execution device applied to a server according to an embodiment of the present specification;
[FIG. 6] shows an example schematic diagram of an API interface adaptation policy according to an embodiment of the present specification;
[FIG. 7] shows a flowchart of a method for executing an application according to an embodiment of the present specification;
[FIG. 8A] shows a schematic diagram of the application execution system architecture deployed in a bare-metal environment according to an embodiment of the present specification;
[FIG. 8B] shows a schematic diagram of the application execution system architecture deployed in a container environment according to an embodiment of the present specification;
[FIG. 8C] shows a schematic diagram of the application execution system architecture deployed in a virtual machine environment according to an embodiment of the present specification;
[FIG. 9] shows an example schematic diagram of deploying a new GPU node in a GPU computing cluster according to an embodiment of the present specification;
[FIG. 10] shows a block diagram of a computing device, applied to a client, for executing an application according to an embodiment of the present specification; and
[FIG. 11] shows a block diagram of a computing device, applied to a server, for executing an application according to an embodiment of the present specification.

300: GPU node
310-1, 310-2: Client
311-1, 311-2: Application layer
312-1, 312-2: AI framework layer
313-1, 313-2: Application program execution device
320: Server
321: Application program execution device
322: Hardware discovery unit
323: Adaptation policy creation unit
324: GPU execution resource isolation unit
325: Instruction priority management unit
326: GPU execution optimization unit
330: GPU
340: Hardware driver

Claims (19)

1. An apparatus for executing an application program, the apparatus being applied to a server in a GPU node, at least one GPU hardware being deployed in the GPU node, the apparatus comprising:
an execution request receiving unit that receives an application program interface instruction execution request from a client, the application program interface instruction execution request comprising first version information of an application program interface specified in a dynamic link library required for execution of the application program, the first version information being obtained in response to the client detecting that the application program has been launched;
an adaptation processing unit that performs, based on an application program interface adaptation policy, application program interface adaptation processing according to the first version information and second version information of the application program interface, the second version information being version information of the application program interface in a driver, installed on the server, of the at least one GPU hardware;
an application program execution unit that uses the adapted application program interface to access the at least one GPU hardware to execute the application program interface instruction; and
an execution result sending unit that sends an execution result of the application program interface instruction to the client.

2. The apparatus of claim 1, further comprising:
a hardware discovery unit that discovers the GPU hardware in the GPU node; and
an adaptation policy creation unit that creates the application program interface adaptation policy based on an application program interface compatibility list of the discovered GPU hardware.

3. The apparatus of claim 1 or 2, further comprising:
a GPU execution resource isolation unit that allocates isolated resources for the GPU hardware to execute the application program interface instruction; and
an instruction priority management unit that manages the priority with which the application program interface instruction is executed on the GPU hardware.

4. The apparatus of claim 3, further comprising:
a GPU execution optimization unit that performs execution optimization processing on the GPU hardware.

5. The apparatus of claim 1, wherein the client and the server are located in a same device, and communication between the client and the server is implemented using an inter-process communication mechanism.

6. The apparatus of claim 1, wherein the client and the server are located in different devices, and communication between the client and the server is implemented using a network protocol.
7. The apparatus of claim 1, wherein the client and the server are located in a same GPU node, or the client and the server are located in different GPU nodes.

8. The apparatus of claim 1, wherein the application execution request comprises application scheduling information, the application scheduling information being used to specify target GPU hardware that the application program needs to access during execution, and the target GPU hardware is part or all of the at least one GPU hardware.

9. An apparatus for executing an application program, the apparatus being applied to a client in a GPU node, at least one GPU hardware being deployed in the GPU node, the apparatus comprising:
a version information acquisition unit that, in response to detecting that the application program has been launched, obtains first version information of an application program interface specified in a dynamic link library required for execution of the application program;
an execution request sending unit that sends an application program interface instruction execution request to a server in the GPU node, the application program interface instruction execution request comprising the first version information, so that application program interface adaptation processing and application program execution processing are performed at the server; and
an execution result receiving unit that receives an execution result of the application program interface instruction from the server,
wherein the application program interface adaptation processing is performed, based on an application program interface adaptation rule, according to the first version information and second version information of the application program interface, and the application program execution processing uses the adapted application program interface to access the at least one GPU hardware to execute the application program interface instruction, the second version information being version information of the application program interface in a driver, installed on the server, of the at least one GPU hardware.

10. A GPU node, comprising:
a server comprising the apparatus of any one of claims 1 to 8;
at least one client, each client comprising the apparatus of claim 9; and
at least one GPU hardware.
11. A method for executing an application program, the method being applied to a server in a GPU node, at least one GPU hardware being deployed in the GPU node, the method comprising:
receiving an application program interface instruction execution request from a client, the application program interface instruction execution request comprising first version information of an application program interface specified in a dynamic link library required for execution of the application program, the first version information being obtained in response to the client detecting that the application program has been launched;
performing, based on an application program interface adaptation policy, application program interface adaptation processing according to the first version information and second version information of the application program interface, the second version information being version information of the application program interface in a driver, installed on the server, of the at least one GPU hardware;
using the adapted application program interface to access the at least one GPU hardware to execute the application program interface instruction; and
sending an execution result of the application program interface instruction to the client.

12. The method of claim 11, wherein the application program interface adaptation policy is created based on an application program interface compatibility list of the GPU hardware.

13. The method of claim 11, wherein the client and the server are located in a same device, and communication between the client and the server is implemented using an inter-process communication mechanism.

14. The method of claim 11, wherein the client and the server are located in different devices, and communication between the client and the server is implemented using a network protocol.
15. A method for executing an application program, the method being applied to a client in a GPU node, GPU hardware being deployed in the GPU node, the method comprising:
in response to detecting that the application program has been launched, obtaining first version information of an application program interface specified in a dynamic link library required for execution of the application program;
sending an application program interface instruction execution request to a server in the GPU node, the application program interface instruction execution request comprising the first version information, so that application program interface adaptation processing and application program execution processing are performed at the server; and
receiving an execution result of the application program interface instruction from the server,
wherein the application program interface adaptation processing is performed, based on an application program interface adaptation rule, according to the first version information and second version information of the application program interface, and the application program execution processing uses the adapted application program interface to access the GPU hardware to execute the application program interface instruction, the second version information being version information of the application program interface in a driver, installed on the server, of the GPU hardware.

16. A computing device, comprising:
one or more processors; and
storage coupled to the one or more processors, the storage storing instructions which, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 11 to 14.

17. A machine-readable storage medium storing executable instructions which, when executed, cause a machine to perform the method of any one of claims 11 to 14.

18. A computing device, comprising:
one or more processors; and
storage coupled to the one or more processors, the storage storing instructions which, when executed by the one or more processors, cause the one or more processors to perform the method of claim 15.

19. A machine-readable storage medium storing executable instructions which, when executed, cause a machine to perform the method of claim 15.
TW109114799A 2019-10-14 2020-05-04 Method, apparatus, and GPU node for executing an application program, and computing device and machine-readable storage medium therefor TWI753421B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910971306.4 2019-10-14
CN201910971306.4A CN110750282B (en) 2019-10-14 2019-10-14 Method and device for running application program and GPU node

Publications (2)

Publication Number Publication Date
TW202115564A TW202115564A (en) 2021-04-16
TWI753421B true TWI753421B (en) 2022-01-21

Family

ID=69278208

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109114799A TWI753421B (en) 2019-10-14 2020-05-04 Method, apparatus, and GPU node for executing an application program, and computing device and machine-readable storage medium therefor

Country Status (3)

Country Link
CN (1) CN110750282B (en)
TW (1) TWI753421B (en)
WO (1) WO2021073214A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750282B (en) * 2019-10-14 2021-04-02 支付宝(杭州)信息技术有限公司 Method and device for running application program and GPU node
CN111459506B (en) * 2020-03-02 2023-10-13 平安科技(深圳)有限公司 Deep learning platform cluster deployment method and device, medium and electronic equipment
CN112087332B (en) * 2020-09-03 2022-06-21 哈尔滨工业大学 Virtual network performance optimization system under cloud edge cooperation
DE112022000413T5 (en) * 2021-04-14 2023-10-05 Nvidia Corporation APPLICATION PROGRAMMING INTERFACE FOR IDENTIFYING FUNCTIONAL VERSIONS
CN113342356B (en) * 2021-05-18 2023-03-28 浪潮软件股份有限公司 Client framework operation and management configuration method
CN113360184B (en) * 2021-06-04 2024-06-18 曙光信息产业(北京)有限公司 Multi-ecological software migration method and device, computer equipment and storage medium
CN114168180B (en) * 2021-12-13 2025-07-11 上海壁仞科技股份有限公司 An updating method, a computing device, an electronic device and a storage medium
CN114466026B (en) * 2022-01-05 2024-05-14 杭州网易云音乐科技有限公司 Update method and device of application program interface, storage medium and computing device
CN115052003B (en) * 2022-04-29 2024-03-22 钉钉(中国)信息技术有限公司 Data synchronization method, related device and medium
CN115543535B (en) * 2022-09-30 2024-04-09 摩尔线程智能科技(北京)有限责任公司 Android container system, android container construction method and device and electronic equipment
CN115994004B (en) * 2023-03-22 2023-08-29 紫光同芯微电子有限公司 Application program interface calling method and device
CN116339737B (en) * 2023-05-26 2023-10-20 阿里巴巴(中国)有限公司 XR application editing method, device and storage medium
CN116501446B (en) * 2023-06-09 2024-06-07 摩尔线程智能科技(北京)有限责任公司 Kubernetes cluster deployment method and system, and electronic device
CN117033030B (en) * 2023-08-25 2024-07-12 玻尔科技成都有限公司 Application program interface scheduling system and method based on large-scale language model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200925997A (en) * 2007-10-26 2009-06-16 Qualcomm Inc Server-based code compilation
US20180101929A1 (en) * 2016-10-10 2018-04-12 Samsung Electronics Co., Ltd. Graphics processing devices and graphics processing methods
US10074206B1 (en) * 2017-05-23 2018-09-11 Amazon Technologies, Inc. Network-optimized graphics library for virtualized graphics processing
CN108776595A (en) * 2018-06-11 2018-11-09 郑州云海信息技术有限公司 A kind of recognition methods, device, equipment and the medium of the video card of GPU servers
US10169841B1 (en) * 2017-03-27 2019-01-01 Amazon Technologies, Inc. Dynamic interface synchronization for virtualized graphics processing
CN110187908A (en) * 2019-05-30 2019-08-30 苏州浪潮智能科技有限公司 A kind of method and system that GPU driver automatically updates
CN110192182A (en) * 2017-01-18 2019-08-30 亚马逊科技公司 Dynamic and the processing of dedicated virtualizing graphics

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8756694B2 (en) * 2007-03-30 2014-06-17 Microsoft Corporation Prevention of exploitation of update rollback
CN101419558A (en) * 2008-11-13 2009-04-29 湖南大学 CUDA graphic subsystem virtualization method
CN101599009A (en) * 2009-04-30 2009-12-09 浪潮电子信息产业股份有限公司 A Method of Executing Tasks in Parallel on Heterogeneous Multiprocessors
CN102521012B (en) * 2011-11-24 2014-08-27 华中科技大学 Virtual machine-based general processing unit (GPU) cluster management system
US20150212815A1 (en) * 2014-01-24 2015-07-30 Nvidia Corporation Methods and systems for maintenance and control of applications for performance tuning
CN104965712B (en) * 2015-07-17 2018-04-20 北京奇虎科技有限公司 Application program method for reinforcing and protecting, device and mobile terminal
CN107544783B (en) * 2016-06-27 2020-11-24 腾讯科技(深圳)有限公司 Data updating method, device and system
US10691950B2 (en) * 2017-03-10 2020-06-23 Turing Video, Inc. Activity recognition method and system
US10664943B2 (en) * 2017-06-02 2020-05-26 Apple Inc. Compound shader object and use thereof
CN109086077A (en) * 2017-06-13 2018-12-25 中兴通讯股份有限公司 A kind of operation method and device of application program
CN107861742A (en) * 2017-12-05 2018-03-30 杭州传信网络科技有限公司 The operation method and terminal device of a kind of program
CN109783119B (en) * 2018-12-07 2022-01-28 上海七印信息科技有限公司 Data multi-version compatible upgrade management system and management method thereof
CN110750282B (en) * 2019-10-14 2021-04-02 支付宝(杭州)信息技术有限公司 Method and device for running application program and GPU node

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200925997A (en) * 2007-10-26 2009-06-16 Qualcomm Inc Server-based code compilation
US20180101929A1 (en) * 2016-10-10 2018-04-12 Samsung Electronics Co., Ltd. Graphics processing devices and graphics processing methods
CN110192182A (en) * 2017-01-18 2019-08-30 亚马逊科技公司 Dynamic and the processing of dedicated virtualizing graphics
US10169841B1 (en) * 2017-03-27 2019-01-01 Amazon Technologies, Inc. Dynamic interface synchronization for virtualized graphics processing
US10074206B1 (en) * 2017-05-23 2018-09-11 Amazon Technologies, Inc. Network-optimized graphics library for virtualized graphics processing
CN108776595A (en) * 2018-06-11 2018-11-09 郑州云海信息技术有限公司 A kind of recognition methods, device, equipment and the medium of the video card of GPU servers
CN110187908A (en) * 2019-05-30 2019-08-30 苏州浪潮智能科技有限公司 A kind of method and system that GPU driver automatically updates

Also Published As

Publication number Publication date
CN110750282A (en) 2020-02-04
TW202115564A (en) 2021-04-16
CN110750282B (en) 2021-04-02
WO2021073214A1 (en) 2021-04-22

Similar Documents

Publication Publication Date Title
TWI753421B (en) Method, apparatus, and GPU node for executing an application program, and computing device and machine-readable storage medium therefor
US11593143B2 (en) System and method for distributed orchestration management in network function virtualization
EP3347816B1 (en) Extension of resource constraints for service-defined containers
CN107145380B (en) Virtual resource arranging method and device
US20170052807A1 (en) Methods, apparatuses, and computer program products for deploying and managing software containers
EP3913859B1 (en) Vnf life cycle management method and apparatus
WO2019060228A1 (en) Systems and methods for instantiating services on top of services
CN113031993B (en) Application upgrade method and device based on cluster container
WO2017080391A1 (en) Network service deployment method and device
CN103677983B (en) The dispatching method and device of application
CN107210924A (en) Method and apparatus for configuring communication system
CN111124589B (en) Service discovery system, method, device and equipment
CN113438295A (en) Container group address allocation method, device, equipment and storage medium
US12254339B2 (en) Methods for application deployment across multiple computing domains and devices thereof
CN116010017A (en) Interaction method, computer equipment and computer storage medium
WO2024087717A1 (en) Cloud resource management system and deployment method therefor
US8442939B2 (en) File sharing method, computer system, and job scheduler
WO2015117278A1 (en) Method for obtaining clock interruption signal, and nfv functional entity
CN113472557B (en) A virtual network element processing method, device and electronic equipment
Hao Edge computing on low availability devices with K3S in a smart home IoT system
US10791088B1 (en) Methods for disaggregating subscribers via DHCP address translation and devices thereof
CN112348196A (en) Distributed machine learning system and method of self-adaptive RDMA (remote direct memory Access) network
CN110719303B (en) Containerization NRF method and system
Alves et al. Leveraging Shared Accelerators in Kubernetes Clusters with rOpenCL
Bruzual Balzan Distributed Computing Framework Based on Software Containers for Heterogeneous Embedded Devices

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees