WO2023035899A1

WO2023035899A1 - Model deployment method, platform and computer-readable medium

Info

Publication number: WO2023035899A1
Application number: PCT/CN2022/113305
Authority: WO
Inventors: 谢丽慧; 刘康
Original assignee: Siemens Ltd China
Current assignee: Siemens Ltd China
Priority date: 2021-09-10
Filing date: 2022-08-18
Publication date: 2023-03-16
Anticipated expiration: 2024-03-10
Also published as: CN115794363A

Abstract

The embodiments of the present invention relate to an edge computing technology, and in particular to a model deployment method, a platform, and a computer-readable medium. The method comprises: a Kubernetes platform (100) starting a first service (21) (S301); the first service (21) downloading a model (30) to a first Pod (51) (S302); the Kubernetes platform (100) starting a second service (22) (S303); the second service (22) notifying the Kubernetes platform (100) to start at least one second Pod (52) (S304); the Kubernetes platform (100) starting the at least one second Pod (52) (S305); and each second Pod (52) downloading the model (30) from the first Pod (51) (S306). In this way, the flow used by each Pod to download a model can be reduced.

Description

Method, platform and computer readable medium for deploying models

technical field

本发明实施例涉及边缘计算技术，尤其涉及一种部署模型的方法、平台和计算机可读介质。Embodiments of the present invention relate to edge computing technology, and in particular to a method, platform and computer-readable medium for deploying a model.

Background technique

诸如TensorFlow服务、KFServing等框架(framework)可用于无服务器AI部署模型的(在线服务)，它们大多基于Kubernetes集群平台来实现负载均衡，并可基于K8s进行自动容量调整。Frameworks such as TensorFlow service and KFServing can be used for serverless AI deployment models (online services). Most of them are based on the Kubernetes cluster platform to achieve load balancing, and can perform automatic capacity adjustment based on K8s.

在这些框架中进行模型存储时，如图1所示的Kubernetes平台100中，一个节点40中可启动多个Pod 50，每个进行推理的Pod 50需要分别在磁盘和内存中存储一个模型30的副本。假设模型30的大小为109MB，则每个Pod 50将使用109MB的磁盘存储空间从云或其他地方下载模30，并使用109M的内存存储空间从磁盘存储空间加载模型30，以进行有效的推理过程，这既浪费空间又浪费流量。When storing models in these frameworks, in the Kubernetes platform 100 shown in Figure 1, multiple Pods 50 can be started in one node 40, and each Pod 50 performing inference needs to store a model 30 in the disk and memory respectively. copy. Assuming that the size of the model 30 is 109MB, each Pod 50 will use 109MB of disk storage space to download the model 30 from the cloud or other places, and use 109M of memory storage space to load the model 30 from the disk storage space for efficient inference process , which wastes both space and traffic.

发明内容Contents of the invention

本发明实施例提供了一种部署模型的方法、装置和计算机可读介质，用以解决图1所示的平台中模型下载时浪费流量的问题。Embodiments of the present invention provide a method, device, and computer-readable medium for deploying a model, so as to solve the problem of wasting traffic when the model is downloaded in the platform shown in FIG. 1 .

第一方面，提供一种在Kubernetes平台上部署模型的方法，该方法中，Kubernetes平台启动第一服务；所述第一服务下载模型到所述第一Pod；所述Kubernetes平台启动第二服务；所述第二服务通知所述Kubernetes平台启动至少一个第二Pod；所述Kubernetes平台启动所述至少一个第二Pod；每一个第二Pod从所述第一Pod下载所述模型。In a first aspect, a method for deploying a model on a Kubernetes platform is provided. In the method, the Kubernetes platform starts a first service; the first service downloads a model to the first Pod; the Kubernetes platform starts a second service; The second service notifies the Kubernetes platform to start at least one second Pod; the Kubernetes platform starts the at least one second Pod; and each second Pod downloads the model from the first Pod.

第二方面，提供一种Kubernetes平台，包括：部署模块，被配置为启动第一服务；所述第一服务，被配置为下载模型到所述第一Pod；所述部署模块，还被配置为启动第二服务；所述第二服务，被配置通知所述Kubernetes平台启动至少一个第二Pod；所述部署模块，还被配置为启动所述至少一个第二Pod；每一个第二Pod，被配置为从所述第一Pod下载所述模型。In a second aspect, a Kubernetes platform is provided, including: a deployment module configured to start a first service; the first service configured to download a model to the first Pod; the deployment module configured to Start a second service; the second service is configured to notify the Kubernetes platform to start at least one second Pod; the deployment module is also configured to start the at least one second Pod; each second Pod is configured to configured to download the model from the first Pod.

第三方面，提供一种Kubernetes平台，包括：至少一个存储器，被配置为存储计算机可读代码；至少一个处理器，被配置为调用所述计算机可读代码，执行第一方面提供的方法中各步骤。In a third aspect, a Kubernetes platform is provided, including: at least one memory configured to store computer-readable codes; at least one processor configured to invoke the computer-readable codes to execute each of the methods provided in the first aspect step.

第四方面，一种计算机可读介质，所述计算机可读介质上存储有计算机可读指令，所述计算机可读指令在被处理器执行时，使所述处理器执行第一方面提供的方法中各步骤。In a fourth aspect, a computer-readable medium, where computer-readable instructions are stored on the computer-readable medium, and when the computer-readable instructions are executed by a processor, the processor executes the method provided in the first aspect each step.

其中，通过在Kubernetes平台上部署第一服务以及第一Pod，首先将模型下载到新部署的第一Pod上，各个部署模型的第二Pod再从第一Pod上下载模型，这样极大降低了各个Pod从云或其他外部空间下载模型所使用的流量。Among them, by deploying the first service and the first Pod on the Kubernetes platform, the model is first downloaded to the newly deployed first Pod, and the second Pod of each deployed model downloads the model from the first Pod, which greatly reduces the The traffic used by each Pod to download models from the cloud or other external spaces.

对于上述任一方面，可选地，所述第二服务在通知所述Kubernetes平台启动至少一个第二Pod，可依据接收到的请求部署所述模型的数量确定需要启动的所述第二Pod的数量；并通知所述Kubernetes平台按照确定的数量启动所述至少一个第二Pod。这样实现了Kubernetes平台自动调整容量的功能。For any of the above aspects, optionally, when the second service notifies the Kubernetes platform to start at least one second Pod, it can determine the number of the second Pod that needs to be started according to the quantity of the model deployed according to the received request. quantity; and notify the Kubernetes platform to start the at least one second Pod according to the determined quantity. In this way, the automatic capacity adjustment function of the Kubernetes platform is realized.

对于上述任一方面，可选地，每一个第二Pod从所述第一Pod下载所述模型时，可直接从所述第一Pod下载所述模型到对应的第二Pod的内存中，这样，无需先将模型下载到磁盘空间，再由磁盘空间下载到内存，进一步节省了Pod的存储空间。For any of the above aspects, optionally, when each second Pod downloads the model from the first Pod, it can directly download the model from the first Pod to the memory of the corresponding second Pod, so that , there is no need to download the model to the disk space first, and then download the model from the disk space to the memory, which further saves the storage space of the Pod.

对于上述任一方面，可选地，每一个第二Pod被配置为访问所述第一服务的IP地址。这样，实现了第二Pod在启动时不再从云上或外部资源处下载模型，而是直接通过第一服务从第一Pod处下载模型。For any aspect above, optionally, each second Pod is configured to access the IP address of the first service. In this way, it is realized that the second Pod no longer downloads the model from the cloud or external resources when it is started, but directly downloads the model from the first Pod through the first service.

Description of drawings

图1为现有技术中在Kubernetes平台上部署模型的方法示意图。FIG. 1 is a schematic diagram of a method for deploying a model on a Kubernetes platform in the prior art.

图2为本发明实施例提供的Kubernetes平台的结构示意图。FIG. 2 is a schematic structural diagram of a Kubernetes platform provided by an embodiment of the present invention.

图3为本发明实施例提供的部署模型的方法的流程图。Fig. 3 is a flowchart of a method for deploying a model provided by an embodiment of the present invention.

附图标记列表：List of reference signs:

100：Kubernetes平台100: Kubernetes Platform

30：模型 40：Kubernetes平台100中的节点30: Model 40: Nodes in the Kubernetes Platform 100

50：Kubernetes平台100中每个节点40中的Pod50: Pods in each node 40 in the Kubernetes platform 100

10：Kubernetes平台100中的部署模块10: Deployment Modules in the Kubernetes Platform 100

21：第一服务 22：第二服务21: First service 22: Second service

51：第一Pod 52：第二Pod51: First Pod 52: Second Pod

41、42、43：Kubernetes平台100中的节点41, 42, 43: Nodes in the Kubernetes platform 100

300：本发明实施例提供的在Kubernetes平台100上部署模型的方法300: The method for deploying a model on the Kubernetes platform 100 provided by the embodiment of the present invention

S301～S306：方法300中的步骤S301～S306: steps in the method 300

Detailed ways

现在将参考示例实施方式讨论本文描述的主题。应该理解，讨论这些实施方式只是为了使得本领域技术人员能够更好地理解从而实现本文描述的主题，并非是对权利要求书中所阐述的保护范围、适用性或者示例的限制。可以在不脱离本发明实施例内容的保护范围的情况下，对所讨论的元素的功能和排列进行改变。各个示例可以根据需要，省略、替代或者添加各种过程或组件。例如，所描述的方法可以按照与所描述的顺序不同的顺序来执行，以及各个步骤可以被添加、省略或者组合。另外，相对一些示例所描述的特征在其它例子中也可以进行组合。The subject matter described herein will now be discussed with reference to example implementations. It should be understood that the discussion of these implementations is only to enable those skilled in the art to better understand and realize the subject matter described herein, and is not intended to limit the protection scope, applicability or examples set forth in the claims. Changes may be made in the function and arrangement of the elements discussed without departing from the scope of the embodiments of the invention. Various examples may omit, substitute, or add various procedures or components as needed. For example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with respect to some examples may also be combined in other examples.

如本文中使用的，术语“包括”及其变型表示开放的术语，含义是“包括但不限于”。术语“基于”表示“至少部分地基于”。术语“一个实施例”和“一实施例”表示“至少一个实施例”。术语“另一个实施例”表示“至少一个其他实施例”。术语“第一”、“第二”等可以指代不同的或相同的对象。下面可以包括其他的定义，无论是明确的还是隐含的。除非上下文中明确地指明，否则一个术语的定义在整个说明书中是一致的。As used herein, the term "comprising" and its variants represent open terms meaning "including but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment." The term "another embodiment" means "at least one other embodiment." The terms "first", "second", etc. may refer to different or the same object. The following may include other definitions, either express or implied. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.

下面，结合图2和图3对本发明实施例进行详细说明。Below, the embodiment of the present invention will be described in detail with reference to FIG. 2 and FIG. 3 .

图2为本发明实施例提供的Kubernetes平台100的结构示意图。如图2所示，Kubernetes平台100，包括：FIG. 2 is a schematic structural diagram of a Kubernetes platform 100 provided by an embodiment of the present invention. As shown in Figure 2, the Kubernetes platform 100 includes:

-部署模块10(deployment)，被配置为启动第一服务21。- a deployment module 10 (deployment), configured to start the first service 21 .

在Kubernetes平台中，部署模块(deployment)可维护多个.yaml文件，每个.yaml文件对应一个服务(service)，定义了要启动的该服务所管理的Pod以及Pod中所包含容器(docker)的信息。Kubernetes平台启动后，读取.yaml文件，启动对应的服务。In the Kubernetes platform, the deployment module (deployment) can maintain multiple .yaml files, and each .yaml file corresponds to a service (service), which defines the Pod managed by the service to be started and the container (docker) contained in the Pod Information. After the Kubernetes platform starts, read the .yaml file and start the corresponding service.

-第一服务21，被配置为下载模型30到第一Pod51。- A first service 21 configured to download a model 30 to a first Pod 51 .

这里，第一服务21是本发明实施例中新定义的，其要实现的操作是从云中或其他外部资源中下载模型30，比如：一个AI模型。通常，AI模型可在云中的服务器或其他具有充足处理资源的服务器中进行训练，训练得到的模型可发送至边缘侧进行模型推断等处理，即部署在边缘侧。这里通过新增加的第一服务21可将模型30下载到新定义的Pod，即第一Pod51中，而其他Pod再从第一Pod51中下载模型30，这样可减少如图1所示的，云端到各Pod下载时所使用的流量。Here, the first service 21 is newly defined in the embodiment of the present invention, and its operation is to download the model 30 from the cloud or other external resources, such as an AI model. Usually, AI models can be trained on servers in the cloud or other servers with sufficient processing resources, and the trained models can be sent to the edge side for model inference and other processing, that is, deployed on the edge side. Here, through the newly added first service 21, the model 30 can be downloaded to the newly defined Pod, that is, the first Pod51, and other Pods can download the model 30 from the first Pod51. Traffic used when downloading to each Pod.

-部署模块10，还被配置为启动第二服务22。其中，启动第二服务22的原理与启动第二服务22的原理相同，这里不再赘述。- The deployment module 10 is also configured to start the second service 22 . Wherein, the principle of starting the second service 22 is the same as the principle of starting the second service 22, and will not be repeated here.

-第二服务22，被配置为通知部署模块10启动至少一个第二Pod52。- A second service 22 configured to notify the deployment module 10 to start at least one second Pod 52 .

-部署模块10，还被配置为启动至少一个第二Pod52。这里，每一个第二Pod52进行模型推断等各种部署模型的操作，被配置为从第一Pod51下载模型30。其中，可将每一个第二Pod52配置为访问第一服务的IP地址，从而实现第二Pod52从第一Pod51下载模型30。- a deployment module 10 further configured to start at least one second Pod 52 . Here, each second Pod 52 performs various model deployment operations such as model inference, and is configured to download the model 30 from the first Pod 51 . Wherein, each second Pod52 can be configured as an IP address for accessing the first service, so that the second Pod52 can download the model 30 from the first Pod51.

这里，可选地，第二服务22在通知部署模块10启动第二Pod52时，可依据接收到的请求部署模型30的数量确定需要启动的第二Pod52的数量；并通知部署模块10按照确定的数量启动至少一个第二Pod52。这样，实现了Kubernetes平台100自动调整容量的功能。Here, optionally, when the second service 22 notifies the deployment module 10 to start the second Pod 52, it can determine the number of the second Pod 52 to be started according to the number of the received request deployment model 30; and notify the deployment module 10 to follow the determined The number starts at least one second Pod 52 . In this way, the automatic capacity adjustment function of the Kubernetes platform 100 is realized.

可选地，每一个第二Pod52从第一Pod51下载模型30到对应的第二Pod52的内存中。与图1所示的每一个Pod下载两份模型30的副本分别到磁盘和内存相比，该可选实现方式可进一步节省Pod的存储空间。Optionally, each second Pod52 downloads the model 30 from the first Pod51 to the memory of the corresponding second Pod52. Compared with each Pod shown in FIG. 1 downloading two copies of the model 30 to the disk and memory respectively, this optional implementation can further save the storage space of the Pod.

图2所示的Kubernetes平台100中，第一服务21、第二服务22以及第一Pod51可位于Kubernetes平台100的同一个节点中，比如：第一节点41、第二节点42或第三节点43中。而各个第二Pod52可位于Kubernetes平台100的同一节点或不同节点中。In the Kubernetes platform 100 shown in Figure 2, the first service 21, the second service 22 and the first Pod 51 can be located in the same node of the Kubernetes platform 100, such as: the first node 41, the second node 42 or the third node 43 middle. And each second Pod 52 can be located in the same node or different nodes of the Kubernetes platform 100 .

图2所示的Kubernetes平台100可部署在一台计算机中，或多台计算机构成的计算机集群中，以执行本发明实施例中的部署模型的方法300。在硬件结构上，可包括至少一个存储器，其包括计算机可读介质，例如随机存取存储器(RAM)；还可包括与至少一个存储器耦合的至少一个处理器。计算机可执行指令存储在至少一个存储器中，并且当由至少一个处理器执行时，可以使至少一个处理器执行Kubernetes平台100所执行的各种操作。The Kubernetes platform 100 shown in FIG. 2 can be deployed in one computer, or in a computer cluster composed of multiple computers, so as to execute the method 300 for deploying the model in the embodiment of the present invention. In terms of hardware structure, it may include at least one memory, which includes a computer-readable medium, such as a random access memory (RAM); it may also include at least one processor coupled with the at least one memory. Computer-executable instructions are stored in at least one memory and, when executed by at least one processor, may cause the at least one processor to perform various operations performed by the Kubernetes platform 100 .

其中，至少一个处理器可以包括微处理器、专用集成电路(ASIC)、数字信号处理器(DSP)、中央处理单元(CPU)、图形处理单元(GPU)、状态机等。计算机可读介质的实施例包括但不限于软盘、CD-ROM、磁盘，存储器芯片、ROM、RAM、ASIC、配置的处理器、全光介质、所有磁带或其他磁性介质，或计算机处理器可以从中读取指令的任何其他介质。此外，各种其它形式的计算机可读介质可以向计算机发送或携带指令，包括路由器、专用或公用网络、或其它有线和无线传输设备或信道。指令可以包括任何计算机编程语言的代码，包括C、C++、C语言、Visual Basic、java和JavaScript。Wherein, at least one processor may include a microprocessor, an application specific integrated circuit (ASIC), a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), a state machine, and the like. Examples of computer readable media include, but are not limited to, floppy disks, CD-ROMs, magnetic disks, memory chips, ROM, RAM, ASICs, configured processors, all-optical media, magnetic tape or other magnetic media, or from which a computer processor can Any other medium that reads instructions. In addition, various other forms of computer-readable media can transmit or carry instructions to the computer, including routers, private or public networks, or other wired and wireless transmission devices or channels. Instructions may include code in any computer programming language, including C, C++, C++, Visual Basic, java, and JavaScript.

图3为本发明实施例提供的部署模型的方法的流程图。该方法300可由前述的Kubernetes平台100执行，可包括如下步骤：Fig. 3 is a flowchart of a method for deploying a model provided by an embodiment of the present invention. The method 300 can be executed by the aforementioned Kubernetes platform 100, and can include the following steps:

-S301：Kubernetes平台100启动第一服务21；-S301: the Kubernetes platform 100 starts the first service 21;

-S302：第一服务21下载模型30到第一Pod51；-S302: the first service 21 downloads the model 30 to the first Pod51;

-S303：Kubernetes平台100启动第二服务22；-S303: the Kubernetes platform 100 starts the second service 22;

-S304：第二服务22通知Kubernetes平台100启动至少一个第二Pod52；- S304: the second service 22 notifies the Kubernetes platform 100 to start at least one second Pod 52;

-S305：Kubernetes平台100启动至少一个第二Pod52；-S305: The Kubernetes platform 100 starts at least one second Pod52;

-S306：每一个第二Pod52从第一Pod51下载模型30。-S306: Each second Pod52 downloads the model 30 from the first Pod51.

可选地，在步骤S304中，第二服务22可依据接收到的请求部署模型30的数量确定需要启动的第二Pod52的数量，并通知Kubernetes平台100按照确定的数量启动至少一个第二Pod52。而在步骤S306中，每一个第二Pod52从第一Pod51下载模型30到对应的第二Pod52的内存中。Optionally, in step S304, the second service 22 may determine the number of second Pods 52 to be started according to the number of received request deployment models 30, and notify the Kubernetes platform 100 to start at least one second Pod 52 according to the determined number. In step S306, each second Pod52 downloads the model 30 from the first Pod51 to the memory of the corresponding second Pod52.

该方法的其他可选实现方式可参考前述Kubernetes平台100部分的描述，这里不再赘述。For other optional implementation manners of the method, reference may be made to the foregoing description of the Kubernetes platform 100 , which will not be repeated here.

此外，本发明实施例实施例还提供一种计算机可读介质，该计算机可读介质上存储有计算机可读指令，计算机可读指令在被处理器执行时，使处理器执行前述的部署模型的方法。计算机可读介质的实施例包括软盘、硬盘、磁光盘、光盘(如CD-ROM、CD-R、CD-RW、DVD-ROM、DVD-RAM、DVD-RW、DVD+RW)、磁带、非易失性存储卡和ROM。可选地，可以由通信网络从服务器计算机上或云上下载计算机可读指令。In addition, the embodiments of the present invention also provide a computer-readable medium, on which computer-readable instructions are stored. When executed by a processor, the computer-readable instructions cause the processor to execute the aforementioned deployment model. method. Examples of computer readable media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tape, non- Volatile memory card and ROM. Alternatively, the computer readable instructions may be downloaded from a server computer or cloud by a communication network.

需要说明的是，上述各流程和各系统结构图中不是所有的步骤和模块都是必须的，可以根据实际的需要忽略某些步骤或模块。各步骤的执行顺序不是固定的，可以根据需要进行调整。上述各实施例中描述的系统结构可以是物理结构，也可以是逻辑结构，即，有些模块可能由同一物理实体实现，或者，有些模块可能分由多个物理实体实现，或者，可以由多个独立设备中的某些部件共同实现。It should be noted that not all the steps and modules in the above processes and system structure diagrams are necessary, and some steps or modules can be ignored according to actual needs. The execution order of each step is not fixed and can be adjusted as required. The system structures described in the above embodiments may be physical structures or logical structures, that is, some modules may be realized by the same physical entity, or some modules may be realized by multiple physical entities, or may be realized by multiple Certain components in individual devices are implemented together.

Claims

A method (300) for deploying a model on a Kubernetes platform, characterized in that it includes:

-Kubernetes platform (100) starts (S301) first service (21);

- said first service (21) downloads (S302) a model (30) to said first Pod (51);

- the Kubernetes platform (100) starts (S303) the second service (22);

- said second service (22) notifies (S304) said Kubernetes platform (100) to start at least one second Pod (52);

- the Kubernetes platform (100) starts (S305) the at least one second Pod (52);

- Each second Pod (52) downloads (S306) said model (30) from said first Pod (51).

The method according to claim 1, wherein the second service (22) notifies (S304) that the Kubernetes platform (100) starts at least one second Pod (52), including:

- the second service (22) determines the number of the second Pods (52) that need to be started according to the number of deployed models (30) received;

- The second service (22) notifies the Kubernetes platform (100) to start the at least one second Pod (52) according to the determined number.

The method according to claim 1, wherein each second Pod (52) downloads (S306) the model (30) from the first Pod (51), comprising:

- Each second Pod (52) downloads said model (30) from said first Pod (51) into the memory of the corresponding second Pod (52).

The method of claim 1, wherein each second Pod (52) is configured to access the IP address of the first service.

The method according to any one of claims 1 to 4, characterized in that,

- said first service (21), said second service (22) and said first Pod (51) are located in a first node (41) of said Kubernetes platform (100);

- said at least one second Pod (52) is located in the same node or a different node of said Kubernetes platform (100).

A kind of Kubernetes platform (100), is characterized in that, comprises:

- a deployment module (10) configured to start the first service (21);

- said first service (21), configured to download a model (30) to said first Pod (51);

- said deployment module (10), further configured to start a second service (22);

- said second service (22), configured to notify said deployment module (10) to start at least one second Pod (52);

- said deployment module (10), further configured to start said at least one second Pod (52);

- Each second Pod (52), configured to download said model (30) from said first Pod (51).

The Kubernetes platform according to claim 6, wherein said second service (22) is specifically configured as:

- determining the number of said second Pods (52) that need to be started according to the number of deployed models (30) received;

- Informing said deployment module (10) to start said at least one second Pod (52) by the determined number.

The Kubernetes platform according to claim 6, wherein each second Pod (52) is configured as:

- downloading said model (30) from said first Pod (51) into the memory of a corresponding second Pod (52).

The Kubernetes platform according to claim 6, wherein each of the second Pods (52) is configured to access the IP address of the first service.

The Kubernetes platform according to any one of claims 6 to 9, wherein,

A kind of Kubernetes platform (100), is characterized in that, comprises:

at least one memory configured to store computer readable code;

At least one processor configured to call the computer-readable code to execute the method according to any one of claims 1-5.

A computer-readable medium, characterized in that, computer-readable instructions are stored on the computer-readable medium, and when the computer-readable instructions are executed by a processor, the processor can perform the operation according to any one of claims 1-5. one of the methods described.