
WO2025026280A1 - Model deployment method and system, and electronic device and computer-readable storage medium - Google Patents


Info

Publication number
WO2025026280A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
deployed
storage device
model file
storage devices
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/108233
Other languages
French (fr)
Chinese (zh)
Inventor
Zhang Xin (张欣)
Wang Kai (王凯)
Zhou Zuan (周躜)
Cai Yinxiang (蔡寅翔)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Publication of WO2025026280A1 publication Critical patent/WO2025026280A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0631Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment

Definitions

  • the present disclosure relates to the field of large model technology and model deployment, and in particular, to a model deployment method, system, electronic device, and computer-readable storage medium.
  • current large models store a large number of parameters and computation-graph structures, which makes the model files very large, so loading the model when starting the service takes a long time. In addition, due to factors such as region, network, and hardware, the loading time is further extended when cross-region file transfer is involved, resulting in low model deployment efficiency.
  • the embodiments of the present disclosure provide a model deployment method, system, electronic device, and computer-readable storage medium to at least solve the technical problem of low efficiency in cross-regional deployment of models in related technologies.
  • a model deployment method including: obtaining a model file of a model to be deployed; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; mounting the storage device on a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.
  • a model deployment method including: in response to receiving a model file of a model to be deployed, storing the model file in a central warehouse; in response to receiving a model distribution request, distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions, and different server clusters are also deployed in different regions; in response to receiving a server deployment request, based on the server deployment request, mounting the storage device to a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.
  • a model deployment method including: obtaining a model file of a model to be deployed by calling a first interface, wherein the first interface includes a first parameter, and the parameter value of the first parameter is the model file; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; when deploying an inference service of the model to be deployed, mounting the storage device on a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters to obtain a deployment result of the model to be deployed; outputting the deployment result by calling a second interface, wherein the second interface includes a second parameter, and the parameter value of the second parameter is the deployment result.
  • a model deployment system including: a plurality of server clusters, where different server clusters are geographically deployed in different regions; a plurality of storage devices, where different storage devices are geographically deployed in different regions; and a control device, connected to the storage devices and the server clusters, configured to distribute the model files of the model to be deployed to the storage devices and, when deploying the inference service of the model to be deployed, mount each storage device to the target server cluster deployed in the same region as the storage device, so that the model to be deployed can be deployed to multiple server clusters.
  • an electronic device including: a memory storing an executable program; and a processor for running the program, wherein the program executes any one of the methods in the above embodiments when it is run.
  • a computer-readable storage medium including a stored executable program, wherein when the executable program is running, the device where the computer-readable storage medium is located is controlled to execute any one of the methods in the above embodiments.
  • a model file of a model to be deployed can be obtained; the model file can be distributed to multiple storage devices, wherein different storage devices are geographically deployed in different regions; and each storage device can be mounted on a target server cluster, among multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed can be deployed to multiple server clusters, thereby improving the processing efficiency of the model in different regions. Because the model file is distributed to storage devices deployed in different regions, the network between a storage device and a server cluster in the same region can be used to achieve high-speed interconnection between them and to eliminate cross-region file transmission, which shortens the loading time of the model file, improves the processing efficiency of the model in different regions, and solves the technical problem of low efficiency of cross-regional deployment of models in related technologies.
  • FIG1 is a schematic diagram of a hardware environment of a virtual reality device according to a model deployment method according to an embodiment of the present disclosure
  • FIG2 is a flow chart of a model deployment method according to Embodiment 1 of the present disclosure.
  • FIG3 is a structural diagram of a model deployment method according to an embodiment of the present disclosure.
  • FIG4 is a flow chart of another model deployment method according to an embodiment of the present disclosure.
  • FIG5 is a flow chart of another model deployment method according to an embodiment of the present disclosure.
  • FIG6 is a flow chart of a model deployment method according to Embodiment 2 of the present disclosure.
  • FIG7 is a flow chart of a model deployment method according to Embodiment 3 of the present disclosure.
  • FIG8 is a schematic diagram of a model deployment device according to Embodiment 4 of the present disclosure.
  • FIG9 is a schematic diagram of a model deployment device according to Embodiment 5 of the present disclosure.
  • FIG10 is a schematic diagram of a model deployment device according to Embodiment 6 of the present disclosure.
  • FIG11 is a schematic diagram of a model deployment system according to Embodiment 7 of the present disclosure.
  • FIG. 12 is a structural block diagram of a computer terminal according to an embodiment of the present disclosure.
  • the technical solution provided by the present disclosure is mainly implemented by large-scale model technology.
  • the large model here refers to a deep learning model with large-scale model parameters, which can usually contain hundreds of millions, tens of billions, hundreds of billions, trillions or even more than ten trillion model parameters.
  • the large model may also be called a foundation model (Foundation Model).
  • the large model is pre-trained on a large-scale unlabeled corpus to produce a pre-trained model with more than 100 million parameters.
  • This model can adapt to a wide range of downstream tasks and has good generalization ability, such as large-scale language model (Large Language Model, LLM), multi-modal pre-training model (multi-modal pre-training model), etc.
  • the pre-trained model can be fine-tuned through a small number of samples so that the large model can be applied to different tasks.
  • the large model can be widely used in natural language processing (NLP), computer vision and other fields, and can be specifically applied to computer vision tasks such as visual question answering (VQA), image caption (IC), image generation, etc. It can also be widely used in natural language processing tasks such as text-based sentiment classification, text summary generation, and machine translation. Therefore, the main application scenarios of the large model include but are not limited to digital assistants, intelligent robots, search, online education, office software, e-commerce, intelligent design, etc. In the embodiment of the present disclosure, the deployment of the large model in the model deployment scenario is used as an example for explanation.
  • NAS: Network Attached Storage;
  • OSS (Object Storage Service): a cloud object storage service that provides cloud storage and is used to store and manage massive amounts of unstructured data;
  • VPC: Virtual Private Cloud;
  • EAS (Elastic Algorithm Service): an online service platform for deploying and managing machine learning models; it provides users with a convenient way to deploy, run, and manage their own machine learning algorithms and models, together with scalable computing resources and high-performance computing capabilities;
  • Kubernetes (K8S): a container orchestration platform that organizes and manages containerized applications, helping users simplify application deployment, management, and scaling.
  • the present disclosure proposes a method based on multi-regional storage of storage devices, which deploys storage devices and services in the same virtual private network in the same region to achieve high-speed interconnection, eliminate cross-regional file transfer, thereby shortening the loading time of model files and further improving the elasticity of services.
  • a model deployment method is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system such as a set of computer executable instructions, and although a logical order is shown in the flowchart, in some cases, the steps shown or described can be executed in an order different from that shown here.
  • FIG. 1 is a schematic diagram of the hardware environment of a virtual reality device according to a model deployment method of an embodiment of the present disclosure.
  • the large model is deployed in a server 10, and the server 10 can be connected to one or more client devices 20 through a local area network connection, a wide area network connection, an Internet connection, or other types of networks.
  • the client device 20 here may include but is not limited to: a smart phone, a tablet computer, a laptop computer, a PDA, a personal computer, a smart home device, a car-mounted device, etc.
  • the client device 20 can interact with the user through a graphical user interface to implement the call to the large model, thereby implementing the method provided in the embodiment of the present disclosure.
  • the system composed of the client device and the server can perform the following steps: the client device generates a model file of the model to be deployed; the server obtains the model file of the model to be deployed and multiple server clusters; distributes the model file to multiple storage devices; mounts the storage device to the target server cluster deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters. It should be noted that the embodiments of the present disclosure can be performed in the client device when the operating resources of the client device can meet the deployment and operation conditions of the large model.
  • Figure 2 is a flow chart of the model deployment method according to Embodiment 1 of the present disclosure. As shown in Figure 2, the method may include the following steps:
  • Step S202 obtaining the model file of the model to be deployed.
  • the above-mentioned model to be deployed can be any model to be deployed.
  • the model to be deployed can be a large model, a neural network model, a language-processing model, or an image-processing model, where a large model refers to a machine learning or deep learning model with a large number of parameters and powerful computing power.
  • in the embodiments of the present disclosure, a large model is used as an example of the model to be deployed for explanation, and the model to be deployed can be determined according to actual needs.
  • the above-mentioned model file may be a model file of a trained model to be deployed, wherein the model file includes but is not limited to the parameters and structure of the model. This is only an example, and the specific model file may be defined according to actual usage.
  • a model file of a model to be deployed may be obtained, and a plurality of server clusters may be determined, wherein different server clusters are geographically deployed in different regions.
  • the above-mentioned multiple server clusters can be used to provide services for users.
  • the above-mentioned different regions can be different countries, different provinces, different cities, etc., and the different regions are not specifically limited here.
  • the model file can be stored in a cloud server.
  • the model file can be retrieved from the cloud server and distributed to storage devices in multiple different regions.
  • Step S204 distributing the model file to multiple storage devices.
  • the above storage device may be a hardware storage device, for example, a NAS, but is not limited thereto.
  • the model file can be distributed to multiple storage devices according to the distribution request initiated by the user.
  • the model management and control platform can retrieve the model file from the cloud server according to the distribution request initiated by the user, and distribute and store the model file to multiple storage devices.
  • Step S206 mounting the storage device to a target server cluster in the multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters.
  • the above-mentioned mounting refers to connecting the storage device to the server of the target server cluster so that the model files in the storage device can be visible and accessible on the servers of the target server cluster.
  • the storage device can be mounted to a target server cluster deployed in the same region as the storage device, so that users can quickly access the model file through the mounted storage device on the target server cluster, thereby more efficiently using the services provided by the model.
  • the model file of the model to be deployed can be obtained; the model file is distributed to multiple storage devices, wherein different storage devices are geographically deployed in different regions; and each storage device is mounted on a target server cluster, among the multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters, thereby improving the processing efficiency of the model in different regions. Because the model file is distributed to storage devices deployed in different regions, the network between a storage device and a server cluster in the same region can be used to achieve high-speed interconnection between them and to eliminate cross-region file transmission, which shortens the loading time of the model file, improves the processing efficiency of the model in different regions, and solves the technical problem of low efficiency of cross-regional deployment of models in related technologies.
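As an illustration only, the steps S202-S206 described above can be sketched as follows. The `StorageDevice`, `ServerCluster`, and `deploy_model` names are hypothetical and are not part of the disclosure; the sketch only shows the same-region mounting rule that avoids cross-region transfer.

```python
from dataclasses import dataclass, field

@dataclass
class StorageDevice:
    region: str
    files: dict = field(default_factory=dict)

@dataclass
class ServerCluster:
    region: str
    mounts: list = field(default_factory=list)

def deploy_model(name: str, model_bytes: bytes,
                 storages: list, clusters: list) -> list:
    """S204: distribute the model file to every regional storage device;
    S206: mount each storage device only on clusters in its own region."""
    for s in storages:                       # S204: one copy per region
        s.files[name] = model_bytes
    deployed = []
    for c in clusters:                       # S206: same-region mount only
        for s in storages:
            if s.region == c.region:
                c.mounts.append(s)
                deployed.append(c)
    return deployed
```

A cluster whose region has no storage device simply receives no mount, which mirrors the requirement that files never cross regions.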
  • the storage device includes a network attached storage, wherein distributing the model file to the plurality of storage devices comprises: sending the model file to the plurality of network attached storages through a public network and storing the model file in the plurality of network attached storages.
  • the above-mentioned network attached storage can be a storage device that provides file sharing services through the network and has other additional functions. Among them, the network attached storage can also provide users with a convenient and reliable way to store and share data.
  • the above-mentioned public network may be an open, shared and widely covered network.
  • the model file can be sent to multiple network attached storages through the public network; after receiving the model file, each network attached storage stores it.
  • in order to improve security during storage of the model file, the model file can be encrypted before being sent to the multiple network attached storages through the public network to obtain an encrypted model file, and the encrypted model file can then be sent to the multiple network attached storages through the public network. After receiving the encrypted model file, each network attached storage can decrypt it to obtain the model file and store it.
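The encrypt-before-public-network step can be illustrated with the toy sketch below. The keystream construction (SHA-256 in counter mode XORed into the data) is purely illustrative and is NOT production cryptography; the disclosure does not specify a cipher, and all function names here are assumptions.

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    # Toy keystream: SHA-256 over key + counter; NOT production crypto
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # XOR stream cipher: the same call encrypts and decrypts
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def distribute_encrypted(key: bytes, model_file: bytes, storages: list) -> None:
    ciphertext = xor_cipher(key, model_file)        # encrypt before the public network
    for nas in storages:                            # each NAS decrypts on receipt
        nas["model"] = xor_cipher(key, ciphertext)
```

In a real deployment an authenticated cipher such as AES-GCM would be used in place of this toy construction.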
  • when deploying the inference service of the model to be deployed, mounting the storage device to a target server cluster in multiple server clusters that is deployed in the same region as the storage device includes: determining the target virtual private network corresponding to the storage device based on the target region where the storage device is deployed; obtaining the server cluster corresponding to the target virtual private network to obtain the target server cluster, wherein different server clusters correspond to different virtual private networks; and, when deploying the inference service of the model to be deployed, mounting the storage device to a server in the target server cluster.
  • the above-mentioned target region can be determined according to the geographical location where the storage device is deployed, or according to the country, province, etc. where the storage device is deployed.
  • the method for determining the target region is not limited here, and the method for determining the target region for storage device deployment can be set according to actual conditions.
  • the target virtual private network (Virtual Private Cloud, abbreviated VPC) can provide an isolated and secure way to deploy and manage resources, and can also increase file transmission speed.
  • the target virtual private network can be a virtual network environment created in a cloud computing environment.
  • the target virtual private network corresponding to the storage device can be determined based on the target region where the storage device is deployed, the server cluster connected to the target virtual private network can be obtained, and one or more server clusters can be selected from the server clusters connected to the target virtual private network as the target server cluster; optionally, the server cluster that is geographically closest to the storage device can be selected from the server clusters connected to the target virtual private network as the above-mentioned target server cluster.
  • the method of determining the target server cluster here is only for example description and is not specifically limited. It can be set according to actual conditions.
  • the server clusters in the target region may be searched according to the target virtual private network, and one or more of the server clusters in the target region may be used as the target server cluster.
  • the server cluster closest to the storage device can be used as the target server cluster. The specific method for determining the target server cluster is not limited here.
  • when deploying the inference service of the model to be deployed, the storage device can be connected to the servers in the target server cluster so that the model file in the storage device can be read on the servers through the intranet.
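The region-to-VPC-to-cluster resolution described above amounts to a pair of lookups. The sketch below is a hypothetical illustration (the mapping names and the "first candidate" choice are assumptions; the disclosure also allows choosing the geographically closest cluster):

```python
def resolve_target_cluster(storage_region: str,
                           vpc_by_region: dict,
                           clusters_by_vpc: dict) -> str:
    """Look up the VPC of the storage device's region, then pick a server
    cluster attached to that VPC (here simply the first candidate)."""
    vpc = vpc_by_region[storage_region]      # target region -> target VPC
    candidates = clusters_by_vpc[vpc]        # clusters attached to that VPC
    return candidates[0]
```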
  • after mounting the storage device to a target server cluster, among multiple server clusters, that is deployed in the same region as the storage device, the method also includes: building an elastic scheduling cluster of the target server cluster; and, based on the elastic scheduling cluster, determining the computing resources required to deploy the model to be deployed.
  • the above-mentioned elastic scheduling cluster is a cluster management method based on cloud computing and virtualization technology, which can automatically adjust the computing resources in the cluster according to changes in the system load to meet different needs.
  • the elastic scheduling cluster can adjust the cluster size according to the load situation, including adding or reducing computing nodes to cope with high or low load situations.
  • This cluster management method can improve the flexibility and availability of the system, while also saving resources and costs.
  • the above-mentioned elastic scaling capability can automatically adjust the resources and capacity of the target service cluster according to real-time demand and load conditions to meet user needs; the elastic scaling capability can automatically increase or decrease the number of server instances in the target server cluster according to different situations to achieve load balancing and high availability.
  • an elastic scheduling cluster can be built based on an online service platform (Elastic Algorithm Service, referred to as EAS) or a container orchestration platform (Kubernetes, referred to as K8S).
  • the elastic scheduling cluster can provide elastic scaling capabilities for the target service cluster so that the target service cluster can automatically adjust system resources and capacity to meet user needs based on actual demand and load conditions.
  • when the load of the target server cluster is low, the number of servers can be reduced to save costs, and when the load is high, the number of servers can be increased to improve performance and capacity; optionally, the number of servers in the target server cluster can be managed according to preset rules, or dynamically adjusted based on real-time monitoring and analysis.
  • after building an elastic scheduling cluster of the target server cluster, it is necessary to determine the computing resources required to deploy the model to be deployed, such as memory, storage, and other resources, and to configure and allocate them according to the computing requirements of the model. Determining the required computing resources ensures that the model to be deployed can run normally on the elastic scheduling cluster and meet the required performance and availability requirements. In addition, building an elastic scheduling cluster can provide sufficient computing resources to support the deployment and operation of the model to be deployed.
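A minimal sketch of the two ideas above: a load-driven scaling rule and a resource estimate for the model. The thresholds, bounds, and the memory-based sizing formula are illustrative assumptions, not values from the disclosure.

```python
import math

def scale_nodes(nodes: int, load: float,
                low: float = 0.3, high: float = 0.8,
                min_nodes: int = 1, max_nodes: int = 32) -> int:
    # Add a node under high load, drop one under low load, within bounds
    if load > high:
        nodes += 1
    elif load < low:
        nodes -= 1
    return max(min_nodes, min(max_nodes, nodes))

def required_nodes(model_size_gb: float, mem_per_node_gb: float) -> int:
    # Minimum number of nodes whose combined memory can hold the model file
    return math.ceil(model_size_gb / mem_per_node_gb)
```

For example, a 25 GB model file on nodes with 8 GB of usable memory each would need at least four nodes under this rule.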
  • the method also includes: distributing preset resources to multiple server clusters; distributing model files to multiple storage devices, including: distributing model files to multiple storage devices when the preset resources are distributed.
  • the above-mentioned preset resources can be various resources shared and allocated in the server cluster, such as computing resources, network resources, storage resources, software resources, and load-balancing resources; these are only examples, and the specific preset resources can be set according to the actual situation.
  • the above-mentioned preset resources may be pre-set resource types and resource quantities, and the preset resources are not limited here.
  • the preset resources can also be determined according to the computing resources required for the deployment of the model to be deployed, and distributed to multiple server clusters to facilitate the subsequent deployment of the model to be deployed through the multiple server clusters.
  • the model management and control platform can be used to provide preset resources, including but not limited to managing cluster metadata, service deployment, automatic/scheduled scalers, monitoring data, model distribution, monitoring resource usage, and grayscale/rollback.
  • the preset resources can be distributed to multiple server clusters, and after the preset resources are distributed, the model file can be distributed to multiple storage devices, so that the multiple storage devices can subsequently use the preset resources of the target server in the server cluster to deploy the model to be deployed according to the model file.
  • obtaining the model file of the model to be deployed includes: obtaining the model file of the model to be deployed from a central warehouse, wherein the model file is pre-uploaded to the central warehouse.
  • the central warehouse mentioned above can be a cloud object storage (Object Storage Service, referred to as OSS), but is not limited to this, and is only used as an example. Among them, the central warehouse can be used to provide cloud storage services and can be used to store and manage massive unstructured data.
  • the above-mentioned central warehouse can communicate with the server cluster and transfer files with the storage devices on the server cluster. According to the user's model deployment instructions, the central warehouse can automatically distribute the model files of the model to be deployed to multiple storage devices mounted on multiple server clusters.
  • the model file of the model to be deployed can be pre-uploaded to the central warehouse.
  • the model file corresponding to the model to be deployed can be retrieved from the central warehouse according to the model deployment instruction, and the model file can be distributed to multiple storage devices.
  • the model to be deployed is a large model.
  • the aforementioned large model may be a model with a large storage capacity and a strong computing capability.
  • FIG3 is a structural diagram of a model deployment method according to an embodiment of the present disclosure.
  • the model management and control platform can store the model files of the model to be deployed in the central warehouse, wherein the model management and control platform can be used to provide preset resources, for example, to manage cluster metadata, service deployment, automatic/scheduled scalers, monitoring data, model distribution, resource usage monitoring, grayscale release/rollback, and multiple accounts.
  • the preset resources can be distributed to multiple server clusters.
  • the model files can be distributed to multiple storage devices.
  • region A includes server cluster A, server cluster B, and network attached storage
  • region B includes server cluster A, server cluster B, and network attached storage
  • region C includes server cluster A, server cluster B, and network attached storage.
  • the network attached storage in different regions can be mounted on the servers in the server cluster, so that the agents in the server cluster can store model-file resources in the network attached storage, and the servers in the server cluster can load model-file resources from the network attached storage.
  • An elastic scheduling cluster can be built based on the online service platform or container orchestration platform, and elastic scaling capabilities can be provided for the target service cluster based on the elastic scheduling cluster, so that the target service cluster can automatically adjust system resources and capacity to meet user needs based on actual needs and load conditions.
  • the present disclosure may include two model deployment scenarios, wherein the first model deployment scenario may be a scenario where models, programs, data, and images are deployed separately.
  • FIG4 is a flowchart of another model deployment method according to an embodiment of the present disclosure. As shown in FIG4, the method includes:
  • Step S401 distributing preset resources to multiple server clusters
  • the model management and control platform in Figure 3 can be used to provide the above-mentioned preset resources.
  • Step S402 when the preset resources are distributed, the model file is distributed to multiple storage devices;
  • if the preset resources have not been completely distributed, an error prompt is returned; once the preset resources are completely distributed, the step of distributing the model file to multiple storage devices is executed.
  • Step S403 determining a target virtual private network corresponding to the storage device based on the target region where the storage device is deployed;
  • Step S404 obtaining a server cluster corresponding to the target virtual private network, and obtaining a target server cluster;
  • Step S405 when deploying the inference service of the model to be deployed, the storage device is mounted to the server in the target server cluster.
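The steps S401 to S405 above can be sketched as follows. The region-to-VPC and VPC-to-cluster mappings, and all names in the sketch, are hypothetical placeholders; in practice they would come from the platform's cluster metadata.

```python
# Hypothetical mappings; in practice these would come from cluster metadata.
REGION_TO_VPC = {"region-a": "vpc-a", "region-b": "vpc-b"}
VPC_TO_CLUSTER = {"vpc-a": "cluster-a", "vpc-b": "cluster-b"}

def target_cluster(storage_region: str) -> str:
    """Steps S403-S404: resolve the storage device's region to its virtual
    private network, then to the server cluster attached to that network."""
    vpc = REGION_TO_VPC[storage_region]
    return VPC_TO_CLUSTER[vpc]

def deploy_model(model_file: str, storage_regions: list) -> dict:
    """Steps S401-S405 end to end: after preset resources and the model file
    have been distributed, mount each region's storage onto the servers of
    the same-region target cluster."""
    mounts = {}
    for region in storage_regions:
        cluster = target_cluster(region)
        mounts[cluster] = {"region": region, "model": model_file}  # step S405
    return mounts
```

Because each storage device resolves only to the cluster inside its own region's VPC, no mount ever crosses a region boundary.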
  • the second model deployment scenario may be a mirror deployment scenario. In this process, there is no need to execute the resource distribution process, and the service deployment can be performed directly.
  • FIG. 5 is a flow chart of another model deployment method according to an embodiment of the present disclosure. As shown in FIG. 5, the method includes:
  • Step S501 obtaining a model file of a model to be deployed
  • Step S502 distributing the model file to multiple storage devices
  • Step S503 determining a target virtual private network corresponding to the storage device based on the target region where the storage device is deployed;
  • Step S504 obtaining a server cluster corresponding to the target virtual private network, and obtaining a target server cluster;
  • Step S505 when deploying the inference service of the model to be deployed, the storage device is mounted to the server in the target server cluster.
  • This disclosure mainly uses NAS to mount model files.
  • loading a model file from an OSS mount and from a NAS mount can be compared using a 13B-parameter GPT model (25 GB in size):
  • the present disclosure aims to provide a system for improving the multi-regional and large-scale elasticity of large model services.
  • the system is based on NAS storage to achieve the distribution of model files in multiple regions, and automatically mounts the NAS in the region to load the model files when deploying the inference service. Since the NAS and the inference service are in the same region VPC, high-speed interconnection can be achieved, thereby shortening the model loading time, speeding up the service startup speed, and improving elasticity.
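A same-region NAS is typically attached over NFS. The sketch below only builds a typical NFSv4 mount command line rather than executing it; the endpoint, the option string, and the mount point are assumptions and may differ per NAS product.

```python
def nas_mount_command(nas_endpoint: str, mount_point: str) -> list:
    """Build a typical NFSv4 mount command for a NAS export at the root
    path. The option string is a common default for cloud NAS products,
    not a requirement of this disclosure."""
    return [
        "mount", "-t", "nfs",
        "-o", "vers=4,minorversion=0,noresvport",
        f"{nas_endpoint}:/", mount_point,
    ]
```

Once the mount succeeds, the inference service in the same-region VPC loads the model weights directly from the mount point over the intra-region network, rather than pulling them across regions.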
  • user information including but not limited to user device information, user personal information, etc.
  • data including but not limited to data used for analysis, stored data, displayed data, etc.
  • the method according to the above embodiment can be implemented by means of software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
  • the technical solution of the present disclosure, or the part that contributes to the prior art can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a disk, or an optical disk), and includes a number of instructions for a terminal device (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods of each embodiment of the present disclosure.
  • a storage medium such as ROM/RAM, a disk, or an optical disk
  • a terminal device which can be a mobile phone, a computer, a server, or a network device, etc.
  • a model deployment method is also provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system such as a set of computer executable instructions, and although the logical order is shown in the flowchart, in some cases, the steps shown or described can be executed in an order different from that shown here.
  • FIG. 6 is a flow chart of a model deployment method according to Embodiment 2 of the present disclosure. As shown in FIG. 6, the method includes the following steps:
  • Step S602 in response to receiving the model file of the model to be deployed, storing the model file in the central warehouse;
  • Step S604 in response to receiving the model distribution request, distributing the model file to multiple storage devices;
  • the above-mentioned model distribution request may be a request for distributing the model to a storage device in advance when the model to be deployed is deployed remotely.
  • the above-mentioned model distribution request may be generated by touching the interactive interface.
  • the specific method for generating the above-mentioned model distribution request is not limited here.
  • Step S606 in response to receiving the server deployment request, based on the server deployment request, the storage device is mounted to a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.
  • the above-mentioned server deployment request may be a request generated when a user needs to deploy a model to be deployed.
  • the above-mentioned server deployment request may be generated by touching an interactive interface.
  • the specific method of generating the above-mentioned server deployment request is not limited here.
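The three request-driven steps S602 to S606 can be sketched as event handlers. The warehouse, storage, and cluster structures below are simplified in-memory stand-ins invented for illustration, not the platform's actual data model.

```python
# Simplified in-memory stand-ins for the central warehouse, the per-region
# storage devices, and the per-region server clusters.
central_warehouse = {}
regional_storage = {"region-a": {}, "region-b": {}}
cluster_by_region = {"region-a": "cluster-a", "region-b": "cluster-b"}
mounted = {}

def on_model_file(name, model_file):
    """Step S602: store the received model file in the central warehouse."""
    central_warehouse[name] = model_file

def on_model_distribution_request(name):
    """Step S604: distribute the model file to every regional storage device."""
    for store in regional_storage.values():
        store[name] = central_warehouse[name]

def on_server_deployment_request(region):
    """Step S606: mount the same-region storage onto the target cluster."""
    cluster = cluster_by_region[region]
    mounted[cluster] = region
    return cluster
```

Distribution (step S604) happens ahead of time, so by the time a server deployment request arrives, every region already holds a local copy of the model file.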
  • in this embodiment, in response to receiving the model file of the model to be deployed, the model file is stored in the central warehouse; in response to receiving the model distribution request, the model file is distributed to multiple storage devices, wherein different storage devices are geographically deployed in different regions, and different server clusters are also deployed in different regions; and in response to receiving the server deployment request, the storage device is mounted, based on the server deployment request, to the target server cluster among the multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters, thereby improving the processing efficiency of the model in different regions. It is easy to notice that the model file can be distributed to storage devices deployed in different regions, so that the network between the storage device and the server cluster in the same region realizes high-speed interconnection, eliminating cross-regional file transmission. This shortens the loading time of the model file, improves the processing efficiency of the model in different regions, and further solves the technical problem of low efficiency of cross-regional deployment of models in the related art.
  • a model deployment method is also provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system such as a set of computer executable instructions, and although the logical order is shown in the flowchart, in some cases, the steps shown or described can be executed in an order different from that shown here.
  • FIG. 7 is a flow chart of a model deployment method according to Embodiment 3 of the present disclosure. As shown in FIG. 7 , the method includes the following steps:
  • Step S702 Acquire the model file of the model to be deployed by calling the first interface.
  • the first interface includes a first parameter, and a parameter value of the first parameter is a model file.
  • the first interface in the above steps can be an interface for data interaction between the cloud server and the client.
  • the client can pass the model file of the model to be deployed into the interface function as the first parameter of the interface function to achieve the purpose of uploading the model file of the model to be deployed to the cloud server.
  • Step S704 distribute the model file to multiple storage devices.
  • Step S706 when deploying the inference service of the model to be deployed, the storage device is mounted to a target server cluster among the multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters to obtain the deployment result of the model to be deployed.
  • Step S708 output the deployment result by calling the second interface.
  • the second interface includes a second parameter, and a parameter value of the second parameter is a deployment result.
  • the above-mentioned second interface can be an interface for data interaction between the cloud server and the client.
  • the cloud server can pass the deployment result to the interface function as the second parameter of the interface function to achieve the purpose of sending the deployment result to the client.
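The role of the two interface parameters can be illustrated as follows. The payload field names below are invented for illustration; only the parameter roles (model file in via the first interface, deployment result out via the second) come from the text.

```python
def call_first_interface(model_file):
    """Client side: the first parameter's value is the model file being
    uploaded to the cloud server."""
    return {"interface": "first", "first_parameter": model_file}

def call_second_interface(deployment_result):
    """Cloud-server side: the second parameter's value is the deployment
    result being returned to the client."""
    return {"interface": "second", "second_parameter": deployment_result}
```

The two calls bracket the deployment flow: the client invokes the first interface to hand over the model file, and the cloud server invokes the second interface to hand back the deployment result.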
  • the model file of the model to be deployed is obtained by calling the first interface, wherein the first interface includes a first parameter whose parameter value is the model file; the model file is distributed to multiple storage devices, wherein different storage devices are geographically deployed in different regions; when deploying the inference service of the model to be deployed, the storage device is mounted to a target server cluster among multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters and the deployment result of the model to be deployed is obtained; and the deployment result is output by calling the second interface, wherein the second interface includes a second parameter whose parameter value is the deployment result, thereby improving the processing efficiency of the model in different regions. It is easy to notice that the model file can be distributed to storage devices deployed in different regions, so that the network between the storage device and the server cluster in the same region realizes high-speed interconnection, eliminating cross-regional file transmission. This shortens the loading time of the model file, improves the processing efficiency of the model in different regions, and further solves the technical problem of low efficiency of cross-regional deployment of models in the related art.
  • FIG. 8 is a schematic diagram of a model deployment device according to embodiment 4 of the present disclosure.
  • the device 800 includes: an acquisition module 802, a distribution module 804, and a mounting module 806.
  • the acquisition module is configured to acquire the model file of the model to be deployed;
  • the distribution module is configured to distribute the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions;
  • the mounting module is configured to mount the storage device to a target server cluster among multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed can be deployed to multiple server clusters.
  • the acquisition module 802, the distribution module 804, and the mounting module 806 correspond to steps S202 to S206 in Embodiment 1; the three modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the contents disclosed in Embodiment 1.
  • the above-mentioned modules or units may be hardware components or software components stored in a memory and processed by one or more processors, and the above-mentioned modules may also be part of the device and may run in the server 10 provided in Example 1.
  • the storage device includes: a network attached storage, wherein the distribution module is further used to send the model file to multiple network attached storages through a public network and store the model file in the multiple network attached storages.
  • the mounting module is also used to determine the target virtual private network corresponding to the storage device based on the target region where the storage device is deployed, obtain the server cluster corresponding to the target virtual private network, and obtain the target server cluster, wherein different server clusters correspond to different virtual private networks, and when deploying the inference service of the model to be deployed, the storage device is mounted to the server in the target server cluster.
  • the mounting module is also used to determine the target virtual private network corresponding to the storage device based on the target region where the storage device is deployed, obtain the server cluster corresponding to the target virtual private network, and obtain the target server cluster, wherein different server clusters correspond to different virtual private networks.
  • the device further includes: a building module and a providing module.
  • the construction module is configured to construct an elastic scheduling cluster;
  • the provision module is configured to provide elastic scaling capabilities for the target server cluster based on the elastic scheduling cluster.
  • the distribution module is further used to distribute the preset resources to multiple server clusters; the distribution module is further used to distribute the model file to multiple storage devices when the preset resources are distributed.
  • the acquisition module is further used to acquire the model file of the model to be deployed from the central warehouse, wherein the model file is uploaded to the central warehouse in advance.
  • the model to be deployed is a large model.
  • FIG. 9 is a schematic diagram of a model deployment device according to embodiment 5 of the present disclosure.
  • the device 900 includes: a storage module 902, a distribution module 904, and a mounting module 906.
  • the storage module is configured to store the model file of the model to be deployed in the central warehouse in response to receiving the model file;
  • the distribution module is configured to distribute the model file stored in the central warehouse to multiple storage devices in response to receiving the model distribution request, wherein different storage devices are geographically deployed in different regions, and different server clusters are also deployed in different regions;
  • the mounting module is configured to, in response to receiving a server deployment request, mount the storage device, based on the server deployment request, to a target server cluster among multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed can be deployed to multiple server clusters.
  • the storage module 902, distribution module 904, and mounting module 906 correspond to steps S602 to S606 in Example 2, and the three modules and corresponding steps implement the same examples and application scenarios, but are not limited to the contents disclosed in Example 1.
  • the modules or units may be hardware components or software components stored in a memory and processed by one or more processors, and the modules may also be part of a device and run in the server 10 provided in Example 1.
  • FIG. 10 is a schematic diagram of a model deployment device according to embodiment 6 of the present disclosure.
  • the device 1000 includes: an acquisition module 1002, a distribution module 1004, a mounting module 1006, and an output module 1008.
  • the acquisition module is configured to obtain the model file of the model to be deployed by calling the first interface, wherein the first interface includes a first parameter, and the parameter value of the first parameter is the model file;
  • the distribution module is configured to distribute the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions;
  • the mounting module is configured to mount the storage device to a target server cluster deployed in the same region as the storage device in multiple server clusters when deploying the inference service of the model to be deployed, so that the model to be deployed is deployed to multiple server clusters to obtain the deployment result of the model to be deployed;
  • the output module is configured to output the deployment result by calling the second interface, wherein the second interface includes a second parameter, and the parameter value of the second parameter is the deployment result.
  • the acquisition module 1002, distribution module 1004, mounting module 1006, and output module 1008 correspond to steps S702 to S708 in Example 3, and the four modules and the corresponding steps implement the same instances and application scenarios, but are not limited to the contents disclosed in the above-mentioned Example 1.
  • the above-mentioned modules or units may be hardware components or software components stored in a memory and processed by one or more processors, and the above-mentioned modules may also be part of the device and may be run in the server 10 provided in Example 1.
  • FIG. 11 is a schematic diagram of a model deployment system according to Embodiment 7 of the present disclosure. As shown in FIG. 11 , the system 1100 includes:
  • the control device 1106 is connected to the storage device and the server cluster, and is used to distribute the model files of the model to be deployed to the storage device and to mount the storage device to the target server cluster deployed in the same region as the storage device, so as to deploy the model to be deployed to multiple server clusters.
  • the storage device 1104 includes: a network attached storage, which is connected to the control device through a public network.
  • the target server cluster is connected to the storage device via a target virtual private network.
  • the system further includes: an elastic scheduling cluster, which is used to provide elastic scaling capabilities for the target server cluster.
  • the embodiment of the present disclosure may provide an electronic device, which may be any electronic device in a group of electronic devices.
  • the electronic device may also be replaced by a terminal device such as a mobile terminal.
  • the electronic device may be located in at least one network device among a plurality of network devices of a computer network.
  • the above-mentioned electronic device can execute the program code of the following steps in the model deployment method: obtain the model file of the model to be deployed, and multiple server clusters, wherein different server clusters are geographically deployed in different regions; distribute the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; mount the storage device to the target server cluster deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.
  • Figure 12 is a block diagram of a computer terminal according to an embodiment of the present disclosure.
  • the computer terminal A may include: one or more (only one is shown in the figure) processors 102, a memory 104, a storage controller, and a peripheral interface, wherein the peripheral interface is connected to a radio frequency module, an audio module, and a display.
  • the memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the model deployment method and device in the embodiment of the present disclosure.
  • the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, the above-mentioned model deployment method is realized.
  • the memory may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory may further include a memory remotely arranged relative to the processor, and these remote memories can be connected to the terminal A via a network. Examples of the above-mentioned network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the processor can call the information and application programs stored in the memory through the transmission device to perform the following steps: obtain the model file of the model to be deployed; distribute the model file to multiple storage devices, where different storage devices are geographically deployed in different regions; mount the storage device to a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.
  • the processor may also execute the program code of the following steps: sending the model file to a plurality of network attached storages via a public network, and storing the model file in the plurality of network attached storages.
  • the processor may also execute program codes of the following steps: based on the target region where the storage device is deployed, determining a target server cluster deployed in the target region; and mounting the storage device to a server in the target server cluster.
  • the processor may also execute the program code of the following steps: based on the target region where the storage device is deployed, determine the target virtual private network corresponding to the storage device; obtain the server cluster corresponding to the target virtual private network to obtain the target server cluster, wherein different server clusters correspond to different virtual private networks; and when deploying the inference service of the model to be deployed, mount the storage device to the server in the target server cluster.
  • the processor may also execute program codes of the following steps: constructing an elastic scheduling cluster; and providing elastic scaling capabilities for the target server cluster based on the elastic scheduling cluster.
  • the processor may also execute program codes of the following steps: distributing preset resources to multiple server clusters; and distributing the model file to multiple storage devices when the preset resources have been distributed.
  • the processor may also execute the program code of the following steps: obtaining a model file of the model to be deployed from the central warehouse, wherein the model file is uploaded to the central warehouse in advance.
  • the processor can call the information and application programs stored in the memory through the transmission device to perform the following steps: in response to receiving the model file of the model to be deployed, the model file is stored in the central warehouse; in response to receiving the model distribution request, the model file is distributed to multiple storage devices, wherein different storage devices are geographically deployed in different regions, and different server clusters are also deployed in different regions; in response to receiving the server deployment request, based on the server deployment request, the storage device is mounted on the target server cluster deployed in the same region as the storage device in the multiple server clusters, so that the model to be deployed is deployed to multiple server clusters.
  • the processor can call the information and application programs stored in the memory through the transmission device to perform the following steps: obtain the model file of the model to be deployed by calling the first interface, wherein the first interface includes a first parameter, and the parameter value of the first parameter is the model file; distribute the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; when deploying the inference service of the model to be deployed, mount the storage device to a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters to obtain the deployment result of the model to be deployed; output the deployment result by calling the second interface, wherein the second interface includes a second parameter, and the parameter value of the second parameter is the deployment result.
  • the disclosed embodiment provides a model deployment method, including: obtaining a model file of a model to be deployed; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; and mounting the storage device to a target server cluster among multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters, thereby improving the processing efficiency of the model in different regions. It is easy to notice that the model file can be distributed to storage devices deployed in different regions, so that the network between the storage device and the server cluster in the same region realizes high-speed interconnection, eliminating cross-regional file transmission. This shortens the loading time of the model file and improves the processing efficiency of the model in different regions, which solves the technical problem of low efficiency of cross-regional deployment of models in the related art.
  • the structure shown in FIG. 12 is for illustration only, and the computer terminal may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a PDA, or a mobile Internet device (MID).
  • FIG. 12 does not limit the structure of the above-mentioned electronic device.
  • the computer terminal A may also include more or fewer components (such as a network interface, a display device, etc.) than those shown in FIG. 12, or may have a configuration different from that shown in FIG. 12.
  • a person of ordinary skill in the art may understand that all or part of the steps in the various methods of the above embodiments may be completed by instructing the hardware related to the terminal device through a program, and the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, etc.
  • the embodiment of the present disclosure also provides a computer-readable storage medium.
  • the storage medium can be used to store the program code executed by the model deployment method provided in the first embodiment.
  • the above storage medium may be located in any computer terminal in a computer terminal group in a computer network, or in any mobile terminal in a mobile terminal group.
  • the storage medium is configured to store program code for performing the following steps: obtaining a model file of a model to be deployed, wherein different server clusters are geographically deployed in different regions; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; mounting the storage device on a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.
  • the storage medium is further configured to store program codes for executing the following steps: sending the model file to a plurality of network attached storages via a public network, and storing the model file in the plurality of network attached storages.
  • the storage medium is also configured to store program codes for executing the following steps: determining a target virtual private network corresponding to the storage device based on a target region for storage device deployment; obtaining a server cluster corresponding to the target virtual private network to obtain a target server cluster, wherein different server clusters correspond to different virtual private networks; and mounting the storage device on a server in the target server cluster when deploying an inference service for the model to be deployed.
  • the storage medium is further configured to store program codes for executing the following steps: constructing an elastic scheduling cluster; and providing elastic scaling capabilities for the target server cluster based on the elastic scheduling cluster.
  • the storage medium is further configured to store program codes for executing the following steps: distributing preset resources to multiple server clusters; and distributing the model file to multiple storage devices when the preset resources have been distributed.
  • the storage medium is further configured to store program codes for executing the following steps: obtaining the model file of the model to be deployed from the central warehouse, wherein the model file is uploaded to the central warehouse in advance.
  • the storage medium is configured to store program codes for executing the following steps: in response to receiving a model file of a model to be deployed, storing the model file in a central warehouse; in response to receiving a model distribution request, distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions, and different server clusters are also deployed in different regions; in response to receiving a server deployment request, based on the server deployment request, mounting the storage device to a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.
  • the storage medium is configured to store program code for executing the following steps: obtaining a model file of the model to be deployed by calling a first interface, wherein the first interface includes a first parameter, and the parameter value of the first parameter is the model file; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; when deploying the inference service of the model to be deployed, mounting the storage device on a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters to obtain a deployment result of the model to be deployed; outputting the deployment result by calling a second interface, wherein the second interface includes a second parameter, and the parameter value of the second parameter is the deployment result.
  • a model deployment method including: obtaining a model file of a model to be deployed; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; and mounting the storage device to a target server cluster among multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters, thereby improving the processing efficiency of the model in different regions. It is easy to notice that the model file can be distributed to storage devices deployed in different regions, so that the network between the storage device and the server cluster in the same region realizes high-speed interconnection, eliminating cross-regional file transmission. This shortens the loading time of the model file, improves the processing efficiency of the model in different regions, and further solves the technical problem of low efficiency of cross-regional deployment of models in the related art.
  • the disclosed technical content can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division.
  • multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point: the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through certain interfaces, and the indirect coupling or communication connection between units or modules may be electrical or take other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method described in each embodiment of the present disclosure.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to large model technology and the field of model deployment. Disclosed are a model deployment method and system, an electronic device, and a computer-readable storage medium. The method comprises: acquiring a model file of a model to be deployed; distributing the model file to a plurality of storage devices, wherein different storage devices are geographically deployed in different regions; and mounting each storage device to a target server cluster, among a plurality of server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the plurality of server clusters. The present disclosure solves the technical problem in the related art of relatively low efficiency in cross-region model deployment.

Description

Model Deployment Method, System, Electronic Device, and Computer-Readable Storage Medium

Technical Field

The present disclosure relates to large model technology and the field of model deployment, and in particular to a model deployment method, a model deployment system, an electronic device, and a computer-readable storage medium.

Background Art

Current large models store a huge number of parameters and computation graph structures, making their model files very large; loading such a model when starting a service takes a long time. Moreover, constrained by region, network, hardware, and other factors, scenarios involving cross-region file transfer further lengthen the loading time, resulting in low model deployment efficiency.

No effective solution has yet been proposed for the above problems.

Summary of the Invention

Embodiments of the present disclosure provide a model deployment method, a system, an electronic device, and a computer-readable storage medium, so as to at least solve the technical problem in the related art of low efficiency in cross-region model deployment.

According to one aspect of the embodiments of the present disclosure, a model deployment method is provided, including: obtaining a model file of a model to be deployed; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; and mounting each storage device to the target server cluster, among multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters.

According to another aspect of the embodiments of the present disclosure, a model deployment method is further provided, including: in response to receiving a model file of a model to be deployed, storing the model file in a central repository; in response to receiving a model distribution request, distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions, and different server clusters are also deployed in those regions; and in response to receiving a server deployment request, mounting, based on the server deployment request, each storage device to the target server cluster, among the multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters.

According to another aspect of the embodiments of the present disclosure, a model deployment method is further provided, including: obtaining a model file of a model to be deployed by calling a first interface, wherein the first interface includes a first parameter whose parameter value is the model file; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; when deploying the inference service of the model to be deployed, mounting each storage device to the target server cluster, among multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters and a deployment result of the model to be deployed is obtained; and outputting the deployment result by calling a second interface, wherein the second interface includes a second parameter whose parameter value is the deployment result.

According to another aspect of the embodiments of the present disclosure, a model deployment system is further provided, including: multiple server clusters, where different server clusters are geographically deployed in different regions; multiple storage devices, where different storage devices are geographically deployed in different regions; and a control device, connected to the storage devices and the server clusters, configured to distribute the model file of the model to be deployed to the storage devices and, when deploying the inference service of the model to be deployed, mount each storage device to the target server cluster deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters.

According to another aspect of the embodiments of the present disclosure, an electronic device is further provided, including: a memory storing an executable program; and a processor configured to run the program, wherein the program, when run, executes the method of any one of the above embodiments.

According to another aspect of the embodiments of the present disclosure, a computer-readable storage medium is further provided, the computer-readable storage medium including a stored executable program, wherein, when the executable program runs, the device on which the computer-readable storage medium is located is controlled to execute the method of any one of the above embodiments.

In the embodiments of the present disclosure, a model file of a model to be deployed can be obtained; the model file is distributed to multiple storage devices, where different storage devices are geographically deployed in different regions; and each storage device is mounted to the target server cluster, among multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters, improving the processing efficiency of the model across regions. Notably, distributing the model file to storage devices deployed in different regions allows the intra-region network between a storage device and a server cluster to provide high-speed interconnection and eliminates cross-region file transfer; this shortens the loading time of the model file, improves the processing efficiency of the model across regions, and thus solves the technical problem in the related art of low efficiency in cross-region model deployment.

It should be noted that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.

Brief Description of the Drawings

The drawings described herein are provided for a further understanding of the present disclosure and constitute a part of the present disclosure. The illustrative embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an improper limitation on it. In the drawings:

FIG. 1 is a schematic diagram of the hardware environment of a virtual reality device for a model deployment method according to an embodiment of the present disclosure;

FIG. 2 is a flowchart of a model deployment method according to Embodiment 1 of the present disclosure;

FIG. 3 is a structural diagram of a model deployment method according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of another model deployment method according to an embodiment of the present disclosure;

FIG. 5 is a flowchart of another model deployment method according to an embodiment of the present disclosure;

FIG. 6 is a flowchart of a model deployment method according to Embodiment 2 of the present disclosure;

FIG. 7 is a flowchart of a model deployment method according to Embodiment 3 of the present disclosure;

FIG. 8 is a schematic diagram of a model deployment device according to Embodiment 4 of the present disclosure;

FIG. 9 is a schematic diagram of a model deployment device according to Embodiment 5 of the present disclosure;

FIG. 10 is a schematic diagram of a model deployment device according to Embodiment 6 of the present disclosure;

FIG. 11 is a schematic diagram of a model deployment system according to Embodiment 7 of the present disclosure;

FIG. 12 is a structural block diagram of a computer terminal according to an embodiment of the present disclosure.

Detailed Description

To enable those skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present disclosure.

It should be noted that the terms "first", "second", etc. in the specification, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present disclosure described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "including" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

The technical solutions provided by the present disclosure are mainly implemented with large model technology. A large model here refers to a deep learning model with large-scale model parameters, which may contain hundreds of millions, tens of billions, hundreds of billions, trillions, or even more than ten trillion model parameters. A large model is also called a foundation model; it is pre-trained on large-scale unlabeled corpora to produce a pre-trained model with hundreds of millions of parameters or more. Such a model adapts to a wide range of downstream tasks and has good generalization ability, e.g., a large language model (LLM) or a multi-modal pre-training model.

It should be noted that, in practical applications, a large model can be fine-tuned with a small number of samples so that it can be applied to different tasks. For example, large models are widely used in natural language processing (NLP), computer vision, and other fields; specifically, they can be applied to computer vision tasks such as visual question answering (VQA), image captioning (IC), and image generation, as well as natural language processing tasks such as text-based sentiment classification, text summarization, and machine translation. Therefore, the main application scenarios of large models include, but are not limited to, digital assistants, intelligent robots, search, online education, office software, e-commerce, and intelligent design. In the embodiments of the present disclosure, the deployment of a large model in a model deployment scenario is used as an example for explanation.

First, some terms that appear in the description of the embodiments of the present disclosure are explained as follows:

Network Attached Storage (NAS): provides file-sharing services over a network, along with other additional functions, offering a convenient and reliable way to store and share data;

Object Storage Service (OSS): provides cloud storage services for storing and managing massive amounts of unstructured data;

Virtual Private Cloud (VPC): a virtual network environment created in a cloud computing environment, providing an isolated and secure way to deploy and manage cloud resources;

Elastic Algorithm Service (EAS): a platform for deploying and managing machine learning models. EAS provides users with a convenient way to deploy, run, and manage their own machine learning algorithms and models, and offers scalable computing resources and high-performance computing capabilities;

Kubernetes (K8s): a container orchestration platform that organizes and manages containerized applications, helping users simplify application deployment, management, and scaling.

To solve the above problems, the present disclosure proposes a method based on multi-region storage on storage devices: within each region, the storage device and the service are deployed in the same virtual private network to achieve high-speed interconnection and eliminate cross-region file transfer, thereby shortening the loading time of model files and further improving the elasticity of the service.
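
The same-region pairing described above can be sketched as a small registry that ties each region's storage device, server cluster, and VPC together. This is an illustrative data model only; the region, VPC, and device names below are hypothetical placeholders, not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class RegionDeployment:
    """One region's co-located resources: a NAS device and a server
    cluster attached to the same VPC, so traffic never crosses regions."""
    region: str
    vpc_id: str
    storage_device: str
    server_cluster: str

# Hypothetical topology; names are placeholders for illustration.
TOPOLOGY = {
    d.region: d
    for d in [
        RegionDeployment("cn-hangzhou", "vpc-hz-01", "nas-hz-01", "cluster-hz-01"),
        RegionDeployment("cn-beijing", "vpc-bj-01", "nas-bj-01", "cluster-bj-01"),
        RegionDeployment("eu-frankfurt", "vpc-fra-01", "nas-fra-01", "cluster-fra-01"),
    ]
}

def colocated_cluster(region: str) -> str:
    """Return the cluster that shares a VPC (and region) with the region's NAS."""
    return TOPOLOGY[region].server_cluster
```

Because each storage device and its cluster share one VPC, a mount within a region never leaves that region's network.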

Embodiment 1

According to an embodiment of the present disclosure, a model deployment method is provided. It should be noted that the steps shown in the flowcharts of the drawings can be executed in a computer system, such as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in an order different from that shown here.

Considering that a large model has a huge number of parameters and that mobile terminals have limited computing resources, the model deployment method provided in the embodiments of the present disclosure can be applied to, but is not limited to, the application scenario shown in FIG. 1. FIG. 1 is a schematic diagram of the hardware environment of a virtual reality device for a model deployment method according to an embodiment of the present disclosure. In the application scenario shown in FIG. 1, the large model is deployed on a server 10, and the server 10 can connect to one or more client devices 20 through a local area network, a wide area network, the Internet, or another type of network. The client devices 20 may include, but are not limited to, smartphones, tablet computers, laptop computers, handheld computers, personal computers, smart home devices, and in-vehicle devices. The client device 20 can interact with the user through a graphical user interface to invoke the large model, thereby implementing the method provided in the embodiments of the present disclosure.

In the embodiments of the present disclosure, the system composed of the client device and the server can perform the following steps: the client device generates the model file of the model to be deployed; the server obtains the model file of the model to be deployed and determines multiple server clusters, distributes the model file to multiple storage devices, and mounts each storage device to the target server cluster deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters. It should be noted that, when the operating resources of the client device can satisfy the deployment and operating conditions of the large model, the embodiments of the present disclosure can be performed on the client device.

In the above operating environment, the present disclosure provides a model deployment method as shown in FIG. 2. FIG. 2 is a flowchart of the model deployment method according to Embodiment 1 of the present disclosure. As shown in FIG. 2, the method may include the following steps:

Step S202: obtain the model file of the model to be deployed.

The above model to be deployed may be any model awaiting deployment, e.g., a large model, a neural network model, a language processing model, or an image processing model, where a large model refers to a machine learning or deep learning model with a large number of parameters and powerful computing capacity. In the embodiments of the present disclosure, a large model is used as an example of the model to be deployed; the model to be deployed can be determined according to actual needs.

The above model file may be the model file of the trained model to be deployed, where the model file includes, but is not limited to, the model's parameters and structure. This is only an example; the specific model file can be defined according to actual usage.

In an optional embodiment, the model file of the model to be deployed can be obtained, and multiple server clusters can be determined, where different server clusters are geographically deployed in different regions.

The above multiple server clusters can be used to provide services for users. The different regions can be different countries, different provinces, different cities, etc.; the regions are not specifically limited here.

In an optional embodiment, after training of the model to be deployed is completed and the model file is generated, the model file can be stored in a cloud server. When deploying the model to be deployed, the model file can be retrieved from the cloud server and distributed to storage devices in multiple different regions.
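
As a rough illustration of this retrieve-and-distribute step, the sketch below copies a model file from a central location into several directories standing in for regional storage devices. A real system would transfer the file over the network; the local-directory stand-in is an assumption for the sake of a runnable example.

```python
import shutil
from pathlib import Path

def distribute_model_file(model_file: Path, storage_roots: list[Path]) -> list[Path]:
    """Copy one model file to every regional storage device.

    Local directories stand in for the regional NAS mounts here;
    a production system would push the file over the network instead.
    """
    copies = []
    for root in storage_roots:
        root.mkdir(parents=True, exist_ok=True)  # ensure the device path exists
        dest = root / model_file.name
        shutil.copy2(model_file, dest)           # copy contents and metadata
        copies.append(dest)
    return copies
```

After distribution, every region holds an identical local copy, so no cluster ever needs to fetch the file from another region at service start-up.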

Step S204: distribute the model file to multiple storage devices.

Here, different storage devices are geographically deployed in different regions.

The above storage device may be a hardware storage device, for example a NAS, but is not limited thereto.

In an optional embodiment, the model file can be distributed to multiple storage devices according to a distribution request initiated by a user. Optionally, the model management platform can retrieve the model file from the cloud server according to the user-initiated distribution request and distribute it to multiple storage devices for storage.

Step S206: mount each storage device to the target server cluster, among multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters.

Mounting here refers to connecting the storage device to the servers of the target server cluster, so that the model files on the storage device become visible and accessible on those servers.

In an optional embodiment, the storage device can be mounted to a target server cluster deployed in the same region as the storage device, so that on the target server cluster users can quickly access the model file through the mounted storage device and thus use the services provided by the model more efficiently.
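
A mount of this kind is typically an NFS mount of the regional NAS onto a path on each cluster node. The helper below only builds the command line rather than executing it; the endpoint, export path, and mount options shown are assumptions for illustration and would vary by storage vendor.

```python
def build_mount_command(nas_endpoint: str, export_path: str, mount_point: str) -> list[str]:
    """Build the NFS mount command a node in the target cluster would run
    so the model files on the same-region NAS become visible locally."""
    return [
        "mount", "-t", "nfs",
        "-o", "vers=4.0,noresvport",  # illustrative options; consult the NAS vendor's docs
        f"{nas_endpoint}:{export_path}",
        mount_point,
    ]
```

A deployment controller could run this command on every node of the target cluster (for example, via a privileged init step) before starting the inference service, so the service reads the model file from a local mount rather than over a cross-region link.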

Through the above steps, the model file of the model to be deployed can be obtained; the model file is distributed to multiple storage devices, where different storage devices are geographically deployed in different regions; and each storage device is mounted to the target server cluster, among multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters, improving the processing efficiency of the model across regions. Notably, distributing the model file to storage devices deployed in different regions allows the intra-region network between a storage device and a server cluster to provide high-speed interconnection and eliminates cross-region file transfer; this shortens the loading time of the model file, improves the processing efficiency of the model across regions, and thus solves the technical problem in the related art of low efficiency in cross-region model deployment.

In the above embodiments of the present disclosure, the storage devices include network attached storage, and distributing the model file to the multiple storage devices includes: sending the model file to multiple network attached storages over a public network and storing it in the multiple network attached storages.

The above network attached storage can be a storage device that provides file-sharing services over a network, with other additional functions; it can also provide users with a convenient and reliable way to store and share data.

The above public network may be an open, shared, and widely covered network.

In an optional embodiment, since a public network has wide coverage, the model file can be sent over the public network to multiple network attached storages so that all of them receive it; after receiving the model file, each network attached storage can store it.

In another optional embodiment, to improve security while the model file is being stored, the model file can be encrypted before being sent over the public network, yielding an encrypted model file; the encrypted model file is then sent over the public network to the multiple network attached storages, which, upon receipt, decrypt it to recover the model file and store it.
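
The encrypt-before-transfer idea can be sketched as follows. For the sake of a self-contained example, this toy cipher derives a keystream from SHA-256 in counter mode and XORs it with the file bytes; it is illustrative only, and a real deployment should use a vetted authenticated cipher (e.g. AES-GCM from an established crypto library) instead.

```python
import hashlib
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream from SHA-256 in counter mode.
    Toy construction for illustration; not a production cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt before sending over the public network; a fresh nonce
    is prepended so the same file never encrypts the same way twice."""
    nonce = secrets.token_bytes(16)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))

def decrypt(key: bytes, blob: bytes) -> bytes:
    """Invert encrypt() on the receiving NAS to recover the model file."""
    nonce, ciphertext = blob[:16], blob[16:]
    ks = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))
```

The sender encrypts the model file once, each NAS decrypts with the shared key on receipt, and only ciphertext ever crosses the public network.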

In the above embodiments of the present disclosure, when deploying the inference service of the model to be deployed, mounting each storage device to the target server cluster, among multiple server clusters, that is deployed in the same region as the storage device includes: determining, based on the target region where the storage device is deployed, the target virtual private network corresponding to the storage device; obtaining the server cluster corresponding to the target virtual private network as the target server cluster, where different server clusters correspond to different virtual private networks; and, when deploying the inference service of the model to be deployed, mounting the storage device to the servers in the target server cluster.

The above target region can be determined according to the geographic location where the storage device is deployed, or according to the country, province, etc. of the deployment; the way of determining the target region is not limited here and can be set according to the actual situation.

上述的目标虚拟专有网络(Virtual Private Cloud,简称为VPC)可以提供一种隔离且安全的方式来部署和管理资源,通过虚拟专有网络还可以提高网络内的文件传输速度。可选的,目标虚拟专有网络可以是一种在云计算环境中创建的虚拟网络环境。The target virtual private network (Virtual Private Cloud, referred to as VPC) can provide an isolated and secure way to deploy and manage resources, and can also increase the speed of file transfers within the network. Optionally, the target virtual private network can be a virtual network environment created in a cloud computing environment.

在一种可选的实施例中,可以基于存储设备部署的目标地域,确定存储设备对应的目标虚拟专有网络,获取连接该目标虚拟专有网络的服务器集群,并从连接该目标虚拟专有网络的服务器集群中选择一个或多个服务器集群作为目标服务器集群;可选的,可以从连接目标虚拟专有网络的服务器集群中选择与存储设备在地理位置上最近的服务器集群作为上述的目标服务器集群。此处确定目标服务器集群的方式仅作示例说明,不做具体限定,可以根据实际情况进行设置。In an optional embodiment, the target virtual private network corresponding to the storage device can be determined based on the target region where the storage device is deployed, the server clusters connected to the target virtual private network can be obtained, and one or more of these server clusters can be selected as the target server cluster; optionally, the server cluster geographically closest to the storage device can be selected from the server clusters connected to the target virtual private network as the above-mentioned target server cluster. The method of determining the target server cluster here is only an example and is not specifically limited; it can be set according to actual conditions.
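The selection logic above, resolving the storage device's region to its VPC and then picking the closest attached cluster, can be sketched as follows (the region, VPC, and cluster names and the distance field are hypothetical placeholders, not part of the disclosure):

```python
# Illustrative region -> VPC and VPC -> clusters mappings (all names hypothetical).
REGION_TO_VPC = {"region-a": "vpc-a", "region-b": "vpc-b"}
VPC_TO_CLUSTERS = {
    "vpc-a": [{"name": "cluster-a1", "distance_km": 12},
              {"name": "cluster-a2", "distance_km": 80}],
    "vpc-b": [{"name": "cluster-b1", "distance_km": 5}],
}

def pick_target_cluster(storage_region: str) -> str:
    """Resolve the storage device's region to its VPC, then pick the
    geographically closest server cluster attached to that VPC."""
    vpc = REGION_TO_VPC[storage_region]
    clusters = VPC_TO_CLUSTERS[vpc]
    return min(clusters, key=lambda c: c["distance_km"])["name"]

print(pick_target_cluster("region-a"))  # -> cluster-a1
```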

在另一种可选的实施例中,在得到存储设备部署的目标地域之后,可以根据目标虚拟专有网络对目标地域内的服务器集群进行搜索,可以将属于目标地域的一个或多个服务器集群作为上述的目标服务器集群。可选的,可以将与存储设备距离最近的服务器集群作为上述的目标服务器集群,此处对确定目标服务器集群的具体方式不做限定。In another optional embodiment, after the target region where the storage device is deployed is obtained, the server clusters in the target region can be searched according to the target virtual private network, and one or more server clusters belonging to the target region can be used as the above-mentioned target server cluster. Optionally, the server cluster closest to the storage device can be used as the above-mentioned target server cluster; the specific method for determining the target server cluster is not limited here.

在又一种可选的实施例中,在部署待部署模型的推理服务时,可以将存储设备连接在目标服务器集群中的服务器上,以便在服务器上可以通过内网读取到存储设备中的模型文件。In another optional embodiment, when deploying the inference service of the model to be deployed, the storage device can be connected to the server in the target server cluster so that the model file in the storage device can be read on the server through the intranet.

本公开上述实施例中,在将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上之后,该方法还包括:构建目标服务器集群的弹性调度集群;基于弹性调度集群,确定部署待部署模型所需的计算资源。In the above embodiment of the present disclosure, after the storage device is mounted to a target server cluster, among the multiple server clusters, that is deployed in the same region as the storage device, the method further includes: building an elastic scheduling cluster of the target server cluster; and determining, based on the elastic scheduling cluster, the computing resources required to deploy the model to be deployed.

上述的弹性调度集群是一种基于云计算和虚拟化技术的集群管理方式,可以根据系统负载的变化自动调整集群中的计算资源,以满足不同的需求。弹性调度集群可以根据负载情况调整集群规模,包括增加或减少计算节点,以应对高负载或低负载的情况,这种集群管理方式可以提高系统的灵活性和可用性,同时也能够节省资源和成本。The above-mentioned elastic scheduling cluster is a cluster management method based on cloud computing and virtualization technology, which can automatically adjust the computing resources in the cluster according to changes in the system load to meet different needs. The elastic scheduling cluster can adjust the cluster size according to the load situation, including adding or reducing computing nodes to cope with high or low load situations. This cluster management method can improve the flexibility and availability of the system, while also saving resources and costs.

上述的弹性伸缩能力可以根据实时的需求和负载情况,自动调整目标服务器集群的资源和容量,以便满足用户的需求;弹性伸缩能力可以根据不同的情况自动增加或减少目标服务器集群中服务器实例的数量,以便实现负载均衡和高可用性。The above-mentioned elastic scaling capability can automatically adjust the resources and capacity of the target server cluster according to real-time demand and load conditions to meet user needs; the elastic scaling capability can automatically increase or decrease the number of server instances in the target server cluster according to different situations to achieve load balancing and high availability.

在一种可选的实施例中,可以根据在线服务平台(Elastic Algorithm Service,简称为EAS)或容器编排平台(Kubernetes,简称为K8S)构建弹性调度集群,可以根据弹性调度集群为目标服务器集群提供弹性伸缩能力,以便目标服务器集群可以根据实时的需求和负载情况,自动调整系统资源和容量以满足用户的需求。In an optional embodiment, an elastic scheduling cluster can be built based on an online service platform (Elastic Algorithm Service, referred to as EAS) or a container orchestration platform (Kubernetes, referred to as K8S). The elastic scheduling cluster can provide elastic scaling capabilities for the target server cluster, so that the target server cluster can automatically adjust system resources and capacity to meet user needs based on real-time demand and load conditions.

在另一种可选的实施例中,当目标服务器集群的负载较低时,可以减少服务器的数量以节省成本,当目标服务器集群的负载较高时,可以增加服务器的数量以提高性能和容量;可选的,可以根据预先设置的规则来对目标服务器集群中的服务器的数量进行管理,也可以根据实时监测和分析来动态调整目标服务器集群中服务器的数量。In another optional embodiment, when the load of the target server cluster is low, the number of servers can be reduced to save costs, and when the load of the target server cluster is high, the number of servers can be increased to improve performance and capacity; optionally, the number of servers in the target server cluster can be managed according to pre-set rules, and the number of servers in the target server cluster can also be dynamically adjusted based on real-time monitoring and analysis.
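The rule-based adjustment described above, fewer servers under low load, more under high load, might look like the following sketch (the thresholds, bounds, and the doubling/halving policy are illustrative assumptions, not part of the disclosure):

```python
def scale_decision(current_servers: int, load_pct: float,
                   low: float = 30.0, high: float = 80.0,
                   min_servers: int = 1, max_servers: int = 16) -> int:
    """Threshold-based rule: shrink the cluster under low load to save cost,
    grow it under high load to add performance and capacity."""
    if load_pct > high:
        return min(current_servers * 2, max_servers)   # scale out
    if load_pct < low:
        return max(current_servers // 2, min_servers)  # scale in
    return current_servers                             # load is in the normal band
```

In practice such a rule would be evaluated periodically against monitoring data, or replaced by pre-set schedules, matching the "pre-set rules or real-time monitoring" alternatives mentioned above.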

在又一种可选的实施例中,在构建目标服务器集群的弹性调度集群后,需要确定部署待部署模型所需的计算资源,例如,内存、存储等资源,并根据模型的计算需求进行配置和分配,通过确定部署所需的计算资源,可以确保待部署模型在弹性调度集群上能够正常运行,并满足所需的性能和可用性要求,并且,可以通过构建弹性调度集群提供足够的计算资源来支持待部署模型的部署和运行。In another optional embodiment, after building an elastic scheduling cluster of the target server cluster, it is necessary to determine the computing resources required to deploy the model to be deployed, such as memory, storage and other resources, and configure and allocate them according to the computing requirements of the model. By determining the computing resources required for deployment, it can be ensured that the model to be deployed can run normally on the elastic scheduling cluster and meet the required performance and availability requirements. In addition, sufficient computing resources can be provided by building an elastic scheduling cluster to support the deployment and operation of the model to be deployed.

本公开上述实施例中,该方法还包括:将预设资源分发至多个服务器集群;将模型文件分发至多个存储设备,包括:在预设资源分发完毕的情况下,将模型文件分发至多个存储设备。In the above embodiment of the present disclosure, the method also includes: distributing preset resources to multiple server clusters; distributing model files to multiple storage devices, including: distributing model files to multiple storage devices when the preset resources are distributed.

上述的预设资源可以是在服务器集群中共享和分配的各种资源,其中,预设资源可以是计算资源、网络资源、存储资源、软件资源、负载均衡资源,此处仅作示例描述,具体的预设资源可以根据实际情况进行设置。The above-mentioned preset resources can be various resources shared and allocated in the server cluster, wherein the preset resources can be computing resources, network resources, storage resources, software resources, or load balancing resources; these are merely examples, and the specific preset resources can be set according to the actual situation.

上述的预设资源可以为预先设置的资源类型和资源量,此处对预设资源不做限定。The above-mentioned preset resources may be pre-set resource types and resource quantities, and the preset resources are not limited here.

上述的预设资源还可以根据待部署模型部署所需的计算资源确定,并将预设资源分发至多个服务器集群中,以便于后续通过多个服务器集群对待部署模型进行部署。The above-mentioned preset resources can also be determined according to the computing resources required for deploying the model to be deployed, and the preset resources are distributed to multiple server clusters to facilitate subsequent deployment of the model to be deployed through the multiple server clusters.

在一种可选的实施例中,可以通过模型管控平台提供预设资源,预设资源包括但不限于管理集群元数据、服务部署、自动/定时缩放器、监测数据、模型分布、监测资源使用情况、灰度/回滚。In an optional embodiment, the preset resources can be provided through the model management and control platform; the preset resources include but are not limited to managing cluster metadata, service deployment, automatic/scheduled scalers, monitoring data, model distribution, monitoring resource usage, and gray release/rollback.

在另一种可选的实施例中,可以将预设资源分发至多个服务器集群,可以在预设资源分发完毕的情况下,将模型文件分发至多个存储设备,以便于后续多个存储设备可以使用服务器集群中目标服务器的预设资源根据模型文件对待部署模型进行部署。In another optional embodiment, the preset resources can be distributed to multiple server clusters, and after the preset resources are distributed, the model file can be distributed to multiple storage devices, so that the multiple storage devices can subsequently use the preset resources of the target server in the server cluster to deploy the model to be deployed according to the model file.
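The ordering constraint above, preset resources first, model files only once that phase has completed, can be sketched as follows (the state layout and function names are assumptions made for illustration):

```python
def distribute_resources(state: dict, clusters: list, preset_resources: dict) -> None:
    """Phase 1: push the preset resources to every server cluster."""
    for name in clusters:
        state["resources"][name] = preset_resources

def distribute_model(state: dict, clusters: list, devices: list, model_file: bytes) -> None:
    """Phase 2: distribute the model file to the storage devices,
    but refuse until phase 1 has finished for every cluster."""
    missing = [n for n in clusters if n not in state["resources"]]
    if missing:
        # Mirrors the error prompt returned while preset resources are incomplete.
        raise RuntimeError(f"preset resources not yet distributed to: {missing}")
    for dev in devices:
        state["device_files"].setdefault(dev, []).append(model_file)
```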

本公开上述实施例中,获取待部署模型的模型文件,包括:从中心仓库中获取待部署模型的模型文件,其中,模型文件预先上传至中心仓库。In the above embodiment of the present disclosure, obtaining the model file of the model to be deployed includes: obtaining the model file of the model to be deployed from a central warehouse, wherein the model file is pre-uploaded to the central warehouse.

上述的中心仓库可以为云对象存储(Object Storage Service,简称为OSS),但不限于此,此处仅作示例。其中,中心仓库可以用于提供云存储服务,可以用于存储和管理海量非结构化数据。The central warehouse mentioned above can be a cloud object storage (Object Storage Service, referred to as OSS), but is not limited to this, and is only used as an example. Among them, the central warehouse can be used to provide cloud storage services and can be used to store and manage massive unstructured data.

上述的中心仓库可以与服务器集群进行通讯,还可以与服务器集群上的存储设备进行文件传输,根据用户的模型部署指令,中心仓库可以自动将待部署模型的模型文件分发至多个服务器集群上挂载的多个存储设备中。The above-mentioned central warehouse can communicate with the server cluster and transfer files with the storage devices on the server cluster. According to the user's model deployment instructions, the central warehouse can automatically distribute the model files of the model to be deployed to multiple storage devices mounted on multiple server clusters.

在一种可选的实施例中,由于中心仓库的存储容量较大,因此,在待部署模型训练完毕之后,可以先将待部署模型的模型文件预先上传到中心仓库中,在需要对待部署模型进行部署时,可以根据模型部署指令从中心仓库中调取待部署模型对应的模型文件,并将模型文件分发至多个存储设备。In an optional embodiment, since the central warehouse has a large storage capacity, after the training of the model to be deployed is completed, the model file of the model to be deployed can be pre-uploaded to the central warehouse. When the model to be deployed needs to be deployed, the model file corresponding to the model to be deployed can be retrieved from the central warehouse according to the model deployment instruction, and the model file can be distributed to multiple storage devices.
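The pre-upload / retrieve / distribute flow above can be sketched with an in-memory dictionary standing in for the central warehouse (a real system would call an object-storage service such as OSS instead; the function names here are assumptions):

```python
central_repo: dict = {}   # stand-in for the object-storage central warehouse

def upload_model(model_id: str, model_bytes: bytes) -> None:
    """After training, pre-upload the model file to the central warehouse."""
    central_repo[model_id] = model_bytes

def distribute_model_file(model_id: str, devices: dict) -> None:
    """On a deployment instruction, retrieve the file from the central
    warehouse and fan it out to every storage device."""
    model_bytes = central_repo[model_id]
    for name in devices:
        devices[name] = model_bytes
```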

本公开上述实施例中,待部署模型为大模型。In the above embodiments of the present disclosure, the model to be deployed is a large model.

上述的大模型可以为参数规模较大、所需存储空间和计算资源较多的模型。The aforementioned large model may be a model with a large parameter scale that requires large storage space and substantial computing resources.

图3是根据本公开实施例的一种模型部署方法的结构图,如图3所示,模型管控平台可以将待部署模型的模型文件存储至中心仓库中,其中,模型管控平台可以用于提供预设资源,例如,管理集群元数据(cluster metadata)、服务部署(service deploy)、自动/定时缩放器(auto/cron scaler)、监测数据(monitoring data)、模型分布(model distribute)、监测资源使用情况(resource usage)、灰度/回滚(gray/rollback)、管理多个账户(multiple account)。可以将预设资源分发至多个服务器集群,在预设资源分发完毕的情况下,可以将模型文件分发至多个存储设备,如图3中所示,可以包含有区域A、区域B、区域C,其中,区域A、区域B、区域C中均包含有服务器集群A、服务器集群B、网络附接存储,可以将不同地域的网络附接存储挂载在服务器集群中的服务器上,以便于服务器集群中的代理可以对网络附接存储中的模型文件进行资源存储,服务器集群中的服务器可以对网络附接存储中的模型文件进行资源加载。可以根据在线服务平台或容器编排平台构建弹性调度集群,根据弹性调度集群为目标服务器集群提供弹性伸缩能力,以便目标服务器集群可以根据实时的需求和负载情况,自动调整系统资源和容量以满足用户的需求。FIG. 3 is a structural diagram of a model deployment method according to an embodiment of the present disclosure. As shown in FIG. 3, the model management and control platform can store the model files of the model to be deployed in the central warehouse, wherein the model management and control platform can be used to provide preset resources, for example, managing cluster metadata, service deployment (service deploy), auto/cron scalers, monitoring data, model distribution (model distribute), monitoring resource usage, gray release/rollback, and managing multiple accounts. The preset resources can be distributed to multiple server clusters, and once the preset resources have been distributed, the model files can be distributed to multiple storage devices. As shown in FIG. 3, there may be region A, region B, and region C, where each of region A, region B, and region C includes server cluster A, server cluster B, and network attached storage. The network attached storage of each region can be mounted on the servers in that region's server clusters, so that the agents in the server clusters can store the model files in the network attached storage, and the servers in the server clusters can load the model files from the network attached storage. An elastic scheduling cluster can be built based on the online service platform or container orchestration platform, and the elastic scheduling cluster can provide elastic scaling capabilities for the target server cluster, so that the target server cluster can automatically adjust system resources and capacity to meet user needs based on real-time demand and load conditions.

本公开中可以包含有两种模型部署场景,其中,第一种模型部署场景可以为模型、程序、数据和镜像分离部署的场景,图4是根据本公开实施例的另一种模型部署方法的流程图,如图4所示,该方法包括:The present disclosure may include two model deployment scenarios, wherein the first model deployment scenario may be a scenario where models, programs, data, and images are deployed separately. FIG4 is a flowchart of another model deployment method according to an embodiment of the present disclosure. As shown in FIG4, the method includes:

步骤S401,将预设资源分发至多个服务器集群中;Step S401, distributing preset resources to multiple server clusters;

图3中的模型管控平台可以用于提供上述预设资源。The model management and control platform in Figure 3 can be used to provide the above-mentioned preset resources.

步骤S402,在预设资源分发完毕的情况下,将模型文件分发至多个存储设备;Step S402, when the preset resources are distributed, the model file is distributed to multiple storage devices;

可选的,在预设资源未分发完毕的情况下,若执行将模型文件分发至多个存储设备,则返回错误提示,直至预设资源分发完毕,再执行将模型文件分发至多个存储设备的步骤。Optionally, if the preset resources are not distributed completely, if the step of distributing the model file to multiple storage devices is executed, an error prompt is returned until the preset resources are distributed completely, and then the step of distributing the model file to multiple storage devices is executed.

步骤S403,基于存储设备部署的目标地域,确定存储设备对应的目标虚拟专有网络;Step S403, determining a target virtual private network corresponding to the storage device based on the target region where the storage device is deployed;

步骤S404,获取目标虚拟专有网络对应的服务器集群,得到目标服务器集群;Step S404, obtaining a server cluster corresponding to the target virtual private network, and obtaining a target server cluster;

步骤S405,在部署待部署模型的推理服务时,将存储设备挂载至目标服务器集群中的服务器上。Step S405, when deploying the inference service of the model to be deployed, the storage device is mounted to the server in the target server cluster.

第二种模型部署场景可以为镜像部署的场景,在此过程无需执行资源分发流程,直接进行服务部署即可,图5是根据本公开实施例的另一种模型部署方法的流程图,如图5所示,该方法包括:The second model deployment scenario may be a mirror deployment scenario. In this process, there is no need to execute the resource distribution process, and the service deployment can be performed directly. FIG5 is a flow chart of another model deployment method according to an embodiment of the present disclosure. As shown in FIG5, the method includes:

步骤S501,获取待部署模型的模型文件;Step S501, obtaining a model file of a model to be deployed;

步骤S502,将模型文件分发至多个存储设备;Step S502, distributing the model file to multiple storage devices;

步骤S503,基于存储设备部署的目标地域,确定存储设备对应的目标虚拟专有网络;Step S503, determining a target virtual private network corresponding to the storage device based on the target region where the storage device is deployed;

步骤S504,获取目标虚拟专有网络对应的服务器集群,得到目标服务器集群;Step S504, obtaining a server cluster corresponding to the target virtual private network, and obtaining a target server cluster;

步骤S505,在部署待部署模型的推理服务时,将存储设备挂载至目标服务器集群中的服务器上。Step S505, when deploying the inference service of the model to be deployed, the storage device is mounted to the server in the target server cluster.
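Steps S501 to S505 above can be chained into a single sketch (the data layout, a per-region record with a `vpc` key, a per-VPC cluster map, and a NAS entry, is an assumption made for illustration, not part of the disclosure):

```python
def deploy_model(model_id: str, repo: dict, regions: dict) -> dict:
    """Sketch of steps S501-S505: fetch the model file, distribute it to each
    region's storage device, resolve the region's VPC to a target cluster,
    and mount the storage device on that cluster."""
    model_file = repo[model_id]                                   # S501
    mounted = {}
    for region, info in regions.items():
        info["nas"]["files"].append(model_file)                   # S502
        vpc = info["vpc"]                                         # S503
        cluster = info["vpc_clusters"][vpc]                       # S504
        cluster.setdefault("mounts", []).append(info["nas"])      # S505
        mounted[region] = cluster["name"]
    return mounted
```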

以下为单机测试的实验结果,本公开主要采用NAS挂载模型文件,可以使用13B参数的GPT模型(25GB大小)对OSS挂载模型文件和NAS挂载模型文件进行对比:The following are the experimental results of a single-machine test. This disclosure mainly uses NAS mount model files. The OSS mount model file and the NAS mount model file can be compared using a 13B parameter GPT model (25GB size):

1.使用dd命令进行测试,time dd if=/home/workspace/model/pytorch_model.bin of=/dev/null bs=100K count=100000000;1. Use the dd command to test: time dd if=/home/workspace/model/pytorch_model.bin of=/dev/null bs=100K count=100000000;

a.使用OSS挂载模型文件,25716611305字节(26GB,24GiB)已复制,115.811s,222MB/s;a. Use OSS to mount the model file. 25716611305 bytes (26GB, 24GiB) have been copied. 115.811s, 222MB/s;

b.使用NAS挂载模型文件,25716611305字节(26GB,24GiB)已复制,4.23957s,6.1GB/s;b. Use NAS to mount the model file. 25716611305 bytes (26GB, 24GiB) have been copied. 4.23957s, 6.1GB/s;

2.大模型推理服务启动速度测试2. Large model inference service startup speed test

a.使用OSS挂载模型文件,服务启动总耗时:5m28.259s;a. Use OSS to mount the model file. The total time taken to start the service is: 5m28.259s;

b.使用NAS挂载模型文件,服务启动总耗时:4m2.845s;b. Use NAS to mount the model file. The total time taken to start the service is: 4m2.845s;

通过对比可知,NAS存储启动耗时优于OSS。By comparison, it can be seen that the startup time of NAS storage is shorter than that of OSS.
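The reported figures are internally consistent, as the following arithmetic check shows (the byte count and timings are taken directly from the test results above):

```python
SIZE_BYTES = 25_716_611_305            # bytes copied by dd (26 GB / 24 GiB)

# Sequential read throughput
oss_read_s, nas_read_s = 115.811, 4.23957
oss_mb_s = SIZE_BYTES / oss_read_s / 1e6    # ~222 MB/s, matching the OSS result
nas_gb_s = SIZE_BYTES / nas_read_s / 1e9    # ~6.1 GB/s, matching the NAS result
read_speedup = oss_read_s / nas_read_s      # NAS reads ~27x faster

# Inference-service startup time
oss_start_s = 5 * 60 + 28.259               # 5m28.259s -> 328.259 s
nas_start_s = 4 * 60 + 2.845                # 4m2.845s  -> 242.845 s
startup_saved_s = oss_start_s - nas_start_s # ~85.4 s (~26% faster startup)

print(f"OSS: {oss_mb_s:.0f} MB/s, NAS: {nas_gb_s:.1f} GB/s, "
      f"read speedup {read_speedup:.1f}x, startup saved {startup_saved_s:.1f} s")
```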

本公开旨在提供一种用于提升大模型服务多地域、大规模弹性能力的系统。该系统基于NAS存储,实现模型文件在多地域的分发,并在部署推理服务时自动挂载所在地域的NAS以加载模型文件。由于NAS和推理服务在同一地域VPC内,可以实现高速互联,从而缩短模型加载时长,加快服务启动速度,提高弹性能力。The present disclosure aims to provide a system for improving the multi-regional and large-scale elasticity of large model services. The system is based on NAS storage to achieve the distribution of model files in multiple regions, and automatically mounts the NAS in the region to load the model files when deploying the inference service. Since the NAS and the inference service are in the same region VPC, high-speed interconnection can be achieved, thereby shortening the model loading time, speeding up the service startup speed, and improving elasticity.

需要说明的是,本公开所涉及的用户信息(包括但不限于用户设备信息、用户个人信息等)和数据(包括但不限于用于分析的数据、存储的数据、展示的数据等),均为经用户授权或者经过各方充分授权的信息和数据,并且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准,并提供有相应的操作入口,供用户选择授权或者拒绝。It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in this disclosure are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of relevant data must comply with the relevant laws, regulations and standards of relevant countries and regions, and provide corresponding operation entrances for users to choose to authorize or refuse.

需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本公开并不受所描述的动作顺序的限制,因为依据本公开,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本公开所必须的。It should be noted that, for the aforementioned method embodiments, for the sake of simplicity, they are all described as a series of action combinations, but those skilled in the art should be aware that the present disclosure is not limited by the order of the actions described, because according to the present disclosure, certain steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also be aware that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到根据上述实施例的方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,或者网络设备等)执行本公开各个实施例的方法。Through the description of the above implementation methods, those skilled in the art can clearly understand that the method according to the above embodiment can be implemented by means of software plus a necessary general hardware platform, and of course, it can also be implemented by hardware. Based on such an understanding, the technical solution of the present disclosure, or the part that contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a disk, or an optical disk), and includes a number of instructions for a terminal device (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods of each embodiment of the present disclosure.

实施例2Example 2

根据本公开实施例,还提供了一种模型部署方法,需要说明的是,在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行,并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。According to an embodiment of the present disclosure, a model deployment method is also provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system such as a set of computer executable instructions, and although the logical order is shown in the flowchart, in some cases, the steps shown or described can be executed in an order different from that shown here.

图6是根据本公开实施例2的一种模型部署方法的流程图,如图6所示,该方法包括如下步骤:FIG. 6 is a flow chart of a model deployment method according to Embodiment 2 of the present disclosure. As shown in FIG. 6, the method includes the following steps:

步骤S602,响应于接收到待部署模型的模型文件,将模型文件存储至中心仓库;Step S602, in response to receiving the model file of the model to be deployed, storing the model file in the central warehouse;

步骤S604,响应于接收到模型分发请求,将模型文件分发至多个存储设备;Step S604, in response to receiving the model distribution request, distributing the model file to multiple storage devices;

其中,不同存储设备在地理位置上部署在不同地域内,不同地域内还部署有不同服务器集群。Among them, different storage devices are deployed in different regions geographically, and different server clusters are also deployed in different regions.

上述的模型分发请求可以为在异地部署待部署模型时,需要预先将模型分发到存储设备时的请求,可选的,可以通过对交互界面进行触控的方式生成上述的模型分发请求,此处对生成上述模型分发请求的具体方式不做限定。The above-mentioned model distribution request may be a request for distributing the model to a storage device in advance when the model to be deployed is deployed remotely. Optionally, the above-mentioned model distribution request may be generated by touching the interactive interface. The specific method for generating the above-mentioned model distribution request is not limited here.

步骤S606,响应于接收到服务器部署请求,基于服务器部署请求,将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群。Step S606, in response to receiving the server deployment request, based on the server deployment request, the storage device is mounted to a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.

上述的服务器部署请求可以为用户需要对待部署模型进行部署时生成的请求,可选的,可以通过对交互界面进行触控的方式生成上述的服务器部署请求,此处对生成上述服务器部署请求的具体方式不做限定。The above-mentioned server deployment request may be a request generated when a user needs to deploy a model to be deployed. Optionally, the above-mentioned server deployment request may be generated by touching an interactive interface. The specific method of generating the above-mentioned server deployment request is not limited here.

通过上述步骤,响应于接收到待部署模型的模型文件,将模型文件存储至中心仓库;响应于接收到模型分发请求,将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内,不同地域内还部署有不同服务器集群;响应于接收到服务器部署请求,基于服务器部署请求,将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群,实现了提高模型在不同地域上的处理效率;容易注意到的是,可以将模型文件分发至部署在不同地域内的存储设备上,以便利用同一地域内存储设备和服务器集群之间的网络实现存储设备和服务器集群之间的高速互联,消除跨地域文件的传输,从而可以缩短模型文件的加载时间,从而提高模型在不同地域上的处理效率,进而解决了相关技术中模型跨地域部署的效率较低的技术问题。Through the above steps, in response to receiving the model file of the model to be deployed, the model file is stored in the central warehouse; in response to receiving the model distribution request, the model file is distributed to multiple storage devices, wherein different storage devices are geographically deployed in different regions, and different server clusters are also deployed in different regions; in response to receiving the server deployment request, based on the server deployment request, the storage device is mounted to the target server cluster deployed in the same region as the storage device in the multiple server clusters, so that the model to be deployed is deployed to multiple server clusters, thereby improving the processing efficiency of the model in different regions; it is easy to notice that the model file can be distributed to the storage devices deployed in different regions, so as to utilize the network between the storage device and the server cluster in the same region to realize the high-speed interconnection between the storage device and the server cluster, eliminate the transmission of cross-regional files, thereby shortening the loading time of the model file, thereby improving the processing efficiency of the model in different regions, and further solving the technical problem of low efficiency of cross-regional deployment of models in related technologies.

需要说明的是,本公开上述实施例中涉及到的优选实施方案与实施例1提供的方案以及应用场景、实施过程相同,但不仅限于实施例1所提供的方案。It should be noted that the preferred implementation scheme involved in the above embodiments of the present disclosure is the same as the scheme provided in Example 1, as well as the application scenario and implementation process, but is not limited to the scheme provided in Example 1.

实施例3Example 3

根据本公开实施例,还提供了一种模型部署方法,需要说明的是,在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行,并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。According to an embodiment of the present disclosure, a model deployment method is also provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system such as a set of computer executable instructions, and although the logical order is shown in the flowchart, in some cases, the steps shown or described can be executed in an order different from that shown here.

图7是根据本公开实施例3的一种模型部署方法的流程图,如图7所示,该方法包括如下步骤:FIG. 7 is a flow chart of a model deployment method according to Embodiment 3 of the present disclosure. As shown in FIG. 7 , the method includes the following steps:

步骤S702,通过调用第一接口获取待部署模型的模型文件。 Step S702: Acquire the model file of the model to be deployed by calling the first interface.

其中,第一接口包括第一参数,第一参数的参数值为模型文件。The first interface includes a first parameter, and a parameter value of the first parameter is a model file.

上述步骤中的第一接口可以是云服务器与客户端之间进行数据交互的接口,客户端可以将待部署模型的模型文件传入接口函数,作为接口函数的第一参数,实现将待部署模型的模型文件上传到云服务器的目的。The first interface in the above steps can be an interface for data interaction between the cloud server and the client. The client can pass the model file of the model to be deployed into the interface function as the first parameter of the interface function to achieve the purpose of uploading the model file of the model to be deployed to the cloud server.

步骤S704,将模型文件分发至多个存储设备。Step S704: distribute the model file to multiple storage devices.

其中,不同存储设备在地理位置上部署在不同地域内。Among them, different storage devices are deployed in different regions geographically.

步骤S706,在部署待部署模型的推理服务时,将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群,得到待部署模型的部署结果。Step S706, when deploying the inference service of the model to be deployed, the storage device is mounted to the target server cluster in the multiple server clusters and deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters to obtain the deployment result of the model to be deployed.

步骤S708,通过调用第二接口输出部署结果。Step S708: output the deployment result by calling the second interface.

其中,第二接口包括第二参数,第二参数的参数值为部署结果。The second interface includes a second parameter, and a parameter value of the second parameter is a deployment result.

上述的第二接口可以是云服务器和客户端之间进行数据交互的接口,云服务器可以将部署结果传入接口函数,作为接口函数的第二参数,实现将部署结果下发至客户端的目的。The above-mentioned second interface can be an interface for data interaction between the cloud server and the client. The cloud server can pass the deployment result to the interface function as the second parameter of the interface function to achieve the purpose of sending the deployment result to the client.
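A minimal sketch of the two interfaces follows. The function names, the job-id scheme, and the in-memory server store are all assumptions made for illustration; the disclosure only requires that the model file be the first interface's parameter and the deployment result the second interface's parameter.

```python
SERVER_STATE: dict = {}   # stand-in for cloud-server-side storage

def first_interface(model_file: bytes) -> str:
    """Client -> cloud server: the model file is passed as the first parameter,
    uploading it to the cloud server; a job id is returned for tracking."""
    job_id = f"job-{len(SERVER_STATE)}"
    SERVER_STATE[job_id] = {"model": model_file, "result": None}
    return job_id

def second_interface(job_id: str, deployment_result: str) -> str:
    """Cloud server -> client: the deployment result is passed as the second
    parameter and delivered back to the client."""
    SERVER_STATE[job_id]["result"] = deployment_result
    return deployment_result
```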

通过上述步骤,通过调用第一接口获取待部署模型的模型文件,其中,第一接口包括第一参数,第一参数的参数值为模型文件;将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内;在部署待部署模型的推理服务时,将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群,得到待部署模型的部署结果;通过调用第二接口输出部署结果,其中,第二接口包括第二参数,第二参数的参数值为部署结果,实现了提高模型在不同地域上的处理效率;容易注意到的是,可以将模型文件分发至部署在不同地域内的存储设备上,以便利用同一地域内存储设备和服务器集群之间的网络实现存储设备和服务器集群之间的高速互联,消除跨地域文件的传输,从而可以缩短模型文件的加载时间,从而提高模型在不同地域上的处理效率,进而解决了相关技术中模型跨地域部署的效率较低的技术问题。Through the above steps, the model file of the model to be deployed is obtained by calling the first interface, wherein the first interface includes a first parameter, and the parameter value of the first parameter is the model file; the model file is distributed to multiple storage devices, wherein different storage devices are geographically deployed in different regions; when deploying the inference service of the model to be deployed, the storage device is mounted to a target server cluster in the same region as the storage device in multiple server clusters, so that the model to be deployed is deployed to multiple server clusters, and the deployment result of the model to be deployed is obtained; the deployment result is output by calling the second interface, wherein the second interface includes a second parameter, and the parameter value of the second parameter is the deployment result, thereby improving the processing efficiency of the model in different regions; it is easy to notice that the model file can be distributed to the storage devices deployed in different regions, so as to utilize the network between the storage device and the server cluster in the same region to realize high-speed interconnection between the storage device and the server cluster, eliminate the transmission of cross-regional files, thereby shortening the loading time of the model file, thereby improving the processing efficiency of the model in different regions, and thus solving the technical problem of low efficiency of cross-regional deployment of models in related technologies.

需要说明的是,本公开上述实施例中涉及到的优选实施方案与实施例1提供的方案以及应用场景、实施过程相同,但不仅限于实施例1所提供的方案。It should be noted that the preferred implementation scheme involved in the above embodiments of the present disclosure is the same as the scheme provided in Example 1, as well as the application scenario and implementation process, but is not limited to the scheme provided in Example 1.

实施例4Example 4

根据本公开实施例,还提供了一种用于实施上述模型部署方法的模型部署装置,图8是根据本公开实施例4的一种模型部署装置的示意图,如图8所示,该装置800包括:获取模块802、分发模块804、挂载模块806。According to an embodiment of the present disclosure, a model deployment device for implementing the above-mentioned model deployment method is also provided. Figure 8 is a schematic diagram of a model deployment device according to embodiment 4 of the present disclosure. As shown in Figure 8, the device 800 includes: an acquisition module 802, a distribution module 804, and a mounting module 806.

其中,获取模块,设置为获取待部署模型的模型文件;分发模块,设置为将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内;挂载模块,设置为将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群。The acquisition module is configured to acquire the model file of the model to be deployed; the distribution module is configured to distribute the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; the mounting module is configured to mount the storage device to a target server cluster, among the multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters.

此处需要说明的是,上述获取模块802、分发模块804、挂载模块806对应于实施例1中的步骤S202至步骤S206,三个模块与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例1所公开的内容。需要说明的是,上述模块或单元可以是存储在存储器中并由一个或多个处理器处理的硬件组件或软件组件,上述模块也可以作为装置的一部分可以运行在实施例1提供的服务器10中。It should be noted that the acquisition module 802, the distribution module 804, and the mounting module 806 correspond to steps S202 to S206 in Example 1, and the three modules and the corresponding steps implement the same instances and application scenarios, but are not limited to the contents disclosed in the above-mentioned Example 1. It should be noted that the above-mentioned modules or units may be hardware components or software components stored in a memory and processed by one or more processors, and the above-mentioned modules may also be part of the device and may run in the server 10 provided in Example 1.

本公开上述实施例中,存储设备包括:网络附接存储,其中,分发模块还用于通过公共网络将模型文件发送至多个网络附接存储,并存储在多个网络附接存储中。In the above embodiments of the present disclosure, the storage device includes: a network attached storage, wherein the distribution module is further used to send the model file to multiple network attached storages through a public network and store the model file in the multiple network attached storages.

本公开上述实施例中,挂载模块还用于基于存储设备部署的目标地域,确定存储设备对应的目标虚拟专有网络,获取目标虚拟专有网络对应的服务器集群,得到目标服务器集群,其中,不同服务器集群对应不同虚拟专有网络,在部署待部署模型的推理服务时,将存储设备挂载至目标服务器集群中的服务器上。In the above-mentioned embodiment of the present disclosure, the mounting module is also used to determine the target virtual private network corresponding to the storage device based on the target region where the storage device is deployed, obtain the server cluster corresponding to the target virtual private network, and obtain the target server cluster, wherein different server clusters correspond to different virtual private networks, and when deploying the inference service of the model to be deployed, the storage device is mounted to the server in the target server cluster.

本公开上述实施例中,挂载模块还用于基于存储设备部署的目标地域,确定存储设备对应的目标虚拟专有网络,获取目标虚拟专有网络对应的服务器集群,得到目标服务器集群,其中,不同服务器集群对应不同虚拟专有网络。In the above-mentioned embodiment of the present disclosure, the mounting module is also used to determine the target virtual private network corresponding to the storage device based on the target region where the storage device is deployed, obtain the server cluster corresponding to the target virtual private network, and obtain the target server cluster, wherein different server clusters correspond to different virtual private networks.
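A minimal sketch of the region-to-VPC-to-cluster resolution described above (the mapping tables and all identifiers below are assumptions for illustration only, not fixed by the disclosure):

```python
# Hypothetical lookup: storage device's region -> its VPC -> target cluster.
# Different server clusters correspond to different virtual private networks.
VPC_BY_REGION = {
    "cn-hangzhou": "vpc-hz-001",
    "eu-west-1": "vpc-eu-001",
}
CLUSTER_BY_VPC = {
    "vpc-hz-001": "cluster-hangzhou",
    "vpc-eu-001": "cluster-eu",
}

def target_cluster_for(storage_region):
    """Resolve the target server cluster for a storage device.

    The region where the storage device is deployed determines its VPC,
    and the VPC in turn determines the target server cluster on which the
    storage device is mounted when the inference service is deployed.
    """
    vpc = VPC_BY_REGION[storage_region]
    return CLUSTER_BY_VPC[vpc]
```

Keeping the two mappings separate mirrors the two steps in the text: first determine the target VPC, then obtain the server cluster corresponding to that VPC.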

本公开上述实施例中,该装置还包括:构建模块、提供模块。In the above embodiments of the present disclosure, the device further includes: a building module and a providing module.

其中,构建模块,设置为构建弹性调度集群;提供模块,设置为基于弹性调度集群为目标服务器集群提供弹性伸缩能力。Among them, the construction module is configured to construct an elastic scheduling cluster; the provision module is configured to provide elastic scaling capabilities for the target server cluster based on the elastic scheduling cluster.
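One possible reading of how an elastic scheduling cluster could provide scaling capability for a target server cluster is a rule that sizes the number of inference replicas to the load. The rule and the numbers below are illustrative assumptions, not taken from the disclosure:

```python
import math

def desired_replicas(load_qps, qps_per_replica, min_replicas=1, max_replicas=10):
    """Toy elastic-scaling rule: provision enough inference replicas to cover
    the current load, clamped to the [min_replicas, max_replicas] range."""
    needed = math.ceil(load_qps / qps_per_replica)
    return max(min_replicas, min(max_replicas, needed))
```

For example, with an assumed capacity of 50 QPS per replica, a load of 120 QPS would yield 3 replicas, while an idle service stays at the configured minimum.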

本公开上述实施例中,分发模块还用于将预设资源分发至多个服务器集群;分发模块还用于在预设资源分发完毕的情况下,将模型文件分发至多个存储设备。In the above embodiments of the present disclosure, the distribution module is further used to distribute the preset resources to multiple server clusters; the distribution module is further used to distribute the model file to multiple storage devices when the preset resources are distributed.
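The ordering constraint above — the model file is distributed to the storage devices only after the preset resources have finished distributing to the server clusters — can be sketched as follows (function and variable names are hypothetical):

```python
def distribute_all(preset_resources, model_file, clusters, storage_devices):
    """Step 1: distribute preset resources to every server cluster.
    Step 2: only when step 1 has completed for all clusters, distribute
    the model file to the storage devices."""
    delivered = {}
    for cluster in clusters:
        delivered[cluster] = preset_resources       # preset resources first
    stored = {}
    if len(delivered) == len(clusters):             # gate on completion
        for dev in storage_devices:
            stored[dev] = model_file
    return delivered, stored

# Example call with assumed names.
delivered, stored = distribute_all({"runtime": "v1"}, b"weights",
                                   ["cluster-a", "cluster-b"], ["nas-1"])
```

The completion check is the essential point: model distribution is deferred until the preset resources are in place on all clusters.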

本公开上述实施例中,获取模块还用于从中心仓库中获取待部署模型的模型文件,其中,模型文件预先上传至中心仓库。In the above embodiment of the present disclosure, the acquisition module is further used to acquire the model file of the model to be deployed from the central warehouse, wherein the model file is uploaded to the central warehouse in advance.

本公开上述实施例中,待部署模型为大模型。In the above embodiments of the present disclosure, the model to be deployed is a large model.

需要说明的是,本公开上述实施例中涉及到的优选实施方案与实施例1提供的方案以及应用场景、实施过程相同,但不仅限于实施例1所提供的方案。It should be noted that the preferred implementation scheme involved in the above embodiments of the present disclosure is the same as the scheme provided in Example 1, as well as the application scenario and implementation process, but is not limited to the scheme provided in Example 1.

实施例5Example 5

根据本公开实施例,还提供了一种用于实施上述模型部署方法的模型部署装置,图9是根据本公开实施例5的一种模型部署装置的示意图,如图9所示,该装置900包括:存储模块902、分发模块904、挂载模块906。According to an embodiment of the present disclosure, a model deployment device for implementing the above-mentioned model deployment method is also provided. Figure 9 is a schematic diagram of a model deployment device according to embodiment 5 of the present disclosure. As shown in Figure 9, the device 900 includes: a storage module 902, a distribution module 904, and a mounting module 906.

其中,存储模块,设置为响应于接收到待部署模型的模型文件,将模型文件存储至中心仓库;分发模块,设置为响应于接收到模型分发请求,将中心仓库存储的模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内,不同地域内还部署有不同服务器集群;挂载模块,设置为响应于接收到服务器部署请求,基于服务器部署请求,将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群。The storage module is configured to, in response to receiving the model file of the model to be deployed, store the model file in the central warehouse; the distribution module is configured to, in response to receiving a model distribution request, distribute the model file stored in the central warehouse to multiple storage devices, wherein different storage devices are geographically deployed in different regions, and different server clusters are also deployed in the different regions; and the mounting module is configured to, in response to receiving a server deployment request, mount the storage device, based on the server deployment request, to a target server cluster, among the multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters.

此处需要说明的是,上述存储模块902、分发模块904、挂载模块906对应于实施例2中的步骤S602至步骤S606,三个模块与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例2所公开的内容。需要说明的是,上述模块或单元可以是存储在存储器中并由一个或多个处理器处理的硬件组件或软件组件,上述模块也可以作为装置的一部分运行在实施例1提供的服务器10中。It should be noted that the storage module 902, the distribution module 904, and the mounting module 906 correspond to steps S602 to S606 in Embodiment 2; the three modules implement the same instances and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 2. It should also be noted that the above modules or units may be hardware components or software components stored in a memory and processed by one or more processors, and the above modules may also, as part of the device, run in the server 10 provided in Embodiment 1.

需要说明的是,本公开上述实施例中涉及到的优选实施方案与实施例1提供的方案以及应用场景、实施过程相同,但不仅限于实施例1所提供的方案。It should be noted that the preferred implementation scheme involved in the above embodiments of the present disclosure is the same as the scheme provided in Example 1, as well as the application scenario and implementation process, but is not limited to the scheme provided in Example 1.

实施例6Example 6

根据本公开实施例,还提供了一种用于实施上述模型部署方法的模型部署装置,图10是根据本公开实施例6的一种模型部署装置的示意图,如图10所示,该装置1000包括:获取模块1002、分发模块1004、挂载模块1006、输出模块1008。According to an embodiment of the present disclosure, a model deployment device for implementing the above-mentioned model deployment method is also provided. Figure 10 is a schematic diagram of a model deployment device according to embodiment 6 of the present disclosure. As shown in Figure 10, the device 1000 includes: an acquisition module 1002, a distribution module 1004, a mounting module 1006, and an output module 1008.

其中,获取模块,设置为通过调用第一接口获取待部署模型的模型文件,其中,第一接口包括第一参数,第一参数的参数值为模型文件;分发模块,设置为将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内;挂载模块,设置为在部署待部署模型的推理服务时,将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群,得到待部署模型的部署结果;输出模块,设置为通过调用第二接口输出部署结果,其中,第二接口包括第二参数,第二参数的参数值为部署结果。Among them, the acquisition module is configured to obtain the model file of the model to be deployed by calling the first interface, wherein the first interface includes a first parameter, and the parameter value of the first parameter is the model file; the distribution module is configured to distribute the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; the mounting module is configured to mount the storage device to a target server cluster deployed in the same region as the storage device in multiple server clusters when deploying the inference service of the model to be deployed, so that the model to be deployed is deployed to multiple server clusters to obtain the deployment result of the model to be deployed; the output module is configured to output the deployment result by calling the second interface, wherein the second interface includes a second parameter, and the parameter value of the second parameter is the deployment result.
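The first-interface/second-interface pattern above — the first parameter carries the model file in, the second parameter carries the deployment result out — can be sketched as plain functions. This is a hypothetical rendering; the disclosure does not fix any concrete signature:

```python
def first_interface(first_parameter):
    """First interface: the value of the first parameter is the model file."""
    return {"model_file": first_parameter}

def deploy(request):
    """Placeholder for the middle steps (distributing the model file to the
    regional storage devices and mounting them on same-region target
    clusters); returns a deployment result."""
    return {"status": "deployed", "model_file": request["model_file"]}

def second_interface(second_parameter):
    """Second interface: the value of the second parameter is the
    deployment result, which is output to the caller."""
    return second_parameter

# Example round trip with an assumed model payload.
result = second_interface(deploy(first_interface(b"weights")))
```

The two interfaces thus bracket the deployment pipeline: input of the model file on one side, output of the deployment result on the other.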

此处需要说明的是,上述获取模块1002、分发模块1004、挂载模块1006、输出模块1008对应于实施例3中的步骤S702至步骤S708,四个模块与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例3所公开的内容。需要说明的是,上述模块或单元可以是存储在存储器中并由一个或多个处理器处理的硬件组件或软件组件,上述模块也可以作为装置的一部分运行在实施例1提供的服务器10中。It should be noted that the acquisition module 1002, the distribution module 1004, the mounting module 1006, and the output module 1008 correspond to steps S702 to S708 in Embodiment 3; the four modules implement the same instances and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 3. It should also be noted that the above modules or units may be hardware components or software components stored in a memory and processed by one or more processors, and the above modules may also, as part of the device, run in the server 10 provided in Embodiment 1.

需要说明的是,本公开上述实施例中涉及到的优选实施方案与实施例1提供的方案以及应用场景、实施过程相同,但不仅限于实施例1所提供的方案。It should be noted that the preferred implementation scheme involved in the above embodiments of the present disclosure is the same as the scheme provided in Example 1, as well as the application scenario and implementation process, but is not limited to the scheme provided in Example 1.

实施例7Example 7

根据本公开实施例,还提供了一种用于实施上述模型部署方法的模型部署系统,图11是根据本公开实施例7的一种模型部署系统的示意图,如图11所示,该系统1100包括:According to an embodiment of the present disclosure, a model deployment system for implementing the above-mentioned model deployment method is also provided. FIG. 11 is a schematic diagram of a model deployment system according to Embodiment 7 of the present disclosure. As shown in FIG. 11 , the system 1100 includes:

多个服务器集群1102,不同服务器集群在地理位置上部署在不同地域内;A plurality of server clusters 1102, where different server clusters are geographically deployed in different regions;

多个存储设备1104,不同存储设备在地理位置上部署在不同地域内;A plurality of storage devices 1104, where different storage devices are geographically deployed in different regions;

控制设备1106,与存储设备和服务器集群连接,用于将待部署模型的模型文件分发至存储设备,并将存储设备挂载至与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群。The control device 1106 is connected to the storage devices and the server clusters, and is configured to distribute the model file of the model to be deployed to the storage devices and to mount each storage device to the target server cluster deployed in the same region as that storage device, so that the model to be deployed is deployed to the multiple server clusters.

本公开上述实施例中,存储设备1104包括:网络附接存储,通过公网与控制设备连接。In the above embodiments of the present disclosure, the storage device 1104 includes: a network attached storage, which is connected to the control device through a public network.

本公开上述实施例中,目标服务器集群通过目标虚拟专有网络与存储设备连接。In the above embodiments of the present disclosure, the target server cluster is connected to the storage device via a target virtual private network.

本公开上述实施例中,该系统还包括:弹性调度集群,用于为目标服务器集群提供弹性伸缩能力。In the above embodiment of the present disclosure, the system further includes: an elastic scheduling cluster, which is used to provide elastic scaling capabilities for the target server cluster.

需要说明的是,本公开上述实施例中涉及到的优选实施方案与实施例1提供的方案以及应用场景、实施过程相同,但不仅限于实施例1所提供的方案。It should be noted that the preferred implementation scheme involved in the above embodiments of the present disclosure is the same as the scheme provided in Example 1, as well as the application scenario and implementation process, but is not limited to the scheme provided in Example 1.

实施例8Example 8

本公开的实施例可以提供一种电子设备,该电子设备可以是电子设备群中的任意一个电子设备。可选地,在本实施例中,上述电子设备也可以替换为移动终端等终端设备。The embodiment of the present disclosure may provide an electronic device, which may be any electronic device in a group of electronic devices. Optionally, in this embodiment, the electronic device may also be replaced by a terminal device such as a mobile terminal.

可选地,在本实施例中,上述电子设备可以位于计算机网络的多个网络设备中的至少一个网络设备。Optionally, in this embodiment, the electronic device may be located in at least one network device among a plurality of network devices of a computer network.

在本实施例中,上述电子设备可以执行模型部署方法中以下步骤的程序代码:获取待部署模型的模型文件,以及多个服务器集群,其中,不同服务器集群在地理位置上部署在不同地域内;将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内;将存储设备挂载至与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群。In this embodiment, the above-mentioned electronic device can execute the program code of the following steps in the model deployment method: obtain the model file of the model to be deployed, and multiple server clusters, wherein different server clusters are geographically deployed in different regions; distribute the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; mount the storage device to the target server cluster deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.

可选地,图12是根据本公开实施例的一种计算机终端的结构框图。如图12所示,该计算机终端A可以包括:一个或多个(图中仅示出一个)处理器102、存储器104、存储控制器、以及外设接口,其中,外设接口与射频模块、音频模块和显示器连接。Optionally, Figure 12 is a block diagram of a computer terminal according to an embodiment of the present disclosure. As shown in Figure 12, the computer terminal A may include: one or more (only one is shown in the figure) processors 102, a memory 104, a storage controller, and a peripheral interface, wherein the peripheral interface is connected to a radio frequency module, an audio module, and a display.

其中,存储器可用于存储软件程序以及模块,如本公开实施例中的模型部署方法和装置对应的程序指令/模块,处理器通过运行存储在存储器内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述的模型部署方法。存储器可包括高速随机存储器,还可以包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器可进一步包括相对于处理器远程设置的存储器,这些远程存储器可以通过网络连接至终端A。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。Among them, the memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the model deployment method and device in the embodiment of the present disclosure. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, the above-mentioned model deployment method is realized. The memory may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory may further include a memory remotely arranged relative to the processor, and these remote memories can be connected to the terminal A via a network. Examples of the above-mentioned network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

处理器可以通过传输装置调用存储器存储的信息及应用程序,以执行下述步骤:获取待部署模型的模型文件;将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内;将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群。The processor can call the information and application programs stored in the memory through the transmission device to perform the following steps: obtain the model file of the model to be deployed; distribute the model file to multiple storage devices, where different storage devices are geographically deployed in different regions; mount the storage device to a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.

可选的,上述处理器还可以执行如下步骤的程序代码:通过公共网络将模型文件发送至多个网络附接存储,并存储在多个网络附接存储中。Optionally, the processor may also execute the program code of the following steps: sending the model file to multiple network attached storages via a public network, and storing the model file in the multiple network attached storages.

可选的,上述处理器还可以执行如下步骤的程序代码:基于存储设备部署的目标地域,确定部署在目标地域的目标服务器集群;将存储设备挂载至目标服务器集群中的服务器上。Optionally, the processor may also execute program codes of the following steps: based on the target region where the storage device is deployed, determining a target server cluster deployed in the target region; and mounting the storage device to a server in the target server cluster.

可选的,上述处理器还可以执行如下步骤的程序代码:基于存储设备部署的目标地域,确定存储设备对应的目标虚拟专有网络;获取目标虚拟专有网络对应的服务器集群,得到目标服务器集群,其中,不同服务器集群对应不同虚拟专有网络;在部署待部署模型的推理服务时,将存储设备挂载至目标服务器集群中的服务器上。Optionally, the processor may also execute the program code of the following steps: based on the target region where the storage device is deployed, determine the target virtual private network corresponding to the storage device; obtain the server cluster corresponding to the target virtual private network to obtain the target server cluster, wherein different server clusters correspond to different virtual private networks; and when deploying the inference service of the model to be deployed, mount the storage device to the server in the target server cluster.

可选的,上述处理器还可以执行如下步骤的程序代码:构建弹性调度集群;基于弹性调度集群为目标服务器集群提供弹性伸缩能力。Optionally, the processor may also execute program codes of the following steps: constructing an elastic scheduling cluster; and providing elastic scaling capabilities for the target server cluster based on the elastic scheduling cluster.

可选的,上述处理器还可以执行如下步骤的程序代码:将预设资源分发至多个服务器集群;在预设资源分发完毕的情况下,将模型文件分发至多个存储设备。Optionally, the processor may also execute program codes of the following steps: distributing preset resources to multiple server clusters; and distributing the model file to multiple storage devices when the preset resources have been distributed.

可选的,上述处理器还可以执行如下步骤的程序代码:从中心仓库中获取待部署模型的模型文件,其中,模型文件预先上传至中心仓库。Optionally, the processor may also execute the program code of the following steps: obtaining a model file of the model to be deployed from the central warehouse, wherein the model file is uploaded to the central warehouse in advance.

处理器可以通过传输装置调用存储器存储的信息及应用程序,以执行下述步骤:响应于接收到待部署模型的模型文件,将模型文件存储至中心仓库;响应于接收到模型分发请求,将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内,不同地域内还部署有不同服务器集群;响应于接收到服务器部署请求,基于服务器部署请求,将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群。The processor can call the information and application programs stored in the memory through the transmission device to perform the following steps: in response to receiving the model file of the model to be deployed, the model file is stored in the central warehouse; in response to receiving the model distribution request, the model file is distributed to multiple storage devices, wherein different storage devices are geographically deployed in different regions, and different server clusters are also deployed in different regions; in response to receiving the server deployment request, based on the server deployment request, the storage device is mounted on the target server cluster deployed in the same region as the storage device in the multiple server clusters, so that the model to be deployed is deployed to multiple server clusters.

处理器可以通过传输装置调用存储器存储的信息及应用程序,以执行下述步骤:通过调用第一接口获取待部署模型的模型文件,其中,第一接口包括第一参数,第一参数的参数值为模型文件;将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内;在部署待部署模型的推理服务时,将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群,得到待部署模型的部署结果;通过调用第二接口输出部署结果,其中,第二接口包括第二参数,第二参数的参数值为部署结果。The processor can call the information and application programs stored in the memory through the transmission device to perform the following steps: obtain the model file of the model to be deployed by calling the first interface, wherein the first interface includes a first parameter, and the parameter value of the first parameter is the model file; distribute the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; when deploying the inference service of the model to be deployed, mount the storage device to a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters to obtain the deployment result of the model to be deployed; output the deployment result by calling the second interface, wherein the second interface includes a second parameter, and the parameter value of the second parameter is the deployment result.

采用本公开实施例,提供了一种模型部署方法,包括:获取待部署模型的模型文件;将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内;将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群,实现了提高模型在不同地域上的处理效率;容易注意到的是,可以将模型文件分发至部署在不同地域内的存储设备上,以便利用同一地域内存储设备和服务器集群之间的网络实现存储设备和服务器集群之间的高速互联,消除跨地域文件的传输,从而可以缩短模型文件的加载时间,从而提高模型在不同地域上的处理效率,进而解决了相关技术中模型跨地域部署的效率较低的技术问题。The embodiment of the present disclosure provides a model deployment method, including: obtaining a model file of a model to be deployed; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; and mounting the storage device to a target server cluster, among multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters, thereby improving the processing efficiency of the model in different regions. It is easy to notice that the model file can be distributed to storage devices deployed in different regions, so that the network between a storage device and a server cluster within the same region provides high-speed interconnection between them and cross-region file transmission is eliminated; this shortens the loading time of the model file, improves the processing efficiency of the model in different regions, and thus solves the technical problem in the related art that cross-region model deployment is inefficient.

本领域普通技术人员可以理解,图12所示的结构仅为示意,计算机终端也可以是智能手机(如Android手机、iOS手机等)、平板电脑、掌上电脑以及移动互联网设备(Mobile Internet Devices,MID)、PAD等终端设备。图12并不对上述电子装置的结构造成限定。例如,计算机终端A还可包括比图12中所示更多或者更少的组件(如网络接口、显示装置等),或者具有与图12所示不同的配置。It will be understood by those of ordinary skill in the art that the structure shown in FIG. 12 is only illustrative; the computer terminal may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), or a PAD. FIG. 12 does not limit the structure of the above electronic device. For example, the computer terminal A may also include more or fewer components (such as a network interface, a display device, etc.) than shown in FIG. 12, or may have a configuration different from that shown in FIG. 12.

本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令终端设备相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,存储介质可以包括:闪存盘、只读存储器(Read-Only Memory,ROM)、随机存取器(Random Access Memory,RAM)、磁盘或光盘等。A person of ordinary skill in the art may understand that all or part of the steps in the various methods of the above embodiments may be completed by instructing the hardware related to the terminal device through a program, and the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, etc.

实施例9Example 9

本公开的实施例还提供了一种计算机可读存储介质。可选地,在本实施例中,上述存储介质可以用于保存上述实施例一所提供的模型部署方法所执行的程序代码。The embodiment of the present disclosure also provides a computer-readable storage medium. Optionally, in this embodiment, the storage medium can be used to store the program code executed by the model deployment method provided in the first embodiment.

可选地,在本实施例中,上述存储介质可以位于计算机网络中计算机终端群中的任意一个计算机终端中,或者位于移动终端群中的任意一个移动终端中。Optionally, in this embodiment, the above storage medium may be located in any computer terminal in a computer terminal group in a computer network, or in any mobile terminal in a mobile terminal group.

可选地,在本实施例中,存储介质被设置为存储用于执行以下步骤的程序代码:获取待部署模型的模型文件,其中,不同服务器集群在地理位置上部署在不同地域内;将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内;将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群。Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: obtaining a model file of a model to be deployed, wherein different server clusters are geographically deployed in different regions; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; mounting the storage device on a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.

可选地,上述存储介质还被设置为存储用于执行以下步骤的程序代码:通过公共网络将模型文件发送至多个网络附接存储,并存储在多个网络附接存储中。Optionally, the storage medium is further configured to store program codes for executing the following steps: sending the model file to a plurality of network attached storages via a public network, and storing the model file in the plurality of network attached storages.

可选地,上述存储介质还被设置为存储用于执行以下步骤的程序代码:基于存储设备部署的目标地域,确定存储设备对应的目标虚拟专有网络;获取目标虚拟专有网络对应的服务器集群,得到目标服务器集群,其中,不同服务器集群对应不同虚拟专有网络;在部署待部署模型的推理服务时,将存储设备挂载至目标服务器集群中的服务器上。Optionally, the storage medium is also configured to store program codes for executing the following steps: determining a target virtual private network corresponding to the storage device based on a target region for storage device deployment; obtaining a server cluster corresponding to the target virtual private network to obtain a target server cluster, wherein different server clusters correspond to different virtual private networks; and mounting the storage device on a server in the target server cluster when deploying an inference service for the model to be deployed.

可选地,上述存储介质还被设置为存储用于执行以下步骤的程序代码:构建弹性调度集群;基于弹性调度集群为目标服务器集群提供弹性伸缩能力。Optionally, the storage medium is further configured to store program codes for executing the following steps: constructing an elastic scheduling cluster; and providing elastic scaling capabilities for the target server cluster based on the elastic scheduling cluster.

可选地,上述存储介质还被设置为存储用于执行以下步骤的程序代码:将预设资源分发至多个服务器集群;在预设资源分发完毕的情况下,将模型文件分发至多个存储设备。Optionally, the storage medium is further configured to store program codes for executing the following steps: distributing preset resources to multiple server clusters; and distributing the model file to multiple storage devices when the preset resources have been distributed.

可选地,上述存储介质还被设置为存储用于执行以下步骤的程序代码:从中心仓库中获取待部署模型的模型文件,其中,模型文件预先上传至中心仓库。Optionally, the storage medium is further configured to store program code for executing the following steps: obtaining the model file of the model to be deployed from the central warehouse, wherein the model file is uploaded to the central warehouse in advance.

可选地,在本实施例中,存储介质被设置为存储用于执行以下步骤的程序代码:响应于接收到待部署模型的模型文件,将模型文件存储至中心仓库;响应于接收到模型分发请求,将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内,不同地域内还部署有不同服务器集群;响应于接收到服务器部署请求,基于服务器部署请求,将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群。Optionally, in this embodiment, the storage medium is configured to store program codes for executing the following steps: in response to receiving a model file of a model to be deployed, storing the model file in a central warehouse; in response to receiving a model distribution request, distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions, and different server clusters are also deployed in different regions; in response to receiving a server deployment request, based on the server deployment request, mounting the storage device to a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters.

可选地,在本实施例中,存储介质被设置为存储用于执行以下步骤的程序代码:通过调用第一接口获取待部署模型的模型文件,其中,第一接口包括第一参数,第一参数的参数值为模型文件;将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内;在部署待部署模型的推理服务时,将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群,得到待部署模型的部署结果;通过调用第二接口输出部署结果,其中,第二接口包括第二参数,第二参数的参数值为部署结果。Optionally, in this embodiment, the storage medium is configured to store program code for executing the following steps: obtaining a model file of the model to be deployed by calling a first interface, wherein the first interface includes a first parameter, and the parameter value of the first parameter is the model file; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; when deploying the inference service of the model to be deployed, mounting the storage device on a target server cluster in multiple server clusters that is deployed in the same region as the storage device, so that the model to be deployed is deployed to multiple server clusters to obtain a deployment result of the model to be deployed; outputting the deployment result by calling a second interface, wherein the second interface includes a second parameter, and the parameter value of the second parameter is the deployment result.

采用本公开实施例,提供了一种模型部署方法,包括:获取待部署模型的模型文件;将模型文件分发至多个存储设备,其中,不同存储设备在地理位置上部署在不同地域内;将存储设备挂载至多个服务器集群中与存储设备部署在同一个地域的目标服务器集群上,以使待部署模型部署至多个服务器集群,实现了提高模型在不同地域上的处理效率;容易注意到的是,可以将模型文件分发至部署在不同地域内的存储设备上,以便利用同一地域内存储设备和服务器集群之间的网络实现存储设备和服务器集群之间的高速互联,消除跨地域文件的传输,从而可以缩短模型文件的加载时间,从而提高模型在不同地域上的处理效率,进而解决了相关技术中模型跨地域部署的效率较低的技术问题。The embodiment of the present disclosure provides a model deployment method, including: obtaining a model file of a model to be deployed; distributing the model file to multiple storage devices, wherein different storage devices are geographically deployed in different regions; and mounting the storage device to a target server cluster, among multiple server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the multiple server clusters, thereby improving the processing efficiency of the model in different regions. It is easy to notice that the model file can be distributed to storage devices deployed in different regions, so that the network between a storage device and a server cluster within the same region provides high-speed interconnection between them and cross-region file transmission is eliminated; this shortens the loading time of the model file, improves the processing efficiency of the model in different regions, and thus solves the technical problem in the related art that cross-region model deployment is inefficient.

上述本公开实施例序号仅仅为了描述,不代表实施例的优劣。The serial numbers of the above-mentioned embodiments of the present disclosure are only for description and do not represent the advantages or disadvantages of the embodiments.

在本公开的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。In the above embodiments of the present disclosure, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, reference can be made to the relevant descriptions of other embodiments.

在本公开所提供的几个实施例中,应该理解到,所揭露的技术内容,可通过其它的方式实现。其中,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,单元或模块的间接耦合或通信连接,可以是电性或其它的形式。In the several embodiments provided in the present disclosure, it should be understood that the disclosed technical content can be implemented in other ways. Among them, the device embodiments described above are only schematic. For example, the division of the units is only a logical function division. There may be other division methods in actual implementation. For example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of units or modules, which can be electrical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.

所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, or the part that contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product. The computer software product is stored in a storage medium, including several instructions for a computer device (which can be a personal computer, server or network device, etc.) to perform all or part of the steps of the method described in each embodiment of the present disclosure. The aforementioned storage medium includes: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk and other media that can store program codes.

The above are only preferred embodiments of the present disclosure. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present disclosure, and these improvements and refinements should also fall within the scope of protection of the present disclosure.

Claims (20)

1. A model deployment method, comprising:
obtaining a model file of a model to be deployed;
distributing the model file to a plurality of storage devices, wherein different storage devices are geographically deployed in different regions; and
when deploying an inference service of the model to be deployed, mounting each storage device onto a target server cluster, among a plurality of server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the plurality of server clusters.

2. The method according to claim 1, wherein the storage devices comprise network-attached storage, and distributing the model file to the plurality of storage devices comprises:
sending the model file to a plurality of the network-attached storages through a public network, and storing the model file in the plurality of network-attached storages.
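The fan-out step in claims 1 and 2 above, pushing one model file to storage devices in several geographic regions, can be sketched as follows. This is a minimal illustrative sketch only: the `RegionalStorage` class, the `distribute_model` function, and the region names are assumptions invented for illustration, not APIs or components named in the disclosure.

```python
# Illustrative sketch of claims 1-2: one model file is distributed to
# storage devices deployed in different geographic regions.
# All names here are hypothetical, not taken from the disclosure.

class RegionalStorage:
    """Stands in for a network-attached storage device in one region."""

    def __init__(self, region: str):
        self.region = region
        self.files: dict[str, bytes] = {}

    def upload(self, name: str, payload: bytes) -> None:
        # In the claimed method this transfer goes over the public network.
        self.files[name] = payload


def distribute_model(model_name: str, payload: bytes,
                     devices: list[RegionalStorage]) -> list[str]:
    """Send the same model file to every regional storage device."""
    for dev in devices:
        dev.upload(model_name, payload)
    return [dev.region for dev in devices]
```

A later deployment step would then mount each `RegionalStorage` onto the server cluster in its own region, so the model file never crosses regions at serving time.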
3. The method according to claim 1, wherein, when deploying the inference service of the model to be deployed, mounting the storage device onto the target server cluster, among the plurality of server clusters, that is deployed in the same region as the storage device comprises:
determining, based on the target region in which the storage device is deployed, a target virtual private network corresponding to the storage device;
obtaining the server cluster corresponding to the target virtual private network to obtain the target server cluster, wherein different server clusters correspond to different virtual private networks; and
when deploying the inference service of the model to be deployed, mounting the storage device onto a server in the target server cluster.

4. The method according to claim 1, wherein, after mounting the storage device onto the target server cluster, among the plurality of server clusters, that is deployed in the same region as the storage device, the method further comprises:
building an elastic scheduling cluster for the target server cluster; and
determining, based on the elastic scheduling cluster, computing resources required to deploy the model to be deployed.

5. The method according to claim 1, further comprising:
distributing preset resources to the plurality of server clusters; and
distributing the model file to the plurality of storage devices.
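The two-step resolution in claim 3, from the storage device's region to its virtual private network, and from that network to the target server cluster, can be sketched as a pair of lookups. The lookup tables and the `mount_in_region` function are illustrative assumptions, not data structures from the disclosure.

```python
# Illustrative sketch of claim 3: resolve the storage device's region to a
# virtual private network (VPC), resolve the VPC to the server cluster on
# it, then "mount" the device on that cluster. Names are hypothetical.

VPC_BY_REGION = {"region-a": "vpc-a", "region-b": "vpc-b"}
CLUSTER_BY_VPC = {"vpc-a": "cluster-a", "vpc-b": "cluster-b"}


def mount_in_region(storage_region: str, mounts: dict[str, str]) -> str:
    """Mount the storage device on the cluster sharing its region's VPC."""
    vpc = VPC_BY_REGION[storage_region]    # step 1: region -> target VPC
    cluster = CLUSTER_BY_VPC[vpc]          # step 2: VPC -> target cluster
    mounts[cluster] = storage_region       # step 3: record the mount
    return cluster
```

Because each cluster corresponds to its own virtual private network, resolving through the VPC guarantees the mount always stays within one region.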
6. The method according to claim 5, wherein distributing the model file to the plurality of storage devices comprises:
after the preset resources have been distributed, distributing the model file to the plurality of storage devices.

7. The method according to claim 1, wherein obtaining the model file of the model to be deployed comprises:
obtaining the model file of the model to be deployed from a central repository, wherein the model file is uploaded to the central repository in advance.

8. The method according to claim 7, wherein the central repository is a cloud object storage.

9. The method according to claim 1, wherein the model to be deployed is a large language model.

10. A model deployment method, comprising:
in response to receiving a model file of a model to be deployed, storing the model file in a central repository;
in response to receiving a model distribution request, distributing the model file stored in the central repository to a plurality of storage devices, wherein different storage devices are geographically deployed in different regions, and different server clusters are also deployed in the different regions; and
in response to receiving a server deployment request, mounting, based on the server deployment request, the storage device onto a target server cluster, among a plurality of server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the plurality of server clusters.
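Claim 10 describes a request-driven flow: store the model file in a central repository on upload, fan it out to regional storage on a distribution request, and mount the regional storage on same-region clusters on a deployment request. A minimal sketch of that control flow follows; the `Controller` class, its method names, and the `cluster-<region>` naming are hypothetical, invented only to make the three request handlers concrete.

```python
# Illustrative sketch of claim 10's request-driven flow. The class and
# method names are hypothetical, not part of the disclosed system.

class Controller:
    def __init__(self, regions: list[str]):
        self.central: dict[str, bytes] = {}            # central repository
        self.regional: dict[str, dict] = {r: {} for r in regions}
        self.mounted: dict[str, str] = {}              # cluster -> region

    def on_upload(self, name: str, payload: bytes) -> None:
        # On receiving a model file, store it in the central repository.
        self.central[name] = payload

    def on_distribute(self, name: str) -> None:
        # On a model distribution request, fan out to every region's storage.
        for store in self.regional.values():
            store[name] = self.central[name]

    def on_deploy(self, name: str) -> list[str]:
        # On a server deployment request, mount each region's storage on
        # the cluster deployed in that same region.
        clusters = []
        for region, store in self.regional.items():
            if name in store:
                cluster = f"cluster-{region}"
                self.mounted[cluster] = region
                clusters.append(cluster)
        return clusters
```

Separating the three requests lets the slow step (cross-region distribution) complete before any cluster attempts to serve the model.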
11. The method according to claim 10, wherein the storage devices comprise network-attached storage, and distributing the model file stored in the central repository to the plurality of storage devices comprises:
sending the model file stored in the central repository to a plurality of the network-attached storages through a public network, and storing the model file in the plurality of network-attached storages.

12. A model deployment method, comprising:
obtaining a model file of a model to be deployed by calling a first interface, wherein the first interface includes a first parameter, and a parameter value of the first parameter is the model file;
distributing the model file to a plurality of storage devices, wherein different storage devices are geographically deployed in different regions;
when deploying an inference service of the model to be deployed, mounting the storage device onto a target server cluster, among a plurality of server clusters, that is deployed in the same region as the storage device, so that the model to be deployed is deployed to the plurality of server clusters, and obtaining a deployment result of the model to be deployed; and
outputting the deployment result by calling a second interface, wherein the second interface includes a second parameter, and a parameter value of the second parameter is the deployment result.
13. The method according to claim 12, wherein the storage devices comprise network-attached storage, and distributing the model file to the plurality of storage devices comprises:
sending the model file to a plurality of the network-attached storages through a public network, and storing the model file in the plurality of network-attached storages.

14. A model deployment system, comprising:
a plurality of server clusters, different server clusters being geographically deployed in different regions;
a plurality of storage devices, different storage devices being geographically deployed in the different regions; and
a control device, connected to the plurality of storage devices and the plurality of server clusters, configured to distribute a model file of a model to be deployed to the plurality of storage devices and, when deploying an inference service of the model to be deployed, mount each storage device onto a target server cluster deployed in the same region as the storage device, so that the model to be deployed is deployed to the plurality of server clusters.

15. The system according to claim 14, wherein the storage devices comprise network-attached storage connected to the control device through a public network.

16. The system according to claim 14, wherein a server cluster and a storage device in the same region are connected through a virtual private network.

17. The system according to claim 14, wherein the target server cluster is connected to the storage device through a target virtual private network.
18. The system according to claim 14, further comprising:
an elastic scheduling cluster, configured to provide elastic scaling capabilities for the target server cluster.

19. An electronic device, comprising:
a memory storing an executable program; and
a processor configured to run the program, wherein the program, when running, performs the method according to any one of claims 1 to 13.

20. A computer-readable storage medium, wherein the computer-readable storage medium includes a stored executable program, and when the executable program runs, a device on which the computer-readable storage medium resides is controlled to perform the method according to any one of claims 1 to 13.
PCT/CN2024/108233 2023-08-02 2024-07-29 Model deployment method and system, and electronic device and computer-readable storage medium Pending WO2025026280A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310966926.5A CN117008836A (en) 2023-08-02 2023-08-02 Model deployment method, system, electronic device and computer readable storage medium
CN202310966926.5 2023-08-02

Publications (1)

Publication Number Publication Date
WO2025026280A1 true WO2025026280A1 (en) 2025-02-06

Family ID: 88565046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/108233 Pending WO2025026280A1 (en) 2023-08-02 2024-07-29 Model deployment method and system, and electronic device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN117008836A (en)
WO (1) WO2025026280A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117008836A (en) * 2023-08-02 2023-11-07 阿里巴巴(中国)有限公司 Model deployment method, system, electronic device and computer readable storage medium
CN118276878B (en) * 2024-06-03 2024-08-16 浙江大华技术股份有限公司 Multi-platform model deployment method, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103051687A (en) * 2012-12-10 2013-04-17 浪潮(北京)电子信息产业有限公司 System and method for deploying application service to cloud-storage virtual machine
US20210281662A1 (en) * 2020-03-04 2021-09-09 Hewlett Packard Enterprise Development Lp Multiple model injection for a deployment cluster
CN114564450A (en) * 2022-03-04 2022-05-31 北京宇信科技集团股份有限公司 Processing method, device, system, medium and equipment of distributed file system
CN115695210A (en) * 2022-09-26 2023-02-03 北京金山云网络技术有限公司 Cloud server deployment method and device, electronic equipment and storage medium
CN115794424A (en) * 2023-02-13 2023-03-14 成都古河云科技有限公司 Method for accessing three-dimensional model through distributed architecture
CN117008836A (en) * 2023-08-02 2023-11-07 阿里巴巴(中国)有限公司 Model deployment method, system, electronic device and computer readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395072B (en) * 2019-08-14 2024-12-27 北京三快在线科技有限公司 Model deployment method, device, storage medium and electronic device
CN114546580B (en) * 2020-11-24 2025-02-28 亚信科技(中国)有限公司 Cache deployment system, cache deployment method, electronic device and storage medium


Also Published As

Publication number Publication date
CN117008836A (en) 2023-11-07

Similar Documents

Publication Publication Date Title
WO2025026280A1 (en) Model deployment method and system, and electronic device and computer-readable storage medium
CN111241195B (en) Database processing method, device, equipment and storage medium of distributed system
CN103713951A (en) Managing execution of programs by multiple computing systems
CN109040212A (en) Equipment access server cluster method, system, equipment and storage medium
US20240028415A1 (en) Instance deployment method and apparatus, cloud system, computing device, and storage medium
CN109145539A (en) A kind of right management method and electronic equipment of more programming projects
CN112165523B (en) Data downloading method and device
WO2015167587A1 (en) Determining application deployment recommendations
CN111857977B (en) Elastic expansion method, device, server and storage medium
CN105095103A (en) Storage device management method and device used for cloud environment
CN109802986A (en) Device management method, system, device and server
CN115037756A (en) Method for operating alliance chain network, alliance chain network and node equipment for alliance chain network
US10931750B1 (en) Selection from dedicated source volume pool for accelerated creation of block data volumes
CN113014608A (en) Flow distribution control method and device, electronic equipment and storage medium
CN112114968A (en) Recommendation method and device, electronic equipment and storage medium
CN113360689B (en) Image retrieval system, method, related device and computer program product
US10956442B1 (en) Dedicated source volume pool for accelerated creation of block data volumes from object data snapshots
CN111211998A (en) Elastically scalable resource allocation method, device and electronic device
US11720414B2 (en) Parallel execution controller for partitioned segments of a data model
CN114070847B (en) Method, device, equipment and storage medium for limiting current of server
CN115238006A (en) Retrieval data synchronization method, device, equipment and computer storage medium
CN119127469A (en) Computing resource allocation method, device, computer equipment, readable storage medium and program product
CN109525443B (en) processing method and device for distributed pre-acquisition communication link and computer equipment
CN110568996A (en) Local storage capacity expansion system based on device driver
CN114338124A (en) Management method and system of cloud password computing service, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24848235

Country of ref document: EP

Kind code of ref document: A1