
WO2025199849A1 - Model processing method, system, apparatus, and storage medium - Google Patents

Model processing method, system, apparatus, and storage medium

Info

Publication number
WO2025199849A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
model
parameter
training
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/084318
Other languages
English (en)
Chinese (zh)
Inventor
祖春山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd
Priority to PCT/CN2024/084318
Publication of WO2025199849A1
Legal status: Pending
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Definitions

  • the present disclosure relates to the field of artificial intelligence technology, and in particular to a model processing method, system, device, and storage medium.
  • AI (artificial intelligence) technologies such as image recognition, speech recognition, and natural language processing are widely used; for example, images can be generated based on input natural language, which requires image generation models.
  • These models typically involve vast amounts of data and require powerful computing resources, posing computational and storage challenges.
  • the present disclosure provides a method for processing a model, wherein the method comprises:
  • the target model is initialized based on target data to be processed and processor resources included in a device running the target model, wherein the initialization includes at least a first initialization;
  • the first initialization includes: dividing at least one of the multiple layers, data within the layers, and the target data included in the target model into blocks on multiple processors, so that the target model runs in a distributed manner on the multiple processors.
  • scheduling a target module combination in the target model includes:
  • the target module combination is determined from a plurality of candidate module combinations.
  • the target model is used to output an image corresponding to the text data based on the input text data, and determining the target module combination from a plurality of candidate module combinations includes:
  • the target module combination is determined from the plurality of candidate module combinations.
  • determining the target module combination from a plurality of the candidate module combinations based on each of the candidate module combinations and the corresponding predicted image includes:
  • the candidate module combination and the quality score are used as training samples to train a predictor; wherein the predictor is used to search for the target module combination in a search space, wherein the search space includes multiple submodules included in each module of the target model;
  • obtaining the quality score of the predicted image includes:
  • the target module combination is obtained based on the sorting result output by the predictor.
  • the target model is used to output an image corresponding to the input text data, the target data is text data, and scheduling the target module combination in the target model includes:
  • a target module combination corresponding to the changed predicted image is scheduled from the target model.
  • the training sample includes an image sample and text for describing the image sample, and detecting whether the target condition is met based on the difference between the previous round of training samples and the current round of training samples includes:
  • the operating efficiencies including lag time and/or throughput
  • the target object is segmented.
  • the method further includes:
  • in the nth round of training, allocating training data to be trained to a plurality of nodes, so that the nodes update parameters of the target model based on the training data;
  • Parameter synchronization is performed between the multiple nodes, and the parameter synchronization includes each node synchronizing the parameters updated in the nth round of training to the next hop node, and receiving the parameters synchronized by the previous hop node, until the updated parameters on the multiple nodes are consistent in the nth round of training.
  • the method further includes:
  • the performing parameter synchronization among the plurality of nodes includes:
  • the third parameter is sent to a fourth node, and the fourth parameter on the fourth node is sent to the third node.
  • a model processing system is also provided, wherein the system includes a mobile terminal and a server terminal;
  • the server is used to train the target model in each round of training or part of the rounds of training.
  • a target parameter to be fine-tuned is determined from a plurality of parameters configured in the target model, and a parameter value of the target parameter is adjusted based on a first parameter value of the target parameter updated before the current round and a second parameter value obtained in the current round;
  • the first operating efficiency is different from the second operating efficiency, and the first operating efficiency includes throughput and/or delay duration.
  • a model processing device is further provided, wherein the processing device includes:
  • a parameter updating module configured to, during the training of the target model, determine, in each round of training or a portion of the rounds of training, a portion of target parameters to be fine-tuned from a plurality of parameters configured for the target model, and adjust the parameter value of the target parameter based on a first parameter value of the target parameter updated before the current round and a second parameter value obtained in the current round; and/or,
  • an initialization module configured to initialize the target model during the inference process of the target model based on target data to be processed and processor resources included in a device running the target model, wherein the initialization includes at least a first initialization
  • the first initialization includes: dividing at least one of the multiple layers, data within the layers, and the target data included in the target model into blocks on multiple processors, so that the target model runs in a distributed manner on the multiple processors.
  • the fine-tuning can be performed based on the parameter values updated in previous training of the parameters, so that the knowledge learned by the target model in previous training is retained in subsequent rounds, avoiding the problem of that knowledge being weakened and thereby improving the accuracy of the target model. In each subsequent update, some parameters can be retained without adjustment while others are fine-tuned, reducing the amount of data involved in parameter fine-tuning and thus the occupation of computing resources.
  • the multiple layers, data within the layers, and at least one of the target data in the target model can be blocked according to processor resources and the amount of target data to be processed, and the blocked model data can be deployed on multiple processors, so that the target model can be run in a distributed manner on multiple processors.
  • the target model can be decomposed into multiple processors for execution, which not only makes full use of processor resources, but also improves the operating efficiency of the target model when processing target data.
  • the present disclosure also discloses an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the model processing method as described above when executing the computer program.
  • the embodiment of the present disclosure further discloses a computer-readable storage medium, which stores a computer program that enables a processor to execute the model processing method described in the present disclosure.
  • FIG1 shows a flowchart of a method for processing a model according to an embodiment of the present disclosure
  • FIG2 shows a schematic diagram of the principle of performing a first initialization on the target model
  • FIG3 shows a schematic diagram of the principle of updating parameters of the target model
  • FIG4 shows a system architecture diagram of distributed training in an embodiment of the present disclosure
  • FIG5 shows a schematic diagram of a communication link structure between multiple nodes in an embodiment of the present disclosure
  • FIG6 is a schematic diagram showing a process of performing parameter synchronization between the four nodes in FIG4 ;
  • FIG7 is a schematic diagram showing a flow chart of steps for performing synchronous remediation in an embodiment of the present disclosure
  • FIG8 shows a schematic flow chart of another parameter synchronization step in an embodiment of the present disclosure
  • FIG9 shows a schematic structural diagram of the Stable Diffusion model according to an embodiment of the present disclosure
  • FIG10 is a schematic diagram showing a process of scheduling a target module combination by a predictor in an embodiment of the present disclosure
  • FIG12 shows a detailed schematic diagram of a processing system for a model according to an embodiment of the present disclosure
  • This type of large model can generate images, videos, etc. that conform to the content described by the input natural language, such as Chinese text information. It is therefore also called a text-to-image model.
  • Stable Diffusion is one such latent diffusion model.
  • computing resources mainly include computing, storage, and communication resources.
  • the fine-tuning can be based on the parameter values updated in previous trainings of the parameters, so that the knowledge learned by the target model in previous training is retained in subsequent rounds, thereby improving the accuracy of the target model; in each subsequent update, some parameters can be retained without adjustment while others are fine-tuned, thereby reducing the amount of data for parameter fine-tuning and the use of computing resources.
  • at least one of the target model's multiple layers, the data within the layers, and the target data can be partitioned. This allows the target model to be split and run on multiple processors, which not only fully utilizes processor resources and reduces the storage and communication pressure on a single processor, but also improves model efficiency.
  • FIG1 shows a flowchart of a model processing method according to an embodiment of the present disclosure
  • FIG2 shows a schematic diagram of the principle of performing a first initialization on a target model
  • FIG3 shows a schematic diagram of the principle of performing parameter updating on a target model.
  • the model processing method according to this embodiment includes the following steps S101 and/or S102, wherein:
  • Step S101 During the training of the target model, in each round of training or a portion of the rounds of training, a portion of target parameters to be fine-tuned is determined from a plurality of parameters configured for the target model, and the parameter value of the target parameter is adjusted based on a first parameter value of the target parameter updated before the current round and a second parameter value obtained in the current round;
  • Step S102 during the inference process of the target model, the target model is initialized based on the target data to be processed and the processor resources included in the device running the target model, where the initialization includes at least a first initialization;
  • the first initialization includes: dividing at least one of the multiple layers, data within the layers, and target data included in the target model into blocks on multiple processors, so that the target model runs in a distributed manner on the multiple processors.
  • the parameter update described in step S101 can be performed during the training of the target model, or the initialization described in step S102 can be performed during the inference of the target model.
  • the parameter update described in step S101 can be performed during the training of the target model, and the initialization described in step S102 can be performed during the inference of the target model.
  • the process described in step S101 can be performed by the cloud, such as a server, and the process described in step S102 can be performed by a terminal, such as a mobile terminal, a personal computer, etc., or the process described in step S102 can also be performed by the server.
  • In step S101, in each round of training after the first round, some target parameters to be fine-tuned can be determined from the multiple parameters configured in the target model, and the parameter value of the target parameter can then be adjusted according to the first parameter value of the target parameter updated before the current round and the second parameter value obtained in the current round.
  • the parameter values of some target parameters in the target model can be fine-tuned starting from the i-th round of training after the start of the first round of training, where i is a positive integer greater than 2.
  • In the entire training of the target model, only some rounds of training fine-tune the parameter values of the target parameters.
  • the entire training of the target model includes 100 training rounds, of which only 40 rounds involve fine-tuning the target parameters, while the first 60 rounds of training are all performed according to the conventional parameter update strategy.
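  • As an illustration of step S101, the sketch below fine-tunes only a subset of target parameters in one round. The selection rule (largest-gradient entries) and the blending rule (a convex combination of the first and second parameter values, weighted by alpha) are assumptions for illustration; the description only states that the adjustment is based on both values.

```python
import torch

def fine_tune_round(model, loss, lr=0.01, ratio=0.1, alpha=0.5):
    """One fine-tuning round in the spirit of step S101 (a sketch)."""
    loss.backward()
    with torch.no_grad():
        for _, p in model.named_parameters():
            if p.grad is None:
                continue
            first = p.detach().clone().flatten()   # first parameter value: before this round
            second = (p - lr * p.grad).flatten()   # second parameter value: this round's update
            g = p.grad.abs().flatten()
            k = max(1, int(ratio * g.numel()))
            idx = torch.topk(g, k).indices         # the target parameters to fine-tune
            new = first.clone()                    # non-target entries keep their prior values
            new[idx] = alpha * first[idx] + (1 - alpha) * second[idx]
            p.copy_(new.view_as(p))
            p.grad = None
```

  • Keeping the non-target entries at their prior values is what preserves the previously learned knowledge while shrinking the volume of data that must be adjusted in each round.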
  • FIG. 4 shows a system architecture diagram of distributed training.
  • the distributed training is described as follows:
  • the training samples assigned to each GPU from the sample pool are different.
  • the ultimate goal is to jointly train a single model across the GPUs; therefore, in each round of training, the model on each GPU has its parameters updated.
  • because the training samples assigned to each GPU differ, the parameter updates on different GPUs are inconsistent.
  • data synchronization, such as parameter sharing, is therefore required between the multiple GPUs, so that the parameters updated on one GPU can be synchronized to all other GPUs.
  • the parameter obtained by GPU0 is parameter 0, which needs to be shared with GPU1-GPU3. In this way, after a round of training is completed, the parameter values of the updated parameters of the models on each GPU are consistent, so as to enter the next round of training.
  • a central node synchronization approach can be used. This involves selecting a node from among the multiple GPUs as the central node, with all other nodes communicating with it and sending their updated parameters to it, which then synchronizes the data with the other nodes.
  • This approach places a heavy burden on the central node, and as the number of model parameters and data volumes increase, the central node's communication load increases exponentially, leading to a gradual slowdown in communication efficiency, or even a breakdown, and significant communication delays.
  • a strategy for flattened communication between multiple nodes allows multiple nodes to autonomously synchronize parameters according to certain rules.
  • training data can be distributed to multiple nodes, allowing the nodes to update the parameters of the target model based on the training data.
  • parameter synchronization is performed between the multiple nodes.
  • Parameter synchronization involves each node synchronizing the parameters updated in the nth round of training to the next hop node, until the updated parameters on multiple nodes are consistent during the nth round of training.
  • a ring communication chain can be formed between multiple nodes.
  • FIG5 a schematic diagram of the communication link structure between multiple nodes is shown.
  • four nodes are included, namely GPU0-GPU3.
  • the four nodes communicate in sequence to form a ring communication chain.
  • the training samples can be divided and different training samples can be assigned to each node for training. For example, training samples numbered 11 to 20 are assigned to GPU0, and training samples numbered 21 to 30 are assigned to GPU1.
  • Each node can input its 10 assigned training samples into the target model to obtain the predicted data output by the target model, thereby constructing a loss function based on the predicted data and the target data to update the parameters of the target model on that node.
  • the nodes begin parameter synchronization. This synchronization is performed based on the communication links between nodes. For example, nodes can be sorted by their connections. Because a ring-shaped communication chain forms between nodes, a node has both a previous hop and a next hop. During parameter synchronization, a node can synchronize updated parameters to its next hop node and receive parameters synchronized from its previous hop node.
  • parameter synchronization between multiple nodes may include multiple synchronization cycles.
  • a synchronization cycle refers to one full pass of communication around the ring; that is, each of the multiple nodes participates in one synchronization, which counts as one synchronization cycle.
  • the number of synchronization cycles can be the same as the number of shards of the node parameters.
  • the multiple parameters included in the target model can be sharded to obtain multiple data slices.
  • each node can synchronize one of the data slices. For example, as shown in Figure 6, if each node shards the parameters into four data slices, it will take eight synchronization cycles to synchronize all parameters, ensuring that the updated parameters of the target model on each node remain consistent.
  • GPU0 synchronizes data slices d0+d3 to the next-hop node GPU1
  • GPU1 synchronizes data slices a0+a1 to the next-hop node GPU2
  • GPU2 synchronizes data slices b1+b2 to the next-hop node GPU3
  • GPU3 synchronizes data slices c2+c3 to the next-hop node GPU0.
  • each node can synchronize the current data slice to the next-hop node. It should be noted that the parameters synchronized to the next-hop by the same node in different synchronization cycles are different, thereby ensuring that the parameters between multiple nodes can remain consistent after multiple synchronizations.
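  • The ring synchronization above can be simulated in a few lines. The sketch below uses the standard scatter-reduce/all-gather schedule, in which n nodes with n data slices finish in 2(n-1) cycles; the figure's walkthrough counts cycles slightly differently, but each cycle follows the same next-hop sending rule.

```python
import numpy as np

def ring_allreduce(node_params):
    """Toy single-process simulation of ring parameter synchronization.

    node_params: one equal-length 1-D float array per node, holding that
    node's locally updated parameters. Returns each node's copy of the
    element-wise sum (callers typically divide by the node count).
    """
    n = len(node_params)
    shards = [np.array_split(p.copy(), n) for p in node_params]  # n slices per node

    # Scatter-reduce: in cycle t, node i sends slice (i - t) % n to its
    # next hop, which accumulates it. Snapshot the sends so all transfers
    # within one cycle happen "simultaneously".
    for t in range(n - 1):
        sends = [(i, (i - t) % n, shards[i][(i - t) % n].copy()) for i in range(n)]
        for i, s, data in sends:
            shards[(i + 1) % n][s] += data

    # All-gather: node i now owns the fully reduced slice (i + 1) % n and
    # circulates it until every node holds all reduced slices.
    for t in range(n - 1):
        sends = [(i, (i + 1 - t) % n, shards[i][(i + 1 - t) % n].copy()) for i in range(n)]
        for i, s, data in sends:
            shards[(i + 1) % n][s] = data

    return [np.concatenate(s) for s in shards]

# Example: four "GPUs", each with its own local update; afterwards all agree.
# synced = ring_allreduce([np.random.rand(8) for _ in range(4)])
```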
  • synchronization remediation can be performed on the nodes with synchronization anomalies.
  • the server can help the nodes with synchronization anomalies synchronize parameters.
  • FIG7 shows a schematic flow chart of the synchronization remediation steps, which may include the following:
  • the parameters on the first node are not synchronized to the second node, which is the next hop of the first node, in at least one synchronization cycle.
  • the first node with synchronization anomaly can be detected by a central node among multiple nodes, or by a server, or by a third-party node.
  • whether the first node is experiencing a synchronization anomaly can be determined as follows.
  • for example, the first node times out and does not respond, so that its previous-hop node fails to successfully synchronize data to the first node.
  • due to the same communication failure, the first node also fails to successfully synchronize parameters to its next-hop node.
  • therefore, when a node times out in receiving parameters from its previous-hop node and fails to send parameters to its next-hop node, it can be determined to be the first node experiencing a synchronization anomaly.
  • the first node can be temporarily removed from the ring communication chain, and a ring communication chain consisting of multiple nodes other than the first node can continue parameter synchronization.
  • a ring communication chain can be formed between GPU1-GPU3, and the node connection order between GPU1-GPU3 can remain unchanged.
  • the synchronization of the first node can be suspended in the cycle where the synchronization exception occurs. For example, if the first node has a synchronization exception in the jth synchronization cycle, the first node can be temporarily removed from the ring communication chain in the j+1th synchronization cycle, and the ring communication chain composed of multiple nodes except the first node can continue to synchronize parameters.
  • Step S202 Acquire the second parameter synchronized by the second node in parameter synchronization and the current first parameter of the first node.
  • the current first parameter on the first node can be obtained. Then, when the remaining second nodes complete parameter synchronization, since the parameters on multiple second nodes are consistent, the second parameter synchronized to any second node during parameter synchronization and the current first parameter of the first node can be obtained.
  • Step S203 During the (n+1)th round of training, parameters are shared in the first node and the second node based on the first parameter and the second parameter, so that the parameters on the first node are consistent with the parameters on the second node.
  • the parameters can be shared in the first node and the second node based on the first parameter and the second parameter before the start of the (n+1)th round of training.
  • the first node may have an abnormality in any synchronization cycle
  • the current first parameter on the first node may include some parameters already synchronized from other nodes. Therefore, the first parameter and the second parameter can be compared first, and the overlapping parameters eliminated.
  • after removing from the first parameter the parameters that overlap with the second parameter, a sub-first parameter is obtained; after removing from the second parameter the parameters that overlap with the first parameter, a sub-second parameter is obtained. The sub-first parameter is then sent to the multiple second nodes, and the sub-second parameter is sent to the first node, ensuring that before the start of the (n+1)th round of training, the multiple nodes will not have parameter inconsistency problems due to the synchronization anomaly of the first node.
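  • The overlap-elimination exchange in steps S202-S203 reduces to a small set operation when parameters are keyed by name; a minimal sketch under that assumption:

```python
def remediate(first_params: dict, second_params: dict):
    """Sketch of steps S202-S203, assuming parameters are dicts keyed by
    parameter name. first_params: the abnormal first node's current
    parameters; second_params: the parameters the second nodes agree on.
    """
    overlap = first_params.keys() & second_params.keys()
    # sub-first parameter: what the first node has that the second nodes lack
    sub_first = {k: v for k, v in first_params.items() if k not in overlap}
    # sub-second parameter: agreed parameters the first node still lacks
    sub_second = {k: v for k, v in second_params.items() if k not in overlap}
    # sub_first is then sent to every second node, and sub_second to the
    # first node, before the (n+1)th round of training begins.
    return sub_first, sub_second
```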
  • Step S301 Obtain the communication load corresponding to each of the multiple nodes.
  • Step S304 Send the third parameter to the fourth node, and send the fourth parameter on the fourth node to the third node.
  • a central node among the multiple nodes can detect the communication load of each of the multiple nodes.
  • the communication load of multiple nodes can be detected by a server, or by a third-party node.
  • the communication load can refer to the ratio of the input and output data of a node per unit time to the capacity of the node. The larger the ratio, the greater the communication load, and the smaller the ratio, the smaller the communication load currently borne by the node.
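  • A sketch of the load-driven exchange in steps S301 and S304 follows; the capacity fields, the choice of busiest/idlest endpoints, and the swap granularity are assumptions, and steps S302-S303 are not reproduced in this excerpt.

```python
def communication_load(node):
    """Load ratio as described: I/O data volume per unit time over capacity."""
    return (node["bytes_in"] + node["bytes_out"]) / node["capacity"]

def swap_parameters(nodes):
    """Steps S301/S304 (sketch): measure each node's communication load,
    then exchange parameters between the most and least loaded nodes."""
    ranked = sorted(nodes, key=communication_load)
    third_node, fourth_node = ranked[-1], ranked[0]  # assumed pairing
    # third parameter -> fourth node, fourth parameter -> third node
    third_node["params"], fourth_node["params"] = (
        fourth_node["params"], third_node["params"])
    return third_node, fourth_node
```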
  • operator A in a stage may have operators A1, A2, and A3 during training, and the parameter values (i.e., weights) of these operators may differ.
  • each stage may include multiple operators or parameter settings for multiple operators.
  • for example, there are 100 candidate module combinations serving as training samples, referred to collectively as candidate module combination A.
  • Candidate module combination A is input into the predictor and used as the initial solution.
  • the predictor searches the search space for more candidate module combinations based on this initial solution, such as candidate module combinations B, and ranks each candidate module combination by quality score.
  • the quality score ranking includes the predicted quality scores corresponding to candidate module combination A and the predicted quality scores corresponding to candidate module combination B.
  • the predictor parameters are updated based on the quality scores corresponding to candidate module combination A in the training samples and the predicted quality scores corresponding to candidate module combination A. This makes the quality score ranking predicted by the predictor more accurate.
  • the sampling parameters can be set so that the predictor continues to search for more candidate module combinations, thereby continuing to sort them according to the predicted quality scores. In this way, training is continuously carried out.
  • once the predictor can accurately predict the quality of the predicted image output by each module combination, training can be stopped; the candidate module combinations output by the predictor at that point are sorted, and the target module combination is screened out.
  • the target module combination can be the candidate module combination with the highest predicted quality score.
  • the search space of the predictor is the multiple modules included in the target model and the multiple sub-modules included in each module. It should be noted that in this search space, the multiple modules and the multiple sub-modules included in each module can be used to construct a topological graph according to the execution order of the multiple modules, such as a graph network with each module as a root node and its sub-modules as that root node's leaf nodes, so that candidate module combinations can be searched in the graph network.
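  • The predictor-guided search can be sketched as follows; ridge regression stands in for the unspecified predictor, each candidate is encoded as one sub-module index per module, and the mutation rule is an assumption for illustration.

```python
import random
import numpy as np
from sklearn.linear_model import Ridge

def evolutionary_search(samples, scores, search_space, rounds=20):
    """Search for the target module combination with a trained predictor.

    samples: candidate module combinations (one sub-module index per
    module); scores: their measured quality scores; search_space: one
    list of sub-modules per module of the target model.
    """
    predictor = Ridge().fit(np.array(samples), np.array(scores))
    population = [list(s) for s in samples]
    for _ in range(rounds):
        # rank candidates by predicted quality score, keep the best half
        population.sort(key=lambda c: -predictor.predict([c])[0])
        parents = population[: max(2, len(population) // 2)]
        children = []
        for parent in parents:
            child = list(parent)
            m = random.randrange(len(search_space))            # pick one module and
            child[m] = random.randrange(len(search_space[m]))  # swap its sub-module
            children.append(child)
        population = parents + children
    return max(population, key=lambda c: predictor.predict([c])[0])
```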
  • the quality score may represent at least one of the clarity, contrast, and fidelity of the predicted image. Specifically, when determining the quality score, any one of the clarity, contrast, and fidelity may be directly used as the quality score. Alternatively, when determining the quality score based on at least two of the clarity, contrast, and fidelity of the predicted image, at least two of the clarity, contrast, and fidelity may be normalized, and the sum of the normalized values may be used as the quality score.
  • the target is the quality score of the predicted image. Therefore, in a further example, to fully leverage the test results of the target model during testing, the quality score of the predicted image can represent the degree of match between the predicted image and the text data, that is, the realism of the predicted image. For example, if the text data is "A girl at sunset, desert style," and the target model needs to output a predicted image based on this text data, the quality score can represent the degree of match between the content of the predicted image and the content described in the text data.
  • the difference between the predicted image and the image sample corresponding to the text data may be obtained, and the quality score may be obtained based on the difference.
  • the difference between the predicted image and the image sample can be obtained by calculating the similarity between the predicted image and the image sample.
  • the similarity can be the cosine distance between the predicted image and the image sample in the image space. The higher the similarity, the higher the matching degree between the predicted image and the text data. Conversely, the lower the similarity, the lower the matching degree between the predicted image and the text data.
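  • A minimal reading of this scoring scheme, assuming precomputed image feature vectors (the feature extractor and the per-metric normalization bounds are assumptions; the description only requires a cosine similarity and normalized sums):

```python
import numpy as np

def match_score(pred_feat, sample_feat):
    """Cosine similarity between predicted-image and image-sample features;
    higher means the predicted image better matches the text data."""
    denom = np.linalg.norm(pred_feat) * np.linalg.norm(sample_feat) + 1e-12
    return float(np.dot(pred_feat, sample_feat) / denom)

def quality_score(metrics, bounds):
    """Normalize each metric (e.g. clarity, contrast, fidelity) to [0, 1]
    using per-metric (lo, hi) bounds and sum them, per the description."""
    return sum((v - bounds[k][0]) / (bounds[k][1] - bounds[k][0])
               for k, v in metrics.items())
```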
  • the target data is text data.
  • a target module combination with higher image quality can be searched based on the quality of the image to be generated by the text data.
  • a desired target module combination can be flexibly selected from the target model based on the quality of the image to be generated.
  • a user has requirements for both fidelity and resolution of the generated image, they will need to schedule a target module combination from the target model that produces a high-fidelity predicted image and matches the required resolution. However, if the resolution of the user's predicted image changes, the previously searched target module combination will no longer be applicable. Therefore, a new target module combination can be scheduled from the target model.
  • the process for rescheduling the target module combination can be similar to the process described above and will not be repeated here.
  • a target module combination corresponding to the changed predicted image may be scheduled from the target model.
  • the change in the resolution of the image data may include a change from low to high resolution, or a change from high to low resolution.
  • the target model can output images of various resolutions. However, when the resolution changes, the fidelity of the output predicted image changes accordingly. Therefore, it is necessary to re-schedule the target module combination from the target model.
  • a candidate module combination corresponding to a predicted image with the same resolution as the changed image data can be obtained, and the quality score corresponding to the candidate module combination and the predicted image can be used as a training sample.
  • the training process can refer to the training process of the above-mentioned predictor, and will not be described in detail here.
  • the target model needs to be decomposed, wherein the target model can be decomposed regardless of whether the target model belongs to the target category or not.
  • the decomposition process can be as follows:
  • the above-mentioned different splitting methods refer to different ways of dividing the same object across the processors.
  • for example, if the target model is decomposed across 4 processors, then under one splitting method the multiple layers in the target model may be split equally, while under another they may be split unequally.
  • similarly, if the target data is split across 4 processors, then under one splitting method the target data may be split according to a first preset field length, and under another according to a second preset field length, where the first preset field length is different from the second.
  • multiple first segmentation results may be included, and multiple second segmentation results may also be included.
  • multiple first segmentation results and multiple second segmentation results may be randomly combined to obtain multiple segmentation results under this segmentation.
  • Each segmentation result includes a first segmentation result and a second segmentation result that matches the first segmentation result.
  • when randomly splitting at least one of the multiple layers, the data within the layers, and the target data included in the target model based on the number of processors, the target object to be split can first be determined from among the layers, the data within the layers, and the target data based on the number of processors and the amount of target data, and then the target object can be split.
  • the number of processors and the amount of target data can be used as a basis to comprehensively determine whether to segment the target model or the target data.
  • when the number of processors is small and the amount of target data is small, the target object can be the target model; when the number of processors is small and the amount of target data is large, the target object can be the target data; when the number of processors is large and the amount of target data is small, the target object can be the target model; and when the number of processors is large and the amount of target data is large, the target object can be both the target data and the target model.
  • the target segmentation result can be scheduled from multiple segmentation results according to the lag time.
  • the segmentation result with the shortest lag time can be used as the target segmentation result.
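  • This initialization logic condenses into a short sketch: pick what to split from the processor count and data volume, enumerate a couple of splitting methods, and keep the segmentation with the shortest measured lag time. The thresholds, the equal/unequal split choices, and the timing harness are all illustrative assumptions.

```python
import time

def choose_target_object(num_procs, data_size, proc_thresh=8, data_thresh=10_000):
    """Decision rule from the description; the thresholds are assumptions."""
    large = data_size >= data_thresh
    if large and num_procs >= proc_thresh:
        return ("model", "data")              # split both
    return ("data",) if large else ("model",)

def candidate_splits(layers, num_procs):
    """Two splitting methods for blocking layers onto processors."""
    if num_procs < 2:
        return {"equal": [layers]}
    size = -(-len(layers) // num_procs)       # ceil division
    equal = [layers[i:i + size] for i in range(0, len(layers), size)]
    head = max(1, len(layers) // (num_procs + 1))   # smaller first block
    rest = layers[head:]
    step = -(-len(rest) // (num_procs - 1))
    unequal = [layers[:head]] + [rest[i:i + step] for i in range(0, len(rest), step)]
    return {"equal": equal, "unequal": unequal}

def schedule_split(layers, num_procs, run_fn):
    """Return the segmentation result with the shortest lag time;
    run_fn(split) is assumed to execute one pass under that split."""
    timings = {}
    for name, split in candidate_splits(layers, num_procs).items():
        start = time.perf_counter()
        run_fn(split)
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get)
```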
  • S12 Divide the training samples into batches and distribute them to multiple nodes for distributed training.
  • the text data sample can be input into the target model to obtain the predicted image output by the target model.
  • a loss function is constructed between the image sample and the target predicted image to update the parameters of the target model.
  • S131 Sharding the parameters of the target model to obtain multiple shard parameters.
  • the number of shards can be determined according to the number of nodes.
  • Synchronization may include multiple synchronization cycles.
  • the node may send the sharding parameters required for synchronization in this synchronization cycle to the next hop node connected to the node, and obtain the sharding parameters of the previous hop node connected to the node in this synchronization cycle.
  • the shard parameters synchronized in each cycle are rotated in this way until all nodes have synchronized all shard parameters and the parameters on each node are consistent.
  • S21 Schedule the target module combination from the target model.
  • the entire workflow consists of three steps: (1) training data preparation, (2) predictor training, and (3) predictor-based evolutionary search.
  • the sampling distribution of the scheduler is intended to generate sufficiently diverse sample quality scores, that is, to include candidate module combinations with different quality scores.
  • the more diverse the quality scores, the higher the accuracy of the predictor.
  • the server may be configured to determine, during training of the target model, in each round of training or a portion of the rounds of training, a target parameter to be fine-tuned from a plurality of parameters configured for the target model, and adjust the parameter value of the target parameter based on a first parameter value of the target parameter updated before the current round and a second parameter value obtained in the current round;
  • After the target model training is completed, the target model can be deployed online: it can be deployed to the server for terminals to call, or deployed to the mobile terminal for mobile users to use offline.
  • for example, target models such as the text-to-image model, the viewpoint conversion model, and the 2D-to-3D model are deployed in the cloud.
  • the cloud can communicate with the mobile terminal through the HTTP protocol to obtain text data input by the user on the mobile terminal through human-computer interaction operations; the text-to-image model is then used to generate an image corresponding to the text data.
  • the viewpoint conversion model can convert the predicted image output by the text-to-image model from 1 viewpoint to multiple viewpoints, such as 49 viewpoints; the 2D-to-3D model can convert the multi-viewpoint image output by the viewpoint conversion model into a 3D image.
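  • A toy sketch of that cloud pipeline over HTTP, with the three models replaced by placeholder callables; the endpoint, payload format, and model interfaces are all assumptions rather than the patent's API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder stand-ins for the three deployed models.
text_to_image = lambda text: f"image({text})"        # text -> single-view image
viewpoint_convert = lambda image, n=49: [image] * n  # 1 view -> n views
two_d_to_3d = lambda views: {"views_used": len(views)}

class PipelineHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Text data arrives from the mobile terminal over HTTP.
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        image = text_to_image(text)              # text-to-image model
        views = viewpoint_convert(image, n=49)   # viewpoint conversion model
        result = two_d_to_3d(views)              # 2D-to-3D model
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("", 8000), PipelineHandler).serve_forever()
```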
  • FIG13 shows a schematic structural diagram of the processing device.
  • the processing device may specifically include the following modules:
  • a parameter updating module configured to, during the training of the target model, determine, in each round of training or a portion of the rounds of training, a portion of target parameters to be fine-tuned from a plurality of parameters configured for the target model, and adjust the parameter value of the target parameter based on a first parameter value of the target parameter updated before the current round and a second parameter value obtained in the current round; and/or,
  • an initialization module configured to initialize the target model during the inference process of the target model based on target data to be processed and processor resources included in a device running the target model, wherein the initialization includes at least a first initialization
  • references herein to "one embodiment", "an embodiment", or "one or more embodiments" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Furthermore, please note that instances of the phrase "in one embodiment" do not necessarily all refer to the same embodiment.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word “comprising” does not exclude the presence of elements or steps not listed in the claim.
  • the word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements.
  • the present disclosure may be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by the same item of hardware.
  • the use of the words first, second, third, etc. does not indicate any order; these words may be interpreted as names.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a model processing method, system, apparatus, and storage medium, belonging to the technical field of artificial intelligence and aimed at optimizing the training efficiency and inference efficiency of models. The method comprises the following steps: during the training process of a target model, in each round of training or in some rounds of training, determining, from a plurality of parameters configured for the target model, some target parameters to be fine-tuned, and adjusting the parameter values of the target parameters on the basis of first parameter values of the target parameters updated before the current round and second parameter values thereof obtained in the current round; and/or, during an inference process of the target model, initializing the target model on the basis of target data to be processed and processor resources comprised in a device running the target model, the initialization comprising at least a first initialization, the first initialization comprising partitioning, across a plurality of processors, at least one of the multiple layers comprised in the target model, the intra-layer data, and the target data, so as to implement distributed execution of the target model on the plurality of processors.
PCT/CN2024/084318 2024-03-28 2024-03-28 Model processing method, system, apparatus, and storage medium Pending WO2025199849A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2024/084318 WO2025199849A1 (fr) 2024-03-28 2024-03-28 Model processing method, system, apparatus, and storage medium


Publications (1)

Publication Number Publication Date
WO2025199849A1 (fr) 2025-10-02

Family

Family ID: 97216183

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/084318 Pending WO2025199849A1 (fr) 2024-03-28 2024-03-28 Procédé de traitement de modèle, système, appareil et support de stockage

Country Status (1)

Country Link
WO (1) WO2025199849A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114035937A (zh) * Artificial intelligence-based distributed training and inference method, system, device, and readable storage medium
US20220300769A1 (en) * 2021-03-19 2022-09-22 Arizona Board Of Regents On Behalf Of Arizona State University Systems, methods, and apparatuses for actively and continually fine-tuning convolutional neural networks to reduce annotation requirements
CN117688386A (zh) * Parameter adjustment method and apparatus for a large model, electronic device, and storage medium


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24932554

Country of ref document: EP

Kind code of ref document: A1