Disclosure of Invention
The invention provides a data center network traffic model training method supporting rapid scene adaptation. It aims to solve two problems of existing traffic generation methods: the high training cost incurred when adapting to a new scenario by retraining the model, and the fact that existing methods only generate network traffic without considering the additional content required by various downstream applications.
An embodiment of a first aspect of the present invention provides a data center network traffic model training method supporting rapid scene adaptation, including the following steps: acquiring network traffic data in a current network traffic scenario; converting the network traffic data into input data and output data based on an input format standard and an output format standard of a target training layer of a network traffic large model to be updated, wherein the network traffic large model to be updated is composed of multiple layers of the Transformer neural network model; and, based on a preset LoRA (Low-Rank Adaptation) model, training a low-rank parameter matrix of the target training layer using the input data and the output data, and updating the network traffic large model to be updated with the trained low-rank parameter matrix, so that the updated network traffic large model can quickly adapt to a new network traffic scenario.
Optionally, after updating the network traffic large model to be updated by using the trained low-rank parameter matrix, the method further comprises: acquiring a downstream task requirement of a downstream model of the current network traffic scene and a network message sequence output by the updated network traffic large model; and constructing a downstream task adaptation model based on the downstream task demands and the network message sequence, and connecting the updated network traffic large model and the downstream model based on the downstream task adaptation model so as to enable the updated network traffic large model to generate traffic data adapting to the downstream task demands.
Optionally, the input data of the downstream task adaptation model is the network message sequence, and the output data of the downstream task adaptation model is determined by the downstream task demand.
Optionally, the target training layer is the last layer of the network traffic large model to be updated.
Optionally, the converting the network traffic data into the input data and the output data based on the input format standard and the output format standard of the target training layer of the network traffic large model to be updated includes: acquiring a current task of the network traffic large model to be updated; and converting the network traffic data into input data and output data according to the current task, based on the input format standard and the output format standard.
An embodiment of a second aspect of the present invention provides a data center network traffic model training device supporting rapid scene adaptation, including: an acquisition module, configured to acquire network traffic data in a current network traffic scenario; a conversion module, configured to convert the network traffic data into input data and output data based on an input format standard and an output format standard of a target training layer of a network traffic large model to be updated, wherein the network traffic large model to be updated is composed of multiple layers of the Transformer neural network model; and a training module, configured to train a low-rank parameter matrix of the target training layer using the input data and the output data based on a preset LoRA model, and to update the network traffic large model to be updated with the trained low-rank parameter matrix, so as to quickly adapt to a new network traffic scenario with the updated network traffic large model.
Optionally, after updating the network traffic large model to be updated with the trained low rank parameter matrix, the training module is further configured to: acquiring a downstream task requirement of a downstream model of the current network traffic scene and a network message sequence output by the updated network traffic large model; and constructing a downstream task adaptation model based on the downstream task demands and the network message sequence, and connecting the updated network traffic large model and the downstream model based on the downstream task adaptation model so as to enable the updated network traffic large model to generate traffic data adapting to the downstream task demands.
Optionally, the input data of the downstream task adaptation model is the network message sequence, and the output data of the downstream task adaptation model is determined by the downstream task demand.
Optionally, the target training layer is the last layer of the network traffic large model to be updated.
Optionally, the conversion module is further configured to: acquire a current task of the network traffic large model to be updated; and convert the network traffic data into input data and output data according to the current task, based on the input format standard and the output format standard.
An embodiment of a third aspect of the present invention provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the data center network traffic model training method supporting rapid scene adaptation according to the above embodiment.
An embodiment of a fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data center network traffic model training method supporting rapid scene adaptation as described in the above embodiment.
In the above embodiment, network traffic data in a current network traffic scenario is acquired; the network traffic data is converted into input data and output data based on an input format standard and an output format standard of a target training layer of a network traffic large model to be updated; a low-rank parameter matrix of the target training layer is trained using the input data and the output data based on a preset LoRA model; and the network traffic large model to be updated is updated with the trained low-rank parameter matrix, so that the updated network traffic large model quickly adapts to a new network traffic scenario. This solves the problem that existing traffic generation methods incur a large training cost when adapting to a new scenario by retraining the whole model, as well as the problem that existing methods only generate network traffic without considering the additional content required by various downstream applications. Retraining of the whole model is avoided, the retraining time is shortened while the fidelity of the generated traffic is preserved, rapid adaptation to new scenarios is achieved, and the network traffic large model can generate traffic data accurately adapted to a downstream task, improving the performance of the generated traffic on that task.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
The following describes the data center network traffic model training method supporting rapid scene adaptation according to an embodiment of the present invention with reference to the accompanying drawings. Aiming at the problems mentioned in the background art, namely that existing traffic generation methods incur a large training cost when adapting to a new scenario by retraining the whole model, and that existing methods only generate network traffic without considering the additional content required by various downstream applications, the invention provides a data center network traffic model training method supporting rapid scene adaptation. This avoids retraining the whole model, shortens the retraining time while preserving the fidelity of the generated traffic, achieves rapid adaptation to new scenarios, and enables the network traffic large model to generate traffic data accurately adapted to a downstream task, improving the performance of the generated traffic on that task.
Specifically, fig. 1 is a schematic flow chart of a data center network traffic model training method supporting rapid scene adaptation according to an embodiment of the present invention.
As shown in fig. 1, the data center network traffic model training method supporting rapid scene adaptation includes the following steps:
in step S101, network traffic data in a current network traffic scenario is acquired.
It should be understood that the current network traffic scenario refers to a new network scenario. For example, a large model performing a network traffic generation task may have been trained on traffic collected from one network. When traffic is then collected from another network and the model is expected to generate traffic for it, that is, when the network traffic scenario changes, a model whose training set was collected in the previous network performs poorly in the new scenario. The problem to be solved by the embodiment of the present invention is therefore how to quickly retrain the network traffic large model when the network traffic scenario changes.
In step S102, the network traffic data is converted into input data and output data based on the input format standard and the output format standard of the target training layer of the network traffic large model to be updated, wherein the network traffic large model to be updated is composed of a multi-layer neural network model Transformer.
Wherein, in some embodiments, the target training layer is the last layer of the network traffic large model to be updated.
Optionally, in some embodiments, converting the network traffic data into the input data and the output data based on the input format standard and the output format standard of the target training layer of the network traffic large model to be updated includes: acquiring a current task of the network traffic large model to be updated; and converting the network traffic data into input data and output data according to the current task, based on the input format standard and the output format standard.
It should be understood that, in the network traffic large model to be updated, which is composed of multiple Transformer layers, the last Transformer layer has the greatest influence on the traffic generation result. Therefore, in the embodiment of the invention, after obtaining the network traffic data in the current network traffic scenario, the whole model is not retrained. Instead, based on the input format standard and the output format standard of the target training layer of the network traffic large model to be updated, the network traffic data is converted by preprocessing into input data and output data conforming to the last layer (i.e. the target training layer), and only that last layer is trained, as shown in fig. 2. Through this trade-off strategy, an effect similar to retraining the whole model is achieved while greatly saving hardware resources and training time.
Specifically, in fig. 2, the structure resembling a tree laid on its side refers to the network traffic large model. GTT refers to a module implemented based on a Transformer; since different industry large models may differ slightly in structure, GTT is treated here simply as a Transformer. L1 to LN represent the N layers of the network traffic large model to be updated, where the output of each layer serves as the input of the next layer during training. The LoRA model is the training method provided by the embodiment of the invention: only the LN layer (i.e. the target training layer) is extracted. Since the output of the LN layer is the output of the whole model, the output needs no additional processing, while the input is determined according to the current task of the whole model; the target training layer is then retrained on these input and output data. In particular, its input data is statistical information of the traffic, for example how many packets and bytes there are in every 10 nanoseconds, and the output format standard of its output data is how many packets and bytes there are in every 1 nanosecond.
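The format conversion described above can be sketched as follows. This is only an illustration: the function names and the record format are assumptions, while the 10 ns input windows and 1 ns output windows follow the example in the preceding paragraph.

```python
def aggregate_windows(packets, window_ns):
    """Aggregate (timestamp_ns, length_bytes) records into per-window
    (packet_count, byte_count) statistics."""
    if not packets:
        return []
    horizon = max(t for t, _ in packets) // window_ns + 1
    stats = [[0, 0] for _ in range(horizon)]
    for t, length in packets:
        w = t // window_ns
        stats[w][0] += 1          # packet count in this window
        stats[w][1] += length     # byte count in this window
    return [tuple(s) for s in stats]

def to_training_pairs(packets):
    """Build (input, output) data for the target training layer:
    coarse 10 ns statistics as input, fine 1 ns statistics as output."""
    return aggregate_windows(packets, 10), aggregate_windows(packets, 1)

coarse, fine = to_training_pairs([(0, 100), (3, 200), (12, 50)])
```

Here `coarse` is `[(2, 300), (1, 50)]`: the first two packets fall into the first 10 ns window and the third into the second.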
In step S103, based on the preset LoRA model, the low-rank parameter matrix of the target training layer is trained by using the input data and the output data, and the network traffic large model to be updated is updated by using the trained low-rank parameter matrix, so that the updated network traffic large model is quickly adapted to the new network traffic scene.
It should be understood that the embodiment of the invention uses the preset LoRA model to enable the network traffic large model to be updated to quickly adapt to a new scenario. This is inspired by LoRA fine-tuning of large language models: the core of LoRA is to train a low-rank parameter matrix of the model with a small amount of data from the new scenario in the retraining stage; after training is completed, the retrained parameters are injected into the original model and replace the corresponding parameters. The embodiment of the invention trains the low-rank parameter matrix of the network traffic large model using the same method.
Specifically, the embodiment of the invention trains the low-rank parameter matrix of the target training layer of the network traffic large model to be updated using the input data and the output data, based on the preset LoRA model; the low-rank parameter matrix before training is then replaced by the trained one, and the network traffic large model to be updated is updated accordingly, so that it quickly adapts to the new network traffic scenario. This achieves a fine-tuning effect on the network traffic large model at a lower cost.
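The low-rank training and injection step can be sketched in NumPy as follows. This is a minimal single-layer illustration of the general LoRA idea, not the actual implementation: the frozen weight `W` of the target layer stays untouched, only the low-rank factors `B @ A` are trained on synthetic "new scenario" data, and the trained update is merged back. The dimensions, rank, learning rate, and synthetic data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # layer width, LoRA rank (r << d)
W = rng.normal(size=(d, d))      # frozen pretrained weight of the target layer
A = rng.normal(size=(r, d)) * 0.1
B = np.zeros((d, r))             # standard LoRA init: B = 0, so the update starts at 0

# Synthetic "new scenario": the ideal layer weight has shifted by a
# small rank-1 perturbation of the original W.
W_target = W + np.outer(rng.normal(size=d), rng.normal(size=d)) * 0.1
X = rng.normal(size=(256, d))    # small amount of new-scenario input data
Y = X @ W_target.T               # corresponding output data

lr = 0.05
for _ in range(1000):
    delta = B @ A
    err = X @ (W + delta).T - Y          # W is frozen; only A and B get gradients
    grad_delta = err.T @ X / len(X)      # gradient of the MSE loss w.r.t. delta
    B -= lr * grad_delta @ A.T
    A -= lr * B.T @ grad_delta

W_updated = W + B @ A            # inject the trained low-rank update into the model
```

Only `2 * d * r` parameters are trained instead of `d * d`, which is the source of the cost saving claimed above.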
Optionally, in some embodiments, after updating the network traffic large model to be updated with the trained low-rank parameter matrix, the method further comprises: acquiring a downstream task requirement of a downstream model of the current network traffic scenario and a network message sequence output by the updated network traffic large model; and constructing a downstream task adaptation model based on the downstream task requirement and the network message sequence, and connecting the updated network traffic large model and the downstream model through the downstream task adaptation model, so that the updated network traffic large model generates traffic data adapted to the downstream task requirement.
In some embodiments, the input data of the downstream task adaptation model is a network message sequence, and the output data of the downstream task adaptation model is determined by the downstream task requirements.
In order to cope with various downstream applications in a new network traffic scenario, an embodiment of the present invention further proposes an adaptation model that satisfies the traffic characteristics of the downstream application. It uses the Transformer, which is widely known in the industry and achieves state-of-the-art (SOTA) results in the general field, as the base model. The input of the downstream task adaptation model is the output of the network traffic large model (i.e. the network message sequence), and its output is determined according to the requirement of the downstream task. For example, when the downstream application needs traffic with a certain numerical label, the output of the downstream task adaptation model is a message sequence carrying that label; see fig. 3 in particular.
In addition, in order for the generated traffic to perform better on downstream applications, the downstream task adaptation model is not trained directly as a standalone model; instead, it is connected with the network traffic large model and the downstream application to achieve end-to-end training, so that the updated network traffic large model generates traffic data adapted to the downstream task requirements. For example, the network traffic large model outputs coarse-grained packets that include only the sequence number, timestamp, and length of each message; when the downstream task needs packet header information and even some labels (a packet trace), the downstream task adaptation model can generate the downstream task's label fields, for example using a Transformer to generate packet headers and labels. With this method, the quality of the generated traffic is no longer judged only by evaluation indices on certain distribution properties of the traffic.
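The role of the adaptation model between the two stages can be sketched as follows. This is a plain-Python stand-in, not the Transformer-based adapter itself: all record formats, names, and the labeling rule are assumptions, illustrating only how the coarse-grained packet sequence from the large model is extended with the fields a downstream task requires.

```python
from dataclasses import dataclass

@dataclass
class CoarsePacket:          # output format of the traffic large model
    seq: int
    timestamp_ns: int
    length: int

@dataclass
class LabeledPacket:         # format required by the downstream task
    seq: int
    timestamp_ns: int
    length: int
    label: str

def adaptation_model(coarse_seq, label_fn):
    """Stand-in for the Transformer-based adapter: maps the large
    model's packet sequence to the downstream task's record format."""
    return [LabeledPacket(p.seq, p.timestamp_ns, p.length, label_fn(p))
            for p in coarse_seq]

# Hypothetical downstream requirement: tag packets by flow size class.
coarse = [CoarsePacket(0, 0, 1500), CoarsePacket(1, 10, 64)]
labeled = adaptation_model(
    coarse, lambda p: "elephant" if p.length >= 1000 else "mouse")
```

In the actual scheme this mapping is learned end to end together with the large model and the downstream application, rather than given by a fixed rule as here.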
In the embodiment of the invention, for the downstream task adaptation model, a user can either directly use a pre-trained generator or fine-tune the model with their own traffic data. Usage method 1, direct use: using the traffic super-resolution model to generate a data packet sequence, the user converts the constraint conditions of the data packets into ACL (Access Control List) rules and inputs the ACL rules into the entry of the traffic converter to obtain the traffic. Usage method 2, fine-tune and then use: the user collects ACL rules and corresponding traffic in their network environment, calls the interface provided by the embodiment of the invention, feeds the ACL rules and the traffic into the rule encoder and the traffic encoder respectively, and trains a similarity matrix better matching the user's scenario (the fine-tuning process); thereafter, given ACL rules, the model generates the corresponding traffic.
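Usage method 2 above can be sketched in heavily simplified form. The toy encoder and the Hebbian-style update are illustrative assumptions, not the actual interface: ACL rules and traffic samples are embedded as vectors, and a similarity matrix between the two embedding spaces is fine-tuned on the user's own (ACL rule, traffic) pairs so that matching pairs score higher.

```python
import numpy as np

def embed(text, dim=4):
    """Toy stand-in for the rule/traffic encoders: hash bytes into a
    fixed-size, normalized, non-negative vector."""
    v = np.zeros(dim)
    for i, b in enumerate(text.encode()):
        v[i % dim] += b
    return v / np.linalg.norm(v)

W_sim = np.eye(4)                # similarity matrix to be fine-tuned

def similarity(rule, traffic):
    return float(embed(rule) @ W_sim @ embed(traffic))

# User-collected (ACL rule, traffic description) pairs from their network.
pairs = [("permit tcp 10.0.0.1 10.0.0.2", "web flow, short packets"),
         ("permit udp 10.0.0.3 10.0.0.4", "video flow, large packets")]

before = similarity(*pairs[0])
for _ in range(20):              # simple fine-tuning: reinforce matching pairs
    for rule, traffic in pairs:
        W_sim += 0.05 * np.outer(embed(rule), embed(traffic))
after = similarity(*pairs[0])    # matching pair now scores higher than before
```

A real rule encoder and traffic encoder would be learned models, and the similarity matrix would be trained with a contrastive objective rather than this additive update.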
According to the data center network traffic model training method supporting rapid scene adaptation provided by the embodiment of the invention, network traffic data in a current network traffic scenario is acquired; the network traffic data is converted into input data and output data based on the input format standard and the output format standard of the target training layer of the network traffic large model to be updated; the low-rank parameter matrix of the target training layer is trained using the input data and the output data based on the preset LoRA model; and the network traffic large model to be updated is updated with the trained low-rank parameter matrix, so that rapid adaptation to a new network traffic scenario is achieved with the updated network traffic large model. This solves the problem that existing traffic generation methods incur a large training cost when adapting to a new scenario by retraining the whole model, as well as the problem that existing methods only generate network traffic without considering the additional content required by various downstream applications. Retraining of the whole model is avoided, and the retraining time is shortened while the fidelity of the generated traffic is preserved, achieving rapid adaptation to new scenarios. Further, by generating the fields required by a downstream task with a general Transformer, connecting the network traffic large model with the downstream task model, and training end to end, the performance of the generated traffic on the downstream task is improved.
The data center network flow model training device supporting rapid scene adaptation according to the embodiment of the invention is described with reference to the accompanying drawings.
Fig. 4 is a block diagram of a data center network traffic model training device supporting rapid adaptation of scenarios in accordance with an embodiment of the present invention.
As shown in fig. 4, the data center network traffic model training device 10 supporting rapid adaptation of a scenario includes: an acquisition module 100, a conversion module 200, and a training module 300.
The acquiring module 100 is configured to acquire network traffic data in a current network traffic scenario; the conversion module 200 is configured to convert the network traffic data into input data and output data based on an input format standard and an output format standard of a target training layer of the network traffic large model to be updated, where the network traffic large model to be updated is composed of multiple layers of the Transformer neural network model; the training module 300 is configured to train a low-rank parameter matrix of the target training layer using the input data and the output data based on a preset LoRA model, and to update the network traffic large model to be updated with the trained low-rank parameter matrix, so as to quickly adapt to a new network traffic scenario with the updated network traffic large model.
Optionally, in some embodiments, after updating the network traffic big model to be updated with the trained low rank parameter matrix, the training module 300 is further configured to: acquiring a downstream task demand of a downstream model of a current network flow scene and a network message sequence output by an updated network flow large model; and constructing a downstream task adaptation model based on the downstream task demands and the network message sequence, and connecting the updated network traffic large model and the downstream model based on the downstream task adaptation model so as to enable the updated network traffic large model to generate traffic data adapting to the downstream task demands.
Optionally, in some embodiments, the input data of the downstream task adaptation model is a network message sequence, and the output data of the downstream task adaptation model is determined by the downstream task requirements.
Optionally, in some embodiments, the target training layer is the last layer of the network traffic large model to be updated.
Optionally, in some embodiments, the conversion module 200 is further configured to: acquire a current task of the network traffic large model to be updated; and convert the network traffic data into input data and output data according to the current task, based on the input format standard and the output format standard.
It should be noted that, the explanation of the foregoing embodiment of the method for training a data center network traffic model supporting rapid adaptation of a scene is also applicable to the device for training a data center network traffic model supporting rapid adaptation of a scene in this embodiment, which is not described herein again.
According to the data center network traffic model training device supporting rapid scene adaptation provided by the embodiment of the invention, network traffic data in a current network traffic scenario is acquired; the network traffic data is converted into input data and output data based on the input format standard and the output format standard of the target training layer of the network traffic large model to be updated; the low-rank parameter matrix of the target training layer is trained using the input data and the output data based on the preset LoRA model; and the network traffic large model to be updated is updated with the trained low-rank parameter matrix, so that rapid adaptation to a new network traffic scenario is achieved with the updated network traffic large model. This solves the problem that existing traffic generation methods incur a large training cost when adapting to a new scenario by retraining the whole model, as well as the problem that existing methods only generate network traffic without considering the additional content required by various downstream applications. Retraining of the whole model is avoided, the retraining time is shortened while the fidelity of the generated traffic is preserved, rapid adaptation to new scenarios is achieved, and the network traffic large model can generate traffic data accurately adapted to the downstream task, improving the performance of the generated traffic on the downstream task.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device may include:
Memory 501, processor 502, and a computer program stored on memory 501 and executable on processor 502.
The processor 502 implements the data center network traffic model training method supporting rapid adaptation of the scenario provided in the above embodiment when executing the program.
Further, the electronic device further includes:
A communication interface 503 for communication between the memory 501 and the processor 502.
Memory 501 for storing a computer program executable on processor 502.
The memory 501 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
If the memory 501, the processor 502, and the communication interface 503 are implemented independently, the communication interface 503, the memory 501, and the processor 502 may be connected to each other via a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in fig. 5, but this does not mean there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 501, the processor 502, and the communication interface 503 are integrated on a chip, the memory 501, the processor 502, and the communication interface 503 may perform communication with each other through internal interfaces.
The processor 502 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the invention.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, realizes the data center network traffic model training method supporting rapid scene adaptation.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "N" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or N executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiment of the present invention includes further implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.
Logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or N wires, a portable computer cartridge (magnetic device), a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber device, and a portable Compact Disc Read-Only Memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program may be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If, as in another embodiment, they are implemented in hardware, any one or a combination of the following techniques well known in the art may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.