CN120258167A - AI generation model training, text generation method and electronic device - Google Patents
- Publication number
- CN120258167A (application number CN202510168186.XA)
- Authority
- CN
- China
- Prior art keywords
- task
- control
- lora modules
- gating function
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Machine Translation (AREA)
Abstract
The embodiments of the application disclose an AI generation model training method, a text generation method, and an electronic device, wherein the AI generation model is formed by integrating a gating function and a plurality of low-rank decomposition (LoRA) modules on the basis of a pre-trained generation model. A task identifier corresponding to the control task information is determined and input to the gating function; the gating function determines the weight vectors respectively corresponding to the plurality of LoRA modules; the output information of the LoRA modules is weighted by these weight vectors and participates in the text generation process; and the parameters of the gating function and the LoRA modules are updated according to a preset loss function. According to the embodiments of the application, the model training cost can be reduced and the control effect can be improved.
Description
Technical Field
The application relates to the technical field of AI generation, in particular to an AI generation model training method, a text generation method and electronic equipment.
Background
The goal of multi-aspect controllable text generation is to generate text that satisfies multiple predefined attributes or features simultaneously. For example, aspects such as the emotional tendency, subject content, and style of the text can be controlled at the same time, so that the generated text better meets specific requirements and application scenarios.
In the prior-art multi-aspect controllable text generation schemes, generation is mainly based on a feature extraction model plus a text-to-text generation model. Specifically, an input text can be fed into the feature extraction model to extract text features, the attributes to be controlled (emotion, theme, and the like) are then fused into the features produced by that model, and the fused features are input into the text-to-text generation model to generate the text content.
Although this approach enables multi-aspect text generation to some extent, it requires a large amount of training data so that the above feature extraction model can acquire the ability to extract features for each attribute, which is costly. In addition, this approach cannot apply different control strengths to different control aspects, so the control effect is difficult to guarantee.
Disclosure of Invention
The application provides an AI generation model training method, a text generation method and electronic equipment, which can reduce model training cost and improve control effect.
The application provides the following scheme:
An AI generation model training method, the AI generation model being composed by integrating a gating function and a plurality of low-rank decomposition LoRA modules on the basis of a pre-trained generation model, the method comprising:
Determining input information of the AI generation model according to a preset training data set, wherein the input information is a text instruction containing control task information;
Determining task identifiers corresponding to the control task information, and inputting the task identifiers to a gating function so as to determine weight vectors respectively corresponding to the plurality of LoRA modules through the gating function, wherein the task identifiers are identifiers assigned according to a plurality of predefined control tasks, the control tasks comprise unilateral control tasks or multi-aspect control tasks, and the multi-aspect control tasks are tasks for controlling a text generation process from a plurality of aspects;
And weighting the output information of the plurality of LoRA modules through the weight vectors so that they participate in the text generation process, and updating parameters of the gating function and the plurality of LoRA modules according to a preset loss function.
The gating function and the plurality of LoRA modules undergo parameter updates over a plurality of iteration rounds, so that the gating function acquires the capability of generating appropriate weight vectors for the plurality of LoRA modules according to the input task identifier, and the plurality of LoRA modules acquire the capability of controlling the text generation process of the pre-trained generation model according to the control task information.
The pre-trained generation model comprises a plurality of basic construction units, and each basic construction unit integrates the plurality of LoRA modules respectively.
Wherein the basic building unit comprises a self-attention layer, layer normalization and a position-aware feedforward layer, wherein the plurality of LoRA modules are integrated in the pre-trained generation model in the form of a bypass of the position-aware feedforward layer;
The basic construction unit processes the input sequence through the self-attention layer and layer normalization, inputs the processed sequence into the plurality of LoRA modules, processes the output information of the plurality of LoRA modules with the weight vectors, combines the processed output information with the output information of the basic construction unit, and inputs the result into the next basic construction unit.
The loss function comprises a loss function for balancing differences caused by data maldistribution in different control aspects.
Wherein the loss function comprises a loss function for balancing distances between a plurality of different control attributes comprised by the same control aspect.
A text generation method using an AI generation model, wherein the AI generation model is formed by integrating a gating function and a plurality of low-rank decomposition LoRA modules on the basis of a pre-trained generation model, the gating function and the plurality of LoRA modules are trained with a preset training data set, the task identifier is a unique identifier assigned according to a plurality of predefined control tasks, the predefined control tasks comprise unilateral control tasks or multi-aspect control tasks, and a multi-aspect control task is a task for controlling a text generation process from a plurality of aspects at the same time, the method comprising:
determining input information of the AI generation model, wherein the input information is a text instruction containing a target control task;
Determining a target task identifier corresponding to the target control task, and inputting the target task identifier to the gating function so as to generate a plurality of weight vectors for the target task identifier through the gating function, wherein the plurality of weight vectors respectively correspond to the plurality of LoRA modules;
And after the weight vectors are used for weighting the output information of the plurality of LoRA modules, controlling the text generation process of the pre-trained generation model so as to generate target text content.
The target control task comprises a combined task formed by a plurality of predefined unilateral control tasks;
the determining the target task identifier corresponding to the target control task includes:
Determining task identifiers respectively corresponding to a plurality of unilateral control tasks included in the combined task;
Inputting task identifiers respectively corresponding to the unilateral control tasks into the gating function so that the gating function generates a plurality of groups of weight vectors, wherein the plurality of groups of weight vectors correspond to the task identifiers of the unilateral control tasks, and a plurality of weight vectors in the same group are used for corresponding to the plurality of LoRA modules;
The weighting of the output information of the plurality of LoRA modules by the weight vectors includes:
and carrying out fusion processing on the multiple groups of weight vectors, and carrying out weighting processing on the output information of the multiple LoRA modules by utilizing the fused weight vectors.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the preceding claims.
An electronic device, comprising:
one or more processors, and
A memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of the preceding claims.
A computer program product comprising computer program/computer executable instructions which, when executed by a processor in an electronic device, implement the steps of any one of the methods of the preceding claims.
According to the specific embodiment provided by the application, the application discloses the following technical effects:
Through the embodiments of the application, the network structure of the AI generation model is first improved: a gating function and a plurality of low-rank decomposition LoRA modules can be integrated on the basis of the pre-trained generation model, and in addition a plurality of types of control tasks can be predefined, with corresponding task identifiers defined for each. Specifically, during training, the input information of the AI generation model is first determined according to the training data set, and the LoRA modules generate output information. Meanwhile, a task identifier corresponding to the current input information is determined, the gating function generates a group of weight vectors that respectively correspond to the LoRA modules, and the weight vectors are used to weight the output information of each LoRA module, which then controls the text generation process of the main generation model. Afterwards, the parameters of the gating function and the plurality of LoRA modules are updated according to a preset loss function. In this way, only the LoRA modules and the gating function participate in training, while the parameters of the pre-trained base generation model can be kept unchanged, so the training cost can be reduced. In addition, because a plurality of LoRA modules are added instead of a single LoRA module, and corresponding weight vectors can be generated for each LoRA module through the gating function, each LoRA module can exert a different control strength for different types of control tasks; that is, the contribution of each LoRA module to text generation for the target control aspect can be determined, so a better control effect can be obtained and the generation quality of the text content can be improved.
Of course, it is not necessary for any one product to practice the application to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application;
FIG. 2 is a flow chart of a first method provided by an embodiment of the present application;
FIG. 3 is a flow chart of a second method provided by an embodiment of the present application;
fig. 4 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the application, fall within the scope of protection of the application.
In the embodiments of the application, in order to reduce training cost and improve the control effect of multi-aspect controllable text generation, a corresponding solution is provided. In this solution, a plurality of LoRA (Low-Rank Adaptation of Large Language Models) modules may first be integrated into a pre-trained text generation model, with each LoRA module participating in the control of text generation. In addition, the solution combines a gating function and a routing strategy to effectively manage the influence of each LoRA module in the multi-aspect controllable text generation process. This design allows the control strength across multiple aspects to be adjusted dynamically without extensive retraining or model duplication.
In order to facilitate understanding of the solution provided by the embodiments of the present application, concepts such as LoRA, low-rank matrix, etc. will be briefly described below.
The "rank" of a matrix refers to the maximum number of linearly independent rows or columns in the matrix. Briefly, rank describes the abundance of information in a matrix. In low rank matrices, this number is much smaller than the number of rows or columns of the matrix. In other words, a low rank matrix may be considered a matrix that contains a large amount of redundant information or highly correlated data. For example, in image processing, a picture can be considered a matrix, which is typically low rank because the pixels in the picture are highly spatially correlated.
Low-rank matrix decomposition aims to decompose a high-dimensional, possibly noisy matrix into a product of two or more low-rank matrices. This decomposition reveals the inherent structure of the data, so that the data can be understood and manipulated in a more simplified form. Mathematically, if the original matrix is M, one seeks two low-rank matrices A and B such that M ≈ A × B, a process called "low-rank matrix decomposition". This approximation reduces the complexity of the data while losing as little information as possible. Of course, the importance of low-rank matrix decomposition lies not only in its practical value but also in that it provides an intuitive and powerful way of understanding the inherent structure of data. In this way, high-dimensional data can be processed and analyzed effectively, key features can be extracted, and more complex analysis and prediction models can be built on this basis.
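As a toy illustration (not part of the patent), the sketch below builds a rank-r approximation M ≈ A × B via a truncated SVD; the matrix names M, A, B and the rank r follow the description above, while the concrete sizes are arbitrary assumptions.

```python
# Illustrative sketch only: rank-r approximation M ≈ A @ B via truncated SVD.
import numpy as np

M = np.random.randn(1000, 2000)              # a "high-dimensional" matrix
r = 8                                        # target rank (a hyperparameter)

U, S, Vt = np.linalg.svd(M, full_matrices=False)
A = U[:, :r] * S[:r]                         # shape (1000, r)
B = Vt[:r, :]                                # shape (r, 2000)

M_approx = A @ B                             # low-rank approximation of M
print(M.size, A.size + B.size)               # parameters stored: 2,000,000 vs 24,000
```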
In AI generation models such as large language models, the number of parameters is often very large, and due to limitations such as hardware memory, the cost of updating the model parameters during training is very high. For example, assume a large language model with 7B (7 billion) parameters, represented by a weight matrix W. During back-propagation, the model needs to learn a ΔW matrix in order to update the original weights so as to minimize the loss function. If the weight matrix W contains 7B parameters, the weight update matrix ΔW also contains 7B parameters, and computing ΔW is very computationally and memory intensive.
LoRA was developed for this situation. LoRA is a fine-tuning technique for large language models that aims to reduce the number of parameters required for fine-tuning through low-rank decomposition while maintaining the performance of the model. LoRA adds a bypass beside the pre-trained model and simulates the parameter update through low-rank decomposition (first reducing the dimension and then increasing it); during training, the parameters of the pre-trained model are fixed, and only the dimension-reduction matrix A and the dimension-increase matrix B are updated. After training, the matrices A and B are multiplied together and the result is merged with the pre-trained model parameters to form the fine-tuned model parameters. That is, decomposing ΔW means that two smaller LoRA matrices A and B are used to represent the larger matrix ΔW. If the number of rows of A equals the number of rows of ΔW and the number of columns of B equals the number of columns of ΔW, the decomposition can be written as ΔW = AB, where AB is the matrix product of A and B.
In this LoRA low-rank decomposition scheme, the amount of memory actually saved depends on the rank r of the matrices A and B, where r is a hyperparameter. For example, if ΔW has 10,000 rows and 20,000 columns, 200,000,000 parameters must be stored. If A and B with r = 8 are chosen, A has 10,000 rows and 8 columns and B has 8 rows and 20,000 columns, i.e., 10,000 × 8 + 8 × 20,000 = 240,000 parameters, about 830 times fewer than 200,000,000 parameters.
In summary, LoRA exploits the inherent low-rank nature of large language models and models full-parameter fine-tuning by adding bypass matrices. Specifically, a bypass can be added beside the original pre-trained large language model, and the intrinsic rank is simulated through a dimension-reduction followed by a dimension-increase operation. Matrix A can be initialized with a random Gaussian distribution and matrix B with a zero matrix; during training, the parameters of the pre-trained large language model are fixed and only the parameters in matrices A and B are trained. After training, matrices A and B are multiplied together and merged with the pre-trained model parameters to form the fine-tuned large language model parameters.
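The following minimal sketch (an assumption of how such a bypass could look, not the patent's implementation) mirrors the description above: the pre-trained linear layer is frozen, A is initialized with a random Gaussian, B with zeros, and only A and B are trained.

```python
# Minimal LoRA bypass sketch (illustrative assumption, not the patent's code).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_dim, out_dim, r=8):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        for p in self.base.parameters():
            p.requires_grad_(False)                           # pre-trained weights stay fixed
        self.A = nn.Parameter(torch.randn(in_dim, r) * 0.01)  # random Gaussian init
        self.B = nn.Parameter(torch.zeros(r, out_dim))        # zero init

    def forward(self, x):
        # base output plus the low-rank bypass that simulates the update ΔW = A B
        return self.base(x) + (x @ self.A) @ self.B
```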
Because of its low-rank decomposition characteristics, LoRA is being applied in a number of specific scenarios. The embodiments of the application also reduce training cost by introducing LoRA modules. However, because the embodiments of the application involve control in multiple aspects, a plurality of LoRA modules are added, and a gating function is used to generate weight vectors for the plurality of LoRA modules. When control is exerted in different control aspects, the weight vectors generated by the gating function differ, which reflects the difference in the contribution of each LoRA module under different control aspects, so that different control strengths are applied to different control aspects and the control effect is improved.
To achieve the above objective, first, in terms of model structure, as shown in Fig. 1, a gating function and a plurality of LoRA modules may be integrated on the basis of a pre-trained generation model. In a specific implementation, the pre-trained generation model may be a model with a Transformer structure (or another structure); a model with such a structure typically includes a plurality of basic building units (e.g., the Transformer Block in the Transformer structure), and the plurality of LoRA modules may be integrated separately into each basic building unit. More specifically, a Transformer Block may include a self-attention layer, layer normalization, a position-aware feedforward layer, and the like, and the plurality of LoRA modules may be integrated in the pre-trained generation model in the form of a bypass of the position-aware feedforward layer. In the training process, after specific training data are acquired, specific control instructions can be input into the model; the control instructions may exist in text form and include a textual description of the control task information, for example, "please generate text of happy emotion" and the like. The control instructions may be input into the above pre-trained generation model and processed sequentially through the plurality of Transformer Blocks. Within each Transformer Block, the input sequence may be processed through the self-attention layer and layer normalization and then input into the plurality of LoRA modules, and accordingly each LoRA module may generate its own output information.
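A structural sketch of this arrangement is shown below, assuming a PyTorch-style implementation: the frozen position-aware feedforward layer carries N LoRA bypasses whose outputs are weighted by the gating weights w before being merged back. The class name and parameters are illustrative, not from the patent.

```python
# Sketch (assumption): a feedforward layer with N gated LoRA bypasses.
import torch
import torch.nn as nn

class GatedLoRAFeedForward(nn.Module):
    def __init__(self, d_model, d_ff, n_lora=4, r=8):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                nn.Linear(d_ff, d_model))
        for p in self.ff.parameters():
            p.requires_grad_(False)          # frozen pre-trained feedforward layer
        self.lora_A = nn.ParameterList(
            [nn.Parameter(torch.randn(d_model, r) * 0.01) for _ in range(n_lora)])
        self.lora_B = nn.ParameterList(
            [nn.Parameter(torch.zeros(r, d_model)) for _ in range(n_lora)])

    def forward(self, x, w):
        # x: hidden states after self-attention and layer normalization
        # w: gating weights, one scalar per LoRA module, shape (n_lora,)
        out = self.ff(x)
        for i in range(len(self.lora_A)):
            out = out + w[i] * (x @ self.lora_A[i]) @ self.lora_B[i]
        return out
```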
On the other hand, a plurality of different types of control tasks may be predefined. In the embodiments of the application, the predefined control task types may include unilateral tasks and may further include multi-aspect tasks. A unilateral task controls text generation from a single aspect; for example, emotion, theme, length, keywords, and "toxicity removal" may correspond to different aspects, and each aspect may correspond to one control task. A multi-aspect task controls the text generation process from several aspects at the same time, for example from both the emotion and theme aspects; a specific control instruction could be "please generate text with happy emotion related to sports", where "sports" belongs to the theme aspect and "happy" belongs to the emotion aspect, and so on. After defining the plurality of control tasks, a task identifier may also be defined for each different type of control task; the task identifier is unique, i.e., one control task corresponds to one task identifier.
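For illustration only, the predefined control tasks and their unique identifiers could be recorded in a simple mapping like the one below; the concrete task names and numeric identifiers are assumptions, not values from the patent.

```python
# Hypothetical assignment of unique task identifiers to predefined control tasks.
TASK_IDS = {
    "emotion": 0,            # unilateral control tasks
    "theme": 1,
    "length": 2,
    "keyword": 3,
    "toxicity_removal": 4,
    "emotion+theme": 5,      # a predefined multi-aspect control task
}
```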
After the various types of control tasks are defined and different task identifiers are assigned, corresponding training data may be obtained. Specifically, the training data set in the embodiments of the application may include a plurality of control instructions and the corresponding target texts (used to supervise the training process). The control instructions may cover the aforementioned various types of control tasks; for example, the training data may include a number of control instructions controlling emotion, a number controlling theme, a number controlling keywords, a number controlling emotion plus theme, and so on. That is, each control instruction in the training data may correspond to its own task identifier information.
Specifically, when training the model, a control instruction in the training data may be input into the AI generation model provided by the embodiments of the application. In each Transformer Block, the input sequence may first be processed through the self-attention layer and layer normalization and then input into the plurality of LoRA modules, where each LoRA module may generate its own output information. Meanwhile, the task identifier corresponding to the control task information of the current control instruction may be determined and, as shown in Fig. 1, input to the gating function, so as to determine the weight vectors respectively corresponding to the plurality of LoRA modules through the gating function. Then, the output information of each LoRA module can be weighted with the weight vectors and combined with the output information of the Transformer Block to obtain the input of the next Transformer Block, and so on until the last Transformer Block has been processed, yielding the output of the whole AI generation model, i.e., the final text generation result. Of course, in the initial stage of training, the weight vectors generated by the gating function and the output information of each LoRA module may not yet be suitable; the parameters of the gating function and the LoRA modules need to be adjusted gradually during training, so that the gating function gradually learns which combination of weight vectors to generate for each task identifier, and the LoRA modules also learn how to produce output information for a specific control task.
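Under these assumptions, a single training step could look like the rough sketch below. `model` is the frozen pre-trained generation model carrying the gated LoRA bypasses, `gate` is a gating module such as the one sketched later, and the keyword argument `lora_weights` is a hypothetical interface for passing the gating weights into the bypasses; the optimizer is assumed to hold only the gating-function and LoRA parameters.

```python
# Rough training-step sketch (assumptions noted above); only the gating function
# and the LoRA modules receive gradient updates, the base model stays frozen.
import torch
import torch.nn.functional as F

def training_step(model, gate, batch, optimizer):
    # batch: tokenized "instruction" ids, supervision "target" ids, and "task_id"
    w = gate(batch["task_id"])                              # weights for the LoRA modules
    logits = model(batch["instruction"], lora_weights=w)    # gated bypasses act inside
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                           batch["target"].view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```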
After training, when text needs to be generated, the specific control task can be expressed in the input information, the task identifier corresponding to that control task can be obtained and input into the gating function to obtain a set of weight vectors, and the weight vectors are used to weight the output information of each LoRA module, ultimately controlling the base generation model to complete the text generation task.
It should be noted here that, in theory, any two or more unilateral control tasks may be combined into a multi-aspect control task, but in practical applications not every combination is commonly used. Therefore, to reduce training cost, model training may be performed only for the various unilateral control tasks and for one or more commonly used multi-aspect tasks. For example, in practical applications, the specific control tasks may include five unilateral control tasks such as emotion, theme, length, keyword, and "toxicity removal", and one multi-aspect control task such as "emotion + theme". Each of these six control tasks may correspond to a task identifier. In this way, at the inference stage using the trained model, if the control task contained in the input control instruction is one of the predefined tasks, the task identifier corresponding to that predefined control task may be input directly to the gating function to generate the weight vectors and thus complete the generation of the specific text. For example, if control is required from the emotion aspect, or from both the emotion and theme aspects, the corresponding task identifier may be input to obtain the weight vectors, and the output information of each LoRA module is then weighted. However, if the control task contained in the input control instruction is a combined task formed by several of the predefined unilateral control tasks, for example control from both the emotion and keyword aspects, then, since no control task of the type "emotion + keyword" exists among the predefined control tasks and no task identifier corresponds to it, the weight vectors for the "emotion + keyword" control task cannot be obtained by directly inputting a task identifier. In this case, the task identifiers corresponding to the several unilateral control tasks may be input to the gating function, so that the gating function generates several groups of weight vectors, where the groups correspond to the task identifiers of the unilateral control tasks and the weight vectors within one group correspond to the plurality of LoRA modules. In the foregoing example, the task identifiers of the emotion and keyword control tasks may each be input into the gating function to generate two groups of weight vectors; the two groups can then be fused into one group of weight vectors, and that group is used to weight the output information of the plurality of LoRA modules to control the generation of the text content.
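One plausible fusion strategy for such an unseen combination is sketched below under the assumptions of the earlier snippets (`TASK_IDS` is the hypothetical identifier mapping sketched earlier, and `gate` is a gating module like the one sketched later): the groups of weights are averaged and re-normalized. The patent states only that the groups are fused, so averaging is an illustrative choice, not the required one.

```python
# Illustrative fusion of two groups of gating weights for "emotion + keyword".
import torch

def fuse_weights(weight_groups):
    w = torch.stack(weight_groups, dim=0).mean(dim=0)   # element-wise average
    return w / w.sum()                                   # re-normalize to sum to 1

w_emotion = gate(torch.tensor([TASK_IDS["emotion"]]))
w_keyword = gate(torch.tensor([TASK_IDS["keyword"]]))
w_combined = fuse_weights([w_emotion, w_keyword])        # weights for the LoRA modules
```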
In summary, the embodiment of the application improves the network structure of the AI generation model, specifically, adds a gating function and a plurality of LoRA modules on the basis of the pre-trained generation model, defines a plurality of control tasks, respectively designates corresponding task identifiers, and then generates a group of weight vectors for the specific task identifiers through the gating function by combining the model structure, so as to control the control intensity of each LoRA module on the current control task.
The schemes of the embodiments of the present application for which protection is specifically sought are described below.
Embodiment 1
First, an embodiment of the present application provides a training method for an AI generation model, where the AI generation model is formed by integrating a gating function and a plurality of low-rank decomposition LoRA modules on the basis of a pre-trained generation model, as described above, and referring to fig. 2, the method may include:
S201, according to a preset training data set, determining input information of the AI generation model, wherein the input information is a text instruction containing control task information.
The training data set may include a plurality of control instructions and corresponding target text generation results, where the control instructions may specifically exist in text form, and may include description information about the control task, for example, specifically "please generate text with happy emotion," and so on. In the specific training process of the model, the control instruction in the training data can be used as input information to be input into an AI generation model, the AI generation model can perform text generation, and then model parameters are optimized through multiple rounds of iteration, so that the actually generated text content gradually approaches to a target text generation result in the training data.
In particular, in the embodiment of the present application, since a plurality of different types of control tasks may be predefined, the training data set may include a plurality of different types of control instructions, which respectively correspond to the plurality of different types of control tasks, so that the model learns how to generate weight vectors for the various different control tasks during the training process, and so on.
S202, determining a task identifier corresponding to the control task information, and inputting the task identifier to a gating function so as to determine weight vectors respectively corresponding to the plurality of LoRA modules through the gating function, wherein the task identifier is a unique identifier assigned according to a plurality of predefined control tasks, the control tasks comprise unilateral control tasks or multi-aspect control tasks, and a multi-aspect control task is a task for controlling a text generation process from a plurality of aspects.
When the input information of the model is determined, a task identifier corresponding to the specific control task information included in the input information can also be determined. In a specific implementation manner, task identifier information may be added to various control instructions in advance in training data, so that when one of the control instructions is input into the AI generation model, a task identifier corresponding to the control task information included in the control instruction may be directly acquired. Or in another mode, a task identification module can be added in the AI generation model to be used for carrying out natural language understanding or keyword identification on the input control instruction so as to determine the task identifier of the control task information contained in the current input information.
And S203, weighting the output information of the plurality of LoRA modules through the weight vector, participating in the text generation process, and updating parameters of the gating function and the plurality of LoRA modules according to a preset loss function.
After determining the task identifier, the task identifier may be input to the gating function, which outputs a set of weight vectors whose number equals the number of LoRA modules. That is, assuming there are 8 LoRA modules in total, the gating function may generate 8 weight vectors, which are used to weight the output information of the respective LoRA modules before the generation process of the base generation model is controlled, so that different LoRA modules can exert different control strengths. Of course, during training, and especially in the initial stage, the weight vectors generated by the gating function may be inappropriate, and the gating function undergoes parameter updates and optimization over multiple iterations. After many iterative updates, the gating function can learn how to generate appropriate weight vectors for a specific task identifier. Meanwhile, the parameters of the plurality of LoRA modules are also updated, and after multiple rounds of iterative updates the plurality of LoRA modules acquire the capability of controlling the text generation process of the pre-trained generation model according to the control task information.
In a specific implementation, as described above, a specific pre-trained generation model may include a plurality of basic building units, e.g., Transformer Blocks, and the plurality of LoRA modules may be integrated separately into each basic building unit. That is, assuming a total of M Transformer Blocks, each Transformer Block may be augmented with one bypass containing N LoRA modules. The gating function may output only one set of weight vectors, i.e., N weight vectors, and the N LoRA modules of each Transformer Block may weight their output information with these N weight vectors.
In a specific implementation, as shown in Fig. 1, a specific basic building unit such as a Transformer Block may include a self-attention layer, layer normalization, a position-aware feedforward layer, and the like, where the plurality of LoRA modules may be integrated in the pre-trained generation model in the form of a bypass of the position-aware feedforward layer. Thus, in each Transformer Block, the input sequence is processed through the self-attention layer and layer normalization and then input into the plurality of LoRA modules; after each LoRA module generates its output information, the output information of the plurality of LoRA modules is weighted by the weight vectors generated by the gating function, combined with the output information of the Transformer Block, and input into the next Transformer Block, and so on until the final result is obtained.
The gating function may include an embedding layer and a linear layer. Each LoRA module may consist of a pair of low-rank matrices: the input sequence is projected into a low-dimensional space through the first low-rank matrix and then mapped back to the original dimension through the second low-rank matrix, thereby efficiently capturing task-specific features.
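A possible form of such a gating function, written as an assumption rather than the patent's exact design (the softmax and the embedding size are illustrative choices), is sketched below; it maps a task identifier to one weight per LoRA module.

```python
# Sketch (assumption): gating function = embedding layer + linear layer.
import torch
import torch.nn as nn

class GatingFunction(nn.Module):
    def __init__(self, num_tasks, n_lora=4, d_emb=64):
        super().__init__()
        self.embed = nn.Embedding(num_tasks, d_emb)   # task identifier -> embedding
        self.proj = nn.Linear(d_emb, n_lora)          # embedding -> one weight per LoRA

    def forward(self, task_id):
        # task_id: LongTensor of task identifiers; returns weights summing to 1
        return torch.softmax(self.proj(self.embed(task_id)), dim=-1)

gate = GatingFunction(num_tasks=6, n_lora=4)
w = gate(torch.tensor([0]))                           # weights for task identifier 0
```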
In the specific training process, constraints can further be applied to the attribute-bearing hidden states at the top layer and other positions of the basic building units such as the Transformer Blocks, so as to complete the training process. In the embodiments of the application, the training process can be constrained in multiple respects, i.e., multiple training objectives can be pursued. First, the most basic training objective may be the original next-token prediction loss (a token is the smallest unit into which text data is split before or during processing by the model; depending on the model's vocabulary and tokenization strategy, it may be a word, a punctuation mark, a sub-word, etc.), which maintains the basic generative capability of the language model; specifically, the model output may be aligned with the target text by computing the autoregressive loss.
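As a sketch of that autoregressive objective (assumed tensor shapes, not the patent's code), the loss shifts the logits and labels by one position so that each token predicts the next one:

```python
# Sketch of the next-token (autoregressive) prediction loss.
import torch.nn.functional as F

def next_token_loss(logits, target_ids):
    # logits: (batch, seq_len, vocab_size); target_ids: (batch, seq_len)
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = target_ids[:, 1:].contiguous()
    return F.cross_entropy(shift_logits.view(-1, shift_logits.size(-1)),
                           shift_labels.view(-1))
```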
In addition, the training data may suffer from uneven data distribution across the different types of control tasks; for example, there may be many control instructions for emotion and theme types but comparatively few for keywords, length, and the like. A task adaptation loss can therefore be added, which strengthens the expression of multi-aspect semantics by reducing the differences between the distribution centers of the different data sources. That is, the specific loss function may also include a loss function for balancing the differences across control aspects caused by uneven data distribution. In a specific implementation, this loss can be realized by computing the Euclidean distances between the hidden-state distribution centers of the different aspects.
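A rough sketch of that idea is given below, under the assumption that each training example carries an aspect label and that its hidden states have already been pooled into one vector per example; the exact aggregation used in the patent is not specified here.

```python
# Sketch (assumption): penalize Euclidean distances between per-aspect
# hidden-state distribution centers to balance unevenly distributed data.
import torch

def task_adaptation_loss(hidden_states, aspect_ids):
    # hidden_states: (batch, d) pooled hidden states; aspect_ids: (batch,) labels
    centers = [hidden_states[aspect_ids == a].mean(dim=0)
               for a in torch.unique(aspect_ids)]
    loss = hidden_states.new_zeros(())
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            loss = loss + torch.norm(centers[i] - centers[j], p=2)
    return loss
```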
Furthermore, a control task of the same type may include a plurality of different task attributes; that is, a specific control task may correspond to its own attribute space. For example, an emotion control task may include various attributes such as happiness, anger, excitement, calmness, and the like. The specific loss function may therefore also include a loss function for balancing the distances between the different control attributes contained in the same control aspect.
Specifically, the above loss function for balancing the distances between different control attributes within the same control aspect can be further divided into two parts: an attribute exclusion loss and an attribute gap loss. The attribute exclusion loss avoids attribute overlap by computing the distances between different attributes within the same type of control task, i.e., it keeps different attributes far apart in the attribute space formed by that control task so that they do not interfere with each other. The attribute gap loss reduces variance by computing the distance between each hidden state and the distribution center within the attribute space, i.e., it prevents hidden states from drifting too far from their attribute centers in the attribute space formed by the same type of control task.
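The sketch below illustrates one way these two terms could be written (the margin, the pooling of hidden states, and the exact distance formulation are assumptions; the patent describes the terms only qualitatively):

```python
# Sketch (assumption): attribute exclusion loss pushes different attribute
# centers apart; attribute gap loss keeps each hidden state near its own center.
import torch

def attribute_losses(hidden_states, attr_ids, margin=1.0):
    # hidden_states: (batch, d) pooled hidden states; attr_ids: (batch,) labels
    attrs = torch.unique(attr_ids)
    centers = {int(a): hidden_states[attr_ids == a].mean(dim=0) for a in attrs}

    exclusion = hidden_states.new_zeros(())
    for i, a in enumerate(attrs):
        for b in attrs[i + 1:]:
            d = torch.norm(centers[int(a)] - centers[int(b)], p=2)
            exclusion = exclusion + torch.clamp(margin - d, min=0.0)

    gap = torch.stack([torch.norm(h - centers[int(a)], p=2)
                       for h, a in zip(hidden_states, attr_ids)]).mean()
    return exclusion, gap
```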
In summary, through the embodiments of the present application, the network structure of the AI generation model is first improved: a gating function and a plurality of low-rank decomposition LoRA modules can be integrated on the basis of the pre-trained generation model, and in addition a plurality of types of control tasks can be predefined, with corresponding task identifiers defined for each. Specifically, during training, the input information of the AI generation model is first determined according to the training data set, and the LoRA modules generate output information. Meanwhile, a task identifier corresponding to the current input information is determined, the gating function generates a group of weight vectors that respectively correspond to the LoRA modules, and the weight vectors are used to weight the output information of each LoRA module, which then controls the text generation process of the main generation model. Afterwards, the parameters of the gating function and the plurality of LoRA modules are updated according to a preset loss function. In this way, only the LoRA modules and the gating function participate in training, while the parameters of the pre-trained base generation model can be kept unchanged, so the training cost can be reduced. In addition, because a plurality of LoRA modules are added instead of a single LoRA module, and corresponding weight vectors can be generated for each LoRA module through the gating function, each LoRA module can exert a different control strength for different types of control tasks, so a better control effect can be obtained and the generation quality of the text content can be improved.
Embodiment 2
The foregoing embodiment describes the AI generation model training method provided by the embodiments of the present application; after training is completed, the model can be used for text generation. Therefore, the second embodiment also provides a text generation method using an AI generation model, wherein the AI generation model is formed by integrating a gating function and a plurality of low-rank decomposition LoRA modules on the basis of a pre-trained generation model, the gating function and the plurality of LoRA modules are trained with a preset training data set, and after training the gating function has the capability of generating suitable weight vectors for the plurality of LoRA modules according to an input task identifier. The task identifier is a unique identifier assigned according to a plurality of predefined control tasks, the predefined control tasks comprise unilateral control tasks or multi-aspect control tasks, and a multi-aspect control task is a task for simultaneously controlling a text generation process from a plurality of aspects. Referring to fig. 3, the method may include:
S301, determining input information of the AI generation model, wherein the input information is a text instruction containing a target control task;
S302, determining a target task identifier corresponding to the target control task, and inputting the target task identifier into a gating function so as to generate a plurality of weight vectors for the target task identifier through the gating function, wherein the plurality of weight vectors respectively correspond to the plurality of LoRA modules;
and S303, after the weight vectors are used for weighting the output information of the plurality of LoRA modules, controlling the text generation process of the pre-trained generation model so as to generate target text content.
The target control task may comprise a combined task formed by a plurality of predefined unilateral control tasks. In that case, the task identifiers respectively corresponding to the unilateral control tasks included in the combined task can be determined, and these task identifiers can then be input into the gating function, so that the gating function generates a plurality of groups of weight vectors, where the groups of weight vectors correspond to the task identifiers of the unilateral control tasks and the weight vectors within one group correspond to the LoRA modules. The groups of weight vectors are then normalized, and the output information of the plurality of LoRA modules is weighted with the normalized weight vectors.
It should be noted that the AI generation model provided by the embodiments of the present application may be applied in a number of different scenarios, for example, writing an article through AI, writing text content such as an email through AI, writing descriptive text or advertising copy for a commodity through AI, and so on. In the writing process, a certain theme and emotion may need to be expressed, the text length may need to be controlled (for example, expressed through the number of characters), or certain keywords may need to be included. In these cases, the AI generation model provided by the embodiments of the present application can realize control from these different aspects, and because of the gating function each LoRA module can exert a different control strength for different types of control tasks, which helps obtain a better control effect and improves the generation quality of the text content.
For the undescribed parts in the second embodiment, reference may be made to the description of the first embodiment and other parts of the specification, and the description is not repeated here.
It should be noted that the embodiments of the present application may involve the use of user data. In practical applications, user-specific personal data may be used in the solutions described herein only within the scope permitted by the applicable laws and regulations of the relevant country and on the condition that those laws and regulations are complied with (for example, with the user's explicit consent, with practical notification to the user, etc.).
Corresponding to the first embodiment, the present application further provides an AI generation model training apparatus, where the AI generation model is formed by integrating a gating function and a plurality of low-rank decomposition LoRA modules on the basis of the pre-trained generation model, and the apparatus includes:
The input information determining unit is used for determining input information of the AI generation model according to a preset training data set, wherein the input information is a text instruction containing control task information;
The weight vector generation unit is used for determining a task identifier corresponding to the control task information, and inputting the task identifier into a gating function so as to determine weight vectors respectively corresponding to the plurality of LoRA modules through the gating function, wherein the task identifier is an identifier assigned according to a plurality of predefined control tasks, the control tasks comprise unilateral control tasks or multi-aspect control tasks, and the multi-aspect control tasks are tasks for controlling a text generation process from a plurality of aspects;
And the weighting processing unit is used for participating in the text generation process after carrying out weighting processing on the output information of the plurality of LoRA modules through the weight vector, and carrying out parameter updating on the gating function and the plurality of LoRA modules according to a preset loss function.
In a specific implementation, by updating the parameters of the gating function and the plurality of LoRA modules over a plurality of iteration rounds, the gating function can acquire the capability of generating appropriate weight vectors for the plurality of LoRA modules according to the input task identifier, and the plurality of LoRA modules can acquire the capability of controlling the text generation process of the pre-trained generation model according to the control task information.
The pre-trained generation model comprises a plurality of basic construction units, and each basic construction unit integrates the plurality of LoRA modules respectively.
In a specific implementation, the basic building unit comprises a self-attention layer, layer normalization and a position-aware feedforward layer, wherein the plurality of LoRA modules are integrated in the pre-trained generation model in the form of a bypass of the position-aware feedforward layer;
The basic construction unit processes the input sequence through the self-attention layer and layer normalization, inputs the processed sequence into the plurality of LoRA modules, processes the output information of the plurality of LoRA modules with the weight vectors, combines the processed output information with the output information of the basic construction unit, and inputs the result into the next basic construction unit.
The loss function comprises a loss function for balancing differences caused by data maldistribution in different control aspects.
In addition, the loss function includes a loss function for balancing distances between a plurality of different control attributes included in the same control aspect.
Corresponding to the second embodiment, the embodiments of the application also provide a text generation apparatus using an AI generation model, wherein the AI generation model is formed by integrating a gating function and a plurality of low-rank decomposition LoRA modules on the basis of a pre-trained generation model, the gating function and the plurality of LoRA modules are trained with a preset training data set, the task identifier is a unique identifier assigned according to a plurality of predefined control tasks, the predefined control tasks comprise unilateral control tasks or multi-aspect control tasks, and a multi-aspect control task is a task for controlling a text generation process from a plurality of aspects at the same time; the apparatus specifically comprises:
The input information determining unit is used for determining input information of the AI generation model, wherein the input information is a text instruction containing a target control task;
The weight vector generation unit is used for determining a target task identifier corresponding to the target control task, inputting the target task identifier into a gating function, and generating a plurality of weight vectors for the target task identifier through the gating function, wherein the plurality of weight vectors respectively correspond to the plurality of LoRA modules;
and the text generation unit is used for controlling the text generation process of the pre-trained generation model after the weight vectors are used for weighting the output information of the plurality of LoRA modules, so as to generate target text content.
Specifically, the target control task comprises a combined task formed by a plurality of predefined unilateral control tasks;
The weight vector generation unit may specifically be configured to:
Determining task identifiers respectively corresponding to a plurality of unilateral control tasks included in the combined task;
Inputting task identifiers respectively corresponding to the unilateral control tasks into the gating function so that the gating function generates a plurality of groups of weight vectors, wherein the plurality of groups of weight vectors correspond to the task identifiers of the unilateral control tasks, and a plurality of weight vectors in the same group are used for corresponding to the plurality of LoRA modules;
The text generation unit may specifically be configured to:
and carrying out fusion processing on the multiple groups of weight vectors, and carrying out weighting processing on the output information of the multiple LoRA modules by utilizing the fused weight vectors.
In addition, the embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the steps of the method of any one of the previous method embodiments.
And an electronic device comprising:
one or more processors, and
A memory associated with the one or more processors for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of the preceding method embodiments.
A computer program product comprising computer program/computer executable instructions which, when executed by a processor in an electronic device, implement the steps of the method of the preceding method embodiments.
Fig. 4 illustrates an architecture of an electronic device, which may include a processor 410, a video display adapter 411, a disk drive 412, an input/output interface 413, a network interface 414, and a memory 420, among others. The processor 410, video display adapter 411, disk drive 412, input/output interface 413, network interface 414, and memory 420 may be communicatively coupled via a communication bus 430.
The processor 410 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is used for executing related programs to implement the technical solution provided by the present application.
The memory 420 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), static storage, dynamic storage, etc. The memory 420 may store an operating system 421 for controlling the operation of the electronic device 400, and a Basic Input Output System (BIOS) for controlling the low-level operation of the electronic device 400. In addition, a web browser 423, a data storage management system 424, a text generation processing system 425, and the like may also be stored. The text generation processing system 425 may be an application program that embodies the operations of the steps described above in the embodiments of the present application. In general, when the technical solution provided by the present application is implemented by software or firmware, the relevant program code is stored in the memory 420 and invoked for execution by the processor 410.
The input/output interface 413 is used to connect to an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The network interface 414 is used to connect communication modules (not shown) to enable communication interactions of the device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 430 includes a path to transfer information between various components of the device (e.g., processor 410, video display adapter 411, disk drive 412, input/output interface 413, network interface 414, and memory 420).
It should be noted that although the above devices only show the processor 410, the video display adapter 411, the disk drive 412, the input/output interface 413, the network interface 414, the memory 420, the bus 430, etc., in the specific implementation, the device may include other components necessary to achieve normal operation. Furthermore, it will be appreciated by those skilled in the art that the apparatus may include only the components necessary to implement the present application, and not all of the components shown in the drawings.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The systems and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The training of the AI generation model, the text generation method and the electronic device provided by the application are described in detail, and specific examples are applied to explain the principle and implementation of the application, and the description of the examples is only used for helping to understand the method and core idea of the application, and meanwhile, the changes in the specific implementation and application range can be made by those skilled in the art according to the idea of the application. In view of the foregoing, this description should not be construed as limiting the application.
Claims (11)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510168186.XA | 2025-02-14 | 2025-02-14 | AI generation model training, text generation method and electronic device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510168186.XA | 2025-02-14 | 2025-02-14 | AI generation model training, text generation method and electronic device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120258167A (en) | 2025-07-04 |
Family
ID=96185971
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510168186.XA | AI generation model training, text generation method and electronic device (status: Pending) | 2025-02-14 | 2025-02-14 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120258167A (en) |
- 2025-02-14: CN application CN202510168186.XA filed; published as CN120258167A (status: Pending)
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20180018555A1 (en) | System and method for building artificial neural network architectures | |
| JP6521440B2 (en) | Neural network and computer program therefor | |
| JP7070653B2 (en) | Learning devices, speech recognition ranking estimators, their methods, and programs | |
| CN118313418A (en) | Neural network model training method and device | |
| WO2023142282A1 (en) | Task amplification-based transfer attack method and apparatus | |
| CN111667069A (en) | Pre-training model compression method and device and electronic equipment | |
| US11886832B2 (en) | Operation device and operation method | |
| CN113449840A (en) | Neural network training method and device and image classification method and device | |
| JP7596559B2 (en) | Neural network with feedforward spatial transformation units | |
| WO2024058797A1 (en) | Visual prompt tuning for generative transfer learning | |
| CN120258167A (en) | AI generation model training, text generation method and electronic device | |
| Lei et al. | Revisiting Fine-Tuning: A Survey of Parameter-Efficient Techniques for Large AI Models | |
| CN118037890A (en) | Text and image alignment method and device, electronic equipment and readable storage medium | |
| Abdylgahni et al. | An Improved Image Generation Conditioned on Text Using Stable Diffusion Model | |
| JP7297286B2 (en) | Optimization method, optimization program, reasoning method, and reasoning program | |
| CN111309875B (en) | Method, device, equipment and storage medium for answering questions | |
| CN119293204B (en) | An Improved Context-Aware Adaptive Contrast Decoding Method | |
| Chen et al. | Static correlative filter based convolutional neural network for visual question answering | |
| CN118428482B (en) | Data processing method, device and storage medium | |
| WO2020054402A1 (en) | Neural network processing device, computer program, neural network manufacturing method, neural network data manufacturing method, neural network use device, and neural network downscaling method | |
| CN113298248B (en) | Processing method and device for neural network model and electronic equipment | |
| Shi et al. | CNO-Former: Chaotic Neural Oscillatory Transformer for Social Media Text Generation | |
| WO2024138177A1 (en) | Recurrent interface networks | |
| Han et al. | LLaVA-GM: Lightweight LLaVA Multimodal Architecture | |
| KR20230080131A (en) | Named entity recognition apparatus and method thereof |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||