CN117056720A - Learning fine tuning method for pre-training language, computer device and computer readable storage medium - Google Patents
Learning fine tuning method for pre-training language, computer device and computer readable storage medium
- Publication number
- CN117056720A (Application CN202310940361.3A)
- Authority
- CN
- China
- Prior art keywords
- training
- model
- data set
- training model
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The present invention provides a learning fine-tuning method for a pre-trained language model, a computer device and a computer-readable storage medium. The method includes acquiring a pre-trained model and building a vertical data set for a vertical field, the vertical data set comprising multiple groups of logically related first training sentences; after the first training sentences are expanded and shuffled, the pre-trained model is trained to obtain an initial trained model. The first training sentences are then used to test the initial trained model, and whether the initial trained model is underfitting is judged from the accuracy of its answers to the first training sentences; whether the initial trained model is overfitting is also judged, and the expansion multiple of the first training sentences is adjusted according to the underfitting or overfitting result. The invention also provides a computer device and a computer-readable storage medium that implement the above method. The invention can prevent the trained model from overfitting or underfitting.
Description
Technical field
The present invention relates to the technical field of large-scale language model training. Specifically, it is a reinforcement-learning fine-tuning method for large-scale pre-trained language models; it also relates to a computer device and a computer-readable storage medium that implement the method.
Background art
Large-scale pre-trained language models are an artificial intelligence technology that has made breakthrough progress in recent years; the best-known example is OpenAI's GPT. Such models are pre-trained on large amounts of unlabeled text data and thereby learn rich language representations; by fine-tuning these pre-trained models, they can then be adapted to tasks or problems in a specific field.
In the field of reinforcement learning, researchers have proposed a variety of methods for combining pre-trained language models with reinforcement learning. For example, one commonly used approach is to apply a reward-signal-based reinforcement learning algorithm that fine-tunes the pre-trained model through interaction with an environment, allowing the model to learn and optimize itself on a specific task.
Regarding fine-tuning methods for specific fields, the existing literature proposes different techniques and strategies. For example, Smith et al., in their 2019 paper "Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems", introduced an offline reinforcement learning approach that can be used for fine-tuning without real-time interaction. In addition, Li et al., in their 2020 paper "Train Your Own Model (TYOM): A Self-Supervised Model for Speech Recognition", introduced a self-supervised fine-tuning method that uses automatically generated labels to fine-tune a pre-trained model for a specific speech recognition task.
Although large-scale pre-trained language models have achieved great success in natural language processing, they also have some problems and shortcomings. First, these models often require large amounts of computing resources and time for pre-training. Second, the pre-trained general-purpose representations may not directly capture the details and characteristics of a specific field. In addition, the fine-tuning process may face the challenge of insufficient domain-specific data, especially for certain vertical fields or tasks. These problems and difficulties limit the effectiveness and performance of large-scale pre-trained language models in domain-specific applications.
In summary, the core capabilities of current large-scale pre-trained language models come from large amounts of training data and enormous numbers of network parameters. While this scale brings intelligence, it also makes the models difficult to deploy locally and difficult to retrain. Against this background, fine-tuning is widely used to augment the knowledge of large-scale pre-trained models. However, there is currently no particularly effective and concrete way to carry out such fine-tuning.
The invention patent application with publication number CN115423118A discloses a method for fine-tuning a pre-trained language model. That method formulates text prompt templates with the initial pre-trained language model for each type of task, organizes the training data into batches and merges them, shuffles the order of the merged batch data, and fine-tunes the parameters of the pre-trained language model through multi-task learning. However, that method does not consider how to handle overfitting or underfitting of the model, and in particular does not test effectively for overfitting, so the trained model may overfit, which degrades the quality of the trained model.
Summary of the invention
The first object of the present invention is to provide a learning fine-tuning method for a pre-trained language model that can avoid overfitting or underfitting of the trained model.
The second object of the present invention is to provide a computer device that implements the above learning fine-tuning method for a pre-trained language model.
The third object of the present invention is to provide a computer-readable storage medium that implements the above learning fine-tuning method for a pre-trained language model.
To achieve the first object, the learning fine-tuning method for a pre-trained language model provided by the present invention includes: acquiring a pre-trained model and building a vertical data set for a vertical field, the vertical data set comprising multiple groups of logically related first training sentences; after the first training sentences are expanded and shuffled, training the pre-trained model to obtain an initial trained model; testing the initial trained model with the first training sentences, and judging from the accuracy of its answers to the first training sentences whether the initial trained model is underfitting; if the initial trained model is underfitting, increasing the expansion multiple of the first training sentences in the vertical data set and retraining the pre-trained model; building a common-sense data set containing multiple groups of second training sentences, adding the second training sentences to the vertical data set to form an incremental vertical data set, training the initial trained model with the incremental vertical data set to obtain an incremental trained model, and judging from the accuracy of the incremental trained model's answers to the training sentences in the incremental vertical data set whether the initial trained model is overfitting; if the initial trained model is overfitting, decreasing the expansion multiple of the first training sentences in the vertical data set and retraining the pre-trained model.
As can be seen from the above scheme, after the initial trained model has been obtained, it is tested for both overfitting and underfitting, and the expansion multiple is adjusted dynamically according to the result, so that a better trained model is obtained and the quality of the trained model is improved.
In addition, a common-sense data set is provided for the overfitting test. By testing the incremental trained model on the data set augmented with the second training sentences, it can be determined effectively whether the initial trained model is overfitting, so that the initial trained model can be adjusted accurately.
In a preferred scheme, when the vertical data set for the vertical field is built, the first training sentences are entered through a front-end interface, and the front-end interface has a prompt input box, a question input box and an answer input box.
In this way, during training the user can quickly enter the first training sentences through the front-end interface; the operation is simple and improves the training efficiency of the model.
In a further scheme, the front-end interface is also provided with a data import button, a data expansion button and a data shuffle button.
In this way, the user can use the data import button, data expansion button and data shuffle button to automatically expand and shuffle the first training sentences, improving testing efficiency.
In a still further scheme, when the common-sense data set is built, the second training sentences are entered through the front-end interface.
Thus the same front-end interface can also be used to enter the second training sentences, making the common-sense data set very simple and convenient to produce.
In a preferred scheme, the front-end interface is also provided with an underfitting detection button and an overfitting detection button; the method further includes: performing the underfitting detection operation when a signal that the underfitting detection button has been pressed is received, and performing the overfitting detection operation when a signal that the overfitting detection button has been pressed is received.
It can be seen that both the overfitting test and the underfitting test are triggered by the corresponding buttons on the front-end interface, so both tests can be carried out from the front-end interface and testing the initial trained model is very convenient.
In a preferred scheme, the front-end interface is also provided with a model export button; the method further includes: exporting the fine-tuned trained model when a signal that the model export button has been pressed is received.
In this way, once the trained model is confirmed to meet the requirements, that is, it is neither overfitting nor underfitting, the trained model can be exported quickly via the model export button.
In a further scheme, when the expansion multiple of the first training sentences is increased or decreased, the resulting expansion multiple is an integer.
In this way, the first training sentences and the second training sentences used for training are expanded by integer multiples, which gives a better training effect.
In a still further scheme, the first training sentences are in question-and-answer form; and/or the second training sentences are in question-and-answer form.
To achieve the second object, the computer device provided by the present invention includes a processor and a memory. The memory stores a computer program, and when the computer program is executed by the processor, the steps of the above learning fine-tuning method for a pre-trained language model are implemented.
To achieve the third object, the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above learning fine-tuning method for a pre-trained language model are implemented.
Description of the drawings
Figure 1 is the first part of the flow chart of an embodiment of the learning fine-tuning method for a pre-trained language model of the present invention.
Figure 2 is the second part of the flow chart of the embodiment of the learning fine-tuning method for a pre-trained language model of the present invention.
Figure 3 is a schematic diagram of the front-end interface in the embodiment of the learning fine-tuning method for a pre-trained language model of the present invention.
The present invention is further described below in conjunction with the accompanying drawings and embodiments.
Detailed description of the embodiments
The learning fine-tuning method for a pre-trained language model of the present invention is mainly applied to the training of large-scale pre-trained language models, in particular the training of large-scale pre-trained language models for vertical fields. The computer device provided by the present invention has a processor and a memory; the processor can execute a computer program and thereby implement the above learning fine-tuning method for a pre-trained language model.
Embodiment of the learning fine-tuning method for a pre-trained language model:
Referring to Figures 1 and 2, this embodiment first performs step S1: a pre-trained model that needs vertical-field training is obtained and downloaded to a local computer, for example stored in the local computer's memory. In addition, the designed fine-tuning algorithm model is deployed on the local computer, and the various dependency packages are installed on it. Then Low-Rank Adaptation (LoRA) is implemented on the local computer, so that a new add-on model can be formed while the original pre-trained model remains unchanged; this add-on model contains the vertical-field knowledge and is saved on the local computer in the form of parameters.
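As an illustration only (the patent does not provide code), the LoRA step described above could be sketched with the Hugging Face `peft` library roughly as follows; the local model path, the rank and the target modules are assumptions made for the sketch, not details taken from the patent.

```python
# Minimal sketch: attach a LoRA adapter to a locally stored pre-trained model
# so that the original weights stay frozen and only adapter parameters train.
# The path, rank and target modules below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("./local_pretrained_model")
tokenizer = AutoTokenizer.from_pretrained("./local_pretrained_model")

lora_cfg = LoraConfig(
    r=8,                                   # low-rank dimension (assumed value)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)     # base weights remain unchanged
model.print_trainable_parameters()         # only the LoRA parameters are trainable
```

The adapter weights can then be saved separately from the base model, which matches the description of vertical-field knowledge being kept as an additional set of parameters on the local computer.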
Then step S2 is performed: the vertical-field data that needs training is produced to form a vertical data set. When building the vertical data set, a variety of knowledge of the vertical field needs to be prepared, and the accuracy of the training data must be ensured. After the data for training has been determined, it is turned into logically related statements; for example, each item of training data is made into a question and an answer. Further, a description related to the question and answer of each item can be placed, in the form of prompt words or instructions, before the question-answer pair, and the items are put in order into a json file with a predetermined format.
Preferably, the process of organizing the data into a standard json file is completed directly in this embodiment by designing a standard front-end interface and back-end program: the user only needs to enter the prompt words, question and standard answer at the corresponding positions of the front-end interface and click to generate the json file.
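The exact json layout is not spelled out in the patent; a plausible minimal record format, together with a helper such as the back-end program might use to append one prompt/question/answer entry, is sketched below. The field names "instruction", "question" and "answer" and the file name are assumptions for illustration.

```python
# Hypothetical layout of one vertical data set record and a helper to append it.
import json

def append_record(path, prompt, question, answer):
    """Append one prompt/question/answer entry to the data-set json file."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            records = json.load(f)
    except FileNotFoundError:
        records = []                      # first record: start a new file
    records.append({"instruction": prompt, "question": question, "answer": answer})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)

append_record("vertical_dataset.json",
              "Answer as a domain expert.",
              "What is the standard operating pressure of unit A?",
              "Unit A operates at 2.5 MPa under normal conditions.")
```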
Referring to Figure 3, the front-end interface is provided with several buttons and with three input boxes, namely a prompt input box 11, a question input box 12 and an answer input box 13. The name of each input box is displayed above it, so that users can correctly identify each input box and enter the correct information in it.
In addition, the front-end interface is provided with six buttons: a data import button 21, a data expansion button 22, a data shuffle button 23, an underfitting detection button 24, an overfitting detection button 25 and a model export button 26. The data import button 21 is used to import model data, for example to import the pre-trained model into the test system. The data expansion button 22 is used to expand the first training sentences and the second training sentences; preferably, the expansion multiple is set by the user, for example 5 or 6 times. Preferably, the expansion multiple is an integer, that is, the first training sentences and the second training sentences are expanded by integer multiples. The data shuffle button 23 is used to shuffle the expanded data, that is, to put the expanded data in random order, thereby expanding the data set.
The underfitting detection button 24 is used to issue an instruction for underfitting detection; when the user clicks this button, the system automatically tests whether the initial trained model is underfitting. The overfitting detection button 25 is used to issue an instruction for overfitting detection; when the user clicks this button, the system automatically tests whether the initial trained model is overfitting. The model export button 26 is used to export the tested model, and the exported model is saved in the memory of the local computer.
Referring to Figure 1, after the multiple groups of logically related first training sentences of the vertical data set have been obtained, step S3 is performed: the first training sentences are expanded and shuffled. Preferably, a default expansion multiple can be set; for example, with a default expansion multiple of 5 and 1,000 question-answer pairs in the initial first training sentences of the vertical data set, the data are expanded 5 times to form 5,000 question-answer pairs, although only 1,000 distinct items of content are involved. In step S3 the expanded first training sentences are also shuffled and then fed into the fine-tuning model for training.
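A minimal sketch of this expansion-and-shuffle step, assuming the data set is held as a list of question-answer dictionaries, could look as follows; the default multiple of 5 mirrors the example in the text.

```python
import random

def expand_and_shuffle(pairs, multiple=5, seed=None):
    """Duplicate each Q&A pair `multiple` times, then shuffle the result.

    With 1,000 pairs and multiple=5 this yields 5,000 training items that
    still cover only the original 1,000 distinct contents, as in the example.
    """
    expanded = [dict(p) for p in pairs for _ in range(multiple)]
    rng = random.Random(seed)     # optional seed for reproducible shuffling
    rng.shuffle(expanded)
    return expanded
```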
Next, step S4 is performed: the expanded and shuffled first training sentences are used for training to obtain an initial trained model. After the initial trained model has been obtained, it is necessary to judge whether it is overfitting or underfitting after the initial training, and to fine-tune the initial trained model according to the test results.
Then step S5 is performed: the initial trained model is tested for underfitting. Specifically, a certain proportion of the data is first selected at random from the vertical data set, for example 10% of the data is selected at random as the underfitting test data; the questions in it are extracted and fed into the trained initial trained model, and the model's answers to the input test questions are obtained. Each obtained answer is compared with the standard answer in the first training sentences, and the similarity between the answer generated by the initial trained model and the standard answer is judged; if the similarity is higher than 90%, the initial trained model is considered to have produced the correct answer. In this way, the answers of the initial trained model to the randomly selected 10% of questions are checked one by one, and the number whose similarity to the standard answer exceeds 90% is counted; if the proportion of such answers exceeds a preset threshold, for example 95%, the initial trained model is considered not to be underfitting.
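The underfitting check could be sketched as below. The similarity function is left abstract because the patent does not name a specific metric; an embedding-based or edit-distance-based score are both plausible assumptions, and the answer function stands in for querying the initial trained model.

```python
import random

def is_underfitting(model_answer_fn, qa_pairs, similarity_fn,
                    sample_ratio=0.10, sim_threshold=0.90, pass_ratio=0.95):
    """Return True if the initial trained model appears to underfit.

    model_answer_fn(question) -> the model's answer string (assumed interface)
    similarity_fn(a, b)       -> similarity in [0, 1]; the metric is an assumption
    """
    sample = random.sample(qa_pairs, max(1, int(len(qa_pairs) * sample_ratio)))
    correct = sum(
        1 for qa in sample
        if similarity_fn(model_answer_fn(qa["question"]), qa["answer"]) > sim_threshold
    )
    # Underfitting is suspected when fewer than 95% of sampled answers pass.
    return correct / len(sample) < pass_ratio
```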
Therefore, in step S6 it is necessary to judge whether the initial trained model is underfitting. If the initial trained model is not underfitting, step S9 is executed; if it is underfitting, this indicates that the expansion multiple of the first training sentences in step S3 was not large enough, and step S7 is executed to increase the expansion multiple of the first training sentences. If the expansion multiple was initially 5, step S7 may set it to 6 or 7, and step S8 is then executed to shuffle the expanded first training sentences again and to train the pre-trained model again.
In step S9, the initial trained model is tested for overfitting. In this embodiment, a common-sense data set is built; it contains multiple second training sentences, which are common-sense statements also presented as questions and answers, for example the question "How large is China's land area?". Preferably, the size of the second training sentences is similar to that of the first training sentences: if there are 1,000 first training sentences, the number of second training sentences is also about 1,000, preferably between 900 and 1,100.
After the common-sense data set has been built, its second training sentences are added to the vertical data set to form an incremental vertical data set, and the incremental vertical data set is used to train the initial trained model to obtain an incremental trained model. Then a certain proportion of data is selected at random from the incremental vertical data set as the overfitting test data, and the similarity between the answers generated by the incremental trained model and the standard answers is judged; if the similarity is greater than 90%, the incremental trained model's answer is considered correct. Finally, the proportion of correct answers is counted; if it is greater than a preset threshold, the initial trained model is considered not to be overfitting, otherwise the initial trained model is considered to be overfitting.
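Analogously, the overfitting test on the incremental vertical data set might be organized as in the sketch below; the training and evaluation callbacks are assumed interfaces standing in for the steps described above, and the 0.95 pass ratio is an illustrative threshold.

```python
def check_overfitting(train_fn, evaluate_fn, vertical_pairs, common_sense_pairs,
                      pass_ratio=0.95):
    """Sketch of the overfitting test; concrete details are assumptions.

    train_fn(dataset)      -> incremental model trained on the merged data set
    evaluate_fn(model, ds) -> fraction of sampled answers whose similarity to
                              the standard answer exceeds 0.9
    """
    incremental_set = vertical_pairs + common_sense_pairs   # incremental vertical data set
    incremental_model = train_fn(incremental_set)            # further train the initial model
    accuracy = evaluate_fn(incremental_model, incremental_set)
    return accuracy <= pass_ratio                             # True: overfitting suspected
```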
In step S10, if the result of the judgment is yes, indicating that the initial trained model is overfitting, step S11 is executed to reduce the expansion multiple of the first training sentences. For example, if the initial expansion multiple of the first training sentences was 5, step S11 sets it to 3 or 4, and step S12 is then executed to shuffle the expanded first training sentences again and to train the pre-trained model again.
If the result of the judgment in step S10 is no, indicating that the initial trained model is neither overfitting nor underfitting, step S13 is executed to output the trained initial trained model, which is taken as the final trained model and saved in the local memory of the local computer.
The present invention proposes underfitting and overfitting tests for the initial trained model. In particular, for the overfitting test, the vertical data set is augmented with common-sense second test sentences, so that it can be determined effectively whether the initial trained model is overfitting. Furthermore, in the case of underfitting or overfitting, the initial trained model is adjusted by increasing or decreasing the expansion multiple, so that the quality of the finally obtained trained model is better.
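Pulling the pieces together, the adaptive adjustment of the expansion multiple could be orchestrated roughly as follows. The step numbers in the comments mirror the flow chart, while the training and test callbacks are assumed interfaces rather than the patent's own code; the cap on rounds is an assumption added to keep the sketch terminating.

```python
import random

def fine_tune_with_adaptive_expansion(pretrained, vertical_pairs, common_sense_pairs,
                                      train_fn, underfit_fn, overfit_fn,
                                      multiple=5, max_rounds=10):
    """Illustrative loop: retrain while adjusting the expansion multiple
    until the model is judged neither underfitting nor overfitting."""
    model = None
    for _ in range(max_rounds):
        data = [dict(p) for p in vertical_pairs for _ in range(multiple)]
        random.shuffle(data)                                      # expand and shuffle (S3/S8/S12)
        model = train_fn(pretrained, data)                        # train on expanded data (S4)
        if underfit_fn(model, vertical_pairs):                    # underfitting test (S5/S6)
            multiple += 1                                         # increase the multiple (S7)
            continue
        if overfit_fn(model, vertical_pairs, common_sense_pairs): # overfitting test (S9/S10)
            multiple = max(1, multiple - 1)                       # decrease the multiple (S11)
            continue
        break                                                     # neither case: export (S13)
    return model
```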
In addition, the present invention also builds a front-end interface through which the user can quickly enter the first training sentences and the second training sentences and can quickly expand, shuffle and otherwise process the training sentences in the data set; by clicking the underfitting detection button 24, the overfitting detection button 25 and the model export button 26, the underfitting test, the overfitting detection and the export of the trained model can be carried out, which is very convenient to operate.
Computer device embodiment:
The computer device of this embodiment may be an intelligent terminal device or a desktop computer. The computer device has a processor, a memory, and a computer program that is stored in the memory and can run on the processor, for example an information processing program for implementing the above method; when the processor executes the computer program, the steps of the above learning fine-tuning method for a pre-trained language model are implemented.
For example, the computer program may be divided into one or more modules, which are stored in the memory and executed by the processor to carry out the present invention. The one or more modules may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution of the computer program in the terminal device.
It should be noted that the terminal device may be a computing device such as a desktop computer, a notebook, a handheld computer or a cloud server. The terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the schematic diagram of the present invention is merely an example of a terminal device and does not limit the terminal device, which may include more or fewer components than shown, or combine certain components, or have different components; for example, the terminal device may also include input and output devices, network access devices, buses and the like.
The processor referred to in the present invention may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the terminal device and connects the various parts of the entire terminal device using various interfaces and lines.
The memory may be used to store the computer program and/or modules; the processor implements the various functions of the terminal device by running or executing the computer program and/or modules stored in the memory and by calling data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required for at least one function, and the data storage area may store data created according to use of the device. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
Computer-readable storage medium embodiment:
If the computer program stored in the computer device is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the processes of the methods of the above embodiments by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor it implements the steps of the above learning fine-tuning method for a pre-trained language model.
The computer program includes computer program code, which may be in source code form, object code form, an executable file, certain intermediate forms, and so on. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in a computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, a computer-readable medium does not include electrical carrier signals and telecommunication signals.
Finally, it should be emphasized that the above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various changes and modifications; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310940361.3A CN117056720A (en) | 2023-07-27 | 2023-07-27 | Learning fine tuning method for pre-training language, computer device and computer readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117056720A (en) | 2023-11-14 |
Family
ID=88659974
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310940361.3A Pending CN117056720A (en) | 2023-07-27 | 2023-07-27 | Learning fine tuning method for pre-training language, computer device and computer readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117056720A (en) |
- 2023-07-27: CN202310940361.3A filed in China; published as CN117056720A (status: pending)
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112632179A (en) * | 2019-09-24 | 2021-04-09 | 北京国双科技有限公司 | Model construction method and device, storage medium and equipment |
| CN111310934A (en) * | 2020-02-14 | 2020-06-19 | 北京百度网讯科技有限公司 | Model generation method and device, electronic equipment and storage medium |
| CN115917553A (en) * | 2020-06-12 | 2023-04-04 | 甲骨文国际公司 | Entity-level data augmentation to enable robust named entity recognition in chat robots |
| CN114330297A (en) * | 2021-11-30 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Language model pre-training method, language text processing method and device |
| CN115080021A (en) * | 2022-05-13 | 2022-09-20 | 北京思特奇信息技术股份有限公司 | Zero code modeling method and system based on automatic machine learning |
| CN115423118A (en) * | 2022-09-06 | 2022-12-02 | 中国人民解放军军事科学院系统工程研究院 | Method, system and device for fine tuning of pre-training language model |
| CN115689770A (en) * | 2022-11-01 | 2023-02-03 | 中国银行股份有限公司 | Construction method of asset hosting wind control model, risk assessment method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |