
CN114757210A - Training method of translation model, sentence translation method, apparatus, device, program

Info

Publication number
CN114757210A
CN114757210A (application number CN202210220466.7A)
Authority
CN
China
Prior art keywords
translation
sentence
translation model
training
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210220466.7A
Other languages
Chinese (zh)
Inventor
程信
严睿
刘乐茂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Renmin University of China
Original Assignee
Tencent Technology Shenzhen Co Ltd
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd, Renmin University of China filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210220466.7A priority Critical patent/CN114757210A/en
Publication of CN114757210A publication Critical patent/CN114757210A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/47 Machine-assisted translation, e.g. using translation memory
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method for training a translation model, comprising: obtaining, from a translation memory, the source-end sentences corresponding to target translation memory sentences; composing training samples from each source-end sentence and its target translation memory sentence, and assembling the different training samples into a training sample set; processing the different training samples in the training sample set through the translation model to determine update parameters for the translation model; and iteratively updating the encoder parameters and decoder parameters of the translation model, according to the update parameters, using the different training samples in the training sample set. The invention further provides an apparatus, a device, a software program, and a storage medium. The method yields a trained translation model with higher accuracy and a better translation effect, and the embodiments of the invention can also be applied to scenarios such as cloud technology, artificial intelligence, intelligent transportation, and assisted driving.

Description

Training method of translation model, sentence translation method, apparatus, device, program

Technical Field

The present invention relates to machine translation (MT) technology, and in particular to a training method for a translation model, a sentence translation method, an apparatus, a device, a software program, and a storage medium.

Background Art

At present, people often need to translate text or speech in work and daily life. In general, dedicated translation applications or translation web pages can be used for machine translation (MT), but machine translation sometimes produces errors. Therefore, when machine translation is used in industry, combining it with computer-aided translation (CAT) is a widely adopted practice. With the progress and improvement of MT systems, various efficient CAT interaction methods have emerged.

With the development of machine translation, neural machine translation (NMT) has become widely used as a new generation of translation technology. A neural machine translation system is built on the encoder-decoder framework; during translation, the decoder handles multiple tasks at once, such as tracking what has been translated and what remains to be translated, and maintaining information about the fluency of the translation. A translation memory (TM) is a database that stores paired source-language and target-language segments. Translators can consult this database while translating to improve translation efficiency and consistency. In the machine translation community, early work mainly focused on incorporating translation memories into statistical machine translation models. In recent years, as neural machine translation models have achieved excellent results on various translation tasks, a growing body of research aims to integrate translation memories into NMT models. However, complex model structures and redundant translation memories degrade both the training accuracy and the training speed of the translation model, which hinders its widespread adoption.

Summary of the Invention

In view of this, embodiments of the present invention provide a training method, apparatus, device, software program, and storage medium for a translation model, which can reduce the complexity of the translation model. By selecting, through contrastive retrieval, translation memory sentences similar to the sentence to be translated, the method avoids the complex network structure introduced by the additional memory networks of the related art, which slows training and lengthens translation time in use. Meanwhile, to address redundancy among translation memory sentences, translation memory fusion uses an attention mechanism to capture the similarity between different translation memories, preserving the diversity of the translation memories (that is, the diversity of the training samples), so that the trained translation model is more accurate, translates better, and improves the user experience.

The technical solutions of the embodiments of the present invention are implemented as follows:

An embodiment of the present invention provides a method for training a translation model, the training method comprising:

obtaining target translation memory sentences;

obtaining, from a translation memory, the source-end sentence corresponding to each target translation memory sentence;

composing a training sample from the source-end sentence and each target translation memory sentence, and assembling the different training samples into a training sample set;

obtaining initial parameters of the translation model;

in response to the initial parameters of the translation model, processing the different training samples in the training sample set through the translation model to determine update parameters for the translation model; and

iteratively updating the encoder parameters and decoder parameters of the translation model, according to the update parameters, using the different training samples in the training sample set.
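
For illustration only, the following minimal Python sketch mirrors the claimed loop: pairing each target translation memory sentence with its source-end sentence to build the training set, then iteratively updating a parameter against a loss. The function names, the toy scalar parameter, and the stand-in objective are all hypothetical; the claim does not specify an implementation.

```python
from typing import Callable, Dict, List, Tuple

def build_training_set(tm_db: Dict[str, str],
                       target_tm_sentences: List[str]) -> List[Tuple[str, str]]:
    # Pair each target translation memory sentence with its source-end
    # sentence looked up in the translation memory (steps 1-3 of the claim).
    return [(tm_db[tgt], tgt) for tgt in target_tm_sentences]

def train(samples: List[Tuple[str, str]],
          loss_fn: Callable[[str, str, float], float],
          init_param: float = 0.0, lr: float = 0.1, steps: int = 200) -> float:
    # Iteratively update a (here: single scalar) parameter on the sample set;
    # in the real model this would update encoder and decoder parameters.
    param, eps = init_param, 1e-6
    for _ in range(steps):
        for src, tgt in samples:
            grad = (loss_fn(src, tgt, param + eps) - loss_fn(src, tgt, param)) / eps
            param -= lr * grad
    return param

tm_db = {"我想吃薯条": "I want to eat French fries"}   # hypothetical TM entry
samples = build_training_set(tm_db, ["我想吃薯条"])
toy_loss = lambda src, tgt, p: (p - len(tgt)) ** 2     # stand-in objective
print(train(samples, toy_loss))                        # converges near len(tgt)
```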

An embodiment of the present invention further provides a sentence translation method, comprising:

determining, through the encoder of a translation model, at least one word-level latent variable corresponding to the sentence to be translated;

generating, through the decoder of the translation model and according to the at least one word-level latent variable, the translation words corresponding to the word-level latent variables and the selection probability of each translation word;

selecting, according to the selection probabilities, at least one translation word to compose the translation result corresponding to the sentence to be translated; and

outputting the translation result.
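
As one plausible reading of this decoding step (the claim does not prescribe greedy selection), the following self-contained Python sketch picks the most probable translation word at every step and assembles the translation result; the per-step distributions are hypothetical.

```python
from typing import Dict, List

def greedy_decode(step_probs: List[Dict[str, float]]) -> List[str]:
    # At each step the decoder emits candidate translation words with
    # selection probabilities; keep the most probable word per step.
    return [max(probs, key=probs.get) for probs in step_probs]

# Hypothetical per-step distributions produced by the decoder:
steps = [{"I": 0.9, "We": 0.1},
         {"want": 0.7, "like": 0.3},
         {"to": 0.95, "a": 0.05},
         {"eat": 0.8, "drink": 0.2}]
print(" ".join(greedy_decode(steps)))  # -> "I want to eat"
```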

An embodiment of the present invention further provides a training apparatus for a translation model, the training apparatus comprising:

a data transmission module, configured to obtain target translation memory sentences;

a translation model training module, configured to obtain, from a translation memory, the source-end sentence corresponding to each target translation memory sentence;

the translation model training module, configured to compose training samples from the source-end sentences and the target translation memory sentences, and to assemble the different training samples into a training sample set;

the translation model training module, configured to obtain initial parameters of the translation model;

the translation model training module, configured to, in response to the initial parameters of the translation model, process the different training samples in the training sample set through the translation model to determine update parameters for the translation model;

the translation model training module, configured to iteratively update the encoder parameters and decoder parameters of the translation model, according to the update parameters, using the different training samples in the training sample set.

In the above solution,

the translation model training module is configured to obtain the maximum length of the sentence to be translated and the maximum length of any translation memory sentence;

the translation model training module is configured to obtain the token distance between the sentence to be translated and that translation memory sentence;

the translation model training module is configured to determine the similarity between the sentence to be translated and that translation memory sentence based on the token distance, the maximum length of the sentence to be translated, and the maximum length of that translation memory sentence;

the translation model training module is configured to determine, when the similarity is greater than or equal to a similarity threshold, that the translation memory sentence is an original translation memory sentence corresponding to the sentence to be translated.
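
This is the standard fuzzy-match score used with translation memories: one minus the token-level edit distance, normalised by the longer of the two sentences. A self-contained Python sketch under that reading (the whitespace tokenisation is illustrative):

```python
from typing import List

def edit_distance(a: List[str], b: List[str]) -> int:
    # Levenshtein distance over tokens, single-row dynamic programming.
    dp = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, tb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ta != tb))  # substitution
    return dp[len(b)]

def fuzzy_similarity(x: List[str], tm: List[str]) -> float:
    # sim = 1 - dist / max(|x|, |tm|); the TM sentence is kept as an
    # original translation memory sentence when sim >= the threshold.
    return 1.0 - edit_distance(x, tm) / max(len(x), len(tm))

print(fuzzy_similarity("我 想 吃 汉堡".split(), "我 喜欢 吃 汉堡".split()))  # 0.75
```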

In the above solution,

the translation model training module is configured to compute, through an attention function, the attention value corresponding to each translation memory sentence;

the translation model training module is configured to fuse translation memory sentences with the same attention value into a single translation memory sentence; or

the translation model training module is configured to fuse translation memory sentences with the same attention value into different training samples of a training sample subset.
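
A sketch of how such a fusion step could look, assuming sentence vectors are already available; the scaled dot-product attention and the rounding used to compare attention values are illustrative choices, not taken from this application:

```python
import math
from typing import List

def attention_values(query: List[float], keys: List[List[float]]) -> List[float]:
    # Softmax-normalised scaled dot-product scores of the query (the sentence
    # to be translated) over the translation memory sentence vectors.
    d = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / d for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fuse_equal_attention(tm_sentences: List[str], values: List[float],
                         ndigits: int = 6) -> List[str]:
    # Translation memory sentences with the same attention value are fused
    # into a single sentence, removing redundant memories.
    seen, fused = set(), []
    for sent, v in zip(tm_sentences, values):
        key = round(v, ndigits)
        if key not in seen:
            seen.add(key)
            fused.append(sent)
    return fused

vals = attention_values([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(fuse_equal_attention(["tm-a", "tm-a-duplicate", "tm-b"], vals))
```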

In the above solution,

the translation model training module is configured to determine a dynamic noise threshold matching the usage environment of the translation model;

the translation model training module is configured to denoise the training sample set according to the dynamic noise threshold, so as to form a denoised training sample set matching the dynamic noise threshold; or,

the translation model training module is configured to determine a fixed noise threshold corresponding to the translation model, and to denoise the training sample set according to the fixed noise threshold, so as to form a denoised training sample set matching the fixed noise threshold.
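
A minimal sketch of threshold-based denoising; the noise score shown (relative source/target length mismatch) is only a placeholder, since the text does not define how noise is measured:

```python
from typing import Callable, List, Tuple

def denoise(samples: List[Tuple[str, str]],
            noise_score: Callable[[str, str], float],
            threshold: float) -> List[Tuple[str, str]]:
    # Keep only samples whose noise score is below the threshold; the
    # threshold may be fixed, or chosen dynamically to match the model's
    # usage environment.
    return [(s, t) for s, t in samples if noise_score(s, t) < threshold]

def length_mismatch(src: str, tgt: str) -> float:
    # Placeholder noise score: relative difference in token counts.
    ls, lt = len(src.split()), len(tgt.split())
    return abs(ls - lt) / max(ls, lt)

pairs = [("我 想 吃 汉堡", "I want to eat a hamburger"),
         ("噪声", "noise noise noise noise")]
print(denoise(pairs, length_mismatch, threshold=0.5))  # keeps only the first pair
```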

In the above solution,

the translation model training module is configured to perform negative-example processing on the training sample set to form a negative sample set corresponding to the training sample set, where the negative sample set is used to adjust the encoder parameters and decoder parameters of the translation model.

In the above solution,

the translation model training module is configured to determine the training sample set;

the translation model training module is configured to determine a supervision function corresponding to the translation model;

the translation model training module is configured to adjust the temperature coefficient of the supervision function;

the translation model training module is configured to perform negative-example processing on the training sample set through the supervision function, based on the vector similarity between any two translation memory sentences in the training set and the different temperature coefficients, so as to form a negative sample set corresponding to the training sample set.
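
The supervision function is left unspecified here; one common instance of a temperature-scaled contrastive objective over sentence vectors is the InfoNCE loss, sketched below as a concrete possibility rather than the function claimed:

```python
import math
from typing import List

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def info_nce(anchor: List[float], positive: List[float],
             negatives: List[List[float]], tau: float = 0.1) -> float:
    # Temperature-scaled contrastive supervision: a smaller temperature tau
    # sharpens the separation between the positive pair and the negatives.
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / tau) for s in sims]
    return -math.log(exps[0] / sum(exps))

anchor, pos = [1.0, 0.0], [0.9, 0.1]
negs = [[0.0, 1.0], [-1.0, 0.0]]
print(info_nce(anchor, pos, negs, tau=0.1))
```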

In the above solution,

the translation model training module is configured to randomly combine the sentences to be output by the decoder of the translation model, so as to form a negative sample set corresponding to the training sample set; or,

the translation model training module is configured to randomly delete or replace the sentences to be output by the decoder of the translation model, so as to form a negative sample set corresponding to the training sample set.
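
A sketch of these negative-example constructions applied to a decoder output sentence; the mix of operations and the sampling are illustrative:

```python
import random
from typing import List

def make_negatives(tokens: List[str], vocab: List[str],
                   n: int = 3, seed: int = 0) -> List[List[str]]:
    # Build negative examples by randomly shuffling, deleting, or replacing
    # tokens of a sentence to be output by the decoder.
    rng = random.Random(seed)
    negatives = []
    for _ in range(n):
        neg = tokens[:]
        op = rng.choice(["shuffle", "delete", "replace"])
        if op == "shuffle":
            rng.shuffle(neg)
        elif op == "delete" and len(neg) > 1:
            del neg[rng.randrange(len(neg))]
        else:  # replace one token with a random vocabulary word
            neg[rng.randrange(len(neg))] = rng.choice(vocab)
        negatives.append(neg)
    return negatives

print(make_negatives("I want to eat".split(), vocab=["pizza", "sleep", "run"]))
```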

In the above solution,

the translation model training module is configured to substitute the different training samples of the training sample set into the loss function corresponding to the auto-encoding network formed by the encoder and the decoder of the translation model;

the translation model training module is configured to take, when the loss function satisfies a convergence condition, the corresponding encoder parameters and decoder parameters of the translation model as the update parameters of the translation model.

An embodiment of the present invention further provides a sentence translation apparatus, the apparatus comprising:

an encoder module, configured to determine, through the encoder of a translation model, at least one word-level latent variable corresponding to the sentence to be translated;

a decoder module, configured to generate, through the decoder of the translation model and according to the at least one word-level latent variable, the translation words corresponding to the word-level latent variables and the selection probability of each translation word;

the decoder module, configured to select, according to the selection probabilities, at least one translation word to compose a translation result corresponding to the sentence to be translated;

the decoder module, configured to output the translation result.

An embodiment of the present invention further provides an electronic device, the electronic device comprising:

a memory, configured to store executable instructions;

a processor, configured to implement the aforementioned method for training a translation model when executing the executable instructions stored in the memory.

An embodiment of the present invention further provides a computer-readable storage medium storing executable instructions, wherein the executable instructions, when executed by a processor, implement the aforementioned method for training a translation model or the aforementioned sentence translation method.

The embodiments of the present invention have the following beneficial effects:

The technical solution provided by the present invention obtains a sentence to be translated and retrieves at least two original translation memory sentences through contrastive retrieval based on that sentence; performs translation memory fusion on the retrieved original translation memory sentences to obtain target translation memory sentences; obtains, for each target translation memory sentence, the corresponding source-end sentence from the translation memory; composes training samples from the source-end sentences and the target translation memory sentences, and assembles the different training samples into a training sample set; obtains initial parameters of the translation model; in response to the initial parameters, processes the different training samples in the training sample set through the translation model to determine update parameters; and, according to the update parameters, iteratively updates the encoder and decoder parameters of the translation model using the different training samples in the training sample set. By selecting, through contrastive retrieval, translation memory sentences similar to the sentence to be translated, the method avoids the complex network structure introduced by the additional memory networks of the related art, which slows training and lengthens translation time in use. Meanwhile, to address redundancy among translation memory sentences, translation memory fusion uses an attention mechanism to capture the similarity between different translation memories and preserves their diversity (that is, the diversity of the training samples), so that the trained translation model is more accurate and translates better, improving the user experience. At the same time, the gains that existing translation memory sentences bring to model training can be exploited effectively and fully, so that the translation model can adapt to different usage scenarios.

Description of the Drawings

FIG. 1 is a schematic diagram of a usage scenario of the translation model training method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of the composition and structure of the translation model training apparatus provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of generating a translation result in a conventional scheme;

FIG. 4 is an optional schematic flowchart of the translation model training method provided by an embodiment of the present invention;

FIG. 5 is an optional schematic flowchart of the translation model training method provided by an embodiment of the present invention;

FIG. 6 is an optional schematic flowchart of the translation model training method provided by an embodiment of the present invention;

FIG. 7 is a schematic diagram of accuracy test results for the translation model;

FIG. 8 is a schematic diagram of training efficiency test results for the translation model;

FIG. 9 is a schematic diagram of test results for different numbers of translation memory sentences in the translation model training method of the present application;

FIG. 10 is a schematic diagram of a front-end display interface of the translation model provided by an embodiment of the present invention;

FIG. 11 is an optional schematic structural diagram of the translation model in an embodiment of the present invention;

FIG. 12 is a schematic diagram of an optional translation process of the translation model in an embodiment of the present invention;

FIG. 13 is an optional schematic structural diagram of the encoder in the translation model in an embodiment of the present invention;

FIG. 14 is a schematic diagram of vector concatenation in the encoder of the translation model in an embodiment of the present invention;

FIG. 15 is a schematic diagram of the encoding process of the encoder in the translation model in an embodiment of the present invention;

FIG. 16 is a schematic diagram of the decoding process of the decoder in the translation model in an embodiment of the present invention;

FIG. 17 is a schematic diagram of the output effect of the translation model in an embodiment of the present invention.

Detailed Description

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present invention; all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments. It should be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with one another where no conflict arises.

Before the embodiments of the present invention are described in further detail, the nouns and terms involved in the embodiments are explained; the following interpretations apply to them.

1) Sentence to be translated: a sentence in some natural language that is input to the translation model before language conversion.

2) Translation result: a sentence in some natural language output by the translation model after language conversion of the source sentence.

3) Reference sentence: a preset reference standard, in some natural language, for the result of language conversion of the source sentence.

4) Faithfulness: a parameter between 0 and 1 characterizing how close the content of the target sentence is to the content of the source sentence, used as a criterion for evaluating translation accuracy; the larger the value, the closer the target sentence is to the source sentence, i.e. the more accurate the translation.

5) Translation: converting a sentence in one natural language into a sentence in another natural language.

6) Neural network (NN): an artificial neural network (ANN), neural network or neural-like network for short, is, in the fields of machine learning and cognitive science, a mathematical or computational model that imitates the structure and function of biological neural networks (the central nervous system of animals, especially the brain) and is used to estimate or approximate functions.

7) Machine translation (MT): a branch of computational linguistics that studies the translation of text or speech from one natural language into another by means of computer programs. Neural machine translation (NMT) is machine translation performed with neural network technology.

9) Speech translation: also known as automatic speech translation, a technology that uses a computer to translate speech in one natural language into text or speech in another natural language; it generally consists of two stages, speech recognition and machine translation.

10) Encoder-decoder structure: the network structure commonly used in machine translation. It consists of two parts, an encoder and a decoder: the encoder converts the input text into a series of context vectors that express the features of the input text, and the decoder receives the encoder's output as its own input and outputs the corresponding text sequence in another language.

11) Translation memory: a database recording natural-language sentences (or sentence fragments) and their translations. As the core component of a computer-aided translation system, it is built up and expanded as users work, and serves to eliminate repeated translation and improve efficiency. For example, a translation memory may store bilingual sentence pairs (sample pairs), which may be translated manually or collected by other means (for example, translation information in web pages collected by a web crawler). If the sentence to be translated provided by the user is "我想吃汉堡" ("I want to eat a hamburger"), the corresponding original translation memory sentences "我喜欢吃汉堡" and "我想吃薯条" can be retrieved from the translation memory, together with the source-end sentences "I like eat Hamburger" and "I want to eat French fries" corresponding to them. The retrieved pairs {我喜欢吃汉堡, I like eat Hamburger} and {我想吃薯条, I want to eat French fries} can then be encoded into the translation model to guide its decoding, yielding the translation result "I want to eat Hamburger" for the sentence to be translated. In some embodiments, the source-end sentences and target translation memory sentences can also be composed into training samples to train the translation model and improve its translation accuracy.
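
A toy version of this lookup, mirroring the hamburger example above; the character-overlap ranking is a crude stand-in for whatever similarity measure a real system would use, and the TM contents are hypothetical:

```python
from typing import Dict, List, Tuple

# Hypothetical translation memory: target TM sentence -> source-end sentence,
# mirroring the example above (including its slightly broken English).
TM: Dict[str, str] = {
    "我喜欢吃汉堡": "I like eat Hamburger",
    "我想吃薯条": "I want to eat French fries",
}

def retrieve(query: str, tm: Dict[str, str], k: int = 2) -> List[Tuple[str, str]]:
    # Rank entries by character overlap with the sentence to be translated.
    return sorted(tm.items(),
                  key=lambda kv: len(set(query) & set(kv[0])),
                  reverse=True)[:k]

print(retrieve("我想吃汉堡", TM))
```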

FIG. 1 is a schematic diagram of a usage scenario of the translation model training method provided by an embodiment of the present invention. Referring to FIG. 1, the terminals (including terminal 10-1 and terminal 10-2) are provided with clients of translation software, through which users can input the sentences to be translated; the clients can also receive the corresponding translation results and display them to the users. The terminals are connected to the server 200 through the network 300, which may be a wide area network, a local area network, or a combination of the two, using wireless links for data transmission.

As an example, the server 200 is configured to deploy the translation model and train it, so as to update the parameters of the encoder network and the decoder network in the translation model, generate, through those networks, translation results for the target sentences to be translated, and display, through the terminals (terminal 10-1 and/or terminal 10-2), the translation results that the translation model generates for the sentences to be translated. To better understand the method provided by the embodiments of the present application, artificial intelligence and its branches, together with the application fields, cloud technology, and artificial intelligence cloud services involved in the method, are described first.

Artificial intelligence (AI) is the theory, methodology, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.

Artificial intelligence technology is a comprehensive discipline involving a wide range of fields, with both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning; each direction is described below.

Natural language processing (NLP) is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field involves natural language, the language people use daily, so it is closely related to the study of linguistics. Natural language processing technology usually includes text processing, semantic understanding, machine translation, question-answering robots, knowledge graphs, and other technologies.

Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers simulate or implement human learning behaviors to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications span all fields of AI. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.

Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources within a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. It is the general term for the network technology, information technology, integration technology, management platform technology, application technology, and so on that are applied under the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing will become an important support: the background services of technical network systems, such as video websites, picture websites, and portal sites, require large amounts of computing and storage resources. With the development of the Internet industry, each item may in the future carry its own identification mark that must be transmitted to a back-end system for logical processing; data at different levels will be processed separately, and all kinds of industry data require strong back-end system support, which can only be provided through cloud computing.

So-called artificial intelligence cloud services, generally also known as AI as a Service (AIaaS), are currently the mainstream way of delivering AI platforms. Specifically, an AIaaS platform splits several common types of AI services and provides them in the cloud independently or in packaged form. This service model is similar to opening an AI-themed mall: all developers can access one or more of the platform's AI services through API interfaces, and some senior developers can also use the AI framework and AI infrastructure provided by the platform to deploy and operate their own dedicated cloud AI services.

The solutions provided by the embodiments of the present application involve technologies such as natural language processing, machine learning, and artificial intelligence cloud services, which are described in detail through the following embodiments.

The translation model training method provided by the embodiments of the present application is described below with reference to exemplary applications and implementations of the terminal provided by the embodiments of the present application.

Of course, before the target sentence to be translated is processed by the translation model to generate the corresponding translation result, the translation model needs to be trained. Training specifically includes: obtaining the sentence to be translated, and retrieving at least two original translation memory sentences through contrastive retrieval based on that sentence; performing translation memory fusion on the retrieved original translation memory sentences to obtain target translation memory sentences; obtaining, for each target translation memory sentence, the corresponding source-end sentence from the translation memory; composing training samples from the source-end sentences and the target translation memory sentences, and assembling the different training samples into a training sample set; obtaining the initial parameters of the translation model; in response to the initial parameters, processing the different training samples in the training sample set through the translation model to determine the update parameters of the translation model; and, according to the update parameters, iteratively updating the encoder parameters and decoder parameters of the translation model using the different training samples in the training sample set.

The structure of the translation model training apparatus of the embodiments of the present invention is described in detail below. The apparatus can be implemented in various forms; for example, the electronic device in the embodiments of the present application may be a dedicated terminal with a translation model training function, or a server provided with a translation model training function, such as the server 200 in the preceding FIG. 1. FIG. 2 is a schematic diagram of the composition and structure of the translation model training apparatus provided by an embodiment of the present invention. It should be understood that FIG. 2 only shows an exemplary structure of the apparatus rather than its entire structure; part or all of the structure shown in FIG. 2 can be implemented as needed.

The translation model training apparatus provided by the embodiment of the present invention includes: at least one processor 201, a memory 202, a user interface 203, and at least one network interface 204. The components of the apparatus are coupled together through a bus system 205. It should be understood that the bus system 205 is used to implement connection and communication among these components. In addition to a data bus, the bus system 205 includes a power bus, a control bus, and a status signal bus. For clarity, however, the various buses are all labeled as the bus system 205 in FIG. 2.

The user interface 203 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch pad, a touch screen, or the like.

It should be understood that the memory 202 may be a volatile memory or a non-volatile memory, and may also include both. The memory 202 in the embodiment of the present invention can store data to support the operation of the terminal (e.g., 10-1). Examples of such data include any computer program used to operate on the terminal (e.g., 10-1), such as an operating system and application programs. The operating system contains various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks. The application programs may include various applications.

In some embodiments, the translation model training apparatus provided by the embodiments of the present invention may be implemented by a combination of software and hardware. As an example, the apparatus may be a processor in the form of a hardware decoding processor that is programmed to execute the translation model training method provided by the embodiments of the present invention. For example, a processor in the form of a hardware decoding processor may adopt one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.

As an example of implementing the translation model training apparatus by combining software and hardware, the apparatus may be directly embodied as a combination of software modules executed by the processor 201. The software modules may be located in a storage medium, the storage medium is located in the memory 202, and the processor 201 reads the executable instructions included in the software modules in the memory 202 and, in combination with necessary hardware (for example, the processor 201 and other components connected to the bus 205), carries out the translation model training method provided by the embodiments of the present invention.

As an example, the processor 201 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.

As an example of implementing the translation model training apparatus in hardware, the apparatus provided by the embodiment of the present invention may be directly executed by the processor 201 in the form of a hardware decoding processor, for example, by one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components, to carry out the translation model training method provided by the embodiments of the present invention.

The memory 202 in the embodiment of the present invention is used to store various types of data to support the operation of the translation model training apparatus. Examples of such data include any executable instructions for operating on the apparatus; a program implementing the translation model training method of the embodiments of the present invention may be contained in these executable instructions.

In other embodiments, the translation model training apparatus provided by the embodiments of the present invention may be implemented in software. FIG. 2 shows the translation model training apparatus stored in the memory 202, which may be software in the form of programs, plug-ins, and the like, and which includes a series of modules. As an example of the program stored in the memory 202, the translation model training apparatus may include the following software modules: a data transmission module 2081 and a translation model training module 2082. When the software modules in the apparatus are read into RAM by the processor 201 and executed, the translation model training method provided by the embodiments of the present invention is implemented. The functions of the software modules are introduced below, where:

the data transmission module 2081 is used to obtain target translation memory sentences;

the translation model training module 2082 is used to obtain, from the translation memory, the source-end sentence corresponding to each target translation memory sentence;

the translation model training module 2082 is used to obtain, based on each target translation memory sentence, the corresponding source-end sentence from the translation memory;

the translation model training module 2082 is used to compose training samples from the source-end sentences and the target translation memory sentences, and to assemble the different training samples into a training sample set;

the translation model training module 2082 is used to obtain initial parameters of the translation model;

the translation model training module 2082 is used to, in response to the initial parameters of the translation model, process the different training samples in the training sample set through the translation model to determine the update parameters of the translation model;

the translation model training module 2082 is used to iteratively update the encoder parameters and decoder parameters of the translation model, according to the update parameters, using the different training samples in the training sample set.

In some embodiments of the present invention, when the trained translation model is deployed, the electronic device in the embodiments may further include a sentence translation apparatus. Specifically, the sentence translation apparatus includes:

an encoder module, used to determine, through the encoder of the translation model, at least one word-level latent variable corresponding to the sentence to be translated; a decoder module, used to generate, through the decoder of the translation model and according to the at least one word-level latent variable, the translation words corresponding to the word-level latent variables and the selection probability of each translation word; the decoder module being used to select, according to the selection probabilities, at least one translation word to compose a translation result corresponding to the sentence to be translated; and the decoder module being used to output the translation result.

The translation model training method provided by the embodiments of the present invention is described with reference to the training apparatus shown in FIG. 2. Before introducing the training method, the process by which the translation model in this application generates a translation result from a sentence to be translated is first introduced. FIG. 3 is a schematic diagram of generating a translation result in a conventional scheme. Taking the Transformer framework as the structure of the translation model as an example, it mainly includes functional components such as an encoder word-embedding layer, encoder/decoder layers i (1 ≤ i ≤ 6), a decoder word-embedding layer, and a decoder softmax (output) layer. Each encoder/decoder layer is in turn composed of other basic units. All components are organically combined to form one layer of the network, and the layers are stacked to form the entire network. The encoder layers convert the input sentence (source language) into semantic vectors, and the decoder layers convert the semantic vectors into the output sentence (target language). In this process, when a translation memory is used to assist training, TM retrieval is performed first: this step retrieves, through some similarity measure, translation fragments in the database that are similar to the current input. An additional dual-encoder structure is then used, or the TM and the input X are combined directly under a unified vocabulary, and the result is fed to the translation model for training. The drawback of this approach is that it picks from the database, one by one, the sentences most similar to the sentence currently being translated while ignoring the sentences already retrieved, so that all retrieved sentences duplicate one another; the redundant content brings little information gain and harms training efficiency. At the same time, the translation models of the related art process the multiple retrieved TMs separately rather than treating them as a unified whole, which harms the translation effect.
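
For contrast with the dual-encoder route, the unified-vocabulary option mentioned above can be as simple as concatenating the retrieved TM sentences to the source input with a separator token; in this sketch the [SEP] token and the packing order are assumptions:

```python
from typing import List

def pack_input(src_tokens: List[str], tm_sentences: List[List[str]],
               sep: str = "[SEP]") -> List[str]:
    # Feed retrieved TM sentences to a single-encoder Transformer by
    # concatenating them with the source sentence under a unified vocabulary,
    # instead of using a separate dual-encoder structure.
    packed = list(src_tokens)
    for tm in tm_sentences:
        packed.append(sep)
        packed.extend(tm)
    return packed

print(pack_input(["我", "想", "吃", "汉堡"],
                 [["I", "like", "eat", "Hamburger"]]))
```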

To address this defect of the related art, refer to FIG. 4, an optional flowchart of the translation model training method provided by an embodiment of the present invention. It will be understood that the steps shown in FIG. 4 may be executed by various electronic devices running the translation model training apparatus, for example a dedicated terminal with a model training function, a server with a translation model training function, or a server cluster. The steps shown in FIG. 4 are described below.

Step 401: The translation model training apparatus obtains a target translation memory sentence and obtains, from a translation memory database, the source-end sentence corresponding to the target translation memory sentence.

In some embodiments of the present invention, obtaining the target translation memory sentence proceeds as follows: first the sentence to be translated is obtained; retrieval is then performed based on the sentence similarity between the sentence to be translated and the original translation memory sentences, yielding at least two original translation memory sentences that match the sentence to be translated; finally, translation memory fusion is applied to the retrieved original translation memory sentences to obtain the target translation memory sentence. The translation memory database can store original translation memory sentences of different origins. For example, it can store translation information provided by historical users: each time a user initiates a translation request, the translation memory database can, in response to the request, store the corresponding translation information, and every stored item can serve as an original translation memory sentence. Alternatively, a trained translation model can automatically translate web pages in different languages collected by a crawler and save the translation results in the translation memory database as original translation memory sentences; this application places no specific limitation on the source.

In some embodiments of the present invention, take sentence translation in a game display interface as an example. The Japanese sentences presented in the display interface of game A, together with their translated Chinese sentences, can serve as target translation memory sentences and the corresponding source-end sentences and be saved in the translation memory database. When the Japanese sentences presented in the display interface of game B need to be translated, the target translation memory sentences matching the sentences to be translated in game B, together with the corresponding source-end sentences, can be obtained from the translation memory database to complete the training of the translation model.

In some embodiments of the present invention, retrieving, based on the sentence similarity between the sentence to be translated and the original translation memory sentences, at least two original translation memory sentences that match the sentence to be translated can be implemented as follows:

Obtain the maximum length of the sentence to be translated and the maximum length of any translation memory sentence; obtain the token-level edit distance between the sentence to be translated and that translation memory sentence; determine the similarity between the sentence to be translated and that translation memory sentence based on the token distance and the two maximum lengths; and, when the similarity is greater than or equal to a similarity threshold, determine that the translation memory sentence is an original translation memory sentence corresponding to the sentence to be translated. Let the input sentence to be translated be x and let y be any translation memory sentence. Using formula 1 and the idea of contrastive retrieval, retrieval is performed based on the sentence similarity between the sentence to be translated and the original translation memory sentences, yielding at least two matching original translation memory sentences; for example, k (k being an integer greater than or equal to 2) translation memory sentences can be found in the translation memory database:

x_i = argmax_{y ∈ D \ {x_1, ..., x_{i-1}}} [ Sim(x, y) - max_{j<i} Sim(y, x_j) ],  i = 1, ..., k        (formula 1)

Here (x, x_i) denotes a pair consisting of the sentence to be translated and a retrieved original translation memory sentence, while (x_i, x_j) denotes a pair of retrieved original translation memory sentences. For example, if the input sentence to be translated is "I want to eat an apple", the original translation memory sentences can be "I want to eat strawberries" and "I plan to eat instant noodles"; through translation memory fusion, the target translation memory sentence obtained can be "I want to eat strawberries and plan to eat instant noodles".
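As a concrete illustration of this retrieval step, the following sketch selects each next translation memory sentence by balancing its similarity to the input against its maximal similarity to the sentences already selected. This is only a minimal reading of formula 1, whose exact form is an image in the source; the function name retrieve_contrastive, the weight alpha, and the injected sim function are illustrative assumptions (sim can be the edit-distance similarity of formula 2, sketched further below).

def retrieve_contrastive(x, memory, sim, k=2, alpha=1.0):
    # Pick k TM sentences: high similarity to x, low similarity to the
    # sentences already picked, so that the results do not repeat each other.
    selected = []
    candidates = list(memory)
    while candidates and len(selected) < k:
        def score(y):
            redundancy = max((sim(y, s) for s in selected), default=0.0)
            return sim(x, y) - alpha * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected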

When performing similarity retrieval, the specific calculation follows formula 2, where Sim(x, y) is the similarity measure between the sentence to be translated and any translation memory sentence:

Sim(x, y) = 1 - D_edit(x, y) / max(|x|, |y|)        (formula 2)

In formula 2, the numerator D_edit(x, y) is the token-level edit distance between any two sentences, and the denominator max(|x|, |y|) is the maximum of the lengths of the sentence to be translated and of the translation memory sentence; dividing by this maximum length normalizes the similarity. Thus, by comparing the sentence similarity between the sentence to be translated and the different original translation memory sentences, the original translation memory sentences that best match the sentence to be translated can be obtained, which ensures the accuracy of the translation model and reduces the impact of mistranslations on users.
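The similarity of formula 2 can be computed directly. The sketch below assumes whitespace tokenization and the standard fuzzy-match normalization 1 - D_edit / max(|x|, |y|); the leading "1 -" is inferred, since the exact expression is an image in the source.

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance over token lists.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,           # deletion
                          d[i][j - 1] + 1,           # insertion
                          d[i - 1][j - 1] + cost)    # substitution
    return d[m][n]

def sim(x, y):
    # Sim(x, y) = 1 - D_edit(x, y) / max(|x|, |y|), computed over tokens.
    x_tok, y_tok = x.split(), y.split()
    if not x_tok and not y_tok:
        return 1.0
    return 1.0 - edit_distance(x_tok, y_tok) / max(len(x_tok), len(y_tok))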

Step 402: The translation model training apparatus performs translation memory fusion on the at least two retrieved original translation memory sentences to obtain the target translation memory sentence.

In some embodiments of the present invention, the translation memory fusion of the at least two retrieved original translation memory sentences into the target translation memory sentence can be implemented as follows:

Calculate, through an attention function, the attention value corresponding to each translation memory sentence; fuse translation memory sentences with the same attention value into a single translation memory sentence; or fuse translation memory sentences with the same attention value into different training samples within the training-sample subset. The attention value corresponding to each translation memory sentence can be computed with the attention function of formula 3:

h_{v_i}^{(t+1)} = SelfAttention( h_{v_i}^{(t)}, { h_{v_j}^{(t)} : v_j ∈ N(v_i) } )        (formula 3)

where h_{v_i}^{(t+1)} is the state of translation memory node v_i at time t+1, obtained by updating, through the self-attention mechanism, the state of the node itself and the states of its neighboring nodes at time t. In addition, a super node can be configured for each translation memory; its role is to let information flow and interact between different translation memory sentences, while the nodes inside one translation memory cannot see the content of the nodes around another, which preserves semantic integrity and makes it convenient to recall the original translation memory sentences in different usage environments. Through translation memory fusion, the attention mechanism is used to capture the similarity between different translation memories, which guarantees the diversity of the translation memory (that is, the diversity of the training samples), so that the trained translation model is more accurate and its translation quality is better.
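A minimal numeric sketch of this fusion step is given below, assuming a single unprojected attention head; the patent does not specify projection matrices, so fuse_step and its dictionary-based neighbor lists are illustrative. Node 0 plays the role of the super node and is linked to every TM node.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fuse_step(states, neighbors):
    # states: (num_nodes, d) node states at time t; returns states at t+1.
    d = states.shape[1]
    new_states = np.empty_like(states)
    for i, nbrs in neighbors.items():
        keys = states[[i] + list(nbrs)]               # the node plus its neighbors
        attn = softmax(keys @ states[i] / np.sqrt(d))
        new_states[i] = attn @ keys                   # self-attention update
    return new_states

# Super node 0 sees every TM node; TM nodes 1..3 see only the super node.
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
states = np.random.randn(4, 64)
states = fuse_step(states, neighbors)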

In some embodiments of the present invention, in order to speed up the training of the translation model, translation memory sentences with the same attention value can also be discarded, keeping only translation memory sentences with different attention values, so as to reduce the size of the training sample set.

Step 403: The translation model training apparatus obtains, based on each target translation memory sentence, the corresponding source-end sentence from the translation memory database.

Step 404: The translation model training apparatus composes the source-end sentence and each target translation memory sentence into a training sample, and composes the different training samples into a training sample set.

In some embodiments of the present invention, when the number of training samples in the training sample set exceeds a training-sample-count threshold, the training sample set also needs to be denoised, specifically including:

determining a dynamic noise threshold matching the usage environment of the translation model;

denoising the training sample set according to the dynamic noise threshold to form a denoised training sample set matching the dynamic noise threshold; or,

determining a fixed noise threshold corresponding to the translation model, and denoising the training sample set according to the fixed noise threshold to form a denoised training sample set matching the fixed noise threshold. Because translation models are used in different environments, the matching dynamic noise threshold differs as well; for example, in an academic-translation environment the dynamic noise threshold needs to be smaller than the dynamic noise threshold in a casual article-reading environment.

In some embodiments of the present invention, when the translation model is embedded in corresponding hardware and the usage environment is colloquial speech translation, fixing the noise threshold corresponding to the translation model can effectively improve the training speed of the translation model and reduce the user's waiting time.
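A sketch of the two denoising variants described above follows. The per-environment thresholds, the noise field on each sample, and the function names are all illustrative assumptions; the patent only fixes the behavior (a dynamic threshold looked up from the usage environment, or one fixed threshold when the model ships inside hardware).

# Hypothetical threshold table; smaller means stricter, as in the academic example.
ENV_THRESHOLDS = {"academic": 0.2, "article_reading": 0.4, "spoken": 0.5}

def denoise(samples, environment=None, fixed_threshold=None):
    # samples: list of dicts such as {"src": ..., "tm": ..., "noise": 0.3}
    if fixed_threshold is not None:
        threshold = fixed_threshold                  # model embedded in hardware
    else:
        threshold = ENV_THRESHOLDS[environment]      # dynamic, per environment
    return [s for s in samples if s["noise"] <= threshold]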

After the training sample set is determined, execution continues with step 405.

Step 405: The translation model training apparatus obtains the initial parameters of the translation model.

Step 406: The translation model training apparatus, in response to the initial parameters of the translation model, processes the different training samples in the training sample set through the translation model and determines the update parameters of the translation model.

Step 407: The translation model training apparatus iteratively updates the encoder parameters and decoder parameters of the translation model through the different training samples in the training sample set, according to the update parameters of the translation model.

In some embodiments of the present invention, initializing the decoder network to update its parameters can be implemented as follows:

Encode the sentence to be translated through the encoder to form an encoding result of the sentence to be translated; decode the encoding result through the decoder; and, when decoding yields the selection probabilities of the translation results corresponding to the sentence to be translated, determine the parameters of the decoder network. For example, suppose the decoder network of the initially trained translation model generates three candidate translation words for a given sentence to be translated: translation result a (probability 0.45), translation result b (probability 0.5), and translation result c (probability 0.45); the probability distribution is then {0.45, 0.5, 0.45}.

In this way, the translation model can output, for a given sentence to be translated, the translation result with the smallest loss value.

In some embodiments of the present invention, in response to the initial parameters of the translation model, processing the different training samples in the training sample set through the translation model to determine the update parameters of the translation model can be implemented as follows:

Substitute the different training samples in the training sample set into the loss function corresponding to the auto-encoding network formed by the encoder and decoder of the translation model; when the loss function satisfies the convergence condition, take the corresponding encoder parameters and decoder parameters of the translation model as the update parameters of the translation model. The loss function of the encoder network is expressed as:

loss_A = ∑(decoder_A(encoder(warp(x1))) - x1)². Specifically, decoder_A is decoder A, warp is a transformation function applied to the sentence to be translated, x1 is the sentence to be translated, and encoder is the encoder.

During iterative training, the sentence to be translated is substituted into the loss function of the encoder network, and the parameters of encoder A and decoder A are solved for as the loss function descends along the gradient (for example, the steepest gradient); when the loss function converges (that is, when it is determined that the word-level latent variables corresponding to the sentence to be translated can be formed), training ends.

During the training of the encoder network, the loss function of the encoder network is expressed as loss_B = ∑(decoder_B(encoder(warp(x2))) - x2)², where decoder_B is decoder B, warp is a transformation function applied to the sentence to be translated, x2 is the sentence to be translated, and encoder is the encoder.

During iterative training, the sentence to be translated is substituted into the loss function of the encoder network, and the parameters of encoder B and decoder B are solved for as the loss function descends along the gradient (for example, the steepest gradient); when the loss function converges (that is, when decoding yields the selection probabilities of the translation results corresponding to the sentence to be translated), training ends.
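The following sketch makes the loss above concrete with a linear encoder and decoder trained by gradient descent; warp is left as the identity because the patent does not define it, and the dimensions, learning rate, and convergence tolerance are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))              # stand-in for embedded sentences x1
W_enc = rng.normal(scale=0.1, size=(8, 4))
W_dec = rng.normal(scale=0.1, size=(4, 8))
warp = lambda x: x                        # placeholder for the warp(.) transform

for step in range(2000):
    H = warp(X) @ W_enc                   # encoder(warp(x1))
    Y = H @ W_dec                         # decoder_A(encoder(warp(x1)))
    err = Y - X
    loss = (err ** 2).sum()               # loss_A = sum((... - x1)^2)
    g_dec = H.T @ (2 * err)               # gradient w.r.t. the decoder weights
    g_enc = warp(X).T @ (2 * err @ W_dec.T)   # gradient w.r.t. the encoder weights
    W_dec -= 1e-3 * g_dec
    W_enc -= 1e-3 * g_enc
    if loss < 1e-3:                       # convergence condition ends training
        break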

In this way, the translation model can output, for a given sentence to be translated, the translation result with the smallest loss value, ensuring the accuracy of the translation result.

In some embodiments of the present invention, the method further includes:

performing negative-example processing on the training sample set to form a negative-example sample set corresponding to the training sample set, where the negative-example sample set is used to adjust the encoder parameters and decoder parameters of the translation model.

To further explain how negative-example samples are obtained, refer to FIG. 5, an optional flowchart of the translation model training method provided by an embodiment of the present invention. It will be understood that the steps shown in FIG. 5 may be executed by various electronic devices running the translation model training apparatus, for example a dedicated terminal with a model training function, a server with a translation model training function, or a server cluster. The steps shown in FIG. 5 are described below.

Step 501: Determine the supervision function corresponding to the translation model.

Step 502: Adjust the temperature coefficient of the supervision function.

Step 503: Based on the vector similarity between any two translation memory sentences in the training set and on different temperature coefficients, perform negative-example processing on the training sample set through the supervision function to form the negative-example sample set corresponding to the training sample set. The supervision function is given by formula 4:

L = -log ( exp(sim(y, y⁺)/τ) / Σ_{y′} exp(sim(y, y′)/τ) )        (formula 4)

Here sim(y, y′) denotes the degree of similarity between the vectors of any two translation memory sentences; the similarity can be computed with the standard machine-translation evaluation method BLEU (Bilingual Evaluation Understudy), with cosine similarity, or in other ways, and this application places no limitation on the specific computation. τ is the temperature coefficient, used to control how difficult it is to distinguish positive and negative examples in contrastive learning; by adjusting the temperature coefficient, the proportion of negative-example samples can be controlled flexibly.
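A standard temperature-scaled contrastive loss matching this description is sketched below; since formula 4 is an image in the source, the InfoNCE form and the cosine choice for sim are assumptions (BLEU would fit the text equally well).

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) + 1e-12
    nv = math.sqrt(sum(b * b for b in v)) + 1e-12
    return dot / (nu * nv)

def contrastive_loss(anchor, positive, negatives, tau=0.1):
    # A smaller tau sharpens the softmax and makes negatives harder
    # to separate, which is the "difficulty" control described above.
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))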

In some embodiments of the present invention, the encoder and decoder corresponding to the translation model can also be bidirectional network models; for example, a bidirectional GRU (Bi-GRU) model can be selected for both. The Bi-GRU model can recognize inverted sentence structures. When a user inputs a dialogue sentence, it may have an inverted structure that differs from the normal sentence structure; for example, the user inputs "天气怎么样今天" (literally "how is the weather, today") whereas the normal structure is "今天天气怎么样" ("today, how is the weather"). The Bi-GRU model can recognize such inverted dialogue sentences, which enriches the functionality of the trained model and improves the robustness of the final target model.
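A minimal Bi-GRU encoder in PyTorch is sketched below; the framework, layer sizes, and class name are illustrative assumptions, since the patent names Bi-GRU but no toolkit. Reading the token sequence in both directions is what lets inverted word orders be represented.

import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> states: (batch, seq_len, 2 * hidden)
        states, _ = self.gru(self.embed(token_ids))
        return states

enc = BiGRUEncoder(vocab_size=10000)
out = enc(torch.randint(0, 10000, (2, 7)))    # shape: (2, 7, 512)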

Continuing with the translation model training apparatus shown in FIG. 2, the text sentence processing method of the translation model provided by an embodiment of the present invention is described with reference to FIG. 6, an optional flowchart of that method. It will be understood that the steps shown in FIG. 6 may be executed by various electronic devices running the translation model training apparatus, for example a dedicated terminal with a function for processing sentences to be translated, a server with such a function, or a server cluster. The steps shown in FIG. 6 are described below.

Step 601: Determine, through the encoder of the translation model, at least one word-level latent variable corresponding to the sentence to be translated.

Step 602: Generate, through the decoder of the translation model and according to the at least one word-level latent variable, the translated word corresponding to each word-level latent variable together with the selection probability of the translated word.

Step 603: Select, according to the selection probabilities of the translation results, at least one translated word to compose the translation result corresponding to the sentence to be translated.

Step 604: Output the translation result.

When testing the trained translation model, refer to FIGS. 7 to 9. The related-art model training methods compared are: Vaswani et al., 2017; Gu et al., 2018; Zhang et al., 2018; Xu et al., 2020*; Xia et al., 2019; He et al., 2021 (@s); Cai et al., 2021 (#2). The training method of this application is T-Full. FIG. 7 is a schematic diagram of the accuracy test results of the translation models, FIG. 8 is a schematic diagram of the training-efficiency test results, and FIG. 9 is a schematic diagram of test results for different numbers of translation memory sentences under the training method of this application. The accuracy score of T-Full ranges from 58.69 to 67.76, clearly higher than the training methods of the related art, indicating that this application can make full use of the gain that existing translation memory sentences bring to model training, so that the translation model can adapt to different usage scenarios.

FIG. 10 is a schematic diagram of the front-end display interface of the translation model provided by an embodiment of the present invention; the translation model shown in this embodiment can process a sentence to be translated and generate the corresponding translated text. Here, the target sentence to be translated is "当我能找到时间的时候，我想去野营，和我的朋友们一起点篝火。" ("When I can find the time, I want to go camping and light a bonfire with my friends."), entered by the user through the translation applet of an instant messaging client.

Through the processing of the translation model, the corresponding translated texts are produced for the user to choose from, together with the selection probability of each translation result.

According to the selection probabilities of the translation results, the translation results selected for the sentence to be translated include the following three candidates:

1) "When I can find time, I want to go camping and light a bonfire with my friends."

2) "When I find time, go camping and light a bonfire with your friends."

3) "When you find time, go camping friends and light a campfire."

Thus, through the translation model provided by the present invention, multiple different translation results can be generated from the same sentence to be translated.

The use of the translation model provided by the embodiments of the present invention is described below in conjunction with a specific model structure. Because the virtual objects and virtual scenes in a game may use English or Japanese, domestic users often cannot understand the meaning of the virtual objects and virtual scenes in time; through the translation model, their foreign-language meaning can be obtained promptly. The trained translation model can translate the text information in the game scenes of a Japanese game server; the translation model has a Transformer structure.

Continuing with FIG. 11, an optional structural diagram of the translation model in an embodiment of the present invention: the encoder consists of N = 6 identical layers, each containing two sub-layers. The first sub-layer is a multi-head attention layer, followed by a simple fully connected layer. A residual connection and normalization are added around each sub-layer.

The decoder likewise consists of N = 6 identical layers, but a decoder layer differs from an encoder layer: it contains three sub-layers, namely a self-attention layer, an encoder-decoder attention layer, and finally a fully connected layer; the first two sub-layers are both based on multi-head attention. Specifically, the Nx on the left denotes the structure of one encoder layer, which includes two sub-layers: the first is the multi-head attention layer and the second is the feed-forward layer. The input and output of each sub-layer are linked, the output of the current sub-layer serving as an input to the next. Each sub-layer is followed by a normalization operation, which improves the convergence speed of the model. The Nx on the right denotes the structure of one decoder layer, which includes three sub-layers. The first is the multi-head attention sub-layer controlled by a mask matrix, used to model the target-side sentence vectors generated so far; during training, the mask matrix ensures that each multi-head attention computation attends only to the first t-1 words. The second sub-layer is the multi-head attention sub-layer forming the attention mechanism between the encoder and the decoder, that is, it looks up the relevant semantic information in the source text; its computation uses dot products. The third sub-layer is the feed-forward sub-layer, computed in the same way as the feed-forward sub-layer in the encoder. The sub-layers of the decoder are likewise linked, the output of each serving as an input to the next, and each decoder sub-layer is also followed by a normalization operation to speed up model convergence.
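The encoder sub-layer arrangement just described can be written compactly as below; PyTorch modules are used for brevity, and the hyperparameters follow the usual d_model = 512 convention, which is an assumption rather than a value fixed by the patent.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    # One encoder layer: multi-head attention, then feed-forward,
    # each wrapped in a residual connection followed by normalization.
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x)         # first sub-layer: multi-head attention
        x = self.norm1(x + a)             # residual connection + normalization
        x = self.norm2(x + self.ff(x))    # second sub-layer: feed-forward
        return x

x = torch.randn(2, 10, 512)
print(EncoderLayer()(x).shape)            # torch.Size([2, 10, 512])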

Continuing with FIG. 12, an optional schematic diagram of the translation process of the translation model in an embodiment of the present invention: the encoder part and the decoder part each contain six encoders and six decoders. The inputs entering the first encoder combine the embedding and the positional embedding; after passing through the six encoders, the output is fed to every decoder in the decoder part. The input sentence to be translated is the Japanese "無双の魔呂布は言った。「これから戦場は私一人で支配します。私と戦う人がいます。」". The translation memory stores the original translation sentence "蒼天翔竜趙雲は言います。「勇者の誓いは、生死よりも甚だしい！恐れないと心に抱いて、空を飛べます。」" together with the source-end sentence "苍天翔龙赵云说：“勇者之誓，甚于生死！心怀不惧，方能翱翔于天际！”" ("Azure Dragon Zhao Yun says: the oath of the brave outweighs life and death; only a fearless heart can soar across the sky!"). After the Chinese source-end sentence corresponding to the target translation memory sentence (likewise in Japanese) is obtained from the translation memory, the translation model is trained, and the translation result finally output by the model is "无双之魔吕布说：“从此刻开始，战场由我一人主宰！可有人敢与我一战！”" ("Lu Bu, the Peerless Demon, says: from this moment on, the battlefield is mine alone to rule; who dares to fight me!").

Continuing with FIG. 13, an optional structural diagram of the encoder in the translation model in an embodiment of the present invention: its input consists of queries (Q) and keys (K) of dimension d together with values (V) of dimension d; the dot product of the query with all keys is computed, and the softmax function is applied to obtain the weights over the values.

Continuing with FIG. 13, a vector diagram of the encoder in the translation model in an embodiment of the present invention: Q, K and V are obtained by multiplying the input vector x of the encoder with W^Q, W^K and W^V. In this work, the dimension of W^Q, W^K and W^V is (512, 64); assuming the inputs have dimension (m, 512), where m is the number of words, the Q, K and V obtained by multiplying the input vector with W^Q, W^K and W^V therefore have dimension (m, 64).

Continuing with FIG. 14, a schematic diagram of vector concatenation in the encoder of the translation model in an embodiment of the present invention: Z0 to Z7 are the corresponding eight parallel heads, each of dimension (m, 64); concatenating the eight heads yields dimension (m, 512). Finally, multiplying by W^O produces an output matrix of dimension (m, 512), which matches the input dimension of the next encoder.
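The dimension bookkeeping above can be checked with a few lines of NumPy; m = 10 and the random matrices are placeholders, not trained weights.

import numpy as np

m, d_model, d_head, heads = 10, 512, 64, 8
x = np.random.randn(m, d_model)

def one_head(x):
    Wq, Wk, Wv = (np.random.randn(d_model, d_head) * 0.05 for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv               # each of shape (m, 64)
    scores = Q @ K.T / np.sqrt(d_head)             # (m, m)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ V                             # (m, 64)

Z = np.concatenate([one_head(x) for _ in range(heads)], axis=-1)   # (m, 512)
W_o = np.random.randn(d_model, d_model) * 0.05
out = Z @ W_o                                      # (m, 512), ready for the next layer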

Continuing with FIG. 15, a schematic diagram of the encoding process of the encoder in the translation model in an embodiment of the present invention: the tensor produced by self-attention also passes through the residual network and layer normalization (LayerNorm) before entering the fully connected feed-forward network, which performs the same residual processing and normalization. Only then does the output tensor enter the next encoder; this operation is iterated six times, and the result of the iterative processing enters the decoder.

Continuing with FIG. 16, a schematic diagram of the decoding process of the decoder in the translation model in an embodiment of the present invention, showing the decoder's input, output and decoding process:

Output: the probability distribution of the output word at position i;

Input: the output of the encoder and the output of the decoder at position i-1. The attention in the middle is therefore not self-attention: its K and V come from the encoder, and its Q comes from the output of the decoder at the previous position.
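The sketch below isolates that middle attention block: K and V are projected from the encoder output while Q is projected from the decoder states, so it is not self-attention. The projection matrices are random placeholders.

import numpy as np

def cross_attention(dec_states, enc_out, d=64):
    Wq = np.random.randn(dec_states.shape[-1], d) * 0.05
    Wk = np.random.randn(enc_out.shape[-1], d) * 0.05
    Wv = np.random.randn(enc_out.shape[-1], d) * 0.05
    Q = dec_states @ Wq                    # queries from the decoder side
    K, V = enc_out @ Wk, enc_out @ Wv      # keys and values from the encoder side
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return w @ V                           # one context vector per target position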

FIG. 17 is a schematic diagram of the output effect of the translation model in an embodiment of the present invention. After the translation model is trained with the translation model training method provided by this application, the user input in a game scene and the foreign-language content displayed to the user can be translated, facilitating the user's understanding and use.

In summary, the embodiments of the present invention have the following technical effects:

The technical solution provided by the present invention obtains a sentence to be translated and, based on it, retrieves at least two original translation memory sentences through contrastive retrieval; performs translation memory fusion on the retrieved original translation memory sentences to obtain target translation memory sentences; obtains, based on each target translation memory sentence, the corresponding source-end sentence from the translation memory database; composes the source-end sentences and the target translation memory sentences into training samples, and composes the different training samples into a training sample set; obtains the initial parameters of the translation model; in response to the initial parameters, processes the different training samples in the training sample set through the translation model to determine the update parameters of the translation model; and iteratively updates the encoder parameters and decoder parameters of the translation model through the different training samples according to the update parameters. By selecting, through contrastive retrieval, translation memory sentences similar to the sentence to be translated, the complex network structure introduced by the extra memory networks of the related art can be avoided, together with the slower training and the excessive translation time in use that it causes. Against the redundancy of translation memory sentences, translation memory fusion uses the attention mechanism to capture the similarity between different translation memories and thereby guarantees the diversity of the translation memory (that is, the diversity of the training samples), so that the trained translation model is more accurate, translates better, and improves the user experience; at the same time, the gain that the existing translation memory sentences bring to model training is fully and effectively used, allowing the translation model to adapt to different usage scenarios.

The above are merely embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (15)

1. A method for training a translation model, the method comprising:
acquiring a target translation memory statement;
obtaining a source terminal statement corresponding to the target translation memory statement in a translation memory library;
forming training samples by the source-end sentences and the target translation memory sentences, and forming training sample sets by different training samples;
acquiring initial parameters of a translation model;
responding to initial parameters of the translation model, processing different training samples in the training sample set through the translation model, and determining updating parameters of the translation model;
and according to the updating parameters of the translation model, iteratively updating the encoder parameters and the decoder parameters of the translation model through different training samples in the training sample set.
2. The method of claim 1, wherein obtaining the target translation memory statement comprises:
obtaining a sentence to be translated;
retrieving based on the sentence similarity between the sentence to be translated and the original translation memory sentence to obtain at least two original translation memory sentences matched with the sentence to be translated;
and performing translation memory fusion processing on the obtained at least two original translation memory sentences to obtain the target translation memory sentence.
3. The method according to claim 2, wherein the retrieving based on the sentence similarity between the sentence to be translated and the original translation memory sentence to obtain at least two original translation memory sentences matched with the sentence to be translated comprises:
acquiring the maximum length of the statement to be translated and the maximum length of any translation memory statement;
obtaining the word element distance between the sentence to be translated and any translation memory sentence,
determining the similarity between the sentence to be translated and any translation memory sentence based on the word element distance, the maximum length of the sentence to be translated and the maximum length of any translation memory sentence;
and when the similarity is greater than or equal to a similarity threshold value, determining that any translation memory statement is an original translation memory statement corresponding to the statement to be translated.
4. The method according to claim 1, wherein the performing translation memory fusion processing on the obtained at least two original translation memory statements to obtain a target translation memory statement comprises:
calculating an attention value corresponding to each translation memory statement through an attention function;
fusing the translation memory sentences with the same attention value into the same translation memory sentence; or
or fusing the translation memory statements with the same attention value into different training samples in the training sample subset.
5. The method of claim 1, further comprising:
determining a dynamic noise threshold value matched with the use environment of the translation model;
denoising the training sample set according to the dynamic noise threshold value to form a denoising training sample set matched with the dynamic noise threshold value; or,
and determining a fixed noise threshold corresponding to the translation model, and denoising the training sample set according to the fixed noise threshold to form a denoising training sample set matched with the fixed noise threshold.
6. The method of claim 1, further comprising:
and carrying out negative example processing on the training sample set to form a negative example sample set corresponding to the training sample set, wherein the negative example sample set is used for adjusting the encoder parameters and the decoder parameters of the translation model.
7. The method of claim 6, wherein performing the negative example processing on the training sample set comprises:
determining a supervision function corresponding to the translation model;
adjusting a temperature coefficient of the supervisory function;
and carrying out negative example processing on the training sample set through the supervision function based on the vector similarity and different temperature coefficients of any two translation memory sentences in the training sample set to form a negative example sample set corresponding to the training sample set.
8. The method of claim 6, wherein performing the negative example processing on the training sample set comprises:
randomly combining statements to be output in a decoder of the translation model to form a negative sample set corresponding to the training sample set; or,
and carrying out random deletion processing or replacement processing on the sentences to be output in a decoder of the translation model to form a negative example sample set corresponding to the training sample set.
9. The method of claim 1, wherein the determining updated parameters of the translation model by processing different training samples in the set of training samples by the translation model in response to initial parameters of the translation model comprises:
substituting different training samples in the training sample set into a loss function corresponding to a self-coding network formed by an encoder and a decoder of the translation model;
and determining parameters corresponding to an encoder and corresponding decoder parameters in the translation model when the loss function meets the convergence condition as update parameters of the translation model.
10. A sentence translation method, the method comprising:
determining at least one word-level hidden variable corresponding to a sentence to be translated through an encoder of a translation model;
generating, by a decoder of the translation model, a translated term corresponding to the word-level hidden variable and a selected probability of the translated term according to the at least one word-level hidden variable;
selecting at least one translation word to form a translation result corresponding to the sentence to be translated according to the selection probability of the translation result;
outputting the translation result;
wherein the translation model is trained based on the method of any one of claims 1 to 9.
11. A training apparatus for a translation model, the training apparatus comprising:
the data transmission module is used for acquiring a target translation memory statement;
the translation model training module is used for acquiring a source end sentence corresponding to the target translation memory sentence from a translation memory library;
the translation model training module is used for forming training samples by the source-end sentences and the target translation memory sentences and forming training sample sets by different training samples;
the translation model training module is used for acquiring initial parameters of a translation model;
the translation model training module is used for responding to initial parameters of the translation model, processing different training samples in the training sample set through the translation model, and determining updating parameters of the translation model;
and the translation model training module is used for carrying out iterative updating on the encoder parameter and the decoder parameter of the translation model through different training samples in the training sample set according to the updating parameter of the translation model.
12. A sentence translation apparatus, characterized in that the apparatus comprises:
the encoder module is used for determining at least one word-level hidden variable corresponding to the sentence to be translated through an encoder of the translation model;
a decoder module, configured to generate, by a decoder of the translation model, a translated term corresponding to the hidden variable at the term level and a selected probability of the translated term according to the hidden variable at the term level;
the decoder module is used for selecting at least one translation word to form a translation result corresponding to the sentence to be translated according to the selection probability of the translation result;
and the decoder module is used for outputting the translation result.
13. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor for implementing the method of training a translation model according to any one of claims 1 to 9 or implementing the method of sentence translation according to claim 10 when executing the executable instructions stored in the memory.
14. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the method of training a translation model according to any one of claims 1 to 9 or implement the method of sentence translation according to claim 10.
15. A computer readable storage medium storing executable instructions which, when executed by a processor, implement the method for training a translation model according to any one of claims 1 to 9, or implement the method for sentence translation according to claim 10.
CN202210220466.7A 2022-03-08 2022-03-08 Training method of translation model, sentence translation method, apparatus, equipment, program Pending CN114757210A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210220466.7A CN114757210A (en) 2022-03-08 2022-03-08 Training method of translation model, sentence translation method, apparatus, equipment, program


Publications (1)

Publication Number Publication Date
CN114757210A true CN114757210A (en) 2022-07-15

Family

ID=82325860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210220466.7A Pending CN114757210A (en) 2022-03-08 2022-03-08 Training method of translation model, sentence translation method, apparatus, equipment, program

Country Status (1)

Country Link
CN (1) CN114757210A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860015A (en) * 2022-12-29 2023-03-28 北京中科智加科技有限公司 Translation memory-based transcribed text translation method and computer equipment
CN116992894A (en) * 2023-09-26 2023-11-03 北京澜舟科技有限公司 Training method of machine translation model and computer readable storage medium
CN116992894B (en) * 2023-09-26 2024-01-16 北京澜舟科技有限公司 Training method of machine translation model and computer readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination