CN118551735A - Diffusion model-based diversity controllable text generation method and device
- Publication number: CN118551735A
- Application number: CN202411008772.XA
- Authority: CN (China)
- Prior art keywords: text, natural language, control information, control, model
- Legal status: Granted
Classifications
- G06F40/16 - Handling natural language data; text processing; use of codes for handling textual entities; transformation; automatic learning of transformation rules, e.g. from examples
- G06F16/35 - Information retrieval of unstructured textual data; clustering; classification
- G06F40/30 - Handling natural language data; semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Machine Translation (AREA)
Description
Technical Field
The present invention belongs to the technical field of natural language processing and, more specifically, relates to a diffusion model-based method and device for diversity-controllable text generation.
Background Art
The controllable text generation task is to generate text that satisfies specific requirements or constraints given a set of control conditions. These control conditions can be described in many forms, including keywords, key phrases, attribute labels, and tables. However, such control forms struggle to express complex control requirements, such as abstract concepts, emotions, opinions, and intentions.
Taking the intelligent service field as an example: chatbots automatically generate appropriate answers based on user needs; personalized news generation must combine a reader's interests, background knowledge, and reading preferences to produce news the user cares about; creative writing services must take the context given by the user and generate novel, diverse text that also satisfies the writing style the author requires; and automatic generation of software test cases must produce test cases that are structurally similar to a given user test case but diverse in content. Control conditions expressed in natural language can capture the complex control requirements of these application scenarios, so the controllable text generation task driven by natural language has significant application value.
Diverse text refers to a set of texts that, under the same control conditions, are semantically close but differ in sentence structure and vocabulary. Diverse text makes generated content more attractive and innovative and enables personalized, customized content generation in applications such as personalized recommendation and dialogue systems, thereby improving the user experience. For the problem of diversity-controllable text generation, the prior art has published a number of patent documents; these methods fall mainly into diversity methods based on sampling in the decoding stage and diversity methods based on hidden-state random variables.
Diversity methods based on sampling in the decoding stage introduce different sampling strategies at decoding time to ensure vocabulary diversity in the generated text.
For example, Chinese patent document CN116050401A applies spectral clustering and beam search decoding at the output of the decoder to automatically generate diverse questions. Chinese patent document CN115630640A adds, in the decoding stage, a penalty factor based on a diversity evaluation metric that penalizes the probability of the target word in the dictionary, alleviating repetition in the generated text and improving its diversity. However, such methods change the probability distribution of the output words during decoding, which reduces the fluency of the generated text and degrades its quality.
Diversity methods based on hidden-state random variables use stochastic models such as variational autoencoders or diffusion models, exploiting hidden-state random variables to obtain diversity in the generated text.
For example, Chinese patent document CN116629272A introduces a text semantic hashing method and uses hash codes to represent natural language descriptive control variables; it adds a conditional encoder so that the descriptive control variables steer the text generation process, and uses the hidden-state random variables of a variational autoencoder to obtain diversity in the generated text. Chinese patent document CN116432663A proposes a controllable-diversity professional text generation method based on element sketches: a conditional encoder links viewpoints to semantic features so that viewpoints control the text semantics, and diversity of expression is obtained by randomly sampling different expression features. Chinese patent document CN115374271A proposes a diverse text generation method based on a deep LSTM that samples different model parameters from a specific distribution; each parameter of the LSTM is changed from a fixed value to a distribution, giving the model more uncertainty and diversity, and diversity of the generated text is obtained by varying the model.
The paper "Diffusion-lm improves controllable text generation" proposes a non-autoregressive language model combined with a diffusion model. The method iteratively denoises Gaussian noise into word vectors and uses the randomness of the denoising process to generate diverse text.
However, the hidden states in the generation process of such methods are random, making fine-grained control difficult; they cannot guarantee controllability while generating diverse text.
Summary of the Invention
The present invention aims to overcome at least one of the above defects of the prior art and provides a diffusion model-based diversity-controllable text generation method.
The present invention also provides a device implementing the diffusion model-based diversity-controllable text generation method.
Overview of the invention:
Aimed at the problem of diverse text generation under natural language control, and drawing on cognitive science, the present invention introduces a control information clustering model: natural language prompts are encoded and clustered in a latent space, converting natural language control into latent instruction control information categories, so that diverse texts satisfying the semantic constraints of the natural language prompt can be generated. Sampling from the distribution of a control information category yields diversity in the generated text; using control variables to guide the generation process of the diffusion model guarantees its controllability.
Explanation of technical terms:
Task requirement: in the present invention, the description of the task that the generation model needs to complete.
Natural language prompt: in the present invention, a description of the task requirement in natural language form. For example, in a question answering task the natural language prompt is "Answer the following question: How many points does a backgammon board have?".
Control variable: in the present invention, the result of encoding the natural language prompt with an encoder.
Control information category: in the present invention, the result of clustering the control variables with a Gaussian mixture model; each cluster corresponds to control information with a different tendency and is referred to as a control information category for short.
Diffusion model: a generative network structure based on Markov chain inference, comprising a diffusion (forward) process and a reverse diffusion process. The present invention applies the diffusion model to text generation: the diffusion process gradually adds noise to the text encoding until the encoding is completely destroyed and becomes standard Gaussian noise, and the reverse diffusion process trains a denoising network that learns to restore the original text encoding from Gaussian noise.
The detailed technical scheme of the present invention is as follows:
A diffusion model-based diversity-controllable text generation method, the method comprising:
S1. Construct a diversity text generation model based on natural language control, the model comprising a natural language prompt encoder, a text reconstruction autoencoder, a control information clustering model, and a latent space diffusion model;
S2. In the training phase, a reference text x and a natural language prompt p containing a task requirement g and a control requirement c are given, and the reference text x and the natural language prompt p are input into the diversity text generation model to train it;
S3. In the application phase, a natural language prompt p is given and input into the trained diversity text generation model, which iteratively generates diverse texts satisfying the corresponding control;
In step S2, training the diversity text generation model specifically comprises:
S21. Use the reference text x to train the text reconstruction autoencoder, then use the trained autoencoder to encode the reference text x and map the resulting original text encoding z_0 into a continuous latent space;
S22. Select positive and negative samples with respect to anchor samples according to the semantic information of the task requirement g and the natural language prompt p, and train the natural language prompt encoder by contrastive learning; then use the trained encoder to encode the natural language prompt p, obtaining a control variable e. In the latent space, the natural language prompts are divided into K control information categories with similar controls;
S23. Input the control variable e into the control information clustering model and use the EM algorithm to obtain the control information category k of the natural language prompt p corresponding to the control variable e, then sample the corresponding control information v from the distribution of the control information category k;
S24. Take the original text encoding z_0 output by the text reconstruction autoencoder as the hidden state at the initial step of the latent space diffusion model, construct a forward diffusion process and a backward reverse diffusion process, gradually add standard Gaussian noise to the original text encoding z_0 in the forward process to obtain the text encoding z_t at step t, and in the backward reverse process use the control variable e, the control information v, and the text encoding z_t at step t to train a denoising network, so that the trained denoising network iteratively generates diverse texts satisfying the corresponding control.
Preferably, in step S21, the trained text reconstruction autoencoder encodes the reference text x and maps the resulting original text encoding z_0 into a continuous latent space, specifically:

$z_0 = E_{\mathrm{AE}}(x), \quad z_0 \in \mathbb{R}^{l \times d}$ (1);

In formula (1): z_0 denotes the original text encoding; R^{l×d} denotes the latent space; l denotes the sentence length; d denotes the dimension of the latent space; E_AE denotes the text reconstruction autoencoder; x denotes the reference text.
Preferably, in step S22, the trained natural language prompt encoder encodes the natural language prompt p to obtain the control variable e, specifically:

$e = E_{\mathrm{prompt}}(p)$ (3);

In formula (3): e denotes the control variable; E_prompt denotes the natural language prompt encoder; p denotes the natural language prompt.
Preferably, in step S22, the loss function used to train the natural language prompt encoder comprises a reconstruction loss and a contrastive loss, namely:

$\mathcal{L}_{E} = \mathcal{L}_{\mathrm{rec}}(p, \hat{p}) + \mathcal{L}_{\mathrm{cl}}(e^{a}, e^{+}, \{e^{-}_{i}\}_{i=1}^{N})$ (4);

In formula (4): L_E denotes the loss of the natural language prompt encoder; L_rec denotes the reconstruction loss; L_cl denotes the contrastive loss; p denotes the real prompt encoding, i.e. the control variable corresponding to the real natural language prompt; p̂ denotes the reconstructed prompt encoding, i.e. the control variable corresponding to the reconstructed natural language prompt; e^a denotes the encoding of the currently selected anchor sample, e^+ denotes the encoding of the positive sample selected for that anchor, and e^-_1, ..., e^-_N denote the encodings of the N negative samples selected for that anchor;
The reconstruction loss uses the cross-entropy loss function, namely:

$\mathcal{L}_{\mathrm{rec}}(p, \hat{p}) = -\sum_{i} p_{i} \log \hat{p}_{i}$ (5);

In formula (5), p_i denotes the i-th dimension of the real prompt encoding and p̂_i denotes the i-th dimension of the reconstructed prompt encoding;
The n-pair loss is used as the contrastive loss function, namely:

$\mathcal{L}_{\mathrm{cl}} = \log\Big(1 + \sum_{i=1}^{N} \exp\big((e^{a})^{\top} e^{-}_{i} - (e^{a})^{\top} e^{+}\big)\Big)$ (6);

In formula (6): exp denotes the exponential function; (e^a)^T denotes the transpose of the anchor sample encoding; e^-_i denotes the encoding of the i-th negative sample; e^+ denotes the encoding of the positive sample.
Preferably, in step S23, the EM algorithm is used to obtain the control information category k of the natural language prompt p corresponding to the control variable e, and the corresponding control information v is sampled from the distribution of the control information category k, specifically comprising:

E-step: according to the current parameters of the control information clustering model, compute the membership degree of the control variable corresponding to each natural language prompt in the k-th control information category, namely:

$\gamma_{jk} = \dfrac{\mathcal{N}(e_{j} \mid \mu_{k}, \Sigma_{k})}{\sum_{k'=1}^{K} \mathcal{N}(e_{j} \mid \mu_{k'}, \Sigma_{k'})}$ (7);

In formula (7): γ_jk denotes the membership degree of the control variable e_j corresponding to the j-th natural language prompt p_j in the k-th control information category; N(e_j | μ_k, Σ_k) denotes the probability that the control variable e_j corresponding to the j-th natural language prompt p_j belongs to the k-th control information category, where μ_k and Σ_k denote the mean and variance of the distribution of the k-th control information category; K denotes the total number of control information categories, k indexes a control information category; M denotes the total number of natural language prompts;

The control information category with the highest probability is selected as the control information category k_j of the natural language prompt p_j, namely:

$k_{j} = \arg\max_{k} \gamma_{jk}$ (8);
M-step: use the membership degrees computed in the E-step to update the parameters of the control information clustering model, namely:

$\mu_{k} = \dfrac{1}{M_{k}} \sum_{j=1}^{M} \gamma_{jk}\, e_{j}$ (9);

$\Sigma_{k} = \dfrac{1}{M_{k}} \sum_{j=1}^{M} \gamma_{jk}\, (e_{j} - \mu_{k})(e_{j} - \mu_{k})^{\top}$ (10);

In formulas (9) and (10): μ_k denotes the mean of the k-th control information category; Σ_k denotes the variance of the k-th control information category; M_k denotes the (soft) number of natural language prompts belonging to the k-th control information category, i.e. the sum over all prompts of the probability of belonging to the k-th category, M_k = Σ_{j=1}^{M} γ_jk; j indexes the natural language prompts and M denotes their total number; (e_j − μ_k)^T denotes the transpose of the matrix (e_j − μ_k);

The above E-step and M-step are repeated until the convergence condition ‖θ^{(i+1)} − θ^{(i)}‖ < ε is reached, where θ^{(i)} denotes the parameters at the i-th iteration and ε is a positive number.
Preferably, in step S24, in the forward diffusion process, standard Gaussian noise is gradually added to the original text encoding z_0 to obtain the text encoding z_t at step t, specifically:

$q(z_{t} \mid z_{0}) = \mathcal{N}\big(z_{t};\ \sqrt{\bar{\alpha}_{t}}\, z_{0},\ (1 - \bar{\alpha}_{t})\, \mathbf{I}\big)$ (14);

In formula (14): q denotes the distribution of the diffusion process; z_0 denotes the original text encoding, i.e. the hidden state at the initial step; z_t denotes the text encoding at step t, i.e. the hidden state at step t; N denotes the Gaussian distribution; I denotes the identity matrix; ᾱ_t is a hyperparameter with ᾱ_t = Π_{s=1}^{t} α_s and α_t = 1 − β_t, where β_t denotes the noise ratio at step t and takes values between 0 and 1;
In the backward reverse diffusion process, the denoising network f_θ is trained with a regression loss, namely:

$\mathcal{L}_{\mathrm{reg}} = \mathbb{E}\big[\, \| f_{\theta}(z_{t}, t, e, v) - z_{0} \|^{2} \,\big]$ (18);

In formula (18): L_reg denotes the regression loss; E denotes the expectation; e and v denote the control variable and the control information sampled from the control information category k_j of the j-th natural language prompt.
Preferably, in step S3, in the application phase, the trained denoising network f_θ iteratively generates diverse texts that satisfy the control requirement, namely:

$z_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, f_{\theta}(z_{t}, t, e, v) + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_{t}^{2}}\ \cdot\ \dfrac{z_{t} - \sqrt{\bar{\alpha}_{t}}\, f_{\theta}(z_{t}, t, e, v)}{\sqrt{1 - \bar{\alpha}_{t}}} + \sigma_{t}\, \epsilon$ (20);

where different variances σ_t^2 are used at different steps to control the randomness of the sampling process, namely:

$\sigma_{t} = \eta\, \sqrt{\dfrac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_{t}}}\, \sqrt{1 - \dfrac{\bar{\alpha}_{t}}{\bar{\alpha}_{t-1}}}$ (21);

In formula (21): η is a hyperparameter taking real values greater than or equal to 0, and ε ~ N(0, I) denotes freshly sampled standard Gaussian noise.
In another aspect of the present invention, a device implementing the diffusion model-based diversity-controllable text generation method is provided, the device comprising:
a construction module for constructing a diversity text generation model based on natural language control, the model comprising a natural language prompt encoder, a text reconstruction autoencoder, a control information clustering model, and a latent space diffusion model;
a training module for inputting a given reference text x and a natural language prompt p containing a task requirement g and a control requirement c into the diversity text generation model, so as to train the model;
an execution module for inputting a given natural language prompt p into the trained diversity text generation model, so as to iteratively generate diverse texts satisfying the corresponding control;
wherein training the diversity text generation model specifically comprises:
S21. Use the reference text x to train the text reconstruction autoencoder, then use the trained autoencoder to encode the reference text x and map the resulting original text encoding z_0 into a continuous latent space;

S22. Select positive and negative samples with respect to anchor samples according to the semantic information of the task requirement g and the natural language prompt p, and train the natural language prompt encoder by contrastive learning; then use the trained encoder to encode the natural language prompt p, obtaining a control variable e, the natural language prompts being divided in the latent space into K control information categories with similar controls;

S23. Input the control variable e into the control information clustering model and use the EM algorithm to obtain the control information category k of the natural language prompt p corresponding to the control variable e, then sample the corresponding control information v from the distribution of the control information category k;

S24. Take the original text encoding z_0 output by the text reconstruction autoencoder as the hidden state at the initial step of the latent space diffusion model, construct a forward diffusion process and a backward reverse diffusion process, gradually add standard Gaussian noise to the original text encoding z_0 in the forward process to obtain the text encoding z_t at step t, and in the backward reverse process use the control variable e, the control information v, and the text encoding z_t at step t to train a denoising network, so that the trained denoising network iteratively generates diverse texts satisfying the corresponding control.
In another aspect of the present invention, an electronic device is also provided, comprising:
at least one processor; and
a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the diffusion model-based diversity-controllable text generation method described above.
In another aspect of the present invention, a machine-readable storage medium is also provided, storing executable instructions that, when executed, cause a machine to perform the diffusion model-based diversity-controllable text generation method described above.
Compared with the prior art, the beneficial effects of the present invention are:
(1) Compared with existing methods, the diffusion model-based diversity-controllable text generation method provided by the present invention uses natural language instructions as control information, enabling complex controls such as abstract concepts, viewpoints, and intentions; at the same time, natural language prompts improve the user experience, since users can flexibly change the control conditions in natural language and thus flexibly control the text generation model.
(2) The present invention introduces a control information clustering model to guarantee the controllability of the generated text. Inspired by cognitive science, hundreds of individuals may hold similar ideas even though they express them differently. The present invention encodes natural language prompts and clusters them in a latent space, converts natural language control into latent instruction control categories, samples control information according to the distribution of the prompt's control category, and uses it to guide the text generation process, thereby controlling the generated text.
(3) The present invention introduces diverse control information to guarantee the diversity of the generated text. By sampling from the distribution of a control information category, diverse control information for the same natural language prompt is obtained and used to guide the generation process of the diffusion model, achieving diverse text generation.
Brief Description of the Drawings
FIG. 1 is an architecture diagram of the diffusion model-based diversity-controllable text generation method of the present invention.
FIG. 2 is a schematic diagram of the process of fine-tuning the natural language prompt encoder in the present invention.
FIG. 3 is a schematic flow chart of the application phase of the method of the present invention.
Detailed Description
The present invention is further described below in conjunction with the accompanying drawings and embodiments.
It should be noted that the following detailed descriptions are exemplary and are intended to provide further explanation of the present invention. Unless otherwise specified, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the present invention belongs.
It should be noted that the terms used herein are only for describing specific embodiments and are not intended to limit exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, singular forms are also intended to include plural forms; furthermore, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
In the absence of conflict, the embodiments of the present invention and the features of the embodiments may be combined with each other.
In view of the defects of the prior art, the present invention proposes a diffusion model-based controllable diversity text generation method comprising a model training phase and an application phase.
In the training phase, given a natural language prompt p containing a task requirement g and a control requirement c, the natural language prompt encoder encodes it into a control variable e. The control information clustering model captures the latent control commonalities among the constructed natural language prompts and distinguishes the control categories of different prompts from a distributional perspective. Control information v is obtained by sampling from the Gaussian distribution of the latent control category and serves as control information for the latent space diffusion model.
At the same time, the text reconstruction autoencoder encodes a given reference text x, and the resulting original text encoding z_0 is taken as the hidden state of the latent space diffusion model at the initial step. The latent space diffusion model constructs a diffusion process in the text encoding space and uses the data generated in the diffusion process as supervision for training the denoising network of the reverse diffusion process; the control variable e and the control information v together serve as latent space control information that guides the reverse diffusion process and trains the denoising network.
In the application phase, given a natural language prompt p, the natural language prompt encoder encodes it into a control variable e, and the control information clustering model yields the control information v. Then a sample is drawn from the text prior distribution as the noisy hidden state z_T at step T. Finally, the control information v and the control variable e are fed to the latent space diffusion model, and the trained denoising network step by step generates diverse texts satisfying the corresponding control.
The diffusion model-based controllable diversity text generation method and device of the invention are described in detail below in conjunction with specific embodiments.
Embodiment 1
This embodiment provides a diffusion model-based controllable diversity text generation method, the method comprising:
S1. Construct a diversity text generation model based on natural language control, the model comprising a natural language prompt encoder, a text reconstruction autoencoder, a control information clustering model, and a latent space diffusion model.
Referring specifically to FIG. 1, the diversity text generation model based on natural language control designed in this embodiment comprises four parts: a natural language prompt encoder, a text reconstruction autoencoder, a control information clustering model, and a latent space diffusion model. The natural language prompt encoder encodes the natural language prompts so as to better reflect the commonalities and differences between control information; the text reconstruction autoencoder maps text encodings accurately back to the corresponding texts; the control information clustering model models the different distributions of the control variables of different control types and converts natural language prompts into control information; the latent space diffusion model takes the control information and the control variable as input, guides the diffusion process, and generates diverse texts meeting the control requirements.
Preferably, in implementation, the text reconstruction autoencoder, the natural language prompt encoder, and the latent space diffusion model all use neural network structures, while the control information clustering model uses a Gaussian mixture model updated iteratively with the EM algorithm.
S2. In the training phase, a reference text x and a natural language prompt p containing a task requirement g and a control requirement c are given, and the reference text x and the natural language prompt p are input into the diversity text generation model to train it.
In step S2, training the diversity text generation model specifically comprises:
S21. Use the reference text x to train the text reconstruction autoencoder, then use the trained autoencoder to encode the reference text x and map the resulting original text encoding z_0 into a continuous latent space.
In this embodiment, the text reconstruction autoencoder is fine-tuned with an unsupervised method. The training goal of the text reconstruction autoencoder is to encode and decode text accurately, and it is preferably trained with an unsupervised reconstruction objective.
The training input of the text reconstruction autoencoder is the reference text x, and the output is the reconstructed text x̂ for the input text. The training objective is for the output reconstructed text x̂ to reproduce the input reference text x as faithfully as possible.
This process is completed by the text reconstruction autoencoder. The BART model is preferably used as the base model of the text reconstruction autoencoder and fine-tuned with a cross-entropy reconstruction loss; this cross-entropy reconstruction loss is the same as the reconstruction loss used below for training the natural language prompt encoder, i.e. formula (5). Then, the trained text reconstruction autoencoder E_AE encodes the reference text x and maps the resulting original text encoding z_0 into a continuous latent space, namely:

$z_0 = E_{\mathrm{AE}}(x), \quad z_0 \in \mathbb{R}^{l \times d}$ (1);

In formula (1): z_0 denotes the original text encoding; R^{l×d} denotes the latent space; l denotes the sentence length; d denotes the dimension of the latent space; E_AE denotes the text reconstruction autoencoder; x denotes the reference text.
S22. Select positive and negative samples with respect to anchor samples according to the semantic information of the task requirement g and the natural language prompt p, and train the natural language prompt encoder by contrastive learning; then use the trained encoder to encode the natural language prompt p, obtaining a control variable e. In the latent space, the natural language prompts are divided into K control information categories with similar controls.
Specifically, for a given task requirement g and control requirement c, a natural language prompt template is designed and a natural language prompt p is constructed. The natural language prompt p organizes the task requirement and the original sentence information in human-understandable natural language and contains two parts, the task requirement g and the control requirement c of the generation task, namely:

$p = [\, g;\ \mathrm{SEP};\ c \,]$ (2);

In formula (2): p denotes the constructed natural language prompt; g denotes the task requirement; c denotes the control requirement; SEP is a separator symbol.
For different generation tasks, the task requirement g is constructed in different ways.
Illustratively, for the comment generation task, the task requirement is g = [Generate comment with the information:]. For the paraphrase task, the task requirement is g = [Generate paraphrase of the following sentence:].
The control requirement c provides the information that the generated text is expected to contain. For example, in the comment generation task, the control requirement c contains information such as the name of the reviewed restaurant, its rating, its price level, and its location; in the paraphrase task, the control requirement c is the text to be rewritten.
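A minimal sketch of the prompt construction of formula (2) follows; the concrete separator token and the restaurant attribute string are invented, illustrative assumptions.

```python
# Illustrative construction of a natural language prompt p = [g; SEP; c]
# from a task requirement g and a control requirement c (formula (2)).
SEP = " | "  # the concrete separator symbol is an assumption

def build_prompt(g: str, c: str) -> str:
    """Assemble the natural language prompt p from g and c."""
    return g + SEP + c

# Comment-generation task: g is the fixed task description, c carries
# the (illustrative) restaurant attributes to be expressed.
p = build_prompt(
    "Generate comment with the information:",
    "name: Riverside Bistro, rating: 4 out of 5, price: cheap, area: city centre",
)
print(p)
```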
Referring specifically to FIG. 2, in this embodiment, positive and negative samples, i.e. samples relative to an anchor sample, are selected from the constructed natural language prompts according to the semantic information of the given task requirement g and natural language prompt p. Preferably, one positive sample and N negative samples are selected for each natural language prompt, and the natural language prompt encoder is fine-tuned by contrastive learning.
The goal of fine-tuning the natural language prompt encoder is to capture the latent commonalities among different natural language prompts p, pulling control variables with similar controls closer together and pushing control variables with different controls further apart.
The natural language prompt encoder E_prompt preferably uses the BART model as its base model; the trained natural language prompt encoder then encodes the input natural language prompt p to obtain the control variable e, namely:

$e = E_{\mathrm{prompt}}(p)$ (3);

In formula (3): e denotes the control variable; E_prompt denotes the natural language prompt encoder; p denotes the natural language prompt.
Further, the loss function for fine-tuning the natural language prompt encoder consists of two parts, a reconstruction loss and a contrastive loss, namely:

$\mathcal{L}_{E} = \mathcal{L}_{\mathrm{rec}}(p, \hat{p}) + \mathcal{L}_{\mathrm{cl}}(e^{a}, e^{+}, \{e^{-}_{i}\}_{i=1}^{N})$ (4);

In formula (4): L_E denotes the loss of the natural language prompt encoder; L_rec denotes the reconstruction loss; L_cl denotes the contrastive loss; p denotes the real prompt encoding, i.e. the control variable corresponding to the real natural language prompt; p̂ denotes the reconstructed prompt encoding, i.e. the control variable corresponding to the reconstructed natural language prompt; e^a denotes the encoding of the currently selected anchor sample, e^+ denotes the encoding of the positive sample selected for that anchor, and e^-_1, ..., e^-_N denote the encodings of the N negative samples selected for that anchor.
The reconstruction loss enables the natural language prompt p to retain as much semantic knowledge as possible after being encoded; it uses the cross-entropy loss function, namely:

$\mathcal{L}_{\mathrm{rec}}(p, \hat{p}) = -\sum_{i} p_{i} \log \hat{p}_{i}$ (5);

In formula (5), p_i denotes the i-th dimension of the real prompt encoding and p̂_i denotes the i-th dimension of the reconstructed prompt encoding;
The contrastive loss differentiates the control variables of different control categories; here the n-pair loss is preferably used as the contrastive loss function, namely:

$\mathcal{L}_{\mathrm{cl}} = \log\Big(1 + \sum_{i=1}^{N} \exp\big((e^{a})^{\top} e^{-}_{i} - (e^{a})^{\top} e^{+}\big)\Big)$ (6);

In formula (6): exp denotes the exponential function; (e^a)^T denotes the transpose of the anchor sample encoding; e^-_i denotes the encoding of the i-th negative sample; e^+ denotes the encoding of the positive sample.
Negative samples are divided into intra-class and inter-class negative samples: intra-class negative samples have the same task requirement g but control requirements c with large semantic differences, while inter-class negative samples have different task requirements g.
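As a concrete reading of formula (6), the following PyTorch sketch computes the n-pair loss for one anchor; the embedding dimension, the normalization of the encodings, and the split of negatives are illustrative assumptions.

```python
# Sketch of the n-pair contrastive loss of formula (6):
# L_cl = log(1 + sum_i exp((e^a)^T e^-_i - (e^a)^T e^+)).
import torch
import torch.nn.functional as F

def n_pair_loss(anchor: torch.Tensor,     # e^a, shape (d,)
                positive: torch.Tensor,   # e^+, shape (d,)
                negatives: torch.Tensor,  # e^-_1..e^-_N stacked, shape (N, d)
                ) -> torch.Tensor:
    pos_score = anchor @ positive          # scalar (e^a)^T e^+
    neg_scores = negatives @ anchor        # (N,) scores (e^a)^T e^-_i
    # log1p gives the numerically stabler log(1 + sum exp(...)).
    return torch.log1p(torch.exp(neg_scores - pos_score).sum())

e_a = F.normalize(torch.randn(768), dim=0)
e_pos = F.normalize(torch.randn(768), dim=0)
e_neg = F.normalize(torch.randn(30, 768), dim=1)  # e.g. 10 intra- + 20 inter-class
print(n_pair_loss(e_a, e_pos, e_neg))
```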
S23. Input the control variable e into the control information clustering model and use the EM algorithm to obtain the control information category k of the natural language prompt p corresponding to the control variable e, then sample the corresponding control information v from the distribution of the control information category k.
The control information clustering model is used to model the natural language prompts, capturing the latent control commonalities among them so that natural language prompts with similar controls belong to the same control category.
In this embodiment, the control information clustering model adopts a Gaussian mixture model.
Based on cognitive science, it is assumed that natural language prompts can be divided in the latent space into K control information categories with similar controls; the distribution of each control information category is an independent Gaussian distribution, and each Gaussian distribution represents one control information category. A small subset of natural language prompts is randomly selected and used to build the control information clustering model.
Specifically, the control information clustering model is first initialized. Since the natural language prompts can be divided in the latent space into K control information categories with similar controls, K randomly generated pairs of means and variances (μ_k, Σ_k) can be regarded as the initial distributions of the K latent natural language control instructions. N(e_j | μ_k, Σ_k) denotes the probability that the control variable e_j corresponding to the natural language prompt p_j belongs to the k-th control information category, μ_k and Σ_k being the mean and variance of the distribution of the k-th control information category.
For the Gaussian mixture model, since each control information category has unknown parameters μ_k and Σ_k, they cannot be computed in closed form by maximum likelihood differentiation; the EM algorithm is therefore used to iteratively solve for the mean and variance of each control information category. The specific steps are as follows:
E-step: according to the current parameters of the control information clustering model, compute the membership degree of the control variable corresponding to each natural language prompt in the k-th control information category, namely:

$\gamma_{jk} = \dfrac{\mathcal{N}(e_{j} \mid \mu_{k}, \Sigma_{k})}{\sum_{k'=1}^{K} \mathcal{N}(e_{j} \mid \mu_{k'}, \Sigma_{k'})}$ (7);

In formula (7): γ_jk denotes the membership degree of the control variable e_j corresponding to the j-th natural language prompt p_j in the k-th control information category; N(e_j | μ_k, Σ_k) denotes the probability that the control variable e_j corresponding to the j-th natural language prompt p_j belongs to the k-th control information category, where μ_k and Σ_k denote the mean and variance of the distribution of the k-th control information category; K denotes the total number of control information categories, k indexes a control information category; M denotes the total number of natural language prompts.

The control information category with the highest probability is selected as the control information category k_j of the natural language prompt p_j, namely:

$k_{j} = \arg\max_{k} \gamma_{jk}$ (8).
M-step: use the membership degrees computed in the E-step to update the parameters of the control information clustering model, namely:

$\mu_{k} = \dfrac{1}{M_{k}} \sum_{j=1}^{M} \gamma_{jk}\, e_{j}$ (9);

$\Sigma_{k} = \dfrac{1}{M_{k}} \sum_{j=1}^{M} \gamma_{jk}\, (e_{j} - \mu_{k})(e_{j} - \mu_{k})^{\top}$ (10);

In formulas (9) and (10): μ_k denotes the mean of the k-th control information category; Σ_k denotes the variance of the k-th control information category; M_k denotes the (soft) number of natural language prompts belonging to the k-th control information category, i.e. the sum over all prompts of the probability of belonging to the k-th category, M_k = Σ_{j=1}^{M} γ_jk; j indexes the natural language prompts and M denotes their total number; (e_j − μ_k)^T denotes the transpose of the matrix (e_j − μ_k);

The above E-step and M-step are repeated until the convergence condition ‖θ^{(i+1)} − θ^{(i)}‖ < ε is reached, where θ^{(i)} denotes the parameters at the i-th iteration and ε is a very small positive number.
It should be understood that, in the above process, for a new natural language prompt outside the randomly selected subset, with corresponding new control variable e_new, the likelihood that the prompt belongs to each control information category is computed, and the category with the highest likelihood is selected as the control information category k_new of that prompt, namely:

$k_{\mathrm{new}} = \arg\max_{k} \mathcal{N}(e_{\mathrm{new}} \mid \mu_{k}, \Sigma_{k})$ (11);

In formula (11): N(e_new | μ_k, Σ_k) denotes the probability that the natural language prompt corresponding to the new control variable e_new belongs to the k-th control information category.
Finally, after the control information category k of a natural language prompt is obtained, the control information v of that prompt is sampled according to the mean and variance of the category distribution and guides the latent space diffusion model to generate text satisfying the corresponding control. For the same natural language prompt, different control information can be generated by sampling multiple times, guaranteeing the diversity of the generated text. A sketch of the EM loop and of this sampling step is given below.
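The EM loop of formulas (7) to (10) and the category sampling just described can be sketched as follows; for brevity the sketch assumes diagonal covariances, whereas formula (10) allows a full covariance matrix, and all sizes are toy values.

```python
# NumPy sketch of step S23: fit K control information categories to the
# prompt control variables E (shape (M, d)) with EM, then sample control
# information v from a chosen category. Diagonal covariances assumed.
import numpy as np

def fit_control_clusters(E, K, n_iter=100, eps=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    M, d = E.shape
    mu = E[rng.choice(M, size=K, replace=False)]   # initial means (K, d)
    var = np.ones((K, d))                          # initial variances
    for _ in range(n_iter):
        # E-step (formula (7)): membership gamma[j, k] of prompt j in category k.
        log_p = -0.5 * (((E[:, None, :] - mu[None]) ** 2) / var[None]
                        + np.log(2.0 * np.pi * var[None])).sum(axis=-1)  # (M, K)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step (formulas (9) and (10)): update means and variances.
        Mk = gamma.sum(axis=0)                     # soft counts M_k per category
        mu_new = (gamma.T @ E) / Mk[:, None]
        var = (gamma.T @ E ** 2) / Mk[:, None] - mu_new ** 2 + 1e-6
        if np.linalg.norm(mu_new - mu) < eps:      # convergence condition
            mu = mu_new
            break
        mu = mu_new
    return mu, var, gamma

def sample_control_info(mu, var, k, rng):
    """Sample control information v from the Gaussian of category k."""
    return rng.normal(mu[k], np.sqrt(var[k]))

E = np.random.default_rng(1).normal(size=(200, 16))   # toy control variables
mu, var, gamma = fit_control_clusters(E, K=5)
k = int(gamma[0].argmax())                            # category of prompt 0 (formula (8))
v = sample_control_info(mu, var, k, np.random.default_rng(2))
```

Sampling v repeatedly from the same category with different random draws is what produces distinct control information, and hence diverse texts, for one prompt.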
S24. Take the original text encoding z_0 output by the text reconstruction autoencoder as the hidden state at the initial step of the latent space diffusion model, construct a forward diffusion process and a backward reverse diffusion process, gradually add standard Gaussian noise to the original text encoding z_0 in the forward process to obtain the text encoding z_t at step t, and in the backward reverse process use the control variable e, the control information v, and the text encoding z_t at step t to train a denoising network, so that the trained denoising network iteratively generates diverse texts satisfying the corresponding control.
This step constructs the diffusion model in the latent space and introduces the control variables to guide text generation.
The latent space diffusion model takes the original text encoding z_0 as the hidden state at the initial step and constructs a forward diffusion process and a backward reverse diffusion process on that basis.
The forward diffusion process is a parameter-free noising process. Standard Gaussian noise is gradually added to the initial hidden state, constructing intermediate hidden states z_1, ..., z_{T-1}, until T diffusion steps are completed and the noise of the hidden state z_T at step T is close to standard Gaussian noise. The dimension of the hidden state at each step equals the dimension of the text encoding, and the sequence z_0, z_1, ..., z_T forms a Markov chain.
The step from the hidden state z_{t-1} at step t-1 to the hidden state z_t at step t can be parameterized as follows:

$q(z_{t} \mid z_{t-1}) = \mathcal{N}\big(z_{t};\ \sqrt{1 - \beta_{t}}\, z_{t-1},\ \beta_{t}\, \mathbf{I}\big)$ (12);

$q(z_{1:T} \mid z_{0}) = \prod_{t=1}^{T} q(z_{t} \mid z_{t-1})$ (13);

In formulas (12) and (13): z_0 denotes the original text encoding, i.e. the hidden state at the initial step; z_t denotes the text encoding at step t, i.e. the hidden state at step t; z_{t-1} denotes the text encoding at step t-1, i.e. the hidden state at step t-1; z_{1:T} denotes the intermediate hidden states; N denotes the Gaussian distribution; the hyperparameter β_t denotes the amount of noise added at diffusion step t; Π denotes the product; I denotes the identity matrix.
The hidden state at any step t can be sampled directly from a distribution based on the original text encoding z_0, the hyperparameters, and the step t, without iterating step by step according to formulas (12) and (13), namely:

$q(z_{t} \mid z_{0}) = \mathcal{N}\big(z_{t};\ \sqrt{\bar{\alpha}_{t}}\, z_{0},\ (1 - \bar{\alpha}_{t})\, \mathbf{I}\big)$ (14);

In formula (14): q denotes the distribution of the diffusion process; z_0 denotes the original text encoding, i.e. the hidden state at the initial step; z_t denotes the text encoding at step t, i.e. the hidden state at step t; N denotes the Gaussian distribution; I denotes the identity matrix; ᾱ_t is a hyperparameter with ᾱ_t = Π_{s=1}^{t} α_s and α_t = 1 − β_t, where β_t denotes the noise ratio at step t and takes values between 0 and 1.
The purpose of the backward reverse diffusion process is to gradually remove the noise in the hidden states of the diffusion process and restore the original text encoding. The backward reverse diffusion process recovers the original distribution information from Gaussian noise; assuming the reverse process is also Gaussian but cannot be fitted step by step directly, a parameterized distribution must be constructed for estimation, namely:

$p_{\theta}(z_{t-1} \mid z_{t}) = \mathcal{N}\big(z_{t-1};\ \mu_{\theta}(z_{t}, t),\ \sigma_{t}^{2}\, \mathbf{I}\big)$ (15);

$\mu_{\theta}(z_{t}, t) = \dfrac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_{t}}{1 - \bar{\alpha}_{t}}\, \hat{z}_{0} + \dfrac{\sqrt{\alpha_{t}}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_{t}}\, z_{t}$ (16);

$\sigma_{t}^{2} = \dfrac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_{t}}\, \beta_{t}$ (17);

In formulas (15) to (17): μ_θ(z_t, t) and σ_t^2 denote the mean and variance at step t of the Gaussian distribution of the reverse diffusion process; α_t is a hyperparameter with α_t = 1 − β_t and ᾱ_t = Π_{s=1}^{t} α_s; ẑ_0 denotes the original text encoding predicted from z_t.
Further, in the backward reverse diffusion process, this embodiment constructs and trains a denoising network f_θ that predicts the initial text state z_0 from the noisy hidden state. Given the text encoding z_t at step t and the output ẑ_0 of the denoising network, the text encoding z_{t-1} at step t-1 can be sampled.
To achieve controllable text generation, this embodiment integrates the control variable e and the control information v sampled according to the control variable into the training process of the denoising network as a latent space prompt. Given a step t, the text encoding z_t at step t is available; combining it with the control variable e and the control information v gives f_θ(z_t, t, e, v), which is used to predict the original text encoding z_0.
In this embodiment, the denoising network f_θ is trained with a regression loss, namely:

$\mathcal{L}_{\mathrm{reg}} = \mathbb{E}\big[\, \| f_{\theta}(z_{t}, t, e, v) - z_{0} \|^{2} \,\big]$ (18);

In formula (18): L_reg denotes the regression loss; E denotes the expectation; v denotes control information sampled from the control information category k_j of the j-th natural language prompt.
Finally, using the trained denoising network f_θ, the distribution p_θ(z_{t-1} | z_t) of the reverse diffusion process can be defined as the inverse of the forward diffusion distribution q(z_t | z_{t-1}): using the original text encoding ẑ_0 predicted by the denoising network together with the text encoding z_t at step t, the text encoding z_{t-1} at step t-1 is sampled, namely:

$p_{\theta}(z_{t-1} \mid z_{t}) = q\big(z_{t-1} \mid z_{t},\ \hat{z}_{0} = f_{\theta}(z_{t}, t, e, v)\big)$ (19).
In this way, the control variable e and the control information v jointly serve as latent space control information that guides the reverse diffusion process and trains the denoising network; in the application phase, the trained denoising network iteratively generates diverse texts satisfying the corresponding control.
S3. In the application phase, a natural language prompt p is given and input into the trained diversity text generation model, which iteratively generates diverse texts satisfying the corresponding control.
After the above diversity text generation model based on natural language control has been trained, given a natural language prompt p, the trained natural language prompt encoder, control information clustering model, and latent space diffusion model generate control-compliant diverse text.
Referring specifically to FIG. 3, given a natural language prompt p, the trained natural language prompt encoder encodes it to obtain the control variable e; the control variable e is then input into the trained control information clustering model to obtain the control information category k to which the natural language prompt p belongs, and the control information v is sampled according to the distribution of the control information category k.
For the latent-space diffusion model, a sample is drawn from the standard Gaussian distribution as the hidden state $z_T$ at time $T$. Conditioned on the latent-space prompt formed by the control variable $c$ and the sampled control information $s$, the trained denoising network iterates for $T$ steps to generate diverse texts that satisfy the control requirements, namely:
$z_{t-1}=\sqrt{\bar\alpha_{t-1}}\,\hat z_0+\sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\cdot\dfrac{z_t-\sqrt{\bar\alpha_t}\,\hat z_0}{\sqrt{1-\bar\alpha_t}}+\sigma_t\,\varepsilon_t,\quad \hat z_0=f_\theta(z_t,c,s,t),\ \varepsilon_t\sim\mathcal{N}(0,I)$ (20);
where a different variance $\sigma_t$ is used at different times to control the randomness of the sampling process, namely:
$\sigma_t=\eta\,\sqrt{\dfrac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}}\,\sqrt{1-\dfrac{\bar\alpha_t}{\bar\alpha_{t-1}}}$ (21);
In Eq. (21): $\eta$ is a hyperparameter and $\mathbb{R}_{\ge 0}$ denotes the real numbers greater than or equal to 0, i.e. $\eta\in\mathbb{R}_{\ge 0}$.
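The application-stage sampling loop of Eqs. (20)–(21) can then be sketched as follows, with the caveat that the update rule follows the reconstruction above and that all shapes and names are illustrative assumptions:

```python
import torch

def generate_latent(f_theta, c, s, eta=0.5, shape=(1, 64, 768)):
    """Iterate T reverse steps from pure Gaussian noise (Eqs. (20)-(21))."""
    z_t = torch.randn(shape)                                  # z_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        a_bar_t = alpha_bar[t]
        a_bar_prev = alpha_bar[t - 1] if t > 0 else torch.tensor(1.0)
        z0_hat = f_theta(z_t, c, s, torch.tensor([t]))
        sigma_t = eta * torch.sqrt((1 - a_bar_prev) / (1 - a_bar_t)) \
                      * torch.sqrt(1 - a_bar_t / a_bar_prev)          # Eq. (21)
        direction = torch.sqrt(1 - a_bar_prev - sigma_t ** 2) \
                    * (z_t - torch.sqrt(a_bar_t) * z0_hat) / torch.sqrt(1 - a_bar_t)
        noise = sigma_t * torch.randn_like(z_t) if t > 0 else 0.0
        z_t = torch.sqrt(a_bar_prev) * z0_hat + direction + noise    # Eq. (20)
    return z_t     # decoded into text by the reconstruction autoencoder's decoder
```

Setting $\eta=0$ makes the sampling deterministic, while larger $\eta$ increases the randomness, and hence the diversity, of the generated texts.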
A demonstration of applying the method of this embodiment to a concrete scenario is given below:
For the three common web text generation problems of comment generation, question answering and paraphrasing, the diversity of intelligent-service users requires generating correspondingly diverse texts to meet different users' needs and improve the user experience. In this scenario, this embodiment uses the natural-language-prompt-based diverse text generation method to generate, for the same natural language prompt, a set of control-compliant diverse texts.
In the above example, the natural language prompt contains a task requirement and a control requirement. For the comment generation task, for instance, the task requirement is [Generate comment with the information], and the control requirement specifies information such as the reviewed restaurant's name, rating, price level and location.
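Such a prompt could, for illustration, be assembled as follows; the template, field names and restaurant details are assumptions, not a format fixed by this embodiment:

```python
def build_prompt(task_requirement, control_requirement):
    """Concatenate a task requirement and a control requirement into one prompt."""
    return f"{task_requirement} {control_requirement}"

prompt = build_prompt(
    "Generate comment with the information:",
    "name: Joe's Diner; rating: 4.5; price level: $$; location: downtown",
)
```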
The training process of the above application example comprises two stages: pre-training, and training the diffusion model. The pre-training stage mainly pre-trains the text reconstruction autoencoder and the natural language prompt encoder, and builds the control information clustering model.
The text reconstruction autoencoder uses BART as its base model. During pre-training, its input is a reference text from the training data and its output is the reconstructed text; the cross-entropy reconstruction loss it uses has been given above.
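A minimal sketch of this reconstruction pre-training with the Hugging Face transformers library is shown below; treating reconstruction as BART's standard sequence-to-sequence cross-entropy objective, with the reference text as both input and target, is an assumption for illustration:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def reconstruction_loss(reference_text):
    """Cross-entropy loss for reconstructing the reference text from itself."""
    batch = tokenizer(reference_text, return_tensors="pt", truncation=True)
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["input_ids"])        # target = input (reconstruction)
    return out.loss
```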
The natural language prompt encoder likewise uses BART as its base model. In the pre-training stage, its input is natural language prompts organized into constructed positive and negative sample pairs, and its output is the control variable; the loss functions it uses are the reconstruction loss and the contrastive loss, both given above.
Positive and negative samples are selected according to the task requirement and the control requirement. In this embodiment, natural language prompts with the same task requirement as, and a control requirement semantically close to, the anchor sample are taken as positive samples. Negative samples are divided into intra-class and inter-class negatives: intra-class negatives share the anchor's task requirement but differ substantially in the semantics of the control requirement, while inter-class negatives have a different task requirement. For each anchor sample, this application example selects 1 positive sample, 10 intra-class negative samples and 20 inter-class negative samples.
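This sample selection can be sketched as follows, assuming each prompt record carries explicit task and control-requirement fields and that semantic closeness of control requirements is scored by cosine similarity of precomputed embeddings; both fields and the scoring rule are illustrative assumptions:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def select_samples(anchor, pool, n_pos=1, n_intra=10, n_inter=20):
    """Pick positive / intra-class / inter-class samples for one anchor prompt.
    Each record is a dict with 'task' (string) and 'emb' (control-requirement
    embedding, np.ndarray)."""
    same_task = [p for p in pool if p["task"] == anchor["task"]]
    diff_task = [p for p in pool if p["task"] != anchor["task"]]
    same_task.sort(key=lambda p: cosine(p["emb"], anchor["emb"]), reverse=True)
    positives = same_task[:n_pos]          # same task, most similar controls
    intra_negs = same_task[-n_intra:]      # same task, dissimilar controls
    inter_negs = diff_task[:n_inter]       # different task requirements
    return positives, intra_negs, inter_negs
```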
After the pre-training stage, the latent-space diffusion model is trained: the pre-trained text reconstruction autoencoder encodes the reference text, and the resulting original text encoding is used as the initial hidden state to construct the diffusion process. The pre-trained natural language prompt encoder encodes the natural language prompt; the control information clustering model determines the control information category to which the prompt belongs, and the control information is sampled. The control variable and the sampled control information are then provided as inputs to the denoising network in the reverse diffusion process; the loss function used has been given above.
After the whole training process, given a natural language prompt, its control information category is determined from its control variable and the control information is sampled. A random sample from standard Gaussian noise serves as the hidden state at time $T$, and the trained denoising network iterates step by step to generate diverse texts that satisfy the control conditions.
Embodiment 2
This embodiment provides a device implementing the diffusion-model-based diversity-controllable text generation method, the device comprising:
a construction module for constructing a diverse text generation model under natural language control, the model comprising a natural language prompt encoder, a text reconstruction autoencoder, a control information clustering model and a latent-space diffusion model;
a training module for receiving a given reference text and a natural language prompt containing a task requirement and a control requirement, and feeding the reference text and the natural language prompt into the diverse text generation model so as to train it;
an execution module for receiving a given natural language prompt and feeding it into the trained diverse text generation model, so as to iteratively generate diverse texts that satisfy the corresponding control;
wherein training the diverse text generation model specifically comprises:
S21. training the text reconstruction autoencoder with the reference text, encoding the reference text with the trained autoencoder, and mapping the resulting original text encoding $z_0$ into a continuous latent space;
S22. selecting positive and negative samples for each anchor sample according to the task requirement and the semantic information of the natural language prompts, and training the natural language prompt encoder by contrastive learning, so that the trained encoder encodes a natural language prompt into a control variable $c$; in the latent space, the natural language prompts are partitioned into $K$ control information categories with similar controls;
S23. feeding the control variable $c$ into the control information clustering model and using the EM algorithm to obtain the control information category $k$ of the natural language prompt corresponding to $c$, so as to sample the corresponding control information $s$ from the distribution of category $k$;
S24. using the original text encoding $z_0$ output by the text reconstruction autoencoder as the initial hidden state of the latent-space diffusion model, constructing the forward diffusion process and the backward reverse diffusion process, progressively adding standard Gaussian noise to $z_0$ in the forward process to obtain the text encoding $z_t$ at time $t$, and training the denoising network in the reverse process with the control variable $c$, the control information $s$ and the text encoding $z_t$, so that the trained denoising network can iteratively generate diverse texts satisfying the corresponding control.
Embodiment 3
This embodiment further provides an electronic device, comprising:
at least one processor; and
a memory storing instructions which, when executed by the at least one processor, cause the at least one processor to perform the diffusion-model-based diversity-controllable text generation method described above.
In this embodiment, the electronic device may include, but is not limited to: personal computers, server computers, workstations, desktop computers, laptop computers, notebook computers, mobile computing devices, smartphones, tablet computers, cellular phones, personal digital assistants (PDAs), handheld devices, messaging devices, wearable computing devices, consumer electronics, and the like.
Embodiment 4
This embodiment further provides a machine-readable storage medium storing executable instructions which, when executed, cause the machine to perform the diffusion-model-based diversity-controllable text generation method described above.
Specifically, a system or device equipped with a readable storage medium may be provided, on which software program code implementing the functions of any of the above embodiments is stored, and a computer or processor of the system or device reads and executes the instructions stored in the readable storage medium.
In this case, the program code read from the readable medium can itself implement the functions of any of the above embodiments, so the machine-readable code and the readable storage medium storing it form part of this specification.
Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical discs (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW and DVD+RW), magnetic tapes, non-volatile memory cards and ROM. Optionally, the program code may be downloaded from a server computer or the cloud via a communication network.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in that memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed thereon to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, the above embodiments of the present invention are merely examples given to clearly illustrate the technical solution of the present invention and do not limit its specific implementations. Any modification, equivalent substitution or improvement made within the spirit and principles of the claims of the present invention shall fall within the protection scope of the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411008772.XA CN118551735B (en) | 2024-07-26 | 2024-07-26 | Diffusion model-based diversity controllable text generation method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118551735A true CN118551735A (en) | 2024-08-27 |
| CN118551735B CN118551735B (en) | 2024-09-27 |
Family
ID=92448524
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411008772.XA Active CN118551735B (en) | 2024-07-26 | 2024-07-26 | Diffusion model-based diversity controllable text generation method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118551735B (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119478094A (en) * | 2024-11-04 | 2025-02-18 | 浙江大学 | A measurement method and device for memorization problems in text-to-image diffusion models based on inversion |
| CN119672385A (en) * | 2024-12-24 | 2025-03-21 | 四川大学 | Training method of fine-grained image clustering model based on diffusion model and fine-grained image clustering method |
| CN120336511A (en) * | 2025-01-13 | 2025-07-18 | 科大讯飞股份有限公司 | Terminology standardization method, device, electronic device and storage medium |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116050401A (en) * | 2023-03-31 | 2023-05-02 | 云南师范大学 | Method for automatically generating diverse questions based on Transformer question keyword prediction |
| US20230136527A1 (en) * | 2021-11-04 | 2023-05-04 | Adobe Inc. | Intent detection |
| CN116629272A (en) * | 2023-07-24 | 2023-08-22 | 山东大学 | Text generation method and system controlled by natural language |
| CN117521672A (en) * | 2023-12-22 | 2024-02-06 | 湖南大学 | A method to generate continuous pictures from long text based on diffusion model |
| CN117556009A (en) * | 2023-11-13 | 2024-02-13 | 合肥工业大学 | Multi-turn dialogue generation method and system based on conditional diffusion model |
| CN117610509A (en) * | 2023-11-30 | 2024-02-27 | 北京理工大学 | A text generation method based on diffusion language model |
| US20240135610A1 (en) * | 2022-10-17 | 2024-04-25 | Adobe Inc. | Image generation using a diffusion model |
| CN118070899A (en) * | 2024-01-12 | 2024-05-24 | 南京大学 | A paraphrase sentence generation method based on controllable latent space diffusion model |
| DE102023127111A1 (en) * | 2022-12-06 | 2024-06-06 | Adobe Inc. | Fine-tuning and control of diffusion models |
Non-Patent Citations (4)
| Title |
|---|
| JIE WU: "Enhancing the diversity of logical table-to-text generation through a mixture of experts", EXPERT SYSTEMS, 18 January 2024 (2024-01-18) * |
| 张晓辉; 于双元; 王全新; 徐保民: "Text representation and classification algorithm based on adversarial training", Computer Science, no. 1, 15 June 2020 (2020-06-15) * |
| 李瀚清; 房宁; 赵群飞; 夏泽洋: "Instruction intent understanding method based on deep learning with deep denoising autoencoders", Journal of Shanghai Jiao Tong University, no. 07, 28 July 2016 (2016-07-28) * |
| 李红梅; 刁兴春; 曹建军; 冯钦; 张磊: "Tag-aware recommendation method for implicit feedback", Computer Science, no. 04, 15 April 2019 (2019-04-15) * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118551735B (en) | 2024-09-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Yi et al. | A survey on recent advances in llm-based multi-turn dialogue systems | |
| CN118551735B (en) | Diffusion model-based diversity controllable text generation method and device | |
| CN114625866B (en) | Method, device, equipment and medium for training summary generation model | |
| CN107944027B (en) | Method and system for creating semantic key index | |
| CN111414464A (en) | Question generation method, device, equipment and storage medium | |
| US11880664B2 (en) | Identifying and transforming text difficult to understand by user | |
| CN115600582B (en) | A controllable text generation method based on pre-trained language model | |
| CN117271780A (en) | Method and system for compressing context based on large language model | |
| US20250111139A1 (en) | Design document generation from text | |
| Jiang et al. | Knowledge augmented dialogue generation with divergent facts selection | |
| CN113407711B (en) | Gibbs limited text abstract generation method by using pre-training model | |
| Lyu et al. | Deep learning for textual entailment recognition | |
| Wang et al. | Data augmentation for internet of things dialog system | |
| CN120277199B (en) | Children's education knowledge boundary management method, system and equipment based on large model | |
| CN119378563B (en) | An implicit discourse relation recognition method enhanced by large language model generation data | |
| Bu et al. | Prompt-based data labeling method for aspect based sentiment analysis | |
| CN120012770A (en) | Text data processing method, device, computer equipment, readable storage medium and program product | |
| CN119598964A (en) | Natural language processing method and device based on language model | |
| CN118797063A (en) | Method, device and storage medium for extracting keywords from periodical documents based on feature representation | |
| Kurisinkel et al. | Graph to coherent text: passage generation from knowledge graphs by exploiting edge representations in sentential contexts | |
| CN115455141A (en) | A Joint Method for Intent Recognition and Semantic Slot Filling | |
| Ahmad et al. | Neural response generation for task completion using conversational knowledge graph | |
| Yu et al. | Adaptive cross-lingual question generation with minimal resources | |
| Duan et al. | Enhancing text generation via parse tree embedding | |
| CN115062122B (en) | Dialogue response method, dialogue response device, electronic equipment and computer readable storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||