CN110472242B - Text processing method, device and computer readable storage medium - Google Patents
- Publication number: CN110472242B
- Application number: CN201910718851.2A
- Authority: CN (China)
- Legal status: Active
Abstract
The embodiment of the application discloses a text processing method, a text processing device and a computer readable storage medium, wherein the embodiment of the application can determine a first text fragment with correct prediction and a second text fragment with incorrect prediction based on a preset initial model, and the preset initial model comprises an upper half branch network and a lower half branch network; respectively covering a first text segment and a second text segment in the current training text to obtain a first input text and a second input text; performing feature extraction on the first input text based on the upper half branch network to obtain first feature information; predicting the text fragments at each position according to the first feature information, the second input text and the lower half branch network to obtain a predicted text; converging based on the predicted text and the current training text to obtain a target language model; and predicting the text segment based on the target language model and the text to be processed to obtain the target text. The embodiment of the application can improve the text processing speed.
Description
Technical Field
The present application relates to the technical field of neural networks, and in particular, to a text processing method, a text processing device, and a computer readable storage medium.
Background
In recent years, with the surge of neural network technology in the artificial intelligence field, applying neural networks to natural language processing (NLP) systems has also developed rapidly. In general, a natural language processing system is built on a pre-trained language model whose parameters are then fine-tuned for the specific text processing scenario. Because the language model needs to cover the semantic and grammatical characteristics of text, a large amount of text must be used to pre-train the model, and each text must be trained over many cycles before the model understands its semantics and grammar; for example, the MASS model needs about 500,000 training steps to obtain a language model. A large amount of time is therefore required to obtain the target language model needed for processing text, so the text processing speed is low.
Disclosure of Invention
In view of this, embodiments of the present application provide a text processing method, apparatus, and computer readable storage medium, which can increase text processing speed.
In a first aspect, an embodiment of the present application provides a text processing method, including:
based on a preset initial model, carrying out initial prediction on text fragments at each position in a predicted text, and determining a first text fragment with correct prediction and a second text fragment with incorrect prediction, wherein the preset initial model comprises an upper branch network and a lower branch network;
covering a first text segment in the current training text to obtain a first input text;
covering a second text segment in the current training text to obtain a second input text;
performing feature extraction on the first input text based on the upper half branch network to obtain first feature information of the first input text;
Predicting text fragments at each position in the predicted text according to the first characteristic information, the second input text and the lower branch network to obtain the predicted text;
Converging based on the predicted text and the current training text to obtain a target language model;
And predicting the text segment based on the target language model and the text to be processed to obtain a target text.
In one embodiment, predicting text segments at each position in a predicted text based on a preset initial model, determining a first text segment with correct prediction and a second text segment with incorrect prediction includes:
Extracting features of the current training text according to the upper branch network of the preset initial model, and acquiring feature information of the current training text;
Based on the lower branch network of the preset initial model and the characteristic information of the current training text, performing initial text prediction on text fragments at each position in a predicted text to obtain the predicted text;
And determining a first text segment with correct prediction and a second text segment with incorrect prediction in the predicted text according to the predicted text and the current training text.
In an embodiment, before the initial prediction is performed on the text segments at each position in the predicted text based on the preset initial model, determining the first text segment with correct prediction and the second text segment with incorrect prediction, the method further includes:
Acquiring a current training text from a preset text set;
Extracting the characteristics of the current training text based on the upper branch network of a preset initial model, and acquiring the characteristic information of the current training text;
According to the characteristic information, the current training text and the lower branch network of the preset initial model, text prediction is carried out on text fragments at all positions in the predicted text, and the predicted text is obtained;
and converging based on the predicted text and the current training text.
In an embodiment, the converging is performed based on the predicted text and the current training text to obtain a target language model, which includes:
Acquiring cross entropy loss of the predicted text and the current training text according to a preset loss function;
Based on the cross entropy loss, adjusting parameters in the preset initial model to obtain a current preset initial model after the training of the current training text;
And acquiring a target language model based on the current preset initial model.
In an embodiment, based on the cross entropy loss, adjusting parameters in the preset initial model to obtain a preset initial model after training the current training text, including:
If the cross entropy loss does not meet the preset condition, adjusting parameters in the preset initial model;
updating the first input text and the second input text according to the predicted text;
returning to the step of performing feature extraction on the first input text based on the upper half branch network in the preset initial model to obtain the first feature information of the first input text, until the cross entropy loss meets the preset condition;
and acquiring a current preset initial model after the current training text is trained.
In an embodiment, based on the current preset initial model, obtaining the target language model includes:
deleting the current training text from a preset text set;
returning to the step of acquiring the text from the preset text set, and updating the current training text into the acquired text;
Training the current preset initial model based on the current training text until the texts in the preset text set are trained, and obtaining a target language model.
In an embodiment, feature extraction is performed on the first input text based on the upper half branch network in the preset initial model to obtain first feature information of the first input text, including:
Performing position feature extraction and lexical feature extraction on the text fragments of the first input text to obtain semantic feature information corresponding to each text fragment;
performing convolution operation on the semantic feature information, extracting semantic related features between the current semantic feature information and other semantic feature information, and obtaining semantic related information corresponding to each semantic feature information;
Weighting operation is carried out on the semantic feature information and the corresponding semantic related information to obtain the local feature information corresponding to the first input text;
and carrying out full-connection operation on the local characteristic information corresponding to the first input text to obtain the first characteristic information.
In an embodiment, predicting text segments at each position in the predicted text according to the first feature information, the second input text and a lower branch network of a preset training model to obtain the predicted text includes:
Extracting features of the second input text to obtain local feature information corresponding to the second input text;
extracting grammar related features of the local feature information corresponding to the second input text and the first feature information to obtain global feature information corresponding to the text segments at each position;
performing full-connection operation on the global feature information to obtain probability distribution information of text fragments at all positions;
Acquiring text fragments of all positions in a predicted text according to the probability distribution information and a preset word list;
and acquiring the predicted text based on the text fragments of the positions.
In an embodiment, the predicting the text segment based on the target language model and the text to be processed to obtain the target text includes:
Acquiring a text to be processed;
Extracting characteristics of the text to be processed based on the upper branch network of the target language model obtained through training to obtain characteristic information of the text to be processed;
And predicting the text fragments at each position in the target text according to the characteristic information of the text to be processed and the lower branch network of the target language model to obtain the target text.
In an embodiment, feature extraction is performed on the text to be processed based on the upper branch network of the target language model obtained through training, so as to obtain feature information of the text to be processed, including:
extracting position features and lexical features of the text fragments of the text to be processed to obtain semantic feature information corresponding to each text fragment;
performing convolution operation on the semantic feature information, extracting semantic related features between the current semantic feature information and other semantic feature information, and obtaining semantic related information corresponding to each semantic feature information;
Weighting operation is carried out on the semantic feature information and the corresponding semantic related information to obtain the local feature information corresponding to the text to be processed;
And carrying out full-connection operation on the local characteristic information corresponding to the text to be processed to obtain the characteristic information of the text to be processed.
In an embodiment, predicting text segments at each position in the target text according to the feature information of the text to be processed and the lower branch network of the target language model to obtain the target text includes:
Extracting semantic related features of a text segment at a current position in a target text and the text to be processed based on historical probability distribution information and feature information of the text to be processed to obtain current global feature information corresponding to the text segment at the current position, wherein the historical probability distribution information is probability distribution information corresponding to the text segment at a position before the current position;
Performing full-connection operation on the current global feature information to obtain current probability distribution information of a text segment at a current position;
Acquiring a text segment at the current position in the target text according to the current probability distribution information and a preset word list;
and acquiring the target text based on the text segment at the current position.
In an embodiment, obtaining the target text based on the text segment at the current location includes:
updating the current position to be the position next to the current position;
Returning to execute the characteristic information based on the historical probability distribution information and the text to be processed, extracting the text segment at the current position in the target text and the semantic related characteristics of the text to be processed, and obtaining the current global characteristic information corresponding to the text segment at the current position until the current global characteristic information is a termination characteristic;
and acquiring target text based on the text fragments at all the positions.
In a second aspect, the present application also provides a text processing apparatus, including:
The segment acquisition unit is used for carrying out initial prediction on the text segments at each position in the predicted text based on a preset initial model, and determining a first text segment with correct prediction and a second text segment with incorrect prediction, wherein the preset initial model comprises an upper branch network and a lower branch network;
the first text acquisition unit is used for covering the first text segment in the current training text to obtain a first input text;
the second text acquisition unit is used for covering a second text segment in the current training text to obtain a second input text;
The first training unit is used for extracting the characteristics of the first input text based on the upper half branch network to obtain first characteristic information of the first input text;
the second training unit is used for predicting text fragments at each position in the predicted text according to the first characteristic information, the second input text and the lower branch network to obtain the predicted text;
the convergence unit is used for converging based on the predicted text and the current training text to obtain a target language model;
And the text processing unit is used for predicting the text fragments based on the target language model and the text to be processed to obtain a target text.
In a third aspect, embodiments of the present application provide a text processing computer readable storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform the text processing method as provided in any of the embodiments above.
According to the embodiment of the application, the text segments at each position in the predicted text can be initially predicted based on the preset initial model, and the first text segment with correct prediction and the second text segment with incorrect prediction are determined; covering a first text segment in the current training text to obtain a first input text; covering a second text segment in the current training text to obtain a second input text; performing feature extraction on the first input text based on the upper half branch network to obtain first feature information of the first input text; predicting text fragments at each position in the predicted text according to the first characteristic information, the second input text and the lower branch network to obtain the predicted text; converging based on the predicted text and the current training text to obtain a target language model; and predicting the text segment based on the target language model and the text to be processed to obtain a target text. According to the embodiment of the application, the first input text which covers the correct predicted fragment is used for training the upper half branch network, so that the text understanding capability of the first branch network can be improved, the second input text which covers the incorrect predicted fragment is used for training the lower half branch network, the language modeling capability of the lower half branch network can be improved, and a target language model required by text processing can be obtained more quickly, so that the text processing speed can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a text processing method according to an embodiment of the present invention;
Fig. 2a is a schematic flow chart of a text processing method according to an embodiment of the present invention;
FIG. 2b is a schematic flow chart of another text processing method according to an embodiment of the present invention;
fig. 3a is a schematic structural diagram of a text processing device according to an embodiment of the present invention;
Fig. 3b is another schematic structural diagram of a text processing device according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a network device according to an embodiment of the present application.
FIG. 5a is a schematic diagram of the internal flow of a text processing model provided by the present application;
FIG. 5b is a schematic diagram of the perusal stage of the preset initial model training provided by the present application;
fig. 5c is a schematic illustration of the recitation phase of the preset initial model training provided by the present application.
FIG. 5d is a schematic diagram of a network architecture for model training provided by the present application;
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The embodiment of the invention provides a text processing method, a text processing device and a computer readable storage medium.
The text processing includes various scenarios, such as the natural language processing scenarios of translation, text generation, machine question answering and the like. Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field therefore involves natural language, i.e. the language that people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, machine question answering, knowledge graph techniques, and the like.
The embodiment of the invention provides a text processing system, which comprises the text processing device provided by any one of the embodiments of the invention, wherein the text processing device can be integrated in a server.
In addition, the text processing system may also include other devices, such as terminals and the like. Wherein, the terminal may include: a cell phone, tablet, notebook or personal computer (PC, personal Computer), etc.
Referring to fig. 1, the text processing system includes a terminal and a server, the terminal and the server being linked through a network. The network comprises network entities such as a router, a gateway and the like.
In an embodiment, the text processing device may be integrated in a terminal, training a preset initial model to obtain a target language model may be performed in a server, when text processing is required, the terminal may download the target language model from the server through a network, and then perform feature extraction and text segment prediction on the text to be processed in the terminal based on the target language model.
In another embodiment, the text processing device may be integrated in a server, training a preset initial model in the server to obtain a target language model, when text processing is required, the server may obtain a text to be processed from a terminal through a network, and then in the server, feature extraction and text segment prediction are performed on the text to be processed.
Referring to fig. 1, in the scheme, initial prediction is performed on text segments at various positions in a predicted text based on a preset initial model, and a first text segment with correct prediction and a second text segment with incorrect prediction are determined; covering a first text segment in the current training text to obtain a first input text; covering a second text segment in the current training text to obtain a second input text; performing feature extraction on the first input text based on the upper half branch network to obtain first feature information of the first input text; predicting text fragments at each position in the predicted text according to the first characteristic information, the second input text and the lower branch network to obtain the predicted text; converging based on the predicted text and the current training text to obtain a target language model; and predicting the text segment based on the target language model and the text to be processed to obtain a target text.
Therefore, the text understanding capability of the upper branch network can be improved, the language modeling capability of the lower branch network can be improved, and a target language model required by text processing can be obtained more quickly, so that the text processing efficiency can be improved.
The following will describe in detail. The order of the following examples is not limited to the preferred order of the examples.
In this embodiment, description will be made from the viewpoint of a text processing apparatus which can be integrated in a network device such as a terminal or a server.
Referring to fig. 2a, the specific flow of the text processing method may be as follows:
101. and carrying out initial prediction on the text fragments at each position in the predicted text based on a preset initial model, and determining a first text fragment with correct prediction and a second text fragment with incorrect prediction, wherein the preset initial model comprises an upper branch network and a lower branch network.
The text segment refers to the unit length of text processing and may be represented as an English word or a Chinese character. The first text segment refers to a text segment whose prediction in the predicted text is the same as the corresponding segment of the current training text; the second text segment refers to a text segment whose prediction in the predicted text differs from the current training text.
The preset initial model may be set according to the requirements of practical applications; for example, the preset initial model may include an upper half branch network and a lower half branch network, where the upper half branch network and the lower half branch network have the same structure but do not share weights.
Specifically, the upper branch network of the preset initial model performs feature extraction on the input text, understands the semantics and grammar of the text, and outputs feature information; meanwhile, the lower branch network calculates information reflecting the probability distribution of the text segment at each position according to the feature information output by the encoder and the current training text input to the decoder, so that the text segment at each position is predicted. The training task can therefore be understood as optimizing the parameters of the preset initial model so that the upper branch network of the preset initial model can accurately extract text features, and the lower branch network can recover the input text from the extracted features.
Taking a Transformer model structure as the preset initial model structure as an example, as shown in fig. 5d, the structure may include an encoding module and a decoding module, where the encoding module is configured to perform semantic and grammatical feature extraction on the current training text based on the upper branch network of the preset initial model to obtain the feature information of the current training text, and the decoding module is configured to obtain the predicted text according to the feature information and the current training text, based on the lower branch network of the preset initial model.
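By way of illustration only (this sketch is not part of the patent text), the two-branch structure can be outlined with standard PyTorch Transformer layers; the class name, hyper-parameters and variable names are assumptions, and the word embedding details, position encoding and masking described below are omitted here:

```python
import torch
import torch.nn as nn

class TwoBranchModel(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        # upper half branch: feature extraction on the first input text
        self.upper_branch = nn.TransformerEncoder(enc_layer, num_layers)
        # lower half branch: same structure, independent (non-shared) weights
        self.lower_branch = nn.TransformerDecoder(dec_layer, num_layers)
        self.out_proj = nn.Linear(d_model, vocab_size)  # scores over the preset vocabulary

    def forward(self, first_input_ids, second_input_ids):
        memory = self.upper_branch(self.embed(first_input_ids))           # first feature information
        hidden = self.lower_branch(self.embed(second_input_ids), memory)  # per-position features
        return self.out_proj(hidden)                                      # per-position logits

model = TwoBranchModel()
first_ids = torch.randint(0, 10000, (1, 12))   # stand-in for the first input text
second_ids = torch.randint(0, 10000, (1, 12))  # stand-in for the second input text
logits = model(first_ids, second_ids)          # shape (1, 12, 10000)
```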
In an embodiment, the initial prediction is performed on the text segments at each position in the predicted text based on a preset initial model, and the determination of the first text segment with correct prediction and the second text segment with incorrect prediction may specifically include the following steps:
Extracting features of the current training text according to the upper branch network of the preset initial model, and acquiring feature information of the current training text;
Based on the lower branch network of the preset initial model and the characteristic information of the current training text, performing initial text prediction on text fragments at each position in a predicted text to obtain the predicted text;
And determining a first text segment with correct prediction and a second text segment with incorrect prediction in the predicted text according to the predicted text and the current training text.
In this way, the ability of the preset initial model to understand text and to perform language modeling is checked, the first text segment with a correct prediction and the second text segment with an incorrect prediction are obtained, and the preset initial model can then be trained in a targeted manner according to the correctly predicted first text segment and the incorrectly predicted second text segment, so as to obtain the target language model as soon as possible.
In an embodiment, before the initial prediction is performed on the text segments at each position in the predicted text based on the preset initial model and the first text segment with correct prediction and the second text segment with incorrect prediction are determined, preliminary training may be performed on the preset initial model based on a preset text set, which may specifically include the following steps:
Acquiring a current training text from a preset text set;
extracting semantic and grammatical features of the current training text based on the upper branch network of the preset initial model, and acquiring feature information of the current training text;
acquiring a predicted text according to the feature information and the current training text, based on the lower branch network of the preset initial model;
and converging based on the predicted text and the current training text.
The current training text is obtained from a preset text set, and the preset text set can be downloaded from the Internet through network connection or can be stored in a memory of a local server.
The training is to perform preliminary training on a preset initial model through a preset task to obtain a set of model parameters, initialize the preset initial model through the set of parameters, and perform targeted training according to a current training text to obtain a target language model. The purpose of training is to let the language model master the semantic and grammatical features in the text. In this embodiment, the process of performing preliminary training on the preset initial model may be referred to as a perusal stage, and the process of performing targeted training according to the current training text to obtain the target language model may be referred to as a recitation stage.
The process of determining the first text segment with correct prediction and the second text segment with incorrect prediction can be understood as checking the text understanding capability of the preset initial model after training in the perusal stage, and performing targeted training according to the result of that check.
The target language model is a model that has been trained and has the ability to understand text. A language model is a probability distribution over text segments; specifically, for a text of length m, the language model determines a probability distribution P that indicates the likelihood that this piece of text exists. In order to make the target language model better adapted to the application scenario, the target language model can be trained with the text to be processed before it is formally used for text processing, and its parameters can be fine-tuned during this training.
Taking a Transformer model structure as the preset initial model structure as an example, as shown in fig. 5d, the structure may include an encoding module and a decoding module, where the encoding module is configured to perform semantic and grammatical feature extraction on the current training text based on the upper branch network of the preset initial model to obtain the feature information of the current training text, and the decoding module is configured to obtain the predicted text according to the feature information and the current training text, based on the lower branch network of the preset initial model, specifically as follows:
In an embodiment, to improve performance, referring to fig. 5b and fig. 5c, the upper branch network is formed by stacking a plurality of coding modules with the same structure, and the lower branch network is formed by stacking a plurality of decoding modules with the same structure. The input of each coding module is the output of the preceding coding module, and the input of the lowest coding module is the current training text; the lower branch network is similar, and the output of the last coding module is input to each decoding module.
In the following description, one encoding module and one decoding module are taken as an example, unless explicitly stated otherwise.
In an embodiment, before input to the lowest coding module, the upper half branch network may divide the input current training text to obtain a plurality of text segments. The upper half branch network further includes a word embedding encoding layer and a position encoding layer, which are used for mapping the input current training text into a feature matrix.
In one embodiment, the Word embedding encoding layer includes a preset Word embedding encoding algorithm, the position encoding layer includes a preset position encoding algorithm, an input text is converted into a Word embedding vector by the preset Word embedding algorithm (such as Word2Vec, etc.), and in order to enable a preset initial model to understand the sequence of text fragments in the text, the text fragments are encoded by using the preset position encoding algorithm, so as to obtain a position encoding vector. And then vector addition is carried out on the word embedding vector and the position coding vector, and lexical features and position features are fused to obtain a feature matrix.
The position coding can be implemented by the following formula:
PE(pos, 2i) = sin(pos / 10000^(2i/d))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
where pos represents the position of a text segment and i indexes the dimension. Using sin and cos to encode the position information allows the encodings of text segments at two positions to be linearly represented by each other, i.e., the relative positions of the text segments can be captured. Thus, given the position of a text segment, it can be encoded as a vector of dimension d.
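A minimal numpy sketch of this position encoding (the function name and the use of numpy are illustrative assumptions, not part of the patent text) is:

```python
import numpy as np

def position_encoding(seq_len, d):
    """Sinusoidal position encoding; d is assumed to be even."""
    pe = np.zeros((seq_len, d))
    pos = np.arange(seq_len)[:, None]           # position of each text segment
    two_i = np.arange(0, d, 2)[None, :]         # the even dimension indices 2i
    angle = pos / np.power(10000.0, two_i / d)  # pos / 10000^(2i/d)
    pe[:, 0::2] = np.sin(angle)                 # even dimensions use sin
    pe[:, 1::2] = np.cos(angle)                 # odd dimensions use cos
    return pe                                   # added to the word embedding vectors

print(position_encoding(4, 8).shape)  # (4, 8)
```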
In one embodiment, each encoding module includes a self-attention layer and a position fully connected feedforward neural network layer; each decoding module includes a self-attention layer, an attention layer and a position fully connected feedforward neural network layer, and the output of the last encoding module can be input to the attention layer (the second attention layer) of each decoding module, specifically as follows:
Self-attention layer: through a convolution operation, the understanding of all relevant text segments can be integrated into the text segment currently being processed, so that the word dependency relations inside a sentence are learned and the internal structure of the sentence is captured. Specifically, the feature matrix may be multiplied by three weight matrices whose parameters are not shared to obtain three matrices, which may be denoted as q, k and v respectively. In an embodiment, the similarity between two of the matrices (such as q and k) may be used to represent the weight of the other matrix (such as v), where the weight represents the degree of semantic correlation and importance between the current text segment and the other text segments; the weight can be multiplied by the other matrix to obtain the local feature information of each text segment. Common similarity functions include the dot product, concatenation, and the perceptron.
Wherein, when the initial model is preset for the first training, the weight matrix is randomly generated, and the weight matrix can be optimized by multiple training.
In an embodiment, in order to enable the model to obtain related information among text segments in different subspaces, the feature matrix may be projected through h different linear transformations, that is, convolution operations and weighting operations are performed on the feature matrix using h preset attention functions whose parameters are not shared; finally, the output results of the different attention functions are concatenated together to obtain a local feature matrix, which is output to the position fully connected feedforward neural network. A self-attention layer to which the h different linear transformation steps are added can be defined as a multi-head self-attention layer, which can focus on the association of the current text segment with the other text segments in the sentence from different levels.
In one embodiment, the similarity between q and k is calculated by dot product, and the predetermined attention function can be expressed by the following formula:
Attention(q, k, v) = softmax(q k^T / √d) v
where softmax is an activation function, q, k and v respectively represent the three matrices obtained by projecting the feature matrix, the dot product of q and k is used to represent their similarity, and the obtained similarity is divided by √d to scale the result so that the inner product does not become too large, i.e., to achieve normalization.
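A minimal numpy sketch of this preset attention function for a single head (a multi-head layer would apply h such projections with separate parameters and concatenate the outputs); the shapes and names are assumed for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # q, k, v: (seq_len, d) matrices obtained by projecting the feature matrix
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)       # scaled dot-product similarity between segments
    weights = softmax(scores, axis=-1)  # semantic-relevance weights per text segment
    return weights @ v                  # weighted combination: local feature information

seq_len, d = 6, 64
q, k, v = (np.random.randn(seq_len, d) for _ in range(3))
local_features = attention(q, k, v)     # shape (6, 64)
```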
The step of extracting semantic and grammatical features of the current training text based on the upper branch network of the preset initial model and acquiring the feature information of the current training text is mainly realized by this self-attention layer.
Position fully connected feedforward neural network layer: there may be a plurality of position fully connected feedforward neural networks, specifically the same number as the number of text segments, each feedforward neural network processing one local feature matrix. Each node of the position fully connected feedforward neural network layer is connected with all nodes output by the upper layer (such as the self-attention layer); one node of this layer is called a neuron, and the number of neurons in the layer can be determined according to the requirements of the practical application. Each feedforward neural network comprises at least two linear transformation layers, and in order to improve the expressive capability of the model, an activation function layer can be added to introduce a non-linear factor; in the embodiment of the invention, the activation function is ReLU (Rectified Linear Unit). If the output of the self-attention layer is denoted as Z and the output of the word embedding encoding layer is denoted as x, the processing of the position fully connected feedforward neural network is expressed as follows:
FFN(x)=max(0,xW1+b1)W2+b2
where b1 and b2 represent bias terms and W1 and W2 represent weight matrices. The values of b1, b2, W1 and W2 can be continuously adjusted over multiple rounds of training, so that the coding module can extract the features of the current training text more accurately and the predicted text becomes closer to the current training text.
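A minimal numpy sketch of this position fully connected feedforward operation, with assumed dimensions:

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU activation introduces the non-linear factor
    return hidden @ W2 + b2                # project back to the model dimension

d_model, d_ff = 64, 256                    # assumed dimensions
x = np.random.randn(6, d_model)            # one local feature vector per text segment
W1, b1 = np.random.randn(d_model, d_ff), np.zeros(d_ff)
W2, b2 = np.random.randn(d_ff, d_model), np.zeros(d_model)
out = feed_forward(x, W1, b1, W2, b2)      # shape (6, 64)
```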
The decoding module further comprises an attention layer whose structure and internal algorithm are substantially identical to those of the self-attention layer, but which does not share weights with it. The attention layer acquires the local feature information output by the self-attention layer of the decoding module, obtains the weight of each piece of local feature information by using the first feature information output by the encoding module, performs a weighting operation on each piece of local feature information according to the weight to obtain global feature information, and outputs the global feature information to the position fully connected feedforward neural network layer for a full-connection operation to obtain probability distribution information.
The upper half branch network further comprises two normalization fully connected layers, which are arranged respectively after the self-attention layer and after the position fully connected feedforward neural network layer. The normalization fully connected layer comprises a layer normalization function, which can greatly reduce the covariate shift problem by correcting the mean and variance of the activation values in each layer; and in order to prevent the back-propagated gradient signal from attenuating excessively as it is transmitted to the lower layers, the input and the output of the self-attention layer need to be added together before being fed into the layer normalization function.
In one embodiment, the layer normalization function comprehensively considers the inputs of all dimensions of a layer, calculates the mean input value and the input variance of that layer, and then converts the input of each dimension with the same normalization operation, which can be expressed by the following formulas:
μ = (1/H) Σi a_i
σ = √( (1/H) Σi (a_i − μ)² )
LN(a_i) = g · (a_i − μ) / σ + b
where i enumerates all of the input neurons of the layer, H is their number, and a_i is the input of the i-th neuron. In this standard formulation, the four parameters μ, σ, g and b are scalars, and all inputs share one normalization transformation.
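An illustrative numpy sketch of the residual addition followed by layer normalization described above; here g and b are shown as learned gain and bias parameters and eps is an assumed numerical-stability constant:

```python
import numpy as np

def add_and_layer_norm(x, sublayer_out, g, b, eps=1e-6):
    h = x + sublayer_out                     # residual: add the sub-layer input and output
    mu = h.mean(axis=-1, keepdims=True)      # mean input value of the layer
    sigma = h.std(axis=-1, keepdims=True)    # input standard deviation of the layer
    return g * (h - mu) / (sigma + eps) + b  # same normalization transformation per dimension
```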
After the characteristic extraction of the input current training text by the upper branch network is completed, the lower branch network starts to conduct text prediction according to the characteristic information output by the upper branch network and the input current training text of the lower branch network.
Because the parameters in the preset initial model are randomly initialized during the first training, the capability of the upper branch network for extracting the characteristics is insufficient, and the current training text needs to be input to the lower branch network to help the lower branch network to predict the text segment.
In one embodiment, the parameters of the pre-set initial model may be optimized based on a back propagation algorithm (Backpropagation algorithm, BP), and the above operations may be repeated a plurality of times to continuously optimize the parameters of the pre-set initial model.
In one embodiment, the back-propagation algorithm may be used to define an error e (often a certain norm between the output result and the expected result) and then find the weight vector that minimizes this error. If the error is regarded as a continuous function (functional), finding the point where the partial derivative with respect to each component of the weight vector is 0 may be considered as convergence; in reality, however, the problem is discrete, so iteration (i.e. multiple rounds of training) is needed to descend along the gradient towards the minimum. When the number of iterations is large enough that the weight vector approaches a certain solution, convergence can also be demonstrated.
After many rounds of training in the perusal stage, the upper branch network of the resulting preset initial model has the capability of accurately extracting text features, and the lower branch network has the capability of predicting text segments according to the extracted features. Then, the first text segment with correct prediction and the second text segment with incorrect prediction are determined, the text understanding capability of the preset initial model after the perusal-stage training is checked, and targeted training is performed according to the result of that check.
102. And covering the first text segment in the current training text to obtain a first input text.
The first input text is obtained by covering the text fragments predicted to be correct in the current training text according to the last training result, so that the text understanding capability of the upper half branch network can be improved.
103. And covering the second text segment in the current training text to obtain a second input text.
The second input text is obtained by covering the text segment with the wrong prediction in the current training text according to the last training result, so that the language modeling capability of the lower branch network according to the feature information can be improved.
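As a purely illustrative sketch (the helper below and the [MASK] token are assumptions, not taken from the patent text), the first and second input texts could be derived from the previous prediction result like this: segments that were predicted correctly are covered in the first input text for the upper half branch network, and segments that were predicted incorrectly are covered in the second input text for the lower half branch network.

```python
MASK = "[MASK]"  # assumed placeholder token

def build_inputs(training_segments, predicted_segments):
    first_input, second_input = [], []
    for gold, pred in zip(training_segments, predicted_segments):
        if pred == gold:               # first text segment: predicted correctly
            first_input.append(MASK)   # cover it in the encoder-side input
            second_input.append(gold)
        else:                          # second text segment: predicted incorrectly
            first_input.append(gold)
            second_input.append(MASK)  # cover it in the decoder-side input
    return first_input, second_input

gold = ["the", "cat", "sat", "on", "the", "mat"]
pred = ["the", "cat", "ran", "on", "the", "rug"]   # "sat" and "mat" were predicted incorrectly
print(build_inputs(gold, pred))
```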
104. And extracting the characteristics of the first input text based on the upper half branch network to obtain first characteristic information of the first input text.
Wherein the first characteristic information is an abstract operational representation of the first input text.
In an embodiment, feature extraction is performed on the first input text based on the upper half branch network to obtain first feature information of the first input text, including:
Performing position feature extraction and lexical feature extraction on the text fragments of the first input text to obtain semantic feature information corresponding to each text fragment;
performing convolution operation on the semantic feature information, extracting semantic related features between the current semantic feature information and other semantic feature information, and obtaining semantic related information corresponding to each semantic feature information;
Weighting operation is carried out on the semantic feature information and the corresponding semantic related information to obtain the corresponding local feature information of the first input text;
And carrying out full-connection operation on the local feature information corresponding to the first input text to obtain first feature information.
For the structure of the upper branch network, refer to the above embodiment of the preset initial model, and will not be described again.
And carrying out convolution operation on the semantic feature information by using a self-attention layer, extracting semantic related features between the current semantic feature information and other semantic feature information, and carrying out weighting operation on the semantic feature information and the corresponding semantic related information to obtain the corresponding local feature information of the first input text.
And performing full-connection operation on the local characteristic information corresponding to the first input text by using a position full-connection feedforward neural network to obtain first characteristic information.
If the upper branch network includes a plurality of coding modules connected in series, referring to fig. 5b, 5c and 5d, the operation of the first coding module is identical to that described above, except that the self-attention layer input of each of the other coding modules is the output of the preceding coding module.
Text prediction in the recitation stage actually trains the decoding module to use the acquired text understanding capability to splice the first input text and the second input text together according to the requirements of semantic coherence and grammatical structure, so as to obtain the predicted text.
105. Predicting text fragments at each position in the predicted text according to the first characteristic information, the second input text and a lower branch network of a preset training model to obtain the predicted text, wherein the method comprises the following steps:
Extracting features of the second input text to obtain local feature information corresponding to the second input text;
Extracting grammar related features of the local feature information corresponding to the second input text and the first feature information to obtain global feature information corresponding to text fragments at all positions;
Performing full-connection operation on the global feature information to obtain probability distribution information of text fragments at all positions;
acquiring text fragments of each position in a predicted text according to the probability distribution information and a preset word list;
And acquiring the predicted text based on the text fragments at the positions.
The global feature information may be represented as a feature matrix including semantic features and grammatical features of the first input text and the second input text, and associated features of the first input text and the second input text.
If the lower branch network includes a plurality of decoding modules connected in series, referring to fig. 5b, 5c and 5d, the operation of the first decoding module is identical to that described above, except that the self-attention layer input of each of the other decoding modules is the output of the preceding decoding module.
The structure of the lower branch network is the same as that of the previous embodiment about the preset initial model, and will not be described again.
The self-attention layer of the lower branch network is used for extracting the characteristics of the second input text and acquiring local characteristic information corresponding to the second input text; the specific step of obtaining the local feature information is referred to the previous embodiment about the preset initial model, and will not be described again.
The attention layer is used for extracting grammar related features of the local feature information corresponding to the second input text and the first feature information to obtain global feature information corresponding to the text fragments at each position.
And the position full-connection feedforward neural network layer is used for carrying out full-connection operation on the global characteristic information.
In an embodiment, the decoding module further comprises a linear transformation layer and a softmax full-connection layer, and probability distribution information of the text segment at the current position is obtained according to the output of the decoding module.
In an embodiment, the output of the last decoding module passes through a fully connected layer with a softmax activation function to obtain the probability distribution information of the text segment at the current position.
In one embodiment, the last decoding module outputs a vector of real numbers. In the lower half branch network, a linear transformation layer is connected after the last decoding module; it is a simple fully connected neural network, which projects the real-number vector generated by the decoding module into a much longer vector called the log probabilities (logits). Suppose, for example, that our model has learned ten thousand different text segments from the preset text set (the "preset vocabulary" of our model). The log-probability vector is then a vector of ten thousand cells in length, where each cell corresponds to the score of a certain text segment. The subsequent softmax fully connected layer turns the scores into probabilities (all positive, with an upper limit of 1.0). The cell with the highest probability is selected, and its corresponding text segment is taken as the text segment at the current position.
Therefore, when each text segment is generated, the decoding module performs its calculation according to the most relevant information from the encoding module, so that the semantics of the generated text are coherent and reasonable.
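A minimal numpy sketch of this linear transformation plus softmax selection (the toy vocabulary, names and shapes are assumed for illustration, not taken from the patent):

```python
import numpy as np

def predict_segment(decoder_output, W_vocab, b_vocab, vocab):
    logits = decoder_output @ W_vocab + b_vocab  # one score ("cell") per vocabulary entry
    probs = np.exp(logits - logits.max())
    probs = probs / probs.sum()                  # softmax: all positive, summing to 1.0
    return vocab[int(np.argmax(probs))]          # segment with the highest probability

d_model = 64
vocab = ["the", "cat", "sat", "on", "mat"]       # toy preset vocabulary for illustration
W_vocab, b_vocab = np.random.randn(d_model, len(vocab)), np.zeros(len(vocab))
print(predict_segment(np.random.randn(d_model), W_vocab, b_vocab, vocab))
```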
106. And converging based on the predicted text and the current training text to obtain a target language model.
In an embodiment, based on the convergence of the predicted text and the current training text, the obtaining the target language model may specifically include the following steps:
Acquiring cross entropy loss of the predicted text and the current training text according to a preset loss function;
Based on the cross entropy loss, adjusting parameters in the preset initial model to obtain a current preset initial model after the training of the current training text;
And acquiring a target language model based on the current preset initial model.
The cross entropy loss is a parameter for measuring the similarity between the predicted text and the current training text; in an embodiment, the closer the cross entropy loss is to 0, the closer the predicted text is to the current training text.
In an embodiment, based on the cross entropy loss, parameters in the preset initial model are adjusted to obtain a preset initial model after the training of the current training text, which specifically includes the following steps:
If the cross entropy loss does not meet the preset condition, adjusting parameters in the preset initial model;
updating the first input text and the second input text according to the predicted text;
returning to the step of performing feature extraction on the first input text based on the upper half branch network in the preset initial model to obtain the first feature information of the first input text, until the cross entropy loss meets the preset condition;
and acquiring a current preset initial model after the current training text is trained.
In an embodiment, if the cross entropy loss does not meet the preset condition, adjusting the parameters in the preset initial model may include: obtaining the partial derivatives of the cross entropy loss with respect to each parameter according to a preset back propagation algorithm;
And updating the parameters in the preset training model according to the partial derivatives, as sketched below.
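By way of illustration (not the patent's implementation), one such convergence step could look as follows in PyTorch; the optimizer choice, learning rate, and the stand-in linear model are assumptions:

```python
import torch
import torch.nn.functional as F

vocab_size, d_model, seq_len = 10000, 512, 12             # assumed sizes
model = torch.nn.Linear(d_model, vocab_size)              # stand-in for the two-branch model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(decoder_states, target_ids):
    logits = model(decoder_states)              # (seq_len, vocab_size) scores per position
    loss = F.cross_entropy(logits, target_ids)  # cross entropy against the current training text
    optimizer.zero_grad()
    loss.backward()                             # partial derivatives via back propagation
    optimizer.step()                            # adjust the model parameters
    return loss.item()

loss = training_step(torch.randn(seq_len, d_model),
                     torch.randint(0, vocab_size, (seq_len,)))
```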
In an embodiment, obtaining the target language model based on the current preset initial model may specifically include the following steps:
deleting the current training text from a preset text set;
returning to the step of acquiring the text from the preset text set, and updating the current training text into the acquired text;
Training the current preset initial model based on the current training text until the texts in the preset text set are trained, and obtaining a target language model.
From the above, it can be seen that, in the embodiment of the present invention, the first input text covering the correctly predicted segments is used to train the upper half branch network, so that the text understanding capability of the upper half branch network can be improved, and the second input text covering the incorrectly predicted segments is used to train the lower half branch network, so that the language modeling capability of the lower half branch network can be improved; thus, the text processing efficiency can be improved.
107. And predicting the text segment based on the target language model and the text to be processed to obtain a target text.
In an embodiment, the predicting the text segment based on the target language model and the text to be processed to obtain the target text may specifically include the following steps:
Acquiring a text to be processed;
Extracting characteristics of the text to be processed based on the upper branch network of the target language model obtained through training to obtain characteristic information of the text to be processed;
And predicting the text fragments at each position in the target text according to the characteristic information of the text to be processed and the lower branch network of the target language model to obtain the target text.
The text to be processed can be acquired through the text acquisition instruction, an instruction interface can be arranged on a terminal interface used by a user for conveniently triggering the text acquisition instruction, and the instruction interface can be in various forms such as an input box, a selection box, a button, an icon and the like.
The text to be processed may also take various forms, for example, in a machine translation scenario, a sentence to be translated may be obtained as the text to be processed, in a summary generation scenario, an article may be obtained as the text to be processed, and in a machine question-answering scenario, a question input by a user may be obtained as the text to be processed.
The feature information of the text to be processed is an abstract operational representation of the text to be processed; it can be understood as a hidden-layer state, produced by the upper half branch network, that contains the semantic and grammatical structure information of the text to be processed.
The upper half branch network of the target language model is obtained by training the upper half branch network of the preset initial model. The upper half branch network comprises a word embedding coding layer, a position feature coding layer and a plurality of coding modules connected in series; each coding module comprises a self-attention layer and a fully connected feed-forward neural network layer, and each decoding module comprises a self-attention layer, an attention layer and a fully connected feed-forward neural network layer.
In an embodiment, performing feature extraction on the text to be processed based on the upper half branch network of the trained target language model to obtain the feature information of the text to be processed may specifically be implemented by the following steps:
Performing position feature extraction and lexical feature extraction on the text fragments of the text to be processed through a word embedding coding layer and a position feature coding layer to obtain semantic feature information corresponding to each text fragment;
Carrying out convolution operation on the semantic feature information through a self-attention layer, extracting semantic related features between the current semantic feature information and other semantic feature information, and obtaining semantic related information corresponding to each semantic feature information; weighting operation is carried out on the semantic feature information and the corresponding semantic related information, so that the local feature information corresponding to the text to be processed is obtained;
And carrying out full-connection operation on the local characteristic information corresponding to the text to be processed through a full-connection feedforward neural network layer to obtain the characteristic information of the text to be processed.
The specific operation method of each layer is referred to in the previous embodiments, and will not be described again.
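As an illustration only, the steps above correspond roughly to the following sketch of a Transformer-style encoder in PyTorch; the class name `UpperBranchEncoder`, the layer sizes and the use of `nn.MultiheadAttention` are assumptions made for the example rather than the patented implementation.

```python
import torch
import torch.nn as nn

class UpperBranchEncoder(nn.Module):
    """Sketch: word embedding + position features, self-attention, feed-forward."""

    def __init__(self, vocab_size=30000, d_model=256, n_heads=4, max_len=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)    # lexical features
        self.pos_emb = nn.Embedding(max_len, d_model)         # position features
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, token_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        # Semantic feature information corresponding to each text segment.
        semantic = self.word_emb(token_ids) + self.pos_emb(positions)
        # Self-attention relates each feature to all the others and combines
        # them by weighting, yielding the local feature information.
        local, _ = self.self_attn(semantic, semantic, semantic)
        # Fully connected feed-forward operation -> feature information of the text.
        return self.ffn(local)
```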
The target text is the text obtained by processing the text to be processed with the target language model, and it can take different forms in different application scenarios: in a translation scenario it can be a sentence in another language with the same meaning as the sentence to be translated; in a summary generation scenario it can be an abstract representing the basic content of the article to be processed; and in a machine question-answering scenario it can be an answer to the question to be processed.
In an embodiment, the predicting the text segment at each position in the target text according to the feature information of the text to be processed and the lower branch network of the target language model to obtain the target text may be implemented by the following steps:
Extracting features of the historical probability distribution information through the self-attention layer to obtain a historical associated feature vector containing semantic correlation relations between text fragments before the current position;
Extracting semantic related features of a text segment at a current position in a target text and the text to be processed by an attention layer based on the historical related feature vector and feature information of the text to be processed, and obtaining current global feature information corresponding to the text segment at the current position;
Performing full-connection operation on the current global feature information through a full-connection feedforward neural network layer and a Softmax full-connection layer to obtain current probability distribution information of a text segment at a current position;
acquiring a text segment at the current position in a target text according to the current probability distribution information and a preset word list;
updating the current position to be the position next to the current position;
Returning to the step of extracting, based on the probability distribution information corresponding to the text segments before the current position and the feature information of the text to be processed, the semantic correlation features, so as to obtain the current global feature information corresponding to the text segment at the current position, until the current global feature information is a termination feature;
And acquiring target text based on the text fragments of the positions.
In one embodiment, the text segments may be combined into the target text in a positional order.
The historical probability distribution information is probability distribution information corresponding to a text segment at a position before the current position.
The specific operation method of each layer is referred to the previous embodiments, and will not be described again.
The termination feature is a termination symbol generated by the decoding module after the target text has been generated; correspondingly, the first text segment of the target text is generated by the decoding module from a start symbol.
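For illustration, the position-by-position generation loop described above might look like the following sketch; `decoder_step`, `start_id`, `end_id` and `vocab` are hypothetical stand-ins for the lower half branch network and the preset word list.

```python
def generate_target_text(encoder_features, decoder_step, vocab,
                         start_id, end_id, max_len=128):
    """Sketch: autoregressively predict the text segment at each position."""
    generated = [start_id]                       # decoding starts from a start symbol
    for _ in range(max_len):
        # The decoder attends to the previously generated segments (history) and
        # to the feature information of the text to be processed, and returns the
        # probability distribution for the segment at the current position.
        probs = decoder_step(generated, encoder_features)
        next_id = max(range(len(probs)), key=probs.__getitem__)   # look up the word list
        if next_id == end_id:                    # termination feature reached
            break
        generated.append(next_id)                # the current position moves on by one
    return [vocab[i] for i in generated[1:]]     # combine segments in positional order
```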
The target language model comprises an upper half branch network and a lower half branch network. The upper half branch network is formed by stacking (stack) a plurality of coding modules with the same structure, and the lower half branch network is formed by stacking a plurality of decoding modules with the same structure. The input of each coding module is the output of the coding module below it, and the input of the lowest coding module is the text to be processed; the lower half branch network is organized similarly, except that the output of the last coding module is also input to the decoding modules, and the output of the last decoding module can be fed back as decoder input. The coding modules can output the feature information corresponding to all text segments of the text to be processed at the same time, while the decoding modules predict the text segment at the next position according to their own previous output and the output of the last coding module. In the text processing process, each encoding module (Encoder) encodes the text segments of the text to be processed into a feature vector list containing a plurality of feature vectors, and the decoding modules (Decoder) then extract the semantic and grammatical structure information hidden in this encoded feature vector list by means of the attention mechanism (Attention) to autoregressively generate the predicted text Y.
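The stacking just described can be summarized with the following sketch; the module lists and the way the target-side features are threaded through are assumptions made for the example, not a statement of the patented structure.

```python
def run_encoder_decoder_stack(encoder_modules, decoder_modules,
                              source_features, target_features):
    """Sketch: stacked coding modules feed a stack of decoding modules."""
    enc_out = source_features                    # lowest encoder consumes the text to be processed
    for encoder in encoder_modules:              # upper half branch: stacked coding modules
        enc_out = encoder(enc_out)               # features for all text segments at once
    dec_out = target_features
    for decoder in decoder_modules:              # lower half branch: stacked decoding modules
        dec_out = decoder(dec_out, enc_out)      # each also attends to the last encoder output
    return dec_out                               # used to predict the next text segment
```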
The following details the Attention (Attention) mechanism based on the translation scenario:
The encoding module receives the sentence to be translated word by word and, with the help of the attention function, integrates the information in the sentence to be translated into local feature information; based on this local feature information, the decoder generates the translated sentence in the other language word by word. The prediction of the word at the current position is based on the previously translated words. Referring to fig. 5a, the specific structure is as follows:
Specifically, an attention weight a_ij may be introduced here to measure the correlation between the hidden-layer state h_j generated in the encoder by the word at position j of the sentence to be translated and the hidden-layer state S_i generated in the decoder by the word at position i of the translated sentence. The attention weight a_ij may be represented by the similarity of the two matrices obtained after transforming the hidden-layer state h_j, where the transformation of h_j may be implemented by multiplying h_j by two weight matrices whose parameters are not shared. Finally, the word at position i output by the decoding module corresponds, through the attention weights a_ij, to a weighted sum together with the hidden-layer state S_{i-1} corresponding to the word at position i-1 of the translated sentence; alternatively, the word at position i output by the final decoding module may come from a weighted sum over the hidden-layer states S_{i-1}, …, S_0 corresponding to the words before position i of the translated sentence.
In the pre-training model of the embodiment of the application, the weight matrices are the matrices required for calculating the q, k and v vectors; these weight matrices and the weight parameters in the position-wise feed-forward network are obtained by continuous optimization during the training process.
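Purely as an illustration of how such weight matrices produce the attention weights and the weighted sum, the sketch below uses a scaled dot-product similarity; the similarity function, the scaling and the tensor shapes are assumptions for the example, not the formula claimed by the application.

```python
import torch

def attention(decoder_state, encoder_states, W_q, W_k, W_v):
    """decoder_state: (d,); encoder_states: (n, d); W_q, W_k, W_v: (d, d)."""
    q = decoder_state @ W_q            # query derived from the decoder hidden state S_{i-1}
    k = encoder_states @ W_k           # keys derived from the encoder hidden states h_j
    v = encoder_states @ W_v           # values derived from the encoder hidden states
    # Attention weights a_ij: similarity of the transformed representations, normalized.
    a = torch.softmax(k @ q / q.size(-1) ** 0.5, dim=-1)
    return a @ v                       # weighted sum used to predict the word at position i
```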
From the above, the embodiment of the application can train the preset initial model based on the preset text set to obtain the target language model; acquire a text to be processed; extract the features of the text to be processed based on the upper half branch network of the trained target language model to obtain the feature information of the text to be processed; and predict the text segments at each position in the target text according to the feature information of the text to be processed and the lower half branch network of the target language model to obtain the target text. The text processing speed can thereby be improved.
The method described in the previous examples is described in further detail below by way of example.
Referring to fig. 2b, in this embodiment, the description will take as an example the case where the text processing apparatus is specifically integrated in a server.
The embodiment of the application also provides another text processing method, which comprises the following specific processes:
201. the server determines a first text segment with correct prediction and a second text segment with incorrect prediction based on a preset initial model and a current training text.
In an embodiment, the server needs to train the preset initial model for a preset number of times based on the current training text, continuously optimizes parameters of the preset initial model, and then inputs the current training text into the upper branch network of the preset initial model to obtain the predicted text.
Specifically, the server may perform feature extraction on the first input text based on the upper half branch network in the preset initial model to obtain first feature information of the first input text, and predict the text segments at each position in the predicted text according to the first feature information, the second input text and the lower half branch network in the preset initial model to obtain the predicted text. The specific steps are referred to the previous embodiments, and will not be repeated.
The predicted text is then compared to the current training text to determine a first text segment that predicts correctly and a second text segment that predicts incorrectly.
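A minimal sketch of this comparison is given below, assuming the predicted text and the current training text are aligned position by position; this granularity is an assumption for the example, since the patent does not prescribe how the comparison is carried out.

```python
def split_by_prediction(predicted_segments, reference_segments):
    """Sketch: separate correctly and incorrectly predicted positions."""
    correct_positions, incorrect_positions = [], []
    for pos, (pred, ref) in enumerate(zip(predicted_segments, reference_segments)):
        if pred == ref:
            correct_positions.append(pos)    # first text segments (predicted correctly)
        else:
            incorrect_positions.append(pos)  # second text segments (predicted incorrectly)
    return correct_positions, incorrect_positions
```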
202. And the server acquires a first input text and a second input text according to the first text fragment and the second text fragment.
The server can acquire a text covering instruction, and cover the first text segment in the current training text according to the instruction to obtain a first input text; and covering the second text segment in the current training text to obtain a second input text.
The specific steps are referred to the previous embodiments, and will not be repeated.
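The two covering operations can be illustrated with the sketch below; the `[MASK]` token is an assumption made for the example, since the application only states that the segments are covered.

```python
def build_masked_inputs(training_segments, correct_positions, incorrect_positions,
                        mask_token="[MASK]"):
    """Sketch: cover the first/second text segments in the current training text."""
    correct, incorrect = set(correct_positions), set(incorrect_positions)
    # First input text: the correctly predicted segments are covered.
    first_input = [mask_token if i in correct else seg
                   for i, seg in enumerate(training_segments)]
    # Second input text: the incorrectly predicted segments are covered.
    second_input = [mask_token if i in incorrect else seg
                    for i, seg in enumerate(training_segments)]
    return first_input, second_input
```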
203. The server trains the initial language model based on the first input text and the second input text.
The specific steps are referred to the previous embodiments, and will not be repeated.
204. And the server converges based on the predicted text and the current training text to obtain a target language model.
In an embodiment, the training is repeated, and parameters of the preset initial model are optimized, so that the preset initial model can better understand the current training text.
In one embodiment, when the cross entropy loss meets a preset value, the preset initial model may be considered to have understood the current training text.
Then the next current training text is selected from the preset text set and input into the upper half branch network and the lower half branch network of the current preset initial model respectively, and the training is repeated until the preset initial model has understood all the texts in the preset text set. Since the number of texts in the preset text set is large, when the preset initial model has understood all of them, the model can be considered to cover the necessary semantic and grammatical characteristics and can be used as the target language model.
The specific steps are referred to the previous embodiments, and will not be repeated.
205. And the server predicts the text segment based on the target language model and the text to be processed to obtain a target text.
The server may obtain the text to be processed from the terminal through the network link, and then process the text to be processed based on the trained target language model to obtain the target text, where specific steps refer to the above embodiments, and are not repeated.
As can be seen from the above, the embodiment of the present application may perform initial prediction on the text segments at each position in the predicted text based on the preset initial model, and determine the first text segment with correct prediction and the second text segment with incorrect prediction, where the preset initial model includes an upper branch network and a lower branch network; covering a first text segment in the current training text to obtain a first input text; covering a second text segment in the current training text to obtain a second input text; performing feature extraction on the first input text based on the upper half branch network to obtain first feature information of the first input text; predicting text fragments at each position in the predicted text according to the first characteristic information, the second input text and the lower branch network to obtain the predicted text; converging based on the predicted text and the current training text to obtain a target language model; and predicting the text segment based on the target language model and the text to be processed to obtain a target text.
From the above, it can be seen that, in the embodiment of the present invention, the first input text, which covers the correctly predicted segments, is used to train the upper half branch network, so that the text understanding capability of the upper half branch network can be improved, and the second input text, which covers the incorrectly predicted segments, is used to train the lower half branch network, so that the language modeling capability of the lower half branch network can be improved; the target language model required for text processing can therefore be obtained more quickly, thereby improving the text processing speed.
The apparatus and methods described herein are preferably implemented in software, but may of course also be implemented in hardware, all within the scope of the application.
In order to better implement the above method, the embodiment of the present application further provides a text processing apparatus, where the text processing apparatus may be specifically integrated into a network device, such as a terminal or a server, and each will be described in detail below.
For example, fig. 3a is a schematic structural diagram of a text processing device according to an embodiment of the present application, where the structure of the text processing device includes a segment obtaining unit 301, a first text obtaining unit 302, a second text obtaining unit 303, a first training unit 304, a second training unit 305, a convergence unit 306, and a text processing unit 307, as follows:
The segment obtaining unit 301 is configured to perform initial prediction on the text segments at each position in the predicted text based on a preset initial model, and determine a first text segment with correct prediction and a second text segment with incorrect prediction, where the preset initial model includes an upper branch network and a lower branch network.
In some embodiments, the fragment acquisition unit 301 may be specifically configured to:
Acquiring a current training text from a preset text set;
Extracting features of the current training text according to the upper branch network of the preset initial model, and acquiring feature information of the current training text;
Based on the lower branch network of the preset initial model and the characteristic information of the current training text, text prediction is carried out on text fragments at all positions in a predicted text, so that the predicted text is obtained;
And acquiring a first text fragment with correct prediction and a second text fragment with incorrect prediction in the prediction text according to the prediction text and the current training text.
And (II) a first text obtaining unit 302, configured to mask a first text segment in the current training text, so as to obtain a first input text.
And (III) a second text obtaining unit 303, configured to mask a second text segment in the current training text, so as to obtain a second input text.
And fourth, a first training unit 304, configured to perform feature extraction on the first input text based on the upper half branch network in the preset initial model, so as to obtain first feature information of the first input text.
In some embodiments, the first training unit 304 may be specifically configured to:
Performing position feature extraction and lexical feature extraction on the text fragments of the first input text to obtain semantic feature information corresponding to each text fragment;
performing convolution operation on the semantic feature information, extracting semantic related features between the current semantic feature information and other semantic feature information, and obtaining semantic related information corresponding to each semantic feature information;
Weighting operation is carried out on the semantic feature information and the corresponding semantic related information to obtain the corresponding local feature information of the first input text;
And carrying out full-connection operation on the local feature information corresponding to the first input text to obtain first feature information.
And (fifth) a second training unit 305, configured to predict text segments at each position in the predicted text according to the first feature information, the second input text and the lower half branch network of the preset initial model, so as to obtain the predicted text.
In some embodiments, the second training unit 305 may be specifically configured to:
Extracting features of the second input text to obtain local feature information corresponding to the second input text;
Extracting grammar related features of the local feature information corresponding to the second input text and the first feature information to obtain global feature information corresponding to text fragments at all positions;
Performing full-connection operation on the global feature information to obtain probability distribution information of text fragments at all positions;
acquiring text fragments of each position in a predicted text according to the probability distribution information and a preset word list;
and acquiring the predicted text based on the text fragments of the positions.
And (six) a convergence unit 306, configured to converge the predicted text and the current training text, so as to obtain a target language model.
In some embodiments, referring to fig. 3b, the convergence unit 306 may include a calculation unit 3061, an optimization unit 3062, and a return unit 3063.
The calculating subunit 3061 is configured to obtain a cross entropy loss of the predicted text and the current training text according to a preset loss function.
And the optimizing subunit 3062 is configured to adjust parameters in the preset initial model based on the cross entropy loss, so as to obtain a preset initial model after the training of the current training text.
In some embodiments, the optimization subunit 3062 may be specifically configured to:
If the cross entropy loss does not meet the preset condition, adjusting parameters in the preset initial model;
updating the first input text and the second input text according to the predicted text;
returning to the step of performing feature extraction on the first input text based on the upper half branch network in the preset initial model to obtain the first feature information of the first input text, until the cross entropy loss meets the preset condition;
and acquiring a current preset initial model after the current training text is trained.
And a returning subunit 3063, configured to obtain a target language model based on the current preset initial model.
In some embodiments, return subunit 3063 may be specifically configured to:
deleting the current training text from a preset text set;
returning to the step of acquiring the text from the preset text set, and updating the current training text into the acquired text;
Training the current preset initial model based on the current training text until the texts in the preset text set are trained, and obtaining a target language model.
And (seventh) a text processing unit 307, configured to perform text segment prediction based on the target language model and the text to be processed, so as to obtain a target text.
In some embodiments, the text processing unit 307 further includes a text acquisition subunit, a feature extraction subunit, and a text prediction subunit, as follows:
and the text acquisition subunit is used for acquiring the text to be processed.
And the feature extraction subunit is used for carrying out feature extraction on the text to be processed based on the upper half branch network of the target language model obtained through training, so as to obtain feature information of the text to be processed.
And the text prediction subunit is used for predicting the text fragments at each position in the target text according to the characteristic information of the text to be processed and the lower branch network of the target language model to obtain the target text.
In some embodiments, the feature extraction subunit may be specifically configured to:
extracting position features and lexical features of the text fragments of the text to be processed to obtain semantic feature information corresponding to each text fragment;
performing convolution operation on the semantic feature information, extracting semantic related features between the current semantic feature information and other semantic feature information, and obtaining semantic related information corresponding to each semantic feature information;
Weighting operation is carried out on the semantic feature information and the corresponding semantic related information to obtain the local feature information corresponding to the text to be processed;
And carrying out full-connection operation on the local characteristic information corresponding to the text to be processed to obtain the characteristic information of the text to be processed.
In some embodiments, the text prediction subunit may be specifically configured to:
Extracting semantic related features of a text segment at a current position in a target text and the text to be processed based on historical probability distribution information and feature information of the text to be processed to obtain current global feature information corresponding to the text segment at the current position, wherein the historical probability distribution information is probability distribution information corresponding to the text segment at a position before the current position;
Performing full-connection operation on the current global feature information to obtain current probability distribution information of a text segment at a current position;
Acquiring a text segment at the current position in the target text according to the current probability distribution information and a preset word list;
updating the current position to be the position next to the current position;
Returning to the step of extracting, based on the historical probability distribution information and the feature information of the text to be processed, the semantic correlation features between the text segment at the current position in the target text and the text to be processed, so as to obtain the current global feature information corresponding to the text segment at the current position, until the current global feature information is a termination feature;
And acquiring target text based on the text fragments of the positions.
As can be seen from the above, the embodiment of the present application may perform initial prediction on the text segments at each position in the predicted text based on the preset initial model, and determine the first text segment with correct prediction and the second text segment with incorrect prediction, where the preset initial model includes an upper branch network and a lower branch network; cover the first text segment in the current training text to obtain a first input text; cover the second text segment in the current training text to obtain a second input text; perform feature extraction on the first input text based on the upper half branch network to obtain first feature information of the first input text; predict text fragments at each position in the predicted text according to the first feature information, the second input text and the lower branch network to obtain the predicted text; converge based on the predicted text and the current training text to obtain a target language model; and predict the text segment based on the target language model and the text to be processed to obtain a target text. In the embodiment of the application, the first input text, which covers the correctly predicted fragments, is used for training the upper half branch network, so that the text understanding capability of the upper half branch network can be improved, and the second input text, which covers the incorrectly predicted fragments, is used for training the lower half branch network, so that the language modeling capability of the lower half branch network can be improved; a target language model required for text processing can thus be obtained more quickly, thereby improving the text processing speed.
The embodiment of the invention also provides a network device which can be a server or a terminal and the like, and integrates any text processing device provided by the embodiment of the invention. As shown in fig. 4, a schematic structural diagram of a network device according to an embodiment of the present invention is shown, specifically:
The network device may include one or more processors 401 of a processing core, memory 402 of one or more computer readable storage media, a power supply 403, and an input unit 404, among other components. Those skilled in the art will appreciate that the network device structure shown in fig. 4 is not limiting of the network device and may include more or fewer components than shown, or may combine certain components, or may be a different arrangement of components. Wherein:
The processor 401 is the control center of the network device; it connects the various parts of the entire network device using various interfaces and lines, and performs the various functions of the network device and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the network device as a whole. Optionally, the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor described above may also not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the network device, and the like. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The network device further includes a power supply 403 for powering the various components. Preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so that functions such as charging, discharging and power consumption management are implemented through the power management system. The power supply 403 may also include any one or more of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The network device may also include an input unit 404, which input unit 404 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the network device may further include a display unit or the like, which is not described herein. In this embodiment, the processor 401 in the network device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
based on a preset initial model, carrying out initial prediction on text fragments at each position in a predicted text, and determining a first text fragment with correct prediction and a second text fragment with incorrect prediction, wherein the preset initial model comprises an upper branch network and a lower branch network;
covering a first text segment in the current training text to obtain a first input text;
covering a second text segment in the current training text to obtain a second input text;
performing feature extraction on the first input text based on the upper half branch network to obtain first feature information of the first input text;
Predicting text fragments at each position in the predicted text according to the first characteristic information, the second input text and the lower branch network to obtain the predicted text;
Converging based on the predicted text and the current training text to obtain a target language model;
And predicting the text segment based on the target language model and the text to be processed to obtain a target text.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored on a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a text processing computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform steps in any of the text processing methods provided by the embodiments of the present application. For example, the instructions may perform the steps of:
based on a preset initial model, carrying out initial prediction on text fragments at each position in a predicted text, and determining a first text fragment with correct prediction and a second text fragment with incorrect prediction, wherein the preset initial model comprises an upper branch network and a lower branch network;
covering a first text segment in the current training text to obtain a first input text;
covering a second text segment in the current training text to obtain a second input text;
performing feature extraction on the first input text based on the upper half branch network to obtain first feature information of the first input text;
Predicting text fragments at each position in the predicted text according to the first characteristic information, the second input text and the lower branch network to obtain the predicted text;
Converging based on the predicted text and the current training text to obtain a target language model;
And predicting the text segment based on the target language model and the text to be processed to obtain a target text.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the computer-readable storage medium may include: a read-only memory (ROM), a magnetic disk, an optical disk, or the like.
Because the instructions stored in the computer readable storage medium may execute the steps in any text processing method provided by the embodiments of the present application, the beneficial effects that any text processing method provided by the embodiments of the present application can achieve are detailed in the previous embodiments, and are not described herein.
The foregoing has described in detail a text processing method, apparatus and computer readable storage medium according to embodiments of the present application, and specific examples have been applied to illustrate the principles and embodiments of the present application, and the above description of the embodiments is only for aiding in understanding the method and core idea of the present application; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present application, the present description should not be construed as limiting the present application.
Claims (9)
1. A text processing method, comprising:
based on a preset initial model, carrying out initial prediction on text fragments at each position in a predicted text, and determining a first text fragment with correct prediction and a second text fragment with incorrect prediction, wherein the preset initial model comprises an upper branch network and a lower branch network;
covering a first text segment in the current training text to obtain a first input text;
covering a second text segment in the current training text to obtain a second input text;
performing feature extraction on the first input text based on the upper half branch network to obtain first feature information of the first input text, wherein the feature extraction comprises the following steps: performing position feature extraction and lexical feature extraction on text fragments of the first input text to obtain semantic feature information corresponding to each text fragment, performing convolution operation on the semantic feature information, extracting semantic correlation features between current semantic feature information and other semantic feature information to obtain semantic correlation information corresponding to each semantic feature information, performing weighting operation on the semantic feature information and corresponding semantic correlation information to obtain local feature information corresponding to the first input text, and performing full-connection operation on the local feature information corresponding to the first input text to obtain the first feature information;
Predicting text fragments at each position in the predicted text according to the first characteristic information, the second input text and the lower branch network to obtain the predicted text, wherein the method comprises the following steps: extracting features of the second input text, obtaining local feature information corresponding to the second input text, extracting grammar related features of the local feature information corresponding to the second input text and the first feature information, obtaining global feature information corresponding to text fragments at all positions, performing full-connection operation on the global feature information, obtaining probability distribution information of the text fragments at all positions, obtaining text fragments at all positions in a predicted text according to the probability distribution information and a preset word list, and obtaining the predicted text based on the text fragments at all positions;
Converging based on the predicted text and the current training text to obtain a target language model;
And predicting the text segment based on the target language model and the text to be processed to obtain a target text.
2. The text processing method of claim 1, wherein predicting the text segment at each position in the predicted text based on the preset initial model, determining the correctly predicted first text segment and the incorrectly predicted second text segment, comprises:
Extracting features of the current training text according to the upper branch network of the preset initial model, and acquiring feature information of the current training text;
Based on the lower branch network of the preset initial model and the characteristic information of the current training text, performing initial text prediction on text fragments at each position in a predicted text to obtain the predicted text;
And determining a first text segment with correct prediction and a second text segment with incorrect prediction in the predicted text according to the predicted text and the current training text.
3. The text processing method of claim 1, wherein before the initial prediction is performed on the text segments at each position in the predicted text based on the preset initial model, determining the first text segment with the correct prediction and the second text segment with the incorrect prediction, further comprising:
Acquiring a current training text from a preset text set;
Extracting the characteristics of the current training text based on the upper branch network of a preset initial model, and acquiring the characteristic information of the current training text;
According to the characteristic information, the current training text and the lower branch network of the preset initial model, text prediction is carried out on text fragments at all positions in the predicted text, and the predicted text is obtained;
and converging based on the predicted text and the current training text to obtain the preset initial model.
4. The text processing method of claim 1, wherein the converging based on the predicted text and the current training text results in a target language model, comprising:
Acquiring cross entropy loss of the predicted text and the current training text according to a preset loss function;
Based on the cross entropy loss, adjusting parameters in the preset initial model to obtain a current preset initial model after the training of the current training text;
And acquiring a target language model based on the current preset initial model.
5. The text processing method of claim 4, wherein adjusting parameters in the pre-set initial model based on the cross entropy loss to obtain a pre-set initial model after training the current training text, comprises:
If the cross entropy loss does not meet the preset condition, adjusting parameters in the preset initial model;
updating the first input text and the second input text according to the predicted text;
returning to the step of performing feature extraction on the first input text based on the upper half branch network in the preset initial model to obtain the first feature information of the first input text, until the cross entropy loss meets the preset condition;
and acquiring a current preset initial model after the current training text is trained.
6. The text processing method of claim 4, wherein obtaining a target language model based on the current pre-set initial model comprises:
deleting the current training text from a preset text set;
returning to the step of acquiring the text from the preset text set, and updating the current training text into the acquired text;
Training the current preset initial model based on the current training text until the texts in the preset text set are trained, and obtaining a target language model.
7. A text processing apparatus, comprising:
The segment acquisition unit is used for carrying out initial prediction on the text segments at each position in the predicted text based on a preset initial model, and determining a first text segment with correct prediction and a second text segment with incorrect prediction, wherein the preset initial model comprises an upper branch network and a lower branch network;
the first text acquisition unit is used for covering the first text segment in the current training text to obtain a first input text;
the second text acquisition unit is used for covering a second text segment in the current training text to obtain a second input text;
The first training unit is configured to perform feature extraction on the first input text based on the upper half branch network, and obtain first feature information of the first input text, where the first training unit includes: performing position feature extraction and lexical feature extraction on text fragments of the first input text to obtain semantic feature information corresponding to each text fragment, performing convolution operation on the semantic feature information, extracting semantic correlation features between current semantic feature information and other semantic feature information to obtain semantic correlation information corresponding to each semantic feature information, performing weighting operation on the semantic feature information and corresponding semantic correlation information to obtain local feature information corresponding to the first input text, and performing full-connection operation on the local feature information corresponding to the first input text to obtain the first feature information;
The second training unit is configured to predict text segments at each position in a predicted text according to the first feature information, the second input text, and the lower branch network, so as to obtain the predicted text, where the second training unit includes: extracting features of the second input text, obtaining local feature information corresponding to the second input text, extracting grammar related features of the local feature information corresponding to the second input text and the first feature information, obtaining global feature information corresponding to text fragments at all positions, performing full-connection operation on the global feature information, obtaining probability distribution information of the text fragments at all positions, obtaining text fragments at all positions in a predicted text according to the probability distribution information and a preset word list, and obtaining the predicted text based on the text fragments at all positions;
the convergence unit is used for converging based on the predicted text and the current training text to obtain a target language model;
And the text processing unit is used for predicting the text fragments based on the target language model and the text to be processed to obtain a target text.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when run on a computer, causes the computer to perform the text processing method according to any one of claims 1 to 6.
9. A network device comprising a processor and a memory, the memory storing a software program, the processor being configured to run the software program in the memory to perform the steps in the text processing method of any one of claims 1 to 6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910718851.2A CN110472242B (en) | 2019-08-05 | 2019-08-05 | Text processing method, device and computer readable storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN110472242A (en) | 2019-11-19 |
| CN110472242B (en) | 2024-06-28 |
Family
ID=68511271
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910718851.2A Active CN110472242B (en) | 2019-08-05 | 2019-08-05 | Text processing method, device and computer readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN110472242B (en) |
Families Citing this family (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111104789B (en) * | 2019-11-22 | 2023-12-29 | 华中师范大学 | Text scoring method, device and system |
| CN111274815B (en) * | 2020-01-15 | 2024-04-12 | 北京百度网讯科技有限公司 | Method and apparatus for mining entity attention points in text |
| CN111444311B (en) * | 2020-02-26 | 2024-11-01 | 平安科技(深圳)有限公司 | Semantic understanding model training method, device, computer equipment and storage medium |
| CN111522944B (en) * | 2020-04-10 | 2023-11-14 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for outputting information |
| CN111553363B (en) * | 2020-04-20 | 2023-08-04 | 北京易道博识科技有限公司 | End-to-end seal identification method and system |
| CN112069717A (en) * | 2020-08-19 | 2020-12-11 | 五邑大学 | Magnetic storm prediction method and device based on multi-mode representation learning and storage medium |
| CN113159013B (en) * | 2021-04-28 | 2024-05-07 | 平安科技(深圳)有限公司 | Paragraph identification method, device, computer equipment and medium based on machine learning |
| CN113255780B (en) * | 2021-05-28 | 2024-05-03 | 润联智能科技股份有限公司 | Reduction gearbox fault prediction method and device, computer equipment and storage medium |
| CN113741783B (en) * | 2021-07-30 | 2024-07-05 | 北京搜狗科技发展有限公司 | Key identification method and device for identifying keys |
| CN113849624B (en) * | 2021-10-15 | 2025-08-29 | 广州天宸健康科技有限公司 | A word slot extraction device and method for multi-round dialogue |
| CN114281997B (en) * | 2021-12-28 | 2025-07-25 | 维沃移动通信有限公司 | Model training method, text processing device and electronic equipment |
| CN114330279B (en) * | 2021-12-29 | 2023-04-18 | 电子科技大学 | Cross-modal semantic consistency recovery method |
| CN114547273B (en) * | 2022-03-18 | 2022-08-16 | 科大讯飞(苏州)科技有限公司 | Question answering method and related device, electronic equipment and storage medium |
| CN115062003B (en) * | 2022-05-26 | 2024-04-16 | 电子科技大学 | Cloud ERP community generation type question-answering method based on GPT2 |
| CN116861258B (en) * | 2023-08-31 | 2023-12-01 | 腾讯科技(深圳)有限公司 | Model processing method, device, equipment and storage medium |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA2408819A1 (en) * | 2000-05-11 | 2001-11-15 | University Of Southern California | Machine translation techniques |
| CN108090040A (en) * | 2016-11-23 | 2018-05-29 | 北京国双科技有限公司 | A kind of text message sorting technique and system |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10372821B2 (en) * | 2017-03-17 | 2019-08-06 | Adobe Inc. | Identification of reading order text segments with a probabilistic language model |
| CN109635150B (en) * | 2018-12-19 | 2021-07-02 | 腾讯科技(深圳)有限公司 | Text generation method, device and storage medium |
| CN110046342A (en) * | 2019-02-19 | 2019-07-23 | 阿里巴巴集团控股有限公司 | A kind of text quality's detection method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |