Detailed Description
Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present disclosure and are not to be construed as limiting the present disclosure.
In power industry production, a large amount of power production information such as scheduling instructions is generated, and analyzing these scheduling instructions can help improve production efficiency, so such analysis has good application prospects. However, because power production information is unstructured, extracting semantic information from it is difficult.
In the related art, text classification may label information by manual classification or by automatic classification. Manual classification, however, requires a huge cost in time and labor, and as text data continue to grow, automatic classification of text for information labeling becomes increasingly important.
Currently, text classification methods can be classified into the following three categories:
1. Rule-based text classification model
A rule-based text classification model can achieve a good classification effect in a specific field, and has the advantages of low time complexity and high running speed. However, since such a model is described by defining a number of rules, and the rules for the category to which a text belongs are defined by an expert knowledge base in the field, its migration capability is poor.
2. Text classification model based on machine learning
Text classification models based on machine learning may include, for example, the NB (Naive Bayes) model, the random forest model, the SVM (Support Vector Machine) model, and the like. However, training these conventional machine learning models requires a large amount of annotation data to mine the relationship between the text information and the annotations.
3. Text classification model based on deep learning
A text classification model based on deep learning has a relatively complex structure; it does not depend on manually engineered text features, and can directly model and learn the text content by mapping the text into a low-dimensional vector space.
In the field of natural language processing, numerous large pre-trained text models have achieved good results. However, training these models requires a large amount of labeled data and a huge number of parameters, making the training process costly and time consuming.
In view of at least one of the above problems, the present disclosure proposes a training method for a text classification model, a text classification method, a device, an apparatus, and a medium.
The following describes a training method, a text classification method, an apparatus, a device and a medium of a text classification model according to an embodiment of the present disclosure with reference to the accompanying drawings.
Fig. 1 is a flowchart of a training method of a text classification model according to an embodiment of the disclosure.
In the embodiments of the disclosure, the training method of the text classification model is described as being configured in a training device of the text classification model, and the training device of the text classification model can be applied to any electronic device so that the electronic device can execute the training function of the text classification model.
The electronic device may be any device with computing capability, for example a personal computer (PC), a mobile terminal, or a server; the mobile terminal may be a hardware device with various operating systems, touch screens, and/or display screens, for example a mobile phone, a tablet computer, a personal digital assistant, a wearable device, and the like.
As shown in fig. 1, the training method of the text classification model may include the following steps:
Step 101, obtaining a training text, and coding the training text by adopting a coding network in a text classification model to obtain a first semantic feature.
In an embodiment of the present disclosure, the training text may include power production information, where the power production information may include power scheduling text, power production meeting content, power production daily reports, power production weekly reports, and the like, to which the present disclosure is not limited.
In the embodiment of the present disclosure, the manner of obtaining the training text is not limited; for example, the training text may be obtained from an existing training set, collected online (for example, through a web crawler technology), or provided by a user, which is not limited in this disclosure.
In embodiments of the present disclosure, the text classification model may include an encoding network, such as the BERT (Bidirectional Encoder Representations from Transformers) model, the ERNIE (Enhanced Language Representation with Informative Entities) model, a knowledge-enhanced semantic representation model, and the like, which is not limited by the present disclosure.
In embodiments of the present disclosure, the training text may be encoded using an encoding network in a text classification model to obtain the first semantic feature.
Step 102, acquiring a noise feature, wherein the size of the noise feature matches the size of the first semantic feature.
In the embodiment of the present disclosure, the size of the first semantic feature is determined according to the number of rows and columns of the first semantic feature, for example, the first semantic feature has n rows and m columns, and the size of the first semantic feature is n×m.
In the embodiment of the present disclosure, the size of the noise feature may be matched with the size of the first semantic feature, for example, the size of the first semantic feature is n×m, and the size of the noise feature is also n×m.
In embodiments of the present disclosure, the noise feature may be acquired, for example, by generating it with a Gaussian noise generation function.
Step 103, fusing the noise feature and the first semantic feature to obtain a fused feature.
In the embodiment of the disclosure, the noise feature and the first semantic feature may be fused to obtain a fused feature.
As one example, the noise feature and the first semantic feature may be added to obtain a fused feature.
As another example, the noise feature and the first semantic feature may be stitched to obtain a fused feature.
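As a minimal sketch of steps 102 and 103 (not the disclosure's exact implementation), the noise feature may be drawn from a Gaussian distribution with the same size as the first semantic feature and then fused with it by addition or by splicing; the feature sizes below are assumptions used only for illustration:

```python
import torch

first_semantic_feature = torch.randn(8, 768)               # n x m feature; sizes are assumptions
noise_feature = torch.randn_like(first_semantic_feature)   # Gaussian noise with a matching size

fused_by_addition = first_semantic_feature + noise_feature                       # fusion by addition
fused_by_splicing = torch.cat([first_semantic_feature, noise_feature], dim=-1)   # fusion by splicing
```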
Step 104, performing first training on the text classification model based on the first semantic features and the fusion features.
In embodiments of the present disclosure, the text classification model may be first trained based on the first semantic feature and the fused feature. In this way, the encoding network of the text classification model captures the semantic features of the training text while noise is fused into them, so that pre-training of the text classification model can be realized.
With the training method of the text classification model according to the embodiments of the disclosure, a training text is obtained and encoded by the encoding network in the text classification model to obtain a first semantic feature; a noise feature whose size matches that of the first semantic feature is acquired; the noise feature and the first semantic feature are fused to obtain a fused feature; and the text classification model is first trained based on the first semantic feature and the fused feature. In this way, the encoding network of the text classification model captures the semantic features of the training text while noise is fused into them, so that pre-training of the text classification model can be realized and the model can effectively learn salient semantic information in the training text before the real training. As a result, when only a small amount of training text is used for the real training of the text classification model, the performance of the model can be improved and the dependence of the model on labeled data can be effectively reduced.
In order to clearly illustrate how the above embodiments of the present disclosure use the coding network in the text classification model to code the training text to obtain the first semantic feature, the present disclosure also proposes a training method of the text classification model.
Fig. 2 is a flowchart of a training method of a text classification model according to a second embodiment of the disclosure.
As shown in fig. 2, the training method of the text classification model may include the following steps:
Step 201, a training text is obtained.
The implementation of step 201 may refer to the implementation of any embodiment of the present disclosure, which is not described herein.
Step 202, word segmentation processing is performed on the training text to obtain at least one word segment of the training text.
In the embodiments of the present disclosure, the number of word segments may be one or more, which the present disclosure does not limit.
In the embodiment of the disclosure, the training text may be subjected to word segmentation processing, for example, using a sub-word tokenizer (Subword Tokenization), a word segmentation algorithm based on an HMM (Hidden Markov Model), a word segmentation algorithm based on a CRF (Conditional Random Field), and the like, so as to obtain at least one word segment of the training text.
As an example, assume that the training text is "step up the voltage of the substation to 100 volts"; after the word segmentation process is performed on the training text, the resulting word segments include "will", "substation", "of", "voltage", "boost", "to", "100", and "volt".
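The following hedged sketch illustrates such a word segmentation step with a sub-word tokenizer; the HuggingFace transformers package and the model name are illustrative assumptions rather than part of the disclosure, and the exact segments depend on the tokenizer vocabulary:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")   # illustrative tokenizer choice
training_text = "step up the voltage of the substation to 100 volts"
word_segments = tokenizer.tokenize(training_text)                # list of (sub-)word segments
print(word_segments)
```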
Step 203, obtaining a word vector corresponding to the at least one word segment.
In the embodiment of the present disclosure, for any word segment in the at least one word segment, a word vector corresponding to that word segment may be obtained; for example, the torch.nn.Embedding() module may be used to generate the word vector corresponding to that word segment.
When there are a plurality of word segments, the word vectors corresponding to the respective word segments have the same size.
Step 204, based on the position of the at least one word in the training text, combining word vectors corresponding to the at least one word to obtain an input vector.
It is to be appreciated that each of the at least one word segment has a corresponding position in the training text.
In the embodiment of the disclosure, the word vectors corresponding to the at least one word segment can be combined based on the positions of the word segments in the training text, so that an input vector can be obtained.
Still referring to the above example, assume that the word vector corresponding to the word segment "will" is A, the word vector corresponding to "substation" is B, the word vector corresponding to "of" is C, the word vector corresponding to "voltage" is D, the word vector corresponding to "boost" is E, the word vector corresponding to "to" is F, the word vector corresponding to "100" is G, and the word vector corresponding to "volt" is H. The word vectors corresponding to the respective word segments are combined based on the positions of the word segments in the training text "step up the voltage of the substation to 100 volts", so as to obtain the input vector (A B C D E F G H).
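A minimal sketch of steps 203 and 204, assuming a randomly initialised torch.nn.Embedding table (the module named above) over a toy vocabulary; the embedding dimension is an assumption:

```python
import torch
import torch.nn as nn

word_segments = ["will", "substation", "of", "voltage", "boost", "to", "100", "volt"]
vocab = {segment: index for index, segment in enumerate(word_segments)}   # toy vocabulary
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=768)    # randomly initialised word vectors

segment_ids = torch.tensor([vocab[segment] for segment in word_segments]) # order follows the text positions
input_vector = embedding(segment_ids)                                     # shape (8, 768): rows A, B, ..., H
```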
Step 205, inputting the input vector to a coding network in the text classification model, so as to code the input vector by using the coding network, thereby obtaining a first semantic feature.
In embodiments of the present disclosure, an input vector may be input to an encoding network in a text classification model to encode the input vector with the encoding network, such that a first semantic feature may be obtained.
It is understood that the first semantic feature may comprise a semantic vector for each word segment in the training text. In order to represent each word segment using the semantic information of its context in the training text and to obtain enhanced semantic vectors of each word segment under different semantic spaces, in one possible implementation of the embodiment of the disclosure, the input vector may be input into the encoding network in the text classification model, so that the encoding network encodes each word segment based on multiple attention mechanisms and obtains a plurality of encoding vectors corresponding to each word segment; for any word segment, the plurality of encoding vectors corresponding to that word segment may then be fused to obtain the semantic vector corresponding to that word segment.
In the embodiment of the present disclosure, a plurality of encoding vectors corresponding to each word segment may be fused, for example, a plurality of encoding vectors corresponding to each word segment may be weighted and summed to obtain a semantic vector corresponding to the word segment.
It should be noted that, for any word, the size of the semantic vector of the word may be the same as the size of the word vector corresponding to the word.
As an example, assuming that the encoding network is a BERT model including 12 Transformer layers, where each layer includes a feed-forward layer and a multi-head attention layer, the encoding network may encode each word segment based on multiple attention mechanisms to obtain a plurality of encoding vectors corresponding to each word segment; for any word segment, the plurality of encoding vectors corresponding to that word segment may be fused, for example linearly combined, so as to obtain a semantic vector having the same size as the word vector corresponding to that word segment.
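As a hedged sketch of step 205, a pre-trained BERT encoder can map the training text to one semantic vector per word segment; the HuggingFace transformers package and the model name are assumptions used only for illustration:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")   # illustrative model name
encoder = BertModel.from_pretrained("bert-base-uncased")         # 12 Transformer layers, 12 attention heads

inputs = tokenizer("step up the voltage of the substation to 100 volts", return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)
first_semantic_feature = outputs.last_hidden_state[0]            # one semantic vector per word segment
```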
Step 206, acquiring a noise feature, wherein the size of the noise feature matches the size of the first semantic feature.
Step 207, fusing the noise feature and the first semantic feature to obtain a fused feature.
Step 208, performing a first training on the text classification model based on the first semantic features and the fusion features.
The execution of steps 206 to 208 may refer to the execution of any embodiment of the disclosure, and will not be described herein.
In any embodiment of the disclosure, the real training of the text classification model may proceed as follows: the encoding network in the first-trained text classification model is used to re-encode the training text to obtain a third semantic feature; the third semantic feature is input into the classification network in the first-trained text classification model for classification to obtain the prediction category to which the training text belongs; a second loss value is generated according to the difference between the labeled category and the prediction category of the training text; and the first-trained text classification model is second trained according to the second loss value.
In an embodiment of the present disclosure, the text classification model may include a classification network, so in the present disclosure, the third semantic feature may be input into the classification network in the first trained text classification model to classify, so as to obtain the prediction category to which the training text belongs.
In the embodiment of the disclosure, the second loss value may be generated according to the difference between the labeled category and the prediction category of the training text. The second loss value is positively correlated with this difference: the smaller the difference, the smaller the second loss value, and the larger the difference, the larger the second loss value.
As an example, suppose the training text is "raising the voltage of the substation to 100 V", the labeled category of "substation" is "power plant facility", the labeled category of "voltage" is "object", and the labeled category of "raising" is "operation technology", while the model predicts the category of "substation" as "object", the category of "voltage" as "object", and the category of "raising" as "operation technology". Because there is a large difference between the labeled category and the prediction category for "substation", the prediction accuracy of the text classification model is not good, and the model parameters in the text classification model should be adjusted. Specifically, a second loss value may be generated according to the difference between the labeled category and the prediction category of the training text, so in the present disclosure, the first-trained text classification model can be second trained according to the second loss value.
For example, the text classification model after the first training may be second trained according to the second loss value to minimize the value of the second loss value.
It should be noted that the foregoing example only takes minimization of the second loss value as the termination condition of the second training of the text classification model; other termination conditions may be set in practical applications, for example the number of training iterations reaching a set number, the training duration reaching a set duration, or the second loss value converging, which is not limited by the disclosure.
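A minimal sketch of this second (real) training step, assuming the first-trained encoder and the classification network are PyTorch modules, mean pooling over the word-segment vectors, and a cross-entropy loss as one concrete form of the second loss value; all names are illustrative:

```python
import torch.nn.functional as F

def second_training_step(encoder, classifier, optimizer, input_vector, label_id):
    third_semantic_feature = encoder(input_vector)                 # re-encode the training text
    pooled = third_semantic_feature.mean(dim=0, keepdim=True)      # mean pooling over word segments (assumed)
    logits = classifier(pooled)                                    # classification network prediction
    second_loss = F.cross_entropy(logits, label_id)                # gap between labeled and predicted category
    optimizer.zero_grad()
    second_loss.backward()                                         # adjust parameters to reduce the second loss
    optimizer.step()
    return second_loss.item()
```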
With the training method of the text classification model according to the embodiments of the disclosure, word segmentation is performed on the training text to obtain at least one word segment of the training text, word vectors corresponding to the at least one word segment are obtained, the word vectors are combined based on the positions of the word segments in the training text to obtain an input vector, and the input vector is input into the encoding network in the text classification model so that the encoding network encodes the input vector to obtain the first semantic feature. In this way, the word segments are obtained by segmenting the training text and the input vector of the encoding network is obtained by combining the word vectors of the word segments, so that the first semantic feature of the training text can be effectively obtained by encoding the input vector with the encoding network.
In any embodiment of the present disclosure, to clearly illustrate how to first train a text classification model based on first semantic features and fusion features, the present disclosure further proposes a training method of the text classification model.
Fig. 3 is a flowchart of a training method of a text classification model according to a third embodiment of the disclosure.
As shown in fig. 3, the training method of the text classification model may include the following steps:
Step 301, obtaining training text.
Step 302, word segmentation processing is performed on the training text to obtain at least one word segment of the training text.
Step 303, obtaining a word vector corresponding to the at least one word segment.
Step 304, based on the position of at least one word in the training text, word vectors corresponding to the at least one word are combined to obtain an input vector.
Step 305, inputting the input vector to the coding network in the text classification model, so as to code the input vector by using the coding network, thereby obtaining the first semantic feature.
Step 306, acquiring a noise feature, wherein the size of the noise feature matches the size of the first semantic feature.
Step 307, fusing the noise feature and the first semantic feature to obtain a fused feature.
The execution of steps 301 to 307 may refer to the execution of any embodiment of the disclosure, and will not be described herein.
Step 308, inputting the fused feature into a generator in an adversarial network in the text classification model to obtain a second semantic feature output by the generator.
In an embodiment of the present disclosure, the text classification model may include an adversarial network, which may be, for example, a GAN (Generative Adversarial Network).
As one example, the generator in the adversarial network in the text classification model may include one fully-connected layer and three deconvolution layers.
It should be noted that the above example of the generator structure is merely exemplary, and the generator structure may be set as needed in practical applications, which is not limited by the present disclosure.
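A hedged PyTorch sketch of such a generator (one fully-connected layer followed by three deconvolution layers); the channel counts, kernel sizes, and the use of 1-D deconvolutions over each word-segment vector are assumptions made only for illustration:

```python
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.fc = nn.Linear(dim, dim)                  # one fully-connected layer
        self.deconv = nn.Sequential(                   # three deconvolution layers
            nn.ConvTranspose1d(1, 4, kernel_size=3, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(4, 4, kernel_size=3, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(4, 1, kernel_size=3, padding=1),
        )

    def forward(self, fused_feature):                  # fused feature -> second semantic feature
        hidden = self.fc(fused_feature)                # (num_segments, dim)
        hidden = self.deconv(hidden.unsqueeze(1))      # treat each word-segment vector as a 1-D signal
        return hidden.squeeze(1)                       # same size as the fused feature
```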
In embodiments of the present disclosure, the fused feature may be input into the generator in the adversarial network in the text classification model to obtain the second semantic feature output by the generator.
Step 309, inputting the first semantic feature into a discriminator in the adversarial network to obtain a first output value output by the discriminator.
In embodiments of the present disclosure, a discriminator (also referred to as a discriminant) may be included in the adversarial network.
As one possible implementation, the first semantic feature and the input vector may be spliced to obtain a first spliced feature, and the first spliced feature may be input into the discriminator to obtain the first output value output by the discriminator.
As an example, assume that the first semantic feature is X′ and the input vector is X. Splicing the first semantic feature and the input vector yields the first spliced feature, for example the concatenation (X′, X), and the first spliced feature may be input into the discriminator to obtain the first output value output by the discriminator, which may be denoted as D(X′, X).
It should be noted that the above form of the first spliced feature is merely exemplary; in practical applications the first semantic feature and the input vector may be spliced in other orders or forms, which is not limited by the present disclosure.
Step 310, inputting the second semantic feature into the discriminator to obtain a second output value of the discriminator output.
As one possible implementation, the first semantic feature and the second semantic feature may be spliced to obtain a second spliced feature, and the second spliced feature may be input into the discriminator to obtain the second output value output by the discriminator.
As an example, assume that the first semantic feature is X′ and the second semantic feature output by the generator is X″. Splicing the first semantic feature and the second semantic feature yields the second spliced feature, for example the concatenation (X′, X″), and the second spliced feature may be input into the discriminator to obtain the second output value output by the discriminator, which may be denoted as D(X′, X″).
It should be noted that the above form of the second spliced feature is merely exemplary; in practical applications the first semantic feature and the second semantic feature may be spliced in other orders or forms, which is not limited by the present disclosure.
Step 311, determining a first loss value according to the first output value and the second output value.
In the embodiment of the disclosure, the first loss value may be determined according to the first output value and the second output value.
For example, the first output value is D(X′, X) and the second output value is D(X′, X″). The first loss value may be determined according to the following formula:
V(D, B, G) = E[log D(X′, X)] + E[log(1 − D(X′, X″))];  (1)
wherein B is the encoding network of the text classification model, D is the discriminator of the adversarial network in the text classification model, G is the generator of the adversarial network in the text classification model, V(D, B, G) is the first loss value, and E(·) represents the expectation of the corresponding distribution function.
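A hedged sketch of the first loss value of formula (1), computed from the two discriminator outputs; the logarithmic form follows the standard GAN value function assumed in the reconstruction above:

```python
import torch

def first_loss_value(first_output_value, second_output_value, eps=1e-8):
    # first_output_value  ~ D(X', X):  discriminator score for the real (positive) pair
    # second_output_value ~ D(X', X''): discriminator score for the generated (negative) pair
    return torch.log(first_output_value + eps) + torch.log(1.0 - second_output_value + eps)
```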
Step 312, performing a first training of the text classification model based on the first loss value.
In embodiments of the present disclosure, the text classification model may be first trained based on the first penalty value.
As an example, the model parameters of the encoding network and the model parameters of the generator of the adversarial network in the text classification model may be adjusted according to the first loss value so as to minimize the first loss value. For formula (1), the model parameters of the encoding network and of the generator are updated by gradient descent, while the model parameters of the discriminator are updated by gradient ascent, i.e., the following min-max objective function is satisfied:
min_{B,G} max_D V(D, B, G);  (2)
It should be noted that the foregoing example only takes minimization of the first loss value as the termination condition of the first training of the text classification model; other termination conditions may be set in practical applications, for example the number of training iterations reaching a set number or the training duration reaching a set duration, which is not limited by the disclosure.
In this way, based on the adversarial network in the text classification model, the encoding network of the text classification model can learn the semantic features of the training text without any labeled data.
With the training method of the text classification model according to the embodiments of the disclosure, the fused feature is input into the generator in the adversarial network in the text classification model to obtain the second semantic feature output by the generator, the first semantic feature is input into the discriminator in the adversarial network to obtain the first output value output by the discriminator, the second semantic feature is input into the discriminator to obtain the second output value output by the discriminator, the first loss value is determined according to the first output value and the second output value, and the text classification model is first trained according to the first loss value. In this way, based on the adversarial network in the text classification model, the model can learn the semantic features of the training text without labeled data, and the performance of the text classification model can be improved, thereby improving the accuracy and reliability of the model's prediction results.
The above embodiments correspond to a training method of a text classification model, and the disclosure further provides an application method of the text classification model, that is, a text classification method.
Fig. 4 is a flowchart of a text classification method according to a fourth embodiment of the disclosure.
As shown in fig. 4, the text classification method may include the steps of:
Step 401, obtaining power production information.
In the embodiment of the present disclosure, the power production information may include power scheduling text, power production meeting content, power production daily reports, power production weekly reports, and the like, which the present disclosure does not limit.
In the embodiment of the present disclosure, the manner of acquiring the power production information is not limited, for example, the power production information may be collected online by a web crawler technology, or the power production information may be collected offline, or the power production information may be input by a user, or the like, which is not limited in this disclosure.
Step 402, encoding the power production information by using the encoding network in the trained text classification model to obtain semantic features.
The text classification model may be trained by using the training method of the text classification model provided in any embodiment.
In embodiments of the present disclosure, the power production information may be encoded using an encoding network in a trained text classification model to obtain semantic features.
As an example, the power production information may be subjected to word segmentation to obtain the target word segments in the power production information; after initial vectors corresponding to the target word segments are obtained, these initial vectors may be combined based on the positions of the target word segments in the power production information to obtain an input vector, and the input vector may be input into the encoding network in the text classification model so that the encoding network encodes it to obtain the semantic features.
Step 403, inputting the semantic features into a classification network in the text classification model for classification, so as to obtain the category to which the power production information belongs.
In the embodiment of the disclosure, semantic features can be input into a classification network in a text classification model to classify, and the category to which the power production information belongs can be obtained.
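A minimal inference sketch of steps 401 to 403, assuming a HuggingFace-style tokenizer/encoder interface and a PyTorch classification head; every name here is illustrative rather than the disclosure's concrete implementation:

```python
import torch

def classify_power_production_info(text, tokenizer, encoder, classifier, id_to_category):
    inputs = tokenizer(text, return_tensors="pt")                 # segment and vectorise the text
    with torch.no_grad():
        semantic_features = encoder(**inputs).last_hidden_state   # encode with the trained encoding network
        logits = classifier(semantic_features.mean(dim=1))        # classification network (mean pooling assumed)
    return id_to_category[int(logits.argmax(dim=-1))]             # category of the power production information
```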
With the text classification method according to the embodiments of the disclosure, power production information is obtained and encoded by the encoding network in the trained text classification model to obtain semantic features, where the text classification model is trained by the training method of the text classification model provided in any embodiment of the disclosure, and the semantic features are input into the classification network in the text classification model for classification to obtain the category to which the power production information belongs. Therefore, the power production information can be automatically classified based on deep learning technology, and the classification effect, i.e., the accuracy and reliability of the classification result, can be improved.
As an example, the text classification model is trained with a generative adversarial network, where the BERT model may be employed as the encoding network for the power production information, i.e., the encoding network of the text classification model is the BERT model. By encoding the power production information with this model, a semantic representation of the power production information (denoted as the first semantic feature in the present disclosure) may be obtained. The semantic representation of the power production information is combined with a Gaussian prior to obtain a fused feature, which may be encoded by the generator to obtain a token vector (denoted as the second semantic feature in this disclosure); this token vector is input as a negative sample to the discriminator (also referred to as a discriminant), while the semantic representation obtained by the BERT model is input as a positive sample to the discriminator. Thus, the text classification model can be optimized to an optimal solution through the adversarial training of the generator and the discriminator. A training schematic of the text classification model of the present disclosure is shown in fig. 5.
Specifically, the encoding network of the text classification model adopts the BERT model, with the aim of first training the text classification model using a large-scale unlabeled corpus, obtaining a semantic representation of the power production information that contains rich semantic information, and finally applying this semantic representation to the text classification task. The main input of the BERT model is the original word vector (recorded as a word vector in this disclosure) of each word segment in the power production information, where the original word vector of each word segment can be generated through random initialization; the word vectors corresponding to the word segments are combined to obtain an input vector, and the input vector is input into the BERT model, which outputs a vector representation of each word segment in the power production information after the semantic information of the whole text has been fused, namely the semantic representation of the power production information.
Illustratively, the BERT model includes 12 Transformer layers, and each Transformer layer includes a feed-forward layer and a multi-head attention layer. To differentially utilize representations of the word segments using different context semantic information, the present disclosure employs an attention mechanism that can obtain the weight of the context on each word segment. In the operation of the attention mechanism, the combined characterization matrix of Key vectors, Query vectors, and Value vectors is used as the input matrix of the next Transformer layer, and the three kinds of vectors correspond to three weight matrices, denoted W_K, W_Q, and W_V respectively. In order to enhance the diversity of the attention mechanism, the disclosure uses a plurality of different self-attention modules to encode each word segment of the power production information, so that enhanced semantic vectors of each word segment under different semantic spaces, namely the encoding vectors in the disclosure, can be obtained; for any word segment, the plurality of encoding vectors of that word segment can be linearly combined to obtain its semantic vector.
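The following hedged sketch illustrates how the weight matrices W_Q, W_K, and W_V and several attention heads produce one encoding vector per word segment per head, which are then linearly combined into the semantic vector; the head count and dimensions are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_heads, dim = 12, 768
head_dim = dim // num_heads
X = torch.randn(8, dim)                        # input vectors of 8 word segments

W_Q = nn.Linear(dim, dim, bias=False)          # weight matrices W_Q, W_K, W_V
W_K = nn.Linear(dim, dim, bias=False)
W_V = nn.Linear(dim, dim, bias=False)

Q = W_Q(X).view(8, num_heads, head_dim).transpose(0, 1)   # (heads, segments, head_dim)
K = W_K(X).view(8, num_heads, head_dim).transpose(0, 1)
V = W_V(X).view(8, num_heads, head_dim).transpose(0, 1)

attention = F.softmax(Q @ K.transpose(-2, -1) / head_dim ** 0.5, dim=-1)
encoding_vectors = attention @ V               # one encoding vector per word segment and per head

combine = nn.Linear(dim, dim)                  # linear combination of the per-head encoding vectors
semantic_vectors = combine(encoding_vectors.transpose(0, 1).reshape(8, dim))   # same size as the word vectors
```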
In the related art, the task of the generator module is to generate negative samples from samples drawn from a Gaussian distribution, where the generator uses one fully-connected layer and three deconvolution layers to learn the Gaussian distribution and obtain a pseudo embedding. Although inspired by the generative adversarial network (GAN), this approach is meaningless for high-dimensional data if the generator of the text classification model in the present disclosure only processes samples drawn from the Gaussian distribution (recorded as the noise feature in the present disclosure), because the semantic changes in the latent space of the text need to be captured. Therefore, in the present disclosure, the vector input into the generator of the adversarial network in the text classification model constructs the above-mentioned pseudo embedding by combining the first semantic feature with the Gaussian distribution, i.e., the first semantic feature is fused with the noise feature to obtain the fused feature.
Assuming that the first semantic feature is Z and the noise feature is Y, the fused feature Z̃ may be determined according to the following equation:
Z̃ = Z + Y;  (3)
Finally, a bilinear binary classification network may be employed as a discriminator for distinguishing positive and negative samples and for first training the text classification model.
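A hedged sketch of such a bilinear binary classification network; torch.nn.Bilinear is one possible realisation, and the feature size is an assumption:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, 1)     # bilinear form over the two spliced inputs

    def forward(self, first_semantic_feature, other_feature):
        # returns a score near 1 for positive pairs (Z, X) and near 0 for negative pairs (Z, G(Z~))
        return torch.sigmoid(self.bilinear(first_semantic_feature, other_feature))
```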
As can be seen, the text classification model of the present disclosure includes an encoding network B, a generator G, and a discriminator D. In order to achieve the first training of the text classification model, an optimization function of the following equation (4) may be employed:
min_{B,G} max_D V(D, B, G) = E_{X∼P_r}[log D(Z, X)] + E_{Z̃∼P_g}[log(1 − D(Z, G(Z̃)))];  (4)
B(X) = Z;  (5)
wherein X is the input vector, B(X) is the first semantic feature obtained after the encoding network B encodes the input vector X, i.e., Z is the first semantic feature, Z̃ is the fused feature obtained after the noise feature Y is fused with the first semantic feature Z, G(Z̃) is the second semantic feature obtained by inputting the fused feature into the generator G, P_r is the positive sample distribution, and P_g is the negative sample distribution.
The above optimization function is a min-max objective function, and it may be optimized (i.e., used for the first training of the text classification model in the present disclosure) with the same alternating stochastic gradient scheme as an ordinary generative adversarial network. In each iteration, the parameter matrix of the discriminator may be updated by performing one or more steps in the positive gradient direction, and the parameters of the encoding network and the parameters of the generator may then be updated together in one step in the negative gradient direction.
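A hedged sketch of one such alternating iteration in PyTorch, assuming the encoding network B, the generator G, and the discriminator D are modules that map feature matrices to feature matrices or scores (as in the sketches above); the optimisers, the number of discriminator steps, and the use of element-wise addition for the fusion are assumptions:

```python
import torch

def adversarial_iteration(encoder, generator, discriminator, opt_d, opt_bg,
                          input_vector, d_steps=1, eps=1e-8):
    # one or more gradient-ascent steps on the discriminator parameters
    for _ in range(d_steps):
        z = encoder(input_vector).detach()                      # first semantic feature (fixed for this step)
        fake = generator(z + torch.randn_like(z)).detach()      # second semantic feature (negative sample)
        v = torch.log(discriminator(z, input_vector) + eps).mean() + \
            torch.log(1.0 - discriminator(z, fake) + eps).mean()
        opt_d.zero_grad()
        (-v).backward()                                         # ascend on V(D, B, G)
        opt_d.step()

    # one joint gradient-descent step on the encoding network and the generator
    z = encoder(input_vector)
    fake = generator(z + torch.randn_like(z))
    v = torch.log(discriminator(z, input_vector) + eps).mean() + \
        torch.log(1.0 - discriminator(z, fake) + eps).mean()
    opt_bg.zero_grad()
    v.backward()                                                # descend for B and G
    opt_bg.step()
```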
Thus, the text classification model of the present disclosure uses the idea of adversarial learning to produce good output results through the mutual game between the generator and the discriminator of the adversarial network; the training principle of the text classification model of the present disclosure is shown in fig. 5.
The text classification model of the disclosure can be used to solve the problems in existing methods that processing massive data leads to a long model construction time and that data labeling in the fine-tuning stage is costly. Specifically, the advantages of the model are reflected in the following two aspects:
1. For the problem that training a traditional model in the field of text classification requires a large amount of labeled data and a huge number of parameters, the text classification method based on the generative adversarial network can effectively reduce the dependence of the model on labeled data.
2. In the process of constructing negative samples, if a purely random distribution is used directly, the negative samples are meaningless, and meaningless negative sample embeddings will affect the optimal training of the model; by combining the Gaussian distribution with the real embeddings in the text classification model, a more robust encoding network can be trained.
Corresponding to the training method of the text classification model provided by the embodiments of fig. 1 to 3, the present disclosure further provides a training device of the text classification model, and since the training device of the text classification model provided by the embodiments of the present disclosure corresponds to the training method of the text classification model provided by the embodiments of fig. 1 to 3, the implementation of the training method of the text classification model is also applicable to the training device of the text classification model provided by the embodiments of the present disclosure, which is not described in detail in the embodiments of the present disclosure.
Fig. 6 is a schematic structural diagram of a training device for a text classification model according to a fifth embodiment of the disclosure.
As shown in fig. 6, the training apparatus 600 of the text classification model may include a processing module 601, an obtaining module 602, a fusing module 603, and a first training module 604.
The processing module 601 is configured to obtain a training text, and encode the training text by using an encoding network in a text classification model to obtain a first semantic feature.
An obtaining module 602, configured to obtain a noise feature, where the noise feature matches a size of the first semantic feature.
And the fusion module 603 is configured to fuse the noise feature and the first semantic feature to obtain a fused feature.
A first training module 604 is configured to perform a first training on the text classification model based on the first semantic features and the fusion features.
In one possible implementation manner of the embodiment of the present disclosure, a processing module 601 is configured to perform word segmentation on a training text to obtain at least one word segment of the training text, obtain a word vector corresponding to the at least one word segment, combine the word vectors corresponding to the at least one word segment based on a position of the at least one word segment in the training text to obtain an input vector, and input the input vector to a coding network in a text classification model to encode the input vector with the coding network to obtain a first semantic feature.
In one possible implementation manner of the embodiment of the present disclosure, the first semantic feature includes a semantic vector of each word segment in the training text, and the processing module 601 is configured to input an input vector into a coding network in the text classification model, so as to code each word segment based on multiple attention mechanisms by using the coding network and obtain multiple coding vectors corresponding to each word segment, and fuse the multiple coding vectors corresponding to each word segment for any word segment to obtain the semantic vector corresponding to the word segment.
In one possible implementation of the disclosed embodiments, the first training module 604 is configured to input the fused feature into the generator in the adversarial network in the text classification model to obtain the second semantic feature output by the generator, input the first semantic feature into the discriminator in the adversarial network to obtain the first output value output by the discriminator, input the second semantic feature into the discriminator to obtain the second output value output by the discriminator, determine the first loss value according to the first output value and the second output value, and perform the first training on the text classification model according to the first loss value.
In one possible implementation of the embodiment of the disclosure, the first training module 604 is configured to splice the first semantic feature and the input vector to obtain a first spliced feature, and input the first spliced feature into the discriminator to obtain a first output value output by the discriminator.
In one possible implementation of the embodiment of the disclosure, the first training module 604 is configured to splice the first semantic feature and the second semantic feature to obtain a second spliced feature, and input the second spliced feature into the discriminator to obtain a second output value output by the discriminator.
In one possible implementation of the embodiment of the present disclosure, the training apparatus 600 of the text classification model may further include:
And the coding module is used for recoding the training text by adopting a coding network in the text classification model after the first training so as to obtain a third semantic feature.
And the classification module is used for inputting the third semantic features into a classification network in the text classification model subjected to the first training to classify so as to obtain the prediction category to which the training text belongs.
And the generation module is used for generating a second loss value according to the difference between the annotation category and the prediction category on the training text.
And the second training module is used for carrying out second training on the text classification model subjected to the first training according to the second loss value.
With the training device of the text classification model according to the embodiments of the disclosure, a training text is obtained and encoded by the encoding network in the text classification model to obtain a first semantic feature; a noise feature whose size matches that of the first semantic feature is acquired; the noise feature and the first semantic feature are fused to obtain a fused feature; and the text classification model is first trained based on the first semantic feature and the fused feature. In this way, the encoding network of the text classification model captures the semantic features of the training text while noise is fused into them, so that pre-training of the text classification model can be realized and the model can effectively learn salient semantic information in the training text before the real training. As a result, when only a small amount of training text is used for the real training of the text classification model, the performance of the model can be improved and the dependence of the model on labeled data can be effectively reduced.
Corresponding to the text classification method provided by the embodiment of fig. 4, the present disclosure also provides a text classification device, and since the text classification device provided by the embodiment of the present disclosure corresponds to the text classification method provided by the embodiment of fig. 4, the implementation of the text classification method is also applicable to the text classification device provided by the embodiment of the present disclosure, which is not described in detail in the embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of a text classification device according to a sixth embodiment of the disclosure.
As shown in fig. 7, the text classification apparatus 700 may include an acquisition module 701, an encoding module 702, and a classification module 703.
Wherein, the acquisition module 701 is configured to acquire power production information.
The encoding module 702 is configured to encode the power production information by using the encoding network in the trained text classification model to obtain semantic features, where the text classification model is trained by the training device of the text classification model proposed in the embodiment of fig. 6 of the present disclosure.
The classification module 703 is configured to input the semantic features into the classification network in the text classification model for classification, so as to obtain the category to which the power production information belongs.
With the text classification device according to the embodiments of the disclosure, power production information is obtained and encoded by the encoding network in the trained text classification model to obtain semantic features, where the text classification model is trained by the training method of the text classification model provided in any embodiment of the disclosure, and the semantic features are input into the classification network in the text classification model for classification to obtain the category to which the power production information belongs. Therefore, the power production information can be automatically classified based on deep learning technology, and the classification effect, i.e., the accuracy and reliability of the classification result, can be improved.
In order to implement the foregoing embodiments, the disclosure further provides an electronic device, which is characterized by including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the training method of the text classification model according to any of the foregoing embodiments of the disclosure when executing the program.
To achieve the above embodiments, the present disclosure further proposes a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a training method of a text classification model as proposed in any of the foregoing embodiments of the present disclosure.
To implement the above embodiments, the present disclosure also provides a computer program product which, when executed by a processor, performs a training method of a text classification model as set forth in any of the foregoing embodiments of the present disclosure.
As shown in fig. 8, the electronic device 12 is in the form of a general purpose computing device. The components of the electronic device 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (hereinafter ISA) bus, the Micro Channel Architecture (hereinafter MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (hereinafter VESA) local bus, and the Peripheral Component Interconnect (hereinafter PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (Random Access Memory; hereinafter: RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 8, commonly referred to as a "hard disk drive"). Although not shown in fig. 8, a disk drive for reading from and writing to a removable nonvolatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable nonvolatile optical disk (e.g., a compact disk read only memory (Compact Disc Read Only Memory; hereinafter CD-ROM), digital versatile read only optical disk (Digital Video Disc Read Only Memory; hereinafter DVD-ROM), or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the various embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods in the embodiments described in this disclosure.
The electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the electronic device 12, and/or any devices (e.g., network card, modem, etc.) that enable the electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks, such as a local area network (Local Area Network; hereinafter: LAN), a wide area network (Wide Area Network; hereinafter: WAN), and/or a public network, such as the Internet, through the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 over the bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 12, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the methods mentioned in the foregoing embodiments.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, the meaning of "a plurality" is at least two, such as two, three, etc., unless explicitly specified otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present disclosure.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., a ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include an electrical connection (an electronic device) having one or more wires, a portable computer diskette (a magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with appropriate combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and so on.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
Furthermore, each functional unit in the embodiments of the present disclosure may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like. Although embodiments of the present disclosure have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the present disclosure, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present disclosure.