
CN111813928B - Assessing text classification anomalies predicted by a text classification model - Google Patents


Info

Publication number
CN111813928B
Authority
CN
China
Prior art keywords
word
heat map
classification
computer system
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010273725.3A
Other languages
Chinese (zh)
Other versions
CN111813928A (en)
Inventor
谭铭
S·波达尔
L·克里希纳默西
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 16/380,986 (external priority: US 11068656 B2)
Priority claimed from US 16/380,981 (external priority: US 11537821 B2)
Application filed by International Business Machines Corp
Publication of CN111813928A
Application granted
Publication of CN111813928B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/117Tagging; Marking up; Designating a block; Setting of attributes
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

In response to running at least one test phrase on a pre-trained text classifier and identifying a separate predicted classification label based on a score calculated for each respective test phrase, the text classifier decomposes the plurality of extracted features aggregated in the score into word-level scores for each word in the at least one test phrase. The text classifier assigns a separate heat map value to each word-level score, each respective separate heat map value reflecting the weight of that word-level score. The text classifier outputs the separate predicted classification label and each separate heat map value reflecting the weight of each word-level score, for defining a heat map that identifies the contribution of each word in the at least one test phrase to the separate predicted classification label, to facilitate the client in assessing text classification anomalies.

Description

Assessing text classification anomalies predicted by a text classification model
Technical Field
One or more embodiments of the invention relate generally to data processing and, in particular, to evaluating text classification anomalies predicted by a text classification model.
Description of the Related Art
Machine learning plays an important role in many Artificial Intelligence (AI) applications. One of the achievements in the process of training a machine learning application is a data object, called a model, used in text classification, which is a parametric representation of patterns inferred from training data. After the model is created, the model is deployed into one or more environments used in text classification. At run time, the model is the core of the machine learning system, built from the hours of development applied and the structure generated from large amounts of data.
Disclosure of Invention
In one embodiment, a method involves: in response to running at least one test phrase on a pre-trained text classifier and identifying a separate predicted classification label based on a score calculated for each respective test phrase, decomposing, by a computer system, a plurality of extracted features aggregated in the score into a plurality of word-level scores for each word in the at least one test phrase. The method involves assigning, by the computer system, a separate heat map value to each of the plurality of word-level scores, each respective separate heat map value reflecting a weight of each of the plurality of word-level scores. The method involves outputting, by the computer system, the separate predicted classification label and each separate heat map value reflecting the weight of each word-level score of the plurality of word-level scores, for defining a heat map identifying the contribution of each word of the at least one test phrase to the separate predicted classification label.
In another embodiment, a computer system includes: one or more processors, one or more computer-readable memories, one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories. The stored program instructions include: program instructions for decomposing a plurality of extracted features aggregated in a score into a plurality of word-level scores for each word in at least one test phrase, in response to running the at least one test phrase on a pre-trained text classifier and identifying a separate predicted classification label based on the score calculated for each respective test phrase. The stored program instructions include: program instructions for assigning a separate heat map value to each of the plurality of word-level scores, each respective separate heat map value reflecting a weight of each of the plurality of word-level scores. The stored program instructions include: program instructions for outputting the separate predicted classification label and each separate heat map value reflecting the weight of each word-level score of the plurality of word-level scores, for defining a heat map identifying the contribution of each word in the at least one test phrase to the separate predicted classification label.
In another embodiment, a computer program product comprises: a computer-readable storage medium having program instructions embodied therewith, wherein the computer-readable storage medium is not a transitory signal per se. The program instructions are executable by a computer to cause the computer to decompose, by the computer, a plurality of extracted features aggregated in a score into a plurality of word-level scores for each word in at least one test phrase, in response to running the at least one test phrase on a pre-trained text classifier and identifying a separate predicted classification label based on the score calculated for each respective test phrase. The program instructions are executable by the computer to cause the computer to assign, by the computer, a separate heat map value to each of the plurality of word-level scores, each respective separate heat map value reflecting a weight of each of the plurality of word-level scores. The program instructions are executable by the computer to cause the computer to output, by the computer, the separate predicted classification label and each separate heat map value reflecting the weight of each word-level score of the plurality of word-level scores, for defining a heat map identifying the contribution of each word of the at least one test phrase to the separate predicted classification label.
Drawings
The novel features believed characteristic of one or more embodiments of the invention are set forth in the appended claims. However, one or more embodiments of the invention itself will be better understood by reference to the following detailed description of illustrative embodiments when read in conjunction with the accompanying drawings, wherein:
FIG. 1 illustrates one example of a block diagram of a text classifier service for facilitating creation and training of a text classifier that classifies text by labels;
FIG. 2 illustrates one example of a block diagram of a text classifier service for providing information related to text classification anomalies predicted by a text classifier during text classifier testing;
FIG. 3 illustrates one example of a word analysis element evaluated by the word analysis component at the text classifier level;
FIG. 4 shows one example of a table illustrating examples of types of extracted features that are decomposed for use in determining feature scores by word;
FIG. 5 illustrates one example of a word level heat map that reflects a ground truth heat map compared with a test set heat map based on test phrases tested on a trained model;
FIG. 6 illustrates one example of a block diagram of a word level heat map reflecting heat maps of the top-k important words per label for test phrases tested on a trained model;
FIG. 7 illustrates an example of a computer system in which an embodiment of the invention may be implemented;
FIG. 8 depicts a high-level logic flowchart of a process and computer program for creating and training a classifier model;
FIG. 9 depicts a high level logic flowchart of a process and computer program for updating a trained classifier model;
FIG. 10 depicts a high level logic flowchart of a process and computer program for analyzing a predicted classification to determine heat map values at a word level, the heat map values indicating word-level contributions to the predicted classification of a test phrase and to the classification labels of a trained model;
FIG. 11 depicts a high level logic flowchart of a process and computer program for outputting a predicted classification with visual indicators of the impact on the predicted classification based on the corresponding word-level heat map values for the most impactful classification labels;
FIG. 12 depicts a high level logic flowchart of a process and computer program for outputting a predicted classification with visual indicators of the impact on the predicted classification based on a list of the top-k words focused on the most impactful classification labels according to the respective top-k heat map values; and
FIG. 13 depicts a high-level logic flowchart of a process and computer program for supporting updated training of a text classifier that highlights classification label training for identified anomalies.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
In addition, in the following description, a plurality of systems are described for illustrative purposes. It will be noted and apparent to those skilled in the art that the present invention may be implemented in a variety of systems, including a variety of computer systems and electronic devices running any number of different types of operating systems.
FIG. 1 illustrates a block diagram of a text classifier service for facilitating creation and training of a text classifier that classifies text by labels.
In one example, machine learning plays an important role in artificial intelligence-based applications that interact with one or more Natural Language Processing (NLP) systems. For example, AI-based applications may include, but are not limited to: speech recognition, natural language processing, audio recognition, visual scene analysis, email filtering, social network filtering, machine translation, data leakage, optical character recognition, learning to rank, and bioinformatics. In one example, a selection of AI-based applications can involve a computer system, which can run in one or more types of computing environments, performing tasks that require one or more types of text classification analysis. In one example, machine learning may represent one or more types of AI that train a machine based on algorithms that learn from and make predictions on data. One of the main achievements in the process of creating and training a machine learning environment is the data object (called a model) built from sample inputs. In one example, model 112 represents a data object of a machine learning environment.
In one example, to create and train model 112, a user (such as client 120) submits an initial training set, such as ground truth training set 108, to text classifier service 110. In one example, the ground truth training set 108 includes one or more words and multi-word phrases, each of which is identified with one of a plurality of classification labels identified by the user for training the model 112. For example, the user may select labels identifying a type of action, such as "on" or "off", and assign the "on" or "off" label to each selection of a word or multi-word phrase that a customer may enter when requesting that a service be turned on or off, such as assigning the "on" label to the phrase "add a service" and the "off" label to a word requesting that the service be turned off. In one example, the ground truth training set 108 may include one or more commercially available training sets. In another example, the ground truth training set 108 may include one or more user-generated training sets, such as training sets of words or phrases collected from conversational dialog archives that have been labeled by the user. In another example, the ground truth training set 108 may include one or more purpose-specific automatic training sets collected and labeled by an automatic training set generation service.
In this example, the text classifier service 110 creates an instance of the model 112 in the text classifier 102 and trains the model 112 by applying the ground truth training set 108. The text classifier 102 represents an instance of model 112 that is combined with the scorer 104 and trained by the ground truth training set 108. In one example, model 112 represents a parametric representation of patterns inferred from the ground truth training set 108 during the training process. In one example, the text classifier service 110 represents an entity that provides a service used by a client (such as the client 120) to create and train instances of the model 112 in the text classifier 102. For example, text classifier service 110 represents a cloud service provider that provides text classifier 102 as a service through one or more applications selected by client 120. In another example, the text classifier service 110 represents one or more programming interfaces through which the client 120 invokes specific functions to create an instance of the model 112 in the text classifier 102 and invokes specific functions to train the text classifier 102 based on the ground truth training set 108. In additional or alternative embodiments, the client 120 may interact with the text classifier 102 through additional or alternative interfaces and connections.
In one example, after training, the client 120 may then test the text classifier 102 before deploying the text classifier 102 for access by one or more client applications that provide text classification services (such as intent classification, semantic analysis, or file classification for dialog systems). During training and after deployment, a user may submit text to the text classifier 102. In response to a text submission, the text classifier 102 predicts a classification label for the text and returns the predicted classification label indicating which type of text has been received.
In this example, when trained by ground truth training set 108, text classifier 102 may respond to a test submission with an incorrectly predicted classification label. In one example, when the text classifier 102 makes an incorrect prediction, the incorrect prediction represents an anomaly in the text classification performed by the text classifier 102.
The text classifier service 110 enables the client 120 to test the text classifier 102 and to update the ground truth training set 108 for additional training of the text classifier 102 to adjust the prediction accuracy of the text classifier 102 for use by the client. In particular, the client 120 relies on the text classifier 102 to provide accurate classification; however, the accuracy of classification predictions made by a specific instance of the model in the text classifier 102 may be significantly affected by the distribution of training data in the ground truth training set 108 as originally used to train the model 112, as well as by additional training data submitted by the client 120 in response to anomalies detected when testing the text classifier 102. Therefore, it is desirable that the text classifier service 110 also provide the user with information about text classification anomalies beyond the incorrectly predicted classification label, to enable the user to efficiently and effectively evaluate corrections to the ground truth training set 108 that may correct training data patterns in the model 112 of the text classifier 102 to improve prediction accuracy.
FIG. 2 shows a block diagram of a text classifier service for providing information about text classification anomalies predicted by a text classifier during text classifier testing.
In one example, during a testing phase, a testing interface (e.g., test controller 208) of client 120 submits text 222 to text classifier 102, such as through an application programming interface call to text classifier service 110 or a function call directly to text classifier 102. In this example, text 222 may represent a test sample of one or more words or phrases from test set 220. In this example, text classifier 102 receives text 222 and classifies it, predicting a labeled classification for text 222.
In this example, the text classification service provided by the text classifier 102 refers to the following linear classifier process: segmenting text that includes one or more words or other combinations of features, extracting features from the word or words, assigning a per-label weight to each extracted feature, and combining the weighted features for each predefined label to identify a score for that label. In one example, the text classifier 102 may determine a separate score for each label in a selection of labels and identify the predicted label as the highest-scoring label. In one example, a label may identify one or more types of classification, such as, but not limited to, the intent of the content of the text. For example, the text classified by the text classifier 102 may represent a speech utterance that has been converted to text, and the intent label predicted by the text classifier 102 may represent the predicted intent of the utterance based on the highest-scoring label among the plurality of scored labels for the utterance.
For example, to classify text, the text classifier 102 implements a scorer 104, which extracts features from the words in the presented text. Scorer 104 invokes a function of model 112 to identify the per-label weights for each feature extracted from text 222. Based on the individually assigned weights of the extracted features, the scorer 104 may call a function of the model 112 to evaluate the classification of the entire text, e.g., a particular intent with a percent probability.
For example, features extracted by text classifier 102 include, but are not limited to: unigram-based features, bigram-based features, part-of-speech-based features, term-based features (such as entity-based features or concept-based features), average pooling of word embedding features, and max pooling of word embedding features. For example, in the text phrase "I am a student at [University A]", unigram features may include "I", "am", "a", "student", "at", "University", and "A", and a bigram feature may include "University A".
For example, the text classifier 102 may perform a linear classification of the text, where the ranking score S_I for each label I is a weighted sum of all extracted features based on, for example, the following equation:
S_I(U) = f_1(U) w_{I1} + f_2(U) w_{I2} + ... + f_K(U) w_{IK} + b_I
where U = u_1 u_2 u_3 ... u_N is a test example, u_n is a word in the test example, and f_k(U) is an extracted feature. In one example, the extracted features may belong to one or more types of features extracted from certain words or terms inside the text. In this example, w_{IK} is the model parameter for the k-th feature of a given label I. In this example, b_I reflects a contribution from a filler "bias" term, which reflects an inherent preference for an intent without any input word being considered.
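As an illustration of this weighted-sum scoring, the following minimal Python sketch computes S_I(U) for each label and selects the highest-scoring label; the dictionary-based feature and weight representation, the toy feature names, and the function names are assumptions made for this sketch, not the classifier's actual implementation.
from typing import Dict

def score_label(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    # S_I(U) = sum_k f_k(U) * w_Ik + b_I for a single label I
    return sum(value * weights.get(name, 0.0) for name, value in features.items()) + bias

def predict_label(features: Dict[str, float],
                  label_weights: Dict[str, Dict[str, float]],
                  label_bias: Dict[str, float]) -> str:
    # Score every label and return the highest-scoring one
    scores = {label: score_label(features, label_weights[label], label_bias[label])
              for label in label_weights}
    return max(scores, key=scores.get)

# Toy usage: two extracted features scored against two labels
features = {"unigram:help": 1.0, "bigram:are_you": 1.0}
label_weights = {"capability": {"unigram:help": 0.8, "bigram:are_you": 0.1},
                 "greeting": {"unigram:help": 0.1, "bigram:are_you": 0.7}}
label_bias = {"capability": 0.0, "greeting": 0.05}
print(predict_label(features, label_weights, label_bias))  # prints "capability"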
In one example, the text classifier 102 represents a text classification model that may be considered by the client 120 as a black box, where text is applied to the text classifier 102 and classification predictions are output from the text classifier 102, but the trained data patterns and functions applied by the text classifier 102 are not visible to the client 120 that requires text classification services from the text classifier 102. To protect the underlying data objects created in model 112, the entity deploying model 112 may specify one or more protection layers that allow use of the functionality of model 112 at deployment time while protecting the trained data patterns of the data objects of model 112.
In this example, in response to a submission of text 222 from client 120, text classifier service 110 returns classification 224 as determined by text classifier 102. In this example, the classification 224 may include a label 226. The label 226 may include a specific classification label and may include additional values such as, but not limited to, the score probability calculated for the classification label. Additionally, label 226 may include a plurality of labels.
In one example, in response to receiving classification 224, test controller 208 compares label 226 to an expected label for text 222. In one example, the test set 220 includes the expected label for text 222. In one example, if label 226 does not match the expected label for text 222 in test set 220, the test controller 208 may trigger an output to the user through user interface 240 to indicate the anomaly.
In one example, based on the detected anomalies, the user may choose to adjust, within user interface 240, the selection of one or more words assigned one or more labels in training data set 238. The user may select, within user interface 240, to ask training set controller 250 of test controller 208 to send training data set 238 to text classifier 102 for additional training (as shown by training set 252) and to update ground truth training set 108 with training data set 238 to maintain a complete training set for training text classifier 102. In this example, by having the client 120 submit additional training data in training data set 238, the client 120 may improve the accuracy of the predictions performed by the text classifier 102; however, it is desirable to support the user in identifying, at the user interface 240, which data to include in training data set 238 for training the text classifier 102 so as to likely resolve the anomaly and improve the accuracy of the predictions made by the text classifier 102.
The accuracy of the model 112 in performing text classification may be a function of time values and resources applied in creating, training, evaluating, and debugging the model to train the text classifier 102 to accurately classify text. In one example, the amount and distribution of labeled training data used to train the model 112 can significantly affect the reliability of the model 112 to accurately classify text. Although the client 120 relies on the accuracy of the text classifier 102 as a measure of the quality of the model 112, the quality and performance of the text classifier may vary greatly between models and there is no unified measure of the quality of the text classifier model or a publicly available unified training dataset that yields the same accuracy measure when used to train the models.
In addition, when the text classifier 102 is implemented as a black box provided by the text classifier service 110 and the client 120 receives a classification label in the classification 224, but the classification label is incorrect, information needs to be provided to the client 120 as to why the text classifier 102 incorrectly classified the selection of text, while also not disclosing to the client 120 the underlying data objects in the model 112, to enable an assessment of what type of training data set is needed to potentially improve the classification accuracy of the text classifier 102. For example, if the client 120 submits text 222 of the phrase "how are you going to help me?" to the text classifier service 110, and the phrase should be classified as "capability" but an incorrect classification of "greeting" is received, the client 120 needs to be provided with information about why the text classifier 102 incorrectly classified "how are you going to help me?" as "greeting" rather than "capability".
In particular, in addition to providing the classification labels themselves, information needs to be provided to the client 120 regarding why the text classifier 102 incorrectly classified the selection of text, so that a user monitoring the text classification service received by the client 120 can determine additional training data, sent to the text classifier 102 in the training data set 238, that is likely to train the text classifier 102 to correctly classify the selection of text. In particular, it is difficult for a user to attempt to determine the cause of an anomaly based only on one or both of the label 226 in the classification 224 and the training data set submitted by the user in the training data set 238, because multiple combined factors may cause the classification anomaly. The first factor is that minor variations within training data set 238 and feature adjustments may substantially alter the classification predictions performed by text classifier 102. In particular, the text classifier 102 may be trained to determine the class of a text string based on large-scale features (e.g., more than 1000 features) extracted internally from the training instances, where the features used to train the model 112 are transparent to the user and the weights of the features are determined inherently by the training process applied to train the text classifier 102. The second factor is that, based on the selection of labeled training data used to train the model 112 and the limited nature of the selection of labeled training data for a particular domain of topics, the model 112 may be overfit on some undesired tokens or words for that domain. A third factor is that the different classes of features may make the influence of a particular word on the final decision unclear. For example, some features are vocabulary-based, such as unigrams and bigrams, and other features are not vocabulary-based but are related to the vocabulary, such as word embeddings and entity types.
For example, considering the second factor, if the system is based in part on word-level representation features, then because the words "want" and "need" have similar semantic meanings but different lexical forms, the training strings for the classification intent label "order" may include a large number of occurrences of the word "want" compared with other classification intent labels, such that the text classifier 102 may incorrectly predict that the text input "I need to delete an order" has a classification label of "order" instead of the correct classification label of "delete order". Considering the first and third factors, a classification anomaly may be based on a single word, but identifying the particular word that caused the misclassification in "I need to delete an order" may be challenging based only on test results, and the anomaly may disappear or reappear if additional training is performed that adjusts the total number and type of training utterances used to train model 112, without the user having created training data that identifies the particular word that produced the anomaly.
According to an advantage of the present invention, an anomaly visualization service is provided to facilitate user understanding, at the client application level, of the specific words that cause text classification anomalies of the text classifier 102. In particular, according to an advantage of the present invention, the anomaly visualization service performs error analysis of the test set at the text classifier level and provides visual analysis and prompting of information about the errors at the application level, thereby helping the user refine training data set 238 for further training of text classifier 102. In one example, the visual analysis and prompts may be represented in one or more heat maps, where a heat map applies one or more colors at one or more intensities, and the color intensity applied to a word represents the relative weight of the word in its contribution to a particular classification label.
While the embodiments described herein relate to visual heat map output at a user interface as a graphical representation of data that represents different values using a system of color coding and color weights, in additional or alternative embodiments, visual heat map output may be represented in an output interface by other types of output that may be detected by a user, such as, but not limited to: the tactile output of visual indicators in the visual heat map, the audible output of visual indicators in the visual heat map, and other outputs that enable the user to detect different scoring weights for words. Further, in additional or alternative embodiments, the visual heat map output may be represented by graphically represented values in addition to or instead of colors, where the values represent percentages or other weighted values.
In one example, the anomaly visualization service includes a word analysis component 232 implemented at the classifier level with the text classifier 102, a word level heat map controller 234 implemented at the client application level of the client 120, and a word level heat map 236, top-k word heat maps 242, and a training dataset 238 implemented at the user interface level within the user interface 240. In additional or alternative embodiments, the anomaly visualization service may include additional or alternative functional and data components.
In one example, word analysis component 232 is implemented at the same layer as text classifier 102, or is incorporated in text classifier 102, for calculating one or more heat map values for text 222 and one or more heat map values for the classification labels included in ground truth training set 108. In this example, the classification 224 is updated by the word analysis component 232 with one or more heat map values, shown as heat map values 228, which are determined by the word analysis component 232 along with the label 226. In one example, each of the heat map values 228 may represent one or more weighted values (such as, but not limited to, percentages and colors), and may be identified with or correspond to one or more symbols (e.g., words), or may be ordered to correspond to particular words in a sequence.
For example, word analysis component 232 can determine heat map values 228 by decomposing the score calculated for each extracted feature into each word or other symbol, and assigning each decomposed score as a heat map value that directly reflects the contribution of that word to the final score of the intent classification. For example, when model 112 is a trained model, all weights w_{IK} are fixed. As previously described, the linear model applied by model 112 for text classification of text 222 (denoted U) uses a weighted sum of the various features f_k(U) extracted from the text, and then obtains a ranking score S_I for each label I, for example, by:
S_I(U) = f_1(U) w_{I1} + f_2(U) w_{I2} + ... + f_K(U) w_{IK} + b_I
For all types of features used in the text classifier 102, the word analysis component 232 traces back and determines which words contributed to each extracted feature. By accumulating all feature scores belonging to each symbol, word analysis component 232 decomposes S_I(U) into each word, as follows:
S_I(U) = S'_I(u_1) + S'_I(u_2) + ... + S'_I(u_N) + b_I
In this example, S'_I(u_N) is used as a heat map value that directly reflects the contribution of the word u_N to the final score of intent I. In particular, in this example, given the test example text 222, the sum of the scores of all words on the heat map is exactly the score used to calculate the label confidence, so the word-level score directly reflects the importance of each word in calculating the final intent label confidence.
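A minimal Python sketch of this decomposition is shown below; it assumes each extracted feature is represented as a (product, contributing word indices) pair, where the product is f_k(U) w_{IK}, which is an illustrative simplification, and it splits multi-word feature products evenly so that the word-level scores plus the bias still sum to S_I(U).
from collections import defaultdict
from typing import Dict, List, Tuple

def decompose_to_word_scores(words: List[str],
                             feature_products: List[Tuple[float, List[int]]],
                             bias: float) -> Tuple[float, List[Tuple[str, float]]]:
    # Fold each per-feature product f_k(U) * w_IK back onto the words that produced it.
    # Multi-word features (e.g., bigrams, entities) split their product evenly across
    # their contributing words, so the word-level scores plus the bias recover S_I(U).
    word_scores: Dict[int, float] = defaultdict(float)
    for product, word_indices in feature_products:
        share = product / len(word_indices)
        for n in word_indices:
            word_scores[n] += share
    label_score = sum(word_scores.values()) + bias
    heat_map = [(words[n], word_scores[n]) for n in range(len(words))]
    return label_score, heat_map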
In one example, in response to text 222, word level heat map controller 234 receives classification 224 with labels 226 and heat map values 228 and generates a visual graphical representation of the heat map values for text 222 in word level heat map 236. In one example, the word level heat map controller 234 sequentially applies each percentage or color value in the heat map values 228 to the words or other symbols identified in the text 222. In one example, the word level heat map 236 may reflect different heat map values by different colors assigned to different heat map values, by different chromaticities assigned to different percentages of the heat map values, and by other visually discernable output indicators assigned to different heat map values. In another example, word level heat map 236 may reflect different heat map values through other types of output interfaces, including but not limited to audible and tactile interfaces for adjusting the output level or type to identify different heat map values.
In one embodiment, according to an advantage of the present invention, the word level heat map 236 shows each word or other symbol in a text sequence in correlation with the predicted label 226. In another embodiment, the word level heat map 236 may include visual indicators of additional types of relevance, such as visualizing a comparison of the relevance of each word or symbol in the text sequence to the predicted label with the relevance of each word or other symbol in the text sequence to the ground truth label. In particular, in this example, the word level heat map controller 234 may access a ground truth heat map of a sentence related to text 222 and a desired label for the sentence, and output word level heat map 236 with a comparison of the ground truth heat map to a visual representation of the heat map generated for text 222 and predicted label 226 based on heat map values 228. In one example, the text classifier 102 may provide the ground truth heat map values in classification 224. In another example, test controller 208 may store a heat map generated from classification 224 in response to text 222 drawn from ground truth training set 108. Additionally, the test set 220 may include user-generated ground truth heat maps.
In one example, word level heat map controller 234 initially generates one or more labels and one or more words in training data set 238 based on analyzing values in word level heat map 236. In one example, a user may manually adjust entries in training data set 238 based on examining word level heat map 236 and ask training set controller 250 to send training data set 238 to train text classifier 102. In one example, training set controller 250 also updates ground truth training set 108 with training data set 238 to reflect the training data currently used to train model 112 in text classifier 102.
In one example, in addition to analyzing words in text 222, word analysis component 232 also analyzes the weight of each word under each label identified relative to the intents tested by test set 220. For example, the word analysis component 232 stores, in word level scores by intent 234, the sum of the word-level scores identified for each word under each intent predicted for the test set 220. Based on the top-k scoring words ranked in word level scores by intent 234 for a particular intent, word analysis component 232 identifies the top-k important words for the particular intent label, where k can be set to any value, such as "10". The word analysis component 232 returns the ranked top-k important words for the particular intent label 226 in top-k heat map list 229 as a sequentially ordered list of the top-k scoring words. In addition, top-k heat map list 229 may include heat map values, such as percentages or colors, assigned to each word in the sequentially ordered list, representing the relative score of each word compared to the other words as related to the predicted intent.
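The accumulation and ranking just described can be pictured with the minimal Python sketch below; the per-phrase heat map structure and the function names are assumptions made for illustration, and the heat map values are scaled against the top total score so the highest-ranked word receives the maximum value.
from collections import defaultdict
from typing import Dict, List, Tuple

def accumulate_word_scores(per_phrase_heat_maps: List[Tuple[str, List[Tuple[str, float]]]]):
    # Sum word-level scores per predicted intent label across all test phrases.
    # Each entry is (predicted_label, [(word, word_level_score), ...]).
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for label, heat_map in per_phrase_heat_maps:
        for word, score in heat_map:
            totals[label][word] += score
    return totals

def top_k_words(totals: Dict[str, Dict[str, float]], label: str, k: int = 10):
    # Return the k highest-scoring words for a label, with heat map values scaled to [0, 1].
    ranked = sorted(totals[label].items(), key=lambda item: item[1], reverse=True)[:k]
    peak = ranked[0][1] if ranked and ranked[0][1] > 0 else 1.0
    return [(word, score / peak) for word, score in ranked]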
In this example, in response to receiving a label 226 having a top-k heat map list 229, the word level heat map controller 234 generates top-k word heat map 242, outputting the label 226 and the top-k list and visually highlighting each word in the top-k list with heat map attributes, such as color and percent chromaticity, to visually indicate the relative score of each word with respect to the predicted intent. According to an advantage of the present invention, the top-k word heat map 242 provides a visual representation of the weights of words trained for the predicted intent to assist the user in visually assessing whether there are words among the top-k words of the predicted intent label that should be ranked higher or lower for the predicted intent label. Further, the top-k word heat map 242 provides a visual representation of the weights of words trained for the desired intent to assist the user in visually assessing whether there are words among the top-k words of the desired intent label that should be ranked higher or lower. Within the interface providing a visual representation, in the top-k heat maps 242, of the weights of words trained for the incorrectly predicted intent and the intended intent, the user is also provided with an interface in which training dataset 238 is selectively modified to increase or decrease the words assigned to the predicted intent label and the intended intent label.
In accordance with an advantage of the present invention, word level heat map 236 and top-k word heat maps 242 together provide the user with a visual representation of the particular words most likely to cause anomalies and their semantically corresponding words, to facilitate the user's selection, within training data set 238, of the training data most likely to train text classifier 102 to improve prediction accuracy. For example, word level heat map 236 visually identifies the one or more words in the test string that contribute most to the predicted intent, to prompt the user about additional training for the problem words in the incorrectly predicted test string, and top-k word heat maps 242 visually identify the corresponding semantically associated words related to the incorrectly predicted label and the desired label, to prompt the user about the weights of the problem words that require additional training under both the incorrectly predicted label and the desired label.
According to an advantage of the present invention, through the word-level heat map visualization provided via heat map values 228 and top-k heat map lists 229, the anomaly visualization service provided via the functionality of word analysis component 232 and word level heat map controller 234, and the visual representations provided via word level heat map 236 and top-k word heat maps 242, the time and effort required by a user of text classifier 102 to understand, at the word level, why text classifier 102 generated a particular label for a particular test phrase and which words of the test phrase contributed most to the text classification decision are minimized, without disclosing the underlying data objects of model 112. In this example, the user can review, in word level heat map 236, the visualization of the scores of particular words within text 222 that contributed to the label classification and effectively determine which words or terms are more relevant to each label of the test phrase, and whether that relationship is correct or reasonable, in order to determine which words require additional training. Additionally, in this example, the user can review the visualization of the score ranking of words related to particular labels in the top-k word heat maps 242 over multiple test phrases to determine whether there are words contributing to the score of a particular label that need to be adjusted.
In one embodiment, the text classifier 102 represents a linear classifier with arbitrary features, such as, but not limited to, a linear Support Vector Machine (SVM), logistic regression, or a perceptron. In another embodiment, the text classifier 102 may implement a more complex model, such as a deep learning model; however, according to an advantage of the present invention, the functionality of the anomaly visualization service does not require the more complex model environment of a deep learning model, and it is applicable for detecting the multiple weights applied by a linear classifier to different symbols in a text string. In addition, in one embodiment, the text classifier 102 represents a linear classifier that determines a score based on the sum of the separately weighted scores of the extracted features, and the word analysis component 232 is described with respect to directly decomposing the extracted feature scores that determine the final label prediction to describe how each word or phrase in the text affects the final label output; however, in additional or alternative embodiments, the model 112 may also learn additional attention variables that are generated as auxiliary data that may or may not affect the final label prediction score.
FIG. 3 illustrates a block diagram of one example of a word analysis element evaluated by the word analysis component at the text classifier level.
In this example, as shown by reference numeral 302, all weights are fixed for the trained text classifier model. In one example, in response to text phrase M having three words u1, u2, and u3 (as shown at reference numeral 304), text classifier 102 classifies text phrase M with predicted label X (as shown at reference numeral 322). In this example, the words u1, u2, and u3 may each represent a single word or a phrase having multiple words. In one example, each of the words u1, u2, and u3 may be referred to as a symbol.
In this example, to determine the label score 310 for predicted label X, the text classifier 102 sums the weighted scores of each extracted feature. For example, label score X 310 is the sum of the product of extracted feature 312 and weight 314, the product of extracted feature 316 and weight 318, and bias 320. In one example, the text classifier 102 may extract the same number of features as the number of words in the test phrase, or may extract fewer or more features than the number of words in the test phrase.
In this example, word analysis component 232 decomposes the extracted feature products used to calculate label score X 310 to determine a feature score by word that, together with bias 330, totals label score X 310, shown by feature score by word (u1) 326, feature score by word (u2) 327, and feature score by word (u3) 328. For example, in decomposing the extracted feature products, word analysis component 232 can accumulate, for each word, the products f_k(U) w_{IK} of all extracted features to which that word contributes, so that summing the per-word scores and the bias recovers the original classification score S_I(U), where I represents an intent label, k represents an extracted feature index, w represents a feature weight, and u represents a contributing symbol of each feature. For a multi-symbol feature, the score is averaged across the contributing symbols.
In this example, the word analysis component 232 selects a heat map value for each feature score by word, as shown at reference numeral 332. For example, the word analysis component 232 assigns heat map value A 344 to feature score by word (u1) 326, heat map value B 346 to feature score by word (u2) 327, and heat map value C 348 to feature score by word (u3) 328. In this example, word analysis component 232 outputs a classification (as shown at reference numeral 350) having label X and heat map values A, B, and C, where the sequential order of the heat map values in the classification corresponds to the order of words u1, u2, and u3 in test phrase M.
In this example, for each test phrase in test set 220, the word analysis component 232 updates the record for label X in word level scores by intent 234, as shown by label X sum 360. In this example, label X sum 360 includes a total score for each word contributing to label X across all test phrases of test set 220 for which label X was the predicted intent, including total score 364 for word U1 362, total score 368 for word U2 366, and total score 372 for word U3 370. In this example, word level scores by intent 234 include a record for each intent label predicted for the test phrases in test set 220.
In this example, based on label X sum 360 over multiple test phrases in test set 220, as shown at reference numeral 380, the word analysis component 232 ranks the top-k words of label X by their total scores from one or more test phrases. Next, as shown at reference numeral 382, the word analysis component 232 assigns a heat map value to each of the top-k words based on its total, and as shown at reference numeral 384, the word analysis component 232 outputs the list of top-k words with heat map values.
FIG. 4 shows one example of a table illustrating examples of types of extracted features that are decomposed for use in determining feature scores by word.
In one example, the text classifier 102 may support multiple types of feature extraction among any type of features that may be decomposed into words. In one example, the text classifier 102 supports word-level features such as unigrams and part-of-speech (POS) features. In another example, the text classifier 102 supports term-based features, such as entity-based features, concept-based features, bigram features, and trigram features. In another example, the text classifier 102 supports letter-level n-gram features. In addition, the text classifier 102 supports max (or average) pooling of word embedding features, or pre-trained CNN or biLSTM features.
In this example, table 402 shows an example of a feature type extracted from a text string, a symbol applied to the feature type, and an example of a score determined for the symbol. For example, table 402 includes: a column identifying feature type 410, a column identifying contribution symbol 412, and a column identifying score S (U) 414.
In the first example of table 402, for the feature type of the unigram 420, the identified contributing symbol is "I" 422, and the score assigned is "0.4" 424. In one example, the per-word feature score S'_I(u_N) from the feature f_k(U) for the feature type of the unigram 420 may be decomposed according to:
S'_I(u_N) = S'_I(u_N) + f_k(U) w_{IK}
In the second example of table 402, for the feature type of the bigram 430, the contributing symbol identified is "I am" and the score assigned is "0.4" 434, which is the same as the score assigned to the unigram symbol "I" 422. In one example, the per-word feature score S'_I(u_N) from the feature f_k(U) for the feature type of the bigram 430, and for any word-based multi-word feature, may be calculated according to:
S'_I(u_N) = S'_I(u_N) + f_k(U) w_{IK} / |L|
In one example, |L| is the length of the feature in words, so the score of the feature is split evenly across those words. For example, the length of "I am" is "2", so the feature product score for the extracted feature "I am" is split in half between "I" and "am".
In the third example of table 402, for the feature type of part-of-speech POS-PP 440, the contributing symbol identified is "from" and the score assigned is "0.5" 444, which is a higher score than the scores assigned to the symbols in the first and second examples. In one example, the per-word feature score S'_I(u_N) from the feature f_k(U) for the feature type of the part-of-speech prepositional phrase (POS-PP) 440 may be determined by using a POS tagger to label the POS tag for each word and then treating the particular POS tag as a feature contributed by that particular word.
In the fourth example of table 402, for the feature type of entity 450, the contributing symbol identified is "city name A" 452, where "city name A" may identify a particular city name, and the score assigned is "0.7" 454, which is a higher score than the scores assigned to the previous symbols. In one example, the per-word feature score S'_I(u_N) from the feature f_k(U) for the entity 450 feature type, as well as for any other entity-based or concept-based multi-word feature, may be calculated according to:
S'_I(u_N) = S'_I(u_N) + f_k(U) w_{IK} / |L|
In the fifth example of table 402, for the feature type or dimension of the average word vector 460, the contributing symbol identifier is "avg-w2v-I" 462, which represents the average vector of all word vectors for the words in the sentence, where the average vector has a numerical value. For example, for deep learning, a set of word vectors for vocabulary words may be pre-trained on a large corpus (e.g., a Wiki corpus) and used as a fixed input vector for each vocabulary word. In this example, the score assigned is "-0.27" 464, which is a lower score than the scores assigned to the previous symbols. In one example, the per-word feature score S'_I(u_N) from the feature f_k(U) for the average word vector 460 feature type may be calculated from average pooling of the word embedding features of all u_N, where the score of the feature is proportionally assigned back to each word in the sequence according to the value of each word in the embedding dimension. The average of the word vectors for each word in the sentence is then used to obtain this type of sentence-level feature.
In the sixth example in table 402, for the maximum word vector 470 feature type, the contributing symbol identifier is "max-w2v-I" 472 and the score assigned is "0.45" 474. In one example, the per-word feature score S'_I(u_N) for a feature f_k(U) from the maximum word vector 470 feature type may be calculated from max pooling of the word embedding features of all u_N, where the score for that feature is assigned back to only the one word u_N that has the maximum value in the embedding dimension.
In the seventh example in table 402, for a feature type of character/letter-level feature, such as the letter trigram 480, the contributing symbol identified is "from" 482, and the score assigned is "0.4" 484. In this example, the word u_N "this" has two letter-trigram features "thi" and "his", where each feature includes three sequential characters from the word u_N. In one example, the per-word feature score S'_I(u_N) from the feature f_k(U) for the letter trigram 480 feature type may be calculated according to:
S'_I(u_N) = S'_I(u_N) + f_k(tri_1) w_{IK} + f_{k'}(tri_2) w_{IK'}
In this example, k and k' may represent the features of the first and second letter trigrams, respectively.
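The per-feature-type split rules summarized in table 402 can be pictured with the minimal Python sketch below; the feature type names, the argument layout, and the even split used for average pooling (which the description above instead assigns proportionally by embedding value) are simplifying assumptions for illustration.
from typing import Dict, List

def add_feature_contribution(word_scores: Dict[int, float],
                             feature_type: str,
                             product: float,
                             word_indices: List[int]) -> None:
    # Fold one feature's product f_k(U) * w_IK onto word-level scores using the
    # split rules of table 402 (names and grouping assumed for this sketch).
    if feature_type in ("unigram", "pos", "letter_trigram", "max_word_vector"):
        # A single contributing word (for max pooling, the argmax word) gets the full product.
        word_scores[word_indices[0]] = word_scores.get(word_indices[0], 0.0) + product
    elif feature_type in ("bigram", "trigram", "entity", "concept"):
        # Multi-word features split the product evenly across their |L| words.
        for n in word_indices:
            word_scores[n] = word_scores.get(n, 0.0) + product / len(word_indices)
    elif feature_type == "avg_word_vector":
        # Simplification: split evenly; the description assigns the product proportionally
        # to each word's value in the pooled embedding dimension, which needs the vectors.
        for n in word_indices:
            word_scores[n] = word_scores.get(n, 0.0) + product / len(word_indices)
    else:
        raise ValueError("unknown feature type: " + feature_type)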
FIG. 5 illustrates a block diagram of one example of a word level heat map that reflects a ground truth heat map compared with a test set heat map based on test phrases tested on a trained model.
In one example, word level heat map 236 is shown for selected test phrases from test set 220 classified by model 112. In this example, for purposes of illustration, FIG. 5 reflects the results of testing three test phrases included in test set 220. In additional or alternative examples, the test set 220 may include a larger number of test phrases.
In the first example of FIG. 5, the same test phrase "how are you going to help me?" shown under text 516 is visualized in word level heat map 236 for training ground truth 504 and test set prediction 506. In this example, for the same test phrase, the intent label 510 is identified as "capability" 512 under training ground truth 504 and as "greeting" 514 under test set prediction 506. In this example, for the "how are you going to help me?" test phrase, test set prediction 506 indicates the label currently predicted by text classifier 102. For example, the word analysis component 232 determines the classification label "capability" and heat map values for the words in text 516 and outputs the labels and heat map values in classification 224.
Word level heat map controller 234 visually identifies (e.g., by a percentage color level) the percentage probability of each word symbol identified in text 516 based on the heat map values returned in classification 224. For purposes of illustration, in this example, the color percentages shown for color 518 are illustrated by a color intensity scale of 0 to 5, each number in the scale reflecting a different chromaticity or different color that may be applied to each symbolized portion of the text phrase. In this example, "0" in color 518 may reflect no chromaticity and "5" in color level 518 may reflect 100% chromaticity.
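One way to picture this mapping is the minimal Python sketch below, which buckets word-level heat map scores onto a 0-to-5 intensity scale; the linear bucketing against the maximum score and the function name are illustrative assumptions rather than the controller's actual implementation.
from typing import List, Tuple

def to_intensity_scale(heat_map: List[Tuple[str, float]], levels: int = 5) -> List[Tuple[str, int]]:
    # Map word-level heat map scores onto a 0..levels color intensity scale,
    # where 0 reflects no chromaticity and `levels` reflects 100% chromaticity.
    scores = [score for _, score in heat_map]
    peak = max(scores) if scores and max(scores) > 0 else 1.0
    return [(word, round(max(score, 0.0) / peak * levels)) for word, score in heat_map]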
For example, for text 516, the ground truth intent label "capability" 512 is shown as visually affected by the words "you" and "help" reflecting the highest intensity "4", where the words "you" and "help" indicate capability more than greeting. In contrast, the predicted intent "greeting" 514 is shown as visually affected by the words "are you" reflecting the highest chroma "5" and the preceding word "how" reflecting the next highest chroma "3", where the word "how" indicates a greeting more than a capability. In this example, by visually displaying the symbol scores as heat maps for the training ground truth and the test set prediction, the user can visually understand that the current system gives more preference to the words "are you" than to "help me". In this example, the symbol "help me" intuitively relates to how customer service solves the requestor's problem, and thus to the intent "capability", rather than to the customer choosing to greet the customer service system. In this example, by visually displaying the symbol scores as a heat map for the prediction, the user may choose to adjust training dataset 238 to include additional training for the phrase "how are you going to help me?" to improve the symbol score of "help" and of other semantically corresponding words (e.g., "helped" and "helps") appearing in the same text phrase as "are you" for the intent "capability". Also, in this example, for the anomaly in testing "how are you going to help me?", the user can selectively adjust training dataset 238 to reduce the appearance of the phrase "are you" when it appears with "help" in the mispredicted intent "greeting" and to increase its appearance in training for the ground truth intent "capability".
In the second example of FIG. 5, the same test phrase "I am feeling good thanks", shown under text 546, is visualized in word-level heat map 236 for training truth 504 and test set prediction 506. In this example, for the same test phrase, the intent label 540 is identified as "greeting" 542 under training truth 504 and as "thanks" 544 under test set prediction 506. In this example, word-level heat map controller 234 visually identifies (e.g., by a percentage color level) the percentage probability of each word token identified in text 546 based on the heat map values returned in classification 224. For example, for text 546, the ground truth intent label "greeting" 542 is shown visually affected by the words "feeling" and "good", which reflect the intensities "3" and "4", where the words "feeling" and "good" indicate a greeting more than thanks. In contrast, the predicted intent "thanks" 544 is shown visually affected by the word "thanks", which reflects the highest intensity "5", where the word "thanks" indicates thanks more than a greeting. In this example, by visually displaying the token scores as heat maps for both the training truth and the test set prediction, the user can visually understand that the current system gives more preference to the word "thanks" than to "feeling good". In this example, the tokens "feeling good" intuitively relate to how the customer greets customer service, and thus to the intent "greeting", rather than to the customer choosing to thank the customer service system. In this example, by visually displaying the token scores as a heat map for the prediction, the user may choose to adjust training dataset 238 to include additional training for the phrase "I am feeling good thanks" to improve the token scores of "feeling" and "good", and of other words semantically corresponding to "feeling" and "good" appearing in the same text phrase as "thanks", for the intent "greeting". Additionally, in this example, for the anomaly in testing "I am feeling good thanks", the user can selectively adjust training dataset 238 to reduce the occurrence of the word "thanks" when it appears with "feeling" and "good" under the mispredicted intent "thanks" and to increase its occurrence in training for the ground truth intent "greeting".
In the third example of FIG. 5, the same test phrase "dial the home number", shown under text 576, is visualized in word level heat map 236 for training truth 504 and test set prediction 506. In this example, for the same test phrase, the intent label 570 is identified as "phone" 572 under training truth 504 and as "place" 574 under test set prediction 506. In this example, word level heat map controller 234 visually identifies (e.g., by a percentage color level) the percentage probability of each word token identified in text 576 based on the heat map values returned in classification 224. For example, for text 576, the ground truth intent label "phone" 572 is shown visually affected by the words "dial" and "number", which reflect the intensities "4" and "3", where the words "dial" and "number" indicate a phone command more than a place command. In contrast, the predicted intent "place" 574 is shown visually affected by the word "home", which reflects the highest intensity "5", where the word "home" indicates a place command more than a phone command. In this example, by visually displaying the token scores as heat maps for both the training truth and the test set prediction, the user can visually understand that the current system gives more preference to the word "home" than to "dial" and "number". In this example, the tokens "dial" and "number" intuitively relate to how the customer requests phone-related services, and thus to the intent "phone", rather than to the customer selecting a place. In this example, by visually displaying the token scores as a heat map for the prediction, the user may choose to adjust training dataset 238 to include additional training for the phrase "dial the home number" to increase the token scores of "dial" and "number" in the same text phrase as "home" for the intent "phone". Additionally, in this example, for the anomaly in testing "dial the home number", the user can selectively adjust training dataset 238 to reduce the occurrence of the word "home" when it appears with "dial" and "number" under the mispredicted intent "place" and to increase its occurrence in training for the ground truth intent "phone".
FIG. 6 illustrates a block diagram of one example of a word level heat map reflecting the top-k important words for the labels of test phrases tested on a trained model.
In one example, training set 602 reflects the current training data for training model 112 for the intent "turn on". For example, training set 602 includes the phrases "I need more lights," "you can turn on the radio," "click on my lock," "turn on the headlight," "turn on my wiper," "turn on the light," "lock my door," "close my door," "play music," "play some music," "turn on the radio now," "turn on my backup camera," "turn on my lights for me," "turn on my windshield wiper," and "turn on A/C." In this example, top-k important words 610 shows a list of the words reflected in training set 602, ordered by their importance in predicting the intent "turn on". In this example, the top-k important words 610 are shown in order of importance, with the phrase "turn on" listed first as most important and the word "camera" listed last as least important. In this example, the ordering of the top-k important words 610 is determined by the word analysis component 232 detecting word level scores per intent 234 while testing text classifier 102 against test set 220. In particular, the word analysis component 232 can aggregate the scores calculated for the heat map values of each word under each intent in the word level scores per intent 234 and then determine the k largest aggregate heat map values. In another example, the words in top-k important words 610 may be colored to visually reflect importance or aggregate heat map value, with the most important words having the highest percentage of chromaticity and the least important words having the lowest percentage of chromaticity.
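As a sketch of the aggregation just described, the following example sums per-intent word level scores and selects the k largest aggregate values. The (intent, word, score) record format, the function name, and the numeric scores are assumptions made for illustration; the embodiment does not prescribe a particular data layout.

from collections import defaultdict

def top_k_important_words(word_level_scores, intent, k=10):
    # word_level_scores: iterable of (intent, word, score) records gathered
    # while testing the classifier against the test set.
    totals = defaultdict(float)
    for scored_intent, word, score in word_level_scores:
        if scored_intent == intent:
            totals[word] += score          # aggregate heat map value per word
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]                      # descending order of importance

# Hypothetical records for the intent "turn on"; note the unexpectedly high
# aggregate value for "door", the kind of outlier discussed below.
records = [("turn on", "turn on", 4.2), ("turn on", "lights", 2.9),
           ("turn on", "door", 2.6), ("turn on", "radio", 1.8),
           ("turn on", "camera", 0.7)]
print(top_k_important_words(records, "turn on", k=3))
# [('turn on', 4.2), ('lights', 2.9), ('door', 2.6)]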
In this example, the word "door" 612 may reflect the outlier, with respect to intent "open", "door" ranked higher than expected, because training set 602 includes the phrases "lock my door" and "close my door" as training data for intent classification "open", as shown by reference numeral 604. In this example, a user looking at k preferred important words 610 may look at the word "door" as being more important than expected and adjust the training set 602 by reducing the occurrence of the outlier "door". By reducing occurrences of words in training data set 238 that are labeled as outliers in k preferred significant words 610 and selecting to text classifier 102 with training data set 238 as updated, a user may mitigate potential prediction errors prior to deploying the trained classifier model.
In particular, in this example, in the word-level heat map 236 of FIG. 5 the user receives a visual assessment of the words in a test phrase that contribute most and least to a label prediction, in order to quickly identify the problem words in a particular test phrase that caused a label prediction anomaly, while in the top-k word heat map 242 of FIG. 6 the user receives a visual assessment of the semantically related words in the training corpus that are likely to cause a particular label prediction for the test set, in order to quickly identify the problem words trained for a particular label.
FIG. 7 illustrates a block diagram of one example of a computer system in which an embodiment of the invention may be implemented. The invention may be implemented in various systems and combinations of systems that are comprised of functional components (e.g., the functional components described with reference to computer system 700) and that are communicatively coupled to a network (e.g., network 702).
Computer system 700 includes a bus 722 or other communication device for communicating information within computer system 700, and at least one hardware processing device, such as processor 712, coupled to bus 722 for processing information. Bus 722 preferably includes low-latency and high-latency paths that are connected by bridges and adapters and controlled within computer system 700 by multiple bus controllers. When implemented as a server or node, computer system 700 may include multiple processors designed to improve network servicing capacity.
Processor 712 may be at least one general-purpose processor that processes data under control of software 750 during normal operation, which may include at least one of: application software, an operating system, middleware, and other code and computer-executable programs that are accessible from a dynamic storage device (e.g., random Access Memory (RAM) 714), a static storage device (e.g., read Only Memory (ROM) 716), a data storage device (e.g., mass storage device 718), or other data storage medium. Software 750 may include, but is not limited to: code, applications, protocols, interfaces, and processes for controlling one or more systems within a network, including, but not limited to: adapters, switches, servers, cluster systems, and grid environments.
The computer system 700 may communicate with a remote computer (e.g., server 740) or a remote client. In one example, server 740 may be connected to computer system 700 via any type of network (e.g., network 702) through a communication interface (e.g., network interface 732) or over a network link connectable to, for example, network 702.
In this example, multiple systems within a network environment may be communicatively connected via a network 702, the network 702 being a medium for providing communication links between various devices and computer systems that are communicatively connected. For example, network 702 may include permanent connections (e.g., wire or fiber optic cables) and temporary connections made through telephone connections and wireless transmission connections, and may include routers, switches, gateways, and other hardware to enable communication channels between systems connected via network 702. Network 702 may represent one or more of the following: packet-switched based networks, telephony-based networks, broadcast television networks, local area and wide area networks, public networks, and restricted networks.
The network 702 and the system communicatively connected to the computer 700 via the network 702 may implement one or more layers of one or more types of network protocol stacks, which may include one or more of a physical layer, a link layer, a network layer, a transport layer, a presentation layer, and an application layer. For example, the network 702 may implement one or more of the following: a transmission control protocol/internet protocol (TCP/IP) protocol stack, or an Open System Interconnection (OSI) protocol stack. In addition, for example, network 702 may represent a worldwide collection of networks and gateways that use the TCP/IP suite of protocols to communicate with one another. The network 702 may implement a secure HTTP protocol layer or other secure protocol for secure communications between systems.
In this example, network interface 732 includes an adapter 734 for connecting computer system 700 to network 702 through a link and for communicatively connecting computer system 700 to server 740 or other computing system via network 702. Although not depicted, network interface 732 may include additional software (e.g., device drivers), additional hardware, and other controllers capable of communication. When implemented as a server, computer system 700 may include multiple communication interfaces accessible via, for example, multiple Peripheral Component Interconnect (PCI) bus bridges connected to an input/output controller. In this manner, computer system 700 allows connections to multiple clients via multiple separate ports, and each port may also support multiple connections to multiple clients.
In one embodiment, the operations performed by the processor 712 may control the operations of the flowcharts of FIGS. 8-13 and other operations described herein. The operations performed by processor 712 may be requested by software 750 or other code, or the steps of one embodiment of the invention may be performed by specific hardware components that contain hardwired logic for performing the steps, or by a combination of programmed computer components and custom hardware components. In one embodiment, one or more components of computer system 700 or other components that may be integrated into one or more components of computer system 700 may include hardwired logic for performing the operations of the flowcharts in fig. 8-13.
In addition, computer system 700 may include a number of peripheral components that facilitate input and output. These peripheral components are connected to a plurality of controllers, adapters, and expansion slots coupled to one of the multiple levels of bus 722, such as input/output (I/O) interface 726. For example, input devices 724 may include a microphone, a camera device, an image scanning system, a keyboard, a mouse, or other input peripheral devices communicatively enabled on bus 722 for controlling inputs, e.g., via I/O interface 726. In addition, for example, output devices 720 communicatively enabled on bus 722 via I/O interface 726 for controlling output may include, for example, one or more graphical display devices, audio speakers, and a tactile detectable output interface, but may also include other output interfaces. In alternative embodiments of the present invention, additional or alternative input and output peripheral components may be added.
With respect to FIG. 7, the present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: portable computer disks, hard disks, Random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), Static Random Access Memory (SRAM), portable compact disk read-only memory (CD-ROM), Digital Versatile Disks (DVD), memory sticks, floppy disks, and mechanically encoded devices such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., connected through the internet using an internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of the computer readable program instructions, and the electronic circuitry can then execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those of ordinary skill in the art will appreciate that the hardware depicted in FIG. 7 may vary. Furthermore, those skilled in the art will appreciate that the depicted example is not meant to imply architectural limitations with respect to the present invention.
FIG. 8 depicts a high-level logic flowchart of a process and computer program for creating and training a classifier model. In one example, the process and computer program begin at block 800 and then proceed to block 802. Block 802 illustrates a determination of whether a request to create a trained model is received from a client. At block 802, if a request to create a trained model is received from a client, the process proceeds to block 804. Block 804 illustrates a determination of whether a user selected dataset is received. At block 804, if a user selected data set is received, the process passes to block 808. At block 804, if the user selected data set is not received, the process passes to block 806. Block 806 shows selecting a default training set of data and the process proceeds to block 808.
Block 808 shows applying the selected training set of data to the model as a ground truth training set to create a trained model. Next, block 810 shows returning the trained model indicator to the client, and the process ends.
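As one non-limiting sketch of blocks 802 through 810, the example below creates a trained intent classifier from a selected or default training set. The use of scikit-learn, TF-IDF features, and logistic regression is an assumption for illustration only; the embodiment does not prescribe a particular model type, and the default phrases and labels are hypothetical.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def create_trained_model(training_phrases=None, training_labels=None):
    if training_phrases is None:                       # block 806: default training set
        training_phrases = ["turn on the light",
                            "dial the home number",
                            "how are you going to help me"]
        training_labels = ["turn on", "phone", "ability"]
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(training_phrases, training_labels)       # block 808: train on the ground truth set
    return model                                       # block 810: return trained model to client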
FIG. 9 depicts a high-level logic flowchart of a process and computer program for updating a trained classifier model. In one example, the process and computer program begin at block 900 and then proceed to block 902. Block 902 illustrates a determination of whether an updated training set of data is received from a client for a trained model. At block 902, if an updated training set of data is received from the client for the trained model, the process proceeds to block 904. Block 904 illustrates updating the training of the classifier model with the updated training set of data. Next, block 906 shows returning the trained model indicator to the client, and the process ends.
FIG. 10 depicts a high-level logic flowchart of a process and computer program for analyzing a predicted classification to determine word-level heat map values that indicate the word-level contributions to the predicted classification label of a test phrase classified by a trained model.
In one example, the process and computer program begin at block 1000 and then proceed to block 1002. Block 1002 illustrates a determination of whether a test set is received from a client for testing a trained model. At block 1002, if a test set is received from the client for testing the trained model, the process proceeds to block 1004. Block 1004 illustrates running the test set on the trained model. Next, block 1006 illustrates identifying the predicted classification label and score for each test set phrase in the test set. Thereafter, block 1008 illustrates decomposing the extracted features aggregated in the label score into word-level scores for each word in each test set phrase. Next, block 1010 illustrates assigning a heat map value to each word-level score for each word in each test set phrase. Thereafter, block 1012 illustrates storing the assigned heat map values by test set phrase and label. Next, block 1014 illustrates aggregating the per-word level scores for each label predicted for the test set. Thereafter, block 1016 illustrates identifying the top-k words for each label in descending order based on the aggregate per-word level score for each label. Next, block 1018 shows assigning heat map values based on the word level scores to the top-k words in each label list. Thereafter, block 1020 illustrates returning, to the client, the predicted classification labels and corresponding heat map values ordered by test set phrase, along with the top-k words with heat map values for each predicted classification label, and the process ends.
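For a linear bag-of-words classifier, whose label score is a weighted sum of extracted features, blocks 1004 through 1010 can be sketched as below: each word's contribution is the product of its feature value and the weight learned for the predicted label, and normalizing those contributions yields the heat map values. The sketch assumes the hypothetical pipeline from the FIG. 8 sketch with three or more intents; for other model architectures the decomposition would differ, so this is an illustration of the idea rather than the embodiment's implementation.

import numpy as np

def word_level_heat_map(pipeline, phrase):
    # Return (predicted label, {word: heat map value}) for one test phrase.
    vectorizer = pipeline.named_steps["tfidfvectorizer"]
    classifier = pipeline.named_steps["logisticregression"]
    features = vectorizer.transform([phrase])
    label_index = int(np.argmax(classifier.predict_proba(features)[0]))
    label = classifier.classes_[label_index]             # block 1006: predicted label
    vocabulary = vectorizer.vocabulary_
    contributions = {}
    for word in phrase.lower().split():                  # block 1008: per-word scores
        column = vocabulary.get(word)
        if column is not None:
            contributions[word] = features[0, column] * classifier.coef_[label_index, column]
    peak = max((abs(v) for v in contributions.values()), default=1.0) or 1.0
    heat_map = {word: value / peak for word, value in contributions.items()}
    return label, heat_map                               # block 1010: normalized heat map values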
FIG. 11 depicts a high-level logic flowchart of a process and computer program for outputting a predicted classification with word-level visual indicators, based on the corresponding word-level heat map values, that identify the words having the most impact on the predicted classification label.
In one example, the process and computer program begin at block 1100 and then proceed to block 1102. Block 1102 illustrates a determination of whether predicted classification labels and word level heat map values per test set phrase have been received from the text classifier. At block 1102, if predicted classification labels and word level heat map values per test set phrase are received from the text classifier, the process proceeds to block 1104. Block 1104 illustrates aligning the classification labels and heat map values ordered by test set phrase with the corresponding test set phrases in the submitted test set. Next, block 1106 illustrates accessing, if available, the ground truth heat map value evaluations and the expected classification labels associated with each test set phrase in the submitted test set. Thereafter, block 1108 illustrates identifying a selection of the submitted test set phrases with a returned classification label that does not match the expected label of the test set phrase, which indicates an anomaly. Next, block 1110 illustrates outputting, in a user interface, a graphical representation of the selection of submitted test phrases with the returned classification labels and word-level visual indicators based on the respective word level heat map values, as compared with word-level visual indicators based on any respective ground truth heat map values and ground truth classification labels, and the process ends.
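A minimal sketch of blocks 1104 through 1108 follows: the returned labels and heat maps are aligned with the submitted test set, and any test phrase whose returned label differs from its expected (ground truth) label is flagged as an anomaly. The data shapes, function name, and sample values are assumptions for illustration only.

def find_anomalies(test_set, predictions):
    # test_set: list of (phrase, expected_label);
    # predictions: list of (returned_label, heat_map) aligned with test_set.
    anomalies = []
    for (phrase, expected), (predicted, heat_map) in zip(test_set, predictions):
        if predicted != expected:                     # block 1108: mismatch indicates an anomaly
            anomalies.append({"phrase": phrase,
                              "expected": expected,
                              "predicted": predicted,
                              "heat_map": heat_map})  # drives the word-level visual indicators
    return anomalies

test_set = [("how are you going to help me", "ability"),
            ("dial the home number", "phone")]
predictions = [("greeting", {"are you": 1.0, "how": 0.6}),
               ("phone", {"dial": 0.8, "number": 0.6})]
print(find_anomalies(test_set, predictions))          # only the first phrase is flagged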
FIG. 12 depicts a high-level logic flowchart of a process and computer program for outputting a predicted classification with visual indicators that identify, from a top-k word list for each classification label, the words having the most impact on the predicted classification according to their respective top-k heat map values.
In one example, the process and computer program begin at block 1200 and then proceed to block 1202. Block 1202 illustrates a determination of whether one or more predicted top-k word lists with top-k heat map values, by classification, are received from the text classifier. At block 1202, if one or more predicted top-k word lists with top-k heat map values, by classification, are received from the text classifier, the process proceeds to block 1204. Block 1204 illustrates identifying the training set classification label corresponding to each top-k word list and the heat map values for the classification label. Next, block 1206 depicts a determination of whether top-k word lists with word level heat map values per test set phrase are received.
At block 1206, if top-k word lists with word level heat map values per test set phrase are received, the process passes to block 1208. Block 1208 illustrates identifying a selection of submitted test set phrases with returned classification labels that do not match the ground truth classification labels, and the corresponding selection of ground truth classification labels and returned classification labels. Next, block 1210 illustrates outputting, in a user interface, a graphical representation of the selected top-k word lists with visual indicators based on each respective heat map value for the selected ground truth classification labels and returned classification labels, and the process ends.
Returning to block 1206, if no top-k word lists with word level heat map values per test set phrase are received, the process passes to block 1212. Block 1212 illustrates outputting, in a user interface, a graphical representation of the one or more top-k word lists with visual indicators based on each respective heat map value, and the process ends.
FIG. 13 depicts a high-level logic flowchart of a process and computer program for supporting update training of a text classifier that highlights the classification label training associated with identified anomalies.
In one example, the process and computer program begin at block 1300 and then proceed to block 1302. Block 1302 illustrates displaying an editable training set in a user interface for additional training of a trained model. Next, block 1304 depicts visually highlighting, within the editable training set, one or more classification label pairs identified as the ground truth classification label and the predicted label for an identified anomaly. Thereafter, block 1306 illustrates determining whether the user has selected to edit the training set and send the training set to the text classifier. At block 1306, if the user chooses to edit the training set and send the training set to the text classifier, the process proceeds to block 1308. Block 1308 shows sending the training set to the text classifier with a request to update the training of the text classifier with the training set, and the process ends.
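As a sketch of blocks 1302 through 1308, the example below marks the training rows whose label appears in an anomalous ground truth and predicted label pair so that a user interface can highlight them, and then packages the edited set as an update request. The request format, field names, and helper functions are hypothetical; the embodiment does not define a specific API.

import json

def annotate_training_set(training_rows, anomaly_label_pairs):
    # training_rows: list of {"phrase": ..., "label": ...} entries;
    # anomaly_label_pairs: set of (ground_truth_label, predicted_label) tuples.
    flagged_labels = {label for pair in anomaly_label_pairs for label in pair}
    for row in training_rows:
        row["highlight"] = row["label"] in flagged_labels    # block 1304: mark rows to highlight
    return training_rows

def build_update_request(edited_training_rows):
    # block 1308: hypothetical payload asking the text classifier to update its training.
    return json.dumps({"action": "update_training",
                       "training_set": edited_training_rows})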
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of one or more embodiments of the invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
While the invention has been particularly shown and described with reference to one or more embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (17)

1. A method comprising the steps of:
running, by a computer system, at least one test phrase on a pre-trained text classifier, wherein the test phrase comprises one or more words;
identifying, by the computer system, an individual predictive classification label based on an intent score calculated by the text classifier for the test phrase;
decomposing, by the computer system, a plurality of extracted features aggregated in the score into a plurality of word-level scores for each word in the test phrase;
assigning, by the computer system, a separate heat map value to each word-level score of the plurality of word-level scores, each respective separate heat map value reflecting a weight of each word-level score of the plurality of word-level scores; and
outputting, by the computer system, the individual predictive classification label and each separate heat map value reflecting the weight of each word-level score of the plurality of word-level scores to a client for defining a heat map identifying the contribution of each word in the test phrase to the individual predictive classification label, wherein the client determines whether the individual predictive classification label matches an expected classification label in order for the client to evaluate text classification anomalies.
2. The method of claim 1, further comprising the step of:
in response to running the test phrase, aggregating, by the computer system, the plurality of word-level scores for each individual predictive classification label of a plurality of classification labels;
identifying, by the computer system, a list of preferred words of the plurality of words in descending order from a highest per-word aggregated score for each individual predictive classification label; and
outputting, by the computer system, the individual predictive classification labels, each separate heat map value, and the list of preferred words for each respective individual predictive classification label.
3. The method of claim 1, further comprising the step of:
calculating, by the computer system, the score for the individual predictive classification label based on a weighted sum of a plurality of combinations of individual extracted features of the plurality of extracted features and weighted model parameters fixed in the pre-trained text classifier.
4. The method of claim 1, wherein decomposing, by the computer system, the plurality of extracted features aggregated in the score into a plurality of word-level scores for each word in the test phrase further comprises:
decomposing, by the computer system, the plurality of extracted features including one or more of: term-based features, an average pooling of word embedding features, a maximum pooling of word embedding features, and character-level features.
5. The method of claim 1, further comprising the step of:
initiating, by the computer system, a text classifier model;
training, by the computer system, the text classifier model by applying a training set having a plurality of training phrases;
deploying, by the computer system, the text classifier model as the pre-trained text classifier for client testing; and
in response to receiving the at least one test phrase from the client, running, by the computer system, the at least one test phrase on the pre-trained text classifier.
6. The method of claim 1, wherein the client outputs each individual heat map value in a user interface for graphically representing the weight of each word-level score to identify a contribution of each word of the at least one test phrase to the individual predictive classification label.
7. A computer system comprising one or more processors, one or more computer-readable memories, one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, the stored program instructions comprising:
Program code operable to perform the steps of the method according to any one of claims 1 to 6.
8. A computer program product comprising a computer readable storage medium having program instructions embodied therein, the program instructions being executable by a computer to cause the computer to perform the steps in the method according to any one of claims 1 to 6.
9. A computer system comprising means for performing the steps of the method according to any one of claims 1 to 6.
10. A method comprising the steps of:
submitting, by a computer system, a plurality of test phrases to a text classifier, each test phrase comprising a plurality of words, the text classifier trained to predict a label for each submitted test phrase, the text classifier trained to calculate a word-level score for the label based on extracted features of each respective word within each submitted test phrase and to assign a heat map value to reflect a relative weight of each word-level score;
receiving, by the computer system, a plurality of classification labels from the text classifier, each classification label including one or more respective heat map values, each heat map value associated with a separate word in a separate test phrase of the plurality of test phrases, each heat map value of the one or more respective heat map values reflecting a respective relative weight of a respective word-level score of a respective extracted feature in the separate word;
aligning, by the computer system, each of the plurality of classification labels and the one or more respective heat map values with a respective test phrase of the plurality of test phrases;
accessing, by the computer system, an expected classification label and a separate ground truth heat map value evaluation associated with each respective test phrase of the plurality of test phrases;
identifying, by the computer system, as one or more anomalies, a selection of one or more classification labels of the plurality of classification labels, the one or more classification labels being different from the expected classification label of each respective test phrase of the plurality of test phrases; and
outputting, by the computer system, in a user interface, a graphical representation of the selection, having the one or more respective test phrases and the one or more classification labels with word-level visual indicators based on the one or more respective heat map values, as compared with word-level visual indicators based on the separate ground truth heat map value evaluations and the expected classification labels, to identify a contribution of each word to the respective classification label.
11. The method of claim 10, wherein the step of submitting, by the computer system, the plurality of test phrases to the text classifier further comprises the steps of:
Submitting, by the computer system, a plurality of test phrases to the text classifier trained by applying a training set having the plurality of test phrases.
12. The method of claim 10, wherein the step of receiving, by the computer system, a plurality of classification labels from the text classifier, each classification label comprising one or more heat map values, each heat map value associated with a separate word, further comprises the step of:
receiving, by the computer system, the plurality of classification labels from the text classifier, each classification label comprising the one or more heat map values, each heat map value associated with a separate word, wherein, in response to running at least one test phrase on a pre-trained text classifier and identifying a separate predictive classification label based on a score calculated for each respective at least one test phrase, the text classifier decomposes a plurality of extracted features aggregated in the score into a plurality of word-level scores for each word in the at least one test phrase, and assigns a separate heat map value to each word-level score of the plurality of word-level scores, each respective separate heat map value reflecting a weight of each word-level score of the plurality of word-level scores.
13. The method of claim 10, further comprising the step of:
displaying, by the computer system, an editable training set having one or more training phrases within the user interface; and
visually highlighting, by the computer system, within the editable training set, the selection of one or more classification labels identified as the one or more anomalies.
14. The method of claim 10, further comprising the step of:
receiving, by the computer system, from the text classifier, a list of preferred words of a plurality of words for each individual predictive classification label, wherein the list of preferred words is in descending order from a highest per-word aggregate word level score; and
outputting, by the computer system, in the user interface, each individual predictive classification label, the one or more respective heat map values, and the list of preferred words for each respective individual predictive classification label.
15. A computer system comprising one or more processors, one or more computer-readable memories, one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, the stored program instructions comprising:
program code operable to perform the steps of the method according to any of claims 10 to 14.
16. A computer program product comprising a computer readable storage medium having program instructions embodied therein, the program instructions being executable by a computer to cause the computer to perform the steps in the method according to any one of claims 10 to 14.
17. A computer system comprising means for performing the steps of the method according to any one of claims 10 to 14.
CN202010273725.3A 2019-04-10 2020-04-09 Assessing text classification anomalies predicted by a text classification model Active CN111813928B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US16/380,986 US11068656B2 (en) 2019-04-10 2019-04-10 Displaying text classification anomalies predicted by a text classification model
US16/380986 2019-04-10
US16/380981 2019-04-10
US16/380,981 US11537821B2 (en) 2019-04-10 2019-04-10 Evaluating text classification anomalies predicted by a text classification model

Publications (2)

Publication Number Publication Date
CN111813928A CN111813928A (en) 2020-10-23
CN111813928B true CN111813928B (en) 2024-09-06

Family

ID=72848463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010273725.3A Active CN111813928B (en) 2019-04-10 2020-04-09 Assessing text classification anomalies predicted by a text classification model

Country Status (1)

Country Link
CN (1) CN111813928B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11537821B2 (en) 2019-04-10 2022-12-27 International Business Machines Corporation Evaluating text classification anomalies predicted by a text classification model
US11068656B2 (en) 2019-04-10 2021-07-20 International Business Machines Corporation Displaying text classification anomalies predicted by a text classification model
US12088473B2 (en) 2019-10-23 2024-09-10 Aryaka Networks, Inc. Method, device and system for enhancing predictive classification of anomalous events in a cloud-based application acceleration as a service environment
CN112817497A (en) * 2021-02-09 2021-05-18 北京字节跳动网络技术有限公司 Test question display method and device, terminal equipment and storage medium
CN113673338B (en) * 2021-07-16 2023-09-26 华南理工大学 Method, system and medium for weakly supervised automatic annotation of character pixels in natural scene text images
CN114281987B (en) * 2021-11-26 2024-08-30 联通沃悦读科技文化有限公司 Dialogue short text sentence matching method for intelligent voice assistant

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012116208A2 (en) * 2011-02-23 2012-08-30 New York University Apparatus, method, and computer-accessible medium for explaining classifications of documents
US9582761B2 (en) * 2012-09-21 2017-02-28 Sas Institute Inc. Generating and displaying canonical rule sets with dimensional targets
US20170140240A1 (en) * 2015-07-27 2017-05-18 Salesforce.Com, Inc. Neural network combined image and text evaluator and classifier
US9747926B2 (en) * 2015-10-16 2017-08-29 Google Inc. Hotword recognition
CN107688870B (en) * 2017-08-15 2020-07-24 中国科学院软件研究所 A method and device for visual analysis of hierarchical factors of deep neural network based on text stream input
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy

Also Published As

Publication number Publication date
CN111813928A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
US11537821B2 (en) Evaluating text classification anomalies predicted by a text classification model
US11068656B2 (en) Displaying text classification anomalies predicted by a text classification model
CN111813928B (en) Assessing text classification anomalies predicted by a text classification model
US11238845B2 (en) Multi-dialect and multilingual speech recognition
Serban et al. Multiresolution recurrent neural networks: An application to dialogue response generation
CN113642316B (en) Chinese text error correction method and device, electronic equipment and storage medium
EP4113357A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
TWI684881B (en) Method, system and non-transitory machine-readable medium for generating a conversational agentby automatic paraphrase generation based on machine translation
CN111428042B (en) Entity-level clarification in conversational services
Griol et al. A statistical approach to spoken dialog systems design and evaluation
CN111814487B (en) Semantic understanding method, device, equipment and storage medium
US12475330B2 (en) Method for identifying noise samples, electronic device, and storage medium
US11270082B2 (en) Hybrid natural language understanding
CN110110062A (en) Machine intelligence question answering method, device and electronic equipment
CN111179916A (en) Re-scoring model training method, voice recognition method and related device
CN111611810A (en) Polyphone pronunciation disambiguation device and method
US11580299B2 (en) Corpus cleaning method and corpus entry system
CN111738016A (en) Multi-intention recognition method and related equipment
CN115238045A (en) Method, system and storage medium for extracting generation type event argument
US20200034465A1 (en) Increasing the accuracy of a statement by analyzing the relationships between entities in a knowledge graph
US11682318B2 (en) Methods and systems for assisting pronunciation correction
KR20220042103A (en) Method and Apparatus for Providing Hybrid Intelligent Customer Consultation
Paranjape et al. Voice-Based Smart Assistant System for Vehicles Using RASA
Belz et al. Standard quality criteria derived from current NLP evaluations for guiding evaluation design and grounding comparability and AI compliance assessments
CN114911907B (en) Method, device and storage medium for predicting writing content using machine

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant