US11657271B2 - Game-theoretic frameworks for deep neural network rationalization - Google Patents
- Publication number
- US11657271B2 (application US16/658,122; US201916658122A)
- Authority
- US
- United States
- Prior art keywords
- network
- class
- data
- generator
- output label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires 2041-11-28
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G06N3/0454—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G06K9/6267—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G06N3/0445—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Definitions
- the present disclosure generally relates to natural language processing (NLP), and more particularly, to training NLP applications.
- neural architectures can be designed that expose more intricate mechanisms of reasoning such as module networks. While salient, such approaches may still involve adopting specialized designs and architectural choices that do not yet reach accuracies comparable to Blackbox approaches.
- limited architectural constraints may be applied in the form of selective rationalization where the goal is to only expose the portion of the text relevant for prediction.
- a computing device for implementing a class-wise adversarial rationalization (CAR) system for determining an output label rationale.
- a first generator network receives a first class of data and selects one or more input features from the first class of data.
- a first predictor network receives the one or more selected input features from the first generator network and predicts a first output label based on the received one or more selected input features from the first generator network.
- a second generator network receives a second class of data and selects one or more input features from the second class of data.
- a second predictor network receives the one or more selected input features from the second generator network and predicts a second output label based on the received one or more selected input features from the second generator network.
- a discriminator network receives the first and second output labels and determines whether the selected one or more input features from the first class of data or the selected one or more input features from the second class of data more accurately represents the first output label.
- the first class of data represents a first rationale (R1) consistent with the first output label.
- the second class of data represents a second rationale (R2) consistent with the second output label.
- the second output label is a complement of the first output label.
- the first and second class of data are each in natural language.
- the first generator network is trained on a subset of the first class of data.
- the second generator network is trained on a subset of the second class of data.
- the first output label is inconsistent with the second output label.
- the first generator network is configured to play a cooperative game with the discriminator network.
- the second generator network may be configured to play an adversarial game with the discriminator network.
- the cooperative game between the first generator network and the discriminator network can maximize a predictive accuracy of the one or more input features selected by the first generator network to predict the first output label.
- the adversarial game between the second generator network and the discriminator network can be operative for the second generator network to attempt to convince the discriminator network that the features selected by the second generator network from the second class of data are consistent with the first output label.
- the discriminator network is operative to provide a scaled result based on the selected one or more input features from the first class of data and the selected one or more input features from the second class of data.
- the first generator network and the second generator network are each configured to convince the discriminator network that they are factual generators for the first output label.
- FIG. 1 A illustrates a selective rationalization system that includes a corpus of data comprising a plurality of input features.
- FIG. 1 B illustrates the selective rationalization system of FIG. 1 A , where the generator network has a collaborative relationship with a predictor network.
- FIG. 2 A illustrates an architecture of a two-player system.
- FIG. 2 B is an architecture of a three-player system, consistent with an illustrative embodiment.
- FIG. 3 is a block diagram of a selective rationalization system having a three-player model, consistent with an illustrative embodiment.
- FIG. 4 A illustrates a three-player model, consistent with an illustrative embodiment.
- FIG. 4 B is a block diagram of a three-player model having an introspective generator, consistent with an illustrative embodiment.
- FIG. 5 illustrates a block diagram of a training framework and an inference framework, consistent with an illustrative embodiment.
- FIG. 6 illustrates how Class-wise Adversarial Rationalization (CAR) works in a bag of word scenario, consistent with an illustrative embodiment.
- FIG. 8 provides a functional block diagram illustration of a computer hardware platform that can be used to implement a computing device that is particularly configured to train a natural language processing network.
- the present disclosure generally relates to systems and computerized methods of selective rationalization.
- selective rationalization is increasingly used to help ensure that predictive models reveal how they use the available features.
- the selection may be soft or hard, and identifies a subset of input features relevant for prediction.
- the setup can be viewed as a cooperative game between the selector (sometimes referred to herein as a rationale generator) and a predictor making use of only the selected features.
- the co-operative setting may, however, be compromised for two main reasons. First, the generator typically has no direct access to the outcome it aims to justify, resulting in poor performance. Second, there typically is no control exerted on the information left outside the selection.
- the teachings herein revise the overall co-operative framework to address these challenges.
- an introspective model is introduced that explicitly predicts and includes the outcome into the selection process.
- the rationale complement is controlled via an adversary so as not to leave any useful information out of the selection.
- the teachings herein demonstrate that the two complementary mechanisms both maintain high predictive accuracy and lead to comprehensive rationales. In this way, the machine learning model is made more introspective and computationally accurate.
- the selection process discussed herein can be described as a cooperative game between a generator and a predictor operating on a selected, partial input text.
- the two players aim for the shared goal of achieving high predictive accuracy, operating within the confines imposed by rationale selection (e.g., a small, concise portion of input text).
- the rationales are learned in an unsupervised manner, without guidance other than their size and form.
- Table 1 above provides the rationales extracted by different models on the sentiment analysis of beer reviews.
- a motivation for the teachings herein arises from the potential failures of cooperative selection. Since a generator typically has no direct access to the outcome it aims to justify, the learning process may converge to a poorly performing solution. Moreover, since only the selected portion is evaluated for its information value (via a predictor), there is typically no explicit control over the remaining portion of the text left outside the rationale.
- the clues in text classification tasks are typically short phrases.
- diverse textual inputs offer a plethora of such clues that may be difficult to disentangle in a way that generalizes to evaluation data.
- the generator may fail to disentangle the information about the correct label, offering misleading rationales instead.
- the collaborative nature of the interaction between a generator network and a predictor network sometimes referred to herein as a “game,” may enable the players to select a sub-optimal communication code that does not generalize, but overfits the training data. This concept is discussed in more detail in the context of the discussion of FIG. 1 B below.
- the teachings herein address these concerns by an introspective rationale generator.
- the idea is to force the generator to explicitly understand what to generate rationales for. More specifically, the output label of a selective rationalization system is predicted with a more complete selection of the input features of a corpus of input data, thereby ensuring better overall performance.
- FIGS. 1 A and 1 B illustrate a selective rationalization system 100 A that includes a corpus of data 102 comprising a plurality of input features.
- the corpus of data 102 may be in natural language.
- the system 100 A includes two players, a generator 104 A and a prediction network 108 A.
- the generator 104 A is operative to receive the corpus of data 102 and extract therefrom what it deems to be relevant input features that are salient in determining an output label 110 A. Stated differently, the generator 104 A is operative to select a subset of input features 106 A that is predictive of an output label 110 A that characterizes (e.g., classifies) the corpus of data 102 .
- the actually selected sub features 106 A are used by the predictor 108 A to predict an appropriate classification 110 A, sometimes referred to herein as an output label.
- the selected subset of input features 106 A is used by the system 100 A to perform sentiment analysis on the corpus of data 102 such that an appropriate classification thereof is achieved, represented by output label 110 A.
- the output label 110 A is a prediction, such as a classification.
- the classification may be binary (positive/negative; yes/no; etc.,) or any other type of classification into a predetermined set of classes.
- the conclusion of the predictor 108 A is that the corpus of data 102 expresses a “negative” sentiment about the wine it describes.
- FIG. 1 A is provided by way of example only and not by way of limitation. Indeed, different types of corpus of data 102 , such as key performance indicators (KPIs) of a networked system, medical information, as well as other systems, are contemplated by the teachings herein as well.
- the selective rationalization system 100 A of FIG. 1 A is operative to select a subset of input features 106 A that is most predictive of the output label 110 A.
- the problem is that such selective rationalization systems 100 A may suffer from degeneration, where the appropriate subset of input features is not selected, which may not be readily evident if a correct result is achieved, because neural networks are often treated as a “Black-box.” Accordingly, degeneration involves the generator 104 A collaborating with the prediction network 108 A to guess an output label 110 A and develop its own code to communicate with the predictor network (e.g., comma, period, etc.).
- FIG. 1 B illustrates a scenario where the generator 104 B incorrectly selects a comma as a subset of input features 106 B, which is used by the predictor 108 B to identify a classification 110 B. Although a correct classification of the corpus of data 102 is achieved, the quality of the analysis is not optimal because it is not based on a valid subset of input features 106 B.
- the cooperative game system 100 B of FIG. 1 B has two players: (i) a generator 104 B and (ii) a predictor 108 B. It does not explicitly control the information left out of the rationale 106 B. As a result, it is possible for the rationales to degenerate, including only select words without the appropriate context. With access to the predicted label as input, the generator 104 B and the predictor 108 B can find a communication scheme by encoding the predicted label with special word patterns (e.g., highlighting “.” for positive examples and “,” for negative ones). Stated differently, the generator 104 B is in collaboration with the predictor 108 B to provide a predicted output label 110 B. Table 1 shows such cases for the two cooperative methods, where degeneration has occurred.
- FIGS. 2 A and 2 B illustrate an enhanced architecture 200 B that includes an additional player with respect to the selective rationalization system 200 A of FIG. 2 A .
- the selective rationalization system 200 A includes a generator network 204 A and a collaborative predictor network 208 B
- in the architecture 200 B , in addition to a first predictor network 208 B , there is a second predictor network 218 , referred to herein as a complementary predictor network.
- unlike FIG. 2 A , which is a two-player system, architecture 200 B is a three-player system.
- the first predictor network 208 B receives subset of input features r selected by the generator network 204 A and predicts an output label y based on r.
- the added third adversarial player, namely the complementary predictor network 218 (sometimes referred to herein as the discriminator), is able to guide the cooperative communication between the generator network 204 A and the first predictor network 208 B .
- the goal of the discriminator 218 is to attempt to predict the correct label using only words left out of the rationale.
- the generator aims to fool the discriminator while still maintaining high accuracy for the predictor. This ensures that the selected rationale includes substantially all/most of the input features salient to the target label y, leaving out irrelevant input features.
- the number of input features identified by the generator network is limited to accommodate the computational capability of a computing device performing the calculations. For example, to improve computational speed, the number of input features selected may be confined to a predetermined number, based on the computational capability of the computing device performing the calculations.
- one or more computing platforms performing the three-player selective rationalization system discussed herein may be implemented by virtual computing devices in the form of virtual machines or software containers that are hosted in the cloud, thereby providing an elastic architecture for processing and storage. In this way, the limit on the number of input features identified by the generator can be expanded or even removed.
- the equilibrium of the three-player architecture 200 B provides improved properties for the extracted rationales. Moreover, the three-player framework facilitates cooperative games such to improve both predictive accuracy and rationale quality. In one aspect, by combining the two approaches of an introspective generator and a three-player architecture, high predictive accuracy is achieved by the computing device, as well as non-degenerate rationales.
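- the following is a minimal PyTorch sketch of the two building blocks implied by this architecture: a generator that emits a binary rationale mask over the input tokens, and a predictor that classifies from a masked input (either the rationale R or its complement R^c). All module names, dimensions, and the Bernoulli sampling scheme are illustrative assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn


class Generator(nn.Module):
    """Scores each token and samples a binary mask z selecting the rationale R."""

    def __init__(self, vocab_size, emb_dim=100, hid_dim=200):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.score = nn.Linear(2 * hid_dim, 1)

    def forward(self, x):                                 # x: (batch, seq_len) token ids
        h, _ = self.rnn(self.emb(x))                      # (batch, seq_len, 2*hid_dim)
        probs = torch.sigmoid(self.score(h)).squeeze(-1)  # per-token selection probability
        z = torch.bernoulli(probs)                        # hard mask z in {0, 1}^seq_len
        return z, probs


class Predictor(nn.Module):
    """Predicts the label from a masked input; used for both R and its complement R^c."""

    def __init__(self, vocab_size, num_classes, emb_dim=100, hid_dim=200):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hid_dim, num_classes)

    def forward(self, x, mask):
        e = self.emb(x) * mask.unsqueeze(-1)              # zero out unselected words
        h, _ = self.rnn(e)
        return self.out(h.max(dim=1).values)              # max-pool over time, then classify
```

- in the three-player setup, one Predictor instance reads the mask z (the rationale) and a second instance (the complementary predictor) reads 1 − z; the generator is rewarded when the first succeeds and the second fails.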
- FIG. 3 illustrates a selective rationalization system 300 having a three-player model, consistent with an exemplary embodiment.
- System 300 can be used as an example to better explain how the three-player model operates in improving the quality of the selection of the generator network 304 .
- the generator network 304 receives the corpus of data 302 and predicts the output label 310 . By way of efficiency, it selects input features 306 (e.g., commas in the present example, which may be found in most communications). For example, these commas communicate to the predictor network 308 what the output label 310 should be. In another scenario, the generator 304 may select one or more periods as input features to communicate to the predictor network 308 that the output label 310 should be “positive” instead. Thus, instead of selecting meaningful input features, the generator network 304 simply colludes or collaborates with the predictor network 308 to advance a predicted output label 310 .
- the complementary predictor network 320 uses unselected input features 318 from the corpus of data 302 and comes to the same “correct” prediction, represented by output label 322 .
- the unselected words 318 are sufficient for the complementary predictor 320 to achieve the same result as the first output label 310 .
- the fact that the first output label and the second output label are substantially similar indicates that additional input features from the corpus of data 302 should be selected by the generator network 304 for the first predictor network 308 .
- the output label of the complementary predictor network 322 should be as opposite to the output label of the first predictor network 310 as possible.
- the “worse” the prediction of the complementary predictor network 320 the more accurate the generator network 304 .
- the output of the complementary predictor network 320 may be ambiguous or inconclusive, thereby indicating that it has not extracted any meaningful input features from the corpus of data 302 . That is because all the meaningful features have been selected by the generator network 304 .
- the iterative process ends when all meaningful information is selected by the generator and the complementary predictor provides a result having a confidence level that is below a predetermined threshold (e.g., F rating on a scale of A to F).
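- as a concrete illustration of this diagnostic logic, the following small Python snippet (names, thresholds, and the margin value are illustrative assumptions) flags likely degeneration when the complementary predictor performs nearly as well as the main predictor:

```python
def degeneration_detected(acc_predictor: float, acc_complement: float, margin: float = 0.15) -> bool:
    """Return True when the unselected words still predict Y almost as well as the rationale.

    `margin` plays a role analogous to the constant h in the comprehensiveness
    condition (Eq. 6): the complement should perform at least `margin` worse
    than the rationale-based predictor.
    """
    return acc_complement > acc_predictor - margin
```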
- the target application here is text classification on data tokens in the form of {(X, Y)}.
- the expression Y is denoted as a label.
- R c , representing the unselected features of the generator network 204 A , does not include sufficient information to predict Y, as provided by the expression below: H(Y|R c ) ≥ H(Y|R) + h (Eq. 6)
- the sufficiency condition of equation 5 above is the core condition of a legitimate rationale, which essentially stipulates that the rationale includes all the relevant information in X to predict Y.
- the compactness condition of equation 7 above stipulates that the rationale should be continuous and should not include more words than necessary. For example, without the compactness condition, a trivial solution to equation 5 above would be X itself.
- the first inequality in equation 7 includes the sparsity of rationale, and the second one includes the continuity.
- the comprehensiveness condition of equation 6 is discussed in more detail later.
- degeneration refers to the situation where, rather than finding words (i.e., input features) in the input corpus of data X that explain the output label Y, the generator 204 A attempts to encode the probability of Y using trivial information, e.g., punctuation and position.
- any previous cooperative framework may suffer from the above problem if the generator has the potential to guess Y with sufficient accuracy.
- This problem occurs because there is no control of the input features unselected by R.
- some key predictors in X will be left unselected by R.
- architecture 200 B can determine if degeneration occurs. Specifically, when degeneration is present, a substantial portion of the input features are left unselected by R. Accordingly, H(Y|R c ) will be low, and the complementary predictor network will be able to predict Y from the unselected features, revealing the degeneration.
- the selective rationalization system 300 includes three players: (i) a rationale generator, sometimes referred to herein as generator network 304 , which generates the rationale R (represented by block 306 ) and its complement R c (represented by block 318 ) from a corpus of data (e.g., text) 302 ; (ii) a predictor network 308 that predicts the probability of Y based on R; and (iii) a complementary predictor 320 that predicts the probability of Y based on R c .
- FIG. 4 A illustrates a three-player model 400 A, consistent with an illustrative embodiment.
- the three-player model 400 A introduces an additional complementary predictor 420 , creating an adversarial game, trained with reinforcement learning, between the generator network 402 and the complementary predictor network 420 , in addition to the cooperative game between the generator network 402 and the predictor network 404 .
- reinforcement learning discussed herein may use machine learning to determine which input features of the input corpus of data X to select to provide to the predictor network, and which complementary input features to select to provide to the complementary predictor network 420 , such that the prediction of the complementary predictor is as bad (e.g., inaccurate) as possible, thereby improving the quality of the selection of the generator network 402 .
- a minimax algorithm is used between the generator network 402 and the complementary predictor network 420 . The iterative process continues until the prediction of the complementary predictor, is as inaccurate as possible or the number of input features selected by the generator network reaches a predetermined threshold.
- the predictor network 404 estimates a probability of Y conditioned on R, denoted as p̂(Y|R).
- the complementary predictor estimates a probability of Y conditioned on R c , denoted as p̂ c (Y|R c ).
- both predictors are trained using the cross-entropy loss, provided by the expressions below:
- the generator network 402 extracts R and R c by generating the rationale mask, z(·), as shown above in equations 2-3. More specifically, z(·) is determined by minimizing the weighted combination of four losses:
- Equation 10 above stipulates the comprehensiveness property of the rationale (Eq. 6). If the complement rationale is less informative of Y than the rationale, then L c should be larger than L p .
- Equation 8 above indicates that the generator network 402 plays a cooperative game with the predictor network 404 , because both try to maximize the predictive performance of R.
- the generator network 402 plays an adversarial game with the complementary predictor network 420 , because the latter tries to maximize the predictive performance of R c , but the former tries to reduce it.
- the three players perform gradient descent steps with respect to their own losses.
- the regular gradient descent algorithm is not applied. Instead a policy gradient is used to optimize the models.
- the negative losses Lp and Lc are replaced with accuracy.
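- a hedged sketch of one such policy-gradient (REINFORCE-style) generator update is shown below, assuming the Generator and Predictor modules sketched earlier: per-example accuracies of the predictor and complementary predictor stand in for −L_p and −L_c, and a sparsity penalty approximates the compactness constraint. The loss weights and sparsity target are illustrative assumptions.

```python
import torch

def generator_policy_gradient_step(gen, pred, comp_pred, gen_opt, x, y,
                                   w_p=1.0, w_c=1.0, sparsity=0.1, w_s=0.1):
    z, probs = gen(x)                                   # sampled hard mask and probabilities
    logits_p = pred(x, z)                               # predictor sees the rationale R
    logits_c = comp_pred(x, 1.0 - z)                    # complementary predictor sees R^c
    acc_p = (logits_p.argmax(-1) == y).float()          # per-example accuracy ~ -L_p
    acc_c = (logits_c.argmax(-1) == y).float()          # per-example accuracy ~ -L_c
    sparsity_cost = (z.mean(dim=1) - sparsity).abs()    # encourage short rationales
    reward = w_p * acc_p - w_c * acc_c - w_s * sparsity_cost
    # log-probability of the sampled mask under the generator's Bernoulli policy
    logp = (z * torch.log(probs + 1e-8) + (1 - z) * torch.log(1 - probs + 1e-8)).sum(dim=1)
    loss = -(reward.detach() * logp).mean()             # REINFORCE objective
    gen_opt.zero_grad()
    loss.backward()
    gen_opt.step()
    return loss.item()
```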
- FIG. 4 B is a block diagram of a three-player model 400 B having an introspective generator, consistent with an illustrative embodiment.
- the introspective generator 460 includes a generator network 452 that explicitly predicts a label before making rationale selections (e.g., R and R c ).
- the improved generator 460 still fits into the basic three-player framework discussed in the context of FIG. 4 A .
- the main difference is how the generator generates the mask z(X), which now breaks down into two steps.
- the generator network 452 uses a regular classifier 454 that takes the input X and predicts the label, denoted ỹ(X).
- for classification tasks, the maximum likelihood estimate is used, as provided by equation 12 below:
- ỹ is a function of X.
- accordingly, the introspective generator 460 is essentially a function of X.
- the classifier 454 can use the same architecture as that of the predictor network 404 and the complementary predictor network 420 .
- the introspection generator 460 may make the degeneration problem more severe: when the classifier p̃(·|X) predicts the label accurately, the generator can encode the predicted label directly into its selections rather than selecting genuinely informative input features, which is why the complement control of the three-player framework remains important.
- bidirectional long short-term memory (LSTM) networks with hidden dimension 400 are used.
- the classifier 454 comprises the same bidirectional LSTM, and z(X, ỹ) is implemented as an LSTM sequential labeler with the label ỹ transformed to an embedding vector that serves as the initial hidden states of the LSTM.
- the relative position features are added.
- the relative position features are mapped to learnable embedding vectors and concatenated with word embeddings as the inputs to the LSTM encoder of each player. All hyper-parameters are tuned on the development sets according to predictive accuracy. Stated differently, all the models are tuned without seeing any rationale annotations.
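- a minimal sketch of this introspective generator, assuming PyTorch, is shown below: an internal classifier first predicts ỹ(X), and the predicted label is embedded and used as the initial hidden state of the LSTM sequential labeler that produces the mask z(X, ỹ). Dimensions, module names, and the hard argmax/Bernoulli steps are illustrative assumptions.

```python
import torch
import torch.nn as nn


class IntrospectiveGenerator(nn.Module):
    def __init__(self, vocab_size, num_classes, emb_dim=100, hid_dim=400):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # internal classifier that guesses the label before any selection is made
        self.cls_rnn = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.cls_out = nn.Linear(2 * hid_dim, num_classes)
        # label embedding used to initialise the selector LSTM's hidden state
        self.label_emb = nn.Embedding(num_classes, hid_dim)
        self.sel_rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.sel_out = nn.Linear(hid_dim, 1)

    def forward(self, x):
        e = self.emb(x)
        h_cls, _ = self.cls_rnn(e)
        y_tilde = self.cls_out(h_cls.max(dim=1).values).argmax(dim=-1)   # predicted label (Eq. 12)
        h0 = self.label_emb(y_tilde).unsqueeze(0)                        # (1, batch, hid_dim)
        c0 = torch.zeros_like(h0)
        h_sel, _ = self.sel_rnn(e, (h0, c0))                             # label conditions the selector
        probs = torch.sigmoid(self.sel_out(h_sel)).squeeze(-1)
        z = torch.bernoulli(probs)                                       # mask z(X, ỹ)
        return z, probs, y_tilde
```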
- selection of input features can be used to highlight how complex neural predictors operate.
- the selection can be optimized post-hoc for trained models or included directly into the method itself.
- an overall selection of input features may not properly capture the multi-faceted nature of useful rationales such as pros and cons for decisions.
- the teachings herein provide a game theoretic approach to class-dependent rationalization, where the computing device performing the algorithm is specifically trained to highlight evidence supporting alternative conclusions.
- the class-wise rationales approach discussed herein is based on multiple sets of rationales that respectively explain support for different output classes (or decisions).
- class-wise rationalization takes a candidate outcome as input, which can be different from the ground-truth class labels, and uncovers rationales specifically for the given class.
- C AR comprises three types of players: (i) a factual rationale generator, which generates rationales that are consistent with the actual label, (ii) a counterfactual rationale generator, which generates rationales that counter the actual label, and (iii) a discriminator, which discriminates between factual and counterfactual rationales.
- Both factual and counterfactual rationale generators try to competitively “convince” the discriminator network that they are factual, resulting in an adversarial game between the counterfactual generators and the other two types of players.
- the discussion below explains how the CAR game drives towards meaningful class-wise rationalization, under an information-theoretic metric, which is a class-wise generalization of the maximum mutual information criterion.
- the class-wise rationalization problem can be formulated as follows. For any input X, which is a random vector representing a string of text, the goal is to derive a class-wise rationale Z(t) for any class t in the set of possible classes, such that Z(t) provides evidence supporting class t.
- Each rationale can be understood as a masked version of X, i.e. X with a subset of its words masked away by a special value (e.g., 0).
- the discussion below considers binary classification, i.e., Y ∈ {0, 1}.
- CAR can uncover class-wise rationales using adversarial learning, inspired by outlining pros and cons for decisions.
- the two factual rationale generators, g_t^f(X), t ∈ {0, 1} (Eq. 14), generate rationales that justify class t when the actual label agrees with t, and the two counterfactual rationale generators are provided by the expression below: g_t^c(X), t ∈ {0, 1} (Eq. 15)
- the two counterfactual rationale generators generate rationales for the label other than the ground truth.
- FIG. 5 illustrates a block diagram of a training framework 500 and inference frameworks 540 and 560 , consistent with an illustrative embodiment.
- the training framework 500 includes a corpus of data 502 from which a factual generator 504 selects a subset of input features in support of its rationale of an output (e.g., sentiment).
- the subset of input features 506 selected by the factual generator 504 is provided to a discriminator network 510 .
- the counterfactual generator 524 selects a subset of input features 522 in support of its rationale of a complementary output, represented by block 526 . Both sets of rationales are presented to a discriminator network 510 , operative to discern which position prevails.
- the discriminator is operative to provide a mixed result.
- the discriminator network 510 may provide a more nuanced rating, such as alpha-numeric (e.g., 0 to 10, A to F), descriptive (e.g., none, low, medium, and high), based on color (e.g., red, green, and yellow), or any other suitable rating scale.
- alpha-numeric e.g., 0 to 10, A to F
- descriptive e.g., none, low, medium, and high
- color e.g., red, green, and yellow
- the rating of a hotel may be evaluated as 3 ⁇ 5 stars based on the factual rationale 506 in view of the counterfactual rationale 526 .
- the discriminator network d_0(·) takes a rationale Z generated by either g_0^f(·) or g_0^c(·) as input, and outputs the probability that Z is generated by the factual generator g_0^f(·).
- the training target for d_0(·) is based on a generative adversarial network (GAN) objective, in which h_0(·) and h_1(·) are monotonically-increasing functions that satisfy the properties below:
- x·h_0(x/(x+a)) is convex in x, and x·h_1(a/(x+a)) is concave in x, for all x, a ∈ [0, 1] (Eq. 18)
- FIG. 5 summarizes the training procedure of these three players.
- the counterfactual generator 524 (g_0^c(·)) plays a game with both d_0(·) and g_0^f(·), because it tries to trick the discriminator network 510 (d_0(·)) into misclassifying its output as factual, whereas g_0^f(·) helps d_0(·) make the correct decision, as illustrated by inference framework 540 .
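- a hedged sketch of the class-0 discriminator update in this game is shown below, assuming PyTorch: d_0 is pushed to assign high probability to rationales produced by the factual generator on class-0 examples and low probability to counterfactual class-0 rationales produced from class-1 examples. Function signatures, and the assumption that each generator returns a mask given the text, are illustrative.

```python
import torch
import torch.nn.functional as F

def discriminator_step(d0, g0_f, g0_c, d0_opt, x, y):
    x_class0, x_class1 = x[y == 0], x[y == 1]
    with torch.no_grad():                          # generators are held fixed during this step
        z_f = g0_f(x_class0)                       # factual class-0 rationales
        z_c = g0_c(x_class1)                       # counterfactual class-0 rationales
    p_f = d0(x_class0, z_f)                        # probability "this rationale is factual"
    p_c = d0(x_class1, z_c)
    loss = F.binary_cross_entropy(p_f, torch.ones_like(p_f)) + \
           F.binary_cross_entropy(p_c, torch.zeros_like(p_c))
    d0_opt.zero_grad()
    loss.backward()
    d0_opt.step()
    return loss.item()
```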
- X can be formulated as an N-dimensional binary vector.
- the rationales Z_0^f and Z_0^c are also multivariate binary vectors.
- FIG. 6 illustrates how CAR works in a bag of word scenario, consistent with an illustrative embodiment.
- Plot 600 of FIG. 6 illustrates the word probabilities p_{X_i|Y}(1|0) and p_{X_i|Y}(1|1) across words i.
- the words to the left satisfy p_{X_i|Y}(1|0) ≥ p_{X_i|Y}(1|1) and are called class-0 words.
- the words to the right are called class-1 words.
- Plot 600 of FIG. 6 also illustrates an example of p_{z_{0,i}^f|Y}(1|0), the probability that word i is selected into the factual rationale for class 0.
- the goal of the factual generator is to help the discriminator. Therefore, its optimal strategy, given the optimized counterfactual generator, is to “steer” the factual rationale distribution away from the counterfactual rationale distribution. Recall that the counterfactual rationale distribution tries to match the factual rationale distribution, unless its upper-bound is binding. The factual generator will therefore choose the words whose factual upper-bound is higher than the counterfactual upper-bound. These words are, by definition, most indicative of class 0. The counterfactual generator will also favor the same set of words, due to its incentive to match the distributions.
- Plot 640 of FIG. 6 illustrates the optimal strategy for the factual rationale under the sparsity constraint:
- the left-hand side in equation 22 represents the expected factual rationale length (in number of words). It also represents the area under the p_{z_{0,i}^f|Y}(1|0) curve.
- parameter sharing is imposed among the players. Such sharing is motivated by the fact that both the factual and counterfactual generators adopt the same rationalization strategy upon reaching the equilibrium. Therefore, instead of having two separate networks for the two generators, one unified generator network is introduced for each class, a class-0 generator and a class-1 generator, with the ground truth label Y as an additional input to distinguish between factual and counterfactual modes. Parameter sharing may also be imposed between the two discriminators by introducing a unified discriminator, with an additional input t that helps to distinguish between the class-0 and class-1 cases. Both the generators and the discriminators include a word embedding layer and a bidirectional LSTM layer followed by a linear projection layer.
- the generators generate the rationales by the independent selection process.
- the convolutional layer outputs a quantized binary mask S k , which equals 1 if the k-th word is selected and 0 otherwise.
- the binary masks are multiplied with the corresponding words to produce the rationales.
- the outputs at all time steps are max-pooled to produce the factual/counterfactual decision.
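- a minimal sketch of this shared architecture, assuming PyTorch, is shown below: a unified per-class generator takes the ground-truth label Y as an extra input to switch between factual and counterfactual modes, emits a quantized binary mask over words (with a straight-through pass-through for gradients, anticipating the technique mentioned below), and a discriminator max-pools its per-step outputs to produce the factual/counterfactual decision. Layer choices and sizes are assumptions; the mask layer is simplified here to a linear projection rather than a convolution.

```python
import torch
import torch.nn as nn


class CarGenerator(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hid_dim=200):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.mode_emb = nn.Embedding(2, emb_dim)          # factual vs counterfactual mode
        self.rnn = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hid_dim, 1)

    def forward(self, x, y):
        e = self.emb(x) + self.mode_emb(y).unsqueeze(1)   # inject the ground-truth label
        h, _ = self.rnn(e)
        probs = torch.sigmoid(self.proj(h)).squeeze(-1)
        s = (probs > 0.5).float()                         # quantized binary mask S_k
        return s + probs - probs.detach()                 # straight-through gradient trick


class CarDiscriminator(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hid_dim=200):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hid_dim, 1)

    def forward(self, x, mask):
        h, _ = self.rnn(self.emb(x) * mask.unsqueeze(-1))           # rationale = mask * words
        return torch.sigmoid(self.proj(h).squeeze(-1).max(dim=1).values)  # max-pool over time
```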
- the training objectives are essentially equations 16 and 17 above.
- the main difference is that the constrained optimization in equation 17 is transformed into a multiplier form.
- the multiplier terms or the regularization terms are as follows:
- the first term constrains the sparsity of the rationale. It encourages the percentage of words selected as rationales to be close to a preset level α.
- the second term constrains the continuity of the rationale.
- λ 1 , λ 2 , and α are hyperparameters.
- the h_0(·) and h_1(·) functions in equation 17 are both set to linear functions, which empirically show good convergence performance, and which can be shown to satisfy equation 18.
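- the two regularization terms can be written compactly as follows (a sketch; the λ weights, the preset level α, and the reduction over the batch are illustrative assumptions):

```python
import torch

def rationale_regularizers(mask, alpha=0.15, lambda1=1.0, lambda2=1.0):
    # mask: (batch, seq_len) binary rationale mask
    sparsity = lambda1 * (mask.mean(dim=1) - alpha).abs()                  # keep ~alpha of the words
    continuity = lambda2 * (mask[:, 1:] - mask[:, :-1]).abs().mean(dim=1)  # few on/off transitions
    return (sparsity + continuity).mean()
```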
- a straight-through gradient computation technique is applied.
- the training scheme involves the following alternate stochastic gradient descent.
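- the alternation can be organized as in the following sketch, where discriminator_step is the helper sketched earlier, and factual_step and counterfactual_step are hypothetical placeholders standing in for the corresponding generator objectives (equations 16-17 plus the regularization terms above):

```python
def train_car(loader, d0, g0_f, g0_c, d0_opt, gf_opt, gc_opt, epochs=10):
    for _ in range(epochs):
        for x, y in loader:
            discriminator_step(d0, g0_f, g0_c, d0_opt, x, y)   # update d_0 with generators fixed
            factual_step(g0_f, d0, gf_opt, x, y)               # g_0^f helps d_0 (cooperative)
            counterfactual_step(g0_c, d0, gc_opt, x, y)        # g_0^c tries to fool d_0 (adversarial)
```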
- FIG. 7 presents an illustrative process related to implementing a CAR system for determining a rationale in a natural language processing system.
- Process 700 is illustrated as a collection of blocks in a logical flowchart, which represent a sequence of operations that can be implemented in hardware, software, or a combination thereof.
- the blocks represent computer-executable instructions that, when executed by one or more processors, perform the recited operations.
- computer-executable instructions may include routines, programs, objects, components, data structures, and the like that perform functions or implement abstract data types.
- FIG. 7 illustrates an example process of implementing a class-wise adversarial rationalization (CAR) system for determining a rationale in a natural language processing system.
- a first generator network receives a first class of data and selects one or more input features from the first class of data.
- the first class of data represents a first rationale (R 1 ) consistent with a first output label.
- a first predictor network receives the one or more selected input features from the first generator network and predicts a first output label based on the received one or more selected input features from the first generator network.
- a second generator network receives a second class of data and selects one or more input features from the second class of data.
- the second class of data represents a second rationale (R 2 ) consistent with a second output label.
- the second output label is a complement of the first output label.
- the first and second class of data may be in natural language.
- a second predictor network receives the one or more selected input features from the second generator network and predicts a second output label based on the received one or more selected input features from the second generator network.
- a discriminator network receives the first output label and the second output label and determines whether the selected one or more input features from the first class of data or the selected one or more input features from the second class of data more accurately represents the first output label. In this way, the rationale for the output label, as well as the complement output label, can be determined.
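- purely as an illustration of the data flow through these blocks (module names and signatures are hypothetical, not the claimed process itself), the sequence can be traced as:

```python
def car_process(x_class0, x_class1, gen0, gen1, pred0, pred1, disc):
    z0 = gen0(x_class0)          # select input features from the first class of data
    y0 = pred0(x_class0, z0)     # predict the first output label from those features
    z1 = gen1(x_class1)          # select input features from the second class of data
    y1 = pred1(x_class1, z1)     # predict the second (complementary) output label
    return disc(y0, y1)          # decide which selection better represents the first label
```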
- FIG. 8 provides a functional block diagram illustration of a computer hardware platform 800 that can be used to implement a computing device that is particularly configured to train a natural language processing network.
- FIG. 8 illustrates a network or host computer platform 800 , as may be used to implement an appropriately configured computing device to host a CAR engine as discussed herein.
- the computer platform 800 may include a central processing unit (CPU) 804 , a hard disk drive (HDD) 806 , random access memory (RAM) and/or read only memory (ROM) 808 , a keyboard 810 , a mouse 812 , a display 814 , and a communication interface 816 , which are connected to a system bus 802 .
- the HDD 806 has capabilities that include storing a program that can execute various processes, such as the CAR engine 840 , in a manner described herein.
- the CAR engine 840 may have various modules configured to perform different functions. For example, there may be an interaction module 842 that is operative to receive data from various sources over a network, wherein the data can be used by the CAR engine to perform analysis thereon.
- first generator module 844 operative to receive a first class of data and select one or more input features from the first class of data.
- first predictor module 846 operative to receive the one or more selected input features from the first generator module 844 and predict a first output label based on the received one or more selected input features from the first generator module 844 .
- second generator module 850 operative to receive a second class of data and select one or more input features from the second class of data.
- second predictor module 852 operative to receive the one or more selected input features from the second generator module 850 and predict a second output label based on the received one or more selected input features from the second generator module 850 .
- discriminator module 848 operative to receive the first and second output labels and to determine whether the selected one or more input features from the first class of data or the selected one or more input features from the second class of data more accurately represents the first output label.
- a program, such as Apache™, can be stored for operating the system as a Web server.
- the HDD 806 can store an executing application that includes one or more library software modules, such as those for the Java™ Runtime Environment program for realizing a JVM (Java™ virtual machine).
- These computer readable program instructions may be provided to a processor of an appropriately configured computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the blocks may occur out of the order noted in the Figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Abstract
Description
| TABLE 1 | |
|---|---|
| Label | negative |
| Original Text | really cloudy, lots of sediment, washed out yellow color. looks pretty gross, actually, like swamp water. no head, no lacing. |
| Rationale from first example model | [“really cloudy lots”, “yellow”, “no”, “no”] |
| Rationale from cooperative introspection model | [“. looks”, “no”, “no”] |
| Rationale from Introspective model | [“cloudy”, “lots”, “pretty gross”, “no lacing”] |
r^c = x with r removed (Eq. 1)
r_i(X) = z_i(X)·X_i, (Eq. 2)
- Where z(X) ∈ {0, 1}^N is a binary mask.
r_i^c(X) = (1 − z_i(X))·X_i. (Eq. 3)
p_Y(·|R) = p_Y(·|X). (Eq. 5)
H(Y|R^c) ≥ H(Y|R) + h, (Eq. 6)
-
- Where h is a constant.
Σ_i z_i ≤ s, Σ_i |z_i − z_{i−1}| ≤ c, (Eq. 7)
- Where s and c are constants.
-
- Where,
- H(p;q) denotes the cross entropy between p and q; and
- p(⋅|⋅) denotes the empirical distribution.
- Where,
-
- Where Lg encourages the gap between Lp and Lc to be large, as provided by the expression below.
L_g = max{L_p − L_c + h, 0}. (Eq. 10)
ỹ(X) = argmax_y p̃(Y = y|X) (Eq. 12)
- Where p̃(Y = y|X) is the predicted probability from the classifier, which is pre-trained by minimizing the cross entropy.
z(X) = z(X, ỹ(X)) (Eq. 13)
g_t^f(X), t ∈ {0, 1} (Eq. 14)
g_t^c(X), t ∈ {0, 1} (Eq. 15)
-
- Where,
- ω0(⋅) and ω1(⋅) represent multiple regularization constraints such as sparsity and continuity, and
- h0(⋅) and h1(⋅) are both monotonically-increasing functions that satisfy the following properties:
- Where,
Σ_{i=1}^{K} p_{z_{0,i}^f|Y}(1|0) ≤ α·K (Eq. 22)
- Where K denotes the number of words in the input text and α is the preset sparsity level.
Claims (25)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/658,122 US11657271B2 (en) | 2019-10-20 | 2019-10-20 | Game-theoretic frameworks for deep neural network rationalization |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/658,122 US11657271B2 (en) | 2019-10-20 | 2019-10-20 | Game-theoretic frameworks for deep neural network rationalization |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20210117772A1 (en) | 2021-04-22 |
| US11657271B2 (en) | 2023-05-23 |
Family
ID=75492088
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/658,122 Active 2041-11-28 US11657271B2 (en) | 2019-10-20 | 2019-10-20 | Game-theoretic frameworks for deep neural network rationalization |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US11657271B2 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12254388B2 (en) * | 2020-10-27 | 2025-03-18 | Accenture Global Solutions Limited | Generation of counterfactual explanations using artificial intelligence and machine learning techniques |
| US20240256638A1 (en) * | 2023-01-27 | 2024-08-01 | Intuit Inc. | Synthetic data creation using counterfactuals |
- 2019-10-20: US application US16/658,122 filed; granted as US11657271B2 (en), status Active.
Patent Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190108448A1 (en) | 2017-10-09 | 2019-04-11 | VAIX Limited | Artificial intelligence framework |
| US20190122146A1 (en) | 2017-10-23 | 2019-04-25 | Artificial Intelligence Foundation, Inc. | Dynamic and Intuitive Aggregation of a Training Dataset |
| US20190130266A1 (en) | 2017-10-27 | 2019-05-02 | Royal Bank Of Canada | System and method for improved neural network training |
| US20190130221A1 (en) * | 2017-11-02 | 2019-05-02 | Royal Bank Of Canada | Method and device for generative adversarial network training |
| US20190138847A1 (en) | 2017-11-06 | 2019-05-09 | Google Llc | Computing Systems with Modularized Infrastructure for Training Generative Adversarial Networks |
| US20190189115A1 (en) | 2017-12-15 | 2019-06-20 | Mitsubishi Electric Research Laboratories, Inc. | Method and Apparatus for Open-Vocabulary End-to-End Speech Recognition |
| US20190236139A1 (en) | 2018-01-31 | 2019-08-01 | Jungle Disk, L.L.C. | Natural language generation using pinned text and multiple discriminators |
| US20190266442A1 (en) | 2018-02-28 | 2019-08-29 | Fujitsu Limited | Tunable generative adversarial networks |
| US20200279288A1 (en) | 2019-03-01 | 2020-09-03 | Mastercard International Incorporated | Deep learning systems and methods in artificial intelligence |
| US20200380339A1 (en) | 2019-05-29 | 2020-12-03 | Genentech, Inc. | Integrated neural networks for determining protocol configurations |
| US20210383067A1 (en) | 2020-06-03 | 2021-12-09 | Sap Se | Data-driven structure extraction from text documents |
Non-Patent Citations (35)
| Title |
|---|
| Alvarez-Melis, D. et al., "Towards Robust Interpretability with Self-Explaining Neural Networks"; arXiv preprint arXiv:1806.07538; (2018); 10 pgs. |
| Andreas, J. et al., "Learning To Compose Neural Networks For Question Answering"; In Proceedings of NAACL-HLT (2016) 10 pgs. |
| Andreas, J. et al., "Neural Module Networks"; In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016); 10 pgs. |
| Arumae, K. et al., "Guiding Extractive Summarization with Question-Answering Rewards"; Proceedings of NAACL-HLT (2019); pp. 2566-2577. |
| Bastings, J. et al., "Interpretable Neural Predictions with Differentiable Binary Variables"; arXiv preprint arXiv: 1905.08160 (2019); 15 pgs. |
| Busoniu, L. "Multi-Agent Reinforcement Learning: A Survey"; In Proceedings of the 9th International Conference on Control, Automation, Robotics and Vision (2006), Singapore, 7 pgs. |
| Chen, et al. "Can Rationalization Improve Robustness?," arXiv:2204.1790V2 (Year: 2022). * |
| Chen, J. et al., "Learning To Explain: An Information-Theoretic Perspective On Model Interpretation"; In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden (2018); 10 pgs. |
| Chen, J. et al., "L-Shapley And C-Shapley: Efficient Model Interpretation For Structured Data"; arXiv preprint arXiv:1808.02610 (2018); 17 pgs. |
| Goodfellow, I. et al., "Generative Adversarial Nets"; In Advances in neural information processing systems (2014), arXiv:1406.2661v1 [stat.ML] Jun. 10, 2014, 9 pgs. |
| Hendrickx, I. et al., "SemEval-2010 Task 8: Multi-Way Classification Of Semantic Relations Between Pairs Of Nominals"; In Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (2009); 7 pgs. |
| Johnson, J. et al., "Inferring And Executing Programs For Visual Reasoning"; In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2989-2998 (2017); arXiv: 1705.03633v1 [cs.CV] May 10, 2017; 13 pgs. |
| Lee, G. et al., "Game-Theoretic Interpretability for Temporal Modeling"; arXiv preprint arXiv.1807.00130 (2018); 5 pgs. |
| Lei, T. et al., "Rationalizing Neural Predictions"; In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (2016); 11 pgs. |
| Li, J. et al., "Understanding Neural Networks Through Representation Erasure"; arXiv preprint arXiv:1612.08220 (2017); 18 pgs. |
| Li, J. et al., "Visualizing and Understanding Neural Models In NLP"; In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies pp. 681-691 (2016); arXiv:1506.01066v2 [cs.CL] Jan. 8, 2016; 10 pgs. |
| List of IBM Patents or Applications Treated as Related. (2 pgs.). |
| Lundberg, S. M. et al., "A Unified Approach to Interpreting Model Predictions"; 31st Conference on Neural Information Processing Systems (NIPS 2017); arXiv:1705.07874v2 [cs.AI] Nov. 25, 2017; 10 pgs. |
| Mcauley, J. et al., "Learning Attitudes and Attributes from Multi-Aspect Reviews"; arXiv:1210.3926v2 [cs.CL] Oct. 31, 2012, 11 pgs. |
| Mullick, S. S. et al., "Generative Adversarial Minority Oversampling"; arXiv:1903.09730v1 [cs.CV] Mar. 22, 2019; 10 pgs. |
| Nguyen, T. H. et al., "Combining Neural Networks and Log-Linear Models to Improve Relation Extraction"; arXiv preprint arXiv:1511.05926 (2015); 7 pgs. |
| Ribeiro, M. T., et al., "Why Should I Trust You? Explaining the Predictions of Any Classifier"; In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM (2016); 10 pgs. |
| Sha, et al. "Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration," Association for the Advancement of Artificial Intelligence, 2020. (Year: 2020). * |
| Silver, D. et al., "Mastering Chess and Shogi by Self-Play with A General Reinforcement Learning Algorithm"; arXiv preprint arXiv:1712.01815 (2017); 19 pgs. |
| Simonyan, K. et al., "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps"; arXiv preprint arXiv:1312.6034v2 [cs.CV] Apr. 19, 2014; 8 pgs. |
| Srivastava, N. et al., "Dropout: A Simple Way To Prevent Neural Networks from Overfitting"; The Journal of Machine Learning Research (2014); vol. 15; pp. 1929-1958. |
| Strumbelj, E. et al., "An Efficient Explanation of Individual Classifications Using Game Theory"; Journal of Machine Learning Research (2010) vol. 11; pp. 1-18. |
| Sundararajan, M. et al., "Axiomatic Attribution for Deep Networks"; In Proceedings of the 34th International Conference on Machine Learning; Sydney, Australia; PMLR 70; arXiv:1703.01365v2 [cs.LG] Jun. 13, 2017; 11 pgs. |
| Williams, R. J., "Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning"; Machine Learning (1992); vol. 8; pp. 229-256. |
| Yala, A. et al., "A Deep Learning Mammography-Based Model for Improved Breast Cancer Risk Prediction"; Radiology (2019); 7 pgs. |
| Yu, M et al., "Learning Corresponded Rationales for Text Matching"; Under review as a conference paper at ICLR 2019; (2018); 12 pgs. |
| Yu, M. et al., "Rethinking Cooperative Rationalization: Introspective Extraction and Complement Control"; https://people.csail.mit.edu/tommi/papers/YCZJ_EMNLP2019.pdf; (2019) 13 pgs. |
| Zaidan, O. et al., "Using Annotator Rationales to Improve Machine Learning for Text Categorization"; Proceedings of NAACL HLT 2007, Rochester, NY, Apr. 2007; Association for Computational Linguistics (2007); pp. 260-267. |
| Zhang, Y. et al., "Aspect-Augmented Adversarial Networks for Domain Adaptation"; Transactions of the Association for Computational Linguistics (2017); vol. 5; pp. 515-528. |
| Zhao, M. et al., "Learning Sleep Stages from Radio Signals: A Conditional Adversarial Architecture"; Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70; (2017); 10 pgs. |
Also Published As
| Publication number | Publication date |
|---|---|
| US20210117772A1 (en) | 2021-04-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11551000B2 (en) | Introspective extraction and complement control | |
| US11062179B2 (en) | Method and device for generative adversarial network training | |
| CN113011529B (en) | Training method, training device, training equipment and training equipment for text classification model and readable storage medium | |
| Persello et al. | Active and semisupervised learning for the classification of remote sensing images | |
| EP3754549A1 (en) | A computer vision method for recognizing an object category in a digital image | |
| US20240046128A1 (en) | Dynamic causal discovery in imitation learning | |
| US9159021B2 (en) | Performing multistep prediction using spatial and temporal memory system | |
| US11972335B2 (en) | System and method for improving classification in adversarial machine learning | |
| US20230334330A1 (en) | Automated creation of tiny deep learning models based on multi-objective reward function | |
| US20230077528A1 (en) | Method of Generating Conversation Information Using Examplar-Based Generation Model and Apparatus for the Same | |
| US20240152760A1 (en) | Method, apparatus, device and medium for training and applying a contrastive learning model | |
| Sun et al. | Multiple-view multiple-learner semi-supervised learning | |
| US11657271B2 (en) | Game-theoretic frameworks for deep neural network rationalization | |
| US20250272477A1 (en) | Machine learning large language model ensemble deployment in content summarization | |
| Zhang et al. | Integration of an improved dynamic ensemble selection approach to enhance one-vs-one scheme | |
| CN116341564A (en) | Problem reasoning method and device based on semantic understanding | |
| Bai et al. | CLR-DRNets: Curriculum learning with restarts to solve visual combinatorial games | |
| Dalla et al. | Automated SAT problem feature extraction using convolutional autoencoders | |
| CN118395955A (en) | Large language model fine tuning method, sample feature completion method and device | |
| Džeroski et al. | Machine learning, ensemble methods in | |
| KR102676342B1 (en) | Method and apparatus for automatically recommeding learning data based on machine learning | |
| Ashtekar et al. | Class Incremental Learning from First Principles: A Review | |
| Peck et al. | Calibrated multi-probabilistic prediction as a defense against adversarial attacks | |
| Fu et al. | A Theoretical Survey on Foundation Models | |
| US20250245485A1 (en) | User insights using deep generative foundation models |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | AS | Assignment | Owner name: MASSACHUSETTS INSTITUTE OF TECHNOLOGY, MASSACHUSETTS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: JAAKKOLA, TOMMI S.; REEL/FRAME: 061949/0174; Effective date: 20221130. Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: CHANG, SHIYU; YU, MO; ZHANG, YANG; REEL/FRAME: 061949/0164; Effective date: 20191021 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |