US20230325676A1 - Active learning via a sample consistency assessment - Google Patents
Active learning via a sample consistency assessment
- Publication number: US20230325676A1
- Application number: US18/333,998
- Authority: US (United States)
- Prior art keywords: training samples, unlabeled training, unlabeled, cross entropy, samples
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/045—Combinations of networks
- G06F18/211—Selection of the most significant subset of features
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
- G06F7/24—Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers
- G06N20/00—Machine learning
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G06N3/09—Supervised learning
- G06N3/091—Active learning
Definitions
- One aspect of the disclosure provides a system for active learning via a sample consistency assessment. The system includes data processing hardware and memory hardware in communication with the data processing hardware. The memory hardware stores instructions that, when executed on the data processing hardware, cause the data processing hardware to perform operations.
- The operations include obtaining a set of unlabeled training samples. During each of a plurality of active learning cycles and for each unlabeled training sample in the set of unlabeled training samples, the operations include perturbing the unlabeled training sample to generate an augmented training sample.
- The operations also include generating, using the machine learning model configured to receive the unlabeled training sample and the augmented training sample as inputs, a predicted label for the unlabeled training sample and a predicted label for the augmented training sample, and determining an inconsistency value for the unlabeled training sample.
- The inconsistency value represents variance between the predicted label for the unlabeled training sample and the predicted label for the augmented training sample.
- The operations also include sorting the unlabeled training samples in the set of unlabeled training samples in descending order based on the inconsistency values and obtaining, for each unlabeled training sample in a threshold number of unlabeled training samples selected from the sorted unlabeled training samples in the set of unlabeled training samples, a ground truth label.
- The operations include selecting a current set of labeled training samples that includes each unlabeled training sample in the threshold number of unlabeled training samples paired with the corresponding obtained ground truth label, and training, using the current set of labeled training samples and a proper subset of unlabeled training samples from the set of unlabeled training samples, the machine learning model.
- This aspect may include one or more of the following optional features.
- In some implementations, the threshold number of unlabeled training samples is less than a cardinality of the set of unlabeled training samples.
- The inconsistency value for each unlabeled training sample in the threshold number of unlabeled training samples may be greater than the inconsistency value for each unlabeled training sample not selected from the sorted unlabeled training samples in the set of unlabeled training samples.
- Optionally, the operations further include obtaining the proper subset of unlabeled training samples from the set of unlabeled training samples by removing the threshold number of unlabeled training samples from the set of unlabeled training samples.
- The operations may further include selecting a first M number of unlabeled training samples from the sorted unlabeled training samples in the set of unlabeled training samples as the threshold number of unlabeled training samples.
- In some examples, the operations further include, during an initial active learning cycle, randomly selecting a random set of unlabeled training samples from the set of unlabeled training samples and obtaining corresponding ground truth labels for each unlabeled training sample in the random set of unlabeled training samples.
- The operations may also further include training, using the random set of unlabeled training samples and the corresponding ground truth labels, the machine learning model. This example may include, during the initial active learning cycle, identifying a candidate set of unlabeled training samples from the set of unlabeled training samples.
- A cardinality of the candidate set of unlabeled training samples may be less than a cardinality of the set of unlabeled training samples.
- The operations may also further include determining a first cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the candidate set of unlabeled training samples and determining a second cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the set of unlabeled training samples.
- The operations may also further include determining whether the first cross entropy is greater than or equal to the second cross entropy and, when the first cross entropy is greater than or equal to the second cross entropy, selecting the candidate set of unlabeled training samples as a starting size for initially training the machine learning model. Identifying the candidate set of unlabeled training samples from the set of unlabeled training samples, in some implementations, includes determining the inconsistency value for each unlabeled training sample of the set of unlabeled training samples.
- The operations may further include, when the first cross entropy is less than the second cross entropy, randomly selecting an expanded set of training samples from the unlabeled set of training samples and updating the candidate set of unlabeled training samples to include the expanded set of training samples randomly selected from the unlabeled set of training samples.
- The operations may also further include updating the unlabeled set of training samples by removing each training sample in the expanded set of training samples from the unlabeled set of training samples.
- During an immediately subsequent active learning cycle, the operations may also further include determining the first cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the training samples in the updated candidate set of unlabeled training samples and determining the second cross entropy between the distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the training samples in the set of unlabeled training samples.
- The operations may also further include determining whether the first cross entropy is greater than or equal to the second cross entropy. When the first cross entropy is greater than or equal to the second cross entropy, the operations may further include selecting the updated candidate set of unlabeled training samples as a starting size for initially training the machine learning model.
- In some examples, the machine learning model includes a convolutional neural network.
- FIG. 1 is a schematic view of an example system for training an active learning model.
- FIG. 2 is a schematic view of example components of the system of FIG. 1.
- FIGS. 3A-3C are schematic views of components for determining an initial starting size of labeled training samples.
- FIG. 4 is a flowchart of an example arrangement of operations for a method of active learning via a sample consistency assessment.
- FIG. 5 is a schematic view of an example computing device that may be used to implement the systems and methods described herein.
- Conventionally, training data is labeled by human operators, e.g., an expert annotator (e.g., a trained human).
- With active learning, the model is allowed to proactively select a subset of training samples from a set of unlabeled training samples and request that the subset be labeled by an "oracle," e.g., an expert annotator or any other entity that may accurately label the selected samples (i.e., provide the "ground truth" label). That is, active learning modules dynamically pose queries during training to actively select which samples to train on. Active learning has the potential to greatly reduce the overhead of labeling data while simultaneously increasing accuracy with substantially fewer labeled training samples.
- Selection methods typically depend on outputs and/or intermediate features of the target model to measure unlabeled samples. For example, a method may use entropy of the output to measure uncertainty. Another method may ensure that selected samples cover a large range of diversity. Yet another method may use predicted loss to attempt to select the most valuable samples.
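- For reference, the output-entropy criterion mentioned above can be sketched in a few lines of Python. This is an illustrative baseline rather than the claimed method, and the `model` callable returning per-class probabilities is an assumption.

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy of each row of class probabilities; higher means more uncertain."""
    eps = 1e-12  # guards against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

# Hypothetical usage: `model` maps a batch of samples to class probabilities.
# probs = model(unlabeled_batch)                              # shape (N, num_classes)
# most_uncertain = np.argsort(-predictive_entropy(probs))[:M]
```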
- Implementations herein are directed toward an active learning model trainer that trains a model (e.g., a convolutional neural network (CNN) model) without introducing additional labeling cost.
- The trainer uses unlabeled data to improve the quality of the trained model while keeping the number of labeled samples small.
- The trainer is based upon the assumption that a model should be consistent in its decisions between a sample and a meaningfully distorted version of the same sample (i.e., a consistency of predictions).
- An example system 100 includes a processing system 10.
- The processing system 10 may be a single computer, multiple computers, or a distributed system (e.g., a cloud environment) having fixed or scalable/elastic computing resources 12 (e.g., data processing hardware) and/or storage resources 14 (e.g., memory hardware).
- The processing system 10 executes an active learning model trainer 110.
- The model trainer 110 trains a target model 130 (e.g., a machine learning model) to make predictions based on input data.
- In some examples, the model trainer 110 trains a convolutional neural network (CNN).
- The model trainer 110 trains the target model 130 on a set of unlabeled training samples 112, 112U.
- An unlabeled training sample refers to data that does not include any annotations or other indications of the correct result for the target model 130, in contrast to labeled data, which does include such annotations.
- For example, labeled data for a target model 130 that is trained to transcribe audio data includes the audio data as well as a corresponding accurate transcription of the audio data.
- Unlabeled data for the same target model 130 would include the audio data without the transcription.
- With labeled data, the target model 130 may make a prediction based on a training sample and then easily compare the prediction to the label serving as the ground truth to determine how accurate the prediction was. In contrast, such feedback is not available with unlabeled data.
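- For concreteness, the distinction can be modeled as follows; this is a minimal sketch, and the field names are illustrative rather than drawn from the patent.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Sample:
    """A training sample; `label` is None for unlabeled data."""
    features: np.ndarray           # e.g., audio frames for a transcription model
    label: Optional[str] = None    # e.g., the transcription, when available

unlabeled = Sample(features=np.zeros(16000))                       # audio only
labeled = Sample(features=np.zeros(16000), label="hello world")    # audio + label
```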
- The unlabeled training samples 112U may be representative of whatever data the target model 130 requires to make its predictions.
- For example, the unlabeled training data may include frames of image data (e.g., for object detection or classification, etc.), frames of audio data (e.g., for transcription or speech recognition, etc.), and/or text (e.g., for natural language classification, etc.).
- The unlabeled training samples 112U may be stored on the processing system 10 (e.g., within the memory hardware 14) or received, via a network or other communication channel, from another entity.
- The target model 130 (i.e., the machine learning model the active learning model trainer 110 is training) is initially trained on a small set of labeled training samples 112, 112L and/or unlabeled training samples 112U. This quickly provides the target model 130 with rough initial prediction capabilities.
- The model trainer 110 perturbs each unlabeled training sample 112U to generate a corresponding augmented training sample 112, 112A. This minimally trained target model 130 receives, for each unlabeled training sample 112U, the unlabeled training sample 112U and the corresponding augmented training sample 112A.
- The target model 130, using the unlabeled training sample 112U, generates a predicted label 132, 132PU.
- The predicted label 132PU represents the target model's prediction based on the unlabeled training sample 112U and the model's training to this point.
- Similarly, the target model 130, using the augmented training sample 112A, generates another predicted label 132, 132PA.
- The predicted label 132PA represents the target model's prediction based on the augmented training sample 112A and the model's training to this point.
- The target model 130 typically is not configured to process both the unlabeled training sample 112U and the augmented training sample 112A simultaneously; instead, it processes them sequentially (in either order), first generating a prediction label 132P from one of the unlabeled training sample 112U or the augmented training sample 112A, and then generating a second prediction label 132P from the other.
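- A minimal sketch of this perturb-then-predict step follows, assuming a `model` callable that returns a probability vector and a simple additive-noise perturbation; the patent does not fix a particular augmentation, so crops, flips, or policy-based augmentations would serve the same role.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(sample: np.ndarray, noise_scale: float = 0.05) -> np.ndarray:
    """Perturbs a sample; additive Gaussian noise is one simple choice."""
    return sample + rng.normal(0.0, noise_scale, size=sample.shape)

def predict_pair(model, sample: np.ndarray):
    """Two sequential forward passes: the original sample, then its augmentation."""
    p_u = model(sample)            # predicted label distribution for 112U
    p_a = model(augment(sample))   # predicted label distribution for 112A
    return p_u, p_a
```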
- The active learning model trainer 110 includes an inconsistency determiner 140.
- The inconsistency determiner 140 receives the pair of predicted labels 132PU, 132PA for each unlabeled training sample 112U in the set of unlabeled training samples 112U.
- The inconsistency determiner 140 determines an inconsistency value 142 that represents variance between the predicted label 132PU of the unlabeled training sample 112U and the predicted label 132PA of the augmented training sample 112A. That is, a large inconsistency value 142 indicates that the unlabeled training sample 112U produces a large unsupervised loss when the target model 130 converges.
- Conversely, a small inconsistency value 142 indicates that the unlabeled training sample 112U produces a small unsupervised loss when the target model 130 converges.
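- The patent leaves the exact variance measure open; a symmetric KL divergence between the two predicted label distributions is one plausible instantiation, sketched here.

```python
import numpy as np

def inconsistency(p_u: np.ndarray, p_a: np.ndarray) -> float:
    """Divergence between the predictions for a sample and its augmentation;
    a symmetric KL divergence stands in for the patent's variance measure."""
    eps = 1e-12  # avoids division by zero and log(0)
    kl_ua = np.sum(p_u * np.log((p_u + eps) / (p_a + eps)))
    kl_au = np.sum(p_a * np.log((p_a + eps) / (p_u + eps)))
    return 0.5 * (kl_ua + kl_au)
```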
- A sample selector 150 receives the inconsistency value 142 associated with each of the unlabeled training samples 112U.
- The sample selector 150 sorts the unlabeled training samples 112U in descending order based on the inconsistency values 142 and selects a current set of unlabeled training samples 112UT from the sorted unlabeled training samples 112U. That is, the sample selector 150 selects a threshold number of unlabeled training samples 112UT based on their respective inconsistency values 142 to form the current set of unlabeled training samples 112UT.
- The sample selector 150 obtains, for each unlabeled training sample 112UT, a ground truth label 132G.
- The ground truth labels 132G are labels that are empirically determined by another source.
- An oracle 160 determines the ground truth labels 132G of the unlabeled training samples 112UT.
- The oracle 160 is a human annotator or other human agent.
- The sample selector 150 may send the selected unlabeled training samples 112UT to the oracle 160.
- The oracle 160, in response to receiving the unlabeled training samples 112UT, determines or otherwise obtains the associated ground truth label 132G for each unlabeled training sample 112UT.
- The unlabeled training samples 112UT, combined with the ground truth labels 132G, form labeled training samples 112L and may be stored with other labeled training samples 112L (e.g., the labeled training samples 112L that the model trainer 110 used to initially train the target model 130). That is, the model trainer 110 may select a current set of labeled training samples 112L that includes the selected unlabeled training samples 112UT paired with the corresponding ground truth labels 132G.
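- The sort-and-select step reduces to a few lines; in this sketch, `oracle` is any callable that returns a ground truth label for a sample, and `m` is the threshold number of samples to label.

```python
def select_and_label(samples, inconsistency_values, oracle, m: int):
    """Sorts descending by inconsistency, takes the top m, and queries the
    oracle (e.g., a human annotator) for ground truth labels."""
    order = sorted(range(len(samples)),
                   key=lambda i: inconsistency_values[i], reverse=True)
    selected = [samples[i] for i in order[:m]]
    labeled = [(s, oracle(s)) for s in selected]   # (sample, ground truth) pairs
    remaining = [samples[i] for i in order[m:]]    # the proper subset, still unlabeled
    return labeled, remaining
```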
- The model trainer 110 trains (e.g., retrains or fine-tunes), using the current set of labeled training samples 112L (i.e., the selected unlabeled training samples 112UT and the corresponding ground truth labels 132G), the target model 130.
- The model trainer 110 trains, using the current set of labeled training samples 112L and a proper subset of unlabeled training samples 112UP from the set of unlabeled training samples 112U, the target model 130.
- The proper subset of unlabeled training samples 112UP may include each unlabeled training sample 112U that was not part of any set of unlabeled training samples 112UT (i.e., unlabeled training samples 112U selected to obtain the corresponding ground truth label 132G).
- The model trainer 110 may obtain the proper subset of unlabeled training samples 112UP from the set of unlabeled training samples 112U by removing the threshold number of unlabeled training samples 112UT from the set of unlabeled training samples 112U.
- The model trainer 110 may also include in the training any previously labeled training samples 112L (i.e., from initial labels or from previous active learning cycles). Thus, the model trainer 110 may train the target model 130 on all labeled training samples 112L (i.e., the current set of labeled training samples 112L in addition to any previously labeled training samples 112L) and all remaining unlabeled training samples 112U (i.e., the set of unlabeled training samples 112U minus the selected unlabeled training samples 112UT) via semi-supervised learning. That is, in some examples, the active learning model trainer 110 completely retrains the target model 130 using all of the unlabeled training samples 112U and labeled training samples 112L.
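- The patent does not fix a particular semi-supervised objective; one plausible form, reusing the `augment` and `inconsistency` helpers sketched above, combines a supervised cross-entropy term on the labeled samples with a consistency penalty on the unlabeled ones, with the weight `lam` as an assumption.

```python
import numpy as np

def semi_supervised_loss(model, labeled, unlabeled, lam: float = 1.0) -> float:
    """Supervised cross entropy on (sample, class-index) pairs plus a
    consistency penalty on unlabeled samples; one plausible objective."""
    eps = 1e-12
    supervised = -np.mean([np.log(model(x)[y] + eps) for x, y in labeled])
    consistency = np.mean([inconsistency(model(x), model(augment(x)))
                           for x in unlabeled])
    return supervised + lam * consistency
```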
- Alternatively, the active learning model trainer 110 incrementally retrains the target model 130 using only the newly obtained labeled training samples 112L.
- Here, training the target model 130 may refer to completely retraining the target model 130 from scratch or to some form of retraining/fine-tuning of the target model 130 by conducting additional training (with or without parameter changes, such as freezing the weights of one or more layers, adjusting the learning rate, etc.).
- The model trainer 110 may repeat the process (i.e., perturbing unlabeled training samples 112U, determining inconsistency values 142, selecting unlabeled training samples 112UT, obtaining ground truth labels 132G, etc.) for any number of active learning cycles.
- In some examples, the active learning model trainer 110 repeats training of the target model 130 (and subsequently growing the set of labeled training samples 112L) for a predetermined number of cycles, until the target model 130 reaches a threshold effectiveness, or until a labeling budget is satisfied. In this way, the model trainer 110 gradually increases the number of labeled training samples 112L until the number of samples is sufficient to train the target model 130. A sketch of this outer loop follows.
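- Stitching the sketches above together, the cycle structure might look like the following; `train_fn` stands in for whatever retraining or fine-tuning regime is chosen, and the stopping criteria (cycle count and labeling budget) mirror those just described.

```python
def active_learning(model, train_fn, unlabeled, oracle, m: int,
                    cycles: int, budget: int):
    """Outer active-learning loop: score, select, label, and retrain until
    the cycle count is reached or the labeling budget would be exceeded."""
    labeled = []
    for _ in range(cycles):
        if len(labeled) + m > budget:
            break  # the labeling budget is satisfied
        scores = [inconsistency(*predict_pair(model, x)) for x in unlabeled]
        new_pairs, unlabeled = select_and_label(unlabeled, scores, oracle, m)
        labeled += new_pairs
        model = train_fn(model, labeled, unlabeled)  # semi-supervised retrain
    return model, labeled
```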
- The inconsistency value 142 for each unlabeled training sample 112U in the threshold number of unlabeled training samples 112UT is greater than the inconsistency value 142 for each unlabeled training sample 112U not selected from the sorted unlabeled training samples 112U in the set of unlabeled training samples 112U.
- Referring to FIG. 2, a schematic view 200 shows that the inconsistency determiner 140 sorts the inconsistency values 142, 142a-n from the most inconsistent value 142a (i.e., the highest inconsistency value 142) to the least inconsistent value 142n (i.e., the lowest inconsistency value 142).
- Each inconsistency value 142 has a corresponding unlabeled training sample 112U, 112Ua-n.
- Here, the most inconsistent value 142a corresponds to the unlabeled training sample 112Ua, while the least inconsistent value 142n corresponds to the unlabeled training sample 112Un.
- In this example, the sample selector 150 selects the five unlabeled training samples 112U with the five most inconsistent values 142 as the current set of unlabeled training samples 112UT. It is understood that five is merely exemplary, and the sample selector 150 may select any number of unlabeled training samples 112U.
- The threshold number of unlabeled training samples 112UT may be less than a cardinality of the set of unlabeled training samples 112U.
- In some examples, the sample selector 150 selects a first M number (e.g., five, ten, fifty, etc.) of unlabeled training samples 112U from the sorted unlabeled training samples 112U in the set of unlabeled training samples 112U as the threshold number of unlabeled training samples 112UT.
- The selected unlabeled training samples 112UT are passed to the oracle 160 to retrieve the corresponding ground truth labels 132G.
- Here, the oracle 160 determines a corresponding ground truth label 132G for each of the five unlabeled training samples 112UT.
- The model trainer 110 may now use these five labeled training samples 112L (i.e., the five corresponding pairs of unlabeled training samples 112U and ground truth labels 132G) to train, retrain, or fine-tune the target model 130.
- The model trainer 110 provides initial training of the untrained target model 130 during an initial active learning cycle (i.e., the first active learning cycle).
- Referring to FIG. 3A, an initial set selector 310 randomly selects a random set of unlabeled training samples 112UR from the set of unlabeled training samples 112U.
- The initial set selector 310 also obtains corresponding ground truth labels 132GR for each unlabeled training sample 112UR in the random set of unlabeled training samples 112UR.
- The model trainer 110 may train, using the random set of unlabeled training samples 112UR and the corresponding ground truth labels 132GR (which together form a set of labeled training samples 112LR), the machine learning model 130. That is, in some implementations, prior to the target model 130 receiving any training, the model trainer 110 randomly selects a small set (relative to the entire set) of unlabeled training samples 112UR and obtains the corresponding ground truth labels 132GR to provide initial training of the target model 130.
- The model trainer 110 may identify a candidate set of unlabeled training samples 112UC from the set of unlabeled training samples 112U (e.g., fifty samples, one hundred samples, etc.).
- A cardinality of the candidate set of unlabeled training samples 112UC may be less than a cardinality of the set of unlabeled training samples 112U.
- For example, as shown in a schematic view 300b of FIG. 3B, the initial set selector 310 may receive inconsistency values 142 from the inconsistency determiner 140 based on predicted labels 132PU from the target model 130 and select the candidate set of unlabeled training samples 112UC based on the inconsistency values 142 of each unlabeled training sample 112U. That is, the model trainer 110 identifies the candidate set of unlabeled training samples 112UC by determining the inconsistency value 142 for each unlabeled training sample 112U of the set of unlabeled training samples 112U.
- In some examples, the candidate set of unlabeled training samples 112UC includes the half of the unlabeled training samples 112U in the set of unlabeled training samples 112U with the highest corresponding inconsistency values 142.
- The initial set selector 310 may determine a first cross entropy 320 between a distribution of ground truth labels 132G and a distribution of predicted labels 132PU generated using the machine learning model 130 for the training samples in the candidate set of unlabeled training samples 112UC.
- The initial set selector 310 may also determine a second cross entropy 330 between the distribution of ground truth labels 132G and a distribution of predicted labels 132PU generated by the machine learning model 130 for the training samples in the set of unlabeled training samples 112U.
- That is, the first cross entropy 320 is between the actual label distribution for the candidate set 112UC and the predicted label distribution for the candidate set 112UC, while the second cross entropy 330 is between that same actual label distribution for the candidate set 112UC and the predicted label distribution for the entire set of unlabeled training samples 112U.
- Cross entropy may be thought of, generally, as a measure of the difference between two distributions.
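- Concretely, for a ground truth label distribution p and a predicted label distribution q, the cross entropy is H(p, q) = -sum_i p_i log q_i; a sketch:

```python
import numpy as np

def cross_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """H(p, q) = -sum_i p_i * log(q_i); it is minimized (at the entropy of p)
    when q matches p, and grows as q diverges from p."""
    eps = 1e-12  # guards against log(0)
    return -np.sum(p * np.log(q + eps))
```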
- The initial set selector 310 determines, at step 350, whether the first cross entropy 320 is greater than or equal to the second cross entropy 330.
- That is, the initial set selector 310 determines whether the difference between the actual label distribution and the predicted label distribution for the candidate set 112UC is greater than or equal to the difference between the actual label distribution and the predicted label distribution for the entire set of unlabeled training samples 112U.
- Because the model trainer 110 determines the inconsistency value 142 for each unlabeled training sample 112U of the set of unlabeled training samples 112U, the model trainer 110 selects the unlabeled training samples 112U that the model 130 is most uncertain about (i.e., samples 112U that tend to be far away from the data distribution), and satisfying the cross entropy condition thus indicates better performance.
- When the condition is satisfied, the initial set selector 310 may select the candidate set of unlabeled training samples 112UC as a starting size of the current set of labeled training samples 112L.
- The model trainer 110 may then proceed with subsequent active learning cycles as described above (FIGS. 1 and 2).
- When the first cross entropy 320 is less than the second cross entropy 330, the initial set selector 310 randomly selects an expanded set of training samples 112UE from the unlabeled set of training samples 112U.
- The initial set selector 310 updates the candidate set of unlabeled training samples 112UC to include the expanded set of training samples 112UE randomly selected from the unlabeled set of training samples 112U.
- The initial set selector 310 also updates the unlabeled set of training samples 112U by removing each training sample in the expanded set of training samples 112UE from the unlabeled set of training samples 112U. This ensures that unlabeled training samples 112U are not duplicated.
- The initial set selector 310 may repeat each of the previous steps with the updated candidate set 112UC.
- For example, the initial set selector 310 determines the first cross entropy 320 between the distribution of ground truth labels 132G and the distribution of predicted labels 132P generated using the machine learning model 130 for the training samples in the updated candidate set of unlabeled training samples 112UC.
- The initial set selector 310 also determines the second cross entropy 330 between the distribution of ground truth labels 132G and the distribution of predicted labels 132P generated using the machine learning model 130 for the training samples in the set of unlabeled training samples 112U.
- The initial set selector 310 again determines whether the first cross entropy 320 is greater than or equal to the second cross entropy 330.
- When it is, the initial set selector 310 selects the updated candidate set of unlabeled training samples 112UC as a starting size for initially training the machine learning model 130.
- The initial set selector 310 may continue to iteratively expand the candidate set 112UC until the first cross entropy 320 is greater than or equal to the second cross entropy 330 (i.e., until the comparison indicates that the performance of the target model 130 is sufficient). A sketch of this sizing loop follows.
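- Under the reading above, and reusing the `cross_entropy` helper, the sizing loop can be sketched as follows; `truth_dist` (an estimate of the actual label distribution, e.g., from the initially labeled random set) and `pred_dist` (which aggregates predicted label distributions over a pool) are assumed helpers, and seeding the candidate set by inconsistency, as in FIG. 3B, is omitted for brevity.

```python
import random

def starting_set_size(model, unlabeled, truth_dist, pred_dist, step: int):
    """Grows the candidate set until its cross entropy meets or exceeds the
    cross entropy of the full unlabeled set (the step-350 comparison)."""
    pool = list(unlabeled)
    random.shuffle(pool)            # expansion draws are random
    ce_full = cross_entropy(truth_dist, pred_dist(model, unlabeled))
    candidate = []
    while pool:
        candidate += pool[:step]    # expand the candidate set
        pool = pool[step:]          # remove those samples from the pool
        ce_cand = cross_entropy(truth_dist, pred_dist(model, candidate))
        if ce_cand >= ce_full:
            break                   # candidate set is large enough to start
    return candidate
```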
- FIG. 4 is a flowchart of an exemplary arrangement of operations for a method 400 for active learning via a sample consistency assessment.
- The method 400 includes obtaining, by data processing hardware 12, a set of unlabeled training samples 112U.
- During each of a plurality of active learning cycles and for each unlabeled training sample 112U in the set of unlabeled training samples 112U, the method 400 includes perturbing, by the data processing hardware 12, the unlabeled training sample 112U to generate an augmented training sample 112A.
- The method 400 includes generating, by the data processing hardware 12, using a machine learning model 130 configured to receive the unlabeled training sample 112U and the augmented training sample 112A as inputs, a predicted label 132PU for the unlabeled training sample 112U and a predicted label 132PA for the augmented training sample 112A.
- The method 400 includes determining, by the data processing hardware 12, an inconsistency value 142 for the unlabeled training sample 112U.
- The inconsistency value 142 represents variance between the predicted label 132PU for the unlabeled training sample 112U and the predicted label 132PA for the augmented training sample 112A.
- The method 400, at step 410, includes sorting, by the data processing hardware 12, the unlabeled training samples 112U in the set of unlabeled training samples 112U in descending order based on the inconsistency values 142.
- The method 400 includes obtaining, by the data processing hardware 12, for each unlabeled training sample 112U in a threshold number of unlabeled training samples 112UT selected from the sorted unlabeled training samples 112U in the set of unlabeled training samples 112U, a ground truth label 132G.
- The method 400 includes selecting, by the data processing hardware 12, a current set of labeled training samples 112L, the current set of labeled training samples 112L including each unlabeled training sample 112U in the threshold number of unlabeled training samples 112UT selected from the sorted unlabeled training samples 112U in the set of unlabeled training samples 112U paired with the corresponding obtained ground truth label 132G.
- The method 400 includes training, by the data processing hardware 12, using the current set of labeled training samples 112L and a proper subset of unlabeled training samples 112UP from the set of unlabeled training samples 112U, the machine learning model 130.
- In this way, the model trainer 110 may identify unlabeled training samples 112U that have a high potential for performance improvement relative to the performance improvement of other unlabeled training samples 112U, without increasing (and potentially while reducing) the total labeling cost (e.g., expenditure of computational resources, consumption of human annotator time, etc.).
- The model trainer 110 also determines an appropriate size for an initial or starting set of labeled training samples 112L by using a cost-efficient approach that avoids the overhead stemming from starting with large sets of labeled data samples 112L while also ensuring optimal model performance with a limited number of labeled training samples 112L (i.e., compared to conventional techniques).
- FIG. 5 is a schematic view of an example computing device 500 that may be used to implement the systems and methods described in this document.
- The computing device 500 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers.
- The components shown here, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the inventions described and/or claimed in this document.
- The computing device 500 includes a processor 510, memory 520, a storage device 530, a high-speed interface/controller 540 connecting to the memory 520 and high-speed expansion ports 550, and a low-speed interface/controller 560 connecting to a low-speed bus 570 and the storage device 530.
- Each of the components 510, 520, 530, 540, 550, and 560 is interconnected using various busses and may be mounted on a common motherboard or in other manners as appropriate.
- The processor 510 can process instructions for execution within the computing device 500, including instructions stored in the memory 520 or on the storage device 530, to display graphical information for a graphical user interface (GUI) on an external input/output device, such as a display 580 coupled to the high-speed interface 540.
- In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory.
- Also, multiple computing devices 500 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
- The memory 520 stores information non-transitorily within the computing device 500.
- The memory 520 may be a computer-readable medium, a volatile memory unit(s), or a non-volatile memory unit(s).
- The non-transitory memory 520 may be a physical device used to store programs (e.g., sequences of instructions) or data (e.g., program state information) on a temporary or permanent basis for use by the computing device 500.
- Examples of non-volatile memory include, but are not limited to, flash memory and read-only memory (ROM)/programmable read-only memory (PROM)/erasable programmable read-only memory (EPROM)/electronically erasable programmable read-only memory (EEPROM) (e.g., typically used for firmware, such as boot programs).
- Examples of volatile memory include, but are not limited to, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), and phase change memory (PCM), as well as disks or tapes.
- The storage device 530 is capable of providing mass storage for the computing device 500.
- In some implementations, the storage device 530 is a computer-readable medium.
- In various other implementations, the storage device 530 may be a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory or other similar solid-state memory device, or an array of devices, including devices in a storage area network or other configurations.
- In additional implementations, a computer program product is tangibly embodied in an information carrier.
- The computer program product contains instructions that, when executed, perform one or more methods, such as those described above.
- The information carrier is a computer- or machine-readable medium, such as the memory 520, the storage device 530, or memory on the processor 510.
- The high-speed controller 540 manages bandwidth-intensive operations for the computing device 500, while the low-speed controller 560 manages lower bandwidth-intensive operations. Such an allocation of duties is exemplary only.
- In some implementations, the high-speed controller 540 is coupled to the memory 520, to the display 580 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 550, which may accept various expansion cards (not shown).
- In some implementations, the low-speed controller 560 is coupled to the storage device 530 and a low-speed expansion port 590.
- The low-speed expansion port 590, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
- The computing device 500 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 500a, or multiple times in a group of such servers 500a, as a laptop computer 500b, or as part of a rack server system 500c.
- Various implementations of the systems and techniques described herein can be realized in digital electronic and/or optical circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
- These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
- A software application may refer to computer software that causes a computing device to perform a task.
- In some examples, a software application may be referred to as an "application," an "app," or a "program."
- Example applications include, but are not limited to, system diagnostic applications, system management applications, system maintenance applications, word processing applications, spreadsheet applications, messaging applications, media streaming applications, social networking applications, and gaming applications.
- The processes and logic flows described in this specification can be performed by one or more programmable processors, also referred to as data processing hardware, executing one or more computer programs to perform functions by operating on input data and generating output.
- The processes and logic flows can also be performed by special-purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
- Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
- Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both.
- The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
- Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical, or optical disks.
- However, a computer need not have such devices.
- Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
- The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.
- To provide for interaction with a user, one or more aspects of the disclosure can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, or a touch screen for displaying information to the user, and optionally a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
- Other kinds of devices can be used to provide interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
Abstract
A method includes obtaining a set of unlabeled training samples. For each training sample in the set of unlabeled training samples, the method includes generating, using a machine learning model and the training sample, a corresponding first prediction; generating, using the machine learning model and a modified unlabeled training sample, a second prediction, the modified unlabeled training sample based on the training sample; and determining a difference between the first prediction and the second prediction. The method includes selecting, based on the differences, a subset of the set of unlabeled training samples. For each training sample in the subset of the set of unlabeled training samples, the method includes obtaining a ground truth label for the training sample and generating a corresponding labeled training sample based on the training sample paired with the ground truth label. The method includes training the machine learning model using the corresponding labeled training samples.
Description
- This U.S. patent application is a continuation of, and claims priority under 35 U.S.C. § 120 from, U.S. patent application Ser. No. 17/000,094, filed on Aug. 21, 2020, which claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application 62/890,379, filed on Aug. 22, 2019. The disclosures of these prior applications are considered part of the disclosure of this application and are hereby incorporated by reference in their entireties.
- This disclosure relates to active learning, such as active learning using a sample consistency assessment.
- Generally, supervised machine learning models require large amounts of labeled training data in order to accurately predict results. However, while obtaining large amounts of unlabeled data is often easy, labeling the data is frequently very difficult. That is, labeling vast quantities of data is often inordinately expensive, if not outright impossible. Thus, active learning is a popular type of machine learning that allows for the prioritization of unlabeled data in order to train a model only on the data that will have the highest impact (i.e., the greatest increase in accuracy). Typically, an active learning algorithm is first trained on a small subset of labeled data and then may actively query a teacher to label select unlabeled training samples. The process of selecting the unlabeled training samples is an active field of study.
- One aspect of the disclosure provides a method for active learning via a sample consistency assessment. The method includes obtaining, by data processing hardware, a set of unlabeled training samples. During each of a plurality of active learning cycles and for each unlabeled training sample in the set of unlabeled training samples, the method includes perturbing, by the data processing hardware, the unlabeled training sample to generate an augmented training sample. The method also includes generating, by the data processing hardware, using a machine learning model configured to receive the unlabeled training sample and the augmented training sample as inputs, a predicted label for the unlabeled training sample and a predicted label for the augmented training sample, and determining, by the data processing hardware, an inconsistency value for the unlabeled training sample. The inconsistency value represents variance between the predicted label for the unlabeled training sample and the predicted label for the augmented training sample. The method also includes sorting, by the data processing hardware, the unlabeled training samples in the set of unlabeled training samples in descending order based on the inconsistency values and obtaining, by the data processing hardware, for each unlabeled training sample in a threshold number of unlabeled training samples selected from the sorted unlabeled training samples in the set of unlabeled training samples, a ground truth label. The method includes selecting, by the data processing hardware, a current set of labeled training samples. The current set of labeled training samples includes each unlabeled training sample in the threshold number of unlabeled training samples selected from the sorted unlabeled training samples in the set of unlabeled training samples paired with the corresponding obtained ground truth label. The method also includes training, by the data processing hardware, using the current set of labeled training samples and a proper subset of unlabeled training samples from the set of unlabeled training samples, the machine learning model.
- Implementations of the disclosure may include one or more of the following optional features. In some implementations, the threshold number of unlabeled training samples is less than a cardinality of the set of unlabeled training samples. The inconsistency value for each unlabeled training sample in the threshold number of unlabeled training samples may be greater than the inconsistency value for each unlabeled training sample not selected from the sorted unlabeled training samples in the set of unlabeled training samples.
- Optionally, the method further includes obtaining, by the data processing hardware, the proper subset of unlabeled training samples from the set of unlabeled training samples by removing the threshold number of unlabeled training samples from the set of unlabeled training samples. The method may further include selecting, by the data processing hardware, a first M number of unlabeled training samples from the sorted unlabeled training samples in the set of unlabeled training samples as the threshold number of unlabeled training samples.
- In some examples, the method further includes, during an initial active learning cycle, randomly selecting, by the data processing hardware, a random set of unlabeled training samples from the set of unlabeled training samples and obtaining, by the data processing hardware, corresponding ground truth labels for each unlabeled training sample in the random set of unlabeled training samples. The method may also further include training, by the data processing hardware, using the random set of unlabeled training samples and the corresponding ground truth labels, the machine learning model. This example may include, during the initial active learning cycle, identifying, by the data processing hardware, a candidate set of unlabeled training samples from the set of unlabeled training samples. A cardinality of the candidate set of unlabeled training samples may be less than a cardinality of the set of unlabeled training samples. The method may also further include determining, by the data processing hardware, a first cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the candidate set of unlabeled training samples and determining, by the data processing hardware, a second cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the set of unlabeled training samples. The method may also further include determining, by the data processing hardware, whether the first cross entropy is greater than or equal to the second cross entropy and, when the first cross entropy is greater than or equal to the second cross entropy, selecting, by the data processing hardware, the candidate set of unlabeled training samples as a starting size for initially training the machine learning model. Identifying the candidate set of unlabeled training samples from the set of unlabeled training samples, in some implementations, includes determining the inconsistency value for each unlabeled training sample of the set of unlabeled training samples.
- In some implementations, the method may further include, when the first cross entropy is less than the second cross entropy, randomly selecting, by the data processing hardware, an expanded set of training samples from the unlabeled set of training samples and updating, by the data processing hardware, the candidate set of unlabeled training samples to include the expanded set of training samples randomly selected from the unlabeled set of training samples. The method may also further include updating, by the data processing hardware, the unlabeled set of training samples by removing each training sample from the expanded set of training samples from the unlabeled set of training samples. During an immediately subsequent active learning cycle, the method may also further include determining, by the data processing hardware, the first cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the training samples in the updated candidate set of unlabeled training samples and determining, by the data processing hardware, the second cross entropy between the distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the training samples in the updated candidate set of unlabeled training samples. The method may also further include determining, by the data processing hardware, whether the first cross entropy is greater than or equal to the second cross entropy. When the first cross entropy is greater than or equal to the second cross entropy, the method may further include selecting, by the data processing hardware, the updated candidate set of unlabeled training samples as a starting size for initially training the machine learning model. In some examples, the machine learning model includes a convolutional neural network.
- Another aspect of the disclosure provides data processing hardware and memory hardware in communication with the data processing hardware. The memory hardware stores instructions that when executed on the data processing hardware cause the data processing hardware to perform operations. The operations include obtaining a set of unlabeled training samples. During each of a plurality of active learning cycles and for each unlabeled training sample in the set of unlabeled training samples, the operations include perturbing the unlabeled training sample to generate an augmented training sample. The operations also include generating, using a machine learning model configured to receive the unlabeled training sample and the augmented training sample as inputs, a predicted label for the unlabeled training sample and a predicted label for the augmented training sample, and determining an inconsistency value for the unlabeled training sample. The inconsistency value represents variance between the predicted label for the unlabeled training sample and the predicted label for the augmented training sample. The operations also include sorting the unlabeled training samples in the set of unlabeled training samples in descending order based on the inconsistency values and obtaining, for each unlabeled training sample in a threshold number of unlabeled training samples selected from the sorted unlabeled training samples in the set of unlabeled training samples, a ground truth label. The operations include selecting a current set of labeled training samples. The current set of labeled training samples includes each unlabeled training sample in the threshold number of unlabeled training samples selected from the sorted unlabeled training samples in the set of unlabeled training samples paired with the corresponding obtained ground truth label. The operations also include training, using the current set of labeled training samples and a proper subset of unlabeled training samples from the set of unlabeled training samples, the machine learning model.
- This aspect may include one or more of the following optional features. In some implementations, the threshold number of unlabeled training samples is less than a cardinality of the set of unlabeled training samples. The inconsistency value for each unlabeled training sample in the threshold number of unlabeled training samples may be greater than the inconsistency value for each unlabeled training sample not selected from the sorted unlabeled training samples in the set of unlabeled training samples.
- Optionally, the operations further include obtaining the proper subset of unlabeled training samples from the set of unlabeled training samples by removing the threshold number of unlabeled training samples from the set of unlabeled training samples. The operations may further include selecting a first M number of unlabeled training samples from the sorted unlabeled training samples in the set of unlabeled training samples as the threshold number of unlabeled training samples.
- In some examples, the operations further include, during an initial active learning cycle, randomly selecting a random set of unlabeled training samples from the set of unlabeled training samples and obtaining corresponding ground truth labels for each unlabeled training sample in the random set of unlabeled training samples. The operations may also further include training, using the random set of unlabeled training samples and the corresponding ground truth labels, the machine learning model. This example may include, during the initial active learning cycle, identifying a candidate set of unlabeled training samples from the set of unlabeled training samples. A cardinality of the candidate set of unlabeled training samples may be less than a cardinality of the set of unlabeled training samples. The operations may also further include determining a first cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the candidate set of unlabeled training samples and determining a second cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the set of unlabeled training samples. The operations may also further include determining whether the first cross entropy is greater than or equal to the second cross entropy and, when the first cross entropy is greater than or equal to the second cross entropy, selecting the candidate set of unlabeled training samples as a starting size for initially training the machine learning model. Identifying the candidate set of unlabeled training samples from the set of unlabeled training samples, in some implementations, includes determining the inconsistency value for each unlabeled training sample of the set of unlabeled training samples.
- In some implementations, the operations may further include, when the first cross entropy is less than the second cross entropy, randomly selecting an expanded set of training samples from the unlabeled set of training samples and updating the candidate set of unlabeled training samples to include the expanded set of training samples randomly selected from the unlabeled set of training samples. The operations may also further include updating the unlabeled set of training samples by removing each training sample from the expanded set of training samples from the unlabeled set of training samples. During an immediately subsequent active learning cycle, the operations may also further include determining the first cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the training samples in the updated candidate set of unlabeled training samples and determining the second cross entropy between the distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the training samples in the updated candidate set of unlabeled training samples. The operations may also further include determining whether the first cross entropy is greater than or equal to the second cross entropy. When the first cross entropy is greater than or equal to the second cross entropy, the operations may further include selecting the updated candidate set of unlabeled training samples as a starting size for initially training the machine learning model. In some examples, the machine learning model includes a convolutional neural network.
- The details of one or more implementations of the disclosure are set forth in the accompanying drawings and the description below. Other aspects, features, and advantages will be apparent from the description and drawings, and from the claims.
- FIG. 1 is a schematic view of an example system for training an active learning model.
- FIG. 2 is a schematic view of example components of the system of FIG. 1.
- FIGS. 3A-3C are schematic views of components for determining an initial starting size of labeled training samples.
- FIG. 4 is a flowchart of an example arrangement of operations for a method of active learning via a sample consistency assessment.
- FIG. 5 is a schematic view of an example computing device that may be used to implement the systems and methods described herein.
- Like reference symbols in the various drawings indicate like elements.
- As acquiring vast quantities of data becomes cheaper and easier, advances in machine learning capitalize on this by training models with deep learning methods on large amounts of data. However, this raises new challenges because the data is typically unlabeled and must be labeled before use with supervised or semi-supervised learning models. Conventionally, training data is labeled by human operators. For example, when preparing training samples for a model that performs object detection on frames of image data, an expert annotator (e.g., a trained human) may label frames of image data by drawing a bounding box around pedestrians. When the quantity of data is vast, manually labeling the data is expensive at best and impossible at worst.
- One popular approach to the data labeling problem is active learning. In active learning, the model is allowed to proactively select a subset of training samples from a set of unlabeled training samples and request that the subset be labeled by an "oracle," e.g., an expert annotator or any other entity that can accurately label the selected samples (i.e., provide the "ground truth" label). That is, active learning modules dynamically pose queries during training to actively select which samples to train on. Active learning has the potential to greatly reduce the overhead of labeling data while simultaneously increasing accuracy with substantially fewer labeled training samples.
- In order to select samples that are useful for improving the target model, selection methods typically depend on outputs and/or intermediate features of the target model to evaluate unlabeled samples. For example, one method may use the entropy of the output to measure uncertainty. Another method may ensure that the selected samples cover a large range of diversity. Yet another method may use predicted loss to attempt to select the most valuable samples. However, all of these methods struggle when applied to convolutional neural networks (CNNs) with a small labeling budget, because accurate CNN models typically need a large set of labeled data.
- Implementations herein are directed toward an active learning model trainer that trains a model (e.g., a CNN model) without introducing additional labeling cost. The trainer uses unlabeled data to improve the quality of the trained model while keeping the number of labeled samples small. The trainer is based upon the assumption that a model should be consistent in its decisions between a sample and a meaningfully distorted version of the same sample (i.e., a consistency of predictions).
- Referring to FIG. 1, in some implementations, an example system 100 includes a processing system 10. The processing system 10 may be a single computer, multiple computers, or a distributed system (e.g., a cloud environment) having fixed or scalable/elastic computing resources 12 (e.g., data processing hardware) and/or storage resources 14 (e.g., memory hardware). The processing system 10 executes an active learning model trainer 110. The model trainer 110 trains a target model 130 (e.g., a machine learning model) to make predictions based on input data. For example, the model trainer 110 trains a convolutional neural network (CNN). The model trainer 110 trains the target model 130 on a set of unlabeled training samples 112, 112U. An unlabeled training sample refers to data that does not include any annotations or other indications of the correct result for the target model 130, in contrast to labeled data, which does include such annotations. For example, labeled data for a target model 130 that is trained to transcribe audio data includes the audio data as well as a corresponding accurate transcription of the audio data. Unlabeled data for the same target model 130 would include the audio data without the transcription. With labeled data, the target model 130 may make a prediction based on a training sample and then easily compare the prediction to the label serving as a ground truth to determine how accurate the prediction was. In contrast, such feedback is not available with unlabeled data.
- The unlabeled training samples 112U may be representative of whatever data the target model 130 requires to make its predictions. For example, the unlabeled training data may include frames of image data (e.g., for object detection or classification, etc.), frames of audio data (e.g., for transcription or speech recognition, etc.), and/or text (e.g., for natural language classification, etc.). The unlabeled training samples 112U may be stored on the processing system 10 (e.g., within memory hardware 14) or received, via a network or other communication channel, from another entity.
- The model trainer 110 includes a sample perturber 120. The sample perturber 120 receives each unlabeled training sample 112U in the set of unlabeled training samples 112U and perturbs each unlabeled training sample 112U to generate a corresponding augmented training sample 112, 112A. That is, the sample perturber 120 introduces small but meaningful changes to each unlabeled training sample 112U. For example, the sample perturber 120 increases or decreases values by a predetermined or random amount to generate a pair of training samples 112 that includes the original unlabeled training sample 112U and the corresponding augmented (i.e., perturbed) training sample 112A. As another example, when the unlabeled training sample 112U includes a frame of image data, the sample perturber 120 may rotate the image, flip the image, crop the image, etc. The sample perturber 120 may use any other conventional means of perturbing the data as well.
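- For illustration only, the following is a minimal sketch of the kind of distortion a perturber such as the sample perturber 120 might apply to an image-like sample. The function name, the choice of a horizontal flip, and the noise magnitude are assumptions made for this example rather than distortions prescribed by the disclosure.

```python
import numpy as np

def perturb(sample: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Introduce a small but meaningful change to one unlabeled sample.

    Illustrative assumptions: the sample is an image normalized to [0, 1],
    and the distortion is a horizontal flip plus mild Gaussian value jitter.
    """
    augmented = np.fliplr(sample)                     # geometric distortion
    noise = rng.normal(0.0, 0.02, size=sample.shape)  # small value perturbation
    return np.clip(augmented + noise, 0.0, 1.0)       # keep a valid pixel range
```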
- As discussed in more detail below, the target model 130 (i.e., the machine learning model the active learning model trainer 110 is training) is initially trained on a small set of labeled training samples 112, 112L and/or unlabeled training samples 112U. This quickly provides the target model 130 with rough initial prediction capabilities. This minimally trained target model 130 receives, for each unlabeled training sample 112U, the unlabeled training sample 112U and the corresponding augmented training sample 112A. The target model 130, using the unlabeled training sample 112U, generates a predicted label 132, 132PU. The predicted label 132PU represents the target model's prediction based on the unlabeled training sample 112U and the model's training to this point. The target model 130, using the augmented training sample 112A, generates another predicted label 132, 132PA. The predicted label 132PA represents the target model's prediction based on the augmented training sample 112A and the model's training to this point. Note that the target model 130 typically is not configured to process both the unlabeled training sample 112U and the augmented training sample 112A simultaneously, and instead processes them sequentially (in either order): first generating a prediction label 132P with either one of the unlabeled training sample 112U or the augmented training sample 112A, and second generating another prediction label 132P with the other one.
- The active learning model trainer 110 includes an inconsistency determiner 140. The inconsistency determiner 140 receives both predictions 132PU, 132PA for each pair of samples 112 for each unlabeled training sample 112U in the set of unlabeled training samples 112U. The inconsistency determiner 140 determines an inconsistency value 142 that represents variance between the predicted label 132PU of the unlabeled training sample 112U and the predicted label 132PA of the augmented training sample 112A. That is, a large inconsistency value 142 indicates that the unlabeled training sample 112U produces a large unsupervised loss when the target model 130 converges. Conversely, a small inconsistency value 142 indicates that the unlabeled training sample 112U produces a small unsupervised loss when the target model 130 converges. In some examples, the greater the difference between the predicted labels 132PU, 132PA, the greater the associated inconsistency value 142.
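- The disclosure does not fix a particular formula for the inconsistency value 142; it only requires a value that grows with the variance between the two predicted labels. As one hedged example, a symmetric KL divergence between the two predicted label distributions could serve as that value; the function below is a sketch under that assumption.

```python
import numpy as np

def inconsistency_value(pred_unlabeled: np.ndarray,
                        pred_augmented: np.ndarray,
                        eps: float = 1e-12) -> float:
    """Score disagreement between two predicted label distributions.

    Symmetric KL divergence is one plausible metric; any measure that
    increases with the difference between the predictions would fit.
    """
    p = pred_unlabeled + eps  # avoid log(0)
    q = pred_augmented + eps
    kl_pq = float(np.sum(p * np.log(p / q)))
    kl_qp = float(np.sum(q * np.log(q / p)))
    return 0.5 * (kl_pq + kl_qp)
```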
- A sample selector 150 receives the inconsistency value 142 associated with each of the unlabeled training samples 112U. The sample selector 150 sorts the unlabeled training samples 112U in descending order based on the inconsistency values 142 and selects a current set of unlabeled training samples 112UT from the sorted unlabeled training samples 112U. That is, the sample selector 150 selects a threshold number of unlabeled training samples 112UT based on their respective inconsistency values 142 to form a current set of unlabeled training samples 112UT. The sample selector 150 obtains, for each unlabeled training sample 112UT, a ground truth label 132G. The ground truth labels 132G are labels that are empirically determined by another source. In some implementations, an oracle 160 determines the ground truth labels 132G of the unlabeled training samples 112UT. Optionally, the oracle 160 is a human annotator or other human agent.
- The sample selector 150 may send the selected unlabeled training samples 112UT to the oracle 160. The oracle 160, in response to receiving the unlabeled training samples 112UT, determines or otherwise obtains the associated ground truth label 132G for each unlabeled training sample 112UT. The unlabeled training samples 112UT, combined with the ground truth labels 132G, form labeled training samples 112L and may be stored with other labeled training samples 112L (e.g., the labeled training samples 112L that the model trainer 110 used to initially train the target model 130). That is, the model trainer 110 may select a current set of labeled training samples 112L that includes the selected unlabeled training samples 112UT paired with the corresponding ground truth labels 132G.
- The model trainer 110 trains (e.g., retrains or fine-tunes), using the current set of labeled training samples 112L (i.e., the selected unlabeled training samples 112UT and the corresponding ground truth labels 132G), the target model 130. In some implementations, the model trainer 110 trains, using the current set of labeled training samples 112L and a proper subset of unlabeled training samples 112UP from the set of unlabeled training samples 112U, the target model 130. The proper subset of unlabeled training samples 112UP may include each unlabeled training sample 112U that was not part of any set of unlabeled training samples 112UT (i.e., unlabeled training samples 112U selected to obtain the corresponding ground truth label 132G). Put another way, the model trainer 110 may obtain the proper subset of unlabeled training samples 112UP from the set of unlabeled training samples 112U by removing the threshold number of unlabeled training samples 112UT from the set of unlabeled training samples 112U.
- The model trainer 110 may also include in the training any previously labeled training samples 112L (i.e., from initial labels or from previous active learning cycles). Thus, the model trainer 110 may train the target model 130 on all labeled training samples 112L (i.e., the current set of labeled training samples 112L in addition to any previously labeled training samples 112L) and all remaining unlabeled training samples 112U (i.e., the set of unlabeled training samples 112U minus the selected unlabeled training samples 112UT) via semi-supervised learning. That is, in some examples, the active learning model trainer 110 completely retrains the target model 130 using all of the unlabeled training samples 112U and labeled training samples 112L. In other examples, the active learning model trainer 110 incrementally retrains the target model 130 using only the newly obtained labeled training samples 112L. As used herein, training the target model 130 may refer to completely retraining the target model 130 from scratch or to some form of retraining/fine-tuning the target model 130 by conducting additional training (with or without parameter changes, such as freezing the weights of one or more layers, adjusting the learning rate, etc.).
- The model trainer 110 may repeat the process (i.e., perturbing unlabeled training samples 112U, determining inconsistency values 142, selecting unlabeled training samples 112UT, obtaining ground truth labels 132G, etc.) for any number of active learning cycles. For example, the active learning model trainer 110 repeats training of the target model 130 (and subsequently grows the set of labeled training samples 112L) for a predetermined number of cycles, until the target model 130 reaches a threshold effectiveness, or until a labeling budget is satisfied. In this way, the model trainer 110 gradually increases the number of labeled training samples 112L until the number of samples is sufficient to train the target model 130.
- Referring now to FIG. 2, in some examples, the inconsistency value 142 for each unlabeled training sample 112U in the threshold number of unlabeled training samples 112UT is greater than the inconsistency value 142 for each unlabeled training sample 112U not selected from the sorted unlabeled training samples 112U in the set of unlabeled training samples 112U. In this example, a schematic view 200 shows that the inconsistency determiner 140 sorts the inconsistency values 142, 142a-n from the most inconsistent value 142a (i.e., the highest inconsistency value 142) to the least inconsistent value 142n (i.e., the lowest inconsistency value). Each inconsistency value 142 has a corresponding unlabeled training sample 112U, 112Ua-n. Here, the most inconsistent value 142a corresponds to the unlabeled training sample 112Ua while the least inconsistent value 142n corresponds to the unlabeled training sample 112Un. In this example, the sample selector 150 selects the five unlabeled training samples 112U with the five most inconsistent values 142 as the current set of unlabeled training samples 112UT. It is understood that five is merely exemplary, and the sample selector 150 may select any number of unlabeled training samples 112U. Thus, the threshold number of unlabeled training samples 112UT may be less than a cardinality of the set of unlabeled training samples 112U. In some implementations, the sample selector 150 selects a first M number (e.g., five, ten, fifty, etc.) of unlabeled training samples 112U from the sorted unlabeled training samples 112U in the set of unlabeled training samples 112U as the threshold number of unlabeled training samples 112UT.
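- The sorting and threshold selection just described can be sketched as follows. The value m=5 mirrors the FIG. 2 example, and representing the sample set as a plain list is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple

def select_most_inconsistent(
    samples: List,            # unlabeled training samples 112U
    scores: Sequence[float],  # corresponding inconsistency values 142
    m: int = 5,               # threshold number of samples 112UT
) -> Tuple[List, List]:
    """Sort by descending inconsistency and split off the first M samples."""
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    selected = [samples[i] for i in order[:m]]   # sent to the oracle 160
    remaining = [samples[i] for i in order[m:]]  # proper subset 112UP
    return selected, remaining
```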
- The selected unlabeled training samples 112UT are passed to the oracle 160 to retrieve the corresponding ground truth labels 132G. Continuing with the illustrated example, the oracle 160 determines a corresponding ground truth label 132G for each of the five unlabeled training samples 112UT. The model trainer 110 may now use these five labeled training samples 112L (i.e., the five corresponding pairs of unlabeled training samples 112U and ground truth labels 132G) to train, retrain, or fine-tune the target model 130.
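- Combining the sketches above, one possible shape for the repeated active learning cycle is shown below. The model.predict and oracle.label methods and the train callable are hypothetical stand-ins for the target model 130, the oracle 160, and the semi-supervised trainer; none of these names comes from the disclosure itself.

```python
def active_learning_loop(model, unlabeled, oracle, train,
                         budget: int, m: int, rng):
    """Run active learning cycles until the labeling budget is spent.

    Relies on the illustrative perturb, inconsistency_value, and
    select_most_inconsistent sketches defined above.
    """
    labeled = []
    while budget > 0 and unlabeled:
        scores = [
            inconsistency_value(model.predict(x),
                                model.predict(perturb(x, rng)))
            for x in unlabeled
        ]
        selected, unlabeled = select_most_inconsistent(unlabeled, scores, m)
        labeled += [(x, oracle.label(x)) for x in selected]  # ground truth 132G
        budget -= len(selected)
        train(model, labeled, unlabeled)  # semi-supervised retraining
    return model
```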
- Referring now to FIGS. 3A-3C, in some examples, the model trainer 110 provides initial training of the untrained target model 130 during an initial active learning cycle (i.e., the first active learning cycle). As shown in schematic view 300a (FIG. 3A), in some implementations, an initial set selector 310 randomly selects a random set of unlabeled training samples 112UR from the set of unlabeled training samples 112U. The initial set selector 310 also obtains corresponding ground truth labels 132GR for each unlabeled training sample 112UR in the random set of unlabeled training samples 112UR. The model trainer 110 may train, using the random set of unlabeled training samples 112UR and the corresponding ground truth labels 132GR (which together form a set of labeled training samples 112LR), the machine learning model 130. That is, in some implementations, prior to the target model 130 receiving any training, the model trainer 110 randomly selects a small set (relative to the entire set) of unlabeled training samples 112UR and obtains the corresponding ground truth labels 132GR to provide initial training of the target model 130.
- Because the random set of unlabeled training samples 112UR is both random and small, the training of the target model 130 is likely insufficient. To further refine a starting set of labeled training samples 112L to initially train the target model 130, the model trainer 110 may identify a candidate set of unlabeled training samples 112UC from the set of unlabeled training samples 112U (e.g., fifty samples, one hundred samples, etc.). A cardinality of the candidate set of training samples 112UC may be less than a cardinality of the set of unlabeled training samples 112U. For example, as shown in a schematic view 300b of FIG. 3B, the initial set selector 310 may receive inconsistency values 142 from the inconsistency determiner 140 based on predicted labels 132PU from the target model 130 and select the candidate set of unlabeled training samples 112UC based on the inconsistency values 142 of each unlabeled training sample 112U. That is, the model trainer 110 identifies the candidate set of unlabeled training samples 112UC by determining the inconsistency value 142 for each unlabeled training sample 112U of the set of unlabeled training samples 112U. Optionally, the candidate set of unlabeled training samples 112UC includes the half of the unlabeled training samples 112U in the set of unlabeled training samples 112U with the highest corresponding inconsistency values 142.
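- Under the optional top-half rule just described, the candidate set could be drawn as in this sketch, reusing the illustrative scoring helper from above; the function name and list representation are assumptions of the example.

```python
def candidate_set_by_inconsistency(samples, scores):
    """Keep the half of the samples with the highest inconsistency values."""
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    half = len(samples) // 2
    return [samples[i] for i in order[:half]]
```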
- After receiving corresponding ground truth labels 132GC, the initial set selector 310 may determine a first cross entropy 320 between a distribution of ground truth labels 132G and a distribution of predicted labels 132PU generated using the machine learning model 130 for the training samples in the candidate set of unlabeled training samples 112UC. The initial set selector 310 may also determine a second cross entropy 330 between the distribution of ground truth labels 132G and a distribution of predicted labels 132PU generated by the machine learning model 130 for the training samples in the set of unlabeled training samples 112U. That is, the first cross entropy 320 is between the actual label distribution for the candidate set 112UC and the predicted label distribution for the candidate set 112UC, while the second cross entropy 330 is between the same actual label distribution for the candidate set 112UC as the first cross entropy 320 and the predicted label distribution for the entire set of unlabeled training samples 112U. Cross entropy may be thought of generally as a calculation of the difference between two distributions.
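- As a concrete reading of that calculation, the cross entropy H(p, q) = -Σ p(c) log q(c) between a ground truth label distribution p and a predicted label distribution q could be computed as below. Treating both distributions as per-class frequency vectors over a shared set of classes is an assumption of this sketch, as are the variable names in the commented usage.

```python
import numpy as np

def cross_entropy(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """H(p, q) = -sum_c p(c) * log q(c) over a shared set of classes."""
    return float(-np.sum(p * np.log(q + eps)))

# First cross entropy 320: ground truth vs. predictions on the candidate set.
#   first_ce = cross_entropy(gt_candidate, pred_candidate)
# Second cross entropy 330: the same ground truth distribution vs. predictions
# over the entire set of unlabeled training samples.
#   second_ce = cross_entropy(gt_candidate, pred_full_set)
```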
- Referring now to FIG. 3C and decision tree 300c, in some implementations, the initial set selector 310 determines whether the first cross entropy 320 is greater than or equal to the second cross entropy 330 at step 350. In this scenario, the difference between the actual label distribution and the predicted label distribution for the candidate set 112UC is greater than or equal to the difference between the actual label distribution and the predicted label distributions for the entire set of unlabeled training samples 112U. When the candidate set 112UC is selected at least in part based on the largest inconsistency values 142 (i.e., the model trainer 110 determines the inconsistency value 142 for each unlabeled training sample 112U of the set of unlabeled training samples 112U), the model trainer 110 is selecting unlabeled training samples 112U that the model 130 is most uncertain about (i.e., samples 112U that tend to be far away from the data distribution), which indicates better performance.
- Because of this indication, when the first cross entropy 320 is greater than or equal to the second cross entropy 330, at step 360, the initial set selector 310 may select the candidate set of unlabeled training samples 112UC as a starting size of the current set of labeled training samples 112L. With the target model 130 initially trained, the model trainer 110 may proceed with subsequent active learning cycles as described above (FIGS. 1 and 2).
- When the first cross entropy 320 is less than the second cross entropy 330 (i.e., an indication of poor target model 130 performance), the current candidate set 112UC is inadequate for initial training of the target model 130. In this example, the initial set selector 310, at step 370, randomly selects an expanded set of training samples 112UE from the unlabeled set of training samples 112U. At step 380, the initial set selector 310 updates the candidate set of unlabeled training samples 112UC to include the expanded set of training samples 112UE randomly selected from the unlabeled set of training samples 112U. In some examples, the initial set selector 310 updates the unlabeled set of training samples 112U by removing each training sample of the expanded set of training samples 112UE from the unlabeled set of training samples 112U. This ensures that unlabeled training samples 112U are not duplicated.
- During an immediately subsequent active learning cycle (i.e., the next active learning cycle), at step 390, the initial set selector 310 may repeat each of the previous steps with the updated candidate set 112UC. For example, the initial set selector 310 determines the first cross entropy 320 between the distribution of ground truth labels 132G and the distribution of predicted labels 132P generated using the machine learning model 130 for the training samples in the updated candidate set of unlabeled training samples 112UC. The initial set selector 310 also determines the second cross entropy 330 between the distribution of ground truth labels 132G and the distribution of predicted labels 132P generated using the machine learning model 130 for the training samples in the updated candidate set of unlabeled training samples 112UC. The initial set selector 310 again determines whether the first cross entropy 320 is greater than or equal to the second cross entropy 330. When the first cross entropy 320 is greater than or equal to the second cross entropy 330, the initial set selector 310 selects the updated candidate set of unlabeled training samples 112UC as a starting size for initially training the machine learning model 130. When the first cross entropy 320 is less than the second cross entropy 330, the initial set selector 310 may continue to iteratively expand the candidate set 112UC until the first cross entropy 320 is greater than or equal to the second cross entropy 330 (i.e., until the target model 130 performance is sufficient).
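- A minimal sketch of this iterative sizing loop follows. The cross_entropies and expand helpers are assumed interfaces invented for the example: the first returns the pair of cross entropies 320 and 330 for a given candidate set, and the second randomly draws an expansion set from the remaining unlabeled samples.

```python
def choose_starting_set(candidate, unlabeled, cross_entropies, expand, rng):
    """Grow the candidate set until the first cross entropy is at least
    the second, then return it as the starting set."""
    while True:
        first_ce, second_ce = cross_entropies(candidate)
        if first_ce >= second_ce:  # model performance deemed sufficient
            return candidate
        extra = expand(unlabeled, rng)                        # expansion set 112UE
        candidate = candidate + extra                         # updated candidate set
        unlabeled = [x for x in unlabeled if x not in extra]  # avoid duplicates
```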
- FIG. 4 is a flowchart of an exemplary arrangement of operations for a method 400 for active learning via a sample consistency assessment. The method 400, at step 402, includes obtaining, by data processing hardware 12, a set of unlabeled training samples 112U. During each of a plurality of active learning cycles, for each unlabeled training sample 112U in the set of unlabeled training samples 112U, the method 400, at step 404, includes perturbing, by the data processing hardware 12, the unlabeled training sample 112U to generate an augmented training sample 112A. At step 406, the method 400 includes generating, by the data processing hardware 12, using a machine learning model 130 configured to receive the unlabeled training sample 112U and the augmented training sample 112A as inputs, a predicted label 132PU for the unlabeled training sample 112U and a predicted label 132PA for the augmented training sample 112A.
- At step 408, the method 400 includes determining, by the data processing hardware 12, an inconsistency value 142 for the unlabeled training sample 112U. The inconsistency value 142 represents variance between the predicted label 132PU for the unlabeled training sample 112U and the predicted label 132PA for the augmented training sample 112A. The method 400, at step 410, includes sorting, by the data processing hardware 12, the unlabeled training samples 112U in the set of unlabeled training samples 112U in descending order based on the inconsistency values 142.
- At step 412, the method 400 includes obtaining, by the data processing hardware 12, for each unlabeled training sample 112U in a threshold number of unlabeled training samples 112UT selected from the sorted unlabeled training samples 112U in the set of unlabeled training samples 112U, a ground truth label 132G. The method 400, at step 414, includes selecting, by the data processing hardware 12, a current set of labeled training samples 112L, the current set of labeled training samples 112L including each unlabeled training sample 112U in the threshold number of unlabeled training samples 112UT selected from the sorted unlabeled training samples 112U in the set of unlabeled training samples 112U paired with the corresponding obtained ground truth label 132G. At step 416, the method 400 includes training, by the data processing hardware 12, using the current set of labeled training samples 112L and a proper subset of unlabeled training samples 112UP from the set of unlabeled training samples 112U, the machine learning model 130.
- Thus, the model trainer 110 may identify unlabeled training samples 112U that have a high potential for performance improvement relative to the performance improvement of other unlabeled training samples 112U without increasing (and potentially reducing) the total labeling cost (e.g., expenditure of computational resources, consumption of human annotator time, etc.). The model trainer 110 also determines an appropriate size for an initial or starting set of labeled training samples 112L by using a cost-efficient approach that avoids the overhead of starting with large sets of labeled data samples 112L while also ensuring optimal model performance with a limited number of labeled training samples 112L (i.e., compared to conventional techniques).
- FIG. 5 is a schematic view of an example computing device 500 that may be used to implement the systems and methods described in this document. The computing device 500 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The components shown here, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the inventions described and/or claimed in this document.
- The computing device 500 includes a processor 510, memory 520, a storage device 530, a high-speed interface/controller 540 connecting to the memory 520 and high-speed expansion ports 550, and a low-speed interface/controller 560 connecting to a low-speed bus 570 and the storage device 530. Each of the components 510, 520, 530, 540, 550, and 560 is interconnected using various busses and may be mounted on a common motherboard or in other manners as appropriate. The processor 510 can process instructions for execution within the computing device 500, including instructions stored in the memory 520 or on the storage device 530, to display graphical information for a graphical user interface (GUI) on an external input/output device, such as a display 580 coupled to the high-speed interface 540. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 500 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
- The memory 520 stores information non-transitorily within the computing device 500. The memory 520 may be a computer-readable medium, a volatile memory unit(s), or a non-volatile memory unit(s). The non-transitory memory 520 may be a physical device used to store programs (e.g., sequences of instructions) or data (e.g., program state information) on a temporary or permanent basis for use by the computing device 500. Examples of non-volatile memory include, but are not limited to, flash memory and read-only memory (ROM)/programmable read-only memory (PROM)/erasable programmable read-only memory (EPROM)/electronically erasable programmable read-only memory (EEPROM) (e.g., typically used for firmware, such as boot programs). Examples of volatile memory include, but are not limited to, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), and phase change memory (PCM), as well as disks or tapes.
- The storage device 530 is capable of providing mass storage for the computing device 500. In some implementations, the storage device 530 is a computer-readable medium. In various different implementations, the storage device 530 may be a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory or other similar solid-state memory device, or an array of devices, including devices in a storage area network or other configurations. In additional implementations, a computer program product is tangibly embodied in an information carrier. The computer program product contains instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 520, the storage device 530, or memory on the processor 510.
- The high-speed controller 540 manages bandwidth-intensive operations for the computing device 500, while the low-speed controller 560 manages lower bandwidth-intensive operations. Such an allocation of duties is exemplary only. In some implementations, the high-speed controller 540 is coupled to the memory 520, the display 580 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 550, which may accept various expansion cards (not shown). In some implementations, the low-speed controller 560 is coupled to the storage device 530 and a low-speed expansion port 590. The low-speed expansion port 590, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
- The computing device 500 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 500a, multiple times in a group of such servers 500a, as a laptop computer 500b, or as part of a rack server system 500c.
- Various implementations of the systems and techniques described herein can be realized in digital electronic and/or optical circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
- A software application (i.e., a software resource) may refer to computer software that causes a computing device to perform a task. In some examples, a software application may be referred to as an “application,” an “app,” or a “program.” Example applications include, but are not limited to, system diagnostic applications, system management applications, system maintenance applications, word processing applications, spreadsheet applications, messaging applications, media streaming applications, social networking applications, and gaming applications.
- These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, non-transitory computer readable medium, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
- The processes and logic flows described in this specification can be performed by one or more programmable processors, also referred to as data processing hardware, executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- To provide for interaction with a user, one or more aspects of the disclosure can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube), LCD (liquid crystal display) monitor, or touch screen for displaying information to the user and optionally a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.
- A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the disclosure. Accordingly, other implementations are within the scope of the following claims.
Claims (20)
1. A computer-implemented method executed by data processing hardware that causes the data processing hardware to perform operations comprising:
obtaining a set of unlabeled training samples;
for each particular unlabeled training sample in the set of unlabeled training samples:
generating, using a machine learning model and the particular unlabeled training sample, a corresponding first prediction;
generating, using the machine learning model and a modified unlabeled training sample, a corresponding second prediction, the modified unlabeled training sample based on the particular unlabeled training sample; and
determining a corresponding difference between the corresponding first prediction and the corresponding second prediction;
selecting, based on the corresponding differences, a subset of the set of unlabeled training samples;
for each particular unlabeled training sample in the subset of the set of unlabeled training samples:
obtaining a corresponding ground truth label for the particular unlabeled training sample; and
generating a corresponding labeled training sample based on the particular unlabeled training sample paired with the corresponding ground truth label; and
training the machine learning model using the corresponding labeled training samples.
2. The method of claim 1 , wherein a number of unlabeled training samples in the subset of the set of unlabeled training samples is less than a cardinality of the set of unlabeled training samples.
3. The method of claim 1 , wherein selecting, based on the corresponding differences, the subset of the set of unlabeled training samples comprises selecting unlabeled training samples having a corresponding difference satisfying a threshold.
4. The method of claim 1 , wherein selecting, based on the corresponding differences, the subset of the set of unlabeled training samples comprises selecting a threshold number of the unlabeled training samples having the largest corresponding differences.
5. The method of claim 1 , wherein the operations further comprise, during an initial active learning cycle:
randomly selecting a random set of unlabeled training samples from the set of unlabeled training samples;
for each particular unlabeled training sample in the random set of unlabeled training samples, obtaining a corresponding ground truth label; and
training the machine learning model using the random set of unlabeled training samples and the corresponding ground truth labels.
6. The method of claim 5 , wherein the operations further comprise, during the initial active learning cycle:
identifying a candidate set of unlabeled training samples from the set of unlabeled training samples, wherein a cardinality of the candidate set of unlabeled training samples is less than a cardinality of the set of unlabeled training samples;
determining a first cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the candidate set of unlabeled training samples;
determining a second cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the set of unlabeled training samples;
determining that the first cross entropy is greater than or equal to the second cross entropy; and
based on determining that the first cross entropy is greater than or equal to the second cross entropy, selecting the candidate set of unlabeled training samples as a starting size for initially training the machine learning model.
7. The method of claim 6 , wherein identifying the candidate set of unlabeled training samples from the set of unlabeled training samples comprises determining the corresponding difference for each unlabeled training sample of the set of unlabeled training samples.
8. The method of claim 7 , wherein the operations further comprise, when the first cross entropy is less than the second cross entropy:
randomly selecting an expanded set of unlabeled training samples from the set of unlabeled training samples;
updating the candidate set of unlabeled training samples to include the expanded set of unlabeled training samples randomly selected from the set of unlabeled training samples;
updating the set of unlabeled training samples by removing each unlabeled training sample from the expanded set of unlabeled training samples from the set of unlabeled training samples; and
during an immediately subsequent active learning cycle:
determining the first cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the updated candidate set of unlabeled training samples;
determining the second cross entropy between the distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the updated candidate set of unlabeled training samples;
determining that the first cross entropy is greater than or equal to the second cross entropy; and
based on determining that the first cross entropy is greater than or equal to the second cross entropy, selecting a size of the updated candidate set of unlabeled training samples as a starting size for initially training the machine learning model.
9. The method of claim 1 , wherein the machine learning model comprises a convolutional neural network.
10. The method of claim 1 , wherein the corresponding difference between the corresponding first prediction and the corresponding second prediction represents a variance between the corresponding first prediction and the corresponding second prediction.
11. A system comprising:
data processing hardware; and
memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising:
obtaining a set of unlabeled training samples;
for each particular unlabeled training sample in the set of unlabeled training samples:
generating, using a machine learning model and the particular unlabeled training sample, a corresponding first prediction;
generating, using the machine learning model and a modified unlabeled training sample, a corresponding second prediction, the modified unlabeled training sample based on the particular unlabeled training sample; and
determining a corresponding difference between the corresponding first prediction and the corresponding second prediction;
selecting, based on the corresponding differences, a subset of the set of unlabeled training samples;
for each particular unlabeled training sample in the subset of the set of unlabeled training samples:
obtaining a corresponding ground truth label for the particular unlabeled training sample; and
generating a corresponding labeled training sample based on the particular unlabeled training sample paired with the corresponding ground truth label; and
training the machine learning model using the corresponding labeled training samples.
12. The system of claim 11 , wherein a number of unlabeled training samples in the subset of the set of unlabeled training samples is less than a cardinality of the set of unlabeled training samples.
13. The system of claim 11 , wherein selecting, based on the corresponding differences, the subset of the set of unlabeled training samples comprises selecting unlabeled training samples having a corresponding difference satisfying a threshold.
14. The system of claim 11 , wherein selecting, based on the corresponding differences, the subset of the set of unlabeled training samples comprises selecting a threshold number of the unlabeled training samples having the largest corresponding differences.
15. The system of claim 11 , wherein the operations further comprise, during an initial active learning cycle:
randomly selecting a random set of unlabeled training samples from the set of unlabeled training samples;
for each particular unlabeled training sample in the random set of unlabeled training samples, obtaining a corresponding ground truth label; and
training the machine learning model using the random set of unlabeled training samples and the corresponding ground truth labels.
16. The system of claim 15 , wherein the operations further comprise, during the initial active learning cycle:
identifying a candidate set of unlabeled training samples from the set of unlabeled training samples, wherein a cardinality of the candidate set of unlabeled training samples is less than a cardinality of the set of unlabeled training samples;
determining a first cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the candidate set of unlabeled training samples;
determining a second cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the set of unlabeled training samples;
determining that the first cross entropy is greater than or equal to the second cross entropy; and
based on determining that the first cross entropy is greater than or equal to the second cross entropy, selecting a size of the candidate set of unlabeled training samples as a starting size for initially training the machine learning model.
17. The system of claim 16 , wherein identifying the candidate set of unlabeled training samples from the set of unlabeled training samples comprises determining the corresponding difference for each unlabeled training sample of the set of unlabeled training samples.
18. The system of claim 17 , wherein the operations further comprise, when the first cross entropy is less than the second cross entropy:
randomly selecting an expanded set of unlabeled training samples from the set of unlabeled training samples;
updating the candidate set of unlabeled training samples to include the expanded set of unlabeled training samples randomly selected from the set of unlabeled training samples;
updating the set of unlabeled training samples by removing each unlabeled training sample in the expanded set of unlabeled training samples from the set of unlabeled training samples; and
during an immediately subsequent active learning cycle:
determining the first cross entropy between a distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the updated candidate set of unlabeled training samples;
determining the second cross entropy between the distribution of ground truth labels and a distribution of predicted labels generated using the machine learning model for the unlabeled training samples in the updated set of unlabeled training samples;
determining that the first cross entropy is greater than or equal to the second cross entropy; and
based on determining that the first cross entropy is greater than or equal to the second cross entropy, selecting a size of the updated candidate set of unlabeled training samples as a starting size for initially training the machine learning model.
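Read together, claims 16-18 describe a search for a starting set size: grow a random candidate set until the cross entropy measured on it meets or exceeds the cross entropy measured on the full pool. The sketch below assumes discrete label distributions and uses synthetic stand-ins for ground-truth labels and model predictions; `cross_entropy` and `distribution` are helper names invented for illustration.

```python
# Hedged sketch of the starting-size search in claims 16-18.
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) between two discrete label distributions."""
    return float(-(p * np.log(q + eps)).sum())

def distribution(labels, num_classes):
    """Empirical class distribution of a batch of integer labels."""
    return np.bincount(labels, minlength=num_classes) / max(len(labels), 1)

num_classes = 3
truth = rng.integers(0, num_classes, size=1000)  # stand-in ground-truth labels
preds = rng.integers(0, num_classes, size=1000)  # stand-in model predictions

pool = np.arange(1000)
candidate = rng.choice(pool, size=100, replace=False)
pool = np.setdiff1d(pool, candidate)

while True:
    # First cross entropy: truth vs. predictions on the candidate set.
    first_ce = cross_entropy(distribution(truth[candidate], num_classes),
                             distribution(preds[candidate], num_classes))
    # Second cross entropy: truth vs. predictions on the remaining pool.
    second_ce = cross_entropy(distribution(truth[pool], num_classes),
                              distribution(preds[pool], num_classes))
    if first_ce >= second_ce or len(pool) == 0:
        starting_size = len(candidate)  # candidate set deemed representative
        break
    # Otherwise expand the candidate set with a further random draw (claim 18).
    expansion = rng.choice(pool, size=min(100, len(pool)), replace=False)
    candidate = np.concatenate([candidate, expansion])
    pool = np.setdiff1d(pool, expansion)

print(starting_size)
```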
19. The system of claim 11 , wherein the machine learning model comprises a convolutional neural network.
20. The system of claim 11 , wherein the corresponding difference between the corresponding first prediction and the corresponding second prediction represents a variance between the corresponding first prediction and the corresponding second prediction.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/333,998 US20230325676A1 (en) | 2019-08-22 | 2023-06-13 | Active learning via a sample consistency assessment |
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962890379P | 2019-08-22 | 2019-08-22 | |
| US17/000,094 US12271822B2 (en) | 2019-08-22 | 2020-08-21 | Active learning via a sample consistency assessment |
| US18/333,998 US20230325676A1 (en) | 2019-08-22 | 2023-06-13 | Active learning via a sample consistency assessment |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/000,094 Continuation US12271822B2 (en) | 2019-08-22 | 2020-08-21 | Active learning via a sample consistency assessment |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20230325676A1 true US20230325676A1 (en) | 2023-10-12 |
Family
ID=72560891
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/000,094 Active 2043-03-30 US12271822B2 (en) | 2019-08-22 | 2020-08-21 | Active learning via a sample consistency assessment |
| US18/333,998 Abandoned US20230325676A1 (en) | 2019-08-22 | 2023-06-13 | Active learning via a sample consistency assessment |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/000,094 Active 2043-03-30 US12271822B2 (en) | 2019-08-22 | 2020-08-21 | Active learning via a sample consistency assessment |
Country Status (6)
| Country | Link |
|---|---|
| US (2) | US12271822B2 (en) |
| EP (1) | EP4018382B1 (en) |
| JP (2) | JP7293498B2 (en) |
| KR (1) | KR20220047851A (en) |
| CN (1) | CN114600117A (en) |
| WO (1) | WO2021035193A1 (en) |
Families Citing this family (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11423264B2 (en) * | 2019-10-21 | 2022-08-23 | Adobe Inc. | Entropy based synthetic data generation for augmenting classification system training data |
| CN112085219B (en) * | 2020-10-13 | 2024-02-13 | 北京百度网讯科技有限公司 | Model training method, SMS review method, device, equipment and storage medium |
| US20220156574A1 (en) * | 2020-11-19 | 2022-05-19 | Kabushiki Kaisha Toshiba | Methods and systems for remote training of a machine learning model |
| WO2022202456A1 (en) * | 2021-03-22 | 2022-09-29 | 株式会社日立製作所 | Appearance inspection method and appearance inspection system |
| CN113033537B (en) * | 2021-03-25 | 2022-07-01 | 北京百度网讯科技有限公司 | Method, apparatus, device, medium and program product for training a model |
| CN113033665B (en) * | 2021-03-26 | 2024-07-19 | 北京沃东天骏信息技术有限公司 | Sample expansion method, training method and system and sample learning system |
| CN113158051B (en) * | 2021-04-23 | 2022-11-18 | 山东大学 | Label sorting method based on information propagation and multilayer context information modeling |
| CN113408650B (en) * | 2021-07-12 | 2023-07-18 | 厦门大学 | Semi-supervised 3D shape recognition method based on consistency training |
| CN113761842B (en) * | 2021-09-07 | 2024-12-20 | 联想(北京)有限公司 | Data processing method, device and electronic equipment |
| KR20240068704A (en) * | 2021-09-30 | 2024-05-17 | 구글 엘엘씨 | Contrastive Siamese networks for semi-supervised speech recognition |
| US20240403723A1 (en) * | 2021-10-22 | 2024-12-05 | Nec Corporation | Information processing device, information processing method, and recording medium |
| CN114444717B (en) * | 2022-01-25 | 2025-08-22 | 杭州海康威视数字技术股份有限公司 | Autonomous learning method, device, electronic device and machine-readable storage medium |
| CN114925773B (en) * | 2022-05-30 | 2024-12-03 | 阿里巴巴(中国)有限公司 | Model training method, device, electronic device and storage medium |
| US20240013777A1 (en) * | 2022-07-11 | 2024-01-11 | Google Llc | Unsupervised Data Selection via Discrete Speech Representation for Automatic Speech Recognition |
| KR102792266B1 (en) | 2022-11-09 | 2025-04-08 | 주식회사 써티웨어 | System for providing user interface based on web for labeling of training data |
| KR102819246B1 (en) * | 2022-12-01 | 2025-06-11 | 주식회사 써티웨어 | System and method for data labeling based on deep active learning through the whole data lifecycle |
| CN117009883B (en) * | 2023-09-28 | 2024-04-02 | 腾讯科技(深圳)有限公司 | Object classification model construction method, object classification method, device and equipment |
| US20250232009A1 (en) * | 2024-01-12 | 2025-07-17 | Optum, Inc. | Dataset labeling using large language model and active learning |
| CN119443086B (en) * | 2024-10-21 | 2025-09-30 | 平安科技(深圳)有限公司 | Syntax error correction method, system, device and medium based on consistency learning |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150379429A1 (en) * | 2014-06-30 | 2015-12-31 | Amazon Technologies, Inc. | Interactive interfaces for machine learning model evaluations |
| US20180359416A1 (en) * | 2017-06-13 | 2018-12-13 | Adobe Systems Incorporated | Extrapolating lighting conditions from a single digital image |
| US20200084427A1 (en) * | 2018-09-12 | 2020-03-12 | Nvidia Corporation | Scene flow estimation using shared features |
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5196425B2 (en) | 2008-03-07 | 2013-05-15 | Kddi株式会社 | Support vector machine relearning method |
| US8825570B2 (en) * | 2012-07-31 | 2014-09-02 | Hewlett-Packard Development Company, L.P. | Active learning with per-case symmetrical importance scores |
| US9378065B2 (en) * | 2013-03-15 | 2016-06-28 | Advanced Elemental Technologies, Inc. | Purposeful computing |
| US10534809B2 (en) * | 2016-08-10 | 2020-01-14 | Zeekit Online Shopping Ltd. | Method, system, and device of virtual dressing utilizing image processing, machine learning, and computer vision |
| US11120337B2 (en) * | 2017-10-20 | 2021-09-14 | Huawei Technologies Co., Ltd. | Self-training method and system for semi-supervised learning with generative adversarial networks |
| CN108021931A (en) * | 2017-11-20 | 2018-05-11 | 阿里巴巴集团控股有限公司 | Data sample label processing method and device |
| US10726313B2 (en) * | 2018-04-19 | 2020-07-28 | Adobe Inc. | Active learning method for temporal action localization in untrimmed videos |
| US11361197B2 (en) * | 2018-06-29 | 2022-06-14 | EMC IP Holding Company LLC | Anomaly detection in time-series data using state inference and machine learning |
| US11580002B2 (en) * | 2018-08-17 | 2023-02-14 | Intensity Analytics Corporation | User effort detection |
| CN109036389A (en) * | 2018-08-28 | 2018-12-18 | 出门问问信息科技有限公司 | Method and device for generating adversarial examples |
| CN109272031B (en) * | 2018-09-05 | 2021-03-30 | 宽凳(北京)科技有限公司 | Training sample generation method, device, equipment and medium |
| CN109376796A (en) * | 2018-11-19 | 2019-02-22 | 中山大学 | Image classification method based on active semi-supervised learning |
| CN109472318B (en) * | 2018-11-27 | 2021-06-04 | 创新先进技术有限公司 | Method and device for selecting features for constructed machine learning model |
2020
- 2020-08-21 JP JP2022511319A patent/JP7293498B2/en active Active
- 2020-08-21 CN CN202080073812.2A patent/CN114600117A/en active Pending
- 2020-08-21 WO PCT/US2020/047534 patent/WO2021035193A1/en not_active Ceased
- 2020-08-21 EP EP20775092.8A patent/EP4018382B1/en active Active
- 2020-08-21 KR KR1020227009141A patent/KR20220047851A/en active Pending
- 2020-08-21 US US17/000,094 patent/US12271822B2/en active Active
2023
- 2023-06-07 JP JP2023094205A patent/JP7507287B2/en active Active
- 2023-06-13 US US18/333,998 patent/US20230325676A1/en not_active Abandoned
Also Published As
| Publication number | Publication date |
|---|---|
| CN114600117A (en) | 2022-06-07 |
| JP2022545476A (en) | 2022-10-27 |
| US20210056417A1 (en) | 2021-02-25 |
| EP4018382A1 (en) | 2022-06-29 |
| JP2023126769A (en) | 2023-09-12 |
| KR20220047851A (en) | 2022-04-19 |
| EP4018382B1 (en) | 2025-10-01 |
| US12271822B2 (en) | 2025-04-08 |
| JP7293498B2 (en) | 2023-06-19 |
| WO2021035193A1 (en) | 2021-02-25 |
| JP7507287B2 (en) | 2024-06-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20230325676A1 (en) | 2023-10-12 | Active learning via a sample consistency assessment |
| US20230351192A1 (en) | | Robust training in the presence of label noise |
| US20230325675A1 (en) | | Data valuation using reinforcement learning |
| US10515443B2 (en) | | Utilizing deep learning to rate attributes of digital images |
| US20190354810A1 (en) | | Active learning to reduce noise in labels |
| KR102748213B1 (en) | | Reinforcement learning based on locally interpretable models |
| US10867246B1 (en) | | Training a neural network using small training datasets |
| KR102824651B1 (en) | | Framework for L2TL (Learning to Transfer Learn) |
| Hegde et al. | | Aspect based feature extraction and sentiment classification of review data sets using incremental machine learning algorithm |
| US20250148280A1 (en) | | Techniques for learning co-engagement and semantic relationships using graph neural networks |
| US11829442B2 (en) | | Methods and systems for efficient batch active learning of a deep neural network |
| CN108804577A (en) | | Method for estimating the interest degree of information labels |
| US20230359938A1 (en) | | Contrastive Sequence-to-Sequence Data Selector |
| WO2025101527A1 (en) | | Techniques for learning co-engagement and semantic relationships using graph neural networks |
| US20220253694A1 (en) | | Training neural networks with reinitialization |
| US20240249204A1 (en) | | Active Selective Prediction Using Ensembles and Self-training |
| US20250232009A1 (en) | | System and method for determining topics based on selective topic models |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ZHANG, ZIZHAO;PFISTER, TOMAS JON;ARIK, SERCAN OMER;AND OTHERS;SIGNING DATES FROM 20190830 TO 20190914;REEL/FRAME:063940/0576 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |