WO2023080509A1 - Method and apparatus for learning noisy labels through efficient transition matrix estimation - Google Patents
Method and apparatus for learning noisy labels through efficient transition matrix estimation
- Publication number
- WO2023080509A1 (PCT/KR2022/016182)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- classifier
- clean
- noise
- input data
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- the present disclosure relates to a method and apparatus for learning noisy labels through efficient transition matrix estimation.
- a classifier determines which class input data belongs to; even if the input data does not belong to any of the preset classes, the classifier classifies it into the most similar preset class.
- the classifier is trained based on the result of classifying the input data and the label of the input data.
- when the input data is labeled incorrectly, that is, when the classifier is trained on noisy labels, the performance of the classification model may decrease. Since training a classification model with only clean labels has limitations, such as the limited quantity of clean data, various studies on how to learn from noisy labels have been conducted.
- An object of the present disclosure is to provide a method and apparatus for learning noisy labels through efficient transition matrix estimation.
- the problems to be solved by the present disclosure are not limited to those mentioned above; other problems and advantages of the present disclosure that are not mentioned can be understood from the following description and will become clearer from the embodiments of the present disclosure. In addition, it will be appreciated that the problems to be solved and the advantages of the present disclosure can be realized by the means and combinations indicated in the claims.
- a method for learning noisy labels includes: estimating a transition matrix based on output data of a noise classifier for first input data and a first label corresponding to the first input data; calculating a clean classifier loss based on output data of a clean classifier for the first input data, output data of the clean classifier for second input data, the first label, a second label corresponding to the second input data, and the transition matrix; calculating a noise classifier loss based on output data of the noise classifier for the second input data and the second label; and training the clean classifier and the noise classifier on noisy labels based on the clean classifier loss and the noise classifier loss.
- a computer-readable recording medium includes a recording medium on which a program for executing the above-described method on a computer is recorded.
- An apparatus for learning noisy labels includes: a memory in which at least one program is stored; and a processor that operates by executing the at least one program, wherein the processor estimates a transition matrix based on output data of a noise classifier for first input data and a first label corresponding to the first input data, and calculates a clean classifier loss based on output data of a clean classifier for the first input data, output data of the clean classifier for second input data, the first label, a second label corresponding to the second input data, and the transition matrix.
- a label transition matrix can be estimated efficiently, and a classifier can be trained effectively using not only clean labels but also noisy labels through the estimated transition matrix.
- the transition matrix is estimated adaptively, so the classifier does not blindly trust samples whose labels may already have been corrected, avoiding problems caused by miscorrection.
- noisy labels can be corrected continuously, in real time, at every training iteration.
- FIG. 1 is a diagram for explaining an example of classifying input data into at least one class.
- FIG. 2 is a flowchart illustrating an example of a method of learning a noise label according to an embodiment.
- FIG. 3 is a schematic diagram illustrating a processor forming a clean batch and a noise batch according to an embodiment.
- FIG. 4 is a schematic diagram illustrating a process of estimating a transition matrix by a processor according to an embodiment.
- FIG. 5 is a schematic diagram illustrating a process of calculating a clean classifier loss by a processor according to an embodiment.
- FIG. 6 is a schematic diagram illustrating a process of calculating a noise classifier loss by a processor according to an embodiment.
- FIG. 7 is a schematic diagram illustrating a process of optimizing parameters of a clean classifier and a noise classifier by a processor according to an embodiment.
- FIG. 8 is a schematic diagram illustrating a two-head architecture according to one embodiment.
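- FIG. 9 is a diagram illustrating an algorithm summarizing the method of learning a noise label according to an embodiment (the original brief description omits this entry; it is restored here from the detailed description's reference to FIG. 9).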
- FIG. 10 is a block diagram illustrating an apparatus for learning a noise label according to an exemplary embodiment.
- FIG. 11 is a diagram for explaining an example in which a final model is utilized according to an embodiment.
- a method for learning noisy labels according to the first aspect includes: estimating a transition matrix based on output data of a noise classifier for first input data and a first label corresponding to the first input data; calculating a clean classifier loss based on output data of a clean classifier for the first input data, output data of the clean classifier for second input data, the first label, a second label corresponding to the second input data, and the transition matrix; calculating a noise classifier loss based on output data of the noise classifier for the second input data and the second label; and training the clean classifier and the noise classifier based on the clean classifier loss and the noise classifier loss.
- Some embodiments of the present disclosure may be represented as functional block structures and various processing steps. Some or all of these functional blocks may be implemented as a varying number of hardware and/or software components that perform specific functions.
- functional blocks of the present disclosure may be implemented by one or more microprocessors or circuit configurations for a predetermined function.
- the functional blocks of this disclosure may be implemented in various programming or scripting languages.
- Functional blocks may be implemented as an algorithm running on one or more processors.
- the present disclosure may employ conventional techniques for electronic environment configuration, signal processing, and/or data processing. Terms such as "mechanism", "element", "means" and "component" are used broadly and are not limited to mechanical or physical components.
- FIG. 1 is a diagram for explaining an example of classifying input data into at least one class.
- FIG. 1 shows an example of input data 110, a classifier (classification model) 120, and a classification result 130.
- Although FIG. 1 shows the input data 110 being classified into a total of three classes, the number of classes is not limited to the example of FIG. 1.
- the type of input data 110 is not limited.
- the input data 110 may correspond to various types of data such as images, texts, and voices.
- the classifier 120 may calculate the probability that the input data 110 is classified into a specific class. For example, the classifier 120 may calculate the per-class classification probability of the input data using a softmax function and a cross-entropy loss.
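As a hypothetical illustration of this per-class probability computation (the logit values below are made up and are not from the disclosure), a softmax turns raw classifier scores into probabilities:

```python
# Hypothetical sketch: turning raw classifier scores (logits) into
# per-class probabilities with a softmax; the values are illustrative only.
import torch

logits = torch.tensor([2.0, 0.5, -1.0])       # scores for 3 classes
probs = torch.softmax(logits, dim=0)          # approx. [0.79, 0.18, 0.04]
predicted_class = torch.argmax(probs).item()  # class 0, the most probable
```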
- for example, suppose the classifier 120 classifies an input image into either a first class or a second class. Even if the input data 110 is an animal image belonging to neither class, the classifier 120 classifies the input image into whichever of the first class and the second class it determines to be more similar.
- the classifier 120 may be trained with training data.
- the classifier 120 may be trained in a direction that reduces the error between the result output by the classifier 120 and the actual correct answer.
- supervised learning has achieved many successes, such as image classification, object detection, and face recognition, by using vast amounts of annotated data, and can solve various classification tasks. It has been proven theoretically and empirically that the performance of supervised-learning-based classification models improves as the size of the annotated data increases. However, noisy labels are unavoidable, since not all data can be annotated carefully.
- recently proposed methods further improve classifier performance by utilizing a small, inexpensively acquired clean data set.
- loss correction methods reduce the influence of noisy labels by modifying the loss function, and re-weighting methods penalize samples that are likely to have noisy labels.
- recent label correction methods can achieve remarkable performance based on model-agnostic meta-learning (MAML). These methods directly reduce the noise level by re-labeling noisy labels, raising the theoretical upper bound on prediction performance.
- MAML-based label correction methods have two problems. One is that they blindly trust previously miscorrected labels. Miscorrected labels are often retained during training, which causes the model to learn them as ground-truth labels. The other is that MAML-based label correction methods are inherently slow to train, resulting in excessive computational overhead. This inefficiency comes from the fact that a MAML-based label correction method performs a virtual update with the noisy data set, finds optimal parameters using the clean data set, and then updates the real parameters with the found parameters, so that a single iteration involves multiple training steps.
- FasTEN learns noisy labels by efficiently estimating the transition matrix while continuously correcting the labels in real time.
- noisy-label learning through efficient transition matrix estimation, based on the method called FasTEN proposed in the present disclosure, will be described in detail with reference to FIGS. 2 to 8.
- FIG. 2 is a flowchart illustrating an example of a method of learning a noise label according to an embodiment.
- the operations shown in FIG. 2 may be executed by a noise label learning apparatus to be described later. Specifically, the method of learning the noise label shown in FIG. 2 may be executed by the processor shown in FIG. 10 .
- the processor forms a clean batch composed of one or more first input data included in a clean data set and one or more first labels corresponding to the one or more first input data, and forms a noise batch composed of one or more second input data included in a noise data set and one or more second labels corresponding to the one or more second input data.
- the clean data set includes a plurality of first input data and a plurality of first labels respectively corresponding to the plurality of first input data.
- the clean data set includes a plurality of first input data and first labels matched thereto.
- all of the plurality of first labels included in the clean data set are accurately labeled for each of the plurality of first input data.
- the noise data set includes a plurality of second input data and a plurality of second labels respectively corresponding to the plurality of second input data.
- the noise data set includes a plurality of second input data and second labels matched thereto.
- at least some of the plurality of second labels included in the noise data set are incorrectly labeled for each of the plurality of second input data.
- some of the plurality of second labels are incorrectly labeled for each of the plurality of second input data.
- a clean batch and a noise batch may be configured by randomly selecting from the clean data set and the noise data set, respectively.
- the size of the clean batch and the size of the noise batch may be set to be the same, but are not limited thereto.
- the processor may estimate a transition matrix based on the output data of the noise classifier for the first input data and the first label.
- a noise classifier refers to a classifier trained using a noisy data set.
- a noise classifier may include a linear classifier and a feature extractor.
- a noise classifier is used to estimate a transition matrix, and the estimated transition matrix is used to learn noise labels.
- the processor may calculate a clean classifier loss based on output data of the clean classifier for the first input data, output data of the clean classifier for the second input data, the first label, the second label, and the transition matrix.
- the clean classifier is trained using not only the clean data set but also the noise data set.
- the clean classifier loss is a value used to train the clean classifier and, in an embodiment, may be calculated based on a loss based on clean data and a loss based on noise data.
- a clean classifier, like a noise classifier, may include a linear classifier and a feature extractor.
- the processor may calculate a noise classifier loss based on the second label and output data of the noise classifier for the second input data.
- the noise classifier is a classifier trained using a noisy data set, and the noise classifier loss is used to train the noise classifier.
- the noise classifier loss may be calculated based on the output data of the noise classifier for the second input data and the second label.
- the processor may cause the clean classifier and the noise classifier to learn a noise label based on the clean classifier loss and the noise classifier loss.
- a clean classifier and a noise classifier may be trained based on the clean classifier loss and the noise classifier loss calculated in the previous step.
- a clean classifier and a noise classifier may constitute a two-head architecture sharing a feature extractor, and through the two-head architecture, efficient learning through a single backpropagation may be realized.
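The overall flow can be summarized as the following hedged sketch of a single training iteration. It composes helper functions sketched in the sections below, and all names (clean_fn, noise_fn, lam, tau) are illustrative assumptions, not the patent's reference implementation:

```python
# A hedged sketch of one training iteration of the method of FIG. 2,
# composing the helper functions sketched below (Equations 4-8). clean_fn
# and noise_fn are assumed callables mapping inputs to the logits of the
# clean head and noise head of the two-head model.
import torch

def train_one_iteration(clean_fn, noise_fn, optimizer, clean_batch,
                        noise_batch, num_classes, lam, tau):
    (cx, cy), (nx, ny) = clean_batch, noise_batch
    T_hat = estimate_transition_matrix(noise_fn, cx, cy, num_classes)  # Eq. 4
    loss_c = clean_classifier_loss(clean_fn, T_hat, cx, cy, nx, ny)    # Eq. 5
    loss_n = noise_classifier_loss(noise_fn, nx, ny)                   # Eq. 6
    loss = loss_c + lam * loss_n                                       # Eq. 7
    optimizer.zero_grad()
    loss.backward()   # single backpropagation through the shared extractor
    optimizer.step()
    # Label correction (Eq. 8): reuse the clean head's predictions to update
    # the noisy labels for later iterations.
    with torch.no_grad():
        ny = correct_labels(torch.softmax(clean_fn(nx), dim=1), ny, tau)
    return loss.item(), ny
```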
- FIG. 3 is a schematic diagram illustrating a processor forming a clean batch and a noise batch according to an embodiment.
- a batch 320 used for one iteration may be formed from a data set 310 .
- the data set 310 can be divided into multiple batches, and one batch 320 can be used for one iteration.
- the data set 310 is a set of data used for learning a classifier (classification model).
- the data set 310 may include a plurality of input data and a plurality of labels corresponding thereto.
- batch 320 may also consist of one or more input data and one or more labels corresponding thereto.
- the data set 310 may be a clean data set or a noise data set. That is, a clean batch may be formed from a clean data set, and a noise batch may be formed from a noise data set.
- the processor may form batches to have the same number of samples per class.
- the processor may construct a clean batch by randomly selecting K samples from each of the N classes in the clean data set (where N and K are natural numbers greater than or equal to 1). Each sample consists of input data and a clean label corresponding to the input data. For example, a clean batch may be constructed according to Equation 1 below.

  $\mathcal{B}^c = \{(x_i^c, y_i^c)\}_{i=1}^{K \times N}$ ... (1)

- In Equation 1, $\mathcal{B}^c$ denotes the clean batch, $x_i^c$ denotes an input, and $y_i^c$ denotes the clean label of $x_i^c$.
- the processor may construct the noise batch by randomly selecting M samples from the noise data set. Each sample consists of input data and a noisy label corresponding to the input data. For example, the noise batch may be constructed according to Equation 2 below.

  $\mathcal{B}^n = \{(x_i^n, \tilde{y}_i^n)\}_{i=1}^{M}$ ... (2)

- In Equation 2, $\mathcal{B}^n$ denotes the noise batch, $x_i^n$ denotes an input, and $\tilde{y}_i^n$ denotes the noisy label of $x_i^n$. In one embodiment, M may be equal to $K \times N$.
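A minimal sketch of this batch construction is given below; the dataset layout (lists of (input, label) pairs) and the function names are assumptions, not part of the disclosure:

```python
# Minimal sketch of Equations 1-2: a class-balanced clean batch (K samples
# from each of N classes) and a random noise batch of M = K*N samples.
import random
from collections import defaultdict

def make_clean_batch(clean_data, num_classes, k):
    """clean_data: list of (input, clean_label) pairs; assumes at least
    k samples exist for every class."""
    by_class = defaultdict(list)
    for x, y in clean_data:
        by_class[y].append((x, y))
    batch = []
    for c in range(num_classes):
        batch.extend(random.sample(by_class[c], k))  # K per class
    return batch  # |batch| = K * N

def make_noise_batch(noise_data, m):
    """noise_data: list of (input, noisy_label) pairs; M random samples."""
    return random.sample(noise_data, m)
```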
- FIG. 4 is a schematic diagram illustrating a process of estimating a conversion matrix by a processor according to an embodiment.
- an input $x^c$ included in the clean batch 410 (corresponding to the first input data) is input to the noise classifier 420, and $\tilde{f}(x^c)$ is output as output data of the noise classifier 420. Via the calculator 430, the estimated transition matrix $\hat{T}$ is output based on $\tilde{f}(x^c)$ and the label $y^c$ corresponding to the input included in the clean batch 410 (corresponding to the first label).
- $T$ denotes the transition matrix, and $\hat{T}$ denotes its estimate.
- each component $T_{jk}$ of $T$ can be defined as the probability that a clean label $j$ is corrupted into a noisy label $k$. For example, suppose there are four classes in the data set: 'cat', 'dog', 'bird', and 'airplane'. The probability that input data whose true class is not 'cat' is nevertheless labeled 'cat' will be greater when the true class of the input data is 'dog' than when it is 'bird' or 'airplane'.
- likewise, the probability that input data whose true class is not 'airplane' is nevertheless labeled 'airplane' will be greater when the true class of the input data is 'bird' than when it is 'cat' or 'dog'.
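To make this concrete, an illustrative transition matrix for the four-class example above might look as follows (the numbers are made up and are not from the disclosure); row $j$ gives $p(\tilde{y} = k \mid y = j)$ and sums to 1:

```latex
% Illustrative (made-up) transition matrix for (cat, dog, bird, airplane);
% entry T_{jk} = p(noisy label k | clean label j).
T = \begin{pmatrix}
0.90 & 0.07 & 0.02 & 0.01 \\ % true cat: mislabeled 'dog' more than 'airplane'
0.08 & 0.88 & 0.03 & 0.01 \\ % true dog: mislabeled 'cat' more than 'airplane'
0.02 & 0.02 & 0.86 & 0.10 \\ % true bird: mislabeled 'airplane' more than 'dog'
0.01 & 0.01 & 0.09 & 0.89   % true airplane: mislabeled 'bird' more than 'cat'
\end{pmatrix}
```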
- in other words, the transition matrix represents, for each pair of classes, the probability that a label of one class is corrupted into a label of another class.
- a small clean data set can be utilized to estimate the transition matrix.
- building a clean data set costs more than building a noisy data set.
- the learning speed can be increased by using a two-head architecture so that the transition matrix can be learned with a single backpropagation.
- noise classifier 420 may be used to estimate the transition matrix.
- the noise classifier 420 of the present disclosure refers to a classifier trained with a noise batch constructed from a noise data set. Because the noise classifier 420 is trained on noisy data, it will misclassify clean data in a characteristic way. For example, a classifier trained on a cat image labeled 'dog' will tend to classify other cat images as 'dog' as well. Based on this characteristic of the noise classifier 420 and on the clean data, a transition matrix can be estimated from "how the noise classifier 420 misclassifies".
- the transition matrix can be designed to be class-dependent and instance-independent, i.e., $p(\tilde{y} \mid y, x) = p(\tilde{y} \mid y)$.
- by parameterizing the feature extractor $g(\cdot\,; \theta)$ and a linear classifier $\tilde{h}(\cdot\,; \tilde{\phi})$, the noise classifier can be expressed as $\tilde{f}(x) = \tilde{h}(g(x; \theta); \tilde{\phi})$, where $\tilde{f}$ denotes a noise classifier trained only with noisy labels.
- a noise classifier may include a linear classifier and a feature extractor. If the noise classifier $\tilde{f}$ gives a perfect prediction for noisy data, the transition probability can be estimated using the inputs $x^c$ and labels $y^c$ included in the clean batch, as shown in Equation 4 below.

  $\hat{T}_{jk} = \frac{1}{|\mathcal{B}_j^c|} \sum_{(x^c, y^c) \in \mathcal{B}_j^c} \tilde{f}_k(x^c)$, where $\mathcal{B}_j^c = \{(x^c, y^c) \in \mathcal{B}^c : y^c = j\}$ ... (4)

- In Equation 4, $\mathcal{B}^c$ denotes the clean batch, $x^c$ denotes an input, and $y^c$ denotes the clean label of $x^c$; $g$ is the feature extractor included in the noise classifier, $\tilde{h}$ is the linear classifier included in the noise classifier, and $\tilde{f}$ denotes the noise classifier.
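A minimal PyTorch sketch of this estimation step follows; it assumes a noise classifier callable that returns logits and a class-balanced clean batch, and is an illustration rather than the patent's reference code:

```python
# Sketch of Equation 4: estimate T-hat by averaging the noise classifier's
# softmax outputs over the clean samples of each class.
import torch

@torch.no_grad()
def estimate_transition_matrix(noise_fn, clean_x, clean_y, num_classes):
    """noise_fn: callable mapping inputs to logits; clean_x: (B, ...) inputs;
    clean_y: (B,) integer clean labels. Assumes every class appears in the
    batch (guaranteed by the class-balanced clean batch of Equation 1)."""
    probs = torch.softmax(noise_fn(clean_x), dim=1)   # (B, num_classes)
    T_hat = torch.zeros(num_classes, num_classes)
    for j in range(num_classes):
        # Row j: how clean-class-j samples are spread over noisy labels,
        # i.e. an estimate of p(noisy = k | clean = j).
        T_hat[j] = probs[clean_y == j].mean(dim=0)
    return T_hat
```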
- the transition matrix estimated through the process described above may then be used to calculate the clean classifier loss.
- the noise classifier 420 and the calculator 430 are shown as separate functional blocks for convenience of description and functional division of the operation of the processor and the flow of data according to an embodiment of the present disclosure, but the noise classifier 420 and the calculator 430 may be configured as one piece of hardware.
- FIG. 5 is a schematic diagram illustrating a process of calculating a clean classifier loss by a processor according to an embodiment.
- an input $x^c$ included in the clean batch 510 is input to the clean classifier 530, and $f(x^c)$ is output as output data of the clean classifier 530. An input $x^n$ included in the noise batch 520 (corresponding to the second input data) is also input to the clean classifier 530, and $f(x^n)$ is output as output data of the clean classifier 530. Via the calculator 540, the clean classifier loss $\mathcal{L}^c$ is output, calculated based on the label $y^c$ corresponding to the input included in the clean batch 510, the label $\tilde{y}^n$ corresponding to the input included in the noise batch 520 (corresponding to the second label), and the transition matrix $\hat{T}$.
- calculating the clean classifier loss is for training the clean classifier 530. For example, the clean classifier loss $\mathcal{L}^c$ can be calculated according to Equation 5 below.

  $\mathcal{L}^c = \frac{1}{|\mathcal{B}^c|} \sum_{(x^c, y^c) \in \mathcal{B}^c} \ell(f(x^c), y^c) + \frac{1}{|\mathcal{B}^n|} \sum_{(x^n, \tilde{y}^n) \in \mathcal{B}^n} \ell(\hat{T}^\top f(x^n), \tilde{y}^n)$ ... (5)

- In Equation 5, $\mathcal{B}^c$ denotes the clean batch, $x^c$ denotes an input included in the clean batch, and $y^c$ denotes the clean label of $x^c$; $\mathcal{B}^n$ denotes the noise batch, $x^n$ denotes an input included in the noise batch, and $\tilde{y}^n$ denotes the noisy label of $x^n$. $\ell$ denotes the cross-entropy loss function, $g$ is the feature extractor included in the clean classifier, $h$ is the linear classifier included in the clean classifier, and $f(x) = h(g(x; \theta); \phi)$ denotes the clean classifier.
- a clean-data-based loss is calculated based on the output data $f(x^c)$ of the clean classifier 530 and the corresponding label $y^c$; a noise-data-based loss is calculated based on the output data $f(x^n)$ of the clean classifier 530, the transition matrix $\hat{T}$, and the corresponding label $\tilde{y}^n$; and the clean classifier loss $\mathcal{L}^c$ is calculated based on the calculated clean-data-based loss and noise-data-based loss.
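A minimal PyTorch sketch of Equation 5 (an assumption, not the patent's reference code) makes the two terms explicit:

```python
# Sketch of Equation 5: cross-entropy on the clean batch, plus cross-entropy
# of the T-hat-adjusted predictions on the noise batch.
import torch
import torch.nn.functional as F

def clean_classifier_loss(clean_fn, T_hat, clean_x, clean_y, noise_x, noise_y):
    # Clean-data term: standard cross-entropy against the clean labels.
    loss_clean = F.cross_entropy(clean_fn(clean_x), clean_y)
    # Noise-data term: push clean-label probabilities through T-hat so the
    # prediction is compared against the noisy label distribution.
    p_clean = torch.softmax(clean_fn(noise_x), dim=1)   # estimates p(y | x)
    p_noisy = p_clean @ T_hat                           # estimates p(noisy | x)
    loss_noise = F.nll_loss(torch.log(p_noisy + 1e-12), noise_y)
    return loss_clean + loss_noise
```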
- if the transition matrix $\hat{T}$ is estimated correctly, the clean classifier can be statistically consistent. This approach avoids the miscorrection problem because the clean classifier does not blindly trust corrected labels.
- the clean classifier 530 of the present disclosure may be trained by utilizing not only clean data (the clean batch) but also noise data (the noise batch).
- although the noise data includes noisy labels, the amount of noise data is much larger than that of clean data, so using it to train the clean classifier 530 is efficient. Note that such noisy data can be exploited only if the transition matrix is estimated correctly.
- the clean classifier loss calculated in this way can then be used to optimize the parameters of the clean classifier.
- the clean classifier 530 and the calculator 540 are shown as separate functional blocks for convenience of description and functional division of the operation of the processor and the flow of data according to an embodiment of the present disclosure, but the clean classifier 530 and the calculator 540 may be configured as one piece of hardware.
- FIG. 6 is a schematic diagram illustrating a process of calculating a noise classifier loss by a processor according to an embodiment.
- an input $x^n$ included in the noise batch 610 is input to the noise classifier 620, and $\tilde{f}(x^n)$ is output as output data of the noise classifier 620. Via the calculator 630, the noise classifier loss $\mathcal{L}^n$ is output, computed based on $\tilde{f}(x^n)$ and the label $\tilde{y}^n$ corresponding to the input included in the noise batch 610.
- the noise classifier 620 is trained on noise data; because of this, the noise classifier 620 may misclassify clean data.
- computing the noise classifier loss is for training the noise classifier 620. For example, the noise classifier loss $\mathcal{L}^n$ can be calculated according to Equation 6 below.

  $\mathcal{L}^n = \frac{1}{|\mathcal{B}^n|} \sum_{(x^n, \tilde{y}^n) \in \mathcal{B}^n} \ell(\tilde{f}(x^n), \tilde{y}^n)$ ... (6)

- In Equation 6, $\mathcal{B}^n$ denotes the noise batch, $x^n$ denotes an input, and $\tilde{y}^n$ denotes the noisy label of $x^n$. $\ell$ denotes the cross-entropy loss function, $g$ is the feature extractor included in the noise classifier, $\tilde{h}$ is the linear classifier included in the noise classifier, and $\tilde{f}(x) = \tilde{h}(g(x; \theta); \tilde{\phi})$ denotes the noise classifier.
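Equation 6 reduces to ordinary cross-entropy against the noisy labels; a one-line sketch, under the same assumed PyTorch setup as the earlier sketches:

```python
# Sketch of Equation 6: the noise classifier is fit directly to noisy labels.
import torch.nn.functional as F

def noise_classifier_loss(noise_fn, noise_x, noise_y):
    return F.cross_entropy(noise_fn(noise_x), noise_y)
```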
- the noisy-label distribution may shift continuously while the noisy labels are dynamically corrected by the label correction step to reduce the noise level.
- the noise classifier 620 and the calculator 630 are shown as separate functional blocks for convenience of description and functional division of the operation of the processor and the flow of data according to an embodiment of the present disclosure, but the noise classifier 620 and the calculator 630 may be configured as one piece of hardware.
- FIG. 7 is a schematic diagram illustrating a process of optimizing parameters of a clean classifier and a noise classifier by a processor according to an embodiment.
- parameters of the clean classifier 720 and the noise classifier 730 are optimized based on the clean classifier loss $\mathcal{L}^c$ and the noise classifier loss $\mathcal{L}^n$ calculated in the processes described above.
- by optimizing the parameters of the clean classifier 720 based on the clean classifier loss $\mathcal{L}^c$, the clean classifier 720 can learn not only from clean data but also from noise data.
- by optimizing the parameters of the noise classifier 730 based on the noise classifier loss $\mathcal{L}^n$, the noise classifier 730, which is used for estimating the transition matrix, is trained.
- a calculator 710, a clean classifier 720, and a noise classifier 730 are shown as individual functional blocks for convenience of description and functional division of the operation of a processor and the flow of data according to an embodiment of the present disclosure.
- the calculator 710, the clean classifier 720, and the noise classifier 730 may be composed of a single piece of hardware.
- FIG. 8 is a schematic diagram illustrating a two-head architecture according to one embodiment.
- the clean classifier 810 includes a first linear classifier $h$ and a feature extractor $g$, and the noise classifier 820 includes a second linear classifier $\tilde{h}$ and a feature extractor $g$. Here, the clean classifier 810 and the noise classifier 820 share the same feature extractor.
- that is, the clean classifier may include the linear classifier $h$ and the feature extractor $g$, and the noise classifier may include the linear classifier $\tilde{h}$ and the feature extractor $g$.
- the clean classifier 810 and the noise classifier 820 may constitute a two-head architecture.
- the present disclosure adopts a two-head architecture composed of two classifiers sharing a feature extractor, i.e., a clean classifier 810 and a noise classifier 820.
- the feature extractor of the clean classifier 810 and the feature extractor of the noise classifier 820 are the same (i.e., both are $g(\cdot\,; \theta)$).
- in other words, the clean classifier 810 and the noise classifier 820 may share the same feature extractor.
- through the two-head architecture, the parameters of the feature extractor, the linear classifier of the clean classifier 810, and the linear classifier of the noise classifier 820 may be optimized through a single back-propagation.
- the clean classifier 810 and the noise classifier 820 may share weights by sharing the same feature extractor.
- the architecture of this embodiment does not share the linear classifiers, since it is impractical to model both the clean data distribution and the noise data distribution with a single linear classifier.
- with a clean classifier and a noise classifier defined respectively as $f(x) = h(g(x; \theta); \phi)$ and $\tilde{f}(x) = \tilde{h}(g(x; \theta); \tilde{\phi})$ (sharing $\theta$), the final objective function $\mathcal{L}$ can be generated as shown in Equation 7 below.

  $\mathcal{L} = \mathcal{L}^c + \lambda \mathcal{L}^n$ ... (7)

- In Equation 7, $\lambda$ is a hyperparameter acting as a loss balancing factor.
- the hyperparameter $\lambda$ is introduced into the objective function to prevent the classifier from overfitting, and experiments are conducted to find its optimal value.
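A minimal sketch of such a two-head model is shown below; the module names and the injected backbone are assumptions, and the combined loss of Equation 7 then needs only one backward pass:

```python
# Sketch of the two-head architecture: one shared feature extractor g with
# two linear heads (clean and noise), so a single backward pass on
# L = L_c + lambda * L_n updates g, h_clean, and h_noise together.
import torch
import torch.nn as nn

class TwoHeadModel(nn.Module):
    def __init__(self, feature_extractor, feat_dim, num_classes):
        super().__init__()
        self.g = feature_extractor                       # shared extractor
        self.h_clean = nn.Linear(feat_dim, num_classes)  # first linear head
        self.h_noise = nn.Linear(feat_dim, num_classes)  # second linear head

    def forward(self, x):
        z = self.g(x)   # features computed once, shared by both heads
        return self.h_clean(z), self.h_noise(z)
```

With this structure, the `clean_fn` and `noise_fn` callables assumed in the iteration sketch above can simply be `lambda x: model(x)[0]` and `lambda x: model(x)[1]`.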
- the prior art requires a plurality of learning steps, which lowers the learning efficiency.
- the aforementioned MAML has been widely used for sample re-weighting, label correction, and label conversion matrix estimation.
- in this approach, a virtual update is first performed with the noisy data set, optimal parameters are found using the clean data set, and then the real parameters are updated with the found parameters.
- this hypothetical update process requires backpropagation three times per iteration, which at least triples the computational cost.
- the processor may correct the noise labels included in the noise batch.
- the processor may correct the noisy labels in each iteration, so the noise classifier that learns from the noise data is also updated in each iteration. Since the noise classifier is updated at every iteration, the transition matrix used to train the clean classifier is also re-estimated at every iteration. That is, the present disclosure can solve one of the problems of the prior art: the model learning an erroneously corrected label as a ground-truth label.
- for an input $x^n$ included in the noise batch, the processor may correct the label based on a comparison between a threshold value and the values of the elements of the probability vector $f(x^n)$, which is the output data of the clean classifier for that input.
- the output data of the clean classifier for an input included in a noise batch formed from a noise data set is represented by a probability vector. If the largest element of this probability vector exceeds the threshold $\tau$, the label is corrected to the label corresponding to that element, i.e., to a more probable label. Since this approach relies only on the most recent predictions of the model being trained, the decision to correct a label can change at every iteration.
- corrected labels can be expressed as in Equation 8 below.

  $\hat{y}^n = \begin{cases} \arg\max_k f_k(x^n), & \text{if } \max_k f_k(x^n) \geq \tau \\ \tilde{y}^n, & \text{otherwise} \end{cases}$ ... (8)

- In Equation 8, $f(x^n)$ denotes the output data of the clean classifier, $\tau$ denotes the threshold, and $\tilde{y}^n$ denotes the original label in the noise batch.
- the probability vector used for the label correction decision is the output data $f(x^n)$ of the clean classifier for the input data included in the noise batch, which has already been computed in the process of calculating the clean classifier loss $\mathcal{L}^c$. That is, according to the present disclosure, more efficient label correction is possible by reusing an already computed value.
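A minimal sketch of this correction rule (assuming PyTorch; the threshold value is supplied by the caller) reuses the probability vector already computed for Equation 5:

```python
# Sketch of Equation 8: replace a noisy label with the clean classifier's
# prediction whenever its top probability clears the threshold tau.
import torch

@torch.no_grad()
def correct_labels(p_clean, noisy_y, tau):
    """p_clean: (B, N) clean-classifier probabilities (already computed for
    the loss of Equation 5); noisy_y: (B,) current labels; tau: threshold."""
    max_prob, pred = p_clean.max(dim=1)
    return torch.where(max_prob >= tau, pred, noisy_y)
```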
- the first linear classifier, the second linear classifier, and the feature extractor are shown as individual functional blocks for convenience of description and functional division of the operation of the processor and the flow of data according to an embodiment of the present disclosure, but the first linear classifier and feature extractor included in the clean classifier 810 and the second linear classifier and feature extractor included in the noise classifier 820 may be configured as one piece of hardware.
- the noisy-label learning method through efficient transition matrix estimation according to an embodiment of the present disclosure, described with reference to FIGS. 2 to 8, can be summarized as the algorithm shown in FIG. 9.
- FIG. 10 is a block diagram illustrating an apparatus for learning a noise label according to an exemplary embodiment.
- the noise label learning apparatus 1000 performs the noise label learning method described above with reference to FIGS. 1 to 8. Therefore, even if some descriptions are omitted below, those skilled in the art will readily understand that the above description of the method of learning noisy labels with reference to FIGS. 1 to 8 applies equally to the noise label learning apparatus 1000.
- a noise label learning apparatus 1000 may include a processor 1010 and a memory 1020.
- the memory 1020 is operably connected to the processor 1010 and stores at least one program for the processor 1010 to operate. In addition, the memory 1020 stores all data related to the contents described above with reference to FIGS. 1 to 8 , such as learning data, input data, and class information.
- the memory 1020 may temporarily or permanently store data processed by the processor 1010 .
- the memory 1020 may include magnetic storage media or flash storage media, but is not limited thereto.
- the memory 1020 may include built-in memory and/or external memory, and may include: volatile memory such as DRAM, SRAM, or SDRAM; non-volatile memory such as one-time programmable ROM (OTPROM), PROM, EPROM, EEPROM, mask ROM, flash ROM, NAND flash memory, or NOR flash memory; flash drives such as an SSD, a compact flash (CF) card, an SD card, a Micro-SD card, a Mini-SD card, an xD card, or a Memory Stick; or a storage device such as an HDD.
- the processor 1010 performs the method of classifying the input data described above with reference to FIGS. 1 to 8 according to a program stored in the memory 1020 .
- the processor 1010 forms a clean batch composed of one or more first input data included in the clean data set and one or more first labels corresponding to the one or more first input data, and forms a noise batch composed of one or more second input data included in the noise data set and one or more second labels corresponding to the one or more second input data.
- the first input data corresponds to an input $x^c$ included in the clean batch, the first label corresponds to its clean label $y^c$, the second input data corresponds to an input $x^n$ included in the noise batch, and the second label corresponds to its noisy label $\tilde{y}^n$.
- the clean batch is composed of (N × K) samples selected from the clean data set, where N is the number of classes included in the clean data set, K is the number of samples selected from each class, and N and K are natural numbers greater than or equal to 1.
- the noise batch is composed of (N × K) samples selected from the noise data set, where N and K are natural numbers greater than or equal to 1.
- the processor 1010 may estimate a transition matrix based on the output data of the noise classifier for the first input data and the first label.
- the output data of the noise classifier for the first input data corresponds to $\tilde{f}(x^c)$.
- the processor 1010 may calculate a clean classifier loss based on output data of the clean classifier for the first input data, output data of the clean classifier for the second input data, the first label, the second label, and the transition matrix.
- the output data of the clean classifier for the first input data corresponds to $f(x^c)$, and the output data of the clean classifier for the second input data corresponds to $f(x^n)$.
- the processor 1010 calculates a clean-data-based loss based on the output data of the clean classifier for the first input data and the first label, calculates a noise-data-based loss based on the output data of the clean classifier for the second input data, the transition matrix, and the second label, and calculates the clean classifier loss based on the clean-data-based loss and the noise-data-based loss.
- the processor 1010 may calculate a noise classifier loss based on the output data of the noise classifier for the second input data and the second label.
- the output data of the noise classifier for the second input data corresponds to $\tilde{f}(x^n)$.
- the processor 1010 may cause the clean classifier and the noise classifier to learn noise labels based on the clean classifier loss and the noise classifier loss.
- the clean classifier may include a first linear classifier
- the noise classifier may include a second linear classifier
- the clean classifier and the noise classifier may share the same feature extractor.
- the processor 1010 may optimize the parameters of the feature extractor, the parameters of the first linear classifier, and the parameters of the second linear classifier through single backpropagation.
- the processor 1010 may correct the second label corresponding to the second input data. Specifically, correcting the second label by the processor 1010 includes generating a probability vector based on the output data of the clean classifier for the second input data, and correcting the second label based on a result of comparing the values of one or more elements included in the probability vector with a threshold value.
- the processor 1010 may refer to a data processing device embedded in hardware having a physically structured circuit to perform functions expressed by codes or instructions included in a program.
- for example, the processor 1010 may include a processing device such as a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA), but is not limited thereto.
- FIG. 11 is a diagram for explaining an example in which a final model is utilized according to an embodiment.
- FIG. 11 shows a network configuration diagram including a server 1110 and a plurality of terminals 1121 to 1124 according to an embodiment.
- the server 1110 may be a mediation device that connects the plurality of terminals 1121 to 1124.
- the server 1110 may provide a mediation service to transmit/receive data between a plurality of terminals 1121 to 1124 .
- the server 1110 and the plurality of terminals 1121 to 1124 may be connected to each other through a communication network.
- the server 1110 may transmit data to or receive data from a plurality of terminals 1121 to 1124 through a communication network.
- the communication network may be implemented as one of a wired communication network, a wireless communication network, and a complex communication network.
- the communication network may include mobile communication networks such as 3G, Long Term Evolution (LTE), LTE-A, and 5G.
- the communication network may include a wired or wireless communication network such as Wi-Fi, Universal Mobile Telecommunications System (UMTS)/General Packet Radio Service (GPRS), or Ethernet.
- the communication network may include short-range communication networks such as Magnetic Secure Transmission (MST), Radio Frequency IDentification (RFID), Near Field Communication (NFC), ZigBee, Z-Wave, Bluetooth, and Bluetooth Low Energy (BLE).
- the communication network may include a short-range network such as infrared (IR) communication.
- the communication network may include a local area network (LAN), a metropolitan area network (MAN), or a wide area network (WAN).
- Each of the plurality of terminals 1121 to 1124 may be implemented as one of a desktop computer, a laptop computer, a smart phone, a smart tablet, a smart watch, a mobile terminal, a digital camera, a wearable device, or a portable electronic device. Also, the plurality of terminals 1121 to 1124 may execute programs or applications.
- the plurality of terminals 1121 to 1124 may execute an application capable of receiving mediation services.
- the mediation service means a service through which users of the plurality of terminals 1121 to 1124 perform video calls and/or voice calls with each other.
- the server 1110 may perform various classification tasks. For example, the server 1110 may classify users into predetermined classes based on information provided by the users of the plurality of terminals 1121 to 1124. In particular, when the server 1110 receives individual face images from people subscribed to the mediation service (i.e., users of the plurality of terminals 1121 to 1124), the server 1110 may classify the face images into predetermined classes for various purposes.
- the predetermined class may be a class set based on a person's gender or a class set based on a person's age.
- the classification model of the present disclosure may be used by the server 1110 to filter abusive elements. Specifically, when the server 1110 receives an image from a user subscribed to the mediation service (i.e., from one of the plurality of terminals 1121 to 1124), the server 1110 may classify the received image into a predetermined class. Here, the predetermined class may be set based on abusive elements. For example, in response to determining that the received image includes an abusive element, the server 1110 may classify and label the received image as an 'abusing class'.
- an abusive element may refer to an element that should be prevented from being displayed, such as a harmful element, an element against public order and morals, a sadistic element, a sexual element, or an element inappropriate for minors.
- the server 1110 may stop data transmission/reception for an image classified as containing an abusive element, or may block service use by the terminal that transmitted the image.
- alternatively, the server 1110 may request an additional authentication procedure before transmitting/receiving data about an image classified as containing an abusive element.
- the final model generated according to the method described above with reference to FIGS. 1 to 8 may be stored in the server 1110, and the server 1110 can use it to classify images accurately into predetermined classes.
- in this way, a classification model capable of accurately classifying input data into a predetermined class regardless of the distribution of the input data can be created.
- the above-described method can be written as a program that can be executed on a computer, and can be implemented in a general-purpose digital computer that operates the program using a computer-readable recording medium.
- the structure of data used in the above-described method can be recorded on a computer-readable recording medium through various means.
- the computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, RAM, USB, floppy disks, hard disks, etc.) and optical reading media (e.g., CD-ROM, DVD, etc.).
Abstract
The invention relates to a method for learning noisy labels in which a first aspect includes the steps of: estimating a transition matrix based on output data of a noise classifier for first input data and on a first label corresponding to the first input data; calculating a clean classifier loss based on output data of a clean classifier for the first input data, output data of the clean classifier for second input data, the first label, a second label corresponding to the second input data, and the transition matrix; calculating a noise classifier loss based on the output data of the noise classifier for the second input data and the second label; and training the clean classifier and the noise classifier based on the clean classifier loss and the noise classifier loss.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR20210150892 | 2021-11-04 | ||
| KR10-2021-0150892 | 2021-11-04 | ||
| KR10-2022-0062899 | 2022-05-23 | ||
| KR1020220062899A KR20230065137A (ko) | 2021-11-04 | 2022-05-23 | Method and apparatus for learning noisy labels through efficient transition matrix estimation |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023080509A1 (fr) | 2023-05-11 |
Family
ID=86241372
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2022/016182 (WO2023080509A1, Ceased) | Method and apparatus for learning noisy labels through efficient transition matrix estimation | 2021-11-04 | 2022-10-21 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2023080509A1 (fr) |
- 2022-10-21: PCT application PCT/KR2022/016182 filed; published as WO2023080509A1 (status: Ceased)
Non-Patent Citations (5)
| Title |
|---|
| HAN JIANGFAN; LUO PING; WANG XIAOGANG: "Deep Self-Learning From Noisy Labels", 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 27 October 2019 (2019-10-27), pages 5137 - 5146, XP033723138, DOI: 10.1109/ICCV.2019.00524 * |
| HU MENGYING; HAN HU; SHAN SHIGUANG; CHEN XILIN: "Weakly Supervised Image Classification Through Noise Regularization", 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 15 June 2019 (2019-06-15), pages 11509 - 11517, XP033687242, DOI: 10.1109/CVPR.2019.01178 * |
| MANDAL DEVRAJ; BHARADWAJ SHRISHA; BISWAS SOMA: "A Novel Self-Supervised Re-labeling Approach for Training with Noisy Labels", 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), IEEE, 1 March 2020 (2020-03-01), pages 1370 - 1379, XP033770991, DOI: 10.1109/WACV45572.2020.9093342 * |
| WANG YISEN; MA XINGJUN; CHEN ZAIYI; LUO YUAN; YI JINFENG; BAILEY JAMES: "Symmetric Cross Entropy for Robust Learning With Noisy Labels", 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 27 October 2019 (2019-10-27), pages 322 - 330, XP033723795, DOI: 10.1109/ICCV.2019.00041 * |
| YAO JIANGCHAO, WU HAO, ZHANG YA, TSANG IVOR W., SUN JUN: "Safeguarded Dynamic Label Regression for Noisy Supervision", PROCEEDINGS OF THE AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, vol. 33, no. 01, 17 July 2019 (2019-07-17), pages 9103 - 9110, XP093064595, ISSN: 2159-5399, DOI: 10.1609/aaai.v33i01.33019103 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117523212A (zh) * | 2023-11-09 | 2024-02-06 | 广州航海学院 | 用于车辆款式图像数据的标签噪声识别方法、系统及设备 |
| CN117523212B (zh) * | 2023-11-09 | 2024-04-26 | 广州航海学院 | 用于车辆款式图像数据的标签噪声识别方法、系统及设备 |
| WO2025097727A1 (fr) * | 2023-11-09 | 2025-05-15 | 广州航海学院 | Procédé et système d'identification de bruit d'étiquette dans des données d'image de modèle de voiture, et dispositif |
| CN119904670A (zh) * | 2024-12-12 | 2025-04-29 | 河海大学 | 一种基于类别自适应的动态标签分布阈值的鲁棒分类方法及装置 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22890244; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 22890244; Country of ref document: EP; Kind code of ref document: A1 |