WO2023143713A1 - Determination of an overall classification for a multi-element ensemble - Google Patents
Determination of an overall classification for a multi-element ensemble
- Publication number
- WO2023143713A1 (PCT/EP2022/051795)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- ensemble
- training
- elm
- ebl
- rpr
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/809—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/98—Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Definitions
- the invention relates to an approach for classifying an ensemble consisting of a plurality of individual elements which can be, for example, physical objects or measurement data.
- the corresponding classification system applies an artificial neural network ANN which is trained based on training ensembles consisting of a plurality of training elements, wherein ground truth is available for the overall training ensembles, but not for individual training elements.
- a classification of an ensemble is required, wherein the ensemble consists of a plurality of individual elements.
- the overall quality of the production process and/or of the produced batch as well as failure detection cannot be reasonably derived from observation of individual components due to their huge number.
- in the medical sector, in a microscopic analysis of a blood sample including a plurality of blood cells to determine a diagnosis, observation of individual cells is hardly manageable with reasonable effort, again due to the huge number of individual cells.
- a characterizing latent ensemble representation REP_EBL of the multi-element ensemble EBL is fed to a trained artificial neural network ANN (in the following in most cases only "network ANN") .
- the trained network ANN processes the characterizing latent ensemble representation REP_EBL of the multi-element ensemble EBL to derive the overall classification CLS.
- the invention assumes and allows that not all elements ELM(i) of an ensemble EBL are subject to the same classification.
- the characterizing latent ensemble representation REP_EBL of the multi-element ensemble EBL is calculated from an ensemble reproduction RPR_EBL, e.g. an image, of the multi-element ensemble EBL.
- the ensemble reproduction RPR_EBL includes corresponding element reproductions RPR_ELM(i) of the elements ELM(i) of the ensemble EBL.
- an ensemble reproduction RPR_EBL of an ensemble EBL can be an optical image depicting the ensemble EBL while the element reproductions RPR_ELM(i) of the elements ELM(i) can be sections of that image which are cropped from the image so that each element reproduction RPR_ELM(i) depicts essentially one of the elements ELM(i) .
- the characterizing latent ensemble representation REP_EBL is calculated by generating at least for each one of a subset of the elements ELM(i) , i.e. at least for each one of a subset of corresponding reproductions RPR_ELM(i) of such elements ELM(i) , a latent element representation REP_ELM(i) from the corresponding element reproduction RPR_ELM(i) of the respective element ELM(i) , e.g. from the cropped images of individual red blood cells.
- the characterizing latent ensemble representation REP_EBL of the ensemble EBL is calculated as a function of the generated latent element representations REP_ELM(i) of the elements ELM(i) , especially as an average of the generated latent element representations REP_ELM(i) .
- the function can be an averaging function, possibly a weighted averaging.
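The averaging described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `ensemble_representation`, the plain-list vector format, and the per-element `weights` argument are assumptions made for the example.

```python
from typing import List, Optional, Sequence


def ensemble_representation(
    element_vectors: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Compute REP_EBL as a (weighted) average of the REP_ELM(i) vectors.

    Hypothetical helper: one weight per element; None means a plain average.
    """
    if not element_vectors:
        raise ValueError("at least one latent element representation is required")
    dim = len(element_vectors[0])
    if any(len(v) != dim for v in element_vectors):
        raise ValueError("all latent element representations must have the same length")
    if weights is None:
        weights = [1.0] * len(element_vectors)  # plain average
    if len(weights) != len(element_vectors):
        raise ValueError("one weight per latent element representation is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Component-wise weighted mean over all elements.
    return [
        sum(w * v[k] for w, v in zip(weights, element_vectors)) / total
        for k in range(dim)
    ]


rep_elm = [[1.0, 0.0], [3.0, 4.0]]            # two toy REP_ELM(i) vectors
rep_ebl = ensemble_representation(rep_elm)     # plain average
rep_ebl_weighted = ensemble_representation(rep_elm, weights=[3.0, 1.0])
```

With equal weights the result is the ordinary mean; unequal weights shift REP_EBL towards the more heavily weighted elements.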
- each latent element representation REP_ELM(i) is generated from the corresponding element reproduction RPR_ELM(i) of the respective element ELM(i) by a feature extraction method, preferably by a convolutional autoencoder.
- This generation of a latent element representation REP_ELM(i) from the corresponding element reproduction RPR_ELM(i) can apply a feature extraction method using, for example, an encoder part of a previously trained convolutional auto-encoder (CAE) .
- the element reproductions RPR_ELM(i) of the elements ELM(i) can be derived from an ensemble reproduction RPR_EBL, e.g. an image, of the multi-element ensemble EBL.
- the ensemble reproduction RPR_EBL of the ensemble EBL is an image IMA depicting at least a subset of the plurality PL of elements ELM(i) of the respective ensemble EBL.
- the image IMA might show all the elements or at least a certain number N ≤ PL of them.
- the element reproductions RPR_ELM(i) of the elements ELM(i) are sections of the image IMA, wherein each particular element reproduction RPR_ELM(p) at least depicts one particular element ELM(p) .
- different particular reproductions RPR_ELM(p) depict different particular elements ELM(p) .
- such a section of the image and the corresponding reproduction RPR_ELM(p) preferably shows at least the respective particular element ELM(p) completely and, as the case may be, parts of e.g. neighboring or proximate elements ELM(k) with k ≠ p.
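Cropping element reproductions RPR_ELM(i) out of an image IMA could look like the sketch below. The image is modelled as a nested list of grey values and each element is located by a bounding box `(top, left, height, width)`; both the box format and the helper name `crop_elements` are assumptions for illustration, and how the boxes are found is outside the scope of the example.

```python
from typing import List, Tuple

Image = List[List[int]]             # rows of grey values
Box = Tuple[int, int, int, int]     # (top, left, height, width), hypothetical format


def crop_elements(image: Image, boxes: List[Box]) -> List[Image]:
    """Return one cropped section of the image per bounding box."""
    crops = []
    for top, left, height, width in boxes:
        rows = image[top:top + height]
        crops.append([row[left:left + width] for row in rows])
    return crops


# Toy 4x6 image whose pixel value encodes its position (10 * row + column).
ima = [[10 * r + c for c in range(6)] for r in range(4)]
rpr_elm = crop_elements(ima, [(0, 0, 2, 2), (1, 3, 3, 2)])
```

A box may be chosen slightly larger than the element itself, in which case the crop also contains parts of neighboring elements, as the text allows.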
- the overall classification CLS to be determined is a health status of a patient, e.g. "healthy" or "infected", wherein the elements ELM(i) forming the ensemble EBL are constituents of a sample of the patient, preferably blood cells of a blood sample of the patient, especially red blood cells and/or white blood cells.
- the ensemble can consist of the plurality of red blood cells.
- the overall classification CLS to be determined is a status indicator for a mass production process for producing a plurality of components, wherein each produced component corresponds to an element ELM(i) forming the ensemble EBL.
- the status indicator might represent the status of the production process as such or it might represent a quality indicator, e.g. "faulty” or "faultless", signaling the quality of the produced components .
- the elements ELM(i) are measurement values, e.g. sensor measurement values or any other value occurring during the processing, collected during a processing, e.g. production, of a component and the ensemble EBL is a data structure comprising the collected measurement values ELM(i) .
- the term "collecting" can mean, for example, "measuring".
- the element reproductions RPR_ELM(i) are derived from the elements ELM(i) based on a given function EX.
- the ensemble reproduction RPR_EBL of the ensemble EBL is a data structure containing the element reproductions RPR_ELM(i) .
- the function EX might be the identity function, i.e. the element reproductions RPR_ELM(i) are identical to the elements ELM(i) themselves and the element reproductions RPR_ELM(i) are actually generated by collecting the elements ELM(i) .
- the step of deriving is directly fulfilled by collecting the elements ELM(i) .
- Each element ELM(i) might contain a plurality of measurement values, wherein all measurement values of a particular element ELM(i) might be collected from the same source, e.g. from the same sensor, or different measurement values of a particular element ELM(i) are collected from different sources, e.g. different types of sensors, e.g. temperature and pressure, preferably representing the same point in time of the processing of the component .
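For the measurement-data case, the relation between elements ELM(i), the function EX, and the element reproductions RPR_ELM(i) can be sketched as below. The `Measurement` record, its field names, and the scaling example are hypothetical; only the identity case is taken from the text.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Measurement:
    """One element ELM(i): values from different sensors at the same point in time."""
    time_s: float
    temperature_c: float
    pressure_bar: float


def identity(element: Any) -> Any:
    """EX as the identity function: the reproduction is the element itself."""
    return element


def derive_reproductions(
    elements: List[Measurement],
    ex: Callable[[Measurement], Any] = identity,
) -> List[Any]:
    """Build RPR_EBL as a list of element reproductions RPR_ELM(i) = EX(ELM(i))."""
    return [ex(element) for element in elements]


ebl = [Measurement(0.0, 210.5, 80.1), Measurement(0.5, 211.0, 79.8)]
rpr_ebl = derive_reproductions(ebl)  # identity: reproductions equal the elements

# A non-identity EX, here an assumed rescaling of both sensor values.
rpr_ebl_scaled = derive_reproductions(
    ebl, ex=lambda m: (m.temperature_c / 100.0, m.pressure_bar / 100.0)
)
```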
- a method for training and establishing, respectively, the artificial neural network ANN such that the trained network ANN is configured to provide, in a regular operation, i.e. after a training phase, the overall classification CLS for the multi-element ensemble EBL upon receiving a characterizing latent ensemble representation REP_EBL of such multi-element ensemble EBL as input data set includes a training data preparation phase PREP and a training phase TRAIN.
- a characterizing latent training ensemble representation REP_TEBL(t) is determined, wherein for each one of the multi-element training ensembles TEBL(t) an overall classification CLS (t) has been provided in advance.
- the network ANN is trained based on the characterizing latent training ensemble representations REP_TEBL(t) as input data sets and the respective previously provided overall classifications CLS(t) as aspired output data of the network ANN, i.e. the network ANN is expected to provide a given overall classification CLS(t) upon receipt of the respective given characterizing latent training ensemble representation REP_TEBL(t) as input data.
- training of a network ANN includes an iterative process which is conducted until the network ANN delivers the expected and aspired output as outlined below.
- a three-step process is executed for determining its characterizing latent training ensemble representation REP_TEBL(t) .
- a training element reproduction RPR_TELM (t, i) , e.g. a cropped image, is generated or extracted from a previously provided training ensemble reproduction RPR_TEBL(t) , e.g.
- the previously provided reproduction RPR_TEBL(t) can be an image of the training ensemble TEBL(t) and the reproductions RPR_TELM (t, i) can be sections of that image which are cropped from the image so that each reproduction RPR_TELM (t, i) depicts essentially one of the training elements TELM(t,i) .
- a latent training element representation REP_TELM (t, i) for each generated training element reproduction RPR_TELM (t, i) of a respective training element TELM(t,i) , i.e. for each i, a latent training element representation REP_TELM (t, i) , e.g.
- the characterizing latent training ensemble representation REP_TEBL(t) of the respective multi-element training ensemble TEBL(t) is calculated as a function of the previously generated latent training element representations REP_TELM(t,i).
- latent training element representation REP_TELM (t, i) can be embodied as a multi-parameter latent feature vector VEC(t,i) as outlined below.
- the previously provided training ensemble reproduction RPR_TEBL(t) of the multi-element training ensemble TEBL(t) is an image IMA(t) depicting at least a subset of the plurality of training elements TELM(t,i) of the respective training ensemble TEBL(t) .
- the image IMA might show all the elements or at least a certain number N ≤ PL of them.
- the training element reproductions RPR_TELM (t, i) of the training elements TELM(t,i) are sections of the image IMA(t) , wherein each particular training element reproduction RPR_TELM ( t, p) at least depicts one particular training element TELM(t,p) .
- different particular reproductions RPR_TELM ( t, p) preferably depict different particular training elements TELM(t,p) .
- each training element reproduction RPR_TELM (t, i) is generated from the previously provided training ensemble reproduction RPR_TEBL(t) by extracting or cropping, respectively, a section from the training ensemble reproduction RPR_TEBL(t) which at least includes and depicts, respectively, the respective training element TELM(t,i) .
- the overall classification CLS to be determined is a health status of a patient, e.g. "healthy" or "infected", wherein the training elements TELM(t,i) forming the training ensemble TEBL(t) are constituents of a sample of the patient, preferably blood cells of a blood sample of the patient, especially red blood cells and/or white blood cells.
- the ensemble can consist of the plurality of red blood cells.
- the overall classification CLS to be determined is a status indicator for a mass production process for producing a plurality of components, wherein each produced component corresponds to a training element TELM(t,i) forming the ensemble EBL.
- the status indicator might represent the status of the production process as such or it might represent a quality indicator, e.g. "faulty” or "faultless", signaling the quality of the produced components.
- the training elements TELM(t,i) are measurement values, e.g. sensor measurement values or any other value occurring during the processing, collected during a processing, e.g. production, of a component and the training ensemble TEBL(t) is a data structure comprising the collected measurement values TELM(t,i) .
- the training element reproductions RPR_TELM (t, i) are derived from the training elements TELM(t,i) based on a given function EX.
- the training ensemble reproduction RPR_TEBL(t) of the training ensemble TEBL(t) is a data structure containing the training element reproductions RPR_TELM (t, i) .
- the function EX can be, in the simplest case, the identity function.
- each training element TELM(t,i) might contain a plurality of measurement values, wherein all measurement values of a particular training element TELM(t,i) can be collected from the same source, e.g. from the same sensor, or different measurement values of a particular training element TELM(t,i) can be collected from different sources, e.g. different types of sensors, e.g. temperature and pressure, preferably representing the same point in time of the processing of the component.
- each latent training element representation REP_TELM (t, i) is determined from the corresponding generated and extracted, respectively, training element reproduction RPR_TELM (t, i) by a feature extraction method, preferably by a convolutional auto-encoder.
- the characterizing latent training ensemble representation REP_TEBL(t) of the multi-element training ensemble TEBL(t) can be calculated as an average of the previously generated latent training element representations REP_TELM (t, i) .
- not all multi-element training ensembles TEBL(t) have the same overall classification CLS (t) , i.e. at least two training ensembles TEBL1, TEBL2 are differently classified.
- the ensembles of a first group of several training ensembles are classified with the same first overall classification CLS1, while the ensembles of a second group of another several training ensembles are classified with the same second overall classification CLS2.
- further groups of training ensembles with further overall classifiers might be applied.
- the technical solution provided herein addresses the question how to use computer learning on a dataset consisting of multiple training ensembles of data, wherein each training ensemble belongs to one of two or more different types and classifications, respectively, and wherein solid ground truths are only available for the overall training ensembles, but not for any one of the individual elements of the ensembles.
- CLS1 diagnosis and classification
- CLS2 healthy
- red blood cells will be used as "elements" ELM(i) of the "ensemble" EBL.
- Another technical application originates from the industrial sector, where a data structure consisting of a plurality of individual data points, e.g. sensor measurement values, is to be analyzed to infer the status of a machine or of a product produced by the machine.
- the individual sensor measurement values would correspond to the "elements" ELM(i) while the data structure, i.e. the entirety of the data points ELM(i) , corresponds to the ensemble EBL.
- a more concrete, but still only exemplary application could be a production process applying injection molding for large quantity production of plastic components. During the production, process parameters like temperature, pressure, etc. are observed via corresponding sensor measurements, resulting in sensor measurement values ELM(i) .
- a plurality of such measurement values ELM(i) again forms the ensemble EBL which is analyzed by the approach proposed herein to derive a classification.
- one element ELM(i) comprises a plurality of measurement values, e.g. ELM(1) comprises all temperature measurement values of a certain phase of the production process while ELM(2) comprises all pressure measurement values of that phase etc.
- an element ELM(i) is a tuple of different sensor measurement values, e.g. a temperature value and a pressure value, measured at the same point in time.
- different elements represent situations at different points in time.
- Another application would be optical inspection in mass production of components, where images are taken of the huge quantities of produced components.
- Each such image represents an ensemble EBL and each produced component represents an element ELM(i) .
- the classification CLS of the ensemble EBL makes it possible to assess whether a batch of produced components ELM(i) is faulty or faultless.
- a classification CLS is determined for a multi-element ensemble EBL, wherein the individual classification and status, respectively, of individual elements ELM(i) is not in the focus, but only the overall classification CLS and status, respectively, of the ensemble EBL.
- FIG 1 shows an ensemble EBL of elements ELM(i).
- FIG 2 shows a flow chart for classifying the ensemble EBL
- FIG 3 shows a first example for an element reproduction
- FIG 4 shows a second example for an element reproduction
- FIG 5 shows a third example for an element reproduction
- FIG 6 shows a smear image
- FIG 7 shows a first example of element reproductions from the smear image
- FIG 8 shows a second example of element reproductions from the smear image
- FIG 9 shows a flow chart of a training procedure for training an artificial neural network ANN
- FIG 10 shows a second training step of the training procedure
- FIG 11 shows a first application scenario
- FIG 12 shows a second application scenario
- FIG 13 shows a third application scenario
- FIG 14 shows a training setup .
- the elements ELM(i) might be red blood cells or components produced in a production process or measurement data of an industrial process. Different ones of the elements ELM(i) might have a different status and, correspondingly, might be differently classified, which is visualized in FIG 1 by elements ELM(i) with and without a black spot, i.e. ELM(1), ELM(2), ELM(3), ELM(6), and ELM(9) are in a different status and differently classified than ELM(4), ELM(5), ELM(7), ELM(8), ELM(10), ELM(11), ELM(12), ELM(13), and ELM(14).
- elements ELM(i) can be red blood cells of a blood sample of a patient and it is envisaged to determine whether the patient is infected with Malaria.
- Some red blood cells of the sample and ensemble EBL, respectively, are infected, e.g. the ones with the black spot in FIG 1, while others are still healthy.
- the overall classification CLS of that ensemble should be "infected".
- the term "multi-element" expresses, in connection with an ensemble, that such a multi-element ensemble comprises a plurality PL>1 of elements ELM(i).
- the term "of the same kind" expresses that different elements ELM(1), ELM(2) etc. of the ensemble EBL are alike.
- all elements ELM(i) of the same kind are red blood cells, e.g. in case of the application scenario of Malaria detection.
- the expression "of the same kind" allows, but does not demand, that different elements ELM(1), ELM(2) etc. are in the same status or have the same individual classification. Referring to the example of Malaria detection and red blood cells, some red blood cells and elements, respectively, of the ensemble EBL might be in a status "healthy", while other red blood cells and elements, respectively, of the same ensemble EBL might be in a status "infected".
- the term "overall classification" of an ensemble EBL expresses that the classification is valid for the ensemble EBL as an entirety, but not necessarily for individual elements ELM(i) of the ensemble. Coming back to the example of red blood cells in Malaria detection, wherein some red blood cells of the ensemble are "infected" and some are "healthy", the overall classification CLS of this ensemble EBL would be "infected", although the ensemble comprises healthy red blood cells.
- the approach for determining an overall classification CLS for a multi-element ensemble EBL foresees to determine and supply a characterizing latent ensemble representation REP_EBL of the multi-element ensemble EBL to a previously trained artificial neural network ANN (in the following "network ANN") .
- the trained network ANN processes the characterizing latent ensemble representation REP_EBL to derive the overall classification CLS.
- the conceivable classifications CLS (c) as well as their number PLC of course depend on the concrete classification scenario.
- each element ELM(i) of the ensemble EBL can be individually classified with one of the plurality of different classifications CLS(c), wherein different elements ELM(1), ELM(2) etc. might be classified with different ones of the classifications CLS(1), CLS(2) etc.
- the group of conceivable classifications CLS (c) is selected such that it comprises a classification CLS (c) for each one of the elements ELM(i) .
- in a first classification step CS1, an ensemble reproduction RPR_EBL of the ensemble EBL is provided.
- the image and ensemble reproduction RPR_EBL of the ensemble EBL can be a microscopic smear image RPR_EBL depicting the red blood cells ELM(i) as shown in FIG 1.
- the ensemble reproduction RPR_EBL and image IMA of the ensemble EBL only show a subset of the plurality PL of elements ELM(i) of the respective ensemble EBL since it cannot be excluded that one or more first elements ELM(i1) cover or overlap one or more second elements ELM(i2) so that the second elements ELM(i2) are not visible in the ensemble reproduction RPR_EBL and image IMA.
- the image IMA might show all the elements or at least a certain number N ≤ PL, i.e. a subset, of them.
- the ensemble reproduction RPR_EBL, depicting the multi-element ensemble EBL, is composed of and includes, respectively, a plurality of corresponding element reproductions RPR_ELM(i) of the elements ELM(i) of the ensemble EBL.
- FIGs 3-5 show, by way of example and based on FIG 2, element reproductions RPR_ELM(8), RPR_ELM(9), and RPR_ELM(12) for elements ELM(8), ELM(9), and ELM(12).
- the element reproductions RPR_ELM(i) of the elements ELM(i) can be sections of that image IMA which can be derived and cropped, respectively, from the image IMA so that each element reproduction RPR_ELM(i) depicts essentially and at least one of the elements ELM(i) .
- different particular reproductions RPR_ELM(p) depict different particular elements ELM(p) .
- such a section of the image IMA and the corresponding reproduction RPR_ELM(p), respectively, preferably shows at least the respective particular element ELM(p) completely and, as the case may be, parts of e.g. neighboring or proximate elements ELM(k) with k ≠ p, i.e. as can be seen in FIGs 3-5, the element reproductions RPR_ELM(i) might also depict parts of other, neighboring elements.
- those element reproductions RPR_ELM(i) are derived or generated, respectively, from the ensemble reproduction RPR_EBL at least for each one of a subset of the elements ELM(i) of the ensemble EBL, e.g. by suitable, known image processing approaches like "cropping". For example, in case two or more elements ELM(i) overlap in the image IMA, it is conceivable to omit the generation of corresponding element reproductions RPR_ELM(i) for those overlapping elements.
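Omitting overlapping elements before cropping can be sketched with a simple bounding-box test. The box format `(top, left, height, width)` and the helper names are assumptions for illustration; the text does not specify how overlap is detected.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width), hypothetical format


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two axis-aligned boxes share at least one pixel."""
    a_top, a_left, a_h, a_w = a
    b_top, b_left, b_h, b_w = b
    return (
        a_top < b_top + b_h and b_top < a_top + a_h
        and a_left < b_left + b_w and b_left < a_left + a_w
    )


def non_overlapping(boxes: List[Box]) -> List[Box]:
    """Keep only boxes that do not overlap any other box in the list."""
    return [
        box
        for i, box in enumerate(boxes)
        if not any(
            boxes_overlap(box, other)
            for j, other in enumerate(boxes)
            if i != j
        )
    ]


# The first two boxes overlap each other; the third stands alone.
boxes = [(0, 0, 4, 4), (2, 2, 4, 4), (10, 10, 4, 4)]
kept = non_overlapping(boxes)
```

Only the boxes in `kept` would then be cropped into element reproductions RPR_ELM(i), so the ensemble is represented by a subset of its elements.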
- a third classification step CS3 comprises a generation of a latent element representation REP_ELM(i) for each one of the element reproductions RPR_ELM(i) generated in the second classification step CS2.
- one latent element representation REP_ELM(i) is generated for each one of the cropped images RPR_ELM(i) as exemplarily shown in FIGs 3-5.
- the generation of a latent element representation REP_ELM(i) from the corresponding element reproduction RPR_ELM(i) can apply a feature extraction method using, for example, an encoder part of a previously trained convolutional auto-encoder (CAE) .
- this encoder is a module of the network ANN which has been introduced above and the training of which will be described in the following.
- each individual element reproduction RPR_ELM(i) can be encoded to generate the corresponding latent element representation REP_ELM(i) as a multi-parameter vector VEC(i) of its latent features.
- the vector VEC(i) might be a 64-parameter vector .
- a characterizing latent ensemble representation REP_EBL of the ensemble EBL is calculated as a function of the latent element representations REP_ELM(i) of the elements ELM(i) of the ensemble EBL generated in the third classification step CS3.
- the characterizing latent ensemble representation REP_EBL can be calculated as a weighted average of the individual latent element representations REP_ELM(i) .
- the weightings might be determined with a Principal Component Analysis which analyses which components of the latent element representations have higher relevance and which ones have lower relevance so that such components might have a higher weighting and lower weighting, respectively.
- in a fifth classification step CS5, the trained network ANN processes the characterizing latent ensemble representation REP_EBL calculated in the fourth classification step CS4 to derive the overall classification CLS of the ensemble EBL.
- the third and fifth classification steps CS3, CS5 can be executed by the trained network ANN.
- the network ANN might also be configured to execute the second classification step CS2 and/or the fourth classification step CS4.
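The five classification steps CS1 to CS5 can be strung together as in the following sketch. It is a toy under stated assumptions: `toy_encode` merely stands in for the encoder part of a trained convolutional auto-encoder, `toy_classify` stands in for the trained network ANN, and the bounding-box format and the threshold are invented for the example.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]   # (top, left, height, width), hypothetical format
Vector = List[float]


def classify_ensemble(
    image: Image,                            # CS1: ensemble reproduction RPR_EBL
    boxes: List[Box],
    encode: Callable[[Image], Vector],
    classify: Callable[[Vector], str],
) -> str:
    if not boxes:
        raise ValueError("at least one element is required")
    # CS2: crop one element reproduction RPR_ELM(i) per box.
    reproductions = [
        [row[left:left + width] for row in image[top:top + height]]
        for top, left, height, width in boxes
    ]
    # CS3: latent element representation REP_ELM(i) per reproduction.
    latents = [encode(r) for r in reproductions]
    # CS4: REP_EBL as the plain average of the latent vectors.
    dim = len(latents[0])
    rep_ebl = [sum(v[k] for v in latents) / len(latents) for k in range(dim)]
    # CS5: overall classification CLS from REP_EBL.
    return classify(rep_ebl)


def toy_encode(crop: Image) -> Vector:
    """Stand-in encoder: mean and maximum pixel value (not a real CAE)."""
    pixels = [p for row in crop for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


def toy_classify(rep_ebl: Vector) -> str:
    """Stand-in classifier with an arbitrary threshold (not a trained ANN)."""
    return "infected" if rep_ebl[1] > 0.4 else "healthy"


ima = [[0.1, 0.1, 0.2, 0.9],
       [0.1, 0.1, 0.2, 0.2]]
cls = classify_ensemble(ima, [(0, 0, 2, 2), (0, 2, 2, 2)], toy_encode, toy_classify)
```

Passing the encoder and classifier in as callables mirrors the statement that steps CS3 and CS5 are executed by the trained network, while CS2 and CS4 may or may not be.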
- FIG 6 shows a smear image IMA and ensemble reproduction RPR_EBL, respectively, for the scenario of a Malaria detection.
- the image IMA depicts the ensemble EBL comprising a plurality of red blood cells which represent in this scenario the elements ELM(i) of the ensemble EBL.
- FIG 7 shows purely exemplarily three element reproductions RPR_ELM(i) resulting from the second classification step CS2 applied with the image IMA of FIG 6.
- FIG 8 shows cases for which element reproductions RPR_ELM(i) are overlapping so that those elements ELM(i) are not considered for calculating latent element representations REP_ELM(i) in the third classification step CS3.
- the artificial neural network ANN has to be trained beforehand to enable it to determine the overall classification CLS of a multi-element ensemble EBL of elements ELM(i) . Such training is described in the following with reference to FIGs 9-10.
- FIG 9 shows a flow chart of a training procedure for training and establishing, respectively, the artificial neural network ANN such that the readily trained network ANN is configured to provide, in a regular operation, i.e. after a training phase, an overall classification CLS for a multi-element ensemble EBL of a plurality of elements ELM(i) upon receiving a characterizing latent ensemble representation REP_EBL of such multi-element ensemble EBL as input data set.
- the network ANN is not only trained and configured to process a characterizing latent ensemble representation REP_EBL to determine an overall classification CLS, but it is also trained and configured to determine the overall classification CLS upon receiving element reproductions RPR_ELM(i).
- the network ANN is trained and configured to determine an overall classification CLS upon receiving an ensemble reproduction RPR_EBL.
- the training procedure comprises a training data preparation phase PREP and a subsequent training phase TRAIN.
- the term "previously classified” expresses that for each one of the multi-element training ensembles TEBL(t) an overall classification CLS (t) has been provided in advance.
- the training ensembles TEBL(t) are selected such that not all training ensembles TEBL(t) have the same overall classification CLS(t), i.e. at least two training ensembles TEBL(1), TEBL(2) are differently classified, e.g. the training ensembles of a first group of several training ensembles are classified with the same first overall classification CLS(1), while the training ensembles of a second group of several further training ensembles TEBL(t2), with t2 = X+1, ..., PLT and X < PLT, are classified with the same second overall classification CLS(2).
- further groups of training ensembles with further overall classifications might be applied.
- the network ANN is trained based on the characterizing latent training ensemble representations REP_TEBL(t) as input data sets and the respective previously provided overall classifications CLS (t) as aspired output data of the network ANN. I.e. the network ANN is expected to provide a given overall classification CLS (t) upon receipt of the respective given characterizing latent training ensemble representation REP_TEBL(t) as input data .
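Training on ensemble-level labels only can be illustrated with a deliberately small stand-in. The sketch below fits a single logistic unit by stochastic gradient descent instead of a full artificial neural network; the function names, the learning rate, the epoch count, and the toy data are all assumptions. The inputs are REP_TEBL(t) vectors and the targets are the overall classifications CLS(t), encoded here as 0 ("healthy") and 1 ("infected").

```python
import math
from typing import List, Sequence, Tuple


def train_classifier(
    reps: Sequence[Sequence[float]],   # REP_TEBL(t), one vector per training ensemble
    labels: Sequence[int],             # CLS(t) encoded as 0 or 1
    epochs: int = 500,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit weights and bias of one logistic unit with plain SGD."""
    dim = len(reps[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(reps, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            err = p - y                       # gradient of the log loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return the overall classification (0 or 1) for one representation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0


# Toy data: no label exists for any individual element, only per ensemble.
rep_tebl = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
cls_t = [0, 0, 1, 1]
w, b = train_classifier(rep_tebl, cls_t)
predictions = [predict(w, b, x) for x in rep_tebl]
```

The loop repeats until the outputs match the provided classifications on this toy set, which corresponds to the iterative training described in the text; a real network would replace the single unit.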
- the term "previously classified” expresses that for each training ensemble TEBL(t) a classification CLS (t) is known and provided for further processing and training.
- each training ensemble reproduction RPR_TEBL(t) depicting the corresponding multi-element training ensemble TEBL(t) , is composed of and includes, respectively, a plurality of corresponding training element reproductions RPR_TELM (t, i) of training elements TELM(t,i) of the training ensemble TEBL(t) .
- the training ensemble reproduction RPR_TEBL(t) is an image IMA(t) of the corresponding training ensemble TEBL(t).
- the training element reproductions RPR_TELM(t,i) of the training elements TELM(t,i) can be sections of that image IMA(t).
- the training elements TELM(t,i), the training ensembles TEBL(t), and the respective training ensemble reproductions RPR_TEBL(t) are of the same kind as the elements ELM(i), the ensemble EBL, and the ensemble reproduction RPR_EBL for which an overall classification CLS shall be determined after training.
- different training ensembles TEBL(t) might be different pluralities of red blood cells of blood smears of infected and healthy patients.
- the red blood cells of a given training ensemble TEBL(t) represent its training elements TELM(t,i).
- the training ensemble reproductions RPR_TEBL(t) can be the corresponding microscope images of the blood smears, depicting the corresponding red blood cells and training elements TELM(t,i), respectively.
- training ensembles TEBL(t) and corresponding training ensemble reproductions RPR_TEBL(t) are available for all conceivable overall classifications, but at least for the overall classifications of interest. For example, in case of Malaria detection it might be sufficient to distinguish between "infected" and "healthy", while other application scenarios might require more than two different overall classifications.
- each multi-element training ensemble TEBL(t) and training ensemble reproductions RPR_TEBL(t) are available for each overall classification CLS(t) of interest.
- the overall classification CLS(t), i.e. the "ground truth"
- ground truth might be known from gene tests or other reliable diagnostic methods.
- the training ensemble reproductions RPR_TEBL(t) might be stored in and provided from a data base DB.
- the data base DB stores both the training ensemble reproductions RPR_TEBL(t) corresponding to the previously classified training ensembles TEBL(t) and the respective overall classifications CLS(t) of the training ensembles TEBL(t). Consequently, the stored overall classifications CLS(t) of the training ensembles TEBL(t) are assigned to and applicable for the respective training ensemble reproductions RPR_TEBL(t).
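The pairing of stored reproductions with ground-truth classifications, together with the requirement that not all training ensembles share one classification, can be sketched as follows. This is a minimal illustration only: the record layout, the file names, and the helper `check_training_set` are assumptions made for this example and are not taken from the publication.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainingRecord:
    """One hypothetical entry of the data base DB."""
    reproduction_path: str  # location of RPR_TEBL(t), e.g. a smear image file
    classification: str     # CLS(t), the ground truth for the whole ensemble


def check_training_set(records: Sequence[TrainingRecord]) -> Counter:
    """Count ensembles per classification and reject single-class sets."""
    counts = Counter(record.classification for record in records)
    if len(counts) < 2:
        raise ValueError("training ensembles must cover at least two classifications")
    return counts


# Made-up example entries.
db_records = [
    TrainingRecord("smear_001.png", "infected"),
    TrainingRecord("smear_002.png", "healthy"),
    TrainingRecord("smear_003.png", "infected"),
]
db_counts = check_training_set(db_records)
```

Running such a check before training makes an unusable single-class training set fail early instead of producing a network that can only ever output one classification.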
- a characterizing latent training ensemble representation REP_TEBL(t) is determined for each one of the plurality of training ensembles TEBL(t) and training ensemble reproductions RPR_TEBL(t), respectively, available from the first training step TS1.
- the second training step TS2 comprises three substeps TS2.1, TS2.2, TS2.3 for each training ensemble reproduction RPR_TEBL(t) and corresponding characterizing latent training ensemble representation REP_TEBL(t), respectively, i.e. for each t:
- a training element reproduction RPR_TELM(t,i) is generated from the corresponding training ensemble reproduction RPR_TEBL(t) at least for each one of a subset of training elements TELM(t,i), i.e. for several i, preferably for each i, of the respective multi-element training ensemble TEBL(t).
- the training element reproductions RPR_TELM(t,i) can be cropped from the image IMA(t) so that each training element reproduction RPR_TELM(t,i) depicts at least one, and essentially only one, of the training elements TELM(t,i).
- different particular reproductions RPR_TELM(t,p) depict different particular training elements TELM(t,p).
- such a section of the image IMA(t) and the corresponding reproduction RPR_TELM(t,p) show the respective particular training element TELM(t,p) completely and, as the case may be, parts of e.g. neighboring or proximate training elements TELM(t,k) with k ≠ p.
- substep TS2.1 is essentially comparable to the second classification step CS2 described above.
- the training element reproductions RPR_TELM(t,i) are generated from the training ensemble reproduction RPR_TEBL(t) at least for each one of a subset of the training elements TELM(t,i) of the training ensemble TEBL(t), e.g. by suitable image processing approaches like "cropping".
- a latent training element representation REP_TELM(t,i), e.g. in the form of a multi-parameter feature vector VEC(t,i), is determined for each training element reproduction RPR_TELM(t,i) of a respective training element TELM(t,i) generated in the first substep TS2.1, e.g. by feature extraction with a convolutional auto-encoder.
- the generation of a latent training element representation REP_TELM(t,i) from the corresponding training element reproduction RPR_TELM(t,i) can apply a feature extraction method using, for example, an encoder part of a previously trained convolutional auto-encoder (CAE).
- This encoder can be a module of the network ANN which also provides the overall classification or it can be a separate network.
- each individual training element reproduction RPR_TELM(t,i) can be encoded to generate the corresponding latent training element representation REP_TELM(t,i) as a multi-parameter vector VEC(t,i) of its latent features.
- the vector VEC(t,i) might be a 64-parameter vector.
- the characterizing latent training ensemble representation REP_TEBL(t) of a respective multi-element training ensemble TEBL(t) is calculated as a function of the latent training element representations REP_TELM(t,i) generated in the second substep TS2.2.
- the characterizing latent training ensemble representation REP_TEBL(t) of the multi-element training ensemble TEBL(t) is calculated as an average of the latent training element representations REP_TELM(t,i) generated in the second substep TS2.2.
- the latent training element representations REP_TELM(t,i) being embodied as multi-parameter latent feature vectors VEC(t,i) as introduced above
- the characterizing latent training ensemble representation REP_TEBL(t) of the multi-element training ensemble TEBL(t) resulting from an averaging function could also be a corresponding multi-parameter latent feature vector VEC(t).
- execution of the second training step TS2 including substeps TS2.1, TS2.2, TS2.3 for each t, i.e. for each training ensemble reproduction RPR_TEBL(t) received from the first training step TS1, provides a characterizing latent training ensemble representation REP_TEBL(t) for each t.
- the training data preparation phase PREP, completed herewith, results in a plurality of characterizing latent training ensemble representations REP_TEBL(t).
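The three substeps TS2.1 to TS2.3 (crop, encode, average) can be sketched as below. This is an illustration under stated assumptions, not the publication's implementation: `toy_encoder` is a hypothetical stand-in for the encoder part of a trained convolutional auto-encoder and returns only three values rather than, say, a 64-parameter vector, and the bounding boxes of the elements are assumed to be known already.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]        # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width) of one element


def crop(image: Image, box: Box) -> Image:
    """TS2.1: cut one training element reproduction out of the ensemble image."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def toy_encoder(patch: Image) -> List[float]:
    """TS2.2 stand-in (assumption): mean, minimum and maximum intensity.

    A real setup would call the encoder of a trained convolutional
    auto-encoder here instead.
    """
    pixels = [value for row in patch for value in row]
    return [fmean(pixels), min(pixels), max(pixels)]


def ensemble_representation(
    image: Image,
    boxes: Sequence[Box],
    encoder: Callable[[Image], List[float]] = toy_encoder,
) -> List[float]:
    """TS2.3: average the latent element vectors component by component."""
    vectors = [encoder(crop(image, box)) for box in boxes]
    return [fmean(component) for component in zip(*vectors)]


# One 4x4 "image" IMA(t) containing two 2x2 elements in its top half.
ima = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5],
]
rep_tebl = ensemble_representation(ima, [(0, 0, 2, 2), (0, 2, 2, 2)])
```

Because the average is taken per component, the ensemble representation has the same length as each element vector regardless of how many elements the image contains.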
- the characterizing latent training ensemble representations REP_TEBL(t) of the training data preparation phase PREP as well as the corresponding, assigned overall classifications CLS(t) from the data base DB are further processed in the subsequent training phase TRAIN to train the network ANN.
- such training of a network ANN includes an iterative process which is conducted until the network ANN delivers the expected and aspired output with a certain accuracy.
- all characterizing latent training ensemble representations REP_TEBL(t) are selected for training.
- ANN-parameters of the network ANN, e.g. weights, which define the behavior of the network ANN in response to an input data set, are iteratively adjusted. For example, such training might follow the principles of supervised or semi-supervised learning in the machine learning domain.
- for a particular training ensemble TEBL(t0), the characterizing latent training ensemble representation REP_TEBL(t0) is used as an input data set and the assigned overall classification CLS(t0) is used as the aspired output OUT of the network ANN when processing that input data set REP_TEBL(t0).
- the network ANN iteratively processes the provided input data set REP_TEBL(t0) and adjusts the parameters of the network ANN in one or more iteration steps.
- in each iteration step j, the current ANN-parameters are applied to the input data set REP_TEBL(t0) to calculate the output OUT(j).
- the output OUT(j) is compared with the aspired and expected output CLS(t0).
- if the deviation is not yet acceptable, the next training loop, i.e. iteration step j+1, is applied.
- the ANN-parameters are varied and a new output OUT(j+1) is calculated based on the same input data set REP_TEBL(t0), but with the varied ANN-parameters.
- the adjustment of the parameters and the iteration steps are continued until the deviation between OUT(j) and CLS(t0) is acceptable, in other words until the output OUT(j) of the network ANN best matches the known overall classification CLS(t0) assigned to the input data set REP_TEBL(t0).
- the same iterative training procedure can be performed for each one of the selected characterizing latent training ensemble representations REP_TEBL(t) and training ensembles TEBL(t), respectively.
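The iterative adjustment described above (compute OUT(j), compare with CLS(t0), vary the parameters, repeat until the deviation is acceptable) can be sketched with a deliberately tiny model. The single sigmoid unit, the gradient-based update rule, the tolerance, and the example vectors are all assumptions chosen to keep the sketch short; the publication does not prescribe a network architecture or an optimizer.

```python
import math
from typing import List, Sequence, Tuple


def forward(weights: Sequence[float], bias: float, rep: Sequence[float]) -> float:
    """Compute the output OUT(j) for one input with the current parameters."""
    z = sum(w * x for w, x in zip(weights, rep)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(
    reps: Sequence[Sequence[float]],  # REP_TEBL(t), one vector per training ensemble
    labels: Sequence[int],            # CLS(t) encoded as 0 or 1 (assumption)
    tolerance: float = 0.05,          # acceptable deviation |OUT(j) - CLS(t)|
    learning_rate: float = 0.5,
    max_steps: int = 10_000,
) -> Tuple[List[float], float]:
    weights: List[float] = [0.0] * len(reps[0])
    bias = 0.0
    for _ in range(max_steps):
        worst = 0.0
        for rep, cls in zip(reps, labels):
            out = forward(weights, bias, rep)
            error = out - cls  # deviation between OUT(j) and CLS(t)
            worst = max(worst, abs(error))
            # Vary the parameters against the deviation (gradient step).
            weights = [w - learning_rate * error * x for w, x in zip(weights, rep)]
            bias -= learning_rate * error
        if worst <= tolerance:  # stop once every deviation is acceptable
            break
    return weights, bias


# Made-up two-parameter representations for four training ensembles.
demo_reps = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
demo_labels = [0, 0, 1, 1]
demo_weights, demo_bias = train(demo_reps, demo_labels)
```

The `max_steps` bound is a safeguard for the sketch: if the training data cannot be separated by such a simple model, the loop ends without reaching the tolerance.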
- the above explanations occasionally referred to the concrete application of the proposed method for Malaria detection.
- the method can be understood as a computer implemented method executed by a computer 110 of a classification system 100, as shown in FIG 11.
- the computer 110 is configured for performing the method according to FIG 2 for classifying an ensemble EBL comprising a plurality of individual elements ELM(i) or, in other words, for determining an overall classification CLS for a multi-element ensemble EBL.
- the computer 110 is configured to apply the trained artificial neural network ANN.
- the system 100 for Malaria detection comprises a camera or microscope 120, respectively, for imaging a sample SAM of a patient.
- Such an image IMA, representing the ensemble reproduction RPR_EBL, is provided to the computer 110 for determining the overall classification as explained above.
- the method is of course not limited to Malaria detection, but can be applied to various methods for analyzing other components of blood or of some other sample, e.g. white blood cells, blood platelets, etc. or even a combination of different elements. In such a case, those components suitable for achieving the aspired analysis and diagnosis represent the "elements" ELM(i) and training elements TELM(t,i), respectively. Thus, it can actually be applied for any kind of component and even for a mixture of components of different kinds.
- the overall classification CLS determined by a classification system 100 can be a status indicator for a process for mass production of a component 201 (due to the high number of components 201, only a few of them have reference signs).
- the system 100 comprises the computer 110 for performing the method according to FIG 2 for classifying an ensemble EBL by applying the trained artificial neural network ANN.
- the system 100 comprises a camera 120 for depicting the produced components 201 and for generating the ensemble reproduction RPR_EBL, respectively.
- the status indicator CLS can represent the status of the mass production process executed by a production facility 200 as such or it might represent a quality indicator, e.g. "faulty" or "faultless", signaling the quality of the produced components 201.
- the components 201 represent the elements ELM(i) and, for example, the plurality of components ELM(i) of a batch forms an ensemble EBL.
- the ensemble reproduction RPR_EBL can be an optical image IMA of the batch EBL, created by a camera 120 of the system 100, depicting the plurality of components ELM(i) .
- Another exemplary embodiment, shown in FIG 13, focuses on analysis of an industrial process executed by a facility 300 during which process parameters, e.g. measurement values of different sensors 301 placed at different locations of the facility or phases of the corresponding industrial process, are collected.
- Different sensors 301 might be configured to measure different parameters, e.g. temperatures and pressures.
- the process might be a production process of a component and the classification CLS to be determined might again be a status or quality indicator of the process.
- the classification system 100 of this embodiment again comprises the computer 110 for performing the method according to FIG 2 for classifying an ensemble EBL by applying the trained artificial neural network ANN.
- the computer 110 can be directly connected to the sensors 301 for receiving the measurement values.
- the measurement values of the sensors 301 might be routed to the computer 110 via a control system (not shown) of the facility 300.
- the computer 110 itself might be the control system of the facility 300 or at least an integral part of the control system.
- individual measurement values of individual ones of the sensors 301 represent the elements ELM(i) and the ensemble EBL is a data structure containing the elements ELM(i) .
- the generation of an element reproduction RPR_ELM(i) from the respective element ELM(i) can be achieved by applying a given function EX to the element ELM(i) .
- the step of deriving is directly fulfilled by collecting the elements ELM(i) .
- the ensemble EBL being a data structure
- the ensemble reproduction RPR_EBL would also be realized as a data structure, containing the element reproductions RPR_ELM(i) .
- the information required for executing the classification steps CS1-CS5 to determine the classification CLS is available in this embodiment as well.
- each individual element ELM(i) contains not only a single measurement value, but a plurality of measurement values of the sensors 301. Therein, all measurement values of a particular element ELM(i) are collected from the same source, e.g. from the same sensor. Alternatively, different measurement values of a particular element ELM(i) might be collected from different sources, e.g. different types of sensors 301, e.g. temperature and pressure, preferably representing the same point in time of the processing of the component. In any case, such elements ELM(i) comprising a plurality of measurement values can be further processed as described above to determine their latent representation etc.
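For the sensor-based embodiment, the ensemble and its reproduction are plain data structures. The sketch below shows one possible layout in which each element holds a temperature and a pressure value from the same point in time. The field names, the scaling used as function EX, and the example values are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SensorElement:
    """ELM(i): measurement values representing the same point in time."""
    temperature_c: float
    pressure_bar: float


def ex(element: SensorElement) -> List[float]:
    """Hypothetical function EX: scale raw values into comparable ranges."""
    return [element.temperature_c / 100.0, element.pressure_bar / 10.0]


def ensemble_reproduction(ensemble: Sequence[SensorElement]) -> List[List[float]]:
    """RPR_EBL as a data structure containing all RPR_ELM(i)."""
    return [ex(element) for element in ensemble]


# Made-up ensemble EBL with two elements.
sensor_ebl = [SensorElement(50.0, 2.5), SensorElement(100.0, 5.0)]
sensor_rpr_ebl = ensemble_reproduction(sensor_ebl)
```

From here on, each entry of `sensor_rpr_ebl` plays the role that a cropped image section plays in the imaging embodiments, so the encoding and aggregation steps can proceed in the same way.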
- FIG 14 shows a setup for performing the training procedure according to FIGs 9 and 10 for training and establishing, respectively, the artificial neural network ANN such that the readily trained network ANN is configured to provide, in regular operation, the overall classification CLS for a multi-element ensemble EBL upon receiving a characterizing latent ensemble representation REP_EBL of such multi-element ensemble EBL as input data set.
- the setup includes a computer 110 which applies and executes the network ANN and which is configured to perform the training data preparation step PREP and the training step TRAIN of the training procedure.
- the setup includes the data base DB which stores and provides both the training ensemble reproductions RPR_TEBL(t) and the respective overall classifications CLS(t) of the corresponding training ensembles TEBL(t).
- the artificial neural network ANN is set up to provide an overall classification CLS upon receipt of an ensemble reproduction REP_ENS.
- the network ANN is configured to be applied for the approach for determining the overall classification CLS for the multi-element ensemble EBL as described above and in connection with FIG 2.
- the ensemble EBL based training and evaluation method proposed herein, which does not focus on an individual element ELM, achieves reliable classification of the ensemble even in cases in which such classification cannot be derived from observation of individual elements.
- the proposed approach will allow correct classification of an ensemble of cells even in cases in which a human observer cannot, or can only with uncertainty, determine whether a patient is infected. This might be the case at an early stage of such infection when an individual smear image and ensemble reproduction, respectively, contains no characteristic cells or only a few of them. In such a situation, a human would not be able to recognize the infection while the network ANN, trained based on ensemble ground truth, detects features which are not perceptible by a human.
- one ensemble EBL comprises the elements ELM(i) depicted in one image IMA.
- the elements of several images form one ensemble.
- the scope of one ensemble is not necessarily defined by the contents of only one image.
- the elements ELM(i) depicted in one image are distributed to form two or even more ensembles.
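The last statements decouple the scope of an ensemble from the image boundaries. A minimal sketch of both directions follows, assuming elements are identified by a made-up pair of image identifier and element index, and assuming a round-robin rule for splitting; the publication does not specify how elements are distributed.

```python
from typing import Dict, List, Tuple

ElementRef = Tuple[str, int]  # (image id, element index); hypothetical identifier


def merge_images(elements_per_image: Dict[str, int]) -> List[ElementRef]:
    """Form one ensemble from the elements of several images."""
    return [
        (image_id, index)
        for image_id, count in elements_per_image.items()
        for index in range(count)
    ]


def split_image(image_id: str, count: int, parts: int) -> List[List[ElementRef]]:
    """Distribute the elements of one image round-robin over several ensembles."""
    ensembles: List[List[ElementRef]] = [[] for _ in range(parts)]
    for index in range(count):
        ensembles[index % parts].append((image_id, index))
    return ensembles


merged_ensemble = merge_images({"IMA_1": 2, "IMA_2": 1})
split_ensembles = split_image("IMA_1", 5, 2)
```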
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Quality & Reliability (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention relates to a method and a corresponding training for determining a classification CLS of an ensemble of a plurality of individual elements. The classification can be a quality or status indicator for an industrial process or a medical diagnosis. The elements can be, for example, sensor measurement values of a plurality of sensors or components produced in a mass production process in the industrial application, or blood cells in the medical application. The classification CLS is determined as an overall classification for the whole ensemble EBL, while the individual classification and state, respectively, of individual elements ELM(i) are not the focus.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2022/051795 WO2023143713A1 (fr) | 2022-01-26 | 2022-01-26 | Détermination d'une classification globale pour un ensemble à éléments multiples |
| EP22708356.5A EP4453891A1 (fr) | 2022-01-26 | 2022-01-26 | Détermination d'une classification globale pour un ensemble à éléments multiples |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2022/051795 WO2023143713A1 (fr) | 2022-01-26 | 2022-01-26 | Détermination d'une classification globale pour un ensemble à éléments multiples |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023143713A1 (fr) | 2023-08-03 |
Family
ID=80682407
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2022/051795 Ceased WO2023143713A1 (fr) | 2022-01-26 | 2022-01-26 | Détermination d'une classification globale pour un ensemble à éléments multiples |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP4453891A1 (fr) |
| WO (1) | WO2023143713A1 (fr) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018140014A1 (fr) * | 2017-01-25 | 2018-08-02 | Athelas, Inc. | Classification d'échantillons biologiques par analyse d'image automatisée |
| WO2021247868A1 (fr) * | 2020-06-03 | 2021-12-09 | Case Western Reserve University | Classification de cellules sanguines |
-
2022
- 2022-01-26 EP EP22708356.5A patent/EP4453891A1/fr active Pending
- 2022-01-26 WO PCT/EP2022/051795 patent/WO2023143713A1/fr not_active Ceased
Non-Patent Citations (1)
| Title |
|---|
| KASSIM YASMIN M ET AL: "Clustering-Based Dual Deep Learning Architecture for Detecting Red Blood Cells in Malaria Diagnostic Smears", IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, IEEE, PISCATAWAY, NJ, USA, vol. 25, no. 5, 29 October 2020 (2020-10-29), pages 1735 - 1746, XP011853832, ISSN: 2168-2194, [retrieved on 20210510], DOI: 10.1109/JBHI.2020.3034863 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4453891A1 (fr) | 2024-10-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Rajalingham et al. | Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks | |
| Novakovic et al. | Evaluation of classification models in machine learning | |
| Alhazmi | [Retracted] Detection of WBC, RBC, and Platelets in Blood Samples Using Deep Learning | |
| Tahir et al. | A methodology of customized dataset for cotton disease detection using deep learning algorithms | |
| Shazia et al. | Automated early diabetic retinopathy detection using a deep hybrid model | |
| WO2023143713A1 (fr) | Détermination d'une classification globale pour un ensemble à éléments multiples | |
| Nazha et al. | How I read an article that uses machine learning methods | |
| Bhatia et al. | A proposed stratification approach for MRI images | |
| de Teresa et al. | Convolutional networks for supervised mining of molecular patterns within cellular context | |
| Bao et al. | Rare heart transplant rejection classification using diffusion-based synthetic image augmentation | |
| Syahda et al. | A Comparative Study of VGG16, MobileNet, and Ensemble Methods on Fundus Image-Based Cataract Detection | |
| da Silva et al. | Detecting and mitigating issues in image-based COVID-19 diagnosis | |
| Rigaux et al. | Quantifying neuronal differentiation using temporal topological persistence | |
| Milošević et al. | Unsupervised deep clustering as a tool for the identification of dark taxa in biomonitoring | |
| Mythili et al. | Deep Learning Based Microscopic Blood Cell Classification For Cancer Detection | |
| Nayaki et al. | Enhancing Heart Disease Prediction with GANs in Clinical Decision Support Systems | |
| Hiatt et al. | Improving Visual Neuroscience Cell Type Classification with Supervised Machine Learning | |
| Cho | Comparative Analysis and Optimization of Model-Method Combinations for Out-of-Distribution Detection in Medical Image Classification | |
| Alhamidi | Detection of Alzheimer's Disease using deep learning algorithm | |
| US20250139955A1 (en) | Method and a system for identification of cervical cancer cells | |
| Hidayat et al. | Comparative Evaluation of CNN Architectures for Skin Cancer Classification. | |
| Celik et al. | Deep-channel: A deep convolution and recurrent neural network for detection of single molecule events | |
| Priya et al. | Deep Learning Architectures for Accurate and Scalable Nitrogen Deficiency Detection in Rice Plants | |
| Arul Jothi et al. | Image Classification Using CNN to Diagnose Diabetic Retinopathy | |
| Taskforce | How I read an article that uses machine learning methods |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 2022708356; Country of ref document: EP; Effective date: 20240725 |
| | NENP | Non-entry into the national phase | Ref country code: DE |