US20030087456A1 - Within-sample variance classification of samples - Google Patents
- Publication number
- US20030087456A1 (application US10/262,692)
- Authority
- US
- United States
- Prior art keywords
- sample
- determining
- variance
- classification
- radiation
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/0059—Measuring for diagnostic purposes; Identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7232—Signal processing specially adapted for physiological signals or for diagnostic purposes involving compression of the physiological signal, e.g. to extend the signal recording period
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Definitions
- The present invention relates to spectral analysis of samples to determine whether the samples are normal or abnormal, or to otherwise classify them. More specifically, the present invention relates to classification of a biological sample on the basis of attenuation of infrared radiation at different wavelengths using a within-sample variance model.
- Infrared spectroscopy is sensitive to the rotational and vibrational energy levels of bonds, functional groups and molecules.
- The spectrum of a tissue sample thus contains information about the biochemical and morphological make-up of the sample. This information can be used to separate cells or tissues into classes according to some descriptive difference, such as cell type or disease status.
- Infrared spectroscopy offers the advantages of rapid, non-destructive, and automated testing using relatively inexpensive and robust equipment, all of which lead to cost-effective measurements.
- A simple univariate measure, such as the peak height of an absorbance band, can be used for classification.
- More sophisticated multivariate techniques, such as principal component analysis, can combine the spectral values at many different wavelengths of light to provide classification ability.
- A classification model, such as linear discriminant analysis, is generated (or trained) from a set of spectral data taken from samples whose class assignments are known from an accurate, “gold standard” reference method.
- The goal of model generation is to find some relationship (defined by the type of algorithm being used) between the spectral data and the known classes. This model is then used to predict the classes of new (test) samples. Comparing the classes predicted by the algorithm with the known classes provides an estimate of the algorithm's accuracy.
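The train, predict, and compare cycle just described can be illustrated with the simplest classifier mentioned above, a threshold on the peak height of a single absorbance band. This is a minimal sketch, not the patent's model: the midpoint-between-class-means rule, the function names, and the numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def train_threshold(heights, labels):
    """Fit a one-variable rule from training samples with known classes.

    heights: peak height of one absorbance band per sample (hypothetical feature)
    labels:  reference ("gold standard") class per sample: "normal" or "abnormal"
    The decision threshold is placed halfway between the two class means.
    """
    normal = [h for h, y in zip(heights, labels) if y == "normal"]
    abnormal = [h for h, y in zip(heights, labels) if y == "abnormal"]
    threshold = (mean(normal) + mean(abnormal)) / 2
    # Record which side of the threshold the abnormal class lies on.
    abnormal_above = mean(abnormal) > mean(normal)
    return threshold, abnormal_above


def predict(model, heights):
    """Assign a class to each new (test) sample."""
    threshold, abnormal_above = model
    return ["abnormal" if (h > threshold) == abnormal_above else "normal"
            for h in heights]


def accuracy(predicted, known):
    """Fraction of test samples whose predicted class matches the known class."""
    return sum(p == k for p, k in zip(predicted, known)) / len(known)


model = train_threshold([0.30, 0.35, 0.32, 0.55, 0.60, 0.58],
                        ["normal"] * 3 + ["abnormal"] * 3)
test_known = ["normal", "abnormal", "normal", "normal"]
test_pred = predict(model, [0.31, 0.57, 0.50, 0.40])
print(test_pred, accuracy(test_pred, test_known))
# ['normal', 'abnormal', 'abnormal', 'normal'] 0.75
```

Here the threshold lands at 0.45, so the third test sample (0.50) is misclassified and the estimated accuracy is 0.75.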
- The present invention comprises systems and methods for classifying a sample utilizing spectral analysis.
- A “sample” refers to what is being classified. For example, a sample can comprise a group of cells from an individual, collected from one or more collection sites and at one or more collection times; a sample can comprise cells from a group of individuals (where the group is to be classified); a sample can comprise extracts from one or more fluids to be classified; or a sample can comprise tissue measured in vivo.
- “Classifying samples” includes determination of any property of the sample, including, as examples, membership in one or more classes, analyte concentration in the sample, and presence or extent of a particular material or property. Variance in response to radiation within a single sample can allow classification of that sample.
- The variance is often discussed herein in terms of variance among regions of a sample, where a “region” refers to a distinguishable determination of the response to radiation. Examples of regions include different spatial portions of a sample, different times at which a response is determined, and different preparation methods applied before determining a response (e.g., a single cell collection event, followed by preparation of subsets of the collected cells in different manners).
- The present invention contemplates both a single treatment of within-sample variance and the combination of multiple treatments of within-sample variance for classification.
- The present invention also contemplates combining classification models, for example, combining a within-sample variance classification with other classification methods.
- A system can comprise means for generating light at a plurality of different wavelengths.
- The system can further comprise means for directing at least a portion of the generated light into a plurality of regions of a sample (e.g., cells in a biological sample).
- Each region has an area of from about 100 μm² to about half the sample area. In a prepared slide, this would include from a fraction of a cell to many cells.
- The system can further comprise means for collecting at least a portion of the infrared light after it has interacted with each region. Means for determining the intensity of the collected infrared light for each region are included, with the intensity determined as a function of wavelength.
- The system can also comprise means for storing a within-sample variance classification model, which contains data indicative of a correct classification of known sample variances.
- A processor means is coupled to the means for determining the measured intensities and to the means for storing the model. The processor means determines the classification of the sample as one of two or more types by use of the within-sample variance classification model and the measured intensities for each region.
- The stored classification model can be of various types related to the variance among the regions.
- One embodiment comprises a sample standard deviation model.
- Other embodiments comprise a sample mean absolute deviation model or a sample median absolute deviation model.
- A biological sample comprising a plurality of cells can be provided.
- The sample presents a substantially monocellular layer, such as a sample prepared by the cytospin cell preparation technique or by Cytyc Corporation's ThinPrep.
- Infrared light at a plurality of different wavelengths is generated.
- The infrared light irradiates a plurality of regions of a biological sample, and an optical characteristic of each region is determined.
- An optical characteristic is a property of how the region interacts with incident radiation, for example absorption, reflection, scattering, transmission, Raman effects, optical path lengths, and combinations thereof.
- An optical characteristic determined at a plurality of different incident radiation properties comprises a sample response spectrum.
- The optical characteristics of at least two of the plurality of regions can be used to classify the sample as one of two or more types, using a within-sample variance classification model. Examples of a within-sample variance classification model include a sample standard deviation model, a sample mean absolute deviation model, and a sample median absolute deviation model. Further, additional models can be applied to the spectral data to improve the accuracy of the classification.
- FIG. 1 is a schematic diagram of an apparatus useful in conducting the classifications contemplated by this invention.
- FIG. 2 is a flow chart of how samples were accepted into a study and how “gold standard” reference values were determined for those accepted samples.
- FIG. 3 is a schematic of model building, model validation, and bundling.
- FIG. 5 shows the AUC performance metric for each of the 229 individual model treatments generated from within-sample spectral standard deviation data.
- FIG. 8 shows the AUC performance metric for each of the 573 individual model treatments generated from within-sample spectral standard deviation data, within-sample spectral mean data and individual cell spectral data.
- FIG. 1 is a schematic representation of an example apparatus according to the present invention.
- A radiation source (9) supplies radiation to a collimating mirror (7).
- The collimated beam travels to beamsplitter (10), which is the beamsplitter of a Michelson interferometer.
- The beam is split into two beams, which travel to the two end mirrors of the interferometer, (12) and (12′).
- Mirror (12) is the fixed mirror and mirror (12′) is the moving mirror of the interferometer.
- The beams then return to beamsplitter (10), where they recombine and exit toward mirror (11).
- Mirror (11) focuses the beam onto aperture (17), the size of which is adjustable.
- The beam then travels to focusing mirror (15), which re-images aperture (17) onto the specimen (23).
- Specimen (23) is mounted on a moving stage so that it can move in a plane perpendicular to the beam axis.
- Plan view (30) is a representation of a specimen conceptually separated into different regions or portions.
- A method for classifying a sample includes providing a sample that can be interrogated over a plurality of regions, for example, a sample comprising a plurality of cells spread over an area of a biological sample.
- The method can further include generating a plurality of different wavelengths of light and irradiating a plurality of regions of the sample with the plurality of different wavelengths. Intensity attenuations due to each region's interaction with the light can be measured to obtain a sample response spectrum comprising intensity information at multiple wavelengths for each of at least two of the plurality of regions.
- The sample can then be classified as one of two or more types from the measured intensity attenuations using a within-sample variance classification model.
- The mean absolute deviation is the average of the absolute values of the data centered by the mean.
- The median absolute deviation is the median of the absolute values of the data centered by the median.
- The statistic referred to as the variance is the mean value of the squares of the data centered by the mean of the data.
- The population variance is as defined above for a population of data values. If a random sample of n data values (X₁, . . .
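The spread statistics defined above can be computed wavelength by wavelength across a sample's regions, turning a stack of region spectra into one within-sample spread spectrum per statistic. This is a minimal NumPy sketch, assuming the region spectra already share a common wavelength grid; the function name is hypothetical, and because the sample-variance definition is cut off in this text, the n − 1 divisor used for the sample standard deviation is the usual convention rather than a quotation of the patent.

```python
import numpy as np


def within_sample_statistics(spectra):
    """Per-wavelength spread of one sample's spectra across its regions.

    spectra: array-like of shape (n_regions, n_wavelengths), one row per region.
    Returns a dict of arrays of shape (n_wavelengths,), one per statistic.
    """
    x = np.asarray(spectra, dtype=float)
    if x.shape[0] < 2:
        raise ValueError("need at least two regions to estimate within-sample spread")
    center_mean = x.mean(axis=0)
    center_median = np.median(x, axis=0)
    return {
        # Sample standard deviation with the n - 1 divisor (assumed convention).
        "std": x.std(axis=0, ddof=1),
        # Mean absolute deviation: average of |x - mean| over regions.
        "mean_abs_dev": np.abs(x - center_mean).mean(axis=0),
        # Median absolute deviation: median of |x - median| over regions.
        "median_abs_dev": np.median(np.abs(x - center_median), axis=0),
        # Population variance: mean of squared deviations from the mean (n divisor).
        "pop_var": ((x - center_mean) ** 2).mean(axis=0),
    }


# Four regions, two wavelengths; the second wavelength is identical in every region.
stats = within_sample_statistics([[1.0, 2.0], [2.0, 2.0], [3.0, 2.0], [6.0, 2.0]])
for name, spectrum in stats.items():
    print(name, spectrum)
```

For the first wavelength (values 1, 2, 3, 6) the mean absolute deviation is 1.5, the median absolute deviation is 1.0, and the population variance is 3.5; every statistic is zero at the second wavelength. Any of these spread spectra could then serve as classifier input in place of a single region's spectrum.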
- MIR: mid-infrared
- NIR: near-infrared
- VIS: visible
- The number of regions of the sample can be selected to obtain a statistically reliable estimate of variation. Generally, more regions lead to more accurate determination of the variances.
- The number of regions can range from 2 to many. As an example, in a cervical cancer screening application, from 10 to 50 regions can be suitable.
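The statement that more regions give more accurate variance estimates can be checked with a small simulation. This sketch rests on an illustrative assumption that real region spectra need not satisfy: the per-region values at one wavelength are independent draws from a normal distribution with a known standard deviation. The function name and parameters are hypothetical.

```python
import numpy as np


def sd_estimate_spread(n_regions, true_sd=1.0, trials=2000, seed=0):
    """Monte Carlo spread of the sample standard deviation computed from n_regions values.

    Illustrative assumption: per-region values are i.i.d. normal with std true_sd.
    Returns the standard deviation of the SD estimates over many simulated samples.
    """
    rng = np.random.default_rng(seed)
    draws = rng.normal(0.0, true_sd, size=(trials, n_regions))
    estimates = draws.std(axis=1, ddof=1)   # one SD estimate per simulated sample
    return float(estimates.std())


for n in (2, 10, 50):
    print(n, round(sd_estimate_spread(n), 3))
```

Under these assumptions the spread shrinks roughly as true_sd / √(2(n − 1)), that is, to about 0.24 of the true value at 10 regions and about 0.10 at 50 regions, which is consistent with the 10 to 50 regions suggested for cervical screening.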
- The area of each region can be large enough to obtain meaningful sample information; as an example, in classifying a sample comprising a plurality of cells, regions larger than one cell (e.g., an area large enough to include a plurality of cells) can be suitable.
- Each region can include from a fraction of a cell to a number of cells conducive to obtaining a statistically reliable estimate of variation. Once the number of cells to be measured is determined, the dimensions of the regions can be determined.
- The regions can have areas from about 100 μm² to about 150 mm².
- The sample can be classified as one of two or more types based on the measured intensity attenuations.
- Table 1 shows some examples of classifications useful in some applications.

TABLE 1

| Classification | Application |
|---|---|
| normal or abnormal | For cancer screening/diagnosis and process monitoring |
| normal, hyperplastic, dysplastic or neoplastic | For cancer screening/diagnosis |
| within normal limits, squamous intraepithelial lesion (high or low grade), or carcinoma in-situ | For cervical cancer screening/diagnosis |
| benign, pre-malignant, malignant | For cancer screening/diagnosis |
| Normal or In Need of Further Review | For cancer screening/diagnosis |
| male or female | For gender screening |
| hemolytic, lipemic or icteric | For serum samples |
| normal, prediabetic, or diabetic | For screening or diagnosis of diabetes |

- HPV: human papillomavirus
- a_true is the actual absorbance spectrum
- T_true is the actual cellular transmission spectrum
- T_cell is the measured cellular transmittance spectrum
- f is the fraction of the aperture area not occupied by the cell
- T_bgd is the measured background spectrum
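The symbols above point to a correction for an aperture that is only partly filled by a cell, but the equation relating them is not reproduced in this excerpt. The sketch below is therefore an assumption, not the patent's formula: it supposes that T_cell and T_bgd are measured on the same intensity scale, that the unoccupied fraction f of the aperture transmits like the background, and that the occupied fraction 1 − f transmits T_true, so that T_cell / T_bgd = f + (1 − f)·T_true and a_true = −log10(T_true). The function name is hypothetical.

```python
import numpy as np


def corrected_absorbance(t_cell, t_bgd, f):
    """Recover T_true and a_true under an ASSUMED linear aperture-mixing model:

        t_cell / t_bgd = f + (1 - f) * t_true

    t_cell, t_bgd: measured spectra on the same wavelength grid and intensity scale
    f: fraction of the aperture area not occupied by the cell, 0 <= f < 1
    """
    if not 0.0 <= f < 1.0:
        raise ValueError("f must satisfy 0 <= f < 1")
    ratio = np.asarray(t_cell, dtype=float) / np.asarray(t_bgd, dtype=float)
    t_true = (ratio - f) / (1.0 - f)
    if np.any(t_true <= 0.0):
        raise ValueError("measured ratio too low for this f; cannot take the logarithm")
    a_true = -np.log10(t_true)
    return t_true, a_true


# Half the aperture empty: a cell with T_true = 0.1 would read as 0.55 overall.
t_true, a_true = corrected_absorbance([0.55], [1.0], f=0.5)
naive = -np.log10(0.55)   # absorbance if the empty area were ignored (about 0.26)
print(t_true, a_true, naive)
```

Under this model, ignoring the unoccupied area would report an absorbance of about 0.26 instead of 1.0, which shows why a correction of this kind matters when cells fill the aperture unevenly.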
- Model Building: The following sections on model building and validation are illustrated in FIG. 3 (up to bundling level 1).
- A linear discriminant analysis (LDA) classification algorithm was used to generate the various multivariate classification models.
- Other classification models can also be suitable, including, as examples, quadratic discriminant analysis (QDA), neural networks, unsupervised classification, classification and regression trees (CART), k-nearest neighbors, and combinations thereof.
- QDA: quadratic discriminant analysis
- CART: classification and regression trees
- The explanatory (predictor) variables were the scores of the spectra, and the dependent variable (class) was the binary normal or abnormal reference value for each sample.
- The LDA algorithm assumes that the distribution of variables within each class is multivariate normal; it estimates the within-class mean value of each variable and the covariance matrix between the different variables from all training samples.
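As a sketch of the LDA step, the code below computes principal component scores with an SVD and then fits a two-class LDA with a shared (pooled) covariance matrix, returning posterior probabilities. The text says only that the predictors were "scores of the spectra"; using PCA scores is an assumption (principal component analysis is mentioned earlier as a multivariate technique), as are the number of components, the function names, and the synthetic data. scikit-learn's `LinearDiscriminantAnalysis.predict_proba` provides the same posterior; plain NumPy is used here to keep the assumptions visible.

```python
import numpy as np


def pca_fit(X, n_components):
    """Mean-center training spectra (rows) and return the mean and PCA loadings."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:n_components].T          # loadings: (n_wavelengths, n_components)


def pca_scores(X, mu, loadings):
    """Project spectra onto the loadings to get the predictor scores."""
    return (X - mu) @ loadings


def lda_fit(scores, y):
    """Estimate class means, the pooled within-class covariance, and class priors."""
    classes = np.unique(y)
    means = np.array([scores[y == c].mean(axis=0) for c in classes])
    resid = np.vstack([scores[y == c] - m for c, m in zip(classes, means)])
    cov = resid.T @ resid / (len(y) - len(classes))     # shared covariance (LDA assumption)
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(cov), priors


def lda_posterior(model, scores):
    """Posterior probability of each class for each sample (rows sum to 1)."""
    classes, means, cov_inv, priors = model
    # Linear discriminant: x' S^-1 m_k - 0.5 m_k' S^-1 m_k + log(prior_k)
    d = (scores @ cov_inv @ means.T
         - 0.5 * np.sum((means @ cov_inv) * means, axis=1)
         + np.log(priors))
    d -= d.max(axis=1, keepdims=True)                   # avoid overflow in exp
    p = np.exp(d)
    return classes, p / p.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
normal = rng.normal(0.0, 0.2, size=(30, 20))            # 30 synthetic spectra, 20 wavelengths
abnormal = rng.normal(0.0, 0.2, size=(30, 20)) + 1.0    # synthetic class difference
X = np.vstack([normal, abnormal])
y = np.array(["normal"] * 30 + ["abnormal"] * 30)

mu, loadings = pca_fit(X, n_components=3)
scores = pca_scores(X, mu, loadings)
classes, post = lda_posterior(lda_fit(scores, y), scores)
pp_normal = post[:, list(classes).index("normal")]       # PP of the normal class
```

For brevity the demo fits and evaluates on the same synthetic samples; in a validation setting, the PCA loadings and LDA parameters would be estimated from training samples only and then applied to the held-out test samples.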
- Model Validation: When predicting the class of a validation (test) sample, we used the scores generated from the within-sample spectral standard deviation as the input to our linear discriminant classifier. The output of our classifier was the posterior probability (PP) that the sample belonged to the normal class.
- PP: posterior probability
- A sample's posterior probability is the classification model's estimate of the probability that the sample in question belongs to a given class. For example, a WNL (within normal limits) PP of 0.9 means that there is a 90% probability that the sample belongs to the class of normal samples. The quantity 1 − PP is therefore the probability that the sample belongs to the abnormal class. Because of the limited number of samples in our study, a bootstrapping algorithm was used to generate a set of 13 PPs for each of the 56 samples (see FIG. 3).
- Table 2 lists the elements varied to produce the different model treatments. We generated 229 of the 256 possible model treatment permutations. Each model treats the data differently, for example by using different spectral regions before data compression; thus, each model should be expected to give different performance values. We purposely chose individual treatments that were expected to give some classification ability, based on various reports in the literature.
- A performance metric (the area under the receiver operating characteristic curve; AUC) was computed for each model treatment.
- AUC: area under the receiver operating characteristic curve
- To compute the AUC, a PP threshold for normal class membership was first established, and samples with a PP above this value were classified as normal. For example, if the threshold was set to 0.2 and the sample PP was 0.23 (23% probability of being normal), the sample's class as predicted by the model was normal. These 56 predicted classes were compared to the true classes, and the fractions of abnormal samples correctly classified (true positive rate) and normal samples misclassified (false positive rate) by the model were computed. These rates were computed as the PP threshold was varied from 0 to 1 in increments of 0.05.
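The threshold sweep described above translates directly into code. In this minimal NumPy sketch, abnormal is the positive class, a sample is called normal when its PP exceeds the threshold, and the area is accumulated with the trapezoidal rule; the trapezoidal integration and the function names are assumptions, since the text does not say how the area was computed.

```python
import numpy as np


def roc_from_pp(pp_normal, is_abnormal, step=0.05):
    """Sweep the normal-class PP threshold from 0 to 1.

    A sample is predicted normal when its PP is above the threshold, otherwise
    abnormal; abnormal is treated as the positive class.
    Returns thresholds, false positive rates and true positive rates.
    """
    pp = np.asarray(pp_normal, dtype=float)
    abnormal = np.asarray(is_abnormal, dtype=bool)
    thresholds = np.round(np.arange(0.0, 1.0 + step / 2, step), 10)
    fpr, tpr = [], []
    for t in thresholds:
        called_abnormal = pp <= t
        tpr.append(called_abnormal[abnormal].mean())    # abnormal samples correctly classified
        fpr.append(called_abnormal[~abnormal].mean())   # normal samples misclassified
    return thresholds, np.array(fpr), np.array(tpr)


def auc(fpr, tpr):
    """Trapezoidal area under the ROC curve, anchored at (0, 0) and (1, 1)."""
    order = np.lexsort((tpr, fpr))                       # sort by FPR, then TPR
    f = np.concatenate(([0.0], fpr[order], [1.0]))
    t = np.concatenate(([0.0], tpr[order], [1.0]))
    return float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2))


# Two normal samples with high PPs and two abnormal samples with low PPs.
thresholds, fpr, tpr = roc_from_pp([0.9, 0.8, 0.1, 0.2], [False, False, True, True])
print(len(thresholds), auc(fpr, tpr))
```

With perfectly separated PPs, as here, every abnormal sample is caught before any normal sample is misclassified, and the AUC is 1.0; a model that gives every sample the same PP traces the diagonal and scores 0.5.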
- FIG. 4 is an example of a Receiver Operating Characteristic Curve (ROC curve) generated from an individual model treatment, which has an AUC of 0.74.
- FIG. 5 shows the individual AUC performance metrics (computed using the median PP for each sample) for each model treatment.
- The AUCs range from less than 0.5 (no classification ability, since an AUC of 0.5 corresponds to chance) to 0.78.
- The current screening method for cervical cancer is a Pap smear followed by visual assessment of cells by a cytotechnologist and a pathologist.
- A classification model is trained using a finite amount of data. Because of this, there will be uncertainty in the model's predictive ability, leading to a decrease in the claimable model accuracy. For example, a test sample whose predicted value is close to the boundary used to determine class membership will have a high degree of uncertainty associated with its predicted class. Bundling models reduces this uncertainty and can therefore allow a higher percentage of samples from the entire population to be predicted with confidence.
- A single classification model may provide acceptable accuracy for one subset (subset 1) of all possible samples but perform poorly for another subset (subset 2). Likewise, another model that emphasizes different spectral features or makes different assumptions about the distribution of classes may perform well on subset 2 but not on subset 1. Combining the outputs of these two models can therefore improve accuracy over the entire sample population.
- Bundling: Bundling of the outputs of multiple models was performed at two levels (as shown in FIG. 3).
- The first bundling level combined the 13 bootstrap results for each sample within each model treatment by simply taking the median PP of each sample. This left one PP for each of the 56 samples in each treatment.
- A performance metric (the area under the receiver operating characteristic curve; AUC) was then computed for each model treatment, because it was used in the second level of bundling.
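The exact bootstrap procedure is not reproduced in this excerpt, so the following is a generic sketch rather than the patent's algorithm: each of 13 replicates fits a model on a resample of the training set drawn with replacement and predicts a normal-class PP for every test sample, and first-level bundling then takes the median over replicates. The `fit` and `predict_pp` callables are hypothetical placeholders for any classifier (for example, an LDA on spectral scores).

```python
import numpy as np


def bootstrap_pps(X_train, y_train, X_test, fit, predict_pp, n_boot=13, seed=0):
    """Return an (n_boot, n_test) array of normal-class PPs, one row per bootstrap model.

    fit(X, y) -> model and predict_pp(model, X) -> PPs are caller-supplied (hypothetical).
    A resample can, by chance, contain only one class; fit must cope with that.
    """
    rng = np.random.default_rng(seed)
    n = len(y_train)
    out = np.empty((n_boot, len(X_test)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # draw n training samples with replacement
        model = fit(X_train[idx], y_train[idx])
        out[b] = predict_pp(model, X_test)
    return out


def first_level_bundle(pps):
    """First bundling level: one PP per sample, the median over bootstrap replicates."""
    return np.median(pps, axis=0)


# Three replicates for two samples: the medians are 0.2 and 0.7.
print(first_level_bundle(np.array([[0.1, 0.9], [0.3, 0.5], [0.2, 0.7]])))
```

The median, rather than the mean, keeps one unstable bootstrap model from pulling a sample's bundled PP across the decision threshold.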
- The second bundling level combined the median PPs (calculated within each model treatment) for each sample across model treatments.
- The 17 models with the highest individual AUC performance metrics were chosen as candidates for bundling (see FIGS. 3 and 5).
- Up to 11 model treatments were bundled as follows. First, a PP data matrix was formed for the 56 samples (rows) and 17 candidate models (columns). The 17 × 17 correlation coefficient matrix of the PP matrix was computed, and the two model treatments with the smallest correlation between their per-sample PPs were chosen for bundling. These two model treatments were removed and the selection process was repeated 5 more times. This yielded from 2 to 12 model treatments to bundle; the remaining description illustrates the 11-treatment bundling case.
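The greedy pair selection can be sketched as follows. This reads "smallest correlation" as the most negative signed coefficient; using absolute values would be an equally plausible reading. The function name is hypothetical, and PP columns must not be constant, since their correlation would be undefined.

```python
import numpy as np


def select_uncorrelated_pairs(pp, n_rounds=6):
    """pp: (n_samples, n_models) matrix of per-sample PPs (e.g. 56 x 17).

    Each round picks the two remaining models whose PP columns have the smallest
    correlation coefficient, records them, and removes them from the pool.
    Returns the chosen model indices in selection order.
    """
    remaining = list(range(pp.shape[1]))
    chosen = []
    for _ in range(n_rounds):
        if len(remaining) < 2:
            break
        r = np.corrcoef(pp[:, remaining], rowvar=False)
        np.fill_diagonal(r, np.inf)                  # ignore self-correlation
        i, j = np.unravel_index(np.argmin(r), r.shape)
        pair = (remaining[i], remaining[j])
        chosen.extend(pair)
        remaining = [m for m in remaining if m not in pair]
    return chosen


# Toy PP matrix: models 0 and 1 agree exactly, model 2 is their mirror image.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
pp = np.column_stack([a, a, -a, b])
chosen = select_uncorrelated_pairs(pp, n_rounds=2)
print(chosen)
```

Six rounds over 17 candidates yield 12 models, matching the 2 to 12 range in the text; the first pair chosen is always a model and its anti-correlated partner.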
- The performance of the 11 bundled models was also evaluated using the AUC metric.
- For each PP threshold, majority voting among the 11 PP values for each sample was used to specify the predicted class. For example, if the threshold was 0.2 and 6 or more of the PPs were greater than 0.2, the sample was classified as normal. As before, the PP threshold was swept from 0 to 1, predicted classes were compared to true classes, true and false positive rates were calculated, and the AUC metric was computed.
- Other combinations of models can also be used. For example, certain models can be accorded greater or lesser weight, perhaps dependent on their performance on certain types of samples, in a voting scheme. Some models can be combined arithmetically, e.g., mean or median, before combination with other models. Patterns in the outputs of the models can also be used to derive the classification. Each vote in a voting scheme can also be weighted by its probability or confidence level. The models can also be combined after evaluation against thresholds.
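Majority voting at one threshold, with the optional per-model weights mentioned above, might be sketched like this. The `weights` parameter and the rule "weighted normal votes exceed half the total weight" are illustrative assumptions; with 11 equal weights the rule reduces to the 6-or-more majority in the text.

```python
import numpy as np


def vote_normal(pp, threshold, weights=None):
    """pp: (n_samples, n_models) PPs for the normal class.

    Each model votes 'normal' for a sample when its PP exceeds the threshold.
    The sample is called normal when the (optionally weighted) normal votes
    exceed half of the total weight.
    """
    pp = np.asarray(pp, dtype=float)
    w = np.ones(pp.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    normal_votes = (pp > threshold).astype(float) @ w
    return normal_votes > w.sum() / 2


# Sample A: 6 of 11 models above 0.2; sample B: only 5 of 11.
pp = np.array([[0.3] * 6 + [0.1] * 5,
               [0.3] * 5 + [0.1] * 6])
print(vote_normal(pp, 0.2))                                # equal weights
print(vote_normal(pp, 0.2, weights=[10] + [1] * 10))       # model 0 counts ten times
```

With equal weights, sample A is called normal and sample B abnormal; giving the first model ten times the weight flips sample B to normal. Sweeping the threshold and computing the true and false positive rates then proceeds exactly as for a single model.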
- The second level encompasses a much broader scope by bundling across model treatments.
- The 17 model treatments with the highest individual AUCs were chosen as candidates for bundling. This down-selection process ensures that the bundling operation begins with data that are useful on their own. However, bundling models that have identical performance on each test sample would not change the accuracy, as all model results would be perfectly correlated. We therefore down-selected further by choosing model treatments whose performances were good, but not identical.
- FIG. 6 shows how the AUC improves with bundling across model treatments.
- The AUCs for a single model treatment (first-level bundling) ranged from 0.54 to 0.79.
- Starting from the first-level bundling results, we chose 165 different combinations of 3 out of the 11 models and computed the AUC for each.
- The 3-model bundling case yielded AUCs ranging from 0.56 to 0.91, a statistically significant improvement over the 11 individual model results.
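As an arithmetic check on the count of three-model bundles: choosing 3 of 11 models gives exactly 165 combinations, whereas choosing 3 of 12 would give 220. The standard library confirms both counts and can enumerate the bundles directly.

```python
from itertools import combinations
from math import comb

print(comb(11, 3))   # 165
print(comb(12, 3))   # 220

# Enumerate every three-model bundle of model indices 0..10.
bundles = list(combinations(range(11), 3))
print(len(bundles))  # 165
```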
- The bundled AUC continued to improve with the number of models bundled.
- FIG. 7 illustrates the ROC curve generated after 11 models were bundled together.
- Within-sample variance classification can also be bundled with other methods. For example, models can be generated using within-sample mean spectra. These models can then be bundled together with the models generated from the within-sample variance (e.g., standard deviation) spectra to improve the classification accuracy over either method.
- FIG. 8 illustrates the individual AUC values for all 573 model treatments.
- The 14 model treatments with the highest individual AUCs were chosen as candidates for bundling.
- The ROC curve for the case of 11 bundled treatments is plotted in FIG. 9; it has an AUC of 0.91.
- In an actual test, the PP threshold would be fixed.
- Sensitivity: the fraction of abnormal samples detected (classified as abnormal).
- Specificity: the fraction of normal samples correctly classified as normal.
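The model-combination schemes described at the start of this section (performance-weighted voting, arithmetic pre-combination by mean or median, confidence-weighted votes, and thresholding before voting) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: each model output is assumed to be a score in [0, 1] where higher means "abnormal", a vote of 1 means abnormal, and the function names and the confidence measure |2p - 1| are hypothetical choices made here.

```python
from statistics import mean, median


def threshold_votes(scores, thresholds):
    """Turn each model's score into a 0/1 vote (1 = abnormal) by comparing it
    with that model's own threshold."""
    return [1 if s >= t else 0 for s, t in zip(scores, thresholds)]


def weighted_vote(votes, weights):
    """Return 1 (abnormal) if the weight behind abnormal votes exceeds half
    the total weight; an exact tie resolves to 0 (normal)."""
    total = sum(weights)
    abnormal = sum(w for v, w in zip(votes, weights) if v == 1)
    return 1 if abnormal > total / 2 else 0


def confidence_weighted_vote(probs):
    """Each model votes for the class it favours (probs are P(abnormal)).
    The vote is weighted by a hypothetical confidence |2p - 1|, which is 0 at
    p = 0.5 and 1 at p = 0 or p = 1."""
    abnormal = sum(abs(2 * p - 1) for p in probs if p >= 0.5)
    normal = sum(abs(2 * p - 1) for p in probs if p < 0.5)
    return 1 if abnormal > normal else 0


def combine_then_vote(group_scores, other_scores, threshold, how=mean):
    """Collapse a subgroup of models into one score with `how` (mean or
    median), then let that score vote alongside the remaining models, all
    with equal weight and a shared threshold."""
    scores = [how(group_scores)] + list(other_scores)
    votes = threshold_votes(scores, [threshold] * len(scores))
    return weighted_vote(votes, [1.0] * len(votes))


if __name__ == "__main__":
    votes = threshold_votes([0.7, 0.4, 0.9], [0.5, 0.5, 0.8])
    print(votes)                                  # [1, 0, 1]
    print(weighted_vote(votes, [1.0, 1.0, 1.0]))  # 1: two of three equal votes
    print(weighted_vote(votes, [1.0, 3.0, 1.0]))  # 0: the heavier dissenter wins
    print(confidence_weighted_vote([0.99, 0.45, 0.48]))  # 1
    print(combine_then_vote([0.6, 0.2, 0.7], [0.3, 0.8], 0.5, how=median))  # 1
```

In the confidence-weighted example, two of the three models lean "normal", but only barely past 0.5, so the single very confident "abnormal" vote carries the decision; a plain majority vote would have returned 0.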
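The down-selection and exhaustive bundling steps can be sketched as follows. The assumptions are all hypothetical: each model treatment is represented by its list of per-sample scores (label 1 = abnormal), AUC is computed with the rank-based (Mann-Whitney) formula, "identical performance" is tested as exact equality of score lists (a correlation cutoff would be a natural relaxation), and a bundle's output is the per-sample average of its members' scores rather than a vote. Choosing 3 of 11 candidates gives C(11, 3) = (11 · 10 · 9)/(3 · 2 · 1) = 165 bundles.

```python
from itertools import combinations


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
    abnormal sample (label 1) outscores a randomly chosen normal sample
    (label 0), with ties counting one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def down_select(model_scores, labels, top_k):
    """Rank models by individual AUC and keep up to top_k of them, skipping any
    model whose per-sample scores exactly match a model already kept."""
    ranked = sorted(model_scores, key=lambda m: auc(model_scores[m], labels), reverse=True)
    kept = []
    for name in ranked:
        if len(kept) == top_k:
            break
        if all(model_scores[name] != model_scores[k] for k in kept):
            kept.append(name)
    return kept


def bundle_aucs(model_scores, labels, candidates, size):
    """AUC of every `size`-model bundle drawn from `candidates`, where a
    bundle's score for each sample is the mean of its members' scores."""
    results = {}
    for combo in combinations(candidates, size):
        bundled = [
            sum(model_scores[m][i] for m in combo) / size
            for i in range(len(labels))
        ]
        results[combo] = auc(bundled, labels)
    return results
```

With `size=3` and 11 candidates, `bundle_aucs` evaluates all 165 bundles, so its output can be inspected for the spread of bundled AUCs the text reports.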
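Bundling models built on within-sample mean spectra with models built on within-sample standard deviation spectra requires both spectra for every sample. The sketch below assumes, hypothetically, that each sample yields at least two replicate spectra on a shared wavelength grid (for example, from different positions within the sample) and that bundling averages the per-sample scores of all models; the function names are illustrative only.

```python
from statistics import mean, stdev


def within_sample_spectra(replicates):
    """Per-wavelength mean and sample standard deviation across replicate
    spectra of ONE sample. `replicates` is a list of spectra (at least two),
    each a list of intensities on the same wavelength grid."""
    columns = list(zip(*replicates))  # one tuple of replicate values per wavelength
    mean_spectrum = [mean(col) for col in columns]
    sd_spectrum = [stdev(col) for col in columns]
    return mean_spectrum, sd_spectrum


def bundle_mean_and_sd(mean_model_scores, sd_model_scores):
    """Bundle models built on mean spectra with models built on SD spectra by
    averaging all of their scores for each sample. Each argument is a list of
    models, and each model is a list of per-sample scores."""
    all_models = list(mean_model_scores) + list(sd_model_scores)
    return [mean(per_sample) for per_sample in zip(*all_models)]
```

The mean spectrum summarizes what a sample looks like on average, while the standard deviation spectrum captures how much it varies internally; keeping them as separate model families before bundling is what lets each contribute independent information.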
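The sensitivity and specificity definitions, and the ROC curves of FIG. 7 and FIG. 9, can be made concrete with a short sketch. It assumes a per-sample score where higher means abnormal (standing in for the thresholded PP value, which this section does not define), calls a sample abnormal when its score is at or above the threshold, and traces the ROC curve by sweeping that threshold over every observed score. Fixing the threshold, as a deployed test would, selects a single point on this curve. All function names are hypothetical.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity = fraction of abnormal samples (label 1) called abnormal;
    specificity = fraction of normal samples (label 0) called normal.
    Assumes both classes are present in `labels`."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def roc_points(scores, labels):
    """Sweep the threshold over every observed score (plus +inf, which calls
    everything normal) and return (1 - specificity, sensitivity) pairs,
    sorted for plotting."""
    thresholds = sorted(set(scores)) + [float("inf")]
    points = []
    for t in thresholds:
        sens, spec = sens_spec(scores, labels, t)
        points.append((1 - spec, sens))
    return sorted(points)


def trapezoid_auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Computed this way, the trapezoid area equals the rank-based AUC in which ties count one half, so either form can be used to summarize a bundle's ROC curve.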
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Heart & Thoracic Surgery (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Surgery (AREA)
- Animal Behavior & Ethology (AREA)
- Biophysics (AREA)
- Pathology (AREA)
- Veterinary Medicine (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physiology (AREA)
- Psychiatry (AREA)
- Signal Processing (AREA)
- Investigating Or Analysing Materials By Optical Means (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US10/262,692 US20030087456A1 (en) | 2001-10-08 | 2002-10-02 | Within-sample variance classification of samples |
| EP02768970A EP1444504A1 (fr) | 2001-10-08 | 2002-10-03 | Classification of samples |
| PCT/US2002/031641 WO2003031954A1 (fr) | 2001-10-08 | 2002-10-03 | Classification of samples |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US32800001P | 2001-10-08 | 2001-10-08 | |
| US10/262,692 US20030087456A1 (en) | 2001-10-08 | 2002-10-02 | Within-sample variance classification of samples |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20030087456A1 | 2003-05-08 |
Family ID: 26949398
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US10/262,692 Abandoned US20030087456A1 (en) | 2001-10-08 | 2002-10-02 | Within-sample variance classification of samples |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20030087456A1 (en) |
| EP (1) | EP1444504A1 (fr) |
| WO (1) | WO2003031954A1 (fr) |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7750299B2 (en) * | 2006-09-06 | 2010-07-06 | Donald Martin Monro | Active biometric spectroscopy |
| US20080161674A1 (en) * | 2006-12-29 | 2008-07-03 | Donald Martin Monro | Active in vivo spectroscopy |
| NL2009015C2 (en) * | 2012-04-10 | 2013-10-15 | Biosparq B V | Method for classification of a sample on the basis of spectral data, method for creating a database and method for using this database, and corresponding computer program, data storage medium and system. |
| JP6976257B2 (ja) | 2016-01-28 | 2021-12-08 | Siemens Healthcare Diagnostics Inc. | Method and apparatus for multi-view characterization |
| EP3408651B1 (fr) | 2016-01-28 | 2024-01-10 | Siemens Healthcare Diagnostics Inc. | Methods and apparatus for detecting an interferent in a specimen |
| CN109459409B (zh) | 2017-09-06 | 2022-03-15 | Yancheng Institute of Technology | KNN-based method for identifying abnormal near-infrared spectra |
| EP4202413B1 (fr) | 2021-12-21 | 2024-01-31 | Bruker Optics GmbH & Co. KG | Computer-implemented method for predicting properties of samples based on spectral measurements, and computer system therefor |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4150360A (en) * | 1975-05-29 | 1979-04-17 | Grumman Aerospace Corporation | Method and apparatus for classifying biological cells |
- 2002-10-02: US US10/262,692, patent US20030087456A1 (en), not active (Abandoned)
- 2002-10-03: WO PCT/US2002/031641, patent WO2003031954A1 (fr), not active (Ceased)
- 2002-10-03: EP EP02768970A, patent EP1444504A1 (fr), not active (Withdrawn)
Patent Citations (26)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US3919530A (en) * | 1974-04-10 | 1975-11-11 | George Chiwo Cheng | Color information leukocytes analysis system |
| US4213036A (en) * | 1977-12-27 | 1980-07-15 | Grumman Aerospace Corporation | Method for classifying biological cells |
| US4250360A (en) * | 1978-01-05 | 1981-02-10 | Svensson Gustav E | Device to automatically activate or deactivate control means |
| US4515165A (en) * | 1980-02-04 | 1985-05-07 | Energy Conversion Devices, Inc. | Apparatus and method for detecting tumors |
| US4495949A (en) * | 1982-07-19 | 1985-01-29 | Spectrascan, Inc. | Transillumination method |
| US5137030A (en) * | 1986-10-01 | 1992-08-11 | Animal House, Inc. | Diagnostic methods |
| US4981138A (en) * | 1988-06-30 | 1991-01-01 | Yale University | Endoscopic fiberoptic fluorescence spectrometer |
| US5036853A (en) * | 1988-08-26 | 1991-08-06 | Polartechnics Ltd. | Physiological probe |
| US4975581A (en) * | 1989-06-21 | 1990-12-04 | University Of New Mexico | Method of and apparatus for determining the similarity of a biological analyte from a model constructed from known biological fluids |
| US4980551A (en) * | 1990-01-05 | 1990-12-25 | National Research Council Canada Conseil National De Recherches Canada | Non-pressure-dependancy infrared absorption spectra recording, sample cell |
| US5038039A (en) * | 1990-01-29 | 1991-08-06 | Cornell Research Foundation, Inc. | Method of detecting the presence of anomalies in biological tissues and cells in natural and cultured form by infrared spectroscopy |
| US5197470A (en) * | 1990-07-16 | 1993-03-30 | Eastman Kodak Company | Near infrared diagnostic method and instrument |
| US5168039A (en) * | 1990-09-28 | 1992-12-01 | The Board Of Trustees Of The University Of Arkansas | Repetitive DNA sequence specific for mycobacterium tuberculosis to be used for the diagnosis of tuberculosis |
| US5261410A (en) * | 1991-02-07 | 1993-11-16 | Alfano Robert R | Method for determining if a tissue is a malignant tumor tissue, a benign tumor tissue, or a normal or benign tissue using Raman spectroscopy |
| US5991028A (en) * | 1991-02-22 | 1999-11-23 | Applied Spectral Imaging Ltd. | Spectral bio-imaging methods for cell classification |
| US5303026A (en) * | 1991-02-26 | 1994-04-12 | The Regents Of The University Of California Los Alamos National Laboratory | Apparatus and method for spectroscopic analysis of scattering media |
| US5293872A (en) * | 1991-04-03 | 1994-03-15 | Alfano Robert R | Method for distinguishing between calcified atherosclerotic tissue and fibrous atherosclerotic tissue or normal cardiovascular tissue using Raman spectroscopy |
| US5433197A (en) * | 1992-09-04 | 1995-07-18 | Stark; Edward W. | Non-invasive glucose measurement method and apparatus |
| US5596992A (en) * | 1993-06-30 | 1997-01-28 | Sandia Corporation | Multivariate classification of infrared spectra of cell and tissue samples |
| US5784162A (en) * | 1993-08-18 | 1998-07-21 | Applied Spectral Imaging Ltd. | Spectral bio-imaging methods for biological research, medical diagnostics and therapy |
| US5539207A (en) * | 1994-07-19 | 1996-07-23 | National Research Council Of Canada | Method of identifying tissue |
| US5616457A (en) * | 1995-02-08 | 1997-04-01 | University Of South Florida | Method and apparatus for the detection and classification of microorganisms in water |
| US5976885A (en) * | 1995-11-13 | 1999-11-02 | Bio-Rad Laboratories, Inc. | Method for the detection of cellular abnormalities using infrared spectroscopic imaging |
| US6031232A (en) * | 1995-11-13 | 2000-02-29 | Bio-Rad Laboratories, Inc. | Method for the detection of malignant and premalignant stages of cervical cancer |
| US6146897A (en) * | 1995-11-13 | 2000-11-14 | Bio-Rad Laboratories | Method for the detection of cellular abnormalities using Fourier transform infrared spectroscopy |
| US5851835A (en) * | 1995-12-18 | 1998-12-22 | Center For Laboratory Technology, Inc. | Multiparameter hematology apparatus and method |
Cited By (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9907504B2 (en) | 2001-11-08 | 2018-03-06 | Optiscan Biomedical Corporation | Analyte monitoring systems and methods |
| US9913604B2 (en) | 2005-02-14 | 2018-03-13 | Optiscan Biomedical Corporation | Analyte detection systems and methods using multiple measurements |
| US10368804B2 (en) * | 2012-08-14 | 2019-08-06 | Nanyang Technological University | Device, system and method for detection of fluid accumulation |
| US20150190091A1 (en) * | 2012-08-14 | 2015-07-09 | Nanyang Technological University | Device, system and method for detection of fluid accumulation |
| US10359416B2 (en) | 2014-02-19 | 2019-07-23 | Roche Diagnostics Operations, Inc. | Method and device for assigning a blood plasma sample |
| US9983192B2 (en) * | 2014-02-19 | 2018-05-29 | Roche Diagnostics Operations, Inc. | Method and device for assigning a blood plasma sample |
| US20160349237A1 (en) * | 2014-02-19 | 2016-12-01 | Roche Diagnostics Operations, Inc. | Method and device for assigning a blood plasma sample |
| US20180045654A1 (en) * | 2015-02-17 | 2018-02-15 | Siemens Healthcare Diagnostics Inc. | Model-based methods and apparatus for classifying an interferent in specimens |
| US11009467B2 (en) * | 2015-02-17 | 2021-05-18 | Siemens Healthcare Diagnostics Inc. | Model-based methods and apparatus for classifying an interferent in specimens |
| US10824959B1 (en) * | 2016-02-16 | 2020-11-03 | Amazon Technologies, Inc. | Explainers for machine learning classifiers |
| US11248954B2 (en) * | 2019-06-27 | 2022-02-15 | Gasmet Technologies Oy | Back-to-back spectrometer arrangement |
| US20230194415A1 (en) * | 2019-10-17 | 2023-06-22 | Evonik Operations Gmbh | Method for predicting a property value of interest of a material |
| US12055480B2 (en) * | 2019-10-17 | 2024-08-06 | Evonik Operations Gmbh | Method for predicting a property value of interest of a material |
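| CN113408291A (zh) | 2021-07-09 | 2021-09-17 | Ping An International Smart City Technology Co., Ltd. | Training method, apparatus, device and storage medium for a Chinese entity recognition model |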
Also Published As
| Publication number | Publication date |
|---|---|
| EP1444504A1 (fr) | 2004-08-11 |
| WO2003031954A1 (fr) | 2003-04-17 |
Similar Documents
| Publication | Title |
|---|---|
| US11986268B2 (en) | System and method for the discrimination of tissues using a fast infrared cancer probe |
| US20030087456A1 (en) | Within-sample variance classification of samples |
| US5991653A (en) | Near-infrared raman spectroscopy for in vitro and in vivo detection of cervical precancers |
| US6493566B1 (en) | Classification system for sex determination and tissue characterization |
| US5596992A (en) | Multivariate classification of infrared spectra of cell and tissue samples |
| US6587702B1 (en) | Classification and characterization of tissue through features related to adipose tissue |
| CN100387969C (zh) | Method and apparatus for tailoring spectral calibration models |
| US6501982B1 (en) | System for the noninvasive estimation of relative age |
| AU775204B2 (en) | Apparatus and method for identification of individuals by near-infrared spectrum |
| JP4216077B2 (ja) | Multi-step method for implementing limited calibration models for non-invasive blood analyte prediction |
| US6385484B2 (en) | Spectroscopic system employing a plurality of data types |
| EP1380015B1 (fr) | Apparatus and method for biometric identification or verification of individuals using optical spectroscopy |
| CN102088906A (zh) | Improved method and apparatus for determining measures of glycation end products or disease state using tissue fluorescence |
| Miljković et al. | Spectral cytopathology: new aspects of data collection, manipulation and confounding effects |
| Ferguson et al. | Infrared micro-spectroscopy coupled with multivariate and machine learning techniques for cancer classification in tissue: a comparison of classification method, performance, and pre-processing technique |
| US20190277755A1 (en) | Device and method for tissue diagnosis in real-time |
| CN116559143A (zh) | Composite Raman spectral data analysis method and system for glucose components in blood |
| CN112716447A (zh) | Oral cancer classification system based on deep learning of Raman detection spectral data |
| CN107303174A (zh) | Internet+ spectral method for clinical diagnosis of tumors |
| US11540722B2 (en) | Etalon mid-infrared probe for spectroscopic tissue discrimination |
| WO2007066589A1 (fr) | Method and apparatus for examining and diagnosing lifestyle-related disease using near-infrared spectroscopy |
| US20250251346A1 (en) | Raman hyperspectroscopy of saliva and machine learning for sjogren's syndrome diagnostics |
| Anto Win Shalini et al. | Enhancing the Diagnostic Evaluation of Thyroid Functionality Using Diffuse Reflectance Spectroscopy and Regression Models |
| Cohen et al. | Real-Time, On-Site, Machine Learning Identification Methodology of Intrinsic Human Cancers Based on Infra-Red Spectral Analysis–Clinical Results |
| del Real Mata et al. | Evaluation of machine learning and deep learning models for the classification of a single extracellular vesicles spectral library |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: INLIGHT SOLUTIONS, INC., NEW MEXICO. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JONES, HOWLAND D. T.;GARDNER, CRAIG M.;HULL, EDWARD L.;AND OTHERS;REEL/FRAME:013478/0683;SIGNING DATES FROM 20020926 TO 20020927 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |