EP4630813A2 - Improved method for a scalable untargeted metabolomics workflow - Google Patents
Improved method for a scalable untargeted metabolomics workflow
- Publication number
- EP4630813A2 (EP23901396.4A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- sample
- features
- samples
- relevant
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01J—ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
- H01J49/00—Particle spectrometers or separator tubes
- H01J49/0027—Methods for using particle spectrometers
- H01J49/0036—Step by step routines describing the handling of the data generated during a measurement
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/40—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
Definitions
- One aspect is a method of non-targeted determination of unique biological metabolites in individual samples of a sample set, each individual sample comprising a composition of chemical constituents, the method comprising: a. analyzing reference features obtained from a reference sample to identify non-relevant reference features; b. filtering the reference features by removing the non-relevant reference features, thereby producing a set of relevant reference features that characterize the unique biological metabolites; and, c. applying the non-relevant and/or relevant reference features to sample features obtained from an individual sample to determine the composition of unique biological metabolites in the individual sample.
- the method may comprise conducting the steps of analyzing, filtering and applying on each individual sample in the sample set, thereby determining the unique biological metabolites in all individuals of the sample set.
- the reference features are updated by re-analyzing the reference sample to obtain updated reference features, and the reference features are replaced with the updated reference features.
- the number of individual samples in the plurality of samples may be less than the entire number of individual samples in the sample set.
- the plurality of samples may comprise at least 2 samples, optionally at least 3 samples, optionally at least 5 samples, optionally at least 10 samples, optionally at least 20 samples, optionally at least 50 samples, optionally at least 100 samples, or optionally at least 150 samples.
- the plurality of samples may comprise at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 75%, at least 90%, or at least 95% of the samples in the sample set.
- the reference sample may comprise an aliquot from each of a plurality of the individual samples.
- the reference features may be obtained by a method comprising subjecting the reference sample to a separation technique and mass spectrometry.
- the separation technique may comprise chromatography, which may comprise gas chromatography or liquid chromatography.
- the non-relevant reference features may characterize artifacts, background chemical constituents and/or non-unique biological metabolites.
- the background chemical constituents may comprise contaminants.
- the non-unique biological metabolites may comprise adducts and/or fragment ions.
- the reference features may comprise reference separation data and reference mass data generated from the reference sample.
- the step of applying may comprise: identifying in the sample features those sample features corresponding to non-relevant reference features and removing those sample features that correspond to the non-relevant reference features to produce a set of relevant sample features; and, using the relevant sample features to determine the unique biological metabolites in the sample.
- the step of applying may comprise: identifying in the sample features those sample features corresponding to relevant reference features to produce a set of relevant sample features; and, using the relevant sample features to determine the unique biological metabolites in the sample.
- the method may comprise conducting the steps of obtaining and applying to each sample in the sample set, thereby determining the composition of the chemical constituents in all individual samples in the sample set.
- the method may comprise identifying one or more of the chemical constituents by comparing the relevant sample features to a library of information comprising features characterizing chemical entities.
- One aspect comprises a method of non-targeted determination of a composition of chemical constituents in individual samples in a sample set, comprising: a.
- One aspect of the disclosure comprises a method of non-targeted determination of a composition of chemical constituents in individual samples in a sample set, comprising: a. using a separation technique and mass spectrometry to produce reference features comprising separation data and mass data from a reference sample; b. collecting and storing the reference features; c.
- the step of using the filtered reference features may comprise: obtaining sample features from an individual sample; identifying sample features that correspond to the filtered reference features to produce a list of relevant sample features that characterize unique chemical constituents, thereby determining the composition of chemical constituents in the sample.
- a separation apparatus for separating the chemical constituents and producing separation-related features characterizing the chemical constituents
- a mass spectrometer for performing mass spectrometry on portions of the separated chemical constituents and producing mass spectrometry-related features characterizing the chemical constituents
- a first module for receiving, collecting and/or storing the separation-related features and/or the mass spectrometry-related features
- a user interface coupled to the module for making the separation-related features and/or the mass spectrometry-related features available in human-accessible form.
- the system may comprise a library of features characterizing chemical entities, the features produced using the separation apparatus and the mass spectrometer, wherein the library of features comprises separation features and mass spectrometry features characterizing identified chemical entities.
- the separation apparatus may comprise an apparatus for performing chromatography, which may comprise liquid chromatography or gas chromatography. Liquid chromatography may comprise HPLC or UPLC.
- the separation apparatus may comprise an electrophoresis apparatus, which may be coupled to the mass spectrometer.
- the separation-related features may comprise peak retention time, peak intensity, and/or peak width.
- the mass-spectrometry-related features may comprise mass or an m/z value.
- FIGS. 2a & 2b show total ion chromatograms.
- FIG. 3 illustrates a pipeline for handling metabolomics data.
- Polar and lipid metabolites are extracted from plasma samples into 96-well plates for LC/MS analysis.
- a pooled sample is prepared for feature detection, MS/MS acquisition, and use as a QC sample.
- Untargeted metabolomics analysis is performed on all samples. After detecting features from the pooled sample, background features and degeneracies are filtered.
- FIGS. 4a & 4b show data pre-processing of a single batch.
- FIGS. 5a-5c show retention time deviations in lipid metabolites.
- FIGS. 6a-6e show a comparison between individual and pooled samples. a) Venn diagram showing the breakdown of features detected in at least one individual sample (red) versus in at least one pooled sample (green). A higher fraction of features missed in the pooled sample had no hits returned when searching the Human Metabolome Database and KEGG. b-c) Histograms showing the fraction of samples where a feature was detected.
- FIGS. 7a-7c show targeted extraction of peak areas versus traditional global peak integration.
- a-b) The pie charts represent the percentage of all measurements that are missing values (grey) when targeted extraction of peak areas with Skyline is performed (a) or when traditional peak integration within XCMS is performed (b). In (a) the missing value percentage is too small to be visible on the pie chart.
- FIGS. 8a-8e illustrate correcting for batch effects in metabolomics data.
- PCA Principal components analysis
- a) Principal components analysis (PCA) of unnormalized lipid metabolic profiles shows strong batch effects. Each dot represents a unique sample. Dots are colored according to their corresponding batch number.
- FIGS. 9a-9e show correcting for batch effects in metabolomics data.
- Normalization score is the change in coefficient of variation (CV) in the research samples (relative to the unnormalized data) divided by the change in CV for the QC samples. A higher score indicates a reduction of technical variation.
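The normalization score described above can be sketched in a few lines. This is only one reading of the definition: it assumes that "change in CV" means the fold change of the median CV (normalized over unnormalized), and the function names `cv` and `normalization_score` are hypothetical, not taken from the disclosure.

```python
import numpy as np

def cv(intensities):
    """Coefficient of variation per compound; rows are samples, columns are compounds."""
    x = np.asarray(intensities, dtype=float)
    return x.std(axis=0, ddof=1) / x.mean(axis=0)

def normalization_score(research_raw, research_norm, qc_raw, qc_norm):
    """Change in research-sample CV divided by change in QC-sample CV.

    ASSUMPTION: "change" is read as the fold change of the median CV
    (normalized / unnormalized); the source does not spell this out.
    """
    research_change = np.median(cv(research_norm)) / np.median(cv(research_raw))
    qc_change = np.median(cv(qc_norm)) / np.median(cv(qc_raw))
    return research_change / qc_change

# Toy data: research CV is unchanged, QC CV is halved by normalization.
raw = [[10, 20], [20, 40], [30, 60]]
qc_after = [[15, 30], [20, 40], [25, 50]]
score = normalization_score(raw, raw, raw, qc_after)
print(score)
```

Under this reading, a score above 1 means the QC samples tightened more than the research samples, which is the behavior one would hope for when only technical variation is being removed.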
- Violin plots showing the CV distribution of all compounds in the QC samples for each evaluated batch-correction algorithm. The lipid metabolite counterpart to these data is shown in FIG. 8.
- FIGS. 10a & 10b show that QC samples enable discrimination of batch and biological effects. Metabolite intensities can change as a function of analysis batch. If batches are biased by biologically different sample groups, then the application of non-QC based batch normalization can remove biological variation in addition to technical variation because the different types of variation cannot be easily distinguished. However, QC samples enable technical shifts caused by batch number (such as DGTS 17:0) to be differentiated from biological differences between batches (such as inosine).
- the plots in (a) show the unnormalized metabolite intensities as a function of run order for QC samples (left) and research samples (right). In (b) the metabolite intensities after random forest correction are shown for QC and research samples for the same compounds.
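To illustrate why QC samples allow technical shifts to be separated from biological differences, the following minimal sketch rescales one compound so that the QC median is the same in every batch. It is a deliberately simple stand-in and not the random forest correction referred to above; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def qc_median_batch_correction(intensity, batch, is_qc):
    """Scale one compound so that QC medians agree across batches.

    NOTE: illustrative stand-in only, not the random forest correction.
    Every batch is assumed to contain at least one QC injection.
    """
    intensity = np.asarray(intensity, dtype=float)
    batch = np.asarray(batch)
    is_qc = np.asarray(is_qc, dtype=bool)
    target = np.median(intensity[is_qc])      # overall QC level
    corrected = intensity.copy()
    for b in np.unique(batch):
        in_batch = batch == b
        qc_level = np.median(intensity[in_batch & is_qc])
        corrected[in_batch] *= target / qc_level
    return corrected

# Batch 2 reads twice as high in the QCs (technical shift); the research
# sample in batch 2 is also biologically higher.
corrected = qc_median_batch_correction(
    intensity=[100, 100, 50, 200, 200, 400],
    batch=[1, 1, 1, 2, 2, 2],
    is_qc=[True, True, False, True, True, False],
)
print(corrected)
```

Because the scaling factor is estimated from the QC injections alone, the twofold technical shift is removed while the remaining fourfold difference between the two research samples is left in place.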
- FIGS. 11a-11d show internal standard variability.
- a-b) Violin plots showing the distribution of coefficients of variation (CV) for the internal standards across all samples within a batch or across all batches (1-22) for both unnormalized (a) and random forest corrected data (b).
- c-d) Principal components analysis (PCA) of internal standard intensities across all samples in the unnormalized (c) and random forest corrected data (d). Each dot is a sample. Samples are colored by batch.
- Violin plots showing the distribution of CVs for the internal standards across all samples for the 14 batch correction methods evaluated in this study.
- FIGS. 12a-12d show that random forest normalization reduces batch effects in QC samples.
- FIG. 15 shows that metabolites associated with geographic location are not reflective of subject age. Principal components analysis (PCA) of normalized metabolic profiles (polar and lipid metabolites) shows no age-dependent pattern in geographically associated metabolites. Each dot represents a unique sample.
- the present disclosure relates to an improved method for conducting a metabolomics workflow. More specifically, the present disclosure relates to a metabolomics workflow that reduces the computational burden required to analyze the abundance of compounds present in a sample, thereby allowing the process to be scaled for analysis of a larger number of samples than is possible with currently available techniques.
- the disclosed process achieves this scalability using a reference sample that represents the chemical complexity of an entire sample set. Analysis of the reference sample allows detection and identification of thousands of features, most of which do not characterize unique biological metabolites, within a single sample.
- the features may then be filtered to remove non-relevant features, which characterize non-relevant chemical constituents, yielding a list of relevant features.
- This list of relevant features may then be used to identify relevant features characterizing unique biological metabolites within each individual sample, thereby reducing the computational burden necessary to analyze the sample set.
- a method of the disclosure may generally be practiced by analyzing a reference sample to obtain biologically relevant reference features and using these relevant reference features to identify relevant features in one or more individual samples in the sample set.
- the relevant features may also be quantitated, thereby producing a metabolomic fingerprint of the biological metabolites in the individual sample. In some aspects, the relevant features and/or the metabolomic fingerprint may be used to identify the individual biological metabolites present in the individual sample.
- the relevant features may be assessed in each individual sample in the sample set, serially or sequentially, to determine the metabolomic fingerprint for many or all the individual samples in the sample set.
- a metabolite refers to one or more metabolites.
- the terms “a”, “an”, “one or more” and “at least one” can be used interchangeably.
- the terms “comprising”, “including” and “having” can be used interchangeably.
- the term “comprising” may be replaced with “consisting” or with “consisting essentially of” in particular aspects, as desired.
- the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements or use of a “negative” limitation.
- One aspect of the disclosure is a method of non-targeted determination of unique biological metabolites in individual samples of a sample set, each individual sample comprising a composition of chemical constituents, the method comprising analyzing reference features obtained from a reference sample to identify non-relevant reference features; filtering the reference features by removing the non-relevant reference features, thereby producing a set of relevant reference features that characterize the unique biological metabolites; and, applying the non-relevant and/or relevant reference features to sample features obtained from an individual sample to determine the composition of unique biological metabolites in the individual sample.
- non-targeted determination means characterizing the chemical composition of a complex sample containing chemical constituents without prior knowledge of one or more chemical constituents in the sample.
- chemical constituents refers to the molecules present within a sample.
- Chemical constituents encompasses any molecule present within a sample including but not limited to, metabolites, including endogenous metabolites, exogenous metabolites, unique metabolites, non-unique metabolites and contaminants.
- the terms “metabolite” and “biological metabolite” may be used interchangeably and refer to the set of small molecules comprising substrates, intermediates, and products of metabolism.
- Metabolites of the disclosure are generally less than about 2000 kDa in size, optionally less than about 1500 kDa in size. Metabolites include both endogenous metabolites (i.e., produced by an individual) and exogenous metabolites (i.e., drugs, environmental toxins, etc.). Examples of endogenous metabolites include, but are not limited to, organic acids, fatty acids, triglycerides, cholesterol, phospholipids, sugars, vitamins, and co-factors. It is understood by those of skill in the art that analysis of a metabolite using a technique such as mass spectrometry results in the production of numerous species of metabolite ions.
- the different species result from such things as fragmentation of the metabolite, incorporation into the metabolite of isotopes such as carbon-13 or nitrogen-15, and binding of an ion species to such things as salts or solvent, to form adducts. This may result in the production of several signals, each signal coming from a different species of the metabolite.
- one single ion species is chosen to represent each metabolite.
- a unique metabolite refers to a select, monoisotopic, single ion species of an intact metabolite.
- the term “unique metabolite” excludes fragmented species of metabolites, metabolites comprising naturally abundant stable isotopes, such as carbon-13 or nitrogen-15, and ion species of the metabolite other than the selected ion species, such as other adducts of the metabolite.
- a non-unique metabolite refers to a fragment of a metabolite, a non-monoisotopic form of the selected metabolite and additional adducts of the selected ion species of the metabolite.
- the term “background” refers to features characterizing contaminants and artifacts.
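As an illustration of how one kind of non-unique metabolite signal could be recognized, the sketch below flags features that look like the carbon-13 (M+1) partner of another, co-eluting feature. It assumes singly charged ions and uses hypothetical tolerances; the intensity rule (the M+1 peak being weaker than the monoisotopic peak) is a heuristic that holds for typical small molecules. None of this is presented as the procedure used in the disclosure.

```python
C13_SHIFT = 1.003355  # Da, mass difference between carbon-13 and carbon-12

def flag_c13_isotopologues(features, rt_tol=0.05, mz_tol=0.005):
    """Return indices of features that look like M+1 carbon-13 peaks.

    features: list of (rt_min, mz, intensity) tuples for singly charged ions.
    rt_tol and mz_tol are hypothetical tolerances chosen for illustration.
    """
    flagged = set()
    for i, (rt_i, mz_i, int_i) in enumerate(features):
        for j, (rt_j, mz_j, int_j) in enumerate(features):
            if i == j:
                continue
            same_rt = abs(rt_i - rt_j) <= rt_tol
            shifted = abs((mz_i - mz_j) - C13_SHIFT) <= mz_tol
            # heuristic: the isotopologue is weaker than its monoisotopic peak
            if same_rt and shifted and int_i < int_j:
                flagged.add(i)
    return sorted(flagged)

features = [
    (5.00, 180.0634, 1.0e6),   # monoisotopic peak
    (5.01, 181.0668, 6.5e4),   # co-eluting, +1.0034 Da, weaker
    (7.20, 181.0668, 2.0e5),   # same m/z but elutes elsewhere
]
print(flag_c13_isotopologues(features))
```

Additional adducts could be handled in the same pairwise way by swapping in the known mass difference between the two ion species.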
- contaminant refers to chemicals present in the sample but that originate from outside of the sample (e.g., from the equipment used to analyze the sample).
- a tube holding a sample may contain plasticizers that may leach into the sample, causing the sample to contain plasticizer.
- Plasticizer in the sample represents a contaminant.
- an “artifact” refers to features that do not characterize real chemicals, but that arise from electronic noise in the mass spectrometer or errors in downstream peak detection software. Methods of identifying contaminants and artifacts are known to those skilled in the art and are also disclosed herein.
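One common way to recognize background features is to compare each feature's signal in the reference sample with its signal in a blank injection. The sketch below keeps only features that clearly exceed the blank; the function name and the threefold threshold are hypothetical choices for illustration, not values from the disclosure.

```python
def remove_background(pooled_intensity, blank_intensity, min_fold=3.0):
    """Return feature ids whose pooled-sample signal exceeds the blank by min_fold.

    pooled_intensity / blank_intensity: dicts mapping feature id -> intensity.
    min_fold is a hypothetical threshold, not a value from the disclosure.
    """
    kept = []
    for feature_id, pooled in pooled_intensity.items():
        blank = blank_intensity.get(feature_id, 0.0)  # not seen in blank -> 0
        if pooled > min_fold * blank:
            kept.append(feature_id)
    return kept

pooled = {"F1": 1000.0, "F2": 500.0, "F3": 200.0}
blank = {"F2": 400.0, "F3": 10.0}
print(remove_background(pooled, blank))
```

Here `F2` is dropped because its signal in the blank is almost as high as in the pooled sample, which is what one would expect of a contaminant such as a plasticizer.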
- the term “feature” refers to the one or more points of data, such as a peak retention time or mass-to-charge (m/z) value, that characterize a chemical constituent.
- a feature obtained using a separation technique such as chromatography (e.g., HPLC) may comprise a retention time (rt).
- a feature obtained using mass spectrometry may comprise a m/z value.
- a feature obtained using a separation technique (e.g., chromatography) and mass spectrometry may comprise both a rt and a m/z value.
- a “relevant feature” is a feature that characterizes a unique metabolite.
- a “non-relevant feature” is a feature that characterizes a non-unique metabolite or a contaminant, or that is an artifact.
- Features (e.g., reference features, sample features, etc.) of the disclosure are obtained from samples (e.g., reference samples, individual samples) of the disclosure using various methods.
- features are obtained by subjecting a sample to a separation technique. Any separation that suitably separates chemical constituents within the sample may be used. Examples of separation techniques include, but are not limited to, chromatography and electrophoresis, including capillary electrophoresis.
- Suitable methods of chromatography include, but are not limited to, gas chromatography (GC), high performance liquid chromatography (HPLC), or ultra-performance liquid chromatography (UPLC).
- features are obtained by subjecting the sample to mass spectrometry. In certain aspects, features are obtained by subjecting the sample to chromatography, such as HPLC or UPLC, and mass spectrometry. Accordingly, in certain aspects, a feature comprises a rt and a m/z value.
- biological sample refers to a sample obtained from an individual, including a sample of biological tissue or fluid origin obtained in vivo or in vitro. Such samples may include, but are not limited to, blood, serum, plasma, urine, cerebrospinal fluid, tears, saliva, sputum, lymph fluids, dialysates, lavage fluids, and fluids derived from organs or tissue.
- sample refers to a biological sample obtained from a single individual.
- sample set refers to a defined plurality of individual samples.
- Examples include, but are not limited to, humans and other primates, including non-human primates such as chimpanzees and other apes and monkey species; farm animals such as cattle, sheep, pigs, seals, goats and horses; domestic mammals such as dogs and cats; laboratory animals including rodents such as mice, rats and guinea pigs; birds, including domestic, wild and game birds such as chickens, turkeys and other gallinaceous birds, ducks, geese, and the like.
- reference samples used in methods of the disclosure should capture the complexity of the chemical constituents present in the sample set.
- any suitable reference sample that sufficiently captures such complexity may be used.
- the reference sample may be a sample designed to represent “normal” human plasma.
- the reference sample may be produced by combining aliquots from a plurality of individual samples in the sample set. Any number of aliquots may be combined although it will be understood that the larger the number of aliquots combined, the more accurate the result.
- the reference sample comprises aliquots from at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, or from 100% of the individual samples in the sample set.
- the reference sample comprises aliquots from at least 2, at least about 5, at least about 10, at least about 25, at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 500, at least about 750, at least about 1000, at least about 500, or at least about 2000 individual samples in the sample set.
- reference features are obtained from the reference sample.
- the reference features may have been obtained from the sample at a point in time significantly prior to (e.g., days or weeks before) the time at which analysis of the reference features is conducted.
- the entities obtaining the reference features from the reference sample and conducting the analysis may be, but need not be, the same entity.
- reference features may be stored, such as in a feature matrix, and retrieved at a later time for analysis.
- obtaining the reference features from the reference sample may be part of the disclosed method so that obtainment and analysis of the features is conducted relatively simultaneously.
- the reference features are obtained by subjecting the reference sample to a separation technique and/or mass spectrometry.
- the reference features are obtained by subjecting the reference sample to a separation technique and mass spectrometry to produce a reference feature comprising a rt and a m/z value.
- Filtering of the reference features may comprise identifying and removing non-relevant reference features to produce a set of relevant reference features (e.g., a feature matrix, feature table, feature list, etc.) that characterize unique biological metabolites.
- the identified non-relevant reference features and/or the relevant reference features may then be used to analyze data obtained from an individual sample to determine the composition of unique biological metabolites in the individual sample.
- Such a method is advantageous as it restricts the computation necessary to identify both the relevant and non-relevant features to the reference sample, thereby reducing the computational burden on the entire sample set of individual samples.
- using relevant reference features to analyze data from an individual sample may comprise applying the relevant reference features to sample features.
- applying the relevant reference features to sample features may comprise comparing the relevant reference features to the sample features, identifying those sample features corresponding to relevant reference features, and using those sample features that correspond to relevant reference features to determine the composition of unique biological metabolites in the individual sample.
- corresponding features are features from two different samples that comprise the same data points (e.g., the same rt, the same m/z value, or the same rt and the same m/z value).
- applying the relevant reference features to sample features may comprise limiting the analysis of data obtained from an individual sample to that data having retention times and/or m/z values of the relevant reference features.
- applying the non-relevant reference features to sample features may comprise ignoring (i.e., excluding from further analysis) sample features corresponding to non-relevant reference features.
- the remaining set of relevant sample features characterize the unique biological metabolites in the sample from which the sample features were obtained and may be used to identify the composition of unique biological metabolites in the individual sample.
- the instant disclosure has discussed applying the non-relevant, and relevant, reference features to sample features from one individual sample. However, it will be understood by those of skill in the art that because the reference sample reflects the complexity of the entire sample set, the aforementioned processes of applying the reference features to sample features may be iterated with each individual sample in the sample set.
- the result of such iteration is the determination of the composition of unique metabolites in every individual sample in the sample set.
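The applying step described above can be sketched as a simple matching routine. The text defines corresponding features as having the same rt and/or m/z; because measured values rarely agree exactly, this sketch assumes matching within a retention time tolerance and a ppm mass tolerance. Both tolerances and both function names are hypothetical.

```python
def corresponds(a, b, rt_tol=0.1, ppm_tol=10.0):
    """True if two (rt_min, mz) features agree within tolerances (assumed values)."""
    rt_ok = abs(a[0] - b[0]) <= rt_tol
    mz_ok = abs(a[1] - b[1]) <= b[1] * ppm_tol * 1e-6
    return rt_ok and mz_ok

def relevant_sample_features(sample, relevant_ref=None, non_relevant_ref=None):
    """Keep sample features that match relevant reference features and/or
    drop those that match non-relevant reference features."""
    kept = []
    for f in sample:
        if relevant_ref is not None and not any(corresponds(f, r) for r in relevant_ref):
            continue
        if non_relevant_ref is not None and any(corresponds(f, r) for r in non_relevant_ref):
            continue
        kept.append(f)
    return kept

sample = [(5.02, 180.0635), (9.40, 279.1592), (3.10, 301.1410)]
relevant = [(5.00, 180.0634), (3.12, 301.1411)]
non_relevant = [(9.41, 279.1591)]

by_inclusion = relevant_sample_features(sample, relevant_ref=relevant)
by_exclusion = relevant_sample_features(sample, non_relevant_ref=non_relevant)
print(by_inclusion, by_exclusion)
```

Iterating over the sample set then amounts to calling `relevant_sample_features` once per individual sample with the same reference lists, so the expensive classification work is done only on the reference sample.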
- measurements may drift due to environmental factors (e.g., temperature, humidity, etc.) affecting the measurement instruments.
- Such shift may cause a change in the features obtained for a particular metabolite, making it difficult to compare and align features across samples analyzed at disparate times.
- iRT indexed retention time
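One way to make retention times comparable across runs is to map each run onto a common indexed scale using anchor compounds measured in every run. The source does not state how its indexed retention times were computed; the sketch below assumes a straight-line fit against hypothetical anchor compounds, and the name `fit_irt` is likewise hypothetical.

```python
import numpy as np

def fit_irt(observed_rt, reference_rt):
    """Fit a line mapping this run's anchor retention times onto the reference scale.

    ASSUMPTION: drift is approximately linear over the run.
    """
    slope, intercept = np.polyfit(observed_rt, reference_rt, deg=1)
    return lambda rt: slope * np.asarray(rt, dtype=float) + intercept

# Hypothetical anchors: this run elutes 5% later plus a 0.2 min offset.
reference_anchor_rt = [1.0, 5.0, 10.0]
observed_anchor_rt = [1.25, 5.45, 10.70]

to_irt = fit_irt(observed_anchor_rt, reference_anchor_rt)
print(to_irt([3.35, 5.45]))
```

Features from different runs can then be compared on the indexed scale rather than on raw retention times.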
- the reference features may be updated by periodically subjecting the reference sample to the separation technique and mass spectrometry, thereby obtaining updated reference features from the reference sample.
- the reference sample is re-analyzed to update the reference features after a plurality of individual samples is analyzed.
- the plurality of individual samples comprises at least 2, at least 3, at least 5, at least 10, at least 20, at least 50, at least 100 or at least 150 samples.
- Samples [0056] Blood samples were collected at participants’ homes in dipotassium ethylenediaminetetraacetic acid (K2-EDTA) collection tubes, which were immediately placed on a frozen gel pack. After shipment to the laboratory via courier or postal service, the samples were then centrifuged to isolate plasma, which was subsequently stored at -80 °C. Pooled samples were prepared from a subset of plasma samples to serve as quality control (QC) samples, and for use in peak list formation and metabolite identification (see Supporting Information). The QC sample was prepared from 58 samples of the first analysis batch and thus subjecting samples to an additional freeze-thaw cycle was avoided.
- Polar metabolite extracts were directly analyzed (without any drying step) via hydrophilic interaction liquid chromatography (HILIC) coupled to HRMS in negative mode. For both LC/MS analyses, samples were randomized. Additionally, LC/MS/MS data were acquired to aid metabolite identification. [0059] Even though positive-mode and negative-mode data for both lipid and polar metabolite extracts were not collected, doing so would be beneficial when resources permit. Blank samples were injected at the beginning and end of each worklist and used for background peak detection/removal. [0060] Representative total ion chromatograms for blank, study, and QC samples for both lipid and polar metabolite extracts are shown in FIGS. 2a & 2b.
- the R and Python scripts used to perform the peak detection analysis are available on GitHub (https://github.com/e-stan/metabolomics_workflow) and include the values of all parameters utilized.
- a peak list for the lipid metabolites was directly generated based on identifications from Lipid Annotator. Any workflow or software can be used to generate peak lists.
- Metabolite identification [0064] Identification of polar metabolites was supported by matching the accurate mass and MS/MS fragmentation data to an in-house MS/MS library created from authentic reference standards and online MS/MS libraries with DecoID software. For online database searching, the top hit for each feature with a dot-product similarity of greater than 80 was considered as the putative identification.
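For readers unfamiliar with dot-product scoring, the sketch below computes a cosine similarity between two MS/MS spectra on a 0 to 100 scale, so that a cutoff such as "greater than 80" can be applied. It is a generic illustration: the fixed-width binning, the bin width, and the function name are assumptions, and it does not reproduce the scoring implemented in DecoID.

```python
import math
from collections import defaultdict

def dot_product_score(spectrum_a, spectrum_b, bin_width=0.01):
    """Cosine similarity (0-100) between two MS/MS spectra.

    Each spectrum is a list of (mz, intensity) pairs.
    bin_width (in m/z units) is an assumed value for illustration.
    """
    def binned(spectrum):
        bins = defaultdict(float)
        for mz, intensity in spectrum:
            bins[round(mz / bin_width)] += intensity
        return bins

    a, b = binned(spectrum_a), binned(spectrum_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 100.0 * dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = [(100.0, 3.0), (200.0, 4.0)]
library_hit = [(100.0, 3.0)]
print(dot_product_score(query, library_hit))
```

A putative identification would then be accepted only when the best library hit scores above the chosen cutoff.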
- MSI identification levels are given in Table S1 and Table S2 for polar and lipid metabolites, respectively. Code and scripts used to perform the automated portion of the metabolite identification workflow are available on GitHub. Lipid iterative MS/MS data were annotated with the Lipid Annotator software (Agilent Technologies), and lipid identifications were provided as sum compositions because insufficient information was available to deduce specific fatty acid compositions. Lipid identifications were subject to the same manual curation as applied to the polar metabolite data. Any workflow or software can be used for compound identification.
- Extracting peak areas [0066] Following the generation of a peak list and metabolite identification, all data files were analyzed in Skyline (version 20.1.0.155) batch by batch to obtain peak areas. The m/z values of the metabolite target lists were used to extract peak areas under consideration of retention times or indexed retention times (iRT) (see Supporting Information). Due to the data being acquired over several months, 14 different batch correction approaches were tested for peak area normalization (see Supporting Information). Additionally, a report containing the acquisition times of all samples was exported to be used for batch correction.
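The idea behind targeted extraction, integrating signal only at the m/z and retention time of a listed target, can be sketched as follows. This is a generic illustration and not Skyline's algorithm; the scan data layout, the ppm tolerance, and the function name are hypothetical.

```python
import numpy as np

def extract_peak_area(scans, target_mz, rt_window, ppm_tol=10.0):
    """Integrate the extracted ion chromatogram of target_mz inside rt_window.

    scans: list of (rt_min, mz_array, intensity_array), sorted by rt.
    rt_window: (start_min, end_min). ppm_tol is an assumed tolerance.
    """
    tol = target_mz * ppm_tol * 1e-6
    rts, signal = [], []
    for rt, mz, intensity in scans:
        if rt_window[0] <= rt <= rt_window[1]:
            mz = np.asarray(mz, dtype=float)
            intensity = np.asarray(intensity, dtype=float)
            rts.append(rt)
            signal.append(intensity[np.abs(mz - target_mz) <= tol].sum())
    if len(rts) < 2:
        return 0.0
    rts, signal = np.array(rts), np.array(signal)
    # trapezoidal rule over the chromatographic trace
    return float(np.sum((signal[1:] + signal[:-1]) * np.diff(rts) / 2.0))

scans = [
    (1.0, [100.0, 200.0], [0.0, 5.0]),
    (1.1, [100.0005, 200.0], [10.0, 5.0]),
    (1.2, [100.0, 200.0], [0.0, 5.0]),
    (3.0, [100.0], [99.0]),            # outside the retention time window
]
area = extract_peak_area(scans, target_mz=100.0, rt_window=(0.9, 1.3))
print(area)
```

Because every target is integrated in every file, a compound with low signal yields a small area rather than a missing value, which is consistent with the low missing-value rate described for targeted extraction in FIGS. 7a-7c.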
- Polar metabolites were separated on a SeQuant® ZIC®-pHILIC column (100 x 2.1 mm, 5 µm, polymer, Merck-Millipore) including a ZIC®-pHILIC guard column (2.1 mm x 20 mm, 5 µm). The use of an inline filter prior to the guard column is recommended.
- the column compartment temperature was maintained at 40 °C and the flow rate was set to 250 µL·min⁻¹.
- the mobile phases consisted of A: 95% water, 5% acetonitrile, 20 mM ammonium bicarbonate, 0.1% ammonium hydroxide solution (25% ammonia in water), 2.5 µM medronic acid, and B: 95% acetonitrile, 5% water, 2.5 µM medronic acid.
- Medronic acid was used in mobile phase B (mainly acetonitrile) in this and previous studies.
- the method may also be practiced by adding 5 µM medronic acid to the aqueous mobile phase A, but no medronic acid in B.
- the m/z range was 50-1700. Data were acquired under continuous reference mass correction at m/z 119.0363 and 966.0007. Samples were randomized prior to analysis. In addition, a quality control (QC) sample was injected after every 12th sample to monitor signal stability of the instrument. The HILIC column was equilibrated with three blank and four QC injections prior to starting the actual samples. Prior to every batch, the mass spectrometer was calibrated and, before starting to inject the actual samples, the TIC, internal standard intensities, and mass accuracy of the QC samples were compared to previous batches. More details on best practices for system suitability testing can be found elsewhere.
- QC quality control
- LC/MS analysis of lipid metabolites [0065] An aliquot of 4 µL of lipid extract was subjected to LC/MS analysis by using an Agilent 1290 Infinity II LC-system coupled to an Agilent 6545 Q-TOF mass spectrometer with a dual Agilent Jet Stream electrospray ionization source. Lipids were separated on an Acquity UPLC® HSS T3 column (2.1 x 150 mm, 1.8 µm) including an Acquity UPLC® HSS T3 VanGuard Pre-Column (2.1 x 5 mm, 1.8 µm) at a temperature of 60 °C and a flow rate of 250 µL·min⁻¹.
- The mobile phases consisted of A: 60% acetonitrile, 40% water, 0.1% formic acid, 10 mM ammonium formate, 2.5 µM medronic acid, and B: 90% 2-propanol, 10% acetonitrile, 0.1% formic acid, 10 mM ammonium formate (dissolved in 1 mL water).
- The following linear gradient was used: 0-2 min, 30% B; 17 min, 75% B; 20 min, 85% B; 23-26 min, 100% B; 26 min, 30% B, followed by a re-equilibration phase of 5 min.
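For readers who want the mobile phase composition at an arbitrary time, the gradient table above can be interpolated as in the following sketch. The breakpoints come from the text. Two things are assumptions: the return to 30% B at 26 min is treated as a step, and the 5 min re-equilibration is held at 30% B. The function name is hypothetical.

```python
# Gradient breakpoints (time in min, %B) as listed in the text
GRADIENT = [
    (0.0, 30.0),
    (2.0, 30.0),
    (17.0, 75.0),
    (20.0, 85.0),
    (23.0, 100.0),
    (26.0, 100.0),
]
REEQUILIBRATION_END = 31.0   # 26 min + 5 min re-equilibration
REEQUILIBRATION_B = 30.0     # assumed constant composition during re-equilibration


def percent_b(t_min):
    """Return %B at time t_min by linear interpolation between breakpoints."""
    if t_min < 0.0 or t_min > REEQUILIBRATION_END:
        raise ValueError("time outside the 0-31 min method window")
    if t_min > GRADIENT[-1][0]:
        return REEQUILIBRATION_B  # assumed step back to starting conditions
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return GRADIENT[-1][1]


if __name__ == "__main__":
    for t in (1.0, 9.5, 18.5, 24.0, 28.0):
        print(t, percent_b(t))
```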
- Lipids were detected in positive ion mode at a scan rate of 2 spectra per second with the following source parameters: gas temperature 250 °C, drying gas flow 11 L·min⁻¹, nebulizer pressure 35 psi, sheath gas temperature 300 °C, sheath gas flow 12 L·min⁻¹, VCap 3000 V, nozzle voltage 500 V, Fragmentor 160 V, Skimmer 65 V, Oct 1 RF Vpp 750 V.
- The m/z range was 50-1700. Data were acquired under continuous reference mass correction at m/z 121.0509 and 922.0890 in positive ion mode. Samples were randomized before analysis. In addition, a QC sample was injected after every 12th sample to monitor signal stability of the instrument.
- The RP column was equilibrated with three blank and three QC injections prior to starting the actual samples. The same system suitability testing described for the polar metabolite analysis was used for the lipid metabolite analysis.
- Preparation of pooled samples [0067] A pool of samples from the first batch was prepared by mixing 330 µL of all samples that contained at least 500 µL (58 samples). Next, 160 aliquots of 110 µL each were frozen at -80 °C, so that two 50 µL pooled QC samples could be included in every batch. This pooled QC not only served as a reference sample across batches, but it was also used for feature detection and metabolite identification.
- Metabolite identifications were supported by the pooled QC samples and 8 additional pooled samples that were prepared by pooling 10 µL aliquots of 10 samples from each of 8 random batches.
- The eight additional pooled samples cover the four different locations and a wide age range.
- Three pooled samples contained only samples from Denmark, whereas the other five were a mix of the three different US locations.
- The average age ranged from 69.1 to 87.9 years, with an average standard deviation of 13.3 years. Across those pooled samples, 52% were from male and 48% from female participants. MS/MS data were acquired in negative mode (see details below) on a total of 9 pooled samples.
- MS/MS spectra for polar and lipid metabolites were acquired by using an iterative data dependent acquisition (iDDA) approach in the MassHunter Acquisition Software (Version 10.1.48, Agilent Technologies) on an Agilent 6545 QTOF. The same source settings as for MS1 data acquisition were used. MS/MS spectra were acquired at a scan rate of 3 spectra/s with different intensity thresholds and collision energies of 10, 20, and 40 V to increase identification rates.
- iDDA can be readily implemented by creating consecutive acquisition methods for the blank and study samples in the iterative workflow. Thus, exclusion lists no longer need to be manually generated after each run. Details on how to achieve iDDA on the Agilent system can be found in an application note.
- MS/MS data for polar metabolites were acquired on an Orbitrap ID-X Tribrid mass spectrometer (Thermo Scientific).
- A Vanquish Horizon UHPLC system was interfaced with the mass spectrometer via electrospray ionization in negative mode with a spray voltage of 2.8 kV.
- RF lens value was 60%.
- Data were acquired in data dependent acquisition (DDA) mode by using the built-in deep scan option (AcquireX) with a mass range of 67-900 m/z and 120K resolution for MS1 scans. MS/MS scans were acquired at 15K resolution from three distinct pooled samples.
- DDA data dependent acquisition
- AcquireX built-in deep scan option
- iRT uses the retention time of selected “indexing” compounds to adjust the retention times of other compounds.
- iRTs were implemented for each lipid class, selecting 2-3 lipid metabolites per class as indexing compounds (see Table 3). In selecting indexing compounds, lipids were chosen that were chromatographically well separated from other species and that had high intensity.
- The first step of the process is to compute the iRT of each lipid in a class. The calculation relies on the retention times of each lipid as measured from a single sample.
- iRT uses the retention times of the first ($RT_{\text{first}}$) and last ($RT_{\text{last}}$) eluting lipids in a class to convert a measured retention time for a compound ($RT_x$) to its corresponding iRT ($\mathrm{iRT}_x$).
- $$\mathrm{iRT}_x = \frac{RT_x - RT_{\text{first}}}{RT_{\text{last}} - RT_{\text{first}}} \quad (1)$$
- The iRTs of the indexing compounds ($\mathrm{iRT}_i$) and the observed retention times for the indexing compounds in the sample are used to calculate a linear regression of iRT to retention time by minimizing the error between observed and fit retention times, according to Equation 2.
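The two iRT steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the iRT is assumed to be a linear rescaling between the first- and last-eluting lipid of a class, with no additional scale factor, and the regression is assumed to be an ordinary least-squares line. All function names and the example lipid names are hypothetical.

```python
def compute_irt(rt, rt_first, rt_last):
    """Rescale a measured retention time to an iRT (assumed form: 0 at first, 1 at last eluter)."""
    return (rt - rt_first) / (rt_last - rt_first)


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_rts(library_irt, index_irts, index_observed_rts):
    """Predict retention times in a new sample from the indexing compounds observed in it."""
    slope, intercept = fit_line(index_irts, index_observed_rts)
    return {name: slope * irt + intercept for name, irt in library_irt.items()}


if __name__ == "__main__":
    # Step 1: iRTs from a single reference sample (retention times in minutes, made-up values)
    reference_rts = {"lipid_a": 10.0, "lipid_b": 12.0, "lipid_c": 14.0}
    library = {
        name: compute_irt(rt, reference_rts["lipid_a"], reference_rts["lipid_c"])
        for name, rt in reference_rts.items()
    }
    # Step 2: indexing compounds lipid_a and lipid_c observed in a later sample
    predicted = predict_rts(library, [library["lipid_a"], library["lipid_c"]], [10.5, 15.0])
    print(predicted)
```

With three or more indexing compounds per class, the same `fit_line` call returns the least-squares line rather than an exact fit through two points.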
- Each metabolite was normalized against the QC samples flanking a given research sample: the metabolite’s intensity in the research sample was multiplied by the mean of the metabolite’s intensity across all QC samples and divided by the mean of its intensity in the flanking QC samples.
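A minimal sketch of this flanking-QC normalization for a single metabolite in a single batch is given below. The data layout (a list of label and intensity pairs in acquisition order), the `"QC"` label, and the function name are assumptions. A sample with a QC on only one side uses that single QC, which is also an assumption.

```python
def flanking_qc_normalize(run, qc_label="QC"):
    """Normalize one metabolite across a batch.

    run: list of (label, intensity) tuples in acquisition order.
    Returns a list of (label, normalized_intensity) for the research samples.
    """
    qc_positions = [i for i, (label, _) in enumerate(run) if label == qc_label]
    if not qc_positions:
        raise ValueError("at least one QC injection is required")
    mean_all_qc = sum(run[i][1] for i in qc_positions) / len(qc_positions)

    normalized = []
    for i, (label, intensity) in enumerate(run):
        if label == qc_label:
            continue
        before = [p for p in qc_positions if p < i]
        after = [p for p in qc_positions if p > i]
        flanking = []
        if before:
            flanking.append(run[before[-1]][1])  # nearest QC injected earlier
        if after:
            flanking.append(run[after[0]][1])    # nearest QC injected later
        mean_flanking = sum(flanking) / len(flanking)
        normalized.append((label, intensity * mean_all_qc / mean_flanking))
    return normalized


if __name__ == "__main__":
    batch = [("QC", 100.0), ("S1", 110.0), ("S2", 120.0),
             ("QC", 140.0), ("S3", 150.0), ("QC", 120.0)]
    for label, value in flanking_qc_normalize(batch):
        print(label, round(value, 2))
```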
- Each batch of samples was QC normalized individually, followed by an inter-batch correction with ComBat.
- For SVR, linear regression, and random forest, the respective model was fit for each metabolite by using the batch and run-order position of the QC samples against the deviation in the metabolite’s intensity from the mean QC intensity.
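The linear-regression variant of this model-based correction might look like the sketch below for one metabolite in one batch. Several choices here are assumptions not stated in the text: only run-order position is used as a predictor (batch is omitted), and the predicted deviation is subtracted from each research sample. The function names are hypothetical.

```python
def fit_qc_drift(qc_positions, qc_intensities):
    """Fit deviation-from-mean-QC as a linear function of run-order position."""
    n = len(qc_positions)
    mean_qc = sum(qc_intensities) / n
    deviations = [v - mean_qc for v in qc_intensities]  # these sum to zero by construction
    mean_pos = sum(qc_positions) / n
    sxx = sum((p - mean_pos) ** 2 for p in qc_positions)
    sxy = sum((p - mean_pos) * d for p, d in zip(qc_positions, deviations))
    slope = sxy / sxx
    intercept = -slope * mean_pos  # mean deviation is zero
    return slope, intercept


def correct_intensity(position, intensity, slope, intercept):
    """Remove the modeled drift from a research-sample intensity (assumed subtractive)."""
    return intensity - (slope * position + intercept)


if __name__ == "__main__":
    slope, intercept = fit_qc_drift([0, 10, 20], [100.0, 110.0, 120.0])
    print(correct_intensity(5, 105.0, slope, intercept))
```

Swapping `fit_qc_drift` for a support vector or random forest regressor would follow the same pattern of fitting on the QC injections and predicting at the research-sample positions.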
- The process reduces the data burden of untargeted metabolomics such that informatics tools typically applied to targeted studies can be leveraged to profile research samples efficiently and rapidly, without the need to subject each sample to computationally intensive analyses (e.g., peak detection, correspondence determination, peak grouping, metabolite identification, etc.).
- A schematic of the workflow is shown in FIG. 3.
- The disclosed workflow was used to analyze a subset of approximately 2,000 human plasma samples from the Long Life Family Study (LLFS), conducted by the National Institute on Aging at the National Institutes of Health (NIH), Bethesda, Maryland, U.S.
- LLFS Long Life Family Study
- Peak areas were extracted from the research and QC samples in a batch-by-batch fashion by using Skyline.
- The Skyline command-line interface can be used to automatically generate a document for each batch.
- Retention times were stable for all samples within a batch (FIGS. 4a & 4b).
- Retention-time bounds were set by inspection of the QC samples within each batch, and these bounds were applied to all samples by importing peak boundaries for each sample.
- Although a Python script was used to generate the peak boundary import file in the current work, this is no longer necessary when using the “Synchronize Integration” function in Skyline 21.2, which was released after the instant data processing was performed.
- The list, which comprised 5,894 total features, included features that were detected in only a single research sample.
- A total of 3,241 features (both identified and unknown) from the list were detected in at least one replicate of the pooled sample (FIG. 6a).
- The m/z values of all features were searched against endogenous metabolites in the Human Metabolome Database and the Kyoto Encyclopedia of Genes and Genomes. In total, 40.4% of the features detected in the pooled sample had at least one hit in the databases.
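An m/z search of this kind is commonly done by comparing each feature against expected ion masses within a ppm tolerance. The sketch below illustrates the idea and how a hit fraction could be computed. The 10 ppm tolerance, the [M-H]⁻ adduct, the three-entry toy database, and the function names are all assumptions, and the example does not reproduce the actual HMDB or KEGG query used in the study.

```python
PROTON_MASS = 1.007276  # Da

# Toy stand-in for a metabolite database: name -> monoisotopic neutral mass (Da)
TOY_DATABASE = {
    "glucose": 180.06339,
    "lactic acid": 90.03169,
    "citric acid": 192.02700,
}


def match_feature(mz, database, ppm_tol=10.0, adduct_shift=-PROTON_MASS):
    """Return (name, ppm_error) for every database entry within ppm_tol of the feature m/z."""
    hits = []
    for name, neutral_mass in database.items():
        expected_mz = neutral_mass + adduct_shift  # assumed [M-H]- ion
        ppm_error = (mz - expected_mz) / expected_mz * 1e6
        if abs(ppm_error) <= ppm_tol:
            hits.append((name, ppm_error))
    return hits


def fraction_with_hits(mz_values, database, ppm_tol=10.0):
    """Fraction of features that have at least one database hit."""
    n_hit = sum(1 for mz in mz_values if match_feature(mz, database, ppm_tol))
    return n_hit / len(mz_values)


if __name__ == "__main__":
    features = [179.0561, 89.0244, 500.0000]
    for mz in features:
        print(mz, [name for name, _ in match_feature(mz, TOY_DATABASE)])
    print(fraction_with_hits(features, TOY_DATABASE))
```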
- Missing values must be removed from the data to facilitate downstream processing.
- Missing values were infrequent (approximately 0.03% of all measurements) and most likely arose from metabolites at concentrations below the limit of detection of the instrument rather than from random metabolite dropout during peak detection.
- Missing values were imputed with the half-minimum approach.
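Half-minimum imputation replaces each missing value with half of the smallest observed value. The sketch below applies it per feature, which is an assumption, since the text does not say whether the minimum is taken per feature or globally. The data layout (a dict of lists with `None` for missing values) and the function name are likewise hypothetical.

```python
def impute_half_minimum(intensities):
    """Fill missing values (None) with half the minimum observed value of each feature.

    intensities: dict mapping feature name -> list of intensities across samples.
    """
    imputed = {}
    for feature, values in intensities.items():
        observed = [v for v in values if v is not None]
        if not observed:
            raise ValueError(f"feature {feature!r} has no observed values")
        fill_value = min(observed) / 2.0
        imputed[feature] = [fill_value if v is None else v for v in values]
    return imputed


if __name__ == "__main__":
    data = {"feature_1": [10.0, None, 4.0], "feature_2": [7.0, 9.0, 8.0]}
    print(impute_half_minimum(data))
```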
- The disclosed method focuses on processing data from a small number of pooled samples that are created by mixing aliquots of individual samples from the study. By initially limiting data processing to only the pooled samples, the disclosed method leverages standard informatics tools in untargeted metabolomics that have been optimized for small sample sizes.
- The pooled samples are intended to capture the totality of unique compounds across the entire study cohort but, in practice, unique compounds from individual samples are missed.
- An analysis of the data shows that unique compounds are most often missed when they are only present in a few participant samples at low concentrations, which causes them to be diluted below the limit of detection during pooling.
- The results indicate that signals missing from the analysis of pooled samples are more likely to originate from rare exogenous compounds (e.g., chemicals from unique hygiene products or specific environments). While rare exogenous compounds may certainly be biologically interesting, their analysis will require the development of new peak detection, alignment, and annotation algorithms that can be scaled to thousands of samples without loss of functionality or accuracy.
- The present disclosure enables high-throughput applications of untargeted metabolomics at the population scale, such as observed in the LLFS.
- The disclosed workflow collects separation and mass spectrometry features for all individual samples in the sample set.
Landscapes
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The present disclosure relates to an improved workflow method for metabolic analyses of a large number of samples in a sample set. The improved method uses a reference sample that represents the chemical complexity of the sample set. Analysis of the reference sample identifies relevant and irrelevant features, which can then be used to focus the analysis of individual samples on specific portions of the data obtained from each individual sample, thereby reducing the computational burden of analyzing all samples in the sample set. Systems for implementing the disclosed methods are also disclosed.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263386087P | 2022-12-05 | 2022-12-05 | |
| PCT/US2023/082308 WO2024123681A2 (fr) | 2022-12-05 | 2023-12-04 | Improved method for scalable untargeted metabolomic workflow |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP4630813A2 (fr) | 2025-10-15 |
Family
ID=91380079
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23901396.4A Pending EP4630813A2 (fr) | 2022-12-05 | 2023-12-04 | Improved method for scalable untargeted metabolomic workflow |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP4630813A2 (fr) |
| AU (1) | AU2023391474A1 (fr) |
| WO (1) | WO2024123681A2 (fr) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120820662B (zh) * | 2025-09-17 | 2025-11-18 | 岛津企业管理(中国)有限公司 | Method for simultaneous analysis and detection of metabolites and lipids in aquatic products |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7243030B2 (en) * | 2002-10-25 | 2007-07-10 | Liposcience, Inc. | Methods, systems and computer programs for deconvolving the spectral contribution of chemical constituents with overlapping signals |
| DK1875401T3 (da) * | 2005-06-30 | 2014-04-07 | Biocrates Life Sciences Ag | Device for quantitative analysis of a metabolite profile |
| EP2270699A1 (fr) * | 2009-07-02 | 2011-01-05 | BIOCRATES Life Sciences AG | Normalization method in metabolomic analysis methods with endogenous reference metabolites |
| JP7273844B2 (ja) * | 2018-04-05 | 2023-05-15 | INESC TEC - Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência | Spectrophotometric method and device for predicting quantified values of components from a sample |
-
2023
- 2023-12-04 AU AU2023391474A patent/AU2023391474A1/en active Pending
- 2023-12-04 EP EP23901396.4A patent/EP4630813A2/fr active Pending
- 2023-12-04 WO PCT/US2023/082308 patent/WO2024123681A2/fr not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024123681A2 (fr) | 2024-06-13 |
| AU2023391474A1 (en) | 2025-06-19 |
| WO2024123681A3 (fr) | 2024-07-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Muthubharathi et al. | Metabolomics: small molecules that matter more | |
| Courant et al. | Basics of mass spectrometry based metabolomics | |
| Naz et al. | Analytical protocols based on LC–MS, GC–MS and CE–MS for nontargeted metabolomics of biological tissues | |
| US8068987B2 (en) | Method and system for profiling biological systems | |
| Lindon et al. | Metabonomics in pharmaceutical R & D | |
| US7653496B2 (en) | Feature selection in mass spectral data | |
| Liu et al. | High-resolution metabolomics assessment of military personnel: evaluating analytical strategies for chemical detection | |
| Wanichthanarak et al. | Accounting for biological variation with linear mixed-effects modelling improves the quality of clinical metabolomics data | |
| Dervilly‐Pinel et al. | Metabolomics in food analysis: application to the control of forbidden substances | |
| Harlina et al. | Possibilities of liquid chromatography mass spectrometry (LC-MS)-based metabolomics and lipidomics in the authentication of meat products: a mini review | |
| Jiang et al. | An integrated metabonomic and proteomic study on Kidney-Yin Deficiency Syndrome patients with diabetes mellitus in China | |
| Southam et al. | Characterization of monophasic solvent-based tissue extractions for the detection of polar metabolites and lipids applying ultrahigh-performance liquid chromatography–mass spectrometry clinical metabolic phenotyping assays | |
| Quesada-Calvo et al. | Comparison of two FFPE preparation methods using label-free shotgun proteomics: Application to tissues of diverticulitis patients | |
| JP2009500617A (ja) | System and method for characterizing a chemical sample | |
| AU2023391474A1 (en) | Improved method for scalable untargeted metabolomic workflow | |
| Shi et al. | MS based foodomics: An edge tool integrated metabolomics and proteomics for food science | |
| Stojiljkovic et al. | Evaluation of horse urine sample preparation methods for metabolomics using LC coupled to HRMS | |
| Al-Salhi et al. | Analytical strategies to profile the internal chemical exposome and the metabolome of human placenta | |
| Deda et al. | GC-MS-based metabolic phenotyping | |
| Wishart et al. | Metabolomics | |
| Liu et al. | Analysis of the lipidomic profile of vegetable oils and animal fats and changes during aging by UPLC-Q-exactive orbitrap mass spectrometry | |
| Çelebier et al. | Recent developments in CE-MS based metabolomics | |
| Lei et al. | SpecLipIDA: a pseudotargeted lipidomics approach for polyunsaturated fatty acids in milk | |
| Sunyer-Caldú et al. | Screening of Biological Samples with HRMS to Evaluate the External Human Chemical Exposome | |
| Gonzalez-Riano et al. | Advanced lipidomics using UHPLC-ESI-QTOF-MS/MS reveals novel lipids in hibernating syrian hamsters |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20250625 |
| | AK | Designated contracting states | Kind code of ref document: A2 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR |