
WO2024195760A1 - Information processing method, information processing device, and computer program - Google Patents

Information processing method, information processing device, and computer program

Info

Publication number
WO2024195760A1
Authority
WO
WIPO (PCT)
Prior art keywords
information processing
mass spectrum
processing method
list
microorganisms
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2024/010463
Other languages
English (en)
Japanese (ja)
Inventor
華奈江 寺本
慎一 岩本
勇地 関口
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shimadzu Corp
National Institute of Advanced Industrial Science and Technology AIST
Original Assignee
Shimadzu Corp
National Institute of Advanced Industrial Science and Technology AIST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shimadzu Corp, National Institute of Advanced Industrial Science and Technology AIST
Priority to JP2025508558A (JPWO2024195760A1)
Publication of WO2024195760A1
Legal status: Pending

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/62 Detectors specially adapted therefor
    • G01N30/72 Mass spectrometers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/88 Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86
    • H ELECTRICITY
    • H01 ELECTRIC ELEMENTS
    • H01J ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00 Particle spectrometers or separator tubes
    • H ELECTRICITY
    • H01 ELECTRIC ELEMENTS
    • H01J ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00 Particle spectrometers or separator tubes
    • H01J49/02 Details
    • H01J49/10 Ion sources; Ion guns
    • H01J49/16 Ion sources; Ion guns using surface ionisation, e.g. field-, thermionic- or photo-emission

Definitions

  • the present invention relates to information processing for identifying microorganisms.
  • Non-Patent Document 2 discloses classification of Propionibacterium acnes at the subtype level by MALDI-MS (Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry) proteotyping.
  • the object of the present invention is to provide a technology that allows even users without specialist knowledge to perform self-calibration.
  • An information processing method is an information processing method for identifying microorganisms, and includes the steps of obtaining a mass spectrum by performing mass analysis on a sample containing the microorganisms, identifying the types of microorganisms assumed to be contained in the sample, obtaining the true m/z of a group of peaks common to the types of microorganisms, and calibrating the mass spectrum so that the m/z of a peak corresponding to the true m/z contained in the mass spectrum matches the true m/z.
  • An information processing device includes one or more processors and a storage device storing a program that, when executed by the one or more processors, causes the one or more processors to implement the information processing method described above.
  • a computer program when executed by one or more processors, causes the one or more processors to perform the information processing method described above.
  • a technique is provided that allows even users without specialist knowledge to perform self-calibration.
  • FIG. 1 is a schematic diagram showing a configuration of an analysis device 1 according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a process performed by a controller 10.
  • FIG. 3 is a flowchart of a subroutine of step S14 in FIG. 2.
  • FIG. 4 is a flowchart of a subroutine of step S18 in FIG. 2.
  • FIG. 5 is a diagram showing an example of a mass spectrum included in a report.
  • FIG. 6 is a diagram showing an example of a created dendrogram.
  • FIG. 7 is a diagram showing an example of a screen showing the results of conducting a biomarker search.
  • FIG. 8 shows an example of a mass spectrum calibrated using an external standard substance.
  • FIG. 9 is a diagram showing an example of the results of identification performed within an allowable error range of 800 ppm using the mass spectrum shown in FIG. 8.
  • FIG. 10 is a diagram showing an example of the results of identification performed within an allowable error range of 200 ppm using the mass spectrum shown in FIG. 8.
  • FIG. 11 is a diagram showing an example of the results of identification performed within an allowable error range of 200 ppm using a mass spectrum calibrated by self-calibration using theoretical m/z values.
  • FIG. 12 is a flowchart showing an example of a process for constructing an m/z DB.
  • FIG. 13 is a flowchart showing an example of a process for determining genome data.
  • FIG. 1 is a schematic diagram showing the configuration of an analysis device 1 according to an embodiment of the present invention.
  • the analytical device 1 is a mass spectrometer for performing mass analysis of substances contained in a sample, such as a MALDI-TOF MS (Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry).
  • the analytical device 1 includes a controller 10 and a measurement unit 20.
  • the controller 10 is an example of an information processing device, and performs mass calibration of the mass spectrum acquired from the measurement unit 20 by "self-calibration." "Self-calibration" will be described later with reference to FIG. 2.
  • the measurement unit 20 ionizes substances in the sample using high voltage.
  • the ionized substances in the sample are shown as ions S in FIG. 1.
  • the movement of ions S in the measurement unit 20 is shown diagrammatically by arrow A1.
  • the measurement unit 20 separates ions S according to their time of flight, which correlates with m/z, and then detects them.
  • the measurement unit 20 includes an ionization unit 21, an ion acceleration unit 22, a mass separation unit 23, and a detection unit 24.
  • the ionization unit 21 ionizes substances in the sample by matrix-assisted laser desorption ionization (MALDI).
  • any soft ionization method such as electrospray ionization (ESI) can be used in addition to MALDI.
  • the analysis device 1 further includes a liquid chromatograph to obtain high separation ability, and is configured to introduce substances in the sample separated by the liquid chromatograph into the ionization unit 21.
  • the ionization unit 21 includes an ion source including a sample plate holder (not shown) that supports a sample plate, and a laser device (not shown) that irradiates laser light onto the sample plate. After the sample is set on the sample plate, a matrix is added to the sample, and the sample is dried. The sample plate is then placed in the sample plate holder in the vacuum vessel of the ionization unit 21.
  • the matrix is, for example, sinapinic acid or α-cyano-4-hydroxycinnamic acid (CHCA).
  • the ionization unit 21 reduces the pressure of the vacuum vessel in which the sample plate is placed, and then irradiates each sample on the sample plate with laser light to sequentially ionize each sample.
  • There are no particular limitations on the type of laser device that irradiates the laser light, as long as it can emit light that is absorbed by the selected matrix.
  • When the matrix contains sinapinic acid or CHCA, for example, an N2 laser (wavelength 337 nm) can be used.
  • the ions S ionized in the ionization unit 21 are extracted by an electric field created by an extraction electrode or the like (not shown), and introduced into the ion acceleration unit 22.
  • the ion acceleration section 22 includes an acceleration electrode 221 and accelerates the introduced ions S.
  • the flow of accelerated ions S is appropriately focused by an ion lens (not shown) and introduced into the mass separation section 23.
  • the mass separation section 23 includes a flight tube 231. Two or more types of ions S are separated due to differences in flight time when flying inside the flight tube 231. Although a linear type flight tube 231 is shown in FIG. 1, the flight tube 231 may be of other types, such as a reflectron type or a multi-turn type. In addition, there are no particular limitations on the method of mass analysis, as long as it is possible to separate and detect the ions S contained in the sample.
  • the detection unit 24 includes an ion detector such as a multichannel plate, detects the ions S separated by the mass separation unit 23, and outputs a detection signal with an intensity corresponding to the number of ions incident on the detection unit 24.
  • the detection signal output from the detection unit 24 is input to the processing unit 11.
  • In FIG. 1, the flow of the detection signal of the ions S from the detection unit 24 is shown diagrammatically by the arrow A2.
  • the controller 10 includes a processing unit 11 , a storage device 12 , and an input/output unit 13 .
  • the processing unit 11 may include a processor such as a CPU (Central Processing Unit).
  • the processing unit 11 functions as a main unit for controlling the analysis device 1.
  • various types of processing may be realized by the processor executing a program.
  • the program is stored in a non-transitory manner in the storage device 12 (or a recording medium accessible by the processor).
  • the processing unit 11 includes a device control unit 111, a mass spectrum creation unit 112, a mass spectrum analysis unit 113, and a calibration unit 114.
  • the device control unit 111 controls the operation of the measurement unit 20 based on the analysis conditions input from the input unit 131, which will be described later.
  • the control of the measurement unit 20 by the device control unit 111 is shown diagrammatically by the arrow A3.
  • the mass spectrum creation unit 112 acquires measurement data including the amount of ions detected by the detection unit 24 and the flight time of the ions. The mass spectrum creation unit 112 then uses the measurement data to convert the flight time into an m/z value and creates a mass spectrum indicating the amount of ions detected corresponding to each m/z value.
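The conversion from flight time to m/z performed by the mass spectrum creation unit 112 can be illustrated with a minimal sketch. The source does not state the conversion formula; the sketch assumes the common linear TOF relation t = t0 + k·sqrt(m/z), with the constants k and t0 determined from two reference ions. The function names and all numeric values are hypothetical.

```python
import math

def fit_tof_constants(t1, mz1, t2, mz2):
    """Solve t = t0 + k * sqrt(m/z) for k and t0 from two reference ions.

    Assumed model (not given in the source): ideal linear time-of-flight.
    """
    k = (t2 - t1) / (math.sqrt(mz2) - math.sqrt(mz1))
    t0 = t1 - k * math.sqrt(mz1)
    return k, t0

def tof_to_mz(t, k, t0):
    """Convert a flight time to m/z under the assumed model."""
    return ((t - t0) / k) ** 2

# Hypothetical reference ions: (flight time in microseconds, known m/z).
k, t0 = fit_tof_constants(30.0, 5000.0, 42.0, 10000.0)

# Hypothetical measurement data: (flight time, detected ion amount).
measurement = [(30.0, 120.0), (36.5, 80.0), (42.0, 200.0)]

# Mass spectrum: ion amount indexed by m/z instead of flight time.
spectrum = [(tof_to_mz(t, k, t0), amount) for t, amount in measurement]
```

A spectrum built this way carries whatever error is present in k and t0, which is the kind of error the calibration described later is intended to reduce.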
  • the mass spectrum analysis unit 113 detects peaks in the mass spectrum and identifies the m/z that corresponds to the detected peak.
  • the mass spectrum analysis unit 113 uses a protein database to determine the substance to which the identified m/z corresponds. This identifies the substance contained in the sample.
  • the mass spectrum analysis unit 113 may further determine whether or not the sample contains a specific substance based on the m/z identified as described above, calculate the concentration of the specific substance in the sample, or identify living organisms (which may be microorganisms) contained in the sample.
  • the mass spectrum analysis unit 113 may also perform structural analysis of the substances contained in the sample.
  • the calibration unit 114 calibrates the mass spectrum by correcting the m/z of each peak obtained from the mass spectrum as described above to a highly accurate measured m/z or calculated m/z.
  • the storage device 12 is composed of a storage device that stores information in a non-volatile manner, such as a hard disk.
  • the input/output unit 13 is an interface through which the analysis device 1 inputs and outputs information between itself and an external device, and includes an input unit 131, an output unit 132, and a communication unit 133.
  • the input unit 131 is realized as an input device (for example, a mouse, a keyboard, various buttons, and/or a touch panel).
  • the output unit 132 is realized as an output device (for example, a display device such as an LCD monitor, a printer).
  • the communication unit 133 is realized as a communication device (for example, a communication interface for communicating with other devices via a network or directly).
  • controller 10 may be realized by an information processing device that is physically separate from the measurement unit 20.
  • Processing flow: Fig. 2 is a flowchart of the process performed by the controller 10.
  • the process of Fig. 2 is realized in the controller 10 by one or more processors executing a given program.
  • the controller 10 starts the process of Fig. 2 when it determines that the timing to start the analysis of the sample has arrived.
  • the timing to start the analysis of the sample arrives when the preparations for the analysis of the sample are complete (a sample to be analyzed is prepared by mixing a standard substance with a sample derived from a microorganism, the sample to be analyzed is set on a sample plate, a matrix is added to the sample, the sample is dried, and then the sample plate is placed in a sample plate holder in a vacuum vessel of the ionization unit 21).
  • In step S10, the controller 10 acquires a mass spectrum created based on the results of sample detection by the measurement unit 20 (detection unit 24).
  • In step S12, the controller 10 uses the mass spectrum acquired in step S10 to identify the sample (or the microorganisms contained therein).
  • sample identification is also performed in step S18, which will be described later.
  • the identification in step S12 is referred to as "primary identification," and the identification in step S18 is referred to as "re-identification."
  • In step S12, the controller 10 identifies the microorganisms contained in the sample by comparing the m/z values of one or more peaks in the mass spectrum acquired in step S10 with data in a library stored in the storage device 12.
  • the data in the library includes the m/z values of each peak of one or more microorganisms.
  • the data in the library used in step S12 can be understood as a theoretical value database (hereinafter also referred to as the "theoretical value DB").
  • the theoretical value DB is a list of the m/z values of proteins estimated from genome base sequence information. Since it is difficult to predict the expression level and ionization efficiency of proteins from genome base sequence information, the theoretical value DB does not contain intensity information.
  • Step S12 is an example of a step for identifying the type of microorganisms assumed to be contained in the sample.
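The comparison of peak m/z values with the theoretical value DB can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify a scoring rule, so the sketch simply counts how many theoretical m/z values have an observed peak within a ppm tolerance, and it uses no intensity information, consistent with the theoretical value DB. The taxon names, m/z values, and function names are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Signed deviation of an observed m/z from a theoretical m/z, in ppm."""
    return (observed - theoretical) / theoretical * 1e6

def count_matches(peaks, theoretical_mz, tol_ppm):
    """Count theoretical m/z values that have an observed peak within tol_ppm."""
    return sum(
        1
        for ref in theoretical_mz
        if any(abs(ppm_error(p, ref)) <= tol_ppm for p in peaks)
    )

def identify(peaks, library, tol_ppm):
    """Return the best-scoring taxon and all scores (assumed scoring rule)."""
    scores = {
        taxon: count_matches(peaks, mz_list, tol_ppm)
        for taxon, mz_list in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical theoretical value DB: m/z lists only, no intensities.
library = {
    "taxon A": [5000.0, 7500.0, 10000.0],
    "taxon B": [5200.0, 7500.0, 9800.0],
}
# Hypothetical observed peaks, each about 400 ppm above a taxon A value.
peaks = [5002.0, 7503.0, 10004.0]

best, scores = identify(peaks, library, tol_ppm=800)
```

With a wide tolerance such as 800 ppm, uncalibrated peaks still match, which is why the primary identification can run before calibration.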
  • In step S14, the controller 10 calls up a reference dataset from a database (DB) stored in the storage device 12.
  • the reference dataset includes theoretical m/z values of proteins that constitute microorganisms.
  • the database includes multiple reference datasets.
  • In step S14, the controller 10 calls up a reference dataset that is selected based on the results of the primary identification. Note that step S14 is an example of a step of acquiring the true m/z of a group of peaks common to the types of microorganisms.
  • FIG. 3 is a flowchart of a subroutine of step S14 in FIG. 2. As shown in FIG. 3, in step S140, the controller 10 reads out the result of the primary identification in step S12.
  • In step S142, the controller 10 determines, based on the results read out in step S140, whether or not the microorganisms contained in the sample were identified to the species level in the primary identification. If the controller 10 determines that the microorganisms were identified to the species level (YES in step S142), the controller 10 advances control to step S144; if not (NO in step S142), the controller 10 advances control to step S146.
  • In step S144, the controller 10 calls up, from the calibration Species reference DB, the species dataset that corresponds to the species identified in the primary identification of the microorganisms contained in the sample.
  • the called species dataset corresponds to the reference dataset.
  • the "species dataset" includes theoretical values of masses of one or more types of proteins common to the species. The controller 10 then returns control to FIG. 2. An example of a method for generating a "species dataset" will be described later as generation of a reference dataset, with reference to FIGS. 12 and 13.
  • In step S146, the controller 10 calls up, from the calibration Genus reference DB, the genus dataset that corresponds to the genus identified in the primary identification of the microorganisms contained in the sample.
  • the called genus dataset corresponds to the reference dataset.
  • the "genus dataset" includes theoretical values of masses of one or more proteins common to the genus.
  • the controller 10 then returns control to FIG. 2.
  • An example of a method for generating a "genus dataset" will be described later as generation of a reference dataset, with reference to FIGS. 12 and 13.
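The branch between steps S144 and S146 can be sketched as a simple lookup. The data layout, the names PrimaryResult and select_reference_dataset, and the placeholder taxon names and m/z values are all hypothetical; the source only states that a species dataset is called when the primary identification reached the species level and a genus dataset otherwise.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PrimaryResult:
    """Result of the primary identification (hypothetical structure)."""
    genus: str
    species: Optional[str] = None  # None when only the genus was identified

def select_reference_dataset(result, species_db, genus_db):
    """Return the reference m/z list matching the primary identification."""
    if result.species is not None:
        # YES in step S142 -> step S144: species dataset.
        return species_db[(result.genus, result.species)]
    # NO in step S142 -> step S146: genus dataset.
    return genus_db[result.genus]

# Placeholder databases; the m/z values are illustrative, not real data.
species_db = {("GenusX", "speciesY"): [5000.0, 7500.0, 10000.0]}
genus_db = {"GenusX": [5000.0, 10000.0]}

by_species = select_reference_dataset(
    PrimaryResult("GenusX", "speciesY"), species_db, genus_db
)
by_genus = select_reference_dataset(PrimaryResult("GenusX"), species_db, genus_db)
```

In this sketch the genus dataset is shorter than the species dataset, reflecting the assumption that fewer m/z values are shared across a whole genus than within one species.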
  • In step S16, the controller 10 calibrates the mass spectrum acquired in step S10 using the data set called in step S14.
  • the waveform of the mass spectrum is modified so that the m/z of each peak used for calibration matches the corresponding m/z of the reference data set.
  • the m/z of peaks other than those used for calibration is corrected by interpolation, extrapolation, or the like.
  • Step S16 is an example of a step of calibrating the mass spectrum so that the m/z of the peak corresponding to the true m/z contained in the mass spectrum of the sample matches the true m/z.
  • the calibration in step S16 is a self-calibration that uses the theoretical m/z value of the mass spectrum.
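A minimal sketch of such a calibration is given below. It assumes one possible realization: each reference m/z is paired with the nearest observed peak within a ppm tolerance, and every other m/z is corrected by piecewise linear interpolation between these anchor pairs, with linear extrapolation beyond the outermost anchors. The source does not prescribe this particular method; the function names and values are hypothetical.

```python
from bisect import bisect_right

def match_anchors(peaks, reference_mz, tol_ppm):
    """Pair each reference m/z with the nearest observed peak within tol_ppm.

    Assumes the reference values are well separated, so that no two of them
    pair with the same observed peak.
    """
    anchors = []
    for ref in reference_mz:
        nearest = min(peaks, key=lambda p: abs(p - ref))
        if abs(nearest - ref) / ref * 1e6 <= tol_ppm:
            anchors.append((nearest, ref))  # (observed m/z, true m/z)
    return sorted(anchors)

def calibrate(mz, anchors):
    """Map an observed m/z onto the calibrated axis (piecewise linear)."""
    if len(anchors) < 2:
        raise ValueError("at least two anchor peaks are required")
    observed = [a[0] for a in anchors]
    # Pick the enclosing segment; clamp to the end segments to extrapolate.
    i = min(max(bisect_right(observed, mz) - 1, 0), len(anchors) - 2)
    (x0, y0), (x1, y1) = anchors[i], anchors[i + 1]
    return y0 + (mz - x0) * (y1 - y0) / (x1 - x0)

# Hypothetical uncalibrated peaks and reference data set.
peaks = [5002.0, 7503.0, 10004.0]
reference = [5000.0, 10000.0]

anchors = match_anchors(peaks, reference, tol_ppm=800)
calibrated = [calibrate(p, anchors) for p in peaks]
```

After calibration the anchor peaks coincide with the reference values, and the peak between them is shifted by the interpolated correction.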
  • the m/z to be used in self-calibration may be selected from the data sets called in step S14.
  • One example of the selection criteria is whether the protein (or peak) expected to be contained in the sample is a ribosomal protein. If the protein expected to be contained in the sample is a ribosomal protein, a data set having an m/z corresponding to the ribosomal protein is selected.
  • When used for self-calibration, ribosomal proteins have at least the following advantages, as described in Journal of the Mass Spectrometry Society of Japan 55(3): 209-216 (2007) <URL: https://www.jstage.jst.go.jp/article/massspec/55/3/55_3_209/_pdf/-char/ja>.
  • Another example of a selection criterion is whether or not there are multiple m/z within a few hundred ppm in the recalled data set. If there are multiple m/z within a few hundred ppm in the recalled data set, selecting peaks corresponding to the multiple m/z is avoided.
  • In step S16, all of the m/z contained in the read reference data set may be used for calibration, or the peaks used for calibration may be selected based on the intensity of the mass spectrum. That is, peaks with an intensity equal to or greater than a given value may be used for calibration. In one example, an m/z whose mass spectrum signal intensity exceeds a given threshold is selected as the target for taking the difference. In another example, an m/z whose mass spectrum does not have any other m/z peaks within several hundred ppm is selected as the target for taking the difference. In this case, increasing the number of m/z used for calibration (m/z from which the difference is taken) leads to improved accuracy.
  • the mass spectrum acquired in step S10 may be calibrated for each m/z.
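Two of the selection criteria above, isolation from neighbouring m/z values and a minimum signal intensity, can be sketched as simple filters. The thresholds, function names, and data are hypothetical, and the source does not say how the criteria are combined; here they are applied one after the other.

```python
def isolated(reference_mz, min_gap_ppm):
    """Keep reference m/z values with no other reference value within min_gap_ppm."""
    refs = sorted(reference_mz)
    keep = []
    for i, mz in enumerate(refs):
        # Only the immediate neighbours in the sorted list need checking.
        neighbours = refs[max(i - 1, 0):i] + refs[i + 1:i + 2]
        if all(abs(mz - n) / mz * 1e6 > min_gap_ppm for n in neighbours):
            keep.append(mz)
    return keep

def intense_enough(reference_mz, peaks, tol_ppm, min_intensity):
    """Keep reference m/z values matched by a sufficiently intense peak.

    peaks is a list of (m/z, intensity) pairs.
    """
    keep = []
    for ref in reference_mz:
        hits = [
            intensity
            for mz, intensity in peaks
            if abs(mz - ref) / ref * 1e6 <= tol_ppm
        ]
        if hits and max(hits) >= min_intensity:
            keep.append(ref)
    return keep

# Hypothetical reference data set; the last two values are about 200 ppm apart.
reference = [5000.0, 7500.0, 10000.0, 10002.0]
# Hypothetical observed peaks as (m/z, intensity).
peaks = [(5001.0, 50.0), (7501.0, 5.0)]

candidates = isolated(reference, min_gap_ppm=300)
selected = intense_enough(candidates, peaks, tol_ppm=500, min_intensity=10.0)
```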
  • In step S18, the controller 10 uses the mass spectrum and the data in the library to re-identify the microorganisms contained in the sample.
  • FIG. 4 is a flowchart of the subroutine of step S18. As shown in FIG. 4, in step S180, the controller 10 sets a range of error permitted in the identification (allowable error range).
  • In step S182, the controller 10 performs identification of the microorganisms within the allowable error range set in step S180.
  • In step S182, the controller 10 identifies the microorganisms contained in the sample by comparing the m/z of one or more peaks in the mass spectrum after calibration in step S16 with data in a library stored in the storage device 12.
  • In step S184, the controller 10 determines whether the identification in step S182 has identified the microorganism at the same level as the primary identification in step S12. "At the same level" means "up to the same taxonomic hierarchy (genus, species, etc.)." If the controller 10 determines that the identification in step S182 has identified the microorganism at the same level as the primary identification (YES in step S184), the controller 10 returns control to FIG. 2. On the other hand, if the controller 10 determines that the identification in step S182 has not identified the microorganism at the same level as the primary identification (NO in step S184), the controller 10 returns control to step S180. For example, if the microorganism was identified at the species level in the primary identification but was identified only to the genus level in step S182, the controller 10 returns control from step S184 to step S180.
  • In step S184, the controller 10 may instead determine whether or not the identification in step S182 has identified the microorganism to a more detailed level than the primary identification in step S12. For example, if the primary identification in step S12 identified the microorganism to the genus level, it may be determined in step S184 whether or not the identification in step S182 reached the species level.
  • When control is returned to step S180, the controller 10 sets the allowable error range so that the allowable error is smaller than that set in the previous execution of step S180, and performs control from step S182 onwards. As a result, in the process of Figure 4, the identification in step S182 is repeated while the allowable error is gradually reduced (1000 ppm, 500 ppm, 200 ppm, 100 ppm).
  • the re-identification process in step S18 (identification in step S182) and the primary identification process in step S12 are similar except for the mass spectrum used.
  • the mass spectrum used in the primary identification in step S12 is the one before calibration in step S16.
  • the mass spectrum used in the re-identification in step S18 (identification in step S182) is the one after calibration in step S16.
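The loop of steps S180 to S184 can be sketched as follows. The identification itself is passed in as a callable so that the sketch stays self-contained; identify_at, the rank table, and the stub fake_identify are hypothetical. Two assumptions go beyond the text: a result at the same or a more detailed level than the primary identification ends the loop, and the loop stops without success once the tolerance schedule is exhausted.

```python
# Hypothetical ordering of taxonomic levels, coarse to fine.
RANK = {"genus": 1, "species": 2}

def reidentify(identify_at, primary_level, tolerances_ppm=(1000, 500, 200, 100)):
    """Repeat identification while narrowing the allowable error.

    identify_at(tol_ppm) must return (taxon, level) for the calibrated spectrum.
    Returns (tolerance used, result), or (None, last result) if no tolerance
    in the schedule reached the level of the primary identification.
    """
    result = None
    for tol in tolerances_ppm:            # step S180: set allowable error range
        result = identify_at(tol)         # step S182: identify within that range
        if RANK[result[1]] >= RANK[primary_level]:  # step S184: level check
            return tol, result
    return None, result

def fake_identify(tol_ppm):
    """Stub standing in for step S182: resolves the species only at <= 200 ppm."""
    level = "species" if tol_ppm <= 200 else "genus"
    return ("taxon A", level)

chosen_tol, result = reidentify(fake_identify, primary_level="species")
```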
  • In step S20, the controller 10 uses the results of the re-identification in step S18 to assign the peaks detected in the mass spectrum to the corresponding proteins.
  • the controller 10 then advances control to both steps S22 and S26.
  • In step S22, the controller 10 adds the assignment results from step S20 to the mass spectrum. Then, in step S24, the controller 10 creates a report including the assignment results from step S20 and ends the process of FIG. 2.
  • FIG. 5 is a diagram showing an example of a mass spectrum included in the report.
  • the mass spectrum in Fig. 5 includes waveforms W11 and W12 for two subspecies of microorganisms: Bifidobacterium longum subsp. infantis (hereinafter also referred to as "B. longum subsp. infantis" or "Infantis type") and Bifidobacterium longum subsp. longum (hereinafter also referred to as "B. longum subsp. longum" or "Longum type").
  • Waveform W11 corresponds to the Infantis type.
  • Waveform W12 corresponds to the Longum type.
  • Waveforms W11 and W12 have two peaks in common.
  • One of the two peaks corresponds to an m/z value of 9960.4 and is marked with the string "S17."
  • the other peak corresponds to an m/z value of 10381 and is marked with the string "S19."
  • Waveform W11 has a peak corresponding to an m/z value of 10356.8, and this peak is annotated with the string "S16(i)." Meanwhile, waveform W12 has a peak corresponding to an m/z value of 10354.9, and this peak is annotated with the string "S16(l)."
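The separation between the two S16 peaks can be expressed in ppm with a short calculation, which helps relate these m/z values to the allowable error ranges used in identification. Only the two m/z values come from the text above; the function name is hypothetical.

```python
def ppm_difference(mz_a, mz_b):
    """Relative difference between two m/z values in ppm (relative to the smaller)."""
    return abs(mz_a - mz_b) / min(mz_a, mz_b) * 1e6

# S16(i) of the Infantis type versus S16(l) of the Longum type.
s16_gap_ppm = ppm_difference(10356.8, 10354.9)
```

The gap is roughly 183 ppm, far smaller than an 800 ppm allowable error, so distinguishing the two S16 variants requires both a narrow tolerance and a mass axis whose residual calibration error is small compared with this gap.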
  • In step S26, the controller 10 exports a list of the components (proteins) to which the peaks detected in the mass spectrum were assigned in step S20 to a data processing unit for the mass spectrum.
  • One example of a data processing unit is an application for creating dendrograms (e.g., "Strain Solution" manufactured by Shimadzu Corporation), and another example is an application for statistical processing (e.g., "eMSTAT Solution" manufactured by Shimadzu Corporation).
  • the program of the application that constitutes the data processing unit is installed in the controller 10.
  • In step S28, the controller 10 functions as an application for creating a dendrogram. More specifically, in step S28, the controller 10 imports the data exported in step S26 as a "biomarker list," creates a dendrogram, and ends the process in FIG. 2.
  • FIG. 6 shows an example of the dendrogram that is created.
  • In step S30, the controller 10 functions as an application for statistical processing. More specifically, in step S30, the controller 10 imports the data exported in step S26 as an "Annotation list," performs a biomarker search, and ends the process of FIG. 2.
  • FIG. 7 is a diagram showing an example of a screen showing the results of the biomarker search.
  • Figure 7 shows the number of times each of the 18 proteins was detected in five mass spectrometry analyses for each of the five phylotypes (IA1, IA2, IB, II, III).
  • the columns showing the values of proteins that were detected at least once in the five mass spectrometry analyses are hatched. Based on the results in Figure 7, proteins that can be used as biomarkers for each phylotype can be identified.
  • antitoxin was extracted only from IA1, and in all five mass spectrometry measurements of IA1. This means that antitoxin can be used as a biomarker for IA1.
  • L06_a is extracted only from II and III, and is extracted in all five mass spectrometry measurements for each of II and III.
  • L13_c is extracted only from II, not from III, and is extracted in all five mass spectrometry measurements for II. Therefore, L13_c and L06_a can be used as biomarkers for II and III.
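The reasoning above can be sketched as a small search over a detection count table. The table below is illustrative only: the three named proteins follow the counts stated in the text, while "protein_x" and its counts are hypothetical, and the selection rule (detected in every measurement of some phylotypes and never in the others) is an assumed formalization rather than the rule used by the application.

```python
def exclusive_markers(counts, n_runs):
    """Find proteins detected in every run of some phylotypes and never in the rest.

    counts maps protein -> {phylotype: number of detections in n_runs runs}.
    Returns protein -> sorted list of phylotypes in which it is always detected.
    """
    markers = {}
    for protein, per_type in counts.items():
        always = sorted(t for t, c in per_type.items() if c == n_runs)
        never = [t for t, c in per_type.items() if c == 0]
        # Reject proteins with partial detection in any phylotype.
        if always and len(always) + len(never) == len(per_type):
            markers[protein] = always
    return markers

counts = {
    "antitoxin": {"IA1": 5, "IA2": 0, "IB": 0, "II": 0, "III": 0},
    "L06_a": {"IA1": 0, "IA2": 0, "IB": 0, "II": 5, "III": 5},
    "L13_c": {"IA1": 0, "IA2": 0, "IB": 0, "II": 5, "III": 0},
    "protein_x": {"IA1": 3, "IA2": 1, "IB": 0, "II": 0, "III": 2},  # hypothetical
}

markers = exclusive_markers(counts, n_runs=5)
```

In this sketch, phylotype III is recognized by the combination of L06_a being present and L13_c being absent, which matches the use of the two proteins together for II and III.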
  • the application program that constitutes the data processing unit may be installed in a device that is located external to the controller 10 (analysis device 1).
  • the external device constitutes an example of a data processing unit.
  • the controller 10 exports data to the external device. Furthermore, the controls of steps S28 and S30 are performed in the external device.
  • the controller 10 calibrates the mass spectrum acquired in step S10 in step S16.
  • a reference data set is used in the calibration.
  • the reference data set is an example of a reference list of m/z.
  • the reference dataset is selected to correspond to the sample on which the acquired mass spectrum is based. More specifically, the selected reference dataset is a dataset generated for the classification that is expected to be assigned to the microorganisms contained in the sample. This eliminates the need for the user to select a dataset to be used in self-calibration. Therefore, even users without specialized knowledge can perform self-calibration.
  • a typical conventional self-calibration includes the following steps (a) to (d).
  • For users without specialist knowledge, steps (b) to (d) in particular were difficult to follow. For this reason, such users identified the type of bacteria only from the results of the primary identification, without performing self-calibration. This resulted in problems such as relatively low identification accuracy.
  • m/z common to a group is prepared as a reference data set. This eliminates the need for the user to determine the peak assignment described above as (b). This is also explained as step S14 (calling the reference data set) and step S22 (adding the assignment result to the mass spectrum) in the process of FIG. 2. And the user is not required to perform the procedures described above as (c) and (d).
  • step (b) Even an expert attempting to determine peak assignment in step (b) may make a mistaken assignment between the Longum type and Infantis type of S16 as shown in Figure 10.
  • the mass spectrum can be calibrated more accurately, thereby avoiding misassignment.
  • the controller 10 selects the dataset of the identified species as the reference dataset in step S144. This dataset is a list of m/z values common to known microorganisms at the species level. On the other hand, if the microorganisms in the sample are identified only to the genus level in the primary identification, the controller 10 selects the dataset of the identified genus as the reference dataset in step S146. This dataset is a list of m/z values common to known microorganisms at the genus level.
  • the reference dataset selected is preferably a dataset corresponding to the lowest level of the classification identified in the primary identification (hereinafter also referred to as the "primary classification").
  • For example, if the primary classification reaches the species level, a dataset corresponding to the identified species (a list of m/z values common to the species) is selected as the reference dataset.
  • the reference data set may be selected according to the expected classification. That is, primary identification does not need to be performed to obtain the primary classification.
  • the user may input the expected classification (primary classification) to the controller 10.
  • the controller 10 may further obtain the "expected classification" (primary classification) input by the user in step S10.
  • the controller 10 may generate a reference dataset according to the primary classification.
  • the controller 10 extracts from the database an m/z list of one or more microorganisms that are classified into the same type as the type, such as a species or genus, included in the primary classification.
  • the controller 10 may then generate a reference dataset by integrating one or more m/z values that are common in the m/z lists of the one or more extracted microorganisms.
  • Each of the extracted "m/z lists of one or more microorganisms" may be a list of theoretical values or a list generated from the results of past mass spectrum measurements (a list of actual measured values).
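The integration of common m/z values can be sketched as follows. The sketch assumes that two m/z values count as "common" when they agree within a ppm tolerance and that the integrated value is the mean of the matched values; for lists of theoretical values the matched values would normally be identical. The function name, tolerance, and data are hypothetical.

```python
def common_mz(mz_lists, tol_ppm):
    """Return m/z values present (within tol_ppm) in every list.

    mz_lists must contain at least one list. Each common value is reported
    as the mean of its matches across the lists (an assumed integration rule).
    """
    first, *rest = mz_lists
    common = []
    for mz in first:
        matches = [mz]
        for other in rest:
            near = [m for m in other if abs(m - mz) / mz * 1e6 <= tol_ppm]
            if not near:
                break  # missing from one microorganism: not common
            matches.append(min(near, key=lambda m: abs(m - mz)))
        else:
            common.append(sum(matches) / len(matches))
    return common

# Hypothetical m/z lists of three microorganisms in the same classification.
mz_lists = [
    [5000.0, 7500.0, 9000.0],
    [5000.0, 7500.0, 9500.0],
    [5000.0, 7500.0],
]

reference_dataset = common_mz(mz_lists, tol_ppm=10)
```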
  • Fig. 8 is a diagram showing an example of a mass spectrum calibrated using an external standard substance.
  • Fig. 8 includes a waveform W21 of "B. longum subsp. infantis" (Infantis type) and a waveform W22 of "B. longum subsp. longum” (Longum type).
  • the mass spectrum in Fig. 8 was acquired by MALDI-TOF MS. As shown in Fig. 8, both the waveform W21 and the waveform W22 have peaks corresponding to three types of ribosomal proteins S16, S17, and S19, respectively.
  • Figure 9 shows an example of the results of identification performed within an acceptable error range of 800 ppm using the mass spectrum shown in Figure 8.
  • Figure 9 shows the results of seven mass spectrum measurements for each of the Infantis type and the Longum type. More specifically, Figure 9 shows the number of times each peak was detected in the seven measurements.
  • Figure 10 shows an example of the results of identification performed with a tolerance of 200 ppm using the mass spectrum shown in Figure 8. As shown in Figure 10, when the tolerance was set to 200 ppm, a Longum type peak of S16 and an Infantis type peak of S16 were identified.
  • the results showed that for the Longum type, the S16 Longum type peak and the S19 peak were detected 7 times each. Meanwhile, for the Infantis type, the results showed that the S16 Longum type peak was detected 7 times and the S19 peak was detected 7 times. For both types, the S17 peak was not detected.
  • Figure 11 shows an example of the results of identification performed within an allowable error range of 200 ppm using a mass spectrum calibrated by self-calibration using the theoretical m/z value as described above.
  • the results showed that for the Longum type, the peaks of S16, S17, and S19 were each detected seven times. On the other hand, the results showed that for the Infantis type, the peaks of S16, S17, and S19 were each detected seven times.
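  • The effect of the allowable error on peak detection can be illustrated with a small sketch. The numbers are invented for illustration and are not taken from Figs. 8 to 11; the function names are hypothetical. A peak carrying a mass shift of about 500 ppm is matched at a tolerance of 800 ppm but missed at 200 ppm, which is why calibration is needed before a narrow tolerance can be used.

```python
from typing import List, Optional


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed deviation of an observed m/z from its theoretical value, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def match_peak(
    observed_peaks: List[float], theoretical: float, tol_ppm: float
) -> Optional[float]:
    """Return the observed peak closest to the theoretical m/z within tolerance, else None."""
    candidates = [
        p for p in observed_peaks if abs(ppm_error(p, theoretical)) <= tol_ppm
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p - theoretical))


# Made-up example: one peak shifted by roughly +500 ppm.
theoretical_mz = 10000.0
observed = [10005.0]
loose = match_peak(observed, theoretical_mz, tol_ppm=800.0)
strict = match_peak(observed, theoretical_mz, tol_ppm=200.0)
```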
  • a method for generating a reference data set will be described with reference to Fig. 12 and Fig. 13.
  • the process of generating a reference data set may be performed by an information processing device other than the controller 10. That is, the controller 10 may use a reference data set that has been generated in advance by another information processing device.
  • the process of generating the reference dataset may also be performed by the controller 10 executing a given program.
  • the reference dataset (species dataset, genus dataset, etc.) is referred to as a "specific m/z list" generated as part of the m/z DB. More specifically, in the following description, a specific m/z list is generated as a list of proteins predicted to be included in a specific group. If the specific group is a specific "species,” the specific m/z list generated refers to a dataset for the specific "species.” If the specific group is a specific "genus,” the specific m/z list generated refers to a dataset for the specific "genus.”
  • FIG. 12 is a flowchart showing an example of a process for constructing an m/z DB. Referring to FIG. 12, in ST02, the controller 10 acquires genome data of a microorganism from a public genome DB. By acquiring genome data from a plurality of public genome DBs at this stage, it is possible to comprehensively collect genome data of clinically or industrially important microbial species.
  • the public genome DB is a database that contains genome data of organisms.
  • a genome is the genetic information on the nucleic acids (deoxyribonucleic acid (DNA), ribonucleic acid (RNA)) that an organism possesses, and includes the base sequence of the nucleic acid.
  • genome data mainly refers to DNA sequence data.
  • a public genome DB is typically a DB that contains a large amount of genome data of organisms that is publicly available, such as the genome DBs of NCBI (National Center for Biotechnology Information), DDBJ (DNA Data Bank of Japan), and EMBL (European Molecular Biology Laboratory). Note that examples of public genome DBs are not limited to these and may include genome DBs that are not publicly available.
  • the controller 10 integrates the acquired genome data and constructs a collected genome DB.
  • the controller 10 determines whether the genome data in the collected genome DB meets a predetermined criterion.
  • the criterion is set so that only high-quality genome data meets the criterion. The specific content of the criterion will be described with reference to FIG. 13.
  • the controller 10 constructs a high-quality genome DB that includes genome data that has been determined to meet the criteria.
  • the controller 10 predicts genes contained in the genome data contained in the high-quality genome DB.
  • a gene refers to a specific region on DNA that is translated into a protein, or the information contained in that region.
  • Gene prediction includes, for example, estimating a predicted gene region on the genome data that is translated into a protein, based on a translation start codon (for example, the ATG sequence) and a stop codon (for example, the TGA sequence).
  • the controller 10 predicts the amino acid sequence after translation from the predicted gene. Predicting the amino acid sequence includes, for example, predicting the amino acids corresponding to each codon (three-base sequence) contained in the predicted gene region and linking them together.
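  • A deliberately simplified sketch of these two steps (finding a region between a start codon and a stop codon, then translating it codon by codon) is given here. It assumes the standard genetic code, scans only the forward strand, and uses hypothetical function names; practical gene prediction is considerably more involved than this.

```python
from typing import List

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_orfs(dna: str, min_codons: int = 2) -> List[str]:
    """Return forward-strand regions from ATG to the first in-frame stop codon."""
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        for pos in range(start, len(dna) - 2, 3):
            if CODON_TABLE.get(dna[pos:pos + 3]) == "*":
                # Count the codons before the stop codon (ATG included).
                if (pos - start) // 3 >= min_codons:
                    orfs.append(dna[start:pos + 3])
                break
    return orfs


def translate(orf: str) -> str:
    """Translate codon by codon and stop at the first stop codon."""
    protein = []
    for pos in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[pos:pos + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


genes = find_orfs("CCATGGCTAAATGACC")
proteins = [translate(g) for g in genes]
```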
  • the controller 10 predicts post-translational modifications for the protein consisting of the predicted amino acid sequence.
  • Post-translational modifications are modifications made to proteins so that the protein immediately after translation changes into a protein that actually functions in various parts of the body.
  • Post-translational modifications include, for example, protein degradation including removal of methionine and removal of signal peptides, and specific chemical modifications including phosphorylation.
  • Post-translational modifications are made to most proteins, changing their m/z. Therefore, by taking into account post-translational modifications, a more accurate m/z of a protein can be calculated.
  • the controller 10 predicts the protein to which the predicted post-translational modification has been added.
  • the controller 10 predicts an m/z list for each genome data based on the protein. Specifically, the m/z corresponding to the protein is calculated based on the masses of the atoms contained in the protein. Note that it is preferable to use the average mass of the element that reflects the isotope distribution of the element in nature as the atomic mass. This allows for a more accurate calculation of the m/z.
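  • The m/z calculation from a predicted amino acid sequence can be sketched as follows. The residue masses are approximate average masses in Da (published tables differ slightly depending on the atomic weights used), and the rule for removing the N-terminal methionine (removal when the second residue is small) is a commonly cited rule of thumb adopted here as an assumption; neither is specified in the original text. All names are hypothetical.

```python
# Approximate average residue masses in Da (assumed values for illustration).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524    # average mass of H2O added for the free termini
PROTON = 1.007276   # mass of the charge carrier

SMALL_RESIDUES = set("GASTCPV")


def remove_initiator_methionine(sequence: str) -> str:
    """Drop a leading M when the second residue is small (assumed rule of thumb)."""
    if len(sequence) > 1 and sequence[0] == "M" and sequence[1] in SMALL_RESIDUES:
        return sequence[1:]
    return sequence


def average_mass(sequence: str) -> float:
    """Average mass of the neutral protein."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER


def theoretical_mz(sequence: str, charge: int = 1) -> float:
    """m/z of the protonated protein after the assumed methionine removal."""
    mature = remove_initiator_methionine(sequence)
    return (average_mass(mature) + charge * PROTON) / charge


mz_mak = theoretical_mz("MAK")
```

  • Other post-translational modifications mentioned in the text, such as signal peptide removal or phosphorylation, would change the mass further and are not modelled in this sketch.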
  • the controller 10 constructs an overall m/z DB, which is a database of mass-to-charge ratios that includes the m/z list.
  • the overall m/z DB includes all m/z predicted for each genome data.
  • the controller 10 links the annotation to the protein data predicted in ST16.
  • An annotation is generally information about a protein, including the protein's name, function, etc.
  • the annotation is linked, for example, using general software that adds annotations according to m/z, but is not limited to this.
  • the device 100 may create a table showing the relationship between m/z and annotations based on a public genome DB and a public classification DB, and link the annotation using the table.
  • the public classification DB is a database that contains data on the classification of organisms (hereafter, "classification data"). Organisms are generally classified based on the affinities between organisms, as indicated by ranks such as family, genus, and species. Microorganisms have traditionally been classified using multiple indicators based on both phenotype and genome, including morphological observation, phenotypic traits, chemotaxonomic indicators, protein analysis, and DNA analysis. However, there are also classification systems based only on genome information, so multiple classification systems exist.
  • the public classification DB 80 is typically a DB that contains classification data of organisms that is publicly available, such as the Genome Taxonomy Database (GTDB), the Ribosomal Database Project (RDP), Silva, etc. Examples of public classification DBs are not limited to these, and may include, for example, DBs that are not publicly available.
  • annotation refers to information about a protein, including information about the group the protein is part of.
  • Information about the group of proteins includes at least one of the name, function, and family of the protein.
  • m/z corresponding to proteins included in a specific group can be selected based on the annotation and treated separately from m/z corresponding to other proteins. Therefore, for example, it is possible to selectively weight m/z corresponding to "a group of proteins that are likely to be expressed in the living organism of a microorganism and are likely to be detected as peaks when a mass spectrum is measured" to distinguish microorganisms.
  • this allows samples to be discriminated by giving more weight to "m/z corresponding to proteins that are likely to be expressed in the living organism and are likely to be detected as peaks when a mass spectrum is measured" than to "m/z (false peaks) corresponding to proteins that are not actually expressed in the living microorganism or that do not appear in the mass spectrum even if expressed" in the m/z list predicted from genome data. Therefore, it is possible to prevent false peaks included in the predicted m/z list from becoming noise and reducing the accuracy of sample discrimination.
  • the group is selected based on at least one of the following conditions: the expression level is equal to or greater than a predetermined threshold; the function is essential for sustaining life; a predetermined percentage or more of microorganisms in a predetermined type (e.g., microorganisms belonging to a predetermined family) have an amino acid sequence similarity (homology) equal to or greater than a predetermined threshold; the protein is a basic protein; the mass-to-charge ratio can be analyzed within an error range of ±14 Da (more preferably ±3 Da) when measured by MALDI-MS; the protein mass is within the range of 4 to 30 kDa (more preferably 2 to 20 kDa); the number of types of proteins included in the group is equal to or greater than a predetermined number; and a predetermined percentage or more of microorganisms in a predetermined type (e.g.
  • the functions essential for maintaining life include at least one of the functions essential for cell maintenance and proliferation.
  • An example of a group that is determined based on these conditions is ribosomal proteins.
  • Other examples of groups are chaperones and DNA-binding proteins.
  • the group is not limited to proteins that are significantly expressed in microorganisms in general, as exemplified above, but may be proteins that are known to be significantly expressed in specific microorganisms. For example, by weighting specific proteins that are known to be significantly expressed in each genus and classifying the samples, the likelihood that the samples will be classified into the correct genus can be increased.
  • an example of a "significantly expressed protein" is a protein that shows an expression level above a predetermined threshold.
  • the controller 10 selects proteins predicted to be included in a specific group based on the information about the group included in the annotation. In the following ST26, the controller 10 predicts a specific m/z list that includes only m/z predicted from the selected protein. In ST20C, the controller 10 constructs a specific m/z DB, which is an m/z database that includes the specific m/z list.
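  • The selection of proteins by annotation and the construction of a specific m/z list can be sketched as follows. The record layout, the keyword match on the annotation text, and the example entries are assumptions made for illustration only.

```python
from typing import Dict, List, Union

ProteinRecord = Dict[str, Union[str, float]]


def build_specific_mz_list(
    proteins: List[ProteinRecord], group_keyword: str = "ribosomal protein"
) -> List[float]:
    """Keep only m/z values whose annotation mentions the chosen protein group."""
    keyword = group_keyword.lower()
    selected = [
        float(p["mz"])
        for p in proteins
        if keyword in str(p["annotation"]).lower()
    ]
    return sorted(selected)


# Hypothetical predicted proteins for one genome, with made-up m/z values.
predicted = [
    {"mz": 9000.0, "annotation": "30S ribosomal protein S16"},
    {"mz": 7000.0, "annotation": "hypothetical protein"},
    {"mz": 5000.0, "annotation": "50S ribosomal protein L36"},
]
specific_mz = build_specific_mz_list(predicted)
```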
  • Another advantage of linking annotations is that it makes it easier for the user to understand which protein each m/z in the m/z list corresponds to. From this perspective, in order to make annotations for m/z easier to use, in ST20B, the controller 10 builds an annotation DB that consolidates annotations for m/z included in the overall m/z DB.
  • annotations can be referred to in the m/z list of the m/z DB where the degree of match (match rate) between the sample list and the m/z pattern is determined to be high.
  • if the m/z list contains many m/z corresponding to proteins that are estimated, based on the annotation, not to be expressed in the microorganism, the reliability of the m/z list itself is in doubt. In that case the comparison with the sample list has little validity in the first place, and the reliability of the sample discrimination is also low.
  • Annotations in the annotation DB are linked to the m/z contained in the m/zDB.
  • the m/zDB and annotation DB are associated so that when referencing an m/z in the m/z list contained in the m/zDB, the corresponding annotation contained in the annotation DB can also be referenced.
  • the annotation DB may be configured as part of the m/zDB, with annotations corresponding to the m/z contained in the m/zDB being added.
  • the controller 10 acquires classification data from the public classification DB 80.
  • the controller 10 constructs a collected classification DB that integrates the collected classification data.
  • because the collected classification DB is constructed based on classification data from multiple public classification DBs 80, it is possible to incorporate a wide range of taxonomic systems. Therefore, by using the collected classification DB, it becomes possible to reflect various taxonomic systems in the microorganism discrimination results.
  • the collection classification DB may also include a genome ID, which is an ID for each genome.
  • the genome ID is created, for example, based on the collected classification data.
  • the classification data in the collection classification DB is associated with the data contained in the overall m/z DB, the specific m/z DB, and the annotation DB. Therefore, a genome ID can be added to each piece of genome data in the overall m/z DB and the specific m/z DB.
  • the contents of the collection classification DB can also be used to organize the overall m/z DB and the specific m/z DB, or can be reflected in the contents.
  • the collection classification DB can also be used for other purposes in the device 100, such as when determining the above-mentioned "specific proteins known to be significantly expressed only in specific species.”
  • These four associated DBs are collectively referred to as the microorganism DB.
  • the controller 10 After constructing the microorganism DB in ST20A-20D, the controller 10 temporarily terminates processing. This enables the device 100 to use the microorganism DB to identify samples using mass spectrometry.
  • FIG. 13 is a flowchart showing an example of a process for determining genome data. The process shown in FIG. 13 is performed to remove genome data of low quality contained in the collected genome DB.
  • the controller 10 judges the quality of the genome data based on the completeness of the genome data.
  • the completeness of the genome is measured using, for example, a group of single copy marker genes, each of which is known to exist in one copy in the genome of a microorganism, as an index. If the genome data is complete, all of the single copy marker genes should be present in the genome data. However, if the genome data is incomplete, for example, when a part of the genome data is missing or misread, the single copy marker genes contained in the missing part will be lost. Therefore, the larger the part of the genome data that is missing or misread, the fewer the number of single copy marker genes on the genome data, and the number of single copy marker genes can be used as an index of the completeness of the genome data. Specifically, the presence of all single copy marker genes on the genome data is taken as 100%, and the completeness is calculated as a percentage in proportion to the number of single copy marker genes present.
  • the controller 10 determines whether the completeness of the genome data is greater than a reference value T1.
  • the reference value T1 is, for example, 50%. If the completeness is equal to or less than the reference value T1 (NO in ST060), the controller 10 removes the genome data in ST061. If the completeness is greater than the reference value T1 (YES in ST060), the controller 10 proceeds to ST062.
  • the controller 10 judges the quality of the genome data based on the rate of contamination of the genome.
  • Contamination refers to a phenomenon in which the DNA sequence of one genome data is mixed with the DNA sequence of another genome data for some reason. In other words, contamination typically occurs when the DNA sequences of multiple microorganisms are mixed. If the rate at which single copy marker genes are found is 100% when there is no contamination in the genome data, the rate will be greater than 100% when there is contamination. Therefore, for example, the rate is calculated based on the number of single copy marker genes found, assuming that the rate is 100% when there is no contamination and all single copy marker genes are present on the genome data. When the number of single copy marker genes found corresponds to (100+n)%, the rate of contamination is n%. n is a real number satisfying n>0. If the rate of contamination is high, it is highly likely that the DNA sequences of multiple types of microorganisms are mixed.
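  • The two indices based on single copy marker genes can be sketched as follows. The input format (a count of copies found per marker gene) and the function name are hypothetical. Contamination is computed here from the extra copies beyond one per marker, which equals the n of the (100+n)% definition above when every marker is present; treating the partially complete case this way is an assumption of this sketch.

```python
from typing import Dict, Tuple


def completeness_and_contamination(marker_copies: Dict[str, int]) -> Tuple[float, float]:
    """Return (completeness %, contamination %) from copies found per marker gene."""
    total = len(marker_copies)
    if total == 0:
        raise ValueError("at least one single copy marker gene is required")
    present = sum(1 for copies in marker_copies.values() if copies >= 1)
    extra = sum(max(0, copies - 1) for copies in marker_copies.values())
    completeness = 100.0 * present / total
    contamination = 100.0 * extra / total
    return completeness, contamination


# Made-up counts: all four markers present, one of them found twice (125% found).
full = completeness_and_contamination({"a": 1, "b": 1, "c": 1, "d": 2})
# Made-up counts: one marker missing, one duplicated.
partial = completeness_and_contamination({"a": 1, "b": 1, "c": 2, "d": 0})
```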
  • the controller 10 determines whether the contamination rate is smaller than a reference value T2.
  • the reference value T2 is, for example, 20%. If the contamination rate is equal to or greater than the reference value T2 (NO in ST062), the controller 10 removes the genome data in ST063. If the contamination rate is smaller than the reference value T2 (YES in ST062), the controller 10 proceeds to ST064.
  • the controller 10 judges the quality of the genome data based on the number of contigs.
  • a contig refers to a sequence in which a single DNA sequence is divided into multiple DNA sequences. Therefore, the more contigs there are, the more finely the DNA sequence is divided. If there are too many contigs, the gene region that expresses a protein will also be divided, and it may not be possible to read it accurately.
  • the number of contigs can be determined by counting how many DNA sequences the genome data contains that are divided into.
  • the controller 10 determines whether the number of contigs is smaller than a reference value T3.
  • the reference value T3 is, for example, 1000. If the number of contigs is equal to or greater than the reference value T3 (NO in ST064), the controller 10 removes the genome data in ST065. If the number of contigs is smaller than the reference value T3 (YES in ST064), the controller 10 proceeds to ST066.
  • the controller 10 judges the quality of the genome data based on the number of undetermined bases.
  • An undetermined base refers to a base that could not be determined as either AGCT when the DNA base sequence was deciphered. There is a high possibility that genes will not be found properly from a DNA sequence that contains many undetermined bases.
  • the controller 10 determines whether the number of undetermined bases is smaller than a reference value T4.
  • the reference value T4 is, for example, 100,000. If the number of undetermined bases is equal to or greater than the reference value T4 (NO in ST066), the controller 10 removes the genome data in ST067. If the number of undetermined bases is smaller than the reference value T4 (YES in ST066), the controller 10 proceeds to ST068.
  • the controller 10 judges the quality of the genome data based on whether the number of genes meets a reference value.
  • This reference is for judging whether the number of genes inferred from the genome data is within a reasonable range. For example, if the number of genes inferred from the genome data is abnormally high, it is considered that for some reason, a part that is not actually a gene is inferred as a gene. For example, some reason may occur when decoding a DNA base sequence, and a sequence that is not actually related to the start or end of transcription or translation is decoded as a sequence related to the start or end of transcription or translation.
  • a sequence that does not actually express a protein may be mistaken for a sequence that expresses a protein, and there is a concern that the predicted m/z list may contain many erroneous peaks. If such an m/z list is included in the m/zDB, the quality of the m/zDB will decrease, and the accuracy of sample identification will also decrease.
  • the controller 10 determines whether the number obtained by dividing the number of genes in the genome data by the number of coding bases that code for the genes is smaller than a reference value T5.
  • the bases that code for genes generally refer to bases contained in a region of a DNA sequence that is related to protein expression.
  • the reference value T5 is, for example, 0.00180. If the divided number is equal to or greater than the reference value T5 (NO in ST068), the controller 10 removes the genome data in ST069. If the divided number is smaller than the reference value T5 (YES in ST068), the controller 10 adds the genome data to the high-quality genome DB.
  • the controller 10 performs steps ST060 to ST069 for all genome data contained in the collected genome DB.
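  • The sequence of checks in ST060 to ST069 can be summarized in a short sketch. The threshold values are the example reference values T1 to T5 quoted above; the `GenomeStats` record, its field names, and the sample genomes are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenomeStats:
    genome_id: str
    completeness: float        # percent
    contamination: float       # percent
    contigs: int
    undetermined_bases: int
    genes: int
    coding_bases: int


# Example reference values quoted in the text.
T1_COMPLETENESS = 50.0
T2_CONTAMINATION = 20.0
T3_CONTIGS = 1000
T4_UNDETERMINED = 100_000
T5_GENE_DENSITY = 0.00180


def passes_quality(g: GenomeStats) -> bool:
    """Apply the five checks in order; any failure removes the genome data."""
    if g.completeness <= T1_COMPLETENESS:          # NO in ST060
        return False
    if g.contamination >= T2_CONTAMINATION:        # NO in ST062
        return False
    if g.contigs >= T3_CONTIGS:                    # NO in ST064
        return False
    if g.undetermined_bases >= T4_UNDETERMINED:    # NO in ST066
        return False
    if g.coding_bases <= 0:                        # guard against division by zero
        return False
    return g.genes / g.coding_bases < T5_GENE_DENSITY  # ST068


def build_high_quality_db(collected: List[GenomeStats]) -> List[GenomeStats]:
    return [g for g in collected if passes_quality(g)]


collected = [
    GenomeStats("good", 98.0, 1.5, 40, 0, 2000, 1_800_000),
    GenomeStats("incomplete", 50.0, 1.0, 40, 0, 2000, 1_800_000),
    GenomeStats("fragmented", 95.0, 1.0, 1000, 0, 2000, 1_800_000),
]
high_quality_ids = [g.genome_id for g in build_high_quality_db(collected)]
```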
  • the calculation methods for each criterion of completeness, contamination rate, number of contigs, number of undetermined bases, and validity of the number of genes are not limited to the above examples.
  • the validity of the number of genes may be determined by whether or not the number of genes included in one piece of genome data is smaller than a predetermined standard value.
  • genome data that does not meet the criteria and was included in the collected genome DB is removed.
  • low-quality genome data included in the public genome DB is removed, and only high-quality data is used to construct the m/zDB. This improves the quality of the m/zDB in the device 100.
  • mass spectra calibrated by self-calibration using theoretical values of m/z are used for identification of microorganisms by mass spectrometry, which improves the accuracy of mass spectrometry and enables highly accurate identification.
  • Self-calibration also utilizes a reference data set that contains peaks (theoretical protein masses) common to a given taxonomic hierarchy.
  • a "species data set” contains theoretical masses of one or more proteins common to a species.
  • a "genus data set" contains theoretical masses of one or more proteins common to a genus. This allows the mass spectrum data to be calibrated in a manner that conforms to the taxonomic hierarchy identified in the primary identification. Thus, the mass spectrum data is more accurately calibrated, improving the accuracy of mass spectrometry.
  • the reference data set used in self-calibration is a list of "theoretical values”
  • the m/z contained in the mass spectrum after calibration in self-calibration is more likely to match the m/z contained in the theoretical value list used in identification in re-identification (step S18). This is expected to improve the identification rate in re-identification.
  • microorganisms are identified from the actually measured mass spectrum by the presence of peak groups corresponding to proteins or peptides common to classifications coarser than strains.
  • microorganisms are identified by classifications coarser than strains. Then, the mass shift is corrected so that the actually measured peak groups match the true m/z values of the above-mentioned common proteins or peptides.
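  • The mass-shift correction can be sketched as follows. The original text does not state the form of the correction function; this sketch assumes a linear correction fitted by least squares to pairs of measured peaks and reference m/z values, and it uses a matching tolerance of 800 ppm as an illustrative choice. Function names and numbers are hypothetical.

```python
from typing import List, Tuple


def match_pairs(
    observed: List[float], reference: List[float], tol_ppm: float
) -> List[Tuple[float, float]]:
    """Pair each reference m/z with its nearest observed peak if within tolerance."""
    pairs = []
    for ref in reference:
        nearest = min(observed, key=lambda p: abs(p - ref))
        if abs(nearest - ref) <= ref * tol_ppm * 1e-6:
            pairs.append((nearest, ref))
    return pairs


def fit_linear(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of reference = slope * observed + intercept."""
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two matched peaks are needed")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("matched peaks must have distinct m/z values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def self_calibrate(
    observed: List[float], reference: List[float], tol_ppm: float = 800.0
) -> List[float]:
    """Correct every observed peak with the fitted linear function."""
    slope, intercept = fit_linear(match_pairs(observed, reference, tol_ppm))
    return [slope * p + intercept for p in observed]


# Made-up data: three reference peaks measured with a +300 ppm shift,
# plus one unrelated peak at 6000.0.
reference_mz = [5000.0, 7000.0, 9000.0]
observed_mz = [5001.5, 6000.0, 7002.1, 9002.7]
calibrated = self_calibrate(observed_mz, reference_mz)
```

  • After such a correction, the calibrated peaks can be compared against theoretical m/z values with a narrower tolerance than would be usable on the uncalibrated spectrum.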
  • Specialized knowledge is required for self-calibration, that is, for the calibration of mass spectra using theoretical values of m/z.
  • highly accurate information on measured m/z or calculated m/z of proteins of a wide variety of microorganisms is required.
  • a high level of specialized knowledge is required.
  • theoretical values of mass (m/z) of proteins constituting microorganisms are prepared as a data set, and the data set is used. This makes it possible for users without specialized knowledge to calibrate mass spectra using theoretical values of m/z.
  • the theoretical value of the mass (m/z) of the protein that constitutes the microorganism is prepared as a data set, and the data set is used. This makes it possible for users without specialized knowledge to calibrate the mass spectrum using the theoretical value of m/z. This makes typing, which requires processing and analyzing the mass spectrum, accessible to more users.
  • An information processing method is an information processing method for identifying microorganisms, and may include the steps of obtaining a mass spectrum by mass spectrometry of a sample containing microorganisms, identifying the type of microorganism assumed to be contained in the sample, obtaining the true m/z of a group of peaks common to the type of microorganism, and calibrating the mass spectrum so that the m/z of a peak corresponding to the true m/z contained in the mass spectrum matches the true m/z.
  • the information processing method described in paragraph 1 provides a technology that allows even users without specialized knowledge to perform self-calibration.
  • the user does not need to obtain the primary classification in advance.
  • the information processing method described in paragraph 1 or 2 may further include a step in which, in the calibration step, the mass spectrum is compared with a reference list of m/z, and as the reference list, a list corresponding to the type of microorganism is selected from two or more lists.
  • the selection of a reference list becomes easy.
  • the step of selecting a list corresponding to the type of microbial organism may include selecting a list corresponding to the lowest taxonomic hierarchy of the type of microbial organism from the two or more lists.
  • a list of m/z values corresponding to microorganisms that are closest to the microorganisms contained in the sample is selected as a list corresponding to the primary classification, thereby allowing the mass spectrum to be more appropriately calibrated by self-calibration.
  • the method may further include a step of generating the reference list using m/z values common to m/z lists of two or more known microorganisms that have a common classification in one or more taxonomic hierarchies as the type of the microorganism.
  • the information processing method described in paragraph 5 does not require a list of candidates for the reference list to be prepared in advance.
  • the two or more known microorganisms may have a common classification with the type of the microorganism at the lowest taxonomic hierarchy of the type of the microorganism.
  • a list of m/z values corresponding to microorganisms that are closest to the microorganisms contained in the sample is generated as a list corresponding to the primary classification, thereby allowing the mass spectrum to be more appropriately calibrated by self-calibration.
  • the information processing method according to any one of items 1 to 6 may further include a step of performing identification of the sample using the mass spectrum after the calibration.
  • the information processing method described in paragraph 7 enables highly accurate identification of samples based on highly accurate mass spectrometry.
  • the information processing method described in paragraph 7 may further include a step of generating a report by adding to the mass spectrum protein components obtained in the identification using the calibrated mass spectrum.
  • the information processing method described in paragraph 8 reduces the burden on the user in creating reports regarding the identification of microorganisms.
  • the information processing method described in paragraph 7 or 8 may further include a step of outputting a list of protein components obtained in the identification using the mass spectrum after the calibration to a data processing unit related to the mass spectrum.
  • the list of protein components obtained in the identification can be effectively utilized.
  • the mass spectrum may be based on the detection result of the sample by a mass spectrometer using a MALDI (matrix assisted laser desorption/ionization) method.
  • the list of m/z values of proteins may be a list of theoretical values of m/z values of proteins that constitute microorganisms.
  • the information processing method described in paragraph 11 makes it easy to prepare a list of protein m/z values. This is because common m/z values for groups such as species or genus can be more easily extracted from theoretical protein m/z values than from actual measured values.
  • An information processing device may include one or more processors and a storage device storing a program that, when executed by the one or more processors, causes the one or more processors to implement the information processing method described in any one of paragraphs 1 to 10.
  • the mass spectrum of a sample is calibrated by referring to a reference database that contains theoretical values of the masses of proteins that constitute one or more microorganisms. This improves the accuracy of mass analysis of microorganisms when classifying the microorganisms using the results of mass analysis of the microorganisms.
  • a computer program according to one aspect may be executed by one or more processors to cause the one or more processors to implement the information processing method according to any one of paragraphs 1 to 10.
  • the mass spectrum of a sample is calibrated by referring to a reference database that contains theoretical values of the masses of proteins that constitute one or more microorganisms. This improves the accuracy of mass spectrometry of microorganisms when classifying the microorganisms using the results of the mass spectrometry of the microorganisms.

Abstract

The invention relates to an information processing method that includes a step (S10) of acquiring a mass spectrum by performing mass spectrometry on a sample containing microorganisms. The information processing method also includes: a step (S12) of identifying the types of microorganisms assumed to be contained in the sample; a step (S14) of acquiring the true m/z of a group of peaks common to the types; and a step (S16) of calibrating the mass spectrum so that the m/z of a peak in the mass spectrum corresponding to the true m/z coincides with the true m/z.
PCT/JP2024/010463 2023-03-22 2024-03-18 Information processing method, information processing device, and computer program Pending WO2024195760A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2025508558A JPWO2024195760A1 (fr) 2023-03-22 2024-03-18

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2023-045357 2023-03-22
JP2023045357 2023-03-22

Publications (1)

Publication Number Publication Date
WO2024195760A1 (fr) 2024-09-26

Family

ID=92841684

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2024/010463 Pending WO2024195760A1 (fr) 2023-03-22 2024-03-18 Information processing method, information processing device, and computer program

Country Status (2)

Country Link
JP (1) JPWO2024195760A1 (fr)
WO (1) WO2024195760A1 (fr)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
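JP2006177953A (ja) * 2004-12-20 2006-07-06 Palo Alto Research Center Inc Self-calibration of mass spectra using robust statistical methods
JP2018513382A (ja) * 2015-04-24 2018-05-24 Biomerieux Method for identifying an unknown microorganism subgroup from a set of reference subgroups by mass spectrometry
JP2019090654A (ja) * 2017-11-13 2019-06-13 Shimadzu Corp Calibration method, microorganism identification method, mass spectrometer, calibration reagent for mass spectrometer, program, and transformant for calibration of mass spectrometer
JP2021189111A (ja) * 2020-06-03 2021-12-13 Shimadzu Corp Method for discriminating ribosomal proteins, method for identifying biological species, mass spectrometer, and program
JP2022519884A (ja) * 2019-03-01 2022-03-25 Micromass UK Limited Self-calibration of high-resolution mass spectra
JP2023014553A (ja) * 2021-07-19 2023-01-31 Shimadzu Corp Microorganism identification method and microorganism identification system
JP2023022721A (ja) * 2021-08-03 2023-02-15 Shimadzu Corp Mass spectrometer and mass calibration method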

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TERAMOTO, KANAE: "Classification and Identification of Bacteria by Matrix-Assisted Laser Desorption/Ionization-Mass Spectrometry Using Ribosomal Protein as Biomarkers", J. MASS SPECTROM. SOC. JPN., vol. 59, no. 5, 2011, pages 85 - 94, XP055548534, DOI: 10.5702/massspec.11-24 *
TERAMOTO, KANAE: "Rapid Classification and Identification of Bacteria by Matrix-Assisted Laser Desorption/Ionization-Mass Spectrometry Using Ribosomal Proteins as Biomarkers", J. MASS SPECTROM. SOC. JPN., vol. 55, no. 3, 2007, pages 209 - 216, XP055427241 *

Also Published As

Publication number Publication date
JPWO2024195760A1 (fr) 2024-09-26

Similar Documents

Publication Publication Date Title
Nesvizhskii Protein identification by tandem mass spectrometry and sequence database searching
Rosenberger et al. Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses
McHugh et al. Computational methods for protein identification from mass spectrometry data
Roth et al. Measuring codon usage bias
EP3438275B1 (fr) Method for identifying microorganisms
WO2004093644A2 (fr) Methods and apparatus for genetic evaluation
JP5750676B2 (ja) Cell identification device and program
JP6709434B2 (ja) Method for identifying microorganisms
US20170108509A1 (en) Method For Using Protein Databases To Identify Microorganisms
Wadie et al. METASPACE-ML: context-specific metabolite annotation for imaging mass spectrometry using machine learning
Zuo et al. MS2Planner: improved fragmentation spectra coverage in untargeted mass spectrometry by iterative optimized data acquisition
JPWO2017168741A1 (ja) Method for identifying microorganisms
JP4058449B2 (ja) Mass spectrometry method and mass spectrometer
Wadie et al. METASPACE-ML: Metabolite annotation for imaging mass spectrometry using machine learning
EP3030997B1 (fr) Method for deconvolution of mixed molecular information in a complex sample to identify organisms
Ahrné et al. An improved method for the construction of decoy peptide MS/MS spectra suitable for the accurate estimation of false discovery rates
WO2024195760A1 (fr) Information processing method, information processing device, and computer program
US20040044481A1 (en) Method for protein identification using mass spectrometry data
JP5610347B2 (ja) Ribonucleic acid identification device, ribonucleic acid identification method, program, and ribonucleic acid identification system
EP1481414A1 (fr) Method for protein identification using mass spectrometry data
Ji et al. Deep learning enable untargeted metabolite extraction from high throughput coverage data-independent acquisition
US20250273299A1 (en) Microorganism identification method and microorganism identification device
Song et al. Beta-DIA: Integrating learning-based and function-based feature scores to optimize the proteome profiling of single-shot diaPASEF mass spectrometry data
Fröhlich et al. In-Depth Benchmarking of DIA-type Proteomics Data Analysis Strategies Using a Large-Scale Benchmark Dataset Comprising Inter-Patient Heterogeneity
Feng Improving quantification and identification in metabolomics

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 24774896; Country of ref document: EP; Kind code of ref document: A1

ENP Entry into the national phase
Ref document number: 2025508558; Country of ref document: JP; Kind code of ref document: A

WWE WIPO information: entry into national phase
Ref document number: 2025508558; Country of ref document: JP

NENP Non-entry into the national phase
Ref country code: DE