WO2025080809A1 - Classification of a disease using fragment images - Google Patents
Classification of a disease using fragment images
- Publication number
- WO2025080809A1 (PCT/US2024/050739)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cancer
- fragment
- subject
- dna
- image
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
Definitions
- FIG.1 illustrates an example environment for predicting cancer types by converting fragmentomic information into images.
- FIG.6 illustrates an example report summarizing predicted categories of a cancer of a subject.
- FIG.7 illustrates an example process for predicting a type of cancer cell by converting fragmentomic data into an image.
- FIG.8 illustrates an example process for predicting a type of a cancer cell present in the body of a subject using a CNN.
- FIG.9 illustrates an example environment for sequencing various nucleic acid molecules.
- FIG.10 illustrates one or more devices configured to perform various operations described herein.
- a first channel may represent the presence of a base or base pair in a given DNA fragment at a given genomic position
- a second channel may represent the presence of a CG dinucleotide in the given DNA fragment at the given genomic position
- a third channel may represent the presence of a methylated cytosine in the given DNA fragment at the given genomic position
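The three channels above can be illustrated with a minimal sketch. It assumes a hypothetical `Fragment` record holding a 0-based reference start, the fragment's bases as aligned to the reference (no insertions or deletions), and the offsets of cytosines called as methylated; the function name `fragment_image` and the `on_value` parameter are also illustrative. Each fragment becomes one row, each column is one genomic position in a window, and each pixel is a `[base, CG, methylated C]` triple.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """Hypothetical aligned fragment (assumes no indels relative to the reference)."""
    start: int                      # 0-based reference coordinate of the first base
    sequence: str                   # bases as read, e.g. "ACGT"
    methylated_offsets: set = field(default_factory=set)  # offsets of methylated C calls


def fragment_image(fragments, window_start, window_len, on_value=255):
    """Return image[row][col] = [base, cg, methyl]; 0 means absent, on_value present."""
    image = []
    for frag in fragments:
        row = [[0, 0, 0] for _ in range(window_len)]
        seq = frag.sequence.upper()
        for offset, base in enumerate(seq):
            col = frag.start + offset - window_start
            if not 0 <= col < window_len:
                continue  # base falls outside the imaged genomic window
            pixel = row[col]
            pixel[0] = on_value  # channel 1: the fragment has a base here
            prev_base = seq[offset - 1] if offset > 0 else ""
            next_base = seq[offset + 1] if offset + 1 < len(seq) else ""
            if (base == "C" and next_base == "G") or (base == "G" and prev_base == "C"):
                pixel[1] = on_value  # channel 2: position is part of a CG dinucleotide
            if base == "C" and offset in frag.methylated_offsets:
                pixel[2] = on_value  # channel 3: methylated cytosine
        image.append(row)
    return image
```

For example, `fragment_image([Fragment(10, "ACGT", {1})], window_start=9, window_len=6)` yields one row in which column 2 (the methylated C of the CG) is `[255, 255, 255]`. Passing a smaller `on_value` keeps every channel below saturation.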
- deoxyribonucleic acid may refer to a polymer of nucleotides (also referred to as “nucleobases”) containing deoxyribose.
- the nucleotides in DNA include cytosine (C), guanine (G), adenine (A), and thymine (T).
- Each DNA nucleotide includes a nucleobase, a deoxyribose, and a phosphate group.
- a nucleotide is a monomer of DNA or RNA.
- a nucleotide, for instance, is a chemical structure.
- the terms “3’ end,” “3-prime end,” and their equivalents, may refer to a terminus of a single-stranded nucleotide polymer that includes a nucleotide whose third carbon in its deoxyribose or ribose is bound to a hydroxyl group rather than to another nucleotide.
- the term “intron,” and its equivalents, may refer to a subset of DNA nucleotides in a gene that is not used to code for any functional RNA that is expressed by the organism.
- the term “exon,” and its equivalents, may refer to a subset of DNA nucleotides in a gene that is used to code for a functional RNA.
- an exon may encode a polypeptide or protein that is expressed by the organism.
- a gene can be represented in data (e.g., as data representative of the sequence of DNA nucleotides in the gene) or as a chemical structure (e.g., as the sequence of DNA nucleotides itself).
- the term “insertion,” and its equivalents, can refer to a nucleotide in a subject sequence that is added with respect to a reference sequence.
- the term “deletion,” and its equivalents, can refer to the removal of a nucleotide from a nucleotide sequence.
- the terms “copy number alteration,” “CNA,” “copy number variation,” “CNV,” and their equivalents can refer to a portion of a reference sequence that is repeated.
- the terms “rearrangement of fusion,” “fusion rearrangement,” “translocation,” and their equivalents can refer to a change in the relative position of one or more portions of a reference sequence, thereby generating a gene that was not present in the reference sequence.
- the term “sequencing,” and its equivalents, may refer to a process of identifying the order and identity of monomers in a polymer chain, such as the order and identity of nucleotides in a DNA or RNA molecule.
- the terms “whole genome sequencing,” “WGS,” and their equivalents may refer to the process of sequencing an entire genome of a subject, including the introns and exons of the genes of the subject.
- whole exome sequencing may refer to the process of sequencing the exome (all of the exons) of a subject.
- targeted sequencing may refer to the process of sequencing a portion of the genome of a subject, such as sequencing a single gene of the subject.
- Various techniques can be utilized to sequence a DNA or RNA molecule, such as massively parallel sequencing (MPS), nanopore sequencing, direct sequencing, Sanger sequencing, or next-generation sequencing. In various cases, sequencing is performed on physical molecules (e.g., RNA or DNA) and is used to generate data.
- the term “bait molecule,” and its equivalents, may refer to a nucleic acid molecule having a portion that is complementary to a portion of a target molecule (e.g., cfDNA).
- a bait molecule includes, for instance, a nucleic acid molecule that can hybridize to (i.e., is complementary to) a target molecule and that can be used to capture the target molecule.
- the bait molecule is a capture oligonucleotide (or capture probe).
- the bait molecule is suitable for solution phase hybridization to the target molecule.
- the bait molecule is suitable for solid phase hybridization to the target molecule.
- the double-stranded nature of the nucleic acid molecule is maintained under stringent hybridization conditions.
- stringent hybridization conditions include an overnight incubation at 42 °C in a solution including 50% formamide, 5× SSC (750 mM NaCl, 75 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5× Denhardt's solution, 10% dextran sulfate, and 20 µg/ml denatured, sheared salmon sperm DNA, followed by washing the filters in 0.1× SSC at 50 °C.
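Because the buffer strengths are given as multiples of SSC, the wash concentration follows by simple scaling. Taking the 5× SSC composition stated above (750 mM NaCl, 75 mM trisodium citrate) and assuming linear dilution of the same stock, the 0.1× SSC wash works out as:

```latex
c_{0.1\times} = \frac{0.1}{5}\,c_{5\times} = 0.02\,c_{5\times}
\quad\Rightarrow\quad
[\mathrm{NaCl}] = 0.02 \times 750\ \mathrm{mM} = 15\ \mathrm{mM},
\qquad
[\mathrm{citrate}] = 0.02 \times 75\ \mathrm{mM} = 1.5\ \mathrm{mM}.
```

The same scaling implies that 1× SSC corresponds to 150 mM NaCl and 15 mM trisodium citrate.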
- the term “complementary,” and its equivalents, may refer to a state of two single-stranded nucleic acid molecules with respective sequences that cause the nucleic acid molecules to spontaneously hybridize to one another.
- One nucleic acid molecule, for instance, may have a sequence that causes each of its nucleotides to hydrogen bond to a respective nucleotide in the other nucleic acid molecule.
- the terms “therapy,” “treatment,” and their equivalents may refer to a composition or process that can be used to remediate a health problem.
- Cancer therapies, for instance, include surgery, radiotherapy, chemotherapy, immunotherapy, cell-based therapies, and the like.
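The complementarity definition above can be checked in code. This minimal sketch assumes DNA sequences written 5′ to 3′, Watson-Crick pairing only (A-T, C-G), and full-length pairing with no mismatches; the function names are illustrative. Real bait molecules may be only partially complementary to their targets, which this sketch does not model.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the 5'->3' sequence of the strand that pairs with `seq`."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def fully_complementary(a, b):
    """True if strands a and b (both 5'->3') pair base-for-base when antiparallel."""
    return len(a) == len(b) and a.upper() == reverse_complement(b)
```

For instance, `fully_complementary("GGCA", "TGCC")` is `True`, because reading `TGCC` in reverse and complementing each base gives `GGCA`.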
- genomic position may refer to a location of one or more DNA base pairs relative to a reference genome.
- a genomic position is defined according to a locus.
- the genomic position may refer to a chromosome on which the base pair(s) are located, an arm (e.g., a p arm or a q arm) of the chromosome on which the base pair(s) are located, a distance (e.g., in terms of bases or megabases) from a centromere of the chromosome, a distance from a telomere on the p arm of the chromosome, a distance from a telomere on the q arm of the chromosome, a distance from an end of the p arm of the chromosome, a distance from an end of the q arm of the chromosome, or any combination thereof.
- the cfDNA includes fragments that are about 170 bases long and/or fragments that are about 340 bases long.
- the cfDNA includes fragments that are 100 to 240 bases long and/or fragments that are 270 to 410 bases long.
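A simple way to summarize the two length populations is to bin each fragment by the ranges given above. The sketch below uses those ranges as inclusive bounds; the bin labels and function names are hypothetical, and fragments outside both ranges are counted separately.

```python
from collections import Counter

# Inclusive length ranges (in bases) from the description above.
LENGTH_BINS = {"~170 bases": (100, 240), "~340 bases": (270, 410)}
OTHER = "outside ranges"


def length_bin(length):
    """Return the label of the range containing `length`, or OTHER."""
    for label, (low, high) in LENGTH_BINS.items():
        if low <= length <= high:
            return label
    return OTHER


def length_profile(lengths):
    """Fraction of fragments falling in each bin (empty input gives an empty dict)."""
    if not lengths:
        return {}
    counts = Counter(length_bin(n) for n in lengths)
    return {label: counts[label] / len(lengths) for label in [*LENGTH_BINS, OTHER]}
```

For example, `length_profile([167, 170, 340, 500])` assigns half of the fragments to the ~170-base range, a quarter to the ~340-base range, and a quarter to neither.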
- the features of the ctDNA 108 are indicative of the expression of the cancer cells within the lesion 104. That is, the features of the ctDNA 108 may be indicative of one or more genes that are expressed by the cancer cells.
- the sample 106 is transported to a location that is remote from the subject 102 for further processing.
- Non-ctDNA sequencing is considered optional, for example, when pre-analytical means to enrich for the ctDNA component of cfDNA are used to generate the sequencing library (e.g., oversampling of shorter cfDNA fragments for inclusion in the sequencing library).
- the sequencer 112 includes one or more devices that are configured to generate the sequence read data 114 by processing at least a portion of the sample 106.
- the cfDNA including the ctDNA 108 and the non-ctDNA 110 is extracted from the sample 106. The extraction can be performed by the sequencer 112, by another device, manually (e.g., by a laboratory technician), or any combination thereof. Any appropriate extraction method known to those of ordinary skill in the art can be utilized.
- the cfDNA (e.g., the ligated cfDNA) may be amplified by generating multiple copies of the cfDNA using one or more techniques such as polymerase chain reaction (PCR), a non-PCR amplification technique, or an isothermal amplification technique.
- the sequencer 112 and any additional downstream data analysis may identify the length, position, and identity of the bases in the cfDNA by sequencing the cfDNA (e.g., the amplified and/or ligated cfDNA).
- the sequencer 112 utilizes first-generation sequencing (e.g., Sanger sequencing), second-generation sequencing (e.g., massively parallel sequencing), third-generation sequencing (e.g., nanopore sequencing), or a combination thereof.
- the electrical signal over time is indicative of the sequences of the cfDNA in the sample 106.
- the sequencer 112, in various implementations, is configured to generate the sequence read data 114 as digital data based on the analog signals detected by the sensor(s). For instance, the sequencer 112 includes one or more analog-to-digital converters (ADCs). In various cases, the sequencer 112 includes at least one processor configured to generate the sequence read data 114. In some examples, the sequencer 112 performs methylation sequencing.
- the position, order, and amount of the methylated cytosines are further indicated in the sequence read data 114.
- This information may be referred to as “methylation data.”
- DNA methylation is an epigenetic factor that can impact gene expression. For instance, higher methylation at promoters can result in lowered expression of associated genes. In some cases, methylation is indicative of cancer cell type. For instance, methylation data may be used to classify types of colorectal cancer (CRC). In various cases, global hypomethylation, which can lead to chromosomal instability (CIN), is an indicator of CRC type.
- The CpG island methylator phenotype (CIMP), which is characterized by extensive promoter DNA hypermethylation of multiple cancer-specific genes at a specific set of CpG islands, is associated with CRC type.
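Global hypomethylation can be summarized, as a simple illustration, by the fraction of CpG calls that are methylated. This sketch is not the classification method described in this document (which feeds fragment images to a predictive model); it assumes methylation calls are available as `(position, is_methylated)` pairs, and the comparison rule with its `margin` is a hypothetical placeholder, not a validated cutoff.

```python
def methylation_fraction(calls):
    """Fraction of CpG calls that are methylated; calls are (position, is_methylated)."""
    calls = list(calls)
    if not calls:
        raise ValueError("no CpG calls provided")
    return sum(1 for _, methylated in calls if methylated) / len(calls)


def flag_hypomethylation(sample_calls, reference_fraction, margin=0.1):
    """Hypothetical rule: flag when the sample falls more than `margin` below a reference."""
    return methylation_fraction(sample_calls) < reference_fraction - margin
```

A learned model can capture far richer patterns (such as region-specific or CIMP-like hypermethylation), which is why a single global fraction is shown here only to make the term concrete.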
- sequences representing the ctDNA 108 and sequences representing the non-ctDNA 110 in the sequence read data 114 are differentiated from one another. For instance, the sequences representing the non-ctDNA 110 may be removed from the sequence read data 114. In some examples, the sequencer 112 and/or another computing device removes the sequences representing the non-ctDNA 110 from the sequence read data 114.
- the channel of the example pixel may have a second value (e.g., a maximum value, 255, 128, 64, or the like) if the example fragment has a base at the example genomic position.
- the second value is less than a maximum, saturated value of the individual channel. For instance, it has been observed that limiting a maximum value of a red channel of an RGB image to 128 or 64, rather than 255, can improve visualization and performance.
- a channel of the example pixel has a first value (e.g., a null value, zero, or the like) if the example fragment lacks a methylated CG base pair at the example genomic position or has a second value (e.g., a maximum value, 255, 128, 64, or the like) if the example fragment has a methylated CG base pair at the example genomic position.
- the example pixel has one of at least one fourth color if the example fragment has a methylated CG base pair at the example position.
- the image generator 116 creates the fragment image 118 based, at least in part, on comparing the sequence read data 114 to one or more reference sequences.
- the image generator 116 arranges the rows according to a start position, a middle position, or an end position of the relevant fragments. For instance, a first row of the fragment image 118 corresponds to a first fragment in the sample 106 and a second row of the fragment image 118 corresponds to a second fragment in the sample 106.
- the first base of the first fragment corresponds to a higher position in a reference genome than the first base of the second fragment. That is, the start position of the first fragment is higher than the start position of the second fragment.
- a first genomic position on a chromosome is “higher” or “earlier” than a second genomic position on a chromosome if the first genomic position is located closer to a telomere of a p arm of the chromosome than the second genomic position.
- the final base of the first fragment corresponds to a higher position in the reference genome than the final base of the second fragment.
- the end position of the first fragment is higher than the end position of the second fragment.
- a base at a center of the first fragment is higher than a base at a center of the second fragment. If the first fragment has an even number of bases, the base at the center may be either of the two middle bases of the first fragment.
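The row orderings described above can be sketched as a sort on a per-fragment anchor coordinate. This assumes 0-based reference coordinates that increase away from the p-arm telomere, so a “higher” position in the sense defined above is a smaller coordinate and an ascending sort places it first. For even-length fragments the sketch picks the middle base nearer the start, one of the two choices the text allows. The `Aligned` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Aligned(NamedTuple):
    name: str
    start: int    # 0-based reference coordinate of the first base
    length: int   # number of bases in the fragment


def anchor(frag, mode):
    """Reference coordinate used to order rows: start, middle, or end base."""
    if mode == "start":
        return frag.start
    if mode == "end":
        return frag.start + frag.length - 1
    if mode == "middle":
        # For even lengths this is the earlier of the two middle bases.
        return frag.start + (frag.length - 1) // 2
    raise ValueError(f"unknown mode: {mode}")


def order_rows(frags, mode="start"):
    """Return fragments in row order; ties are broken by name for determinism."""
    return sorted(frags, key=lambda f: (anchor(f, mode), f.name))
```

With fragments `a` (start 100, 10 bases), `b` (start 95, 30 bases), and `c` (start 102, 4 bases), ordering by start gives `b, a, c`, while ordering by end gives `c, a, b`, so the choice of anchor changes which fragments end up adjacent in the image.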
- rows are duplicated, such that multiple rows of the fragment image 118 reflect the same fragment sequence.
- rows are interpolated, such that the fragment image 118 includes a row that is a combination of two other rows. Rows can be interpolated in at least one fashion, such as by combining multiple rows.
- the image generator 116 adds to the number of rows in the fragment image 118 in order to achieve a predetermined pixel dimension (e.g., a predetermined number of rows and columns) of the fragment image 118.
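Reaching a predetermined image height can be sketched with the two row operations mentioned above: duplicating rows and creating a row that combines two others. Here duplication is done by nearest-neighbor row resampling and the combination is an element-wise average; both choices, and the function names, are assumptions for illustration. Rows are treated as single-channel lists of numbers.

```python
def resize_rows(rows, target_rows):
    """Nearest-neighbor row resampling: each output row copies one input row."""
    if not rows or target_rows < 1:
        raise ValueError("need at least one input row and a positive target")
    n = len(rows)
    return [list(rows[i * n // target_rows]) for i in range(target_rows)]


def interpolate_rows(row_a, row_b):
    """One possible combination of two rows: the element-wise mean."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same width")
    return [(a + b) / 2 for a, b in zip(row_a, row_b)]
```

For example, `resize_rows([[1, 2], [3, 4]], 5)` returns five rows, three copies of the first and two of the second, so the image reaches the target height without inventing new fragment data.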
- the fragment image 118 may represent fragments obtained from multiple samples obtained at different times.
- the changes in cfDNA states may be relevant for predicting therapy responses and/or may be early indications of relapse.
- the fragment image 118 can be an n-dimensional image, wherein n is a positive integer that is greater than one.
- the fragment image 118 is defined in a radial space. Further, although a single fragment image 118 has been described with reference to FIG.1, implementations are not so limited. According to some implementations, multiple fragment images including the fragment image 118 are generated based on multiple regions of the genome of the subject 102.
- the predictive model 120 further analyzes additional biomarker data in order to generate the disease indicator(s) 122.
- the predictive model 120 may receive input data including the fragment image 118 as well as data indicating at least one of a genomic alteration, a mutational signature, an MSI status, a TMB, or a viral status of the subject 102 and/or lesion 104.
- the additional biomarker data may be generated based on the sample 106, medical images, or other samples obtained from the subject 102.
- the predictive model 120 includes at least one trained ML model configured to output the disease indicators 122 in response to receiving the fragment image 118 in input data.
- FMI Docket No.0032-P L&H Docket No.: F171-0008PCT
- parameters of the ML model(s) may have been previously optimized based on training data including fragment images of regions in the genomes of individuals within a population that excludes the subject 102.
- the ML model(s) was trained using an unsupervised or semi-supervised learning technique, wherein the parameters were optimized to categorize (e.g., cluster) fragment images of the population.
- the predictive model 120 includes a CNN.
- the CNN may have multiple hidden layers. When the CNN receives an input (e.g., an input image), the layers may respectively perform operations to transform the input into an output (e.g., an output image).
- An individual hidden layer may be composed of neurons.
- CNNs include, for instance, residual neural networks (e.g., ResNet, see He et al., arXiv:1512.03385v1, 2015), U-Net (see, e.g., Ronneberger, et al., arXiv:1505.04597v1, 2015), and the like.
- the kernel and/or receptive field of at least one CNN hidden layer have various shapes.
- the kernel and/or receptive field are defined as 2D squares (e.g., 2x2 pixels, 3x3 pixels, 4x4 pixels, 5x5 pixels, or the like).
- the kernel and/or receptive field are defined as 2D rectangles.
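To show what a rectangular kernel does on a fragment image, here is a minimal single-channel “valid” convolution in plain Python. Like common deep-learning convolution layers, it computes a cross-correlation (the kernel is not flipped). A 1×k kernel spans k adjacent genomic positions within one fragment row, while a k×1 kernel spans the same genomic position across k adjacent fragments. The function name and the example kernels are illustrative; in a trained CNN the kernel values would be learned.

```python
def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation without padding (output shrinks)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    if kh > ih or kw > iw:
        raise ValueError("kernel larger than image")
    out = []
    for r in range(ih - kh + 1):
        out.append([
            # Weighted sum of the kh x kw patch whose top-left corner is (r, c).
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(iw - kw + 1)
        ])
    return out
```

On `image = [[1, 2, 3, 4], [5, 6, 7, 8]]`, the 1×3 kernel `[[1, 1, 1]]` sums three neighboring positions in each row (`[[6, 9], [18, 21]]`), whereas the 2×1 kernel `[[1], [-1]]` compares the two fragments position by position (`[[-4, -4, -4, -4]]`).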
- Visual characteristics of the fragment image 118 are correlated to cancer type. For instance, certain methylation patterns, variants, fragment lengths, and other characteristics apparent in the fragment image 118 are associated with the presence of cancer and/or cancer type. In various cases, these characteristics are subtle and would be impossible for a clinician (e.g., the care provider 105) to identify by manually reviewing the fragment image 118.
- the predictive model 120 may identify cancer-relevant characteristics of the fragment image 118, such that the predictive model 120 can accurately categorize a potential cancer of the subject 102. For instance, if the lesion 104 is a tumor, the predictive model 120 may indicate the type of cancer cells in the lesion 104.
- the primary site may be an organ or anatomical site in which the first tumor developed within the body of the subject 102.
- the primary site may include an adrenal gland, a bladder, blood, a bone, brain, a breast, a cervix, a colon, a rectum, an ear, a nose, a throat, endometrial tissue, an esophagus, a gastrointestinal tract, head, neck, intestine, a kidney, a larynx, bone marrow, liver, a lymph node, a lung, a nasopharynx, a mouth, an ovary, pancreas, pharynx, prostate, skin, stomach, testicle, thyroid, uterus, vasculature, or the like.
- the predictive model 120 may determine, based on the fragment image 118, whether the cancer cells are positive for a particular receptor (e.g., HER2), negative for the particular receptor, or a mixture of the two (e.g., 40% positive for the particular receptor).
- the disease indicator(s) 122, for instance, indicate one or more cancer subtypes of the subject 102.
- the predictive model 120 identifies at least one type of cancer of the subject 102.
- the disease indicator(s) 122 may include at least one likelihood (e.g., probability) that the subject 102 has at least one predetermined type of cancer, an indication that the likelihood that the subject 102 has a predetermined type of cancer exceeds a threshold (e.g., 90% certainty, 95% certainty, 99% certainty, or the like), or some other indication of the determined cancer type.
- the predictive model 120 is a binary classifier, such that the disease indicator(s) 122 include a Boolean (e.g., true or false) indication of a predetermined cancer type.
- the disease indicator(s) 122 includes predictions of whether the subject 102 has one or more types of cancers.
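The forms of disease indicator listed above (a per-type likelihood, a flag that the likelihood exceeds a threshold, and a Boolean for a binary classifier) can be derived from a model's raw scores. This sketch assumes the model emits one logit per predetermined cancer type and converts them with a softmax; the labels, function names, and the default 0.95 threshold (one of the example thresholds above) are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def disease_indicators(logits, labels, threshold=0.95):
    """Per-type probability plus a Boolean for 'probability exceeds threshold'."""
    if len(logits) != len(labels):
        raise ValueError("one logit per label is required")
    probs = softmax(logits)
    return {
        label: {"probability": p, "exceeds_threshold": p > threshold}
        for label, p in zip(labels, probs)
    }
```

Equal logits give equal probabilities and no type passes the threshold; a strongly dominant logit (e.g., `[10.0, 0.0]`) pushes one type above 0.95, which is the Boolean a binary classifier would report.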
- the report 126 may indicate the results of additional analyses, such as the results of a histological study, whole transcriptome sequencing, cell-free RNA (cfRNA) sequencing, whole exome sequencing, whole genome sequencing, a cancer (e.g., DNA) hotspot panel test, a DNA methylation test, a tumor mutational burden (TMB) test, a DNA fragmentation test, an RNA fragmentation test, a microsatellite instability (MSI) test, or a viral status test.
- the report 126 may include a genomic profile of the subject 102 based on various combinations of the above analyses and tests.
- the care provider 105 may review the report 126 by interacting with the clinical device 128.
- the report 126 may enhance the clinical decision-making of the care provider 105.
- the care provider 105 may prepare and/or administer a therapy (e.g., a therapeutic agent, such as an anticancer agent) to the subject 102 based on the report 126.
- the care provider 105 may initiate the therapy and/or refer the subject 102 to another care provider to receive the therapy.
- the sequence of the ctDNA 202 is indicative of the expression of mRNA and other functional RNA in the cancer cell 204.
- the expression of the cancer cell 204 can be determined without performing RNA sequencing (e.g., whole transcriptome sequencing), in some cases.
- the methylation status of various regions within the ctDNA 202 is also indicative of the expression of the cancer cell 204.
- the example fragment images 412 are obtained based on DNA fragments (e.g., cfDNA) of individuals within a population 416.
- the fragment images 412 are indicative of one or more regions of interest within the DNA fragments.
- the example disease indicators 414 may indicate diseases (e.g., cancer type or subtype) of the individuals in the population 416, a progression or state of the diseases of the individuals in the population 416, the effectiveness of one or more therapies in treating the diseases of the individuals in the population 416, or any combination thereof.
- the example disease indicators 414 may be generated based on samples obtained from the individuals that are not limited to DNA fragments.
- the encoder is configured to extract relevant features from the input image.
- a decoder includes one or more hidden layers configured to increase one or more dimensions of an input image (e.g., an output image of an encoder).
- a decoder can be configured to build a segmentation mask based on features extracted by an encoder.
- U-Net is an example of a CNN architecture that includes an encoder and a decoder.
- the ML model(s) 404 include a nearest-neighbor model.
- One example of a nearest-neighbor model includes a k-nearest neighbor model.
- a nearest-neighbor model defines various “neighbors,” which are points within a feature space, with associated class labels.
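A k-nearest neighbor classifier of the kind described above can be sketched in a few lines. It assumes each fragment image has already been reduced to a numeric feature vector (how that is done is not specified here) and that labelled training points come from a reference population; Euclidean distance, majority voting, and the names used are assumptions. Because `Counter.most_common` orders equal counts by first appearance, a tied vote goes to the label seen first among the distance-sorted neighbors.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training points nearest to `query`.

    train: list of (feature_vector, label) pairs; all vectors share one length.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort the labelled points by Euclidean distance to the query and keep k of them.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With two points labelled `"A"` near the origin and three labelled `"B"` near `(5, 5)`, a query at `(0.5, 0.5)` is assigned `"A"` and a query at `(5.5, 5.5)` is assigned `"B"`.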
- predictive attributes include, for instance, aberrant hypomethylation patterns in fragments that map to a region that is ordinarily hypermethylated in subjects without the disease.
- predictive attributes correspond to combinations of indications, such as a combination of methylation states, fragment lengths, and fragment endpoint locations.
- the report 600 is the report 126 described above with reference to FIG.1.
- the report 600 may be displayed to a patient and/or care provider.
- the report 600 is generated based on a fragment image of one or more regions in a sample (e.g., a liquid biopsy sample) obtained from the subject.
- the report 600 includes a disease indicator 602 that indicates a classification of at least one disease (e.g., cancer).
- the disease indicator 602 indicates whether cancer cells of the subject are within one or more predetermined types of cancer.
- the report 600 includes one or more therapy indicators 608.
- the therapy indicator(s) 608 convey whether the cancer is predicted to be resistant to one or more predetermined therapies and/or whether the cancer is predicted to be responsive to one or more predetermined therapies.
- the therapy indicator(s) 608 include a suggested treatment decision, which may specify at least one therapy to which the disease is predicted to be responsive.
- the therapy indicator(s) 608 convey a specific dosage of one or more predetermined therapies that is predicted to treat the disease.
- the report 600 includes one or more prognostic indicators 610.
- the prognostic indicator(s) 610 for instance, indicate a prognosis of the subject.
- the prognostic indicator(s) 610 may indicate a survivability, a recoverability, a quality-of-life indicator, or other information indicative of the prognosis of the subject.
- the report 600 may include a trial qualification 612 of the subject.
- the trial qualification 612 indicates whether the subject is predicted to be eligible or otherwise qualify for a predetermined clinical trial.
- the report 600 includes a metastasis profile 614 of the subject.
- the metastasis profile 614 indicates a likelihood (e.g., probability) that the cancer will metastasize (e.g., at a particular point in time), one or more tissues in which the cancer is predicted to metastasize, or the like.
- the DNA fragments may be amplified using PCR, a non-PCR amplification technique, or an isothermal amplification technique.
- the DNA fragments can be sequenced via a sequencer.
- the DNA fragments are sequenced using NGS, MPS, WGS, whole exome sequencing, targeted sequencing, direct sequencing, Sanger sequencing, or any combination thereof.
- a sequencer may identify the sequence of bases in the DNA fragments by detecting signals using the amplified DNA fragments.
- FIG.8 illustrates an example process 800 for predicting a type of a cancer cell present in the body of a subject using a CNN.
- the process 800 is performed by an entity including at least one processor, at least one computing device (e.g., one or more server computers), a sequencer (e.g., the sequencer 112), an image generator (e.g., the image generator 116), a predictive model (e.g., the predictive model 120), a report generator (e.g., the report generator 124), a clinical device (e.g., the clinical device 128), or any combination thereof.
- a sequencer may identify the sequence of bases in the nucleic acid molecules by detecting signals using the amplified nucleic acid molecules.
- the detection signals include optical signals (e.g., in the case of sequencing-by-synthesis techniques) and/or electrical signals (e.g., in the case of nanopore sequencing techniques).
- the presence of methylated cytosines in the nucleic acid molecules is also determined.
- the entity may perform methylation sequencing on the nucleic acid molecules.
- the entity obtains sequence read data indicating the type, order, and length of bases in the nucleic acid molecules and/or methylation data indicating the location of methylated cytosines in the nucleic acid molecules.
- one or more channels indicate whether variants or predetermined sequences (e.g., promoters, CpG islands, or the like) are present at various genomic positions.
- the entity predicts, by inputting the image into a CNN, a type of cancer cell present in the body of the subject.
- the CNN may be defined by one or more hidden layers.
- the hidden layers are respectively defined by kernels.
- the CNN includes at least one kernel defined by a rectangular set of pixels, such that a height of the kernel (in pixels) is different than a length of the kernel (in pixels).
- the values defining the pixels of the kernel may be optimized based on training data.
- the nucleic acid molecules 902 include DNA that is complementary to RNA present in the sample.
- the nucleic acid molecules 902, in various cases, are ligated with adapters 904.
- the adapters 904 are hybridized to the nucleic acid molecules 902.
- the adapters 904, for example, include additional nucleic acid molecules.
- the adapters 904 have a shorter length than the nucleic acid molecules 902 being sequenced.
- the adapters 904 include amplification primers, flow cell adapter sequences, substrate adapter sequences, or sample index sequences.
- although FIG.9 illustrates adapters 904 being ligated to one end of each of the nucleic acid molecules 902, implementations are not so limited.
- FIG.10 illustrates one or more devices 1000 configured to perform various operations described herein.
- the device(s) 1000 include one or more processor(s) 1002.
- the processor(s) 1002 includes a central processing unit (CPU), a graphics processing unit (GPU), both CPU and GPU, or other processing unit or component known in the art.
- the processor(s) 1002 is operably connected to memory 1004.
- the memory 1004 is volatile (such as random access memory (RAM)), non-volatile (such as read only memory (ROM), flash memory, etc.) or some combination of the two.
- the memory 1004 stores instructions that, when executed by the processor(s) 1002, cause the processor(s) 1002 to perform various operations.
- the transceiver(s) 1010 includes any sort of wireless transceivers capable of engaging in wireless communication (e.g., radio frequency (RF) communication).
- the communication network(s) 1012 includes one or more wireless networks that include a 3rd Generation Partnership Project (3GPP) network, such as a Long Term Evolution (LTE) radio access network (RAN) (e.g., over one or more LTE bands), a New Radio (NR) RAN (e.g., over one or more NR bands), or a combination thereof.
- the transceiver(s) 1010 includes other wireless modems, such as a modem for engaging in WI-FI®, WIGIG®, WIMAX®, BLUETOOTH®, or infrared communication over the communication network(s) 1012.
- the device(s) 1000 may further include the sequencer 112.
- the sequencer 112 includes one or more fluidic circuits 1014 configured to receive a sample 1016 derived from a subject 1018.
- the sequencer 112, in various cases, may be configured to generate data indicative of one or more sequences of nucleic acid molecules (e.g., DNA and/or RNA) present in the sample 1016.
- FIGS.11A to 12D were used as training images for the CNN.
- the example CNN has two hidden layers and one flatten layer with 128 nodes.
- a final layer in the CNN utilizes a sigmoid activation function.
- FIG.13 (top) illustrates a loss of the CNN over a number of training images used to optimize parameters of the CNN.
- FIG.13 (bottom) illustrates an accuracy of the CNN over the number of training images used to optimize the parameters of the CNN.
- a method including: providing a plurality of nucleic acid molecules obtained from a sample from a subject; extracting, from the sample, a plurality of cell-free DNA (cfDNA) molecules; ligating one or more adapters onto one or more of the cfDNA molecules; amplifying the one or more ligated cfDNA molecules; capturing all or a subset of the amplified cfDNA molecules; sequencing, by a sequencer, all or a subset of the captured cfDNA molecules to obtain a plurality of sequence reads that represent the captured cfDNA molecules; receiving, at one or more processors, sequence read data for the plurality of sequence reads; generating, using the one or more processors, a fragment image indicating at least a portion of the cfDNA corresponding to a region of a genome, wherein an example row of the fragment image corresponds to a fragment among the cfDNA represented in the sequence read data, a first channel of an example pixel of the example row indicating a presence of a
- generating, using the amplified ligated DNA molecules, the detection signals includes simultaneously: synthesizing, by a polymerase using fluorescently tagged nucleotide triphosphates (NTPs), a synthesized nucleic acid molecule based on one of the amplified ligated DNA molecules, and wherein detecting, by the at least one sensor, the detection signals includes: detecting, by at least one optical sensor, optical signals emitted by the fluorescently tagged NTPs upon binding to the synthesized nucleic acid molecule, the optical signals being indicative of at least one sequence of the DNA.
- the training data further includes labels indicating whether the example samples are obtained from at least one individual having cancer.
- training the at least one ML model includes identifying, using supervised ML based on pairs of the labels and corresponding instances of the example fragment images, predictive attributes of the example fragment images that are indicative of the labels.
- generating the prediction of whether the subject has cancer includes: identifying instances of the predictive attributes associated with the fragment image; and generating the prediction of whether the subject has cancer based on the instances of the predictive attributes.
- the report includes an instruction to perform a follow-up test on the subject.
- the follow-up test includes at least one of: a histological study; whole transcriptome sequencing; cfRNA sequencing; whole exome sequencing; whole genome sequencing; a cancer hotspot panel test; a DNA methylation test; a DNA fragmentation test; an RNA fragmentation test; a microsatellite instability (MSI) test; a tumor mutational burden (TMB) test; or a viral status test.
- the follow-up test includes at least one of whole transcriptome sequencing; cfRNA sequencing; or an RNA fragmentation test.
- the genomic profile includes results from at least one of: a histological study; whole transcriptome sequencing; cfRNA sequencing; whole exome sequencing; whole genome sequencing; a cancer hotspot panel test; a DNA methylation test; a DNA fragmentation test; an RNA fragmentation test; a microsatellite instability (MSI) test; a tumor mutational burden (TMB) test; or a viral status test.
- the method of clause 86 or 87, wherein the genomic profile of the subject includes: results from a nucleic acid sequencing-based test.
- 89. The method of any of clause
- the method of clause 95, wherein outputting the report includes: transmitting data indicating the report to an external device.
- the external device is associated with the subject or a healthcare provider.
- the method of clause 96 or 97, wherein the data indicating the report is transmitted over one or more communication networks.
- any of clauses 7 to 103 further including: identifying data indicative of additional biomarkers of the subject, wherein the input data further includes the data indicative of the additional biomarkers of the subject.
- the additional biomarkers include at least one of results from: a histological study; whole transcriptome sequencing; cfRNA sequencing; whole exome sequencing; whole genome sequencing; a cancer hotspot panel test; a DNA methylation test; a DNA fragmentation test; an RNA fragmentation test; a microsatellite instability (MSI) test; a tumor mutational burden (TMB) test; or a viral status test.
- a viral status test refers to a test that identifies the presence of viral RNA or DNA in a subject.
- the test can identify viral load and/or viral identity.
- the viral status test can identify the presence of viral RNA or DNA associated with the occurrence of certain cancers.
- Hotspot mutations also occur in the following genes: AKT2, BRCA1, BRCA2, ERC1, NSD1, POLH, PPM1G, PTEN, RAD18, RAD51, RAD51B, RB1, TERT, TP53, TP53BP1, ALK, ARMT1, ATAD5, ATG7, ATIC, AXL, BIRC6, BRD3, BRD4, CAPRIN1, CCAR2, CCDC6, CDK5RAP2, CHD9, CIT, CTNNB1, CUL1, EBF1, EIF3E, HIP1, HMGA2, IRF2BP2, NOTCH1, NOTCH4, NPM1, OFD1, TACC1, TACC3, TERF2, TMEM106B, UBE2L3, USP10, WDR48, 7AP1, ZEB2, and ZMYND8.
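The methylation-sequencing step described above (determining where methylated cytosines occur in the nucleic acid molecules) can be illustrated with a minimal sketch. The source does not name a chemistry; this sketch *assumes* bisulfite-style conversion, in which unmethylated cytosines are read as T and methylated cytosines remain C, and it further assumes a forward-strand read aligned without insertions or deletions. The function name `call_cpg_methylation` is hypothetical.

```python
from typing import Dict

def call_cpg_methylation(reference: str, read: str, read_start: int) -> Dict[int, bool]:
    """Call methylation at CpG cytosines covered by one read.

    Assumptions (illustrative only, not from the source): bisulfite-style
    chemistry, a forward-strand read, and an ungapped alignment to
    `reference` starting at 0-based position `read_start`.
    """
    calls: Dict[int, bool] = {}
    for offset, base in enumerate(read):
        pos = read_start + offset
        # A CpG needs a reference C followed by a G; stop when no next base exists.
        if pos + 1 >= len(reference):
            break
        if reference[pos] == "C" and reference[pos + 1] == "G":
            if base == "C":
                calls[pos] = True   # protected from conversion: methylated
            elif base == "T":
                calls[pos] = False  # converted: unmethylated
            # any other base (e.g., a sequencing error) yields no call
    return calls

reference = "ACGTTCGACG"
read = "ATGTTCGATG"
print(call_cpg_methylation(reference, read, 0))  # {1: False, 5: True, 8: False}
```

The resulting per-position calls are the kind of "methylation data indicating the location of methylated cytosines" that could feed a methylation channel of a fragment image.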
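The fragment image described above (each row corresponds to one cfDNA fragment, each column to a genomic position in a region, and channels encode per-position properties such as variants) can be sketched as follows. The three-channel layout (coverage, variant, methylation), the `Fragment` type, and `build_fragment_image` are hypothetical choices for illustration; the source only states that channels may indicate variants, predetermined sequences, or similar properties.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical channel layout chosen for illustration only.
COVERAGE, VARIANT, METHYLATION = 0, 1, 2
N_CHANNELS = 3

@dataclass
class Fragment:
    start: int  # 0-based genomic start, inclusive
    end: int    # 0-based genomic end, exclusive
    variant_positions: Set[int] = field(default_factory=set)
    methylated_positions: Set[int] = field(default_factory=set)

def build_fragment_image(fragments: List[Fragment], region_start: int, region_end: int,
                         max_rows: Optional[int] = None) -> List[List[List[float]]]:
    """Return image[row][column][channel]: one row per fragment and one
    column per genomic position in [region_start, region_end)."""
    width = region_end - region_start
    selected = fragments if max_rows is None else fragments[:max_rows]
    image = []
    for frag in selected:
        row = [[0.0] * N_CHANNELS for _ in range(width)]
        # Clip the fragment to the region so pixels outside it stay zero.
        for pos in range(max(frag.start, region_start), min(frag.end, region_end)):
            pixel = row[pos - region_start]
            pixel[COVERAGE] = 1.0
            if pos in frag.variant_positions:
                pixel[VARIANT] = 1.0
            if pos in frag.methylated_positions:
                pixel[METHYLATION] = 1.0
        image.append(row)
    return image

fragments = [
    Fragment(100, 104, variant_positions={102}),
    Fragment(102, 108, methylated_positions={103, 106}),
]
image = build_fragment_image(fragments, region_start=100, region_end=106)
print([[p[COVERAGE] for p in row] for row in image])
# [[1.0, 1.0, 1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]]
```

Because fragments are clipped to the region, the methylated position 106 in the second fragment falls outside the image and is not drawn.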
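The rectangular kernels mentioned above (a kernel whose height in pixels differs from its length) can be shown with a single-channel, stride-1, no-padding convolution. In a trained CNN the kernel values would be optimized from training data; here they are fixed by hand purely for illustration, and `conv2d_valid` is a hypothetical helper. Like most CNN frameworks, it computes cross-correlation (the kernel is not flipped).

```python
from typing import List

Matrix = List[List[float]]

def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide a possibly non-square kernel over a single-channel image with
    stride 1 and no padding (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    if kh > ih or kw > iw:
        raise ValueError("kernel is larger than the image")
    out: Matrix = []
    for r in range(ih - kh + 1):
        out_row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out

# Rows = fragments, columns = genomic positions (coverage channel only).
coverage = [[1, 1, 1, 0],
            [0, 1, 1, 1]]
print(conv2d_valid(coverage, [[1, 1, 1]]))  # 1x3 kernel: [[3.0, 2.0], [2.0, 3.0]]
print(conv2d_valid(coverage, [[1], [1]]))   # 2x1 kernel: [[1.0, 2.0, 2.0, 1.0]]
```

One plausible motivation for a non-square kernel in this layout: a short, wide kernel summarizes neighboring positions within a single fragment row, while a tall, narrow kernel summarizes many fragments at the same genomic position.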
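The text above states that the example CNN's final layer uses a sigmoid activation and that its loss and accuracy were tracked during training. The source does not name the loss function; binary cross-entropy is *assumed* below because it is the conventional pairing with a single sigmoid output. The logits and labels are invented for illustration.

```python
import math
from typing import Sequence

def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy (assumed loss); clipping avoids log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def accuracy(probs: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(1 for p, y in zip(probs, labels) if (p >= threshold) == (y == 1))
    return correct / len(probs)

logits = [2.0, -1.0, 0.5, -3.0]  # hypothetical final-layer inputs to the sigmoid
labels = [1, 0, 0, 0]            # 1 = sample from an individual having cancer
probs = [sigmoid(z) for z in logits]
print(accuracy(probs, labels))   # 0.75
print(binary_cross_entropy(probs, labels))
```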
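The supervised-training step described above (learning, from pairs of labels and example fragment images, attributes that are indicative of the labels) is sketched below with a deliberately simpler technique: logistic regression trained by gradient descent on a hypothetical per-sample summary feature instead of a CNN on full images. In this sketch the learned weights stand in for the "predictive attributes"; that mapping, the feature values, and all function names are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(features: List[List[float]], labels: List[int],
                              lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Full-batch gradient descent on mean binary cross-entropy.
    For this loss, the gradient with respect to each logit is (p - y)."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(d):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that the sample comes from an individual having cancer."""
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical standardized attribute per sample (e.g., a short-fragment score).
X = [[-1.0], [-0.8], [0.9], [1.2]]
y = [0, 0, 1, 1]  # labels: 1 = individual having cancer
w, b = train_logistic_regression(X, y)
print([round(predict_proba(w, b, x), 2) for x in X])
```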
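The report step described above (a report that may include an instruction to perform a follow-up test, chosen from the listed tests) can be sketched as a thresholded rule. The 0.5 threshold, the `Report` fields, and the default choice of RNA-based follow-up tests are assumptions for illustration; the allowed test names mirror the list in the text.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

# Follow-up tests enumerated in the text.
ALLOWED_FOLLOW_UP_TESTS = frozenset({
    "histological study", "whole transcriptome sequencing", "cfRNA sequencing",
    "whole exome sequencing", "whole genome sequencing", "cancer hotspot panel test",
    "DNA methylation test", "DNA fragmentation test", "RNA fragmentation test",
    "microsatellite instability (MSI) test", "tumor mutational burden (TMB) test",
    "viral status test",
})

@dataclass
class Report:  # hypothetical report structure
    subject_id: str
    cancer_probability: float
    predicted_cancer: bool
    instructions: List[str] = field(default_factory=list)

def generate_report(subject_id: str, cancer_probability: float, threshold: float = 0.5,
                    follow_up_tests: Sequence[str] = ("whole transcriptome sequencing",
                                                      "cfRNA sequencing")) -> Report:
    """Build a report; follow-up instructions are added only for a positive call.
    The threshold and default tests are illustrative assumptions."""
    unknown = [t for t in follow_up_tests if t not in ALLOWED_FOLLOW_UP_TESTS]
    if unknown:
        raise ValueError(f"unsupported follow-up tests: {unknown}")
    predicted = cancer_probability >= threshold
    instructions = [f"Perform follow-up test: {t}" for t in follow_up_tests] if predicted else []
    return Report(subject_id, cancer_probability, predicted, instructions)

report = generate_report("subject-001", 0.83)
print(report.instructions)
```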
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- Public Health (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Chemical & Material Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biomedical Technology (AREA)
- Analytical Chemistry (AREA)
- Primary Health Care (AREA)
- Pathology (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Techniques are described for classifying one or more cancers affecting a subject. An example method includes identifying data indicative of cell-free DNA (cfDNA) from a sample obtained from a subject. A fragment image is generated based on the data. The fragment image indicates a plurality of fragments in the cfDNA corresponding to a region of a genome. Input data, including the fragment image, are input into a model configured to generate a prediction of whether the subject has at least one cancer. A report is generated based on the prediction.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363544134P | 2023-10-13 | 2023-10-13 | |
| US63/544,134 | 2023-10-13 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025080809A1 (fr) | 2025-04-17 |
Family ID: 95396373
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/050739 (pending; WO2025080809A1 (fr)) | Classification of a disease using fragment images | 2023-10-13 | 2024-10-10 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025080809A1 (fr) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200185055A1 (en) * | 2018-10-12 | 2020-06-11 | Cambridge Cancer Genomics Limited | Methods and Systems for Nucleic Acid Variant Detection and Analysis |
| US20230287500A1 (en) * | 2020-05-22 | 2023-09-14 | Aqtual, Inc. | Methods for characterizing cell-free nucleic acid fragments |
- 2024-10-10: WO PCT/US2024/050739 (WO2025080809A1, fr), active, pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20240141432A9 (en) | | Detection and treatment of disease exhibiting disease cell heterogeneity and systems and methods for communicating test results |
| US20220154284A1 (en) | | Determination of cytotoxic gene signature and associated systems and methods for response prediction and treatment |
| EP4110957A2 (fr) | | Methods for analyzing cell-free nucleic acids and related applications |
| US20250140348A1 (en) | | Methods and systems for predicting an origin of an alteration in a sample using a statistical model |
| US20200273537A1 (en) | | High Throughput Patient Genomic Sequencing and Clinical Reporting Systems |
| US20250272835A1 (en) | | Predicting treatment efficacy by analyzing non-cancer cells |
| WO2024081769A2 (fr) | | Methods and systems for detecting cancer based on DNA methylation of specific CpG sites |
| KR20240132282A (ko) | | Single-molecule genome-wide mutation and fragmentation profiles of cell-free DNA |
| WO2025080809A1 (fr) | | Classification of a disease using fragment images |
| KR20250128956A (ko) | | Liver cancer detection using cell-free DNA fragmentation |
| US20250382667A1 (en) | | Identifying patient conditions by transforming nucleic acid sequence data into alternate domains |
| US20250197932A1 (en) | | Disease subtype classification using genomic features and clustering |
| WO2024259320A2 (fr) | | Predicting cancer cell expression by analyzing the methylation status of ctDNA |
| WO2024259316A2 (fr) | | Tumor identification and classification using fragmentomic features |
| US20220301654A1 (en) | | Systems and methods for predicting and monitoring treatment response from cell-free nucleic acids |
| US20250139774A1 (en) | | Methods and systems for machine learning-based prediction of gene alterations from pathology images |
| US20250356486A1 (en) | | Methods and systems for multiple instance learning of tissue sample images |
| WO2025010296A2 (fr) | | Prognostic classification based on genetic markers |
| US20250101537A1 (en) | | Methods and systems for determining an origin of viral sequence reads detected in a liquid biopsy sample |
| US20250125008A1 (en) | | Methods and systems for evaluation of sex biases in identifying molecular biomarkers for disease |
| US20250188536A1 (en) | | Methods and systems for prediction of alt status |
| WO2024215498A1 (fr) | | Method for detecting patients with systematically underestimated tumor mutational burden who may benefit from immunotherapy |
| WO2025024225A2 (fr) | | Methods and systems for predicting HER2 activity |
| WO2025178926A1 (fr) | | Methods and systems for classifying intratumoral heterogeneity |
| WO2024229084A2 (fr) | | Methods and systems for assessing tumor heterogeneity using histopathological imaging |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24877994; Country of ref document: EP; Kind code of ref document: A1 |