WO2025010296A2 - Prognostic classification based on genetic markers - Google Patents
Prognostic classification based on genetic markers
- Publication number
- WO2025010296A2 (PCT/US2024/036612)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sample
- features
- nucleic acid
- subject
- acid molecules
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6816—Hybridisation assays characterised by the detection means
- C12Q1/6825—Nucleic acid detection involving sensors
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/10—Ploidy or copy number detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/118—Prognosis of disease development
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- Many types of cancer are associated with different prognostic classifications. For example, different patients with endometrial cancer may have vastly different mortality, morbidity, and appropriate treatment options. Clinical providers can determine the prognostic classification of a particular patient using a combination of genetic analysis, histological analysis, and other diagnostic testing. However, determining the prognostic classification may be difficult, costly, and time-consuming, and with currently available techniques the resulting classification can still be inaccurate.
- FIG. 1 illustrates an example environment for determining a prognostic classification of a cancer based on genetic characteristics.
- FIG. 2 illustrates an example environment for training and utilizing a predictive model to determine a prognostic classification of a cancer.
- FIG. 3 illustrates an example of training data utilized to train one or more machine learning models.
- FIG. 4 illustrates an example report summarizing a predicted classification of a cancer of a subject.
- FIG. 5 illustrates an example process for determining a prognostic classification.
- FIG. 6 illustrates an example process for training a predictive model to determine a prognostic classification.
- FIG. 7 illustrates an example environment for sequencing various nucleic acid molecules.
- FIG. 8 illustrates one or more devices configured to perform various operations described herein.
- nucleic acid molecules (e.g., DNA and/or RNA)
- the nucleic acid molecules are sequenced. Pertinent features are determined by analyzing data indicative of the sequenced nucleic acid molecules.
- a predictive model (e.g., at least one machine learning model)
- a prognostic classification of a tumor of the subject is determined using techniques described herein. Based on the prognostic classification, care of the subject may be significantly enhanced.
- Implementations of the present disclosure provide significant improvements to the technical field of cancer diagnosis and treatment.
- the prognostic classification of a tumor was dependent on genetic analysis and histological and/or immunohistological studies performed manually by a pathologist.
- the process of fixation, staining, and analysis in order to perform the histological and/or immunohistological studies could take days or weeks, which could lead to significant delays in diagnosis and prognostic classification.
- the delays in diagnosis and prognostic classification could lead to delays in treatment, which could cause significant harm to patients. Beyond this described delay, accuracy of the resultant diagnosis and prognostic classification could not be guaranteed. For patients with an inaccurate diagnosis and prognostic classification, additional significant harm ensues.
- Various implementations of the present disclosure relate to predictive models that are able to determine a prognostic classification with a high level of accuracy. Further, the predictive model may determine the prognostic classification in a relatively short amount of time, particularly when compared with histological studies.
- MMRD: mismatch repair deficiency
- TMB: tumor mutational burden
- deoxyribonucleic acid may refer to a polymer of nucleotides (also referred to as “nucleobases”) containing deoxyribose.
- the nucleotides in DNA include cytosine (C), guanine (G), adenine (A), and thymine (T).
- Each DNA nucleotide includes a deoxyribose and a phosphate group.
- An example single-stranded DNA (ssDNA) molecule includes a chain of covalently bonded DNA nucleotides.
- the phosphate group of the mth nucleotide is covalently bonded to the deoxyribose of the (m-1)th nucleotide, wherein m is a positive integer greater than 2 and less than or equal to the number of DNA nucleotides in the chain.
- DNA is double-stranded and includes two ssDNA molecules that are complementary to one another and coiled around each other in a double helix form.
- the nucleotides of one ssDNA molecule are hydrogen bonded to the nucleotides of the other ssDNA molecule.
- adenine (A, a purine) and thymine (T, a pyrimidine) hydrogen bond to each other
- ribonucleic acid may refer to a polymer of nucleotides containing ribose.
- the nucleotides in RNA include cytosine (C), guanine (G), adenine (A), and uracil (U).
- Each RNA nucleotide includes a ribose and a phosphate group.
- In an RNA molecule, the phosphate group of the nth nucleotide is covalently bonded to the ribose of the (n-1)th nucleotide, wherein n is a positive integer greater than 2 and less than or equal to the number of RNA nucleotides in the chain.
- Messenger RNA is a type of RNA molecule that is synthesized (or “transcribed”) by RNA polymerase (an enzyme) to be complementary to a gene encoded in a DNA sequence, and is also used by a ribosome to synthesize a polypeptide or protein.
- mRNA is therefore an example of a “coding RNA.”
- intron sequences are removed from an mRNA via a process known as “RNA splicing.”
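As a rough illustration of the transcription and splicing described above, the following Python sketch builds an RNA complementary to a DNA template strand and then removes intron intervals. The function names, the example sequence, and the intron coordinates are hypothetical and are not taken from the disclosure.

```python
# Illustrative sketch only: names, sequence, and intron coordinates are assumptions.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA complementary to a DNA template strand.

    The template is given 3'->5', so the returned RNA is written 5'->3'.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove intron intervals given as 0-based, half-open (start, end) pairs."""
    kept = []
    position = 0
    for start, end in sorted(introns):
        kept.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                         # skip over the intron
    kept.append(pre_mrna[position:])           # keep the final exon
    return "".join(kept)


pre_mrna = transcribe("TACGGATTTCCGATC")
mature_mrna = splice(pre_mrna, [(6, 9)])
```

Real splicing is guided by sequence signals at intron boundaries; here the intervals are simply supplied by the caller.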
- MicroRNA (“miRNA”) are single-stranded RNA molecules that perform post-transcriptional gene expression regulation.
- a miRNA may bind to a complementary mRNA molecule, thereby cleaving, destabilizing, or otherwise preventing the mRNA molecule from being translated into a polypeptide or protein by a ribosome.
- a miRNA has a length in a range of 21 to 23 RNA nucleotides.
- non-coding RNA may refer to a type of RNA that is not translated into a protein.
- non-coding RNA examples include miRNA, transfer RNA (tRNA), and ribosomal RNA (rRNA).
- the term “functional RNA,” and its equivalents, may refer to any RNA molecule that impacts a biological process.
- functional RNA may include mRNA, miRNA, tRNA, rRNA, and the like.
- base may refer to a monomer of a polymer.
- a base of DNA or RNA is a nucleotide.
- a base pair may refer to a pair of complementary DNA nucleotides, which are hydrogen-bonded to one another in a double-stranded DNA molecule.
- a base pair includes a first base in a first ssDNA and a second base in a second ssDNA, wherein the first and second bases are complementary and hydrogen-bonded to one another.
- As used herein, the terms “nucleotide,” “nucleobase,” “nucleic acid,” “nucleic acid molecule,” and their equivalents, may refer to an organic molecule that includes a nitrogenous base, a sugar, and a phosphate group. In various cases, a nucleotide is a monomer of DNA or RNA. A nucleotide, for instance, is a chemical structure.
- 3’ end may refer to a terminus of a single-stranded nucleotide polymer that includes a base whose third carbon in its deoxyribose or ribose is bound to a hydroxyl group while being unbound to another base.
- the terms “5’ end,” “5-prime end,” and their equivalents may refer to a terminus of a single-stranded nucleotide polymer that includes a base whose fifth carbon in its deoxyribose or ribose ring is unbound to another base. In some cases, the fifth carbon is bound to a phosphate group.
- the “length” of a polymer refers to a number of covalently bonded monomers that are included in the polymer.
- the length of a DNA molecule may be the number of covalently bonded nucleotides in at least one strand of the DNA molecule and/or the number of base pairs in the DNA molecule.
- the length of an RNA molecule may be the number of covalently bonded nucleotides in the RNA molecule.
- the term “gene,” and its equivalents, refers to a sequence of DNA nucleotides that is transcribed into a functional RNA.
- the functional RNA, for instance, is RNA that is translated into a polypeptide or protein (e.g., mRNA) or that has some other biological function (e.g., miRNA, tRNA, etc.).
- a gene is “expressed” when it is used as a template to generate a functional RNA.
- a subject, for instance, has numerous genes contained in the subject’s genome.
- a gene may include both introns and exons.
- the term “intron,” and its equivalents, may refer to a subset of DNA nucleotides in a gene that is not used to code for any functional RNA that is expressed by the organism.
- the term “exon,” and its equivalents may refer to a subset of DNA nucleotides in a gene that is used to code for a functional RNA.
- an exon may encode a polypeptide or protein that is expressed by the organism.
- a gene can be represented in data (e.g., as data representative of the sequence of DNA nucleotides in the gene) or as a chemical structure (e.g., as the sequence of DNA nucleotides itself).
- the term “genome,” and its equivalents, refers to the aggregate of genes of a subject.
- a genome represents the sequences of several linear DNA molecules that are present in a subject's chromosomes.
- a “reference genome” refers to an aggregation of genes of one or more reference subjects.
- a genome is represented in data.
- pangenome refers to an aggregate set of genes from multiple subgroups (e.g., strains) within a population (e.g., a clade) of subjects.
- a pangenome indicates genes that are present in all subjects within the population, as well as genes that are present in some of the subjects of the population.
- a pangenome is represented in data, for instance.
- transcriptome refers to the aggregate of RNA sequences of a subject. In some cases, a transcriptome is limited to mRNA sequences. In various examples, a transcriptome is represented in data.
- genomic DNA may refer to DNA molecules that are obtained from a chromosome and/or nucleus of a cell.
- DNA fragment may refer to DNA molecules that are excised and/or broken off from a larger DNA molecule.
- cell-free DNA may refer to DNA fragments that are non-encapsulated and obtained outside of cells within a sample (e.g., a liquid biopsy sample).
- circulating tumor DNA may refer to a cfDNA molecule that originates from a cancer cell.
- the term “promoter,” and its equivalents, may refer to a portion of a DNA molecule that binds one or more proteins in order to initiate transcription of a gene.
- the promoter is located “upstream” of the gene.
- the promoter is located between the 5' end of the DNA molecule and the gene.
- a promoter may include one or more binding sites for RNA polymerase, and/or one or more transcription factor binding sites.
- a promoter includes one or more CpG islands.
- a promoter, for instance, includes a transcription start site.
- CpG island may refer to a continuous portion of a DNA molecule whose sequence includes greater than a threshold amount (e.g., greater than 50%) of G-C base pairs.
- the term “enhancer,” and its equivalents may refer to a portion of a DNA molecule that binds one or more proteins in order to increase the chance that a gene will be transcribed. For instance, an enhancer includes one or more transcription factor binding sites. In various cases, an enhancer includes one or more CpG islands.
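The CpG island definition above (a continuous portion of DNA whose G-C content exceeds a threshold such as 50%) can be sketched in Python as follows. The function names and the sliding-window approach, including the window size, are assumptions made for illustration; the disclosure states only the threshold criterion.

```python
# Illustrative sketch only: the window size and function names are assumptions.
def gc_fraction(sequence: str) -> float:
    """Fraction of nucleotides in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


def gc_rich_windows(sequence: str, window: int = 8, threshold: float = 0.5) -> list[int]:
    """Return 0-based start positions of windows whose G-C fraction exceeds the threshold."""
    return [
        start
        for start in range(len(sequence) - window + 1)
        if gc_fraction(sequence[start:start + window]) > threshold
    ]


starts = gc_rich_windows("ATATGCGCGCAT", window=4, threshold=0.5)
```

Adjacent qualifying windows could then be merged into candidate continuous regions.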
- cancer may refer to a condition of a subject in which particular cells (referred to as “cancer cells”) divide uncontrollably in the subject's body.
- a cancer is characterized by a location or tissue type from which the cancer cells originated.
- a cancer is characterized by a location or tissue type in which the cancer cells are located.
- As used herein, the terms “tumor,” “neoplasm,” and their equivalents, may refer to a mass of tissue including cancer cells.
- liquid biopsy may refer to a process of obtaining a fluid sample from a subject's body.
- the sample, for instance, can be referred to as a “liquid biopsy sample.”
- fluids that are sampled from the body include blood, plasma, cerebrospinal fluid, sputum, stool, urine, lymphatic fluid, and saliva.
- tissue biopsy may refer to a process of obtaining a sample of cells from a subject’s body.
- a tissue biopsy, in various cases, is performed by cutting a mass of cells from the subject's body.
- a tissue biopsy is a procedure performed by a surgeon, interventional radiologist, interventional cardiologist, or other specialized clinician.
- tissue or tissue biopsy sample can be used to refer to the sample of cells obtained using a tissue biopsy.
- the term “subject,” and its equivalents, may refer to a human or non-human animal.
- a subject that is receiving care from at least one care provider may be referred to as a “patient.”
- machine learning may refer to the use of one or more computing devices to learn patterns in training data. The process of learning these patterns may be referred to as “training.” In particular cases, one or more computing devices may perform machine learning by executing a machine learning model.
- machine learning model may refer to data encoding instructions that, when executed by at least one computing device, cause the at least one computing device to learn patterns in training data by optimizing one or more metrics, values, or other types of parameters. After training, an ML model, when executed by at least one computing device, causes the at least one computing device to utilize the optimized parameters in order to perform one or more tasks.
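To make “training” concrete, the sketch below fits a small logistic regression by gradient descent and then uses the optimized parameters to classify new inputs. It is a generic illustration, not the predictive model of the disclosure; the toy feature values, labels, learning rate, and function names are all invented for the example.

```python
# Illustrative sketch only: a generic logistic regression, not the disclosed model.
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, learning_rate=0.5, epochs=2000):
    """Optimize weights and a bias by gradient descent on the mean log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(features) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(features)
    return weights, bias


def predict(weights, bias, x) -> int:
    """Use the optimized parameters to assign class 0 or 1."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Hypothetical toy data: one made-up numeric feature per sample and a binary label.
training_features = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
training_labels = [0, 0, 0, 1, 1, 1]
weights, bias = train(training_features, training_labels)
```

After training, `predict(weights, bias, [0.8])` applies the learned parameters to a new feature vector.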
- variant may refer to a difference between a subject genetic sequence and a reference sequence.
- a variant may correspond to a difference between one or more nucleotides in a genome of a subject and one or more corresponding nucleotides in at least one reference genome or pangenome.
- a variant may be characterized by its identity (e.g., what nucleotides are different), its position (e.g., where are the nucleotides located in the genome, what chromosome contains the nucleotides, what gene contains the nucleotides, etc.), its length (e.g., how many nucleotides are different from the reference sequence), its type (e.g., substitution, insertion, deletion, copy number alteration, rearrangement of fusion, etc.), and other features that indicate its significance and/or relevance.
- a variant represents any apparent alteration in a sequence that has been read from a nucleic acid molecule with respect to the reference sequence, such as reads cleaved by restriction enzymes (RE).
- a variant can be represented in data (e.g., by data characterizing the variant) or as a chemical structure (e.g., the nucleotides themselves).
- the term “mutation,” and its equivalents, may refer to a change in a gene.
- locus may refer to a position on a chromosome.
- the position is the location of a particular sequence-of-interest, such as a gene, genetic marker, or the like.
- a locus is defined based on the chromosome number at which the position is located, whether the position is defined on a short arm or a long arm of the chromosome, a region in which the position is located, a band in which the position is located, a sub-band in which the position is located, or any combination thereof.
- loci may refer to the plural form of the term “locus.”
- substitution can refer to a nucleotide in a subject sequence that is different than an equivalent nucleotide (e.g., a nucleotide at the same position) in a reference sequence.
- insertion can refer to a nucleotide in a subject sequence that is added with respect to a reference sequence.
- the term “deletion,” and its equivalents, can refer to the removal of a nucleotide from a nucleotide sequence.
- copy number alteration can refer to a portion of a genetic sequence (e.g., a genome) that is repeated. For instance, different individuals within a population may have a different number of repeated portions of the genetic sequence, that is, different copy numbers. In some cases, a copy number is defined based on a predetermined locus within a genome.
- the terms “rearrangement of fusion,” “fusion rearrangement,” “translocation,” and their equivalents can refer to a change in the relative position of one or more portions of a reference sequence, thereby generating a gene that was not present in the reference sequence.
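One way to represent a variant in data, with the identity, position, length, and type described above, is sketched below in Python. The class, its field names, and the reference/alternate allele convention are assumptions chosen for illustration. The sketch covers only simple substitutions, insertions, and deletions, not copy number changes or rearrangements.

```python
# Illustrative sketch only: the class and its field names are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chromosome: str
    position: int    # 1-based position of the first reference nucleotide
    reference: str   # nucleotides in the reference sequence
    alternate: str   # nucleotides observed in the subject sequence

    @property
    def kind(self) -> str:
        """Classify by comparing allele lengths."""
        if len(self.reference) == len(self.alternate):
            return "substitution"
        if len(self.reference) < len(self.alternate):
            return "insertion"
        return "deletion"

    @property
    def length(self) -> int:
        """Number of nucleotides that differ from the reference sequence."""
        if self.kind == "substitution":
            return sum(r != a for r, a in zip(self.reference, self.alternate))
        return abs(len(self.alternate) - len(self.reference))


snv = Variant("chr1", 100, "A", "G")
insertion = Variant("chr1", 100, "A", "ATT")
deletion = Variant("chr1", 100, "ACG", "A")
```

The chromosome name and positions here are placeholders, not values from the disclosure.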
- the term “sequencing,” and its equivalents may refer to a process of identifying the order and identity of monomers in a polymer chain, such as the order and identity of nucleotides in a DNA or RNA molecule.
- the terms “whole genome sequencing,” “WGS,” and their equivalents, may refer to the process of sequencing an entire genome of a subject, including the introns and exons of the genes of the subject.
- the term “whole exome sequencing,” and its equivalents, may refer to the process of sequencing all exons (the exome) of a subject.
- targeted sequencing and its equivalents, may refer to the process of sequencing a portion of the genome of a subject, such as sequencing a single gene of the subject.
- massive parallel sequencing may refer to a technique for simultaneously performing multiple reactions that can be used to identify the order and identity of monomers in multiple polymer chains.
- massive parallel sequencing can be performed using sequencing-by-synthesis on clonally amplified DNA molecules that are located in spatially separated regions, which are individually monitored by sensors.
- nanopore sequencing may refer to a technique for identifying the order and identity of monomers in a polymer chain by transporting the polymer chain from a first space to a second space, wherein the first space and the second space are separated by a substrate, by directing the polymer chain through a small hole (known as a “nanopore”) embedded in the substrate, and monitoring a relative electrical signal (e.g., a voltage or current) between the first space and the second space.
- the term “sensor,” and its equivalents, may refer to a physical device or other apparatus that is configured to detect one or more detection signals.
- detection signal may refer to a physical signal that can be identified, characterized, or otherwise perceived by a sensor.
- sequence read data may refer to data that is indicative of an order and identity of monomers in a polymer, such as the order and identity of nucleotides in a DNA or RNA sequence.
- sequence read data is generated via a sequencing operation.
- image may refer to a 2D or 3D array of data indicative of an array of pixels or voxels.
- ligating may refer to a process of joining two molecules together, for example, with a chemical bond.
- the term “adapter,” and its equivalents may refer to an oligonucleotide that can be ligated to a target nucleic acid molecule. In various cases, an adapter prepares the target nucleic acid molecule for sequencing.
- the term “bait molecule,” and its equivalents may refer to a nucleic acid molecule having a region that is complementary to a region of a target molecule (e.g., cfDNA).
- a bait molecule includes, for instance, a nucleic acid molecule that can hybridize to (i.e., is complementary to) a target molecule and can be used to capture the target molecule.
- the bait molecule is a capture oligonucleotide (or capture probe).
- the bait molecule is suitable for solution phase hybridization to the target molecule.
- the bait molecule is suitable for solid phase hybridization to the target molecule.
- the bait molecule is suitable for both solution-phase and solid-phase hybridization to the target molecule.
- the design and construction of bait molecules is described in more detail in, e.g., International Patent Application Publication No. WO 2020/236941.
- amplifying may refer to a process of generating copies of a target molecule, such as a nucleic acid molecule.
- hybridization may refer to a process by which two complementary single-stranded nucleic acid molecules bind to one another, thereby forming a double-stranded nucleic acid molecule.
- double-stranded nature of the nucleic acid molecule is maintained under stringent hybridization conditions.
- Exemplary stringent hybridization conditions include an overnight incubation at 42 °C in a solution including 50% formamide, 5× SSC (750 mM NaCl, 75 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5× Denhardt's solution, 10% dextran sulfate, and 20 µg/ml denatured, sheared salmon sperm DNA, followed by washing the filters in 0.1× SSC at 50 °C.
- the term “complementary,” and its equivalents may refer to a state of two single-stranded nucleic acid molecules with respective sequences that cause the nucleic acid molecules to spontaneously hybridize to one another.
- One nucleic acid molecule, for instance, may have a sequence that causes each nucleic acid to hydrogen bond to a respective nucleic acid in the other nucleic acid molecule.
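The notion of complementarity described above can be sketched in Python as a check that one DNA strand is the reverse complement of another. This is a simplification under stated assumptions: it requires exact, full-length pairing, whereas actual hybridization depends on the conditions used. The function names are hypothetical.

```python
# Illustrative sketch only: exact full-length pairing is a simplifying assumption.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence: str) -> str:
    """Return the strand that pairs with the given 5'->3' DNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def is_complementary(strand_a: str, strand_b: str) -> bool:
    """True if two 5'->3' DNA strands can base-pair along their full length."""
    return strand_a.upper() == reverse_complement(strand_b)


paired = is_complementary("ATGC", "GCAT")
```

A bait molecule designed against a target region could, in this simplified view, be obtained by calling `reverse_complement` on the target sequence.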
- therapy may refer to a composition or process that can be used to remediate a health problem.
- Cancer therapies, for instance, include surgery, radiotherapy, chemotherapy, immunotherapy, cell-based therapies, and the like.
- cancer therapies include abemaciclib (Verzenio), abiraterone acetate (Zytiga), acalabrutinib (Calquence), ado-trastuzumab emtansine (Kadcyla), afatinib dimaleate (Gilotrif), aldesleukin (Proleukin), alectinib (Alecensa), alemtuzumab (Campath), alitretinoin (Panretin), alpelisib (Piqray), amivantamab-vmjw (Rybrevant), anastrozole (Arimidex), apalutamide (Erleada), asciminib hydrochloride (Scemblix), atezolizumab (Tecentriq), avapritinib (Ayvakit), avelumab (Bavencio), axicabtagene ciloleucel (Yescarta
- cancer cells of a subject may be responsive to a particular treatment if, after the subject is administered the treatment, the cancer cells are diminished by a particular progression level (e.g., radiographic progression level, marker-based progression level, such as prostate-specific antigen (PSA) progression, etc.). Accordingly, the responsiveness of the cells to the type of therapy may indicate the effectiveness of that therapy.
- treatment-resistant may refer to a type of cancer that cannot be substantially killed using a predetermined type of therapy.
- metastasis profile may refer to a propensity of a type of cancer to metastasize into one or more differentiated tumor types besides the cancer’s tissue origin.
- the metastasis profile can further indicate the type of tissue in which the cancer can or is likely to metastasize.
- clinical trial may refer to a research study used to evaluate a hypothesis based on participation by one or more subjects.
- a clinical trial can be used to assess the efficacy and/or safety of a proposed therapy.
- a clinical trial may be performed in furtherance of approval of a treatment by a regulatory authority (e.g., the United States Food & Drug Administration (FDA)).
- FIG. 1 illustrates an example environment 100 for determining a prognostic classification of a cancer based on genetic characteristics.
- a subject 102 may present to a clinical environment with a lesion 104.
- the lesion 104 may be a tumor that includes cancer cells.
- the subject 102 has one or more types of cancer, such as adrenal cancer, bladder cancer, blood cancer, bone cancer, brain cancer, breast cancer, carcinoma, cervical cancer, colon cancer, colorectal cancer, corpus uterine cancer, ear, nose and throat (ENT) cancer, endometrial cancer, esophageal cancer, gastrointestinal cancer, head and neck cancer, Hodgkin's disease, intestinal cancer, kidney cancer, larynx cancer, leukemia, liver cancer, lymph node cancer, lymphoma, lung cancer, melanoma, mesothelioma, myeloma, nasopharynx cancer, a neuroblastoma, non-Hodgkin's lymphoma, oral cancer, ovarian cancer, pancreatic cancer, penile cancer, pharynx cancer, prostate cancer, rectal cancer, sarcoma, seminoma, skin cancer, stomach cancer, a teratoma, testicular cancer, thyroid cancer, uterine cancer, vaginal
- the subject 102 has a B cell cancer (multiple myeloma), a melanoma, breast cancer, lung cancer, bronchus cancer, colorectal cancer, prostate cancer, pancreatic cancer, stomach cancer, ovarian cancer, urinary bladder cancer, brain cancer, central nervous system cancer, peripheral nervous system cancer, esophageal cancer, cervical cancer, uterine cancer, endometrial cancer, cancer of an oral cavity, cancer of a pharynx, liver cancer, kidney cancer, testicular cancer, biliary tract cancer, small bowel cancer, appendix cancer, salivary gland cancer, thyroid gland cancer, adrenal gland cancer, osteosarcoma, chondrosarcoma, a cancer of hematological tissue, an adenocarcinoma, an inflammatory myofibroblastic tumor, a gastrointestinal stromal tumor (GIST), colon cancer, multiple myeloma (MM), myelodysplastic syndrome (MDS), myeloproliferative
- the subject 102 has acute lymphoblastic leukemia (Philadelphia chromosome positive), acute lymphoblastic leukemia (precursor B-cell), acute myeloid leukemia (FLT3+), acute myeloid leukemia (with an IDH2 mutation), anaplastic large cell lymphoma, basal cell carcinoma, B-cell chronic lymphocytic leukemia, bladder cancer, breast cancer (HER2 overexpressed/amplified), breast cancer (HER2+), breast cancer (HR+, HER2-), cervical cancer, cholangiocarcinoma, chronic lymphocytic leukemia, chronic lymphocytic leukemia (with 17p deletion), chronic myelogenous leukemia, chronic myelogenous leukemia (Philadelphia chromosome positive), classical Hodgkin lymphoma, colorectal cancer, colorectal cancer (dMMR/MSI-H), colorectal cancer (KRAS wild type), cryopyrin-associated periodic
- the subject 102 has endometrial cancer.
- the lesion 104 includes an endometrial tumor that is present in endometrial tissue of the subject 102.
- a care provider 105 is responsible for diagnosing and/or treating the subject 102.
- the lesion 104 may be initially identified using a noninvasive technique.
- the lesion 104 may be visualized using an imaging modality, such as ultrasound, x-ray, computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), single photon emission CT (SPECT), or any combination thereof.
- the care provider 105 may identify the presence of the lesion 104, but may be unable to determine whether the lesion 104 is a cancerous tumor using noninvasive diagnostic methodologies.
- the care provider 105 may be unable to identify whether the tumor is metastatic or benign, or may be unable to otherwise categorize the tumor.
- the care provider 105 is unable to determine a prognostic classification of the lesion 104 (e.g., a tumor) using noninvasive techniques.
- the term “prognostic classification,” and its equivalents may refer to a characteristic of a subject presenting with a disease (e.g., cancer), wherein the characteristic is determinative of, or at least correlated with, an effectiveness of at least one therapy at treating the disease, an ineffectiveness of at least one therapy at treating the disease, a survivability (e.g., a likelihood that the subject will survive by a predetermined date or time), an expected quality of life, at least one predetermined symptom, at least one comorbidity, or any combination thereof.
- the care provider 105 could classify the lesion 104 by initiating a tissue biopsy on the subject 102. For instance, the care provider 105 could surgically remove a tissue sample from the lesion 104 and/or review the tissue sample using histochemistry and/or immunohistochemistry.
- the tissue sample may not be classifiable using conventional histological techniques, such as conventional immunohistochemical staining and review.
- a single care provider 105 is unlikely to be trained to perform the tissue biopsy (which would be performed by a surgeon), to administer anesthesia to the subject 102 during the tissue biopsy (which would be performed by an anesthesiologist), and to analyze the tissue biopsy (which would be performed by a trained pathologist), such that the classification would require multiple highly trained care providers. Even if the lesion 104 was classifiable by these means, the coordinated efforts of these care providers could delay classification of the lesion 104 and could cause significant expense to the subject 102. In various examples, the delay in classification could cause significant emotional hardship to the subject 102, who could be prevented from receiving an informed prognosis for weeks. Further, the delay in classification could delay a therapy of the lesion 104, which could cause lasting harm to the subject 102, particularly in cases in which the lesion 104 is representative of an aggressive form of cancer.
- a prognostic classification of the lesion 104 is determined without performing histochemistry and/or immunohistochemistry.
- a sample 106 is obtained from the subject 102.
- the sample 106 includes a tissue biopsy sample.
- the sample 106 is obtained by removing cells from the lesion 104 and from the subject 102.
- the tissue biopsy sample is surgically excised from the subject 102.
- the sample includes a liquid biopsy sample.
- the liquid biopsy sample 106 includes blood, plasma, cerebrospinal fluid, sputum, stool, urine, lymphatic fluid, saliva, or some other fluid obtained from the body of the subject 102.
- a blood sample is obtained intravenously from the subject 102.
- the liquid biopsy sample 106 is a plasma sample obtained from the blood of the subject 102.
- the liquid biopsy sample 106 can be obtained in a minimally invasive procedure, which could be performed by a medical technician rather than a surgeon.
- the sample 106 includes nucleic acid molecules 108.
- the nucleic acid molecules 108 include genomic DNA (gDNA).
- the nucleic acid molecules 108 include chromosomal DNA that is located in, or extracted from, cells in the sample 106.
- the DNA is extracted from nuclei and the cells in the sample 106 using mechanical shearing and/or the introduction of a chemical (e.g., a detergent).
- the DNA may be subsequently isolated from proteins and other cellular materials.
- the nucleic acid molecules 108 indicate an entire genome of the subject 102 and/or the lesion 104. Thus, a genome of the subject 102 and/or the lesion 104 can be determined by sequencing the DNA in the nucleic acid molecules 108.
- the nucleic acid molecules 108 include RNA.
- the nucleic acid molecules 108 include messenger RNA (mRNA), microRNA, non-coding RNA, functional RNA, or any combination thereof.
- RNA in the nucleic acid molecules 108 may be indicative of proteins expressed in the cells of the subject 102 and/or the lesion 104.
- the sample 106 includes cell-free DNA (cfDNA).
- the cfDNA includes circulating tumor DNA (ctDNA) and/or non-ctDNA.
- cancer cells within the lesion 104 will lyse and release the ctDNA into the bloodstream of the subject 102. Further, other cells additionally release non-ctDNA into the bloodstream of the subject.
- the cfDNA includes fragments with lengths that are in a range of 1 to 500, 3 to 500, or 100 to 500 bases long.
- the cfDNA includes fragments that are about 170 bases long and/or fragments that are about 340 bases long.
- the cfDNA includes fragments that are 100 to 240 bases long and/or fragments that are 270 to 410 bases long.
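The fragment-length ranges above can be illustrated with a minimal sketch that sorts observed cfDNA fragment lengths into the two quoted windows (100 to 240 bases and 270 to 410 bases). The function names and the labels `short_peak`, `long_peak`, and `other` are assumptions introduced for illustration and do not appear in the description.

```python
from collections import Counter

# Length windows quoted in the text; the labels below are illustrative assumptions.
SHORT_RANGE = (100, 240)  # window around fragments of about 170 bases
LONG_RANGE = (270, 410)   # window around fragments of about 340 bases


def classify_fragment(length):
    """Return a hypothetical size-class label for one cfDNA fragment length."""
    if SHORT_RANGE[0] <= length <= SHORT_RANGE[1]:
        return "short_peak"
    if LONG_RANGE[0] <= length <= LONG_RANGE[1]:
        return "long_peak"
    return "other"


def summarize_fragments(lengths):
    """Count how many fragments fall into each size class."""
    return Counter(classify_fragment(n) for n in lengths)
```

A call such as `summarize_fragments([165, 172, 338, 90])` returns a `Counter` keyed by the three labels, which could serve as a simple fragment-size summary of a sample.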
- the sample 106 is transported to a location that is remote from the subject 102 for further processing.
- the sample 106 is removed from the subject 102 in a clinical environment (e.g., a hospital) and is then transported to a remote laboratory for further testing and analysis.
- a sequencer 112 is configured to generate sequence read data 114 indicating the sequences of the nucleic acid molecules 108.
- the sequencer 112, for instance, includes one or more devices that are configured to generate the sequence read data 114 by processing at least a portion of the sample 106.
- the nucleic acid molecules 108 are extracted from the sample 106. The extraction can be performed by the sequencer 112, by another device, manually (e.g., by a laboratory technician), or any combination thereof. Any appropriate extraction method known to those of ordinary skill in the art can be utilized.
- the sequencer 112 is configured to perform one or more processes (e.g., chemical reactions) on the nucleic acid molecules 108 in order to prepare the nucleic acid molecules 108 for sequencing.
- the sequencer 112 may ligate adapters onto the nucleic acid molecules 108 and/or amplify the nucleic acid molecules 108, such that numerous copies of the ligated nucleic acid molecules 108 are available for sequencing.
- the adapters include, for example, amplification primers, flow cell adapter sequences, substrate adapter sequences, or sample index sequences.
- the nucleic acid molecules 108 may be amplified by generating multiple copies of the nucleic acid molecules 108 using one or more techniques such as polymerase chain reaction (PCR), a non-PCR amplification technique, or an isothermal amplification technique.
- the sequencer 112 may identify the length, position, and identity of the bases in the nucleic acid molecules 108 by sequencing the nucleic acid molecules 108 (e.g., the amplified and/or ligated nucleic acid molecules 108).
- the sequencer 112 utilizes first-generation sequencing (e.g., Sanger sequencing), second-generation sequencing (e.g., massively parallel sequencing), third-generation sequencing (e.g., nanopore sequencing), or a combination thereof.
- the sequencer 112 is configured to sequence substantially all of the nucleotides of all fragments of the nucleic acid molecules 108 obtained from the sample 106.
- the sequencer 112 is configured to perform targeted sequencing. For instance, the sequencer 112 may determine whether fragments of the nucleic acid molecules 108 contain one or more predetermined sequences.
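As a minimal sketch of checking whether fragments contain one or more predetermined sequences, the following example scans read strings for each target on either strand. Exact substring matching is an assumption that stands in for the alignment or capture chemistry an actual sequencer would use, and the function names are hypothetical.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def reads_with_targets(reads, targets):
    """Map each target sequence to the indices of reads containing it.

    A read counts as a hit if it contains the target or the target's
    reverse complement as an exact substring (an illustrative simplification).
    """
    hits = {target: [] for target in targets}
    for index, read in enumerate(reads):
        read = read.upper()
        for target in targets:
            forward = target.upper()
            if forward in read or reverse_complement(forward) in read:
                hits[target].append(index)
    return hits
```

Reads that match no target would simply be absent from every list, which mirrors sequencing only a subset of the nucleic acid molecules.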
- the sequencer 112 includes one or more sensors that are configured to detect physical signals (also referred to as “detection signals”) that are indicative of the nucleotide sequences of the nucleic acid molecules 108.
- the sequencer 112 may perform sequencing-by-synthesis.
- the sequencer 112 may include one or more optical sensors configured to detect optical signals emitted from fluorescently tagged dNTPs that are joined together in a synthesized DNA strand using the ligated nucleic acid molecules 108 as templates.
- the optical signals detected by the optical sensor(s), for instance, are indicative of the sequences of the nucleic acid molecules 108.
- the sequencer 112 may perform nanopore sequencing.
- the sequencer 112 includes one or more electrical sensors configured to measure an electrical signal (e.g., an electrical current) across a substrate as the ligated nucleic acid molecules 108 are directed through a nanopore extending through the substrate.
- the electrical signal over time is indicative of the sequences of the nucleic acid molecules 108 in the sample 106.
- the sequencer 112, in various implementations, is configured to generate the sequence read data 114 as digital data based on the analog signals detected by the sensor(s). For instance, the sequencer 112 includes one or more analog to digital converters (ADCs). In various cases, the sequencer 112 includes at least one processor configured to generate the sequence read data 114.
- the sequencer 112 performs RNA sequencing (RNA-seq) on the nucleic acid molecules 108.
- the nucleic acid molecules 108 include RNA that is extracted from the sample 106.
- the RNA in the nucleic acid molecules 108 is fragmented.
- complementary DNA (cDNA) is generated using reverse transcriptase, such that the cDNA includes sequences that are complementary to the RNA in the nucleic acid molecules 108 from the sample 106.
- the cDNA can be sequenced using the DNA sequencing techniques described above. Accordingly, in some cases, the sequence read data 114 indicates sequences of RNA present in the sample 106, which may be indicative of the transcriptome of the subject 102 and/or the lesion 104.
- the sequencer 112 performs sequencing on a subset of the nucleic acid molecules 108.
- the sequencer 112 may perform targeted sequencing on one or more predetermined genes, such as any of the genes described herein.
- a feature selector 116 identifies features 118 of the nucleic acid molecules 108 by analyzing the sequence read data 114. In various implementations, the feature selector 116 identifies, calculates, or otherwise determines the features 118 based on the sequences of the nucleic acid molecules 108 indicated in the sequence read data 114. One or more types of features are identified by the feature selector 116.
- the features 118 include a mismatch repair deficiency (MMRD) probability score.
- the MMRD probability score indicates a likelihood that one or more MMR pathways of cells in the sample 106 are ineffective at performing mismatch repair.
- the MMRD probability score is determined by determining genomic features from the sequence read data 114 and inputting the genomic features into at least one machine learning model trained to generate the MMRD probability score based on previously analyzed data from a population omitting the subject 102.
- the genomic features relevant to the MMRD probability score include, for instance, a fraction unstable score, a composite COSMIC single-base substitution signature, a COSMIC indel signature, a copy number signature, a tumor mutational burden score, a blood-based tumor mutational burden score, a germline status for a mutation in one or more genes associated with DNA mismatch repair (MMR) (also referred to as “MMR genes”), a methylation status for the one or more MMR genes, a methylation status for one or more promoters associated with the one or more MMR genes, a methylation status of one or more enhancers associated with the one or more MMR genes, or any combination thereof.
- MMR genes include, for instance, MSH2, MSH6, PMS2, or MLH1.
- the features 118 include a copy number state of one or more genetic loci indicated by the sequence read data 114.
- a number of copies of a predetermined sequence at a given locus in the genome of the subject 102 and/or the lesion 104 (also referred to as a “copy number” of the locus) is determined.
- the copy number state may indicate copy numbers of one or more loci in the genome of the subject 102 and/or the lesion 104.
- the copy number state may indicate the presence and/or amount of copies of various sequences present in the genome of the subject 102 and/or the lesion 104, which may be due to copy number variation.
- the sequence read data 114 may represent a genome of the subject 102 and/or the lesion 104. Various portions of the sequence read data 114 are aligned with at least one reference sequence (e.g., a reference genome).
- the aligned data is segmented using at least one segmentation technique (e.g., a circular binary segmentation (CBS) method, a maximum likelihood method, a hidden Markov chain method, a walking Markov method, a Bayesian method, a long-range correlation method, a change point method, or any combination thereof), thereby generating non-overlapping segments of the sequence read data 114, wherein a sequence associated with a given segment is associated with the same copy number (e.g., a number of instances in which the sequence appears in the segment).
- Various genetic loci are binned, or otherwise sorted, with respect to the segments of the genome of the subject 102 and/or the lesion 104.
- the copy number state, for instance, is representative of the respective copy numbers associated with the genetic loci.
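The binning of genetic loci into copy number segments described above can be sketched as follows. The sketch assumes the segmentation has already been performed and that each segment is a `(chromosome, start, end, copy_number)` tuple with an inclusive start and exclusive end. This tuple layout, the locus names, and the function name are assumptions made for illustration.

```python
from bisect import bisect_right


def copy_number_state(segments, loci):
    """Assign each named locus the copy number of the segment containing it.

    segments: iterable of (chrom, start, end, copy_number) tuples that do not
              overlap; start is inclusive and end is exclusive.
    loci:     dict mapping a locus name to a (chrom, position) tuple.
    Returns a dict mapping each locus name to a copy number, or None when no
    segment covers the locus.
    """
    by_chrom = {}
    for chrom, start, end, copy_number in sorted(segments):
        by_chrom.setdefault(chrom, []).append((start, end, copy_number))

    state = {}
    for name, (chrom, position) in loci.items():
        segs = by_chrom.get(chrom, [])
        # Find the last segment whose start is at or before the position.
        i = bisect_right([s[0] for s in segs], position) - 1
        if i >= 0 and segs[i][0] <= position < segs[i][1]:
            state[name] = segs[i][2]
        else:
            state[name] = None
    return state
```

Returning `None` for uncovered loci keeps gaps in the segmentation explicit instead of silently assuming a default copy number.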
- the features 118 include the presence or absence of a pathogenic variant in one or more genes associated with classifying the lesion 104.
- the genes include one or more of ABL1, ACVR1B, AKT1, AKT2, AKT3, ALK, ALOX12B, AMER1, APC, AR, ARAF, ARFRP1, ARID1A, ASXL1, ATM, ATR, ATRX, AURKA, AURKB, AXIN1, AXL, BAP1, BARD1, BCL2, BCL2L1, BCL2L2, BCL6, BCOR, BCORL1, BCR, BRAF, BRCA1, BRCA2, BRD4, BRIP1, BTG1, BTG2, BTK, CALR, CARD11, CASP8, CBFB, CBL, CCND1, CCND2, CCND3, CCNE1, CD22, CD274, CD70, CD74, CD
- the genes include one or more of ABL, ALK, ALL, B4GALNT1, BAFF, BCL2, BRAF, BRCA, BTK, CD19, CD20, CD3, CD30, CD319, CD38, CD52, CDK4, CDK6, CML, CRACC, CS1, CTLA-4, dMMR, EGFR, ERBB1, ERBB2, FGFR1-3, FLT3, GD2, HDAC, HER1, HER2, HR, IDH2, IL-1β, IL-6, IL-6R, JAK1, JAK2, JAK3, KIT, KRAS, MEK, MET, MSI-H, mTOR, PARP, PD-1, PDGFR, PDGFRα, PDGFR
- relevant genes may include TP53, PTEN, POLE, MKI67, FAT3, TAF1, ZFHX3, RPL22, SPTA1, FAM135B, CSMD3, GIGYF2, CSDE1, MLL4, ATR, CTNNB1, USH2A, LIMCH1, RRN3P2, FBXW7, CDH19, USP9X, COL11A1, BCOR, ARID1A, ZNF770, ARID5B, SLC9A11, KRAS, PNN, INPP4A, CTCF, CHD4, AMY2B, RBMX, PPP2R1A, TNFAIP6, PIK3R1, SGK1, HOXA7, METTL14, HPD, MIR1277, CCND1, MECOM, NFE2L2, and ESR1.
- the features 118 may include the presence of one or more pathogenic variants in POLE, TP53, CTNNB1, L1CAM, PTEN, an estrogen receptor (ER) gene (e.g., ESR1, ESR2, etc.), or a progesterone receptor (PR) gene (e.g., PGR).
- microsatellites are highly polymorphic DNA-repeat regions.
- “microsatellite” refers to a repetitive nucleic acid having repeat units of less than about 10 base pairs or nucleotides in length.
- a microsatellite refers to a tract of tandemly repeated (i.e., adjacent) DNA motifs ranging from one to six, or up to ten, nucleotides in length, with each motif repeated 5 to 50 times.
- if the MMR pathways are impaired (e.g., the MMR genes of the hosting cell include variants that impede function), then mutations (e.g., insertions or deletions) at the microsatellites may be substantially retained.
- Microsatellite instability (MSI) refers to genetic instability in the microsatellite regions. Cancer patients with microsatellite instability classified as being high (MSI-H or MSI-High) frequently exhibit an accumulation of somatic mutations in tumor cells that leads to a range of molecular and biological changes including high tumor mutational burden, increased expression of neoantigens and abundant tumor-infiltrating lymphocytes. Chang et al.
- MSI score refers to an amount of instability in one or more microsatellites.
- an MSI score can be represented as a fraction (i.e., an “MSI fraction”) of instability in the one or more microsatellites.
- Other types of portions of DNA may be associated with a high likelihood of mutations.
- the features 118 include a fraction unstable score, indicative of mutations in the microsatellites and other portions of the genome that are prone to mutations.
- an MSI score can be determined based on a predetermined set of repetitive loci (e.g., 2000 repetitive loci, each with a minimum of 5 repeat units of mono-, di-, and trinucleotides).
- the feature selector 116 may determine lengths of repetitive sequences corresponding to the loci. If an example locus among the loci corresponds to a predetermined repeat length, the locus is considered to be “unstable.”
- the MSI score, for instance, is determined by determining an amount of the unstable loci (e.g., a fraction of the unstable loci with respect to the total number of repetitive loci evaluated).
- the MSI score is used to determine whether the subject 102 and/or lesion 104 is MSI-High (MSI-H). For example, MSI-H status may be applicable if the MSI score is greater than a threshold (e.g., 0.5%).
- a threshold e.g. 0.8%
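The MSI score and MSI-H determination described above reduce to a fraction and a threshold comparison, sketched below. The sketch assumes that each evaluated locus has already been called stable or unstable and is supplied as a boolean. The function names are hypothetical, and the default threshold of 0.005 simply mirrors the 0.5% example given above.

```python
def msi_score(unstable_flags):
    """Return the fraction of evaluated repetitive loci called unstable.

    unstable_flags: iterable of booleans, one per evaluated locus.
    """
    flags = list(unstable_flags)
    if not flags:
        raise ValueError("no repetitive loci were evaluated")
    return sum(1 for flag in flags if flag) / len(flags)


def is_msi_high(score, threshold=0.005):
    """Return True when the MSI score is greater than the threshold.

    The default of 0.005 corresponds to the 0.5% example; other thresholds
    can be passed explicitly.
    """
    return score > threshold
```

For example, 20 unstable loci among 2000 evaluated loci gives a score of 0.01, which exceeds the 0.5% example threshold.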
- the features 118 include a mutational signature.
- a mutational signature can represent an amount and/or identity of mutations (e.g., insertions, deletions, double-base substitutions, single-base substitutions, or any combination thereof) indicated in the nucleic acid molecules 108 from the subject 102.
- the mutational signature indicates an amount (e.g., number or percentage) of individual classes of base substitutions present in the nucleic acid molecules 108.
- the classes include single-base substitutions including C>A, C>G, C>T, T>A, T>C, and T>G.
- a mutational signature can be derived by comparing the sequences indicated in the sequence read data 114 to at least one reference sequence, such as a reference genome.
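A minimal sketch of tallying the six single-base substitution classes listed above is shown below. It assumes that substitutions have already been called against a reference sequence and are supplied as `(reference_base, alternate_base)` pairs. Substitutions whose reference base is a purine are reported from the complementary strand so that every event falls into one of the six pyrimidine-based classes, which is the usual convention for these class names. The function names are hypothetical.

```python
from collections import Counter

SBS_CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_class(ref, alt):
    """Return the pyrimidine-based class of one single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in _COMPLEMENT or alt not in _COMPLEMENT or ref == alt:
        raise ValueError("not a single-base substitution: %s>%s" % (ref, alt))
    if ref in ("A", "G"):
        # Report purine-reference events from the complementary strand.
        ref, alt = _COMPLEMENT[ref], _COMPLEMENT[alt]
    return ref + ">" + alt


def sbs_spectrum(substitutions):
    """Return the fraction of substitutions falling into each of the six classes."""
    counts = Counter(sbs_class(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in SBS_CLASSES}
```

The resulting six-value spectrum is one simple representation of the amount of individual classes of base substitutions present in a sample.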
- the features 118 may include a Catalogue Of Somatic Mutations In Cancer (COSMIC) mutational signature, such as a COSMIC indel signature.
- the features 118 include a single-base substitution signature.
- the features 118 include a tumor mutational burden (TMB) score.
- TMB score refers to the number of somatic mutations in a tumor’s genome and/or the number of somatic mutations per area of the tumor's genome.
- TMB refers to the number of somatic mutations per megabase (Mb) of DNA sequenced.
- germline (inherited) variants are excluded when determining TMB, given that the immune system has a higher likelihood of recognizing these as self. In various cases, driver mutations are excluded from a TMB calculation.
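The TMB definition above (somatic mutations per megabase sequenced, with germline variants and, in some cases, driver mutations excluded) can be sketched as follows. The variant records are assumed to be dictionaries carrying boolean `germline` and `driver` keys. This schema and the function name are hypothetical and chosen only for illustration.

```python
def tmb_per_megabase(variants, sequenced_bases, exclude_drivers=True):
    """Return somatic mutations per megabase (Mb) of DNA sequenced.

    variants:        iterable of dicts with optional boolean keys
                     'germline' and 'driver' (hypothetical schema).
    sequenced_bases: number of bases covered by the assay.
    exclude_drivers: when True, variants flagged as drivers are not counted.
    """
    if sequenced_bases <= 0:
        raise ValueError("sequenced_bases must be positive")
    counted = sum(
        1
        for v in variants
        if not v.get("germline", False)
        and not (exclude_drivers and v.get("driver", False))
    )
    return counted / (sequenced_bases / 1_000_000)
```

Keeping the driver exclusion as a flag reflects that the description treats it as optional.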
- the features 118 include the presence, amount, type, or any combination thereof, of one or more hotspot mutations.
- Hotspots can refer to loci in the genome of the subject 102 and/or the lesion 104 that are prone to mutation. Examples of hotspots include CpG islands, microsatellites, centromeric DNA, telomeres, subtelomeric regions, common fragile sites, palindromic AT-rich repeats (PATRRs), G-quadruplexes, R-loops, and the like.
- Hotspot mutations can give rise to oncological outcomes.
- PhyloP, SIFT, Grantham, COSMIC and PolyPhen-2 are in silico tools that can be used to assess pathogenicity of identified variants.
- Exemplary hotspot genes and mutations include EGFR exon 19 activating mutation, EGFR exon 19 deletion, EGFR exon 19 insertion, EGFR exon 19 sensitizing mutation, EGFR exon 20 activation mutation, EGFR exon 20 insertion, EGFR G719 mutation, EGFR L858R mutation, EGFR L861 mutation, EGFR S768 mutation, EGFR T790M mutation, C797 mutation, KIT activating mutation, KRAS activating mutation, MET activating mutation, NRAS activating mutation, PMS2 promoter mutations, among many others.
- Hotspot mutations also occur in the following genes: AKT2, BRCA1, BRCA2, ERC1, NSD1, POLH, PPM1G, PTEN, RAD18, RAD51, RAD51B, RB1, TERT, TP53, TP53BP1, ALK, ARMT1, ATAD5, ATG7, ATIC, AXL, BIRC6, BRD3, BRD4, CAPRIN1, CCAR2, CCDC6, CDK5RAP2, CHD9, GIT, CTNNB1, CUL1, EBF1, EIF3E, HIP1, HMGA2, IRF2BP2, NOTCH1, NOTCH4, NPM1, OFD1, TACC1, TACC3, TERF2, TMEM106B, UBE2L3, USP10, WRDR48, YAP1, ZEB2, and ZMYND8.
- the features 118 include the presence, amount, type, or any combination thereof, of one or more aneuploid events.
- the features 118 may indicate whether the subject 102 and/or the lesion 104 includes one or more extra chromosomes (e.g., more than 23 pairs of chromosomes) or one or more missing chromosomes (e.g., fewer than 23 pairs of chromosomes).
- the features 118 include additional biomarker data.
- features 118 may include data indicating at least one of a histological and/or immunohistological image of the sample 106 or another sample of the lesion 104, a genomic alteration, or a viral status of the subject 102 and/or lesion 104.
- the additional biomarker data may be generated based on the sample 106, medical images, or other samples obtained from the subject 102.
- the additional biomarker data includes an image of a stained section of the lesion 104. For instance, the stained section is stained with hematoxylin and eosin (H&E) and/or at least one immunostain.
- a predictive model 120 is configured to generate a prognostic classification 122 based on the features 118.
- the predictive model 120 may include one or more mathematical and/or computer-based models that are configured to predict the prognostic classification 122 based on the features 118.
- the predictive model 120 may include a regression model, threshold rule, confidence interval, or other type of statistical model capable of categorizing the cancer based on the features 118.
- the predictive model 120 includes at least one classifier configured to generate the prognostic classification 122 based on the features 118.
- the predictive model 120 includes at least one trained ML model configured to output the prognostic classification 122 in response to receiving the features 118 in input data.
- parameters of the ML model(s) may have been previously optimized based on training data including features of individuals within a population omitting the subject 102.
- the ML model(s) was trained using an unsupervised or semisupervised learning technique, wherein the parameters were optimized to categorize (e.g., cluster) the features of the population.
- the ML model(s) was trained using a supervised learning technique, wherein the training data further included ground truth prognostic classifications of the individuals in the population, such that the parameters were optimized to minimize a loss between predicted prognostic classifications generated by the ML model(s) based on the features of the population and the ground truth prognostic classifications of the cancers experienced by the individuals in the population.
- the population represented by the training data may include individuals without cancer, as well as individuals with a variety of cancer types and metastasis states.
- ML models can be included in the predictive model 120, such as a neural network (e.g., a convolutional neural network (CNN)), a nearest-neighbor model, a regression analysis model, a clustering model, a principal component analysis model, a gradient boosting model, a random forest, or any combination thereof.
- the predictive model 120 includes a hybrid model that includes multiple types of ML models.
- the predictive model may include a CNN and a clustering model.
- the predictive model 120 is unable to conclusively categorize the cancer of the subject 102.
- the predictive model 120 may determine that, based on the features 118, the probabilities that the cancer of the subject 102 is within predetermined prognostic classifications are all below a threshold probability.
- the predictive model 120 may output an indication that the categorization of the cancer is inconclusive.
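The inconclusive outcome described above can be sketched as a thresholding step applied to per-class probabilities. The sketch assumes the model's output is available as a mapping from classification label to probability. The function name, the `"inconclusive"` label, and the default threshold of 0.5 are assumptions made for illustration.

```python
def classify_or_inconclusive(probabilities, threshold=0.5):
    """Return the most probable classification, or 'inconclusive'.

    probabilities: dict mapping a classification label to its probability.
    threshold:     hypothetical minimum probability for a conclusive call.
    The result is 'inconclusive' when every probability is below the threshold.
    """
    if not probabilities:
        raise ValueError("no classification probabilities were provided")
    label, probability = max(probabilities.items(), key=lambda item: item[1])
    return label if probability >= threshold else "inconclusive"
```

An inconclusive result of this kind is what could prompt a report recommending a follow-up test.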
- a report generator 124 is configured to generate a report 126 based, at least in part, on the prognostic classification 122.
- the report 126, for example, includes consumable data that can inform the care provider 105 about the prognostic classification of the subject 102.
- the report 126 may indicate the results of additional analyses, such as the results of a histological study, whole transcriptome sequencing, cfRNA sequencing, whole exome sequencing, whole genome sequencing, a cancer (e.g., DNA) hotspot panel test, a DNA methylation test, a tumor mutational burden (TMB) test, a DNA fragmentation test, an RNA fragmentation test, a microsatellite instability (MSI) test, or a viral status test.
- the report 126 may include a genomic profile of the subject 102 based on various combinations of the above analyses and tests.
- the report 126 indicates that a follow-up test of the subject 102 is recommended.
- the report generator 124 may generate the report 126 to indicate that one or more additional tests (e.g., a histological study, genome sequencing, exome sequencing, additional DNA sequencing, RNA sequencing, transcriptome sequencing, etc.) should be performed in order to identify the cancer of the subject 102.
- the report 126 is output to a clinical device 128.
- the report generator 124 transmits the report 126 to the clinical device 128.
- the clinical device 128 is a computing device that is operated by, owned by, or otherwise associated with the care provider 105.
- the clinical device 128 may be a desktop computer, a laptop computer, a smart phone, or some other computing device associated with the care provider 105.
- the clinical device 128, in various cases, outputs the report 126 to the care provider 105.
- the clinical device 128 includes a display (e.g., a screen) that visually presents the report 126.
- the clinical device 128 includes a speaker that outputs a sound indicative of the report 126.
- the clinical device 128, in various cases, may output the information in the report 126 using one or more output mechanisms or devices.
- the care provider 105 may review the report 126 by interacting with the clinical device 128.
- the report 126, in various cases, may enhance the clinical decision-making of the care provider 105.
- the care provider 105 may prepare and/or administer a therapy to the subject 102 based on the report 126.
- the care provider 105 may initiate the therapy and/or refer the subject 102 to another care provider to receive the therapy.
- the care provider 105 may develop a diagnosis and/or prognosis of the subject 102 based on the report 126. In various implementations, the care provider 105 may communicate information in the report 126 to the subject 102.
- FIG. 1 illustrates various elements that can be embodied in one or more computing devices.
- the sequencer 112, the feature selector 116, the predictive model 120, the report generator 124, the clinical device 128, or any combination thereof, is implemented by one or more processors in at least one computing device.
- Examples of computing devices include server computers, desktop computers, laptop computers, tablet computers, mobile phones, wearable devices, Internet of Things (IoT) devices, and the like.
- instructions for performing at least a portion of the functions of these elements are stored in memory and/or in a non-transitory computer readable medium. The instructions, for instance, are executed by the processor(s).
- FIG. 1 also illustrates various types of data.
- the sequence read data 114, the features 118, the prognostic classification 122, the report 126, or any combination thereof includes data.
- the various types of data illustrated in FIG. 1 may be stored, such as in memory or in non-transitory computer readable media.
- at least a portion of the data is transmitted or otherwise output by one or more computing devices.
- a computing device may transmit one or more communication signals to another computing device, wherein the communication signal(s) encode at least a portion of the data.
- Examples of communication signals include electromagnetic signals, optical signals, ultrasonic signals, and electrical signals.
- communication signals can be transmitted wirelessly and/or in a wired fashion.
- the communication signals are transmitted over one or more wireless channels and/or one or more wired channels (e.g., optical cabling, electrical cabling, etc.).
- the communication signal(s) are transmitted over one or more communication networks.
- a communication network, for instance, may be defined according to one or more physical channels, such as one or more frequency spectra.
- a communication network is defined according to one or more communication protocols and/or standards.
- Examples of communication networks include fiber optic networks, Institute of Electrical and Electronics Engineers (IEEE) networks (e.g., WI-FI™ networks, WiMAX networks, BLUETOOTH™ networks, etc.), cellular networks (e.g., a 3rd Generation Partnership Project (3GPP) radio network, such as a Long Term Evolution (LTE) network or a New Radio (NR) network; or a cellular core network such as a 3rd Generation (3G) core, a 4th Generation (4G) core, a 5th Generation (5G) core, etc.), ultrasonic networks, and the like.
- the data is broadcast from one device to multiple other devices.
- Endometrial cancer is classified into four potential prognostic classifications, as defined by The Cancer Genome Atlas (TCGA) Consortium (see, e.g., Levine et al., Nature 497, 67-73 (2013)).
- Each of the four prognostic classifications may be associated with effective therapies, likelihood of recurrence, survivability, and other prognostic characteristics.
- the prognostic classifications include (1) a first group defined by POLE exonuclease deficiency, resulting in ultramutation-level TMB (also referred to as “POLE ultra-mutated”); (2) a second group defined by MMR deficiency, with high MSI fraction (e.g., an MSI that is above a threshold fraction), resulting in numerous characteristic mutations (also referred to as “microsatellite instability hypermutated”); (3) a third group defined by a lack of extensive copy number alterations (also referred to as “copy number low”); and (4) a fourth group defined by a substantial number of copy number alterations, associated TP53 mutations, and, frequently, serous-like histological features.
- cancers associated with the first and second groups are responsive to known treatments, such as chemotherapies.
- Cancers associated with the fourth group are often recurrent, resistant to known treatments, and have a high likelihood of metastasis.
- the third group is a “catch-all” group that includes cancers that are not clearly delineated into the first group, the second group, and the fourth group. Due to the disparate risks for morbidity and mortality associated with different types of endometrial cancer, it is highly clinically relevant to determine a patient’s prognostic classification quickly after an initial diagnosis of endometrial cancer.
- the subject 102 presents to a clinic with the lesion 104, which is an endometrial tumor.
- the care provider 105 (e.g., an oncologist) would order a tissue biopsy of the lesion 104.
- the care provider 105 would provide a sample of the lesion 104 to a laboratory for whole-genome testing, and also to a pathologist who would perform a histological study on the sample.
- the pathologist could examine the sample in order to determine whether cells in the lesion had the serous-like histological features associated with the fourth group.
- the pathologist could be located remotely from the clinic in which the care provider 105 works. Due to the chemical processes used to perform histological studies, backlogs, transport of the sample, tissue biopsy scheduling, and other sources of delays, it could take the subject 102 and the care provider 105 weeks to determine which prognostic classification was appropriate for the lesion 104.
- the care provider 105 extracts the sample 106 from the subject 102 and provides it to the sequencer 112 for sequencing.
- the sample 106 is a smaller and/or less invasive sample than would be necessary to perform a full histological analysis.
- the feature selector 116 analyzes the sequence read data 114 in order to identify one or more features 118 that are pertinent to classifying the lesion 104 by the predictive model 120.
- the predictive model 120 is configured to make a highly accurate prognostic classification using previously unidentified techniques.
- Some of the features 118 may include the genetic features described, for instance, in Levine et al., Nature 497, 67-73 (2012). However, some of the features 118 may include characteristics of the sequences of the nucleic acid molecules 108 that are not identified in previous publications. For instance, a particular MMRD probability score or copy number state may be highly relevant to whether the lesion 104 is associated with one of the specific prognostic classifications of endometrial cancer that are described above.
- the features 118 may be relevant to additional prognostic classifications that have not been previously reported.
- the third “catch-all” group of endometrial cancer could be representative of multiple prognostic classifications that have not yet been identified.
- the predictive model 120 is able to differentiate between distinct prognostic groups that would have been classified in the single, third group using previous techniques.
- the feature selector 116 specifically identifies the MMRD probability score and copy number state of the lesion 104 by analyzing the sequence read data 114.
- the MMRD probability score and copy number state (as well as, optionally, other features 118) are input into the predictive model 120.
- the predictive model 120 outputs four metrics, indicating likelihoods that the lesion 104 is associated with the respective prognostic classifications defined by TCGA.
- the predictive model 120 may determine that there is a 5% likelihood that the lesion 104 is associated with the first group, a 7% likelihood that the lesion 104 is associated with the second group, a 98% likelihood that the lesion 104 is associated with the third group, and a 2% likelihood that the lesion 104 is associated with the fourth group.
- the predictive model 120 indicates the likelihoods of the groups in the prognostic classification 122.
- the predictive model 120 indicates, in the prognostic classification, that the lesion 104 is predicted to be associated with the third group, upon determining that the likelihood of the third group is greater than the likelihoods of the other groups and/or determining that the likelihood of the third group is above a predetermined threshold (e.g., 95%).
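The selection rule described above can be sketched as follows. This is a minimal illustration only: the group labels and the function name `call_classification` are hypothetical, and the sketch assumes the stricter reading in which the top-ranked group must also meet the threshold (the text permits either condition alone).

```python
from typing import Dict, Optional

# Hypothetical labels for the four TCGA-defined groups described above.
GROUPS = ("group_1_POLE", "group_2_MSI", "group_3_CN_low", "group_4_CN_high")


def call_classification(likelihoods: Dict[str, float],
                        threshold: float = 0.95) -> Optional[str]:
    """Return the group with the highest likelihood, provided that likelihood
    also meets the threshold; otherwise return None (no confident call)."""
    best_group = max(likelihoods, key=likelihoods.get)
    if likelihoods[best_group] >= threshold:
        return best_group
    return None


# Likelihoods from the example in the text (5%, 7%, 98%, 2%).
example = dict(zip(GROUPS, (0.05, 0.07, 0.98, 0.02)))
predicted = call_classification(example)
```

Returning `None` when no group clears the threshold leaves room for a downstream step, such as a recommended follow-up test, rather than forcing a low-confidence call.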
- the predictive model 120 is able to differentiate the third group into two sub-classifications.
- the predictive model 120 may include a clustering model that was trained in an unsupervised fashion to identify predictive attributes associated with the two sub-classifications, which may be referred to as group 3A and group 3B.
- cancers associated with group 3A are highly responsive to a predetermined immunotherapy, whereas cancers associated with group 3B are highly resistant to the predetermined immunotherapy.
- the predictive model 120 may further input the same features 118, or possibly different features 118, into the clustering model. For instance, the predictive model 120 defines the features 118 in a feature space that includes two clusters: one cluster associated with group 3A and another cluster associated with group 3B. In particular, the predictive model 120 determines that the features 118 are within a threshold distance of a center of the cluster associated with group 3A. Therefore, the predictive model 120 may output, in the prognostic classification 122, an indication that the lesion 104 is predicted to be associated with group 3A.
- a threshold (e.g., a 95% likelihood of being associated with the third group as defined by TCGA)
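The threshold-distance test for groups 3A and 3B might look like the sketch below. The cluster centers, the two-dimensional feature space, and the distance cutoff are all assumed values chosen for illustration; the text does not specify them, and a trained clustering model would supply its own.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical cluster centers in a two-dimensional feature space.
CENTERS: Dict[str, Sequence[float]] = {
    "group_3A": (0.2, 0.3),
    "group_3B": (0.8, 0.7),
}


def assign_subgroup(features: Sequence[float],
                    centers: Dict[str, Sequence[float]] = CENTERS,
                    max_distance: float = 0.25) -> Optional[str]:
    """Return the sub-group whose center is nearest to the features, if that
    center lies within max_distance; otherwise return None."""
    nearest = min(centers, key=lambda name: math.dist(features, centers[name]))
    if math.dist(features, centers[nearest]) <= max_distance:
        return nearest
    return None


subgroup = assign_subgroup((0.25, 0.35))
```

A point that falls outside the cutoff for both centers is left unassigned in this sketch, which is one possible way to avoid reporting a sub-classification that the features do not support.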
- the care provider 105 reviews the report 126 indicating the prognostic classification 122.
- the care provider 105 may be able to inform the subject 102 of their prognosis in view of the predicted prognostic classification 122.
- the care provider 105 may recommend, prepare, administer, or any combination thereof, the immunotherapy to the subject 102 in view of the prediction that the lesion 104 is associated with group 3A.
- FIG. 2 illustrates an example environment 200 for training and utilizing a predictive model 202 to determine a prognostic classification of a cancer.
- the predictive model 202 includes the predictive model 120 described above with reference to FIG. 1.
- the predictive model 202 includes a classifier 204, which may include one or more ML models.
- a trainer 206 is configured to optimize various parameters 208 of the classifier 204 based on training data 210.
- the training data 210 includes example features 212 and example prognostic classifications 214.
- the example features 212, in various cases, are obtained based on nucleic acids obtained from individuals within a population 216.
- the example prognostic classifications 214 may include categorizations of pathologies (e.g., cancers) experienced by the individuals within the population 216.
- the example prognostic classifications 214 may be generated based on a combination of genetic analysis and immunohistochemistry studies.
- the classifier 204 includes one or more model types.
- the classifier 204 includes an artificial neural network.
- An artificial neural network includes various layers that respectively process input data.
- an artificial neural network includes an input layer, one or more hidden layers, and an output layer.
- the input layer performs a pre-processing operation on the input data.
- the hidden layer(s) may perform various processing operations on the output from the input layer.
- the output layer processes the output from the hidden layer(s).
- Each layer, in some cases, includes one or more nodes, which are defined by individual operations.
- the hidden layer(s) include nodes that are connected to each other in parallel and/or series.
- the operations performed by the layers and/or nodes within an artificial neural network included in the classifier 204 are defined according to the parameters 208.
- the parameters 208 may include weights, thresholds, filters, kernels, or other data objects that are utilized to perform operations of the classifier 204.
- the classifier 204 includes a nearest-neighbor model.
- a nearest-neighbor model includes a k-nearest neighbor model.
- a nearest-neighbor model defines various “neighbors,” which are points within a feature space, with associated class labels.
- When a new data point is mapped to the feature space, the new data point is classified based on the proximity (e.g., Euclidean distance, Manhattan distance, Minkowski distance, etc.) of its “neighbors” to the new data point as well as their associated classes. In some cases, the new data point is classified as belonging to a particular class if greater than a threshold number of neighbors within a threshold distance of the new data point are members of the class. For instance, the parameters 208 may include k (i.e., the number of neighbors compared to the new data point), the threshold distance, and so on.
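A nearest-neighbor classification of this kind can be sketched in a few lines. The function name, the default values of `k`, `max_distance`, and `min_votes`, and the toy neighbors are assumptions for illustration; the sketch uses Euclidean distance and accepts a class when at least `min_votes` in-range neighbors agree.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Neighbor = Tuple[Sequence[float], str]  # (point in feature space, class label)


def knn_classify(point: Sequence[float],
                 neighbors: List[Neighbor],
                 k: int = 3,
                 max_distance: float = 1.0,
                 min_votes: int = 2) -> Optional[str]:
    """Classify `point` from its k nearest labeled neighbors.

    Only neighbors within max_distance (Euclidean) are counted, and a class is
    returned only if at least min_votes of them share that class.
    """
    ranked = sorted(neighbors, key=lambda n: math.dist(point, n[0]))[:k]
    votes = Counter(label for coords, label in ranked
                    if math.dist(point, coords) <= max_distance)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else None


labeled = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((1.0, 1.0), "B"),
           ((0.9, 1.1), "B"), ((0.2, 0.1), "A")]
result = knn_classify((0.1, 0.1), labeled)
```

Swapping `math.dist` for a Manhattan or Minkowski distance function would change only the two distance calls.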
- the classifier 204 includes a regression analysis model.
- the regression analysis model, for example, is defined by a regression function that defines relationships between one or more independent variables and one or more dependent variables.
- the regression function may further define one or more unknown parameters that define a relationship between the independent and dependent variables.
- the unknown parameters and/or the type of regression function (e.g., linear, quadratic, etc.)
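For the linear case, the unknown parameters of a regression function can be estimated by ordinary least squares. The sketch below is a generic single-variable example with made-up data; it is not drawn from the description and only illustrates how a slope and intercept relate an independent variable to a dependent variable.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```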
- the classifier 204 includes a clustering model.
- a clustering model maps various data points (e.g., training data) to a feature space. Based on the proximity of groups of those data points in the feature space, one or more “clusters” are defined. An additional data point may be classified according to one or more of the clusters based on its proximity to the clusters (e.g., a center of the clusters, a boundary of the cluster, etc.). Examples of clustering models include k-means clustering, mean-shift clustering, expectation-maximization (EM) clustering, and agglomerative hierarchical clustering.
- the parameter(s) 208, for example, include a threshold proximity within which a new data point is classified within a cluster, a density of points used to define a cluster, and the like.
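Of the clustering models listed, k-means is the simplest to illustrate. The following is a minimal sketch of Lloyd's algorithm with caller-supplied initial centers; the data points and starting centers are invented for the example, and a practical implementation would add convergence checks and better initialization.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], centers: List[Point],
           iterations: int = 10) -> List[Point]:
    """Refine initial cluster centers with Lloyd's algorithm."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        members: List[List[Point]] = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            members[idx].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else c
            for group, c in zip(members, centers)
        ]
    return centers


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centers = kmeans(data, centers=[(1.0, 1.0), (9.0, 9.0)])
```

The returned centers are what a later proximity test would compare a new data point against.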
- the classifier 204 includes a principal component analysis model.
- a principal component analysis defines a collection of principal components, which are unit vectors within a coordinate space, based on a data set (e.g., training data).
- the model, for example, is an orthogonal linear transformation of the data set.
- Various weights of the model, for example, are included in the parameter(s) 208.
- the classifier 204 includes a gradient boosting model.
- the gradient boosting model is defined as a collection of prediction models (e.g., decision trees) that iteratively classify observed data.
- the type of prediction model, weights in the prediction models, and the like, are defined by the parameter(s) 208.
- the classifier 204 includes a random forest.
- the random forest, for instance, includes multiple decision trees that classify data in an ensemble fashion.
- the decision trees are defined by the parameter(s) 208.
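The ensemble behavior of a random forest can be illustrated with a majority vote over several trees. The sketch below covers only the voting step, using one-split "stumps" in place of trained trees; the feature names, cutoffs, and class labels are hypothetical, and no bootstrap sampling or tree training is shown.

```python
from collections import Counter
from typing import Callable, Dict, List

Features = Dict[str, float]
Tree = Callable[[Features], str]


def stump(feature: str, cutoff: float, below: str, above: str) -> Tree:
    """Build a one-split decision tree on a single feature."""
    return lambda f: above if f[feature] > cutoff else below


def forest_predict(trees: List[Tree], features: Features) -> str:
    """Return the class chosen by the most trees (majority vote)."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical feature names and cutoffs, chosen only for illustration.
trees = [
    stump("mmrd_score", 0.5, "not_MSI", "MSI"),
    stump("fraction_unstable", 0.2, "not_MSI", "MSI"),
    stump("tmb", 10.0, "not_MSI", "MSI"),
]
vote = forest_predict(trees, {"mmrd_score": 0.9, "fraction_unstable": 0.1, "tmb": 25.0})
```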
- the trainer 206 is configured to optimize the parameters 208 based on the training data 210.
- the trainer 206 may input first example features (corresponding to a first individual among the population 216) among the example features 212 into the predictive model 202, and may receive a predicted category.
- the trainer 206 may compute a loss (e.g., determine a discrepancy) between a first example category (corresponding to the first individual) among the example prognostic classifications 214 and the predicted category. Further, the trainer 206 may alter the parameters 208 in order to minimize the loss.
- the trainer 206 optimizes the parameters 208 iteratively based on the entire set of the training data 210.
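The predict, compute-loss, and adjust-parameters cycle can be sketched with a small logistic model trained by stochastic gradient descent. The model type, learning rate, epoch count, and toy data are assumptions made for illustration; the description does not commit the trainer 206 to any particular optimizer or loss.

```python
import math
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Logistic model: probability that x belongs to the positive category."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples: List[Tuple[Sequence[float], int]],
          epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Iteratively adjust parameters to reduce cross-entropy loss."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in examples:
            # Discrepancy between predicted and example category; this is the
            # gradient of the cross-entropy loss with respect to z.
            error = predict(weights, bias, x) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Toy training data: (example features, example category as 0 or 1).
training = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.9, 0.8), 1), ((0.8, 0.9), 1)]
w, b = train(training)
```

After training, `predict(w, b, features)` gives a probability for features that were not in the training set.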
- the optimization of the parameters 208 enables the predictive model 202 to identify predictive attributes of the features 212 that are correlated to or otherwise associated with the example prognostic classifications 214.
- the predictive model 202 may determine that a particular copy number state represented in the example features 212 is highly correlated with a copy number high prognostic classification of endometrial cancer.
- the predictive model 202 may therefore classify cancers based on features outside of the example features 212 by recognizing or otherwise identifying the predictive attributes.
- the predictive model 202 may be ready to classify a new set of data.
- the predictive model 202 may receive input data including features 218 of a subject.
- the features 218, for instance, may include one or more of the predictive attributes.
- the predictive model 202 may perform various operations on the input data based on the trained classifier 204 and the optimized parameters 208.
- the predictive model 202 outputs output data including one or more category indicators 220 based on the features 218.
- the category indicator(s) 220, for instance, include one or more predicted categories of a cancer experienced by the subject.
- Although FIG. 2 is primarily described as referring to supervised learning, implementations are not so limited.
- the training data 210 omits the example prognostic classifications 214 and the trainer 206 is configured to optimize the parameters 208 using the example features 212 and an unsupervised learning technique.
- FIG. 3 illustrates an example of training data 300 utilized to train one or more ML models.
- the training data 300 may be the training data 210 described above with reference to FIG. 2.
- the training data 300 may represent m samples, wherein m is a positive integer.
- the m samples are respectively obtained from m individuals within a population, although implementations are not so limited.
- multiple samples may be obtained from the same individual at different times.
- the training data 300 includes first to mth example features 302-1 to 302-m.
- the first to mth example features 302-1 to 302-m include features derived from nucleic acid molecules in the respective m samples.
- the first to mth example features 302-1 to 302-m include fragmentomic features.
- the training data 300 may further include first to mth example categories 304-1 to 304-m.
- the first to mth example categories 304-1 to 304-m, for instance, include prognostic classifications of cancers represented by the m samples.
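One possible in-memory layout for training data of this shape pairs each sample's features with its category. The class name, field names, sample identifiers, and feature values below are hypothetical and serve only to show how m samples might be organized and split for a trainer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TrainingExample:
    sample_id: str                 # hypothetical identifier for one of the m samples
    features: Dict[str, float]     # example features (cf. 302-1 to 302-m)
    category: str                  # example category (cf. 304-1 to 304-m)


def split_features_and_categories(
        data: List[TrainingExample]) -> Tuple[List[Dict[str, float]], List[str]]:
    """Separate the features from the categories while preserving sample order."""
    return [ex.features for ex in data], [ex.category for ex in data]


training_data = [
    TrainingExample("s1", {"mmrd_score": 0.91, "tmb": 42.0}, "group_2"),
    TrainingExample("s2", {"mmrd_score": 0.04, "tmb": 3.5}, "group_3"),
]
X, y = split_features_and_categories(training_data)
m = len(training_data)
```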
- FIG. 4 illustrates an example report 400 summarizing a predicted classification of a cancer of a subject.
- the report 400 is the report 126 described above with reference to FIG. 1.
- the report 400 may be displayed to a patient and/or care provider.
- the report 400 is generated based on features of a sample (e.g., a liquid biopsy sample, tissue sample, etc.) obtained from the subject.
- the report 400 includes a prognostic classification 402 of the cancer.
- the prognostic classification 402, for instance, is indicative of a prognosis of the cancer.
- the prognostic classification 402 indicates a categorization of the cancer associated with a prognosis of the subject.
- the prognostic classification 402 may indicate a survivability, a recoverability, a quality of life indicator, or other information indicative of the prognosis of the subject.
- the report 400 includes one or more therapy indicators 408.
- the therapy indicator(s) 408 convey whether the cancer is predicted to be resistant to one or more predetermined therapies and/or whether the cancer is predicted to be responsive to one or more predetermined therapies.
- the report 400 may include a trial qualification 412 of the subject.
- the trial qualification 412 indicates whether the subject is predicted to qualify for a predetermined clinical trial.
- the report 400 includes a metastasis profile 414 of the subject.
- the metastasis profile 414 indicates a likelihood that the cancer will metastasize (e.g., at a particular point in time), one or more tissues in which the cancer is predicted to metastasize, or the like.
- the report 400 includes recommended follow-up tests 416.
- the report 400 may include a recommendation to perform an additional analysis on the subject (e.g., a specialized immunohistological study), particularly in cases where the cancer cannot be categorized above a threshold certainty.
- the report 400 may include a genomic profile 418 of the subject.
- the genomic profile 418 includes or is generated based on the results of a genomic analysis of the subject.
- FIG. 5 illustrates an example process 500 for determining a prognostic classification.
- the process 500 is performed by an entity, such as a computer, at least one processor, the sequencer 112, the feature selector 116, the predictive model 120, the report generator 124, the clinical device 128, the predictive model 202, the classifier 204, the device(s) 800, or any combination thereof.
- the entity identifies data indicative of sequences of nucleic acid molecules derived from a subject.
- the data includes sequence read data associated with the nucleic acid molecules.
- the nucleic acid molecules may include RNA and/or DNA in a sample obtained from the subject.
- the subject has a lesion.
- the subject has cancer, such as endometrial cancer.
- the lesion is an endometrial tumor.
- the sample is obtained from the lesion.
- the sample includes a liquid biopsy sample and/or a tissue biopsy sample.
- the entity identifies the data by sequencing the nucleic acid molecules.
- the entity receives the data from a sequencer that sequences the nucleic acid molecules.
- the data is indicative of a full genome of the sample, an RNA transcriptome of the sample, a whole exome of the sample, or a predetermined panel of genes of the sample.
- the entity identifies features based on the data.
- the features include an MMRD probability score of the sample.
- the MMRD probability score may indicate a likelihood that one or more MMR genes in a genome of the sample are nonfunctional or otherwise deficient.
- the MMRD probability score is indicative of at least one pathogenic variant in at least one MMR gene, a methylation status of the at least one MMR gene, or a methylation status of at least one promoter associated with the at least one MMR gene.
- the MMRD probability score is indicative of a functional deficiency in at least one MMR gene, which may be related based on the presence of one or more variants in the MMR gene(s).
- the MMRD probability score is generated using a predictive model (e.g., at least one ML model).
- the entity may determine characteristics of the sequence read data that are associated with MMRD (which may include, for instance, the presence of one or more variants and/or a methylation status of an MMR gene, a promoter associated with an MMR gene, or an enhancer associated with an MMR gene), input the characteristics into the predictive model, and receive the MMRD probability score as an output of the predictive model.
- the features include a copy number state of the sample.
- the copy number state is of at least one genetic locus of the sample.
- a copy number refers to a number of copies of a sequence present at a given genetic locus of the sample.
- the copy number state, in various cases, indicates an amount and/or type of one or more copy numbers respectively associated with one or more genetic loci of the sample.
- the copy number state is generated by determining a minor allele coverage ratio and a major allele coverage ratio for multiple genetic loci indicated in the data.
- the data is representative of a genome of the sample.
- the genome indicated by the data is divided into genomic segments, such as based on the minor allele coverage ratio, the major allele coverage ratio, or a total coverage ratio.
- Input data for at least one model (e.g., at least one copy number grid model)
- the features include the presence of a pathogenic variant in one or more genes.
- the genes may include at least one of POLE, TP53, CTNNB1, L1CAM, PTEN, at least one ER gene, or at least one PR gene.
- the features include a fraction unstable score.
- the features include non-genetic biomarker data.
- the features may include an image of the sample, such as a histological image of the sample.
- the image represents a photograph of a stained portion of the sample, which may be stained with hematoxylin and eosin (H&E) and/or at least one immunostain.
- the entity determines a prognostic classification of the subject based on the features.
- the entity generates input data based on the features.
- the entity, for instance, provides the input data to a predictive model.
- the predictive model includes a classifier.
- the predictive model includes at least one ML model.
- the predictive model, in various cases, identifies the presence of one or more predictive attributes in the features. Based on the predictive attributes, the predictive model may predict the prognostic classification of the subject.
- the entity identifies data indicative of sequences of nucleic acid molecules derived from individuals in a population.
- the data includes sequence read data associated with the nucleic acid molecules.
- the nucleic acid molecules may include RNA and/or DNA in samples respectively obtained from the individuals in the population.
- at least one of the individuals has a predetermined pathology, such as cancer. In some cases, at least one of the individuals does not have the predetermined pathology.
- the population represents individuals corresponding to each prognostic classification among a group of predetermined prognostic classifications.
- the entity identifies the data by sequencing the nucleic acid molecules. In some cases, the entity receives the data from a sequencer that sequences the nucleic acid molecules. In some cases, the data is indicative of genomes of the samples, transcriptomes of the samples, exomes of the samples, or a predetermined panel of genes of the samples.
- the data includes one or more features that are relevant to prognostic classification.
- the data indicates MMRD probability scores of the samples; copy number states of nucleic acid molecules of the samples; pathogenic variants in one or more genes of the samples; fraction unstable scores of the samples; mutation signatures of the samples; TMB scores of the samples; presences of one or more hotspot mutations; presences of one or more aneuploid events; or any combination thereof.
- the data includes non-genetic biomarker data of the individuals in the population, such as histological and/or immunohistological images of the samples.
- the prognostic classifications can be determined using the data and/or additional analyses.
- the data may omit results of histological and/or immunohistological analyses of the samples, but the prognostic classifications may be determined using the results of the histological and/or immunohistological analyses.
- the prognostic classifications are determined based on outcomes of the subjects, such as after a predetermined time after the samples from the individuals were obtained.
- the prognostic classifications may be determined based on a determined effectiveness of at least one therapy administered to the individuals, a survivability (e.g., one year after diagnosis, ten years after diagnosis, or some other predetermined time period after diagnosis), a quality of life of the individuals, or any other information relevant to their prognoses.
- the entity trains a predictive model using the data and the prognostic classifications.
- the predictive model includes a classifier.
- the predictive model includes at least one ML model.
- the predictive model is defined according to various parameters.
- the predictive model uses the parameters to transform the input data into output data.
- the predictive model is trained by optimizing the parameters. For example, input data including the data identified at 602 is provided to the predictive model.
- the predictive model generates output data based on the input data.
- the output data includes, for instance, predicted prognostic classifications of the subject.
- the output data is compared to ground truth data, such as the prognostic classifications identified at 604.
- FIG. 7 illustrates an example environment 700 for sequencing various nucleic acid molecules 702.
- the nucleic acid molecules 702 include cfDNA, gDNA, cDNA, or any other type of DNA that is derived from a subject.
- the nucleic acid molecules 702, in various cases, are extracted from a sample, such as a biological sample obtained from a subject.
- the nucleic acid molecules 702 include DNA that is complementary to RNA present in the sample.
- the nucleic acid molecules 702, in various cases, are ligated with adapters 704.
- the adapters 704 are hybridized to the nucleic acid molecules 702.
- the adapters 704, for example, include additional nucleic acid molecules.
- the adapters 704 have a shorter length than the nucleic acid molecules 702 being sequenced.
- the adapters 704 include amplification primers, flow cell adapter sequences, substrate adapter sequences, or sample index sequences.
- Although FIG. 7 illustrates adapters 704 being ligated to one end of each of the nucleic acid molecules 702, implementations are not so limited.
- the adapters 704 may be ligated to both ends of each of the nucleic acid molecules 702.
- the nucleic acid molecules 702 ligated with the adapters 704 are amplified in order to generate amplified molecules 706.
- Various amplification techniques can be performed.
- the amplified molecules 706 are generated using PCR, a non-PCR amplification technique, an isothermal amplification technique, or any combination thereof.
- Amplified molecules 706 may be captured by bait molecules 710 and sequenced.
- the amplified molecules 706 are sequenced via sequencing-by-synthesis.
- fluorescently tagged deoxyribonucleotide triphosphates (dNTP) 712 are utilized to synthesize a strand that is complementary to DNA strands bound to the substrate 708.
- when a dNTP 712 is added to the strand (e.g., by an enzyme), the dNTP 712 emits an optical signal 714.
- the frequency of the optical signal 714 is dependent on the type of dNTP 712 from which the optical signal 714 is emitted.
- the individual bases within the amplified molecule 706 will block the nanopore 716, which may decrease the amount of charged solutes traveling through the nanopore 716 and consequently, the magnitude of the electrical signal detected by the electrodes 720.
- Each of the four types of bases within the amplified molecules 706 may block the nanopore 716 to a different extent. Therefore, the sequence of the nucleic acid molecules 702 can be derived by analyzing the measured electrical signal with respect to time as the amplified molecules 706 are directed through the nanopore 716.
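Deriving a sequence from the electrical signal over time can be illustrated with a deliberately simplified decoder. The per-base signal levels, the fixed number of samples per base, and the function name are all assumptions; real nanopore signals do not have a fixed dwell time per base, so this sketch shows only the idea of mapping a signal magnitude to the base that would produce it.

```python
from typing import Dict, List, Sequence

# Hypothetical mean signal level for each base; real values depend on the
# pore and measurement setup and are not given in the text.
LEVELS: Dict[str, float] = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def call_bases(signal: Sequence[float], samples_per_base: int = 3,
               levels: Dict[str, float] = LEVELS) -> str:
    """Average the signal over fixed-width windows and pick the closest level."""
    bases: List[str] = []
    for start in range(0, len(signal) - samples_per_base + 1, samples_per_base):
        window = signal[start:start + samples_per_base]
        mean = sum(window) / len(window)
        bases.append(min(levels, key=lambda b: abs(levels[b] - mean)))
    return "".join(bases)


# Toy signal trace: three windows of three samples each.
trace = [1.1, 0.9, 1.0, 3.9, 4.1, 4.0, 2.2, 1.9, 2.0]
sequence = call_bases(trace)
```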
- FIG. 8 illustrates one or more devices 800 configured to perform various operations described herein.
- the device(s) 800 include one or more processor(s) 802.
- the processor(s) 802 includes a central processing unit (CPU), a graphics processing unit (GPU), both CPU and GPU, or other processing unit or component known in the art.
- the processor(s) 802 is operably connected to memory 804.
- the memory 804 is volatile (such as random access memory (RAM)), non-volatile (such as read only memory (ROM), flash memory, etc.) or some combination of the two.
- the memory 804 stores instructions that, when executed by the processor(s) 802, cause the processor(s) 802 to perform various operations.
- the memory 804 stores methods, threads, processes, applications, objects, modules, any other sort of executable instruction, or a combination thereof.
- the memory 804 stores files, databases, or a combination thereof.
- the memory 804 includes, but is not limited to, RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory, or any other memory technology.
- the memory 804 includes one or more of CD-ROMs, digital versatile discs (DVDs), content-addressable memory (CAM), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the processor(s) 802.
- the memory 804 stores instructions that, when executed by the processor(s) 802, cause the processor(s) 802 to perform operations of the feature selector 116, the predictive model 120, and the report generator 124.
- the processor(s) 802 causes a display among the output device(s) 808 to visually output various data described herein.
- the input device(s) 806 includes one or more touch sensors.
- the output device(s) 808 includes a display screen.
- the touch sensor(s) are integrated with the display screen.
- the processor(s) 802 is operably connected to one or more transceivers 810 that transmit and/or receive data over one or more communication networks 812.
- the transceiver(s) 810 includes a network interface card (NIC), a network adapter, a local area network (LAN) adapter, or a physical, virtual, or logical address to connect to the various external devices and/or systems.
- the transceiver(s) 810 includes any sort of wireless transceivers capable of engaging in wireless communication (e.g., radio frequency (RF) communication).
- the communication network(s) 812 includes one or more wireless networks that include a 3rd Generation Partnership Project (3GPP) network, such as a Long Term Evolution (LTE) radio access network (RAN) (e.g., over one or more LTE bands), a New Radio (NR) RAN (e.g., over one or more NR bands), or a combination thereof.
- the transceiver(s) 810 includes other wireless modems, such as a modem for engaging in WIFI®, WIGIG®, WIMAX®, BLUETOOTH®, or infrared communication over the communication network(s) 812.
- the device(s) 800 may further include the sequencer 112.
- the sequencer 112 includes one or more fluidic circuits 814 configured to receive a sample 816 derived from a subject 817.
- the sequencer 112, in various cases, may be configured to generate data indicative of one or more sequences of nucleic acid molecules (e.g., DNA and/or RNA) present in the sample 816.
- the sequencer 112 introduces one or more reagents 818 to the fluidic circuit(s) 814 in order to prepare for and perform sequencing of the nucleic acid molecules.
- the sequencer 112 may include one or more sensors 820 configured to measure or otherwise detect detection signals from the fluidic circuit(s) 814, which may be indicative of the sequences of the nucleic acid molecules.
- the sensor(s) 820 may further include one or more analog-to-digital converters (ADCs).
- the sequencer 112, in various cases, outputs sequence read data to the processor(s) 802 for additional processing.
- a method for classifying cancer including: providing a plurality of nucleic acid molecules obtained from a sample from a subject; ligating one or more adapters onto one or more nucleic acid molecules from the plurality of nucleic acid molecules; amplifying the one or more ligated nucleic acid molecules from the plurality of nucleic acid molecules; capturing amplified nucleic acid molecules from the amplified nucleic acid molecules; sequencing, by a sequencer, all or a subset of the captured amplified nucleic acid molecules to obtain a plurality of sequence reads that represent the sequenced amplified nucleic acid molecules thereby generating sequence read data representing a genome of the sample; receiving, at one or more processors, sequence read data for the plurality of sequence reads; determining, using the one or more processors, features of the sample based on the plurality of sequence reads, the features including: a mismatch repair deficiency (MMRD) probability score of the sample, the MMRD probability score being indicative of at least one
- the sample is obtained from an endometrial tumor of the subject, and/or wherein the features further include at least one of: a presence of a pathogenic variant in one or more of polymerase epsilon (POLE), TP53, CTNNB1, L1CAM, PTEN, an estrogen receptor (ER) gene, or a progesterone receptor (PR) gene; a fraction unstable score; a mutation signature; a tumor mutational burden (TMB) score; a presence of one or more hotspot mutations; or a presence of one or more aneuploid events.
- the method further including: receiving, by the one or more processors, training data including training input data and training output data, the training input data including population features of endometrial tumors of a population omitting the subject, the training output data including prognostic classifications of the population; and optimizing, using the one or more processors, parameters of the ML model to generate the training output data in response to receiving the training input data, wherein inputting the input data into the ML model is performed after optimizing the parameters of the ML model.
- a method including: determining features of a sample from a subject, the features including one or more of: a MMRD probability score of the sample, the MMRD probability score being indicative of at least one of one or more pathogenic variants in at least one MMR gene, a methylation status of the at least one MMR gene, or a methylation status of one or more promoters associated with the at least one MMR gene; a copy number state of at least one genetic locus based on nucleic acid molecules of the sample; a presence of a pathogenic variant in one or more of POLE, TP53, CTNNB1, L1CAM, PTEN, an ER gene, or a PR gene; a fraction unstable score; a mutation signature; a TMB score; a presence of one or more hotspot mutations; or a presence of one or more aneuploid events; generating input data indicating the features; and generating, based on the input data and a predictive model, a prognostic classification of the subject.
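The step of "generating input data indicating the features" can be illustrated with a minimal sketch that packs the listed features into a fixed-order numeric vector. The class name `SampleFeatures`, the function `to_input_vector`, the field names, and the gene ordering are all hypothetical choices made for illustration; the source does not specify an encoding, and the mutation signature feature is left out here for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Genes named in the text; the ordering is an assumption made for this sketch.
GENES = ["POLE", "TP53", "CTNNB1", "L1CAM", "PTEN", "ER", "PR"]


@dataclass
class SampleFeatures:
    """Hypothetical container for per-sample features (not from the source)."""
    mmrd_probability: float                 # MMRD probability score, 0..1
    copy_number_state: int                  # copy number at a locus of interest
    pathogenic_variants: Dict[str, bool] = field(default_factory=dict)
    fraction_unstable: float = 0.0
    tmb: float = 0.0                        # tumor mutational burden score
    hotspot_mutation: bool = False
    aneuploid_event: bool = False


def to_input_vector(f: SampleFeatures) -> List[float]:
    """Flatten the features into a fixed-length list of floats."""
    gene_flags = [1.0 if f.pathogenic_variants.get(g, False) else 0.0 for g in GENES]
    return [
        f.mmrd_probability,
        float(f.copy_number_state),
        *gene_flags,
        f.fraction_unstable,
        f.tmb,
        float(f.hotspot_mutation),
        float(f.aneuploid_event),
    ]
```

A fixed ordering keeps the vector layout identical between training and inference, which is the only property this sketch is meant to show.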
- the one or more adapters include amplification primers, flow cell adaptor sequences, substrate adapter sequences, or sample index sequences.
- the one or more bait molecules include one or more additional nucleic acid molecules, each of the one or more additional nucleic acid molecules including a region that is complementary to a region of a captured nucleic acid molecule.
- amplifying the one or more ligated nucleic acid molecules includes performing a polymerase chain reaction (PCR) amplification technique, a non-PCR amplification technique, or an isothermal amplification technique.
- sequencing the captured nucleic acid molecules includes use of a massively parallel sequencing (MPS) technique, whole genome sequencing (WGS), whole exome sequencing, targeted sequencing, direct sequencing, or Sanger sequencing.
- sequencing the captured nucleic acid molecules includes next generation sequencing (NGS).
- sequencing the captured nucleic acid molecules includes sequencing-by-synthesis or nanopore sequencing.
- generating, using the amplified ligated molecules, the detection signals includes: synthesizing, by a polymerase using fluorescently tagged nucleotide triphosphates (NTPs), a synthesized nucleic acid molecule that is complementary to one of the amplified ligated molecules, and wherein detecting, by the at least one sensor, the detection signals includes: detecting, by at least one optical sensor, optical signals emitted by the fluorescently tagged NTPs upon binding to the synthesized nucleic acid molecule, the optical signals being indicative of at least one sequence of the nucleic acid molecules of the sample.
- generating, using the amplified ligated molecules, the detection signals includes: directing the amplified ligated molecules through a nanopore extending from a first space to a second space through a substrate, and wherein detecting, by the at least one sensor, the detection signals includes: detecting, by sensors disposed in the first space and the second space, an electrical signal over time, the electrical signal being indicative of at least one sequence of the nucleic acid molecules of the sample.
- the predetermined panel includes one or more of TP53, PTEN, POLE, MKI67, FAT3, TAF1, ZFHX3, RPL22, SPTA1, FAM135B, CSMD3, GIGYF2, CSDE1, MLL4, ATR, CTNNB1, USH2A, LIMCH1, RRN3P2, FBXW7, CDH19, USP9X, COL11A1, BCOR, ARID1A, ZNF770, ARID5B, SLC9A11, KRAS, PNN, INPP4A, CTCF, CHD4, AMY2B, RBMX, PPP2R1A, TNFAIP6, PIK3R1, SGK1, HOXA7, METTL14, HPD, MIR1277, CCND1, MECOM, NFE2L2, or ESR1.
- the predetermined panel includes one or more of ABL1, ACVR1B, AKT1, AKT2, AKT3, ALK, ALOX12B, AMER1, APC, AR, ARAF, ARFRP1, ARID1A, ASXL1, ATM, ATR, ATRX, AURKA, AURKB, AXIN1, AXL, BAP1, BARD1, BCL2, BCL2L1, BCL2L2, BCL6, BCOR, BCORL1, BCR, BRAF, BRCA1, BRCA2, BRD4, BRIP1, BTG1, BTG2, BTK, CALR, CARD11, CASP8, CBFB, CBL, CCND1, CCND2, CCND3, CCNE1, CD22, CD274, CD70, CD74, CD79A, CD79B, CDC73, CDH1, CDK12, CDK4, CDK6, CDK8, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CDK
- RNA includes messenger RNA, microRNA, or non-coding RNA.
- determining the features of the sample includes: determining, based on the sequence read data, a mutational profile of the sample; inputting the mutational profile into a model, wherein the model is trained using training data related to a plurality of mutational signatures; predicting one or more mutational signatures of the plurality of mutational signatures associated with the sample based on an output of the model, wherein the output of the model is associated with a dimensionality value that is less than a number of the plurality of mutational signatures, and wherein the features include the one or more mutational signatures.
- determining the features of the sample includes: determining, based on the sequence read data, the MMRD probability score being indicative of a functional deficiency in at least one mismatch repair gene, wherein the features include the MMRD probability score.
- determining, based on the sequence read data, the copy number state includes: generating, based on the sequence read data, a major allele coverage ratio and a minor allele coverage ratio; segmenting one or more nucleic acid sequences associated with the sequence read data into segments; generating copy number grid model input data including: a sum of the major allele coverage ratio and the minor allele coverage ratio; and a difference of the major allele coverage ratio and the minor allele coverage ratio; fitting copy number grid models including allowed copy number states to the copy number grid model input data; selecting a copy number grid model among the copy number grid models; and assigning the copy number state for at least a portion of the one or more nucleic acid sequences based on the selected copy number grid model.
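The grid-fitting steps above can be sketched as follows, under strong simplifying assumptions. Here each candidate "grid model" is reduced to a single hypothetical parameter, `unit` (the coverage ratio contributed by one copy), and each allowed state is a `(major, minor)` copy pair. The source does not define the grid model's parameters, so `fit_grid`, `select_grid`, and the squared-error criterion are illustrative stand-ins rather than the patented procedure.

```python
from itertools import product


def grid_inputs(major_ratio, minor_ratio):
    """Return the (sum, difference) pair used as grid model input."""
    return major_ratio + minor_ratio, major_ratio - minor_ratio


def allowed_states(max_copies=4):
    """Allowed (major, minor) copy number states, with major >= minor."""
    return [(a, b) for a, b in product(range(max_copies + 1), repeat=2) if a >= b]


def fit_grid(segments, unit, states):
    """Assign each segment to its nearest grid point; return (error, calls).

    segments: list of (major_ratio, minor_ratio), one pair per segment.
    unit: assumed coverage ratio per copy for this candidate grid.
    """
    total_error = 0.0
    calls = []
    for major_ratio, minor_ratio in segments:
        s, d = grid_inputs(major_ratio, minor_ratio)

        def sq_error(state):
            major, minor = state
            return (s - unit * (major + minor)) ** 2 + (d - unit * (major - minor)) ** 2

        best = min(states, key=sq_error)
        total_error += sq_error(best)
        calls.append(best)
    return total_error, calls


def select_grid(segments, candidate_units, max_copies=4):
    """Fit every candidate grid and keep the one with the lowest total error."""
    states = allowed_states(max_copies)
    fits = [(fit_grid(segments, unit, states), unit) for unit in candidate_units]
    (_, calls), unit = min(fits, key=lambda fit: fit[0][0])
    return unit, calls
```

Working in sum and difference coordinates separates total copy number from allelic imbalance, which is presumably why the text builds the grid input from those two quantities.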
- prognostic classification is selected from POLE ultramutated, microsatellite instability hypermutated, copy number low, or copy number high, and/or wherein the prognostic classification is selected from: a first classification associated with pathogenic mutations in POLE; a second classification associated with mismatch repair deficiency; a third classification associated with mutations in TP53; or a fourth classification associated with an absence of the pathogenic mutations in POLE, an absence of mismatch repair deficiency, and an absence of the mutations in TP53.
- the predictive model includes at least one of a neural network, a random forest, a support vector machine, an agglomerative clustering model, a hierarchical clustering model, a multiclass classifier, or a binary classifier.
- the anticancer therapy includes at least one of chemotherapy, radiation therapy, immunotherapy, a targeted therapy, or surgery.
- outputting the report includes: transmitting data indicating the report to an external device.
- the therapy includes a dosage of one or more therapeutic agents predicted to treat a pathology of the subject.
- a method including: identifying training data including features of samples of a population of individuals, the features including one or more of: MMRD probability scores of the samples; copy number states of one or more genetic loci based on nucleic acid molecules of the samples; generating input data indicating the features; pathogenic variants in one or more of POLE, TP53, CTNNB1, L1CAM, PTEN, an ER gene, or a PR gene of the samples; fraction unstable scores of the samples; mutation signatures of the samples; TMB scores of the samples; presences of one or more hotspot mutations; or presences of one or more aneuploid events; and training a predictive model to identify prognostic classifications in additional patient data by optimizing parameters of the predictive model based on the training data.
- the training data includes sequence read data of the samples, the sequence read data indicating at least one of full genomes of the samples, RNA transcriptomes of the samples, whole exomes of the samples, or a predetermined panel of genes of the samples.
- the predetermined panel includes one or more of TP53, PTEN, POLE, MKI67, FAT3, TAF1, ZFHX3, RPL22, SPTA1, FAM135B, CSMD3, GIGYF2, CSDE1, MLL4, ATR, CTNNB1, USH2A, LIMCH1, RRN3P2, FBXW7, CDH19, USP9X, COL11A1, BCOR, ARID1A, ZNF770, ARID5B, SLC9A11, KRAS, PNN, INPP4A, CTCF, CHD4, AMY2B, RBMX, PPP2R1A, TNFAIP6, PIK3R1, SGK1, HOXA7, METTL14, HPD, MIR1277, CCND1, MECOM, NFE2L2, or ESR1.
- the predictive model includes at least one of a neural network, a random forest, a support vector machine, an agglomerative clustering model, a hierarchical clustering model, a multiclass classifier, or a binary classifier.
- the training data includes training input data and training output data, the training input data including the features, the training output data including prognostic classifications of the samples.
- optimizing parameters of the predictive model based on the training data includes: identifying generated output data by inputting the training input data into the predictive model; and minimizing a loss between the generated output data and the training output data by modifying the parameters.
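The loss-minimization loop described above can be illustrated with a small binary logistic classifier trained by batch gradient descent. This is a sketch only: the source lists several model families and does not name a loss function, so the cross-entropy loss, the learning rate, the epoch count, and the function names `predict`, `loss`, and `train` are assumptions.

```python
import math


def predict(weights, bias, x):
    """Generated output: probability of the positive class for one input."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def loss(weights, bias, inputs, targets):
    """Mean cross-entropy between generated outputs and training outputs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(inputs, targets):
        p = predict(weights, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(inputs)


def train(inputs, targets, lr=0.5, epochs=1000):
    """Modify the parameters step by step to reduce the loss."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(inputs, targets):
            err = predict(weights, bias, x) - y  # d(loss)/d(z) for this sample
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

For example, `train([[0.9, 1.0], [0.8, 0.0], [0.1, 0.0], [0.2, 1.0]], [1, 1, 0, 0])` fits a toy set in which the first column plays the role of an MMRD probability score and the labels stand in for two prognostic classes.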
- optimizing parameters of the predictive model based on the training data includes: identifying clusters of the features of the samples; and defining the prognostic classifications based on the clusters.
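The clustering alternative can be sketched with a bare-bones agglomerative procedure: every sample starts as its own cluster, and the two clusters with the closest centroids are merged until the requested number remains. Centroid linkage, Euclidean distance, and the function names are assumptions chosen for brevity; the source names agglomerative and hierarchical clustering but gives no linkage rule.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def agglomerate(points, n_clusters):
    """Return one cluster label per point after bottom-up merging."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([points[i] for i in clusters[a]])
                cb = centroid([points[i] for i in clusters[b]])
                dist = math.dist(ca, cb)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

Each resulting label could then be treated as a candidate prognostic classification. In practice the features would likely need rescaling first so that no single feature dominates the distance.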
- a system including: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations including: identifying features of a sample obtained from a subject, the features including one or more of: a MMRD probability score of the sample, the MMRD probability score being indicative of at least one of one or more pathogenic variants in at least one MMR gene, a methylation status of the at least one MMR gene, or a methylation status of one or more promoters associated with the at least one MMR gene; a copy number state of one or more genetic loci based on at least one nucleic acid molecule of the sample; generating input data indicating the features; a presence of a pathogenic variant in one or more of POLE, TP53, CTNNB1, L1CAM, PTEN, an ER gene, or a PR gene; a fraction unstable score; a mutation signature; a TMB score; a presence of one or more hotspot mutations; or a presence of one or
- the predetermined panel includes one or more of TP53, PTEN, POLE, MKI67, FAT3, TAF1, ZFHX3, RPL22, SPTA1, FAM135B, CSMD3, GIGYF2, CSDE1, MLL4, ATR, CTNNB1, USH2A, LIMCH1, RRN3P2, FBXW7, CDH19, USP9X, COL11A1, BCOR, ARID1A, ZNF770, ARID5B, SLC9A11, KRAS, PNN, INPP4A, CTCF, CHD4, AMY2B, RBMX, PPP2R1A, TNFAIP6, PIK3R1, SGK1, HOXA7, METTL14, HPD, MIR1277, CCND1, MECOM, NFE2L2, or ESR1.
- identifying the features of the sample includes: determining, based on the sequence read data, a mutational profile of the sample; inputting the mutational profile into a model, wherein the model is trained using training data related to a plurality of mutational signatures; predicting one or more mutational signatures of the plurality of mutational signatures associated with the sample based on an output of the model, wherein the output of the model is associated with a dimensionality value that is less than a number of the plurality of mutational signatures, and wherein the features include the one or more mutational signatures.
- identifying the features of the sample includes: determining, based on the sequence read data, the MMRD probability score being indicative of a deficiency in at least one mismatch repair gene, wherein the features include the MMRD probability score.
- the predictive model being a first predictive model, wherein determining, based on the sequence read data, the MMRD probability score includes: generating, by extracting two or more additional features of the sequence read data, additional input data; and inputting the additional input data into a second predictive model, the second predictive model being configured to generate the MMRD probability score based on the additional input data.
- identifying the features of the sample includes: determining, based on the sequence read data, the copy number state, and wherein the features include the copy number state.
- determining, based on the sequence read data, the copy number state includes: generating, based on the sequence read data, a major allele coverage ratio and a minor allele coverage ratio; segmenting one or more nucleic acid sequences associated with the sequence read data into segments; generating copy number grid model input data including: a sum of the major allele coverage ratio and the minor allele coverage ratio; and a difference of the major allele coverage ratio and the minor allele coverage ratio; fitting copy number grid models including allowed copy number states to the copy number grid model input data; selecting a copy number grid model among the copy number grid models; and assigning the copy number state for at least a portion of the one or more nucleic acid sequences based on the selected copy number grid model.
- prognostic classification is selected from POLE ultramutated, microsatellite instability hypermutated, copy number low, or copy number high.
- prognostic classification is selected from: a first classification associated with pathogenic mutations in POLE; a second classification associated with evidence of mismatch repair deficiency; a third classification associated with mutations in TP53; or a fourth classification associated with an absence of the pathogenic mutations in POLE, an absence of mismatch repair deficiency, and an absence of the mutations in TP53.
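The four classifications listed above can be illustrated with a simple rule-based stand-in. The source generates the classification with a predictive model, not with fixed rules, so the precedence order (POLE first, then mismatch repair deficiency, then TP53), the `mmrd_threshold` cutoff of 0.5, and the returned label strings are all assumptions made for this sketch.

```python
def classify(pole_pathogenic: bool,
             mmrd_probability: float,
             tp53_mutated: bool,
             mmrd_threshold: float = 0.5) -> str:
    """Assign one of four illustrative classes using an assumed rule order."""
    if pole_pathogenic:
        return "first: pathogenic POLE mutation"
    if mmrd_probability >= mmrd_threshold:
        return "second: mismatch repair deficiency"
    if tp53_mutated:
        return "third: TP53 mutation"
    # Reached only when all three markers above are absent.
    return "fourth: none of the above"
```

Because the checks run in order, a sample with both a pathogenic POLE variant and a TP53 mutation falls into the first class under this assumed precedence.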
- the predictive model includes at least one of a neural network, a random forest, a support vector machine, an agglomerative clustering model, a hierarchical clustering model, a multiclass classifier, or a binary classifier.
- the genomic profile includes results from at least one of: a comprehensive genomic profiling test; a gene expression profiling test; a cancer hotspot panel test; a DNA methylation test; an RNA profiling test; a DNA fragmentation test; or an RNA fragmentation test.
- transceiver is configured to transmit the communication signal to an external device associated with a subject associated with the sample or a healthcare provider.
- sequencer further includes: fluorescently tagged NTPs; and a polymerase configured to generate, using the fluorescently tagged NTPs, a synthesized nucleic acid molecule that is complementary to amplified ligated molecules, the amplified ligated molecules being based on the nucleic acid molecules of the sample, and wherein the detection signals are emitted by the fluorescently tagged NTPs when they are added, by the polymerase, to the synthesized nucleic acid molecule.
- a non-transitory computer-readable medium storing instructions for performing a method including: identifying features of a sample obtained from a subject, the features including one or more of: a MMRD probability score of the sample, the MMRD probability score being indicative of at least one of one or more pathogenic variants in at least one MMR gene, a methylation status of the at least one MMR gene, or a methylation status of one or more promoters associated with the at least one MMR gene; a copy number state of one or more genetic loci based on at least one nucleic acid molecule of the sample; generating input data indicating the features; a presence of a pathogenic variant in one or more of POLE, TP53, CTNNB1, L1CAM, PTEN, an ER gene, or a PR gene; a fraction unstable score; a mutation signature; a mutational burden score; a presence of one or more hotspot mutations; or a presence of one or more aneuploid events; generating input data indicating the
- each implementation disclosed herein can comprise, consist essentially of or consist of its particular stated element, step, or component.
- the terms “include” or “including” should be interpreted to recite: “comprise, consist of, or consist essentially of.”
- the transition term “comprise” or “comprises” means has, but is not limited to, and allows for the inclusion of unspecified elements, steps, ingredients, or components, even in major amounts.
- the transition phrase “consisting essentially of” limits the scope of the implementation to the specified elements, steps, ingredients or components and to those that do not materially affect the implementation.
- the term “based on” is equivalent to “based at least partly on,” unless otherwise specified.
- a viral status test refers to a test that identifies the presence of viral RNA or DNA in a subject.
- the test can identify viral load and/or viral identity.
- the viral status test can identify the presence of viral RNA or DNA associated with the occurrence of certain cancers.
- viruses include Hepatitis B Virus (HBV) and Hepatitis C Virus (HCV), Kaposi Sarcoma-Associated Herpesvirus (KSHV), Merkel Cell Polyomavirus (MCV), Human Papillomavirus (HPV), Human Immunodeficiency Virus Type 1 (HIV-1, or HIV), Human T-Cell Lymphotropic Virus Type 1 (HTLV-1), and Epstein-Barr Virus (EBV).
- An exemplary quantitative methylation detection assay is combined bisulfite restriction analysis (COBRA), which combines bisulfite treatment with restriction analysis and uses methylation-sensitive restriction endonucleases, gel electrophoresis, and detection based on labeled hybridization probes (Xiong and Laird, Nucleic Acids Res. 1997; 25: 2532-4).
- Another exemplary detection assay is the methylation-specific polymerase chain reaction (MSPCR) for amplification of DNA segments of interest. This assay can be performed after sodium bisulfite conversion of cytosine and uses methylation-sensitive probes.
- MethyLight™ (Qiagen, Redwood City, CA)
- Ms-SNuPE is a quantitative technique for determining differences in methylation levels at CpG sites.
- Ms-SNuPE also requires bisulfite treatment to be performed first, leading to the conversion of unmethylated cytosine to uracil while methyl cytosine is unaffected.
- PCR primers specific for bisulfite converted DNA are then used to amplify the target sequence of interest.
- the amplified PCR product is isolated and used to quantitate the methylation status of the CpG site of interest.
- pyrosequencing can be used to detect marker methylation.
- Pyrosequencing is a method of DNA sequencing that relies on detection of the release of pyrophosphates as DNA is synthesized (and is therefore a “sequencing by synthesis” technique).
- a DNA sample can be incubated with sodium bisulfite, converting unmethylated cytosine to uracil. The presence of uracil will result in thymine incorporation during PCR amplification. Therefore, sequencing results that include thymine at a nucleotide position that is known to encode cytosine can be interpreted as unmethylated sites.
- cytosines present in the sequencing results indicate that the site was methylated in the original DNA sample, because methylation protects cytosine from conversion to uracil upon treatment.
- Bisulfite treatment can also be performed on control samples with known methylation patterns, to reduce or eliminate false positive results.
- Commercially available pyrosequencing machines include the PyroMark Q96 (Qiagen, Hilden, Germany). For more details on methods to use pyrosequencing for measurement of methylation, see Delaney et al. Methods Mol Biol. 2015; 1343: 249-264. Pyrosequencing is especially useful for detecting methylation in the CpG sites within genes.
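The bisulfite interpretation rule described above (thymine at a known cytosine position means unmethylated, a retained cytosine means methylated) can be sketched as a per-position comparison. The function name `call_methylation` and the assumption that the read is already aligned base-for-base to the reference on the same strand are illustrative simplifications.

```python
def call_methylation(reference: str, converted_read: str) -> dict:
    """Call methylation at each reference cytosine from a bisulfite-converted read.

    Both strings are assumed to be aligned, equal-length, same-strand sequences.
    Returns {position: "methylated" | "unmethylated" | "ambiguous"}.
    """
    if len(reference) != len(converted_read):
        raise ValueError("reference and read must have the same length")
    calls = {}
    for pos, (ref_base, read_base) in enumerate(zip(reference.upper(), converted_read.upper())):
        if ref_base != "C":
            continue  # only positions known to encode cytosine are informative
        if read_base == "C":
            calls[pos] = "methylated"      # protected from conversion
        elif read_base == "T":
            calls[pos] = "unmethylated"    # C -> U -> T after PCR
        else:
            calls[pos] = "ambiguous"       # sequencing error or true variant
    return calls
```

For the reference `ACGTCGA` and the converted read `ACGTTGA`, the cytosine at position 1 is called methylated and the one at position 4 unmethylated.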
- a protein marker is detected by contacting a sample with reagents (e.g., antibodies), generating complexes of reagent and marker(s), and detecting the complexes.
- detecting and measuring protein levels can use methods including agglutination, chemiluminescence, electro-chemiluminescence (ECL), enzyme-linked immunosorbent assays (ELISA), immunoassay, immunoblotting, immunodiffusion, immunoelectrophoresis, immunofluorescence, immunohistochemistry, immunoprecipitation, mass spectrometry, and western blot. See also, e.g., E.
- Read depth refers to the number of times that a specific genomic site is sequenced during a sequencing run.
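As a minimal illustration of this definition, read depth at a site can be computed by counting the aligned reads that cover it. The representation of reads as half-open `(start, end)` coordinate pairs and the function name `read_depth` are assumptions made for this sketch.

```python
def read_depth(reads, site):
    """Count reads whose aligned interval [start, end) covers the site."""
    return sum(1 for start, end in reads if start <= site < end)
```

With reads aligned at `(100, 250)`, `(180, 330)`, `(240, 390)`, and `(400, 550)`, the site at coordinate 245 is covered by the first three reads, so its depth is 3.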
Landscapes
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Physics & Mathematics (AREA)
- Analytical Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Wood Science & Technology (AREA)
- General Health & Medical Sciences (AREA)
- Zoology (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Genetics & Genomics (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Biology (AREA)
- Data Mining & Analysis (AREA)
- Pathology (AREA)
- Public Health (AREA)
- Epidemiology (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Primary Health Care (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Hospice & Palliative Care (AREA)
- Oncology (AREA)
- Chemical Kinetics & Catalysis (AREA)
Abstract
Techniques for determining prognostic classifications of pathologies are described. An example method includes determining features of a sample from a subject, generating input data indicating the features, and generating, based on the input data and a predictive model, a prognostic classification of the subject. The features include one or more of: an MMRD probability score of the sample, a copy number state of the sample, a presence of one or more pathogenic variants, a fraction unstable score, a mutation signature, a TMB score, a presence of a hotspot mutation, or a presence of an aneuploid event. The MMRD probability score is indicative of a pathogenic variant in at least one MMR gene, a methylation status of the at least one MMR gene, or a methylation status of one or more promoters associated with the at least one MMR gene.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363524825P | 2023-07-03 | 2023-07-03 | |
| US63/524,825 | 2023-07-03 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2025010296A2 (fr) | 2025-01-09 |
| WO2025010296A3 (fr) | 2025-04-17 |
Family
ID=94172246
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/036612 (Pending; WO2025010296A2 (fr)) | Prognostic classification based on genetic markers | 2023-07-03 | 2024-07-02 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025010296A2 (fr) |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019213478A1 (fr) * | 2018-05-04 | 2019-11-07 | Nanostring Technologies, Inc. | Gene expression assay for measuring DNA mismatch repair deficiency |
| WO2022099004A1 (fr) * | 2020-11-06 | 2022-05-12 | The General Hospital Corporation | Methods for characterizing biological samples |
| IL313476A (en) * | 2021-12-15 | 2024-08-01 | Univ Johns Hopkins | Single molecule genome- wide mutation and fragmentation profiles of cell-free dna |
2024
- 2024-07-02 WO PCT/US2024/036612 patent/WO2025010296A2/fr active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| WO2025010296A3 (fr) | 2025-04-17 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24836553; Country of ref document: EP; Kind code of ref document: A2 |