WO2025137620A1 - Methods for high quality and high accuracy methylation sequencing - Google Patents
- Publication number
- WO2025137620A1 (PCT/US2024/061535)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dna
- region
- sequencing
- modified
- nucleobases
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
Definitions
- the present disclosure provides methods and compositions related to base calling in a sequencing by synthesis reaction performed on a converted DNA molecule. Such methods are important for accurately detecting the methylation status and variants present in DNA, which, in turn, can be important for inferring information about the cells and subject from which the DNA sample is derived.
- the DNA molecule is from a subject having or suspected of having a disease or disorder, such as cancer.
- Base-resolution or single-site methylation (SSM) sequencing methods in which a base conversion procedure is used may suffer from sequencing quality and yield issues on sequencing by synthesis (SBS)-based next generation sequencing (NGS) platforms, such as Illumina NGS sequencing platforms.
- SBS: sequencing by synthesis
- NGS: next generation sequencing
- the bases sequenced at the beginning of “read 1” may be used to calibrate base calling metrics, using an assumption that the sequenced libraries are sufficiently complex (also referred to as “diverse”), i.e., that all four bases (A, C, G, and T) will be present with some expected distribution (near random) at each cycle in each read.
- Well-balanced or high complexity libraries have roughly equal proportions of all four nucleotides in each cycle throughout the sequencing run.
- Low complexity libraries, e.g., libraries that have undergone a base conversion procedure that deaminates unmethylated cytosines
- libraries that have adapters with in-line barcodes or other molecular identifiers and low complexity inserts, e.g., inserts that have undergone a base conversion procedure that deaminates unmethylated cytosines (Atty. Docket No. GH0154WO / 01228-0041-00PCT)
- libraries that are prepared by ligating adapters with conversion-resistant nucleotides followed by undergoing a base conversion procedure can have regions of low complexity and high complexity.
- Sequence complexity is important for run performance and high-quality data generation, particularly in the early (e.g., first 25) cycles of a sequencing run because this is when various base calling metrics, e.g., the clusters passing filter, phasing/pre-phasing, and color matrix corrections, may be calibrated.
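The notion of per-cycle base diversity described above can be made concrete with a small sketch. It tabulates the fraction of A, C, G, and T at each of the first cycles of a set of reads and flags cycles in which any base is scarce. The function names and the 10% threshold are illustrative assumptions, not values taken from the disclosure or from any sequencing instrument.

```python
from collections import Counter

def per_cycle_base_fractions(reads, n_cycles=25):
    """Return one dict per cycle giving the fraction of each base
    among the reads that are long enough to cover that cycle."""
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads if len(read) > cycle)
        total = sum(counts.values())
        fractions.append(
            {base: (counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return fractions

def low_diversity_cycles(fractions, min_fraction=0.10):
    """0-based indices of cycles where at least one base falls below
    min_fraction (hypothetical threshold, for illustration only)."""
    return [i for i, f in enumerate(fractions) if min(f.values()) < min_fraction]

# A balanced toy library versus the same reads with every C read as T.
balanced = ["ACGT", "CGTA", "GTAC", "TACG"]
converted = ["ATGT", "TGTA", "GTAT", "TATG"]
print(low_diversity_cycles(per_cycle_base_fractions(balanced, n_cycles=4)))
print(low_diversity_cycles(per_cycle_base_fractions(converted, n_cycles=4)))
```

In the toy data, the balanced reads show no flagged cycles, while the converted reads lack cytosine at every cycle.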
- SSM methods deaminate or otherwise convert unmethylated cytosines to uracils, which are amplified and sequenced as thymines, while methylated cytosines (or mCpG only in DM-seq) are not deaminated and are sequenced as cytosines.
- SSM methods such as bisulfite sequencing, EM-seq, and DM-seq
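As a minimal sketch of the read-out described above, the following function models conversion on a single strand: each cytosine is read as thymine unless its position is marked as methylated. This is a deliberate simplification (it ignores the opposite strand, incomplete conversion, and the DM-seq case), and the function name and arguments are hypothetical.

```python
def convert_unmethylated_c(seq, methylated_positions=frozenset()):
    """Return the sequence as it would be read after conversion:
    every C becomes T unless its 0-based index is listed as methylated."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

# Only the cytosine at index 1 is treated as methylated and survives as C.
print(convert_unmethylated_c("ACGTCCGA", {1}))
# With no methylation marks, all cytosines are read as T.
print(convert_unmethylated_c("ACGTCCGA"))
```

Because most cytosines in a typical insert are unmethylated, the converted read is left with very little C signal, which is the source of the low base diversity discussed above.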
- a common method used to reduce the loss of sequencing yield and/or quality with SSM sequencing is to increase the base diversity of the total DNA being sequenced in a run by combining an SSM library (or libraries) with another high diversity library (or libraries), such as the PhiX library, in a pool to run on the same NGS flow cell.
- the PhiX library is derived from the small, well-characterized bacteriophage genome, PhiX.
- the PhiX library has an average size of 500 bp and a balanced base composition at approximately 45% GC and approximately 55% AT. See Illumina (2023). What is the PhiX Control v3 Library and what is its function in Illumina Next Generation Sequencing?
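The effect of a spike-in on pooled base composition can be sketched as a weighted average. The sketch assumes that the share of clusters from each library equals its share of the pool, and that the approximately 45% GC and 55% AT of PhiX split evenly between the two bases of each pair; both are simplifying assumptions, and the converted-library composition below is an invented example.

```python
def pooled_base_fractions(library_fractions, spike_fractions, spike_in_share):
    """Weighted average of two base compositions.
    spike_in_share is the fraction of the pool contributed by the spike-in."""
    if not 0.0 <= spike_in_share <= 1.0:
        raise ValueError("spike_in_share must be between 0 and 1")
    return {
        base: (1.0 - spike_in_share) * library_fractions[base]
        + spike_in_share * spike_fractions[base]
        for base in "ACGT"
    }

# Assumed even split of ~45% GC and ~55% AT for the PhiX library.
phix = {"A": 0.275, "C": 0.225, "G": 0.225, "T": 0.275}
# Hypothetical fully converted library with no remaining cytosine signal.
converted_library = {"A": 0.30, "C": 0.00, "G": 0.20, "T": 0.50}

pooled = pooled_base_fractions(converted_library, phix, spike_in_share=0.20)
print(pooled)
```

Under these assumptions a 20% spike-in raises the pooled cytosine fraction only to a few percent, at the cost of a fifth of the run's capacity.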
- the present disclosure provides an improved SSM workflow for use in sequencing by synthesis-based sequencing (such as Illumina NGS).
- the workflow results in both improved base calling accuracy (e.g., of modified bases, such as methylated bases) and higher sequencing qualities and yields than theoretically achievable by current approaches (such as standard Illumina NGS).
- Embodiment 1 is a method comprising: (a) performing a sequencing by synthesis reaction on a converted DNA molecule with a sequencing by synthesis instrument, the converted DNA molecule comprising: ligated adapters; a converted region comprising one or more nucleobases that have been converted by a conversion procedure; and a resistant region comprising one or more nucleobases that are resistant to the conversion procedure; wherein the sequencing by synthesis reaction comprises extending a sequencing primer that binds to the converted DNA molecule upstream of the converted region and the resistant region; (b) calibrating one or more base calling metrics of the sequencing by synthesis instrument based [...]
- Embodiment 2 is a method comprising: (a) ligating adapters to a DNA molecule, wherein the adapters comprise a resistant region comprising one or more nucleobases that are resistant to a conversion procedure and the DNA molecule comprises one or more nucleobases that are substrates for the conversion procedure, thereby producing an adapted DNA molecule; (b) performing the conversion procedure on the adapted DNA molecule, thereby producing a converted DNA molecule comprising a converted region, the converted region comprising one or more nucleobases that have been converted by the conversion procedure; (c) performing a sequencing by synthesis reaction on the converted DNA molecule with a sequencing by synthesis instrument; wherein the sequencing by synthesis reaction comprises extending a sequencing primer that binds to the converted DNA molecule upstream of the converted region and the resistant region; (d) calibrating one or more base calling metrics of the sequencing by synthesis instrument based at least in part on data from the resistant region, thereby providing one or more calibrated base calling metrics; and (e) calling
- Embodiment 3 is a method comprising: (a) subjecting a DNA molecule comprising one or more nucleobases that are substrates for a conversion procedure to end repair, wherein the end repair comprises extending a recessed 3’ end of the DNA molecule using a DNA polymerase and deoxyribonucleotides comprising a nucleobase that is resistant to the conversion procedure, thereby generating an end-repaired DNA molecule comprising a resistant region that comprises the nucleobase resistant to the conversion procedure; (b) ligating adapters to the end-repaired DNA molecule, thereby producing an adapted DNA molecule; (c) performing the conversion procedure on the adapted DNA molecule, thereby producing a converted DNA molecule comprising a converted region, the converted region comprising one or [...]
- Embodiment 4 is the method of claim 1 or claim 3, wherein the ligating seals one or more nicks present in the end-repaired DNA.
- Embodiment 5 is the method of the immediately preceding claim, wherein the end repair is performed with a DNA polymerase which does not have 5’-3’ exonuclease activity and/or is not a strand displacing DNA polymerase.
- Embodiment 6 is the method of the immediately preceding claim, wherein the DNA polymerase is T4 DNA polymerase, T7 DNA polymerase, or Klenow fragment.
- Embodiment 7 is the method of any one of the preceding claims, wherein the adapter is a Y-shaped adapter that comprises a first strand and a second strand.
- Embodiment 8 is the method of the immediately preceding claim, wherein (a) the first strand comprises a first arm region and a first stem region; and (b) the second strand comprises a second arm region and a second stem region, wherein the second stem region is configured to anneal to the first stem region and the second arm region is configured not to anneal to the first arm region.
- Embodiment 9 is the method of the immediately preceding claim, wherein the first arm region and second arm region each comprise at least one nucleobase that is resistant to the conversion procedure.
- Embodiment 10 is the method of claim 8 or claim 9, wherein (a) (i) the number of nucleobases with a base-pairing specificity complementary to the nucleobases that are resistant to the conversion procedure in the first arm region is greater than the number of nucleobases that are resistant to the conversion procedure in the first arm region, and/or (ii) the number of nucleobases that are resistant to the conversion procedure in the first arm region is less than 25% of the number of nucleobases in the first arm region; and (b) (i) the number of nucleobases with a base-pairing specificity complementary to the nucleobases that are resistant to the conversion procedure in the second arm region is greater than the number of nucleobases that are resistant to the conversion procedure in the second arm region and/or (ii) the number of nucleobases that are resistant to the conversion procedure in the second arm region is less than 25% of the number of nucleobases in the second arm region.
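The two alternative conditions recited for each arm region can be expressed as a short check. The sketch below assumes the resistant nucleobases are modified cytosines, so that the complementary base is guanine; the function name, the arm sequences, and the positions of the resistant bases are all hypothetical.

```python
def arm_conditions(arm, resistant_positions, complementary_base="G", max_share=0.25):
    """Evaluate conditions (i) and (ii) for one arm region.
    (i)  complementary bases outnumber resistant bases;
    (ii) resistant bases make up less than max_share of the arm.
    Returns the pair (condition_i, condition_ii)."""
    if any(p < 0 or p >= len(arm) for p in resistant_positions):
        raise ValueError("resistant position outside the arm")
    n_resistant = len(set(resistant_positions))
    n_complementary = arm.count(complementary_base)
    condition_i = n_complementary > n_resistant
    condition_ii = n_resistant < max_share * len(arm)
    return condition_i, condition_ii

# Hypothetical 10-nt arm with one modified cytosine at index 6.
print(arm_conditions("GGTGAGCTGA", {6}))
# Hypothetical 5-nt arm in which three of five bases are modified cytosines.
print(arm_conditions("CCGCA", {0, 1, 3}))
```

Since the embodiment joins (i) and (ii) with "and/or", an arm would meet the limitation when either element of the returned pair is true.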
- Embodiment 11 is the method of any one of claims 8-10, wherein the first arm region is located 5’ of the first stem region and the second arm region is located 3’ of the second stem region.
- Embodiment 12 is the method of any one of the preceding claims, wherein the nucleobase that is resistant to the conversion procedure comprises a modified nucleobase.
- Embodiment 13 is the method of the immediately preceding claim, wherein the modified nucleobase comprises 4-methylcytosine (4mC), 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), N6-methyladenosine (6mA), bromodeoxyuridine (BrdU), 8-oxoguanine (8oxoG), 5-pyrrolo cytosine, 5-glucosylhydroxymethylcytosine (5-ghmC), 5-carboxylcytosine (5-caC), and/or 5-propynyl cytosine.
- Embodiment 14 is the method of any one of the preceding claims, wherein the nucleobase that is resistant to the conversion procedure is a modified cytosine, optionally wherein the modified cytosine is 5-methylcytosine or 5-hydroxymethylcytosine.
- Embodiment 15 is the method of any one of claims 8-14, wherein the first arm region, the second arm region, the first stem region, and/or the second stem region comprise one or more modified cytosines, optionally wherein the one or more modified cytosines are 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine (5-caC), and/or 5-propynyl cytosine.
- Embodiment 16 is the method of any one of claims 8-15, wherein at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% of the cytosines in the first arm region, the second arm region, the first stem region, and/or the second stem region are modified cytosines.
- Embodiment 17 is the method of any one of claims 8-16, wherein the first arm region, the second arm region, the first stem region, and/or the second stem region are substantially free of unmodified cytosines.
- Embodiment 18 is the method of any one of the preceding claims, wherein the resistant region is at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19, or at least about 20 nucleotides in length.
- Embodiment 19 is the method of any one of the preceding claims, wherein the resistant region is about 10-40, about 10-35, about 10-30, about 10-25, about 10-20, about 10-15, about 15-40, about 15-35, about 15-30, about 15-25, about 15-20, about 20-40, about 20-35, about 20-30, or about 20-25 nucleotides in length.
- Embodiment 20 is the method of any one of the preceding claims, wherein the resistant region is located 3’ of the converted region.
- Embodiment 21 is the method of any one of the preceding claims, wherein the resistant region comprises at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, or at least 15 nucleobases that are resistant to the conversion procedure.
- Embodiment 22 is the method of any one of the preceding claims, wherein the resistant region comprises 2-30, 2-25, 2-20, 2-15, 2-10, 2-9, 2-8, 2-7, 2-6, 2-5, 2-4, or 2-3 nucleobases that are resistant to the conversion procedure.
- Embodiment 23 is the method of any one of the preceding claims, wherein the ligation is a sticky-end ligation.
- Embodiment 24 is the method of any one of the preceding claims, wherein the one or more base calling metrics comprises clusters passing filter, phasing/pre-phasing, and/or color matrix corrections values.
- Embodiment 25 is the method of any one of the preceding claims, wherein the sequencing by synthesis reaction comprises sequencing the DNA in a manner that distinguishes the first nucleobase from the second nucleobase.
- Embodiment 26 is the method of any one of the preceding claims, wherein the sequencing by synthesis reaction comprises next generation sequencing.
- Embodiment 27 is the method of any one of the preceding claims, wherein the sequencing by synthesis reaction comprises generating a plurality of sequencing reads and mapping the plurality of sequencing reads to one or more reference sequences to generate mapped sequence reads.
- Embodiment 28 is the method of any one of the preceding claims, further comprising determining an epigenetic modification status of at least a portion of the nucleobases of the DNA.
- Embodiment 29 is the method of any one of the preceding claims, further comprising performing an A-tailing reaction.
- Embodiment 30 is the method of the immediately preceding claim, wherein the end-repair and the A-tailing reaction are performed in the same reaction mixture, optionally wherein the end-repair and the A-tailing reaction are performed in a single tube and/or optionally wherein the end-repair and the A-tailing reaction are performed without an intervening clean-up step.
- Embodiment 31 is the method of the immediately preceding claim, wherein the A-tailing is performed using a DNA polymerase that does not possess 5’-3’ exonuclease activity and/or is not a strand displacing DNA polymerase, optionally wherein the DNA polymerase is HemoKlen Taq.
- Embodiment 32 is the method of claim 29 or claim 30, wherein the A-tailing is performed using a thermostable DNA polymerase.
- Embodiment 33 is the method of claim 20 or claim 30, wherein the A-tailing is performed using Taq DNA polymerase, Tfl DNA Polymerase, Bst DNA Polymerase, Large Fragment or Tth DNA polymerase.
- Embodiment 34 is the method of claim 29, wherein the end-repair and the A-tailing reaction are performed as separate reactions, wherein a reaction clean-up step is performed after the end-repair and before the A-tailing reaction.
- Embodiment 35 is the method of claim 34, wherein the reaction clean-up step removes unincorporated dNTPs.
- Embodiment 36 is the method of claim 34 or claim 35, wherein the A-tailing is performed using a DNA polymerase that does not possess 3’-5’ exonuclease activity, optionally wherein the DNA polymerase is Klenow Fragment lacking 3'-5' exonuclease activity.
- Embodiment 37 is the method of claim 30 or claim 34, wherein the A-tailing is performed using a DNA polymerase that possesses 5’-3’ exonuclease activity and/or is a strand displacing DNA polymerase.
- Embodiment 38 is the method of any one of claims 29-37, wherein the A tailing reaction is performed at a higher temperature than the end repair, optionally wherein the end repair is performed at about 15-35°C and/or the A tailing is performed at a temperature over about 60°C, further optionally wherein the temperature over 60°C is about 60°C-75°C.
- Embodiment 39 is the method of any one of the preceding claims, wherein the conversion procedure comprises deamination of unmodified cytosines of the DNA to uracil.
- Embodiment 40 is the method of any one of the preceding claims, wherein the conversion procedure comprises contacting the DNA or a subsample thereof with a cytosine deaminase.
- Embodiment 41 is the method of the immediately preceding claim, wherein the cytosine deaminase is an APOBEC enzyme, optionally wherein the APOBEC enzyme is APOBEC3A.
- Embodiment 42 is the method of any one of the preceding claims, wherein the conversion procedure comprises enzymatic protection of one or more modified nucleobases of the DNA.
- Embodiment 43 is the method of the immediately preceding claim, wherein the enzymatic protection comprises glucosylation of the 5-hydroxymethylcytosines of the DNA, optionally wherein the glucosylation comprises contacting the DNA with beta-glucosyltransferase.
- Embodiment 44 is the method of any one of the preceding claims, wherein the conversion procedure comprises contacting the DNA or a subsample thereof with a ten-eleven translocation (TET) enzyme.
- TET: ten-eleven translocation
- Embodiment 45 is the method of any one of the preceding claims, wherein the conversion procedure comprises subjecting the DNA or a subsample thereof to a procedure that affects a first nucleobase of the DNA differently from a second nucleobase of the DNA, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity.
- Embodiment 46 is the method of the immediately preceding claim, wherein the first nucleobase is an unmodified cytosine and the second nucleobase is a modified cytosine.
- Embodiment 47 is the method of the immediately preceding claim, wherein the modified cytosine is 5-methylcytosine.
- Embodiment 48 is the method of claim 37, wherein the modified cytosine is 5-hydroxymethylcytosine.
- Embodiment 49 is the method of any one of claims 45-48, wherein the procedure that affects a first nucleobase of the DNA differently from a second nucleobase of the DNA chemically converts the first or second nucleobase such that the base pairing specificity of the converted nucleobase is altered.
- Embodiment 50 is the method of any one of claims 45-49, wherein the procedure that affects a first nucleobase of the DNA differently from a second nucleobase of the DNA is methylation-sensitive conversion.
- Embodiment 51 is the method of the immediately preceding claim, wherein the methylation-sensitive conversion is bisulfite conversion, oxidative bisulfite (Ox-BS) conversion, Tet-assisted bisulfite (TAB) conversion, APOBEC-coupled epigenetic (ACE) conversion, enzymatic methyl-seq (EM-seq), or single-enzyme 5-methylcytosine sequencing (SEM-seq) method.
- Embodiment 52 is the method of the immediately preceding claim, wherein the Tet-assisted conversion further comprises a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane.
- Embodiment 53 is the method of any one of the preceding claims, wherein the conversion procedure comprises contacting the DNA with a CpG-specific DNA methyltransferase (MTase) or a CpG-specific carboxymethyltransferase (CxMTase), a methyl donor or a carboxymethyl donor, and a cytosine deaminase.
- MTase: DNA methyltransferase
- CxMTase: CpG-specific carboxymethyltransferase
- Embodiment 54 is the method of the immediately preceding claim, wherein the cytosine deaminase is an APOBEC enzyme, optionally wherein the APOBEC enzyme is APOBEC3A.
- Embodiment 55 is the method of any one of the preceding claims, wherein the adapters comprise at least one tag.
- Embodiment 56 is the method of the immediately preceding claim, wherein the at least one tag comprises a molecular barcode.
- Embodiment 57 is the method of any one of the preceding claims, wherein the DNA is cell-free DNA.
- Embodiment 58 is the method of claim 57, wherein the cell-free DNA is in an amount between 1 ng and 500 ng.
- Embodiment 59 is the method of any one of the preceding claims, wherein the DNA is from a blood sample and/or a tissue sample.
- Embodiment 60 is the method of claim 59, wherein the blood sample is a whole blood sample, a plasma sample, a buffy coat sample, a leukapheresis sample, or a PBMC sample.
- Embodiment 61 is the method of any one of the preceding claims, wherein the DNA and/or the sample is from a subject.
- Embodiment 62 is the method of claim 61, wherein the subject is an animal.
- Embodiment 63 is the method of claim 61 or claim 62, wherein the subject is a human.
- Embodiment 64 is the method of any one of claims 59-63, wherein the blood sample is fractionated prior to enriching for at least one epigenetic target region set of DNA.
- Embodiment 65 is the method of any one of claims 61-64, wherein the subject has or is at risk of having a cancer.
- Embodiment 66 is the method of any one of claims 61-65, further comprising determining the presence or status of a cancer in the subject.
- Embodiment 67 is the method of any one of claims 61-66, further comprising determining the likelihood that the subject has an infection.
- Embodiment 68 is the method of any one of claims 61-67, further comprising determining the likelihood that the subject has a transplant rejection.
- Embodiment 69 is a Y-shaped oligonucleotide adapter comprising first and second strands, wherein: (a) the first strand comprises a first arm region and a first stem region; (b) the second strand comprises a second arm region and a second stem region, wherein the second stem region is configured to anneal to the first stem region and the second arm region is configured not to anneal to the first arm region; (c) the first arm region and second arm region each comprise one or more modified nucleobases that are resistant to a conversion procedure; and (d) (i) the number of nucleobases with a base-pairing specificity complementary to the modified nucleobases that are resistant to the conversion procedure in the first arm region is greater than the number of modified nucleobases that are resistant to the conversion procedure in the first arm region, and
- Embodiment 70 is the Y-shaped oligonucleotide adapter of the immediately preceding claim, wherein the modified nucleobase comprises 4-methylcytosine (4mC), 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), N6-methyladenosine (6mA), bromodeoxyuridine (BrdU), 8-oxoguanine (8oxoG), 5-pyrrolo cytosine, 5-glucosylhydroxymethylcytosine (5-ghmC), 5-carboxylcytosine (5-caC), and/or 5-propynyl cytosine.
- Embodiment 71 is the Y-shaped oligonucleotide adapter of claim 69 or claim 70, wherein the nucleobase that is resistant to the conversion procedure is a modified cytosine, optionally wherein the modified cytosine is 5-methylcytosine or 5-hydroxymethylcytosine.
- Embodiment 72 is a Y-shaped oligonucleotide adapter comprising first and second strands, wherein: (a) the first strand comprises a first arm region and a first stem region; (b) the second strand comprises a second arm region and a second stem region, wherein the second stem region is configured to anneal to the first stem region and the second arm region is configured not to anneal to the first arm region; (c) the first arm region and second arm region each comprise modified cytosines; and (d) (i) the number of guanines in the first arm region is greater than the number of modified cytosines in the first arm region, and/or (ii) the number of modified cytosines in the first arm region is less than 25% of the number of nucleobases in the first arm region; and (e) (i) the number of guanines in the second arm region is greater than the number of modified cytosines in the second arm region and/or (ii) the number of modified cyto
- Embodiment 73 is the Y-shaped oligonucleotide adapter of the immediately preceding claim, wherein the modified cytosine comprises 4-methylcytosine (4mC), 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), 5-pyrrolo cytosine, 5-glucosylhydroxymethylcytosine (5-ghmC), 5-carboxylcytosine (5-caC), and/or 5-propynyl cytosine.
- Embodiment 74 is the Y-shaped oligonucleotide adapter of any one of claims 71-73, wherein the modified cytosine is 5-methylcytosine.
- Embodiment 75 is the Y-shaped oligonucleotide adapter of any one of claims 71-73, wherein the modified cytosine is 5-hydroxymethylcytosine.
- Embodiment 76 is the Y-shaped oligonucleotide adapter of any one of claims 69-75, wherein the first arm region is located 5’ of the first stem region and the second arm region is located 3’ of the second stem region.
- Embodiment 77 is the Y-shaped oligonucleotide adapter of any one of claims 69-76, wherein the first arm region, the second arm region, the first stem region, and/or the second stem region comprise one or more modified cytosines, optionally wherein the one or more modified cytosines are 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine (5-caC), and/or 5-propynyl cytosine.
- Embodiment 78 is the Y-shaped oligonucleotide adapter of any one of claims 69-77, wherein at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% of the cytosines in the first arm region, the second arm region, the first stem region, and/or the second stem region are modified cytosines.
- Embodiment 79 is the Y-shaped oligonucleotide adapter of any one of claims 69-78, wherein the first arm region, the second arm region, the first stem region, and/or the second stem region are substantially free of unmodified cytosines.
- Embodiment 80 is the Y-shaped oligonucleotide adapter of any one of the preceding claims, further comprising at least one tag.
- Embodiment 81 is the Y-shaped oligonucleotide adapter of the immediately preceding claim, wherein the at least one tag comprises a molecular barcode.
- the results of the methods disclosed herein are used as an input to generate a report.
- the report may be in a paper or electronic format. For example, true copy number variation, as obtained by the methods disclosed herein, or information derived therefrom, can be displayed directly in such a report.
- FIG. 1A illustrates an exemplary standard double-stranded DNA library preparation workflow for methylation sequencing, including standard end-repair and A-tailing and epigenetic base conversion steps.
- FIG. 1B illustrates an exemplary double-stranded DNA library preparation workflow for methylation sequencing according to certain embodiments disclosed herein.
- DNA useful in the disclosed embodiments can include cell-free DNA and/or DNA collected from a sample comprising cells (such as a blood sample (e.g., a whole blood sample, a buffy coat sample, a leukapheresis sample, or a PBMC sample)).
- FIG. 2 is a schematic diagram of an example of a system suitable for use with some embodiments of the disclosure.
- FIG. 3 illustrates an exemplary standard Y-shaped adapter (left) and an exemplary “complement” Y-shaped adapter suitable for use with some embodiments of the disclosure (right).
- the standard Y-shaped adapter comprises a 5’ read1 adapter strand, a 3’ read2 adapter strand, and an exemplary molecular barcode.
- the “complement” Y-shaped adapter comprises 5’ read2 adapter strand, 3’ read1 adapter strand, and an exemplary molecular barcode.
- the complement Y-shaped adapter (right) comprises the reverse complement of the Illumina-specific sequences of a current/standard NGS adapter (left).
- sequences of each strand of the exemplary Y-shaped adapters are as follows.
- SEQ ID NO: 1 GATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCGATGTGT
- SEQ ID NO: 2 CACTGACCTCAAGTCTGCACACGAGAAGGCTAGAGCTACAC
- SEQ ID NO: 3 (GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCGATGTGT) is a 5’ read2 adapter strand of an exemplary Y-shaped adapter as disclosed herein.
- SEQ ID NO: 4 (CTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAGCTACAC) is a 3’ read1 adapter strand of an exemplary Y-shaped adapter as disclosed herein.
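The relationship between the listed strands can be checked directly from the sequences. Read left to right as printed, SEQ ID NO: 2 is the base-by-base complement of SEQ ID NO: 3 without its final T, and SEQ ID NO: 4 is the base-by-base complement of SEQ ID NO: 1 without its final T. One reading of this, which is an inference and not stated in the listing, is that the 3’ strands are written 3’ to 5’ and that the final T of each 5’ strand is a single-base overhang. The helper names below are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SEQ_ID_1 = "GATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCGATGTGT"
SEQ_ID_2 = "CACTGACCTCAAGTCTGCACACGAGAAGGCTAGAGCTACAC"
SEQ_ID_3 = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCGATGTGT"
SEQ_ID_4 = "CTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAGCTACAC"

def complement(seq):
    """Base-by-base complement, without reversing the string."""
    return seq.translate(COMPLEMENT)

def common_suffix_length(a, b):
    """Number of trailing characters shared by a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

# Drop the final T of each 5' strand before comparing.
seq3_pairs_with_seq2 = complement(SEQ_ID_3[:-1]) == SEQ_ID_2
seq1_pairs_with_seq4 = complement(SEQ_ID_1[:-1]) == SEQ_ID_4

# Bases of SEQ ID NO: 1 that would pair with SEQ ID NO: 2 in one adapter,
# counted from the insert-proximal end (the putative stem).
stem_length = common_suffix_length(SEQ_ID_1[:-1], complement(SEQ_ID_2))

print(seq3_pairs_with_seq2, seq1_pairs_with_seq4, stem_length)
```

The shared suffix gives a rough estimate of the annealed stem when SEQ ID NO: 1 and SEQ ID NO: 2 are combined in one adapter; the remaining 5’ portions do not pair, consistent with the Y shape.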
- DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS
- cell-free DNA includes DNA molecules that naturally occur in a subject in extracellular form (e.g., in blood, serum, plasma, or other bodily fluids such as lymph, cerebrospinal fluid, urine, or sputum).
- While cfDNA originally existed in a cell or cells of a large complex biological organism, e.g., a mammal, it has undergone release from the cell(s) into a fluid found in the organism, and may be obtained from a sample of the fluid without the need to perform an in vitro cell lysis step.
- “buffy coat” refers to the portion of a blood (such as whole blood) or bone marrow sample that contains all or most of the white blood cells and platelets of the sample.
- the buffy coat fraction of a sample can be prepared from the sample using centrifugation, which separates sample components by density. For example, following [...]
- the buffy coat fraction is situated between the plasma and erythrocyte (red blood cell) layers.
- the buffy coat can contain both mononuclear (e.g., T cells, B cells, NK cells, dendritic cells, and monocytes) and polymorphonuclear (e.g., granulocytes such as neutrophils and eosinophils) white blood cells.
- Such cells include, e.g., lymphocytes (T cells, B cells, and NK cells) as well as monocytes, and are isolated from blood samples (such as from a whole blood sample collected from a subject) using density gradient centrifugation.
- “partitioning” of nucleic acids, such as DNA molecules, means separating, fractionating, sorting, or enriching a sample or population of nucleic acids into a plurality of subsamples or subpopulations of nucleic acids based on one or more modifications or features that are present in different proportions in each of the plurality of subsamples or subpopulations.
- DNA fragmentation can be performed, for example, to prepare DNA (such as genomic DNA and/or DNA isolated from a sample comprising cells) for sequencing.
- a “reaction cleanup” refers to the removal of contaminants such as salts, enzymes, unincorporated dNTPs, primers, ethidium bromide, and other impurities that can interfere with downstream analysis. For example, when a reaction cleanup is performed between end repair and an A-tailing reaction, it removes unincorporated dNTPs such that the A-tailing reaction can be ...
- isolated refers to a biological component (such as a nucleic acid molecule, protein, or cell) that has been substantially separated, produced apart from, or purified away from other components (for example, other components in a sample, cell, or organism in which the component naturally occurs).
- Nucleic acid molecules, proteins, or cells that have been “isolated” include those purified using standard purification methods.
- isolated or purified does not require absolute purity; rather, it is intended as a relative term.
- an isolated biological component is one in which the biological component is more enriched in a preparation than the biological component is in its natural environment within a cell, organism, sample, or production vessel (for example, a cell culture system).
- an isolated biological component can represent at least 50%, such as at least 70%, at least 80%, at least 90%, at least 95%, or greater, of the total biological component content of the preparation.
- base pairing specificity refers to the standard DNA base (A, C, G, or T) for which a given base most preferentially pairs.
- unmodified cytosine and 5-methylcytosine have the same base pairing specificity (i.e., specificity for G), whereas uracil and cytosine have different base pairing specificity because uracil has base pairing specificity for A while cytosine has base pairing specificity for G.
- The ability of uracil to form a wobble pair with G is irrelevant because uracil nonetheless most preferentially pairs with A among the four standard DNA bases.
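- The definition of base pairing specificity above can be expressed as a lookup. This is an illustrative sketch only; the table `PAIRS_WITH` and the function `same_specificity` are hypothetical names, and the entries record the preferred partner among the four standard DNA bases as stated in the text (G for cytosine and 5-methylcytosine, A for uracil).

```python
# Preferred pairing partner among the four standard DNA bases.
PAIRS_WITH = {
    "A": "T",
    "C": "G",
    "G": "C",
    "T": "A",
    "5mC": "G",  # 5-methylcytosine pairs like unmodified cytosine
    "U": "A",    # uracil most preferentially pairs with A
}


def same_specificity(base_a: str, base_b: str) -> bool:
    """True if the two bases most preferentially pair with the same base."""
    return PAIRS_WITH[base_a] == PAIRS_WITH[base_b]


if __name__ == "__main__":
    print(same_specificity("C", "5mC"))  # same specificity (G)
    print(same_specificity("C", "U"))    # different specificity (G vs A)
```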
- Atty. Docket No. GH0154WO / 01228-0041-00PCT [000113]
- “without substantially altering base pairing specificity” of a given nucleobase means that a majority of molecules comprising that nucleobase that can be sequenced do not have alterations of the base pairing specificity of the given nucleobase relative to its base pairing specificity as it was in the originally isolated sample.
- 75%, 90%, 95%, or 99% of molecules comprising that nucleobase that can be sequenced do not have alterations of the base pairing specificity relative to its base pairing specificity as it was in the originally isolated sample.
- altered base pairing specificity of a given nucleobase means that a majority of molecules comprising that nucleobase that can be sequenced have a different base pairing specificity at that nucleobase relative to its base pairing specificity in the originally isolated sample.
- a modification or other feature is present in “a greater proportion” in a first sample or population of nucleic acid than in a second sample or population when the fraction of nucleotides with the modification or other feature is higher in the first sample or population than in the second sample or population. For example, if in a first sample, one tenth of the nucleotides are mC, and in a second sample, one twentieth of the nucleotides are mC, then the first sample comprises the cytosine modification of 5-methylation in a greater proportion than the second sample.
- a “differentially methylated region” refers to a region of DNA having a detectably different degree of methylation in at least one cell or tissue type relative to the degree of methylation in the same region of DNA from at least one other cell or tissue type; or having a detectably different degree of methylation in at least one cell or tissue type obtained from a subject having a disease or disorder relative to the degree of methylation in the same region of DNA in the same cell or tissue type obtained from a healthy subject.
- a differentially methylated region has a detectably higher degree of methylation (e.g., a hypermethylated region) in at least one cell or tissue type, such as at least one cancer cell type and/or at least one immune cell type, relative to the degree of methylation in the same region of DNA from at least one other cell or tissue type, such as other immune cell types, or from the same cell or tissue type from a healthy subject.
- a differentially methylated region has a detectably lower degree of methylation (e.g., a hypomethylated region) in at least one cell or tissue type, such as at least one cancer cell type and/or at least one immune cell type, relative to the degree of methylation in the same region of DNA from at least one other cell or tissue type, such as other immune cell types, or from the same cell or tissue type from a healthy subject.
- Tumor cells are neoplastic cells that originated from a tumor, regardless of whether they remain in the tumor or become separated from the tumor (as in the cases, e.g., of metastatic cancer cells and circulating tumor cells).
- precancer or a “precancerous condition” is an abnormality that has the potential to become cancer, wherein the potential to become cancer is greater than the potential if the abnormality were not present, i.e., if the tissue were normal.
- Examples of precancer include, but are not limited to, adenomas, hyperplasias, metaplasias, dysplasias, benign neoplasias (benign tumors), premalignant carcinoma in situ, and polyps.
- methylation refers to addition of a methyl group to a nucleotide base in a nucleic acid molecule.
- methylation refers to addition of a methyl group to a cytosine at a CpG site (cytosine-phosphate-guanine site, i.e., a cytosine followed by a guanine in the 5’→3’ direction of the nucleic acid sequence).
- DNA methylation refers to addition of a methyl group to adenine, such as in N6-methyladenine (6mA).
- DNA methylation is 5-methylation (modification of the carbon in the 5th position of the cytosine ring).
- 5-methylation refers to addition of a methyl group to the 5C position of the cytosine to create 5-methylcytosine (5mC).
- methylation comprises a derivative of 5mC. Derivatives of 5mC include, but are not limited to, 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC), and 5-carboxylcytosine (5-caC).
- DNA methylation is 3C methylation (modification of the carbon in the 3rd position of the cytosine ring).
- 3C methylation comprises addition of a methyl group to the 3C position of the cytosine to generate 3-methylcytosine (3mC).
- Methylation can also occur at non-CpG sites, for example, methylation can occur at a CpA, CpT, or CpC site.
- DNA methylation can change the activity of a methylated DNA region. For example, when DNA in a promoter region is methylated, transcription of the gene may be repressed. DNA methylation is critical for normal development, and abnormal methylation may disrupt epigenetic regulation.
- modified cytosine refers to a cytosine in which at least one position of the cytosine has been substituted with a chemical moiety, such as a methyl or hydroxymethyl, that is different from the substituent at that position in unmodified cytosine. For the avoidance of doubt, “modified cytosine” does not include unmodified cytosine.
- hypermethylation refers to an increased level or degree of methylation of nucleic acid molecule(s) relative to the other nucleic acid molecules within a population (e.g., sample) of nucleic acid molecules.
- hypermethylated DNA can include DNA molecules comprising at least 1 methylated residue, at least 2 methylated residues, at least 3 methylated residues, at least 5 methylated residues, or at least 10 methylated residues.
- hypomethylation refers to a decreased level or degree of methylation of nucleic acid molecule(s) relative to the other nucleic acid molecules within a population (e.g., sample) of nucleic acid molecules.
- hypomethylated DNA includes unmethylated DNA molecules.
- hypomethylated DNA can include DNA molecules comprising 0 methylated residues, at most 1 methylated residue, at most 2 methylated residues, at most 3 methylated residues, at most 4 methylated residues, or at most 5 methylated residues.
- “methylation status” can refer to the presence or absence of a methyl group on a DNA base (e.g., cytosine) at a particular genomic position in a nucleic acid molecule. It can also refer to the degree of methylation in a nucleic acid sequence (e.g., highly methylated, low methylated, intermediately methylated, or unmethylated nucleic acid molecules).
- the methylation status can also refer to the number of nucleotides methylated in a particular nucleic acid molecule.
- a “sequencing by synthesis reaction” refers to sequencing reactions (which are generally next-generation sequencing (NGS) reactions) that can determine the sequence of a DNA molecule by detecting the incorporation of each nucleotide into a complementary strand synthesized by a DNA polymerase. As the polymerase synthesizes a copy of a single strand of DNA, the incorporation of each nucleotide is monitored, such as by detection of fluorescently labeled nucleotides.
- next-generation sequencing refers to sequencing technologies having increased throughput as compared to traditional Sanger- and capillary electrophoresis-based approaches, for example, with the ability to generate hundreds of thousands of sequence reads at a time.
- next-generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization.
- next-generation sequencing includes the use of instruments capable of sequencing single molecules.
- clusters passing filter refers to a base calling metric that provides an indication of signal purity from each cluster in a sequencing run (e.g., an NGS run).
- Clusters passing filters are commonly used in processing NGS data, e.g., in Illumina workflows.
- a cluster is a clonal grouping of template DNA, e.g., bound to the surface of a flow cell.
- Each cluster is typically seeded by a single, template DNA strand and is clonally amplified until the cluster has approximately 1000 copies.
- Each cluster (e.g., on the flow cell) produces a single sequencing read.
- the least reliable clusters are removed, e.g., filtered (such as by a chastity filter, e.g., of Illumina NGS software), from the image extraction results.
- Illumina sequencers may perform an internal quality filtering procedure called a chastity filter.
- clusters may “pass filter” if no more than 1 base call has a chastity value below 0.6 in the first 25 cycles.
- Chastity is defined as the ratio of the brightest base intensity divided by the sum of the brightest and the second brightest base intensities.
- the “clusters passing filter” metric can then be used (such as in combination with other metrics, such as phasing/prephasing rates and/or color matrix correction values) in base calling and quality score calculations for all cycles in the run. See Illumina (2016), Optimizing Cluster Density on Illumina Sequencing Systems, available at illumina.com/content/dam/illumina-marketing/documents/products/other/miseq-overclustering-primer-770-2014-038.pdf.
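- The chastity definition and pass-filter rule described above can be sketched as follows. This is a minimal illustration, not Illumina's implementation; the function names, the input layout (one tuple of channel intensities per cycle), and the example intensities are assumptions. The threshold of 0.6, the 25-cycle window, and the allowance of at most one failing base call are taken from the text.

```python
def chastity(intensities):
    """Brightest intensity divided by the sum of the two brightest intensities."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    # Treat a cycle with no signal at all as failing (chastity 0.0).
    return top / total if total > 0 else 0.0


def passes_filter(cycles, threshold=0.6, window=25, max_failures=1):
    """True if at most `max_failures` of the first `window` cycles fall below `threshold`.

    `cycles` is a list of per-cycle intensity tuples, one value per base channel.
    """
    failures = sum(1 for c in cycles[:window] if chastity(c) < threshold)
    return failures <= max_failures


if __name__ == "__main__":
    clean = [(100.0, 5.0, 3.0, 2.0)] * 25   # one dominant channel per cycle
    mixed = (50.0, 45.0, 3.0, 2.0)          # two channels of similar intensity
    print(passes_filter(clean))
    print(passes_filter([mixed] + clean[1:]))
    print(passes_filter([mixed, mixed] + clean[2:]))
```

A cluster with a single ambiguous cycle still passes under this rule, while a cluster with two ambiguous cycles in the window does not.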
- Phasing/prephasing refers to a base calling metric that indicates the fraction of molecules that become phased or prephased per sequencing cycle. Phasing and prephasing are commonly used in processing NGS data, e.g., in Illumina workflows. Phasing and prephasing indicate the rate at which single molecules in a cluster fall behind (“phasing”) or move ahead (“prephasing”) of the current cycle during the sequencing stage, e.g., of an NGS run (such as an Illumina NGS run).
- each DNA strand in a cluster extends by one base per cycle. A small proportion of strands may become out of phase with the current cycle, either falling a base behind (phasing) or jumping a base ahead (prephasing).
- the phasing and prephasing rates define the fraction of molecules that become phased or prephased per cycle. Calculation of these rates generally requires a balanced and random base composition in sequencing cycles 2–12.
- the “phasing/prephasing” metric is then used (typically in combination with other metrics, such as clusters passing filter and/or color matrix correction values) in base calling and quality score calculations for all cycles in the run.
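- The effect of per-cycle phasing and prephasing rates can be illustrated with a simplified model. The sketch below assumes that a constant fraction of in-phase strands is lost each cycle and that out-of-phase strands never return to phase; the function name and the example rates are hypothetical and are not taken from the disclosure or from any instrument software.

```python
def in_phase_fraction(phasing_rate, prephasing_rate, cycles):
    """Fraction of strands in a cluster still in phase after `cycles` cycles.

    Simplified model: each cycle, a fraction `phasing_rate` falls one base
    behind and a fraction `prephasing_rate` jumps one base ahead.
    """
    if not 0.0 <= phasing_rate + prephasing_rate <= 1.0:
        raise ValueError("combined rate must be between 0 and 1")
    return (1.0 - phasing_rate - prephasing_rate) ** cycles


if __name__ == "__main__":
    # Hypothetical rates: 0.2% phasing and 0.1% prephasing per cycle.
    for n in (25, 100, 150):
        print(n, round(in_phase_fraction(0.002, 0.001, n), 3))
```

Under this model the in-phase fraction declines geometrically with cycle number, which is why even small per-cycle rates matter for long reads.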
- color matrix correction values refers to a base calling metric that is used to correct for cross talk between imaging channels in a sequencing analysis. Color matrix correction values are commonly used in processing NGS data, e.g., in Illumina workflows. Color matrix correction refers to a template created in the first few sequencing cycles (e.g., of an NGS analysis) that includes intensities from each imaging channel, and which is then used in all subsequent reads as well as for phasing/pre-phasing rates.
- Cross talk occurs when, for example, a cluster shows intensity in the cytosine channel and some intensity also shows in the adenine channel.
- Matrix-corrected intensities are generated with reduced or no cross talk, and differences in overall intensities between color channels are balanced.
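- Cross talk correction can be sketched for the two-channel case mentioned above (cytosine signal leaking into the adenine channel). This is an illustrative linear model only: it assumes observed intensities equal a 2x2 cross-talk matrix applied to the true intensities, and it recovers the true intensities by solving that system. The matrix values and the function name `correct_crosstalk` are hypothetical.

```python
def correct_crosstalk(observed, matrix):
    """Solve matrix * true = observed for a 2x2 cross-talk matrix.

    `matrix[i][j]` is the share of true signal j that appears in channel i.
    """
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("cross-talk matrix is singular")
    x, y = observed
    return ((d * x - b * y) / det, (a * y - c * x) / det)


if __name__ == "__main__":
    # Channels ordered (cytosine, adenine). Hypothetical leakage: 30% of the
    # cytosine signal appears in the adenine channel, 20% the other way.
    crosstalk = ((1.0, 0.2), (0.3, 1.0))
    observed = (100.0, 30.0)  # a pure cytosine signal of 100 as observed
    print(correct_crosstalk(observed, crosstalk))
```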
- the “color matrix correction values” metric is then used (typically in combination with other metrics, such as phasing/prephasing rates and/or clusters passing filter) in base calling and quality score calculations for all cycles in the run. See Illumina (2003), What is nucleotide diversity and why is it important?, available at knowledge.illumina.com/instrumentation/general/instrumentation-general-reference_material-list/000001543.
- the form of the “originally isolated” sample refers to the composition or chemical structure of a sample at the time it was isolated and before undergoing any procedure that changes the chemical structure of the isolated sample.
- a feature that is “originally present” in DNA molecules refers to a feature present in “original DNA molecules” or in DNA molecules “originally comprising” the feature before the DNA molecules undergo a procedure that changes the chemical structure of DNA molecules.
- nucleic acid tag refers to a short nucleic acid (e.g., less than about 500 nucleotides, about 100 nucleotides, about 50 nucleotides, or about 10 nucleotides in length), used to distinguish nucleic acids from different samples (e.g., representing a sample index), distinguish nucleic acids from different partitions (e.g., representing a partition tag) or different nucleic acid molecules in the same sample (e.g., representing a molecular barcode), of different types, or which have undergone different processing.
- the nucleic acid tag comprises a predetermined, fixed, non-random, random or semi-random oligonucleotide sequence. Such nucleic acid tags may be used to label different nucleic acid molecules or different nucleic acid samples or sub-samples. Nucleic acid tags can be single-stranded, double-stranded, or at least partially double-stranded. Nucleic acid tags optionally have the same length or varied lengths. Nucleic acid tags can also include double-stranded molecules having one or more blunt-ends, include 5’ or 3’ single-stranded regions (e.g., an overhang), and/or include one or more other single-stranded regions at other locations within a given molecule.
- Nucleic acid tags can be attached to one end or to both ends of the other nucleic acids (e.g., sample nucleic acids to be amplified and/or sequenced). Nucleic acid tags can be decoded to reveal information such as the sample of origin, form, or processing of a given nucleic acid. For example, nucleic acid tags can also be used to enable pooling and/or parallel processing of multiple samples comprising nucleic acids bearing different molecular barcodes and/or sample indexes in which the nucleic acids are subsequently being deconvolved by detecting (e.g., reading) the nucleic acid tags. Nucleic acid tags can also be referred to as identifiers (e.g., molecular identifier, sample identifier).
- nucleic acid tags can be used as molecular identifiers (e.g., to distinguish between different molecules or amplicons of different parent molecules in the same sample or sub-sample). This includes, for example, uniquely tagging different nucleic acid molecules in a given sample, or non-uniquely tagging such molecules.
- endogenous sequence information includes, for example, start and/or stop positions where molecules map to a selected reference genome, a sub-sequence of one or both ends of a sequence, and/or the length of a sequence.
- a sufficient number of different molecular barcodes are used such that there is a low probability (e.g., less than about a 10%, less than about a 5%, less than about a 1%, or less than about a 0.1% chance) that any two molecules may have the same endogenous sequence information (e.g., start and/or stop positions, subsequences of one or both ends of a sequence, and/or lengths) and also have the same molecular barcode.
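- The probability referred to above can be estimated with a birthday-problem calculation. The sketch below is an illustration under stated assumptions: barcodes are assigned uniformly at random, and `n_molecules` counts only molecules that already share the same endogenous sequence information. The function name and the example numbers are hypothetical.

```python
def collision_probability(n_molecules, n_barcodes):
    """Probability that at least two of `n_molecules` receive the same barcode.

    Assumes each molecule independently receives one of `n_barcodes` barcodes
    with equal probability.
    """
    if n_barcodes <= 0:
        raise ValueError("n_barcodes must be positive")
    if n_molecules > n_barcodes:
        return 1.0
    p_all_distinct = 1.0
    for i in range(n_molecules):
        p_all_distinct *= (n_barcodes - i) / n_barcodes
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Hypothetical case: 5 molecules share the same start and stop positions.
    for barcodes in (100, 1000, 10000):
        print(barcodes, round(collision_probability(5, barcodes), 4))
```

Increasing the number of distinct barcodes, or reducing the number of molecules that share endogenous sequence information, lowers the collision probability toward thresholds such as those listed above.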
- Terms such as “library adaptors having distinct molecular barcodes” encompass library adaptors for uniquely or non-uniquely tagging molecules, in that regardless of ...
- subject refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant. More specifically, a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human.
- a subject can be a healthy individual, an individual that has or is suspected of having a disease or a predisposition to the disease, or an individual in need of therapy or suspected of needing therapy.
- the terms “individual” or “patient” are intended to be interchangeable with “subject”.
- a subject can be an individual who has been diagnosed with a cancer, is going to receive a cancer therapy, and/or has received at least one cancer therapy. The subject can be in remission of a cancer.
- the subject can be an individual who is diagnosed as having an autoimmune disease.
- the subject can be a female individual who is pregnant or who is planning on getting pregnant, who may have been diagnosed with or suspected of having a disease, e.g., a cancer or an autoimmune disease.
- a “Y-shaped adapter” refers to an adapter comprising two DNA strands comprising complementary and non-complementary parts, wherein the non-complementary parts form single-stranded arms.
- the adapter can be attached to a sample or insert DNA molecule, e.g., by ligation, such that the complementary (double-stranded) part of the adapter is proximal to the sample or insert DNA molecule.
- the double-stranded portion of the Y-shaped adapter may have a blunt end or an overhang, e.g., of one to three nucleotides.
- the single-stranded arms may or may not be of identical length.
- Y-shaped adapter refers to a Y-shaped adapter for use in the disclosed methods (such as the exemplary Y-shaped adapter shown in Figure 3, left side).
- the terms “or a combination thereof” and “or combinations thereof” as used herein refer to any and all permutations and combinations of the listed terms preceding the term.
- A, B, C, or combinations thereof is intended to include at least one of: A, B, C, AB, AC, BC, or ABC, and if order is important in a particular context, also BA, CA, CB, ACB, CBA, BCA, BAC, or CAB.
- expressly included are combinations that contain repeats of one or more item or term, such as BB, AAA, AAB, BBC, AAABCCCC, CBBAAA, CABABB, and so forth.
- the “capture yield” of a collection of probes for a given target set refers to the amount (e.g., amount relative to another target set or an absolute amount) of nucleic acid corresponding to the target set that the collection of probes captures under typical conditions.
- Exemplary typical capture conditions are an incubation of the sample nucleic acid and probes at 65°C for 10-18 hours in a small reaction volume (about 20 µL) containing stringent hybridization buffer.
- the capture yield may be expressed in absolute terms or, for a plurality of collections of probes, relative terms.
- For example, if the first and second target regions are 50 kb and 500 kb, respectively (giving a normalization factor of 0.1), then the DNA corresponding to the first target region set is captured with a higher yield than DNA corresponding to the second target region set when the mass per volume concentration of the captured DNA corresponding to the first target region set is more than 0.1 times the mass per volume concentration of the captured DNA corresponding to the second target region set.
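- The size-normalized comparison in the example above can be written as a short calculation. This is a sketch of that arithmetic only; the function name, argument names, and the example concentrations are hypothetical, and concentrations may be in any consistent mass per volume unit.

```python
def first_has_higher_yield(conc_first, conc_second, size_first_kb, size_second_kb):
    """Compare capture yields of two target region sets after size normalization.

    The normalization factor is the ratio of the first set's size to the
    second set's size (50 kb / 500 kb = 0.1 in the example above).
    """
    factor = size_first_kb / size_second_kb
    return conc_first > factor * conc_second


if __name__ == "__main__":
    # Hypothetical concentrations for 50 kb and 500 kb target region sets.
    print(first_has_higher_yield(0.2, 1.0, 50, 500))   # 0.2 > 0.1 * 1.0
    print(first_has_higher_yield(0.05, 1.0, 50, 500))  # 0.05 < 0.1 * 1.0
```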
- “Capturing” one or more target nucleic acids or one or more nucleic acids comprising at least one target region refers to preferentially isolating or separating the one or more target nucleic acids or one or more nucleic acids comprising at least one target region from non-target nucleic acids or from nucleic acids that do not comprise at least one target region.
- a “captured set” of nucleic acids or “captured” nucleic acids refers to nucleic acids that have undergone capture.
- a “capture moiety” is a molecule that allows affinity separation of molecules, such as nucleic acids, linked to the capture moiety from molecules lacking the capture moiety.
- Exemplary capture moieties include biotin, which allows affinity separation by binding to streptavidin linked or linkable to a solid phase, and an oligonucleotide, which allows affinity separation through binding to a complementary oligonucleotide linked or linkable to a solid phase.
- a “target region” refers to a genomic locus targeted for identification and/or capture, for example, by using probes (e.g., through sequence complementarity).
- a “target region set” or “set of target regions” refers to a plurality of genomic loci targeted for identification and/or capture, for example, by using a set of probes (e.g., through sequence complementarity).
- “Specifically binds” in the context of a primer, a probe, or other oligonucleotide and a target sequence means that under appropriate hybridization conditions, the primer, oligonucleotide, or probe hybridizes to its target sequence, or replicates thereof, to form a stable hybrid, while at the same time formation of stable non-target hybrids is minimized.
- a primer or probe hybridizes to a target sequence or replicate thereof to a sufficiently greater extent than to a non-target sequence, to ultimately enable capture or detection of the target sequence.
- Appropriate hybridization conditions are well-known in the art, may be predicted based on sequence composition, or can be determined by using routine testing methods (see, e.g., Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989) at §§ 1.90-1.91, 7.37-7.57, 9.47-9.51 and 11.47-11.57, particularly §§ 9.50-9.51, 11.12-11.13, 11.45-11.47 and 11.55-11.57, incorporated by reference herein).
- sequence-variable target regions refer to target regions that may exhibit changes in sequence such as nucleotide substitutions (i.e., single nucleotide variations), insertions, deletions, or gene fusions or transpositions in neoplastic cells (e.g., tumor cells and cancer cells) relative to normal cells.
- a sequence-variable target region set is a set of sequence-variable target regions.
- the sequence-variable target regions are target regions that may exhibit changes that affect less than or equal to 50 contiguous nucleotides, e.g., less than or equal to 40, 30, 20, 10, 5, 4, 3, 2, or 1 nucleotides.
- Epigenetic target regions refer to target regions that may show sequence-independent differences in different cell or tissue types (e.g., different types of immune cells) or in neoplastic cells (e.g., tumor cells and cancer cells) relative to normal cells; or that may show sequence-independent differences (i.e., in which there is no change to the nucleotide sequence, e.g., differences in methylation, nucleosome distribution, or other epigenetic features) in DNA, e.g., from different cell types or from subjects having cancer relative to DNA from healthy subjects.
- sequence-independent changes include, but are not limited to, changes in methylation (increases or decreases), nucleosome distribution, fragmentation patterns, CCCTC-binding factor (“CTCF”) binding, transcription start sites (e.g., with respect to any one or more of binding of RNA polymerase components, binding of regulatory proteins, fragmentation characteristics, and nucleosomal distribution), and regulatory protein binding regions.
- Epigenetic target region sets thus include, but are not limited to, hypermethylation variable target region sets, hypomethylation variable target region sets, and fragmentation variable target region sets, such as CTCF binding sites and transcription start sites.
- loci susceptible to neoplasia-, tumor-, or cancer-associated focal amplifications and/or gene fusions may also be included in an epigenetic target region set because detection of a change in copy number by sequencing or a fused sequence that maps to more than one locus in a reference genome tends to be more similar to detection of exemplary epigenetic changes discussed above than detection of nucleotide substitutions, insertions, or deletions, e.g., in that the focal amplifications and/or gene fusions can be detected at a relatively shallow depth of sequencing because their detection does not depend on the accuracy of base calls at one or a few individual positions.
- An epigenetic target region set is a set of epigenetic target regions.
- an “agent that recognizes a modified nucleobase in DNA,” such as an “agent that recognizes a modified cytosine in DNA,” refers to a molecule or reagent that binds to or detects one or more modified nucleobases in DNA, such as methyl cytosine.
- a “modified nucleobase” is a nucleobase that comprises a difference in chemical structure from an unmodified nucleobase. In the case of DNA, an unmodified nucleobase is adenine, cytosine, guanine, or thymine. In some embodiments, a modified nucleobase is a modified cytosine.
- a modified nucleobase is a methylated nucleobase.
- a modified cytosine is a methyl cytosine, e.g., a 5-methyl cytosine.
- the cytosine modification is a methyl.
- Agents that recognize a methyl cytosine in DNA include but are not limited to “methyl binding reagents,” which refer herein to reagents that bind to a methyl cytosine.
- Methyl binding reagents include but are not limited to methyl binding domains (MBDs) and methyl binding proteins (MBPs) and antibodies specific for methyl cytosine. In some embodiments, such antibodies bind to 5-methyl cytosine in DNA.
- the DNA may be single-stranded or double-stranded.
- Suitable agents include agents that recognize modified nucleotides in double-stranded DNA, single-stranded DNA, and both double-stranded and single-stranded DNA.
- “Substantially free” means free to a sufficient extent that the relevant properties are not meaningfully impacted by the presence of a minor impurity, such as the presence of a minor number of modified cytosines in a first arm region, a second arm region, a first stem region, and/or a second stem region of a Y-shaped adapter as disclosed herein.
- Substantially free does not require 100% (such as exactly 100% of the cytosines of a first arm region, a second arm region, a first stem region, and/or a second stem region of a Y-shaped adapter are unmodified cytosines) but can include an amount equal to or greater than 90%, such as 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% (such as 90%-100% of the cytosines of a first arm region, a second arm region, a first stem region, and/or a second stem region of a Y-shaped adapter are modified cytosines).
- a first arm region, a second arm region, a first stem region, and/or a second stem region of a Y-shaped adapter as disclosed herein are “substantially free” of unmodified cytosines if at least 90% of the cytosines are modified cytosines.
- “Or” is used in the inclusive sense, i.e., equivalent to “and/or,” unless the context requires otherwise.
- II. Exemplary methods A.
- the present disclosure provides methods of performing a sequencing by synthesis reaction comprising a step of modified end repair (such as end repair performed using a modified dCTP, such as d5mCTP) and/or the use of modified NGS adapters (e.g., modified Y-shaped adapters).
- A modified end repair method that can improve methylation detection accuracy in dsDNA library preparation based SSM workflows by marking end-repair synthesized bases as methylated is disclosed in International Application No. PCT/US2023/070763, which is incorporated by reference herein in its entirety. This process results in a methylated end repair ‘scar’ at the 3’ end of a DNA molecule that has high complexity after an epigenetic base conversion process.
- Sequence complexity can be important for effective template generation (e.g., in NGS workflows, such as on Illumina sequencing platforms, such as MiSeq and HiSeq 2500 systems) and for the generation of high-quality data. Complexity can be especially important during the first 4–7 cycles of the first sequencing read because the sequencing software can use images from these early cycles to identify the location of each cluster, e.g., during template generation.
- a cluster is a clonal grouping of template DNA, e.g., bound to the surface of a flow cell. Each cluster is typically seeded by a single, template DNA strand and is clonally amplified until the cluster has approximately 1000 copies.
- Each cluster (e.g., on the flow cell) produces a single sequencing read. Sequence complexity may also be important for the first 25 cycles in the first sequencing read because this is when phasing/pre-phasing, color matrix corrections, and the pass filter calculations may occur. For example, in Illumina systems, these corrections and calculations can be used in base calling and quality score calculations for all cycles in a run for the clusters that pass filter.
- Accordingly, the disclosed methods that use modified end repair and/or modified Y-shaped adapters can provide high quality and high yield sequencing by providing sequence complexity at the start of read 1 as used for cluster quality control (e.g., for the first approximately 25 cycles).
- the high-complexity end-repair scar is sequenced at the start of read 1, thus improving sequencing quality and yield.
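The role of base diversity in the early cycles can be illustrated with a small calculation. The sketch below is not part of the disclosed methods or of any instrument software; it simply assumes that "complexity" can be approximated by the Shannon entropy of the base composition at each cycle, and the function name `per_cycle_entropy` and the example reads are hypothetical.

```python
from collections import Counter
from math import log2


def per_cycle_entropy(reads, n_cycles=25):
    """Shannon entropy (bits) of the base composition at each early cycle.

    2.0 bits means A, C, G and T are equally represented at that cycle;
    lower values mean lower base diversity.
    """
    entropies = []
    for cycle in range(n_cycles):
        bases = [read[cycle] for read in reads if len(read) > cycle]
        if not bases:
            break  # no read is long enough to cover this cycle
        total = len(bases)
        counts = Counter(bases)
        entropies.append(-sum((n / total) * log2(n / total) for n in counts.values()))
    return entropies


# Hypothetical read starts: a balanced set, and the same set with every C read as T,
# as would happen for unmodified cytosines after a C-to-T conversion.
balanced = ["ACGT", "CGTA", "GTAC", "TACG"]
c_depleted = [read.replace("C", "T") for read in balanced]

print(per_cycle_entropy(balanced, n_cycles=4))
print(per_cycle_entropy(c_depleted, n_cycles=4))
```

Under this toy measure, the converted reads show lower per-cycle diversity than the balanced reads, which is the situation a conversion-resistant region at the start of read 1 is intended to avoid.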
- the disclosed methods allow for high quality and high yield sequencing by using Y-shaped adapters (also referred to herein as “complement” Y-shaped adapters) that comprise the reverse complement of the Illumina-specific sequences of the current/standard NGS adapters (as exemplified in Figure 3).
- the ‘top’ strand, i.e., the 5’-connecting strand, of the adapter is modified to include the ‘reverse’/read2 Illumina primer sequence rather than the ‘forward’/read1 Illumina primer sequence.
- the ‘bottom’ strand, i.e., the 3’-connecting strand, of the adapter is modified to include the ‘forward’/read1 Illumina primer binding site, rather than the ‘reverse’/read2 Illumina primer binding site.
- Molecular barcodes in the adapters can similarly be modified (swapped) in sequence as appropriate.
- the 3’ T overhang (and/or 3’ C overhang in some embodiments) remains on the ‘top’ strand, i.e., the 5’-connecting strand, of the adapter.
- read 1 begins at the 3’ end of an adapted DNA molecule.
- an exemplary current/standard NGS adapter (left) comprises 25 cytosines in constant regions, whereas an exemplary “complement” Y-shaped adapter of use in the disclosed methods comprises 12 cytosines in constant regions. This approximately 50% reduction in the number of cytosines can reduce 5mC adapter costs, e.g., by approximately 40%.
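The "approximately 50%" figure follows directly from the two cytosine counts given above. A minimal check, using only those two numbers (the helper name `percent_reduction` is illustrative); the approximately 40% cost figure is stated in the text and is not derived here:

```python
def percent_reduction(before, after):
    """Percentage by which a count drops from `before` to `after`."""
    return 100.0 * (before - after) / before


standard_cytosines = 25    # constant-region cytosines, standard adapter (from the text)
complement_cytosines = 12  # constant-region cytosines, "complement" adapter (from the text)

print(percent_reduction(standard_cytosines, complement_cytosines))  # 52.0
```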
- library and/or sample-indexing amplification can use the same primers as the disclosed workflows.
- a first cycle of library amplification begins with the forward primer, rather than the reverse primer.
- the cytosine bases in the adapters are modified such that they are resistant to a base conversion procedure.
- methods disclosed herein comprise performing a sequencing by synthesis reaction on a converted DNA molecule with a sequencing by synthesis instrument.
- the converted DNA molecule comprises ligated adapters, a converted region comprising one or more nucleobases that have been converted by a conversion procedure, and a resistant region comprising one or more nucleobases that are resistant to the conversion procedure.
- the sequencing by synthesis reaction comprises extending a sequencing primer that binds to the converted DNA molecule upstream of the converted region and the resistant region, calibrating one or more base calling metrics of the sequencing by synthesis instrument based at least in part on data from a region of the converted DNA molecule that is resistant to the conversion procedure, and calling at least a portion of nucleobases in the converted region using the one or more calibrated base calling metrics.
- methods comprise ligating adapters to a DNA molecule, wherein the adapters comprise a resistant region comprising one or more nucleobases that are resistant to a conversion procedure and the DNA molecule comprises one or more nucleobases that are substrates for the conversion procedure.
- a conversion procedure is performed on the adapted DNA molecule, producing a converted DNA molecule comprising a converted region that comprises one or more nucleobases that have been converted by the conversion procedure.
- a sequencing by synthesis reaction is performed on the converted DNA molecule with a sequencing by synthesis instrument, wherein the sequencing by synthesis reaction comprises extending a sequencing primer that binds to the converted DNA molecule upstream of the converted region and the resistant region.
- One or more base calling metrics of the sequencing by synthesis instrument is calibrated based at least in part on data from the resistant region, and at least a portion of nucleobases in the converted region is called using the one or more calibrated base calling metrics.
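The relationship between the resistant region and the converted region can be sketched in silico. The example below assumes a bisulfite-like conversion in which unmodified cytosines are read as thymine while modified cytosines are unchanged; the text itself does not fix the conversion chemistry at this point. The sequences and the `convert` helper are hypothetical.

```python
def convert(seq, protected):
    """Return `seq` as it would be read after conversion.

    Cytosines at positions in `protected` (e.g., 5mC) are left as C;
    every other cytosine is read as T (assumed C-to-T conversion).
    """
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


adapter = "ACCGTC"    # hypothetical adapter segment; all of its cytosines are modified
insert = "TTCGCAAC"   # hypothetical insert with unmodified cytosines
molecule = adapter + insert

# Positions of the conversion-resistant cytosines (all adapter cytosines here).
protected = {i for i, base in enumerate(adapter) if base == "C"}

converted = convert(molecule, protected)
print(converted)  # ACCGTCTTTGTAAT
```

The adapter portion keeps all four bases, whereas the insert portion loses its cytosines; this is why data from the resistant region is the more suitable input for calibration.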
- disclosed methods comprise subjecting a DNA molecule comprising one or more nucleobases that are substrates for a conversion procedure to end repair.
- the end repair comprises extending a recessed 3’ end of the DNA molecule using a DNA polymerase and deoxyribonucleotides comprising a nucleobase that is resistant to the conversion procedure.
- the generated end-repaired DNA molecule comprises a resistant region that comprises the nucleobase resistant to the conversion procedure.
- Adapters are ligated to the end-repaired DNA molecule, and a conversion procedure is performed on the adapted DNA molecule, producing a converted DNA molecule comprising a converted region that comprises one or more nucleobases that have been converted by the conversion procedure.
- a sequencing by synthesis reaction is performed on the converted DNA molecule with a sequencing by synthesis instrument, wherein the sequencing by synthesis reaction comprises extending a sequencing primer that binds to the converted DNA molecule upstream of the converted region and the resistant region.
- One or more base calling metrics of the sequencing by synthesis instrument is calibrated based at least in part on data from the resistant region, and at least a portion of nucleobases in the converted region is called using the one or more calibrated base calling metrics.
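The end-repair route to a resistant region can be sketched as a fill-in of a recessed 3’ end. This is a simplified model under stated assumptions: the recessed strand pairs perfectly with the 3’ portion of the template strand, every dCTP incorporated during fill-in carries a resistant cytosine (e.g., d5mCTP), and the function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fill_in(template, recessed):
    """Extend the recessed 3' end of `recessed` along `template`.

    Both strands are written 5'->3'. `recessed` is assumed to pair with the
    3' portion of `template`, leaving a 5' overhang on the template.
    Returns the extended strand and the positions of newly incorporated
    cytosines, which are taken to be conversion resistant.
    """
    full = revcomp(template)
    if not full.startswith(recessed):
        raise ValueError("recessed strand does not pair with the template")
    new_positions = range(len(recessed), len(full))
    resistant = {i for i in new_positions if full[i] == "C"}
    return full, resistant


# Hypothetical duplex: the template carries a 4-nt 5' overhang (GGAT).
extended, resistant = fill_in(template="GGATCCAT", recessed="ATGG")
print(extended, sorted(resistant))  # ATGGATCC [6, 7]
```

The resistant positions cluster at the 3’ end of the extended strand, matching the description of the end-repair scar.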
- the ligation is a blunt end ligation.
- the ligation is a sticky end ligation.
- the ligating seals one or more nicks present in the DNA.
- the end repair is performed with a DNA polymerase that does not have 5’-3’ exonuclease activity and/or is not a strand displacing DNA polymerase.
- the DNA polymerase is T4 DNA polymerase, T7 DNA polymerase, or Klenow fragment. Ligation of adapters to a DNA molecule of use in a disclosed method can be performed using any appropriate procedure known in the art and/or as disclosed elsewhere herein.
- the DNA to be sequenced in the sequencing by synthesis reaction is cell-free DNA.
- the DNA to be sequenced is isolated from a blood sample, e.g., from cells of a blood sample.
- DNA to be sequenced is isolated from a tissue sample, such as a tumor sample.
- the methylation levels of DNA, e.g., cell-free DNA or DNA isolated from a sample comprising cells, such as a blood sample
- the DNA originated from a tumor cell, e.g., wherein the cancer is a solid tumor cancer or a hematological cancer.
- the DNA did not originate from a tumor cell.
- the cancer is not a hematological cancer.
- the cancer is a solid tumor cancer, e.g., a carcinoma, adenocarcinoma, or sarcoma.
- cancers, including solid tumor cancers such as carcinomas, adenocarcinomas, and sarcomas, may cause changes to the cell type distribution represented in cfDNA or other samples relative to the cell type distribution in a healthy subject or a subject that does not have cancer. See Nabet et al. Cell. 2020, 183:363-376; Watson et al. Sci.
- Such changes may be detected in the methods herein and can be useful in detecting cancer as well as determining cancer prognosis and/or treatment options.
- the disclosed methods can be combined with analysis of one or more additional biomarkers.
- the disclosed methods are combined with one or more methods, such as but not limited to, methods for assessing DNA methylation patterns, DNA mutations (such as somatic mutations), nucleic acid fragmentation patterns, non-coding RNA (such as microRNAs (miRNAs), ribosomal RNAs, transfer RNAs, small nucleolar RNAs (snoRNAs), and/or small nuclear RNAs (snRNAs)) levels, and/or cell type proportions/levels, cellular locations, and/or structural modifications of one or more proteins (such as in a sample from a subject).
- the disclosed methods are combined with one or more analyses of genetic variations including mutations, rare mutations, indels, rearrangements, copy number variations, transversions, translocations, recombinations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, and/or abnormal changes in nucleic acid 5-methylcytosine.
Adapters
- [000153] Some embodiments of the disclosed methods use Y-shaped adapters, e.g., that comprise the reverse complement of the Illumina-specific sequences of the current/standard NGS adapters (as exemplified in Figure 3). As described herein, methods that use a disclosed Y-shaped adapter allow for high quality and high yield sequencing due to maintenance of base diversity at the start of read 1 as used for cluster quality control (e.g., for the first approximately 25 cycles).
- adapters (such as the Y-shaped adapters disclosed herein) comprise at least one tag. In some embodiments, the at least one tag comprises a molecular barcode.
- the adapter is a Y-shaped adapter that comprises a first strand and a second strand.
- the first strand of the Y-shaped adapter comprises a first arm region and a first stem region.
- the second strand comprises a second arm region and a second stem region, wherein the second stem region is configured to anneal to the first stem region and the second arm region is configured not to anneal to the first arm region.
- the first arm region is located 5’ of the first stem region and the second arm region is located 3’ of the second stem region.
- the first arm region and second arm region each comprise at least one nucleobase that is resistant to the conversion procedure.
- (i) the number of nucleobases with a base-pairing specificity complementary to the nucleobases that are resistant to the conversion procedure in the first arm region is greater than the number of nucleobases that are resistant to the conversion procedure in the first arm region, and/or (ii) the number of nucleobases that are resistant to the conversion procedure in the first arm region is less than 25% of the number of nucleobases in the first arm region.
- (i) the number of nucleobases with a base-pairing specificity complementary to the nucleobases that are resistant to the conversion procedure in the second arm region is greater than the number of nucleobases that are resistant to the conversion procedure in the second arm region, and/or (ii) the number of nucleobases that are resistant to the conversion procedure in the second arm region is less than 25% of the number of nucleobases in the second arm region.
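The two arm-region conditions can be checked mechanically for a candidate arm sequence. The sketch below assumes the resistant nucleobase is a modified cytosine, so that the complementary nucleobase is guanine, and that every cytosine written in the arm is modified. The arm sequences are hypothetical and are not the sequences of Figure 3.

```python
def arm_conditions(arm, resistant="C", complement="G"):
    """Evaluate conditions (i) and (ii) for one arm region.

    (i)  complementary bases outnumber resistant bases;
    (ii) resistant bases make up less than 25% of the arm.
    Returns the pair (condition_i, condition_ii).
    """
    n_resistant = arm.count(resistant)
    n_complement = arm.count(complement)
    condition_i = n_complement > n_resistant
    condition_ii = n_resistant < 0.25 * len(arm)
    return condition_i, condition_ii


# Hypothetical arms: 20 nt with 3 C and 8 G, and 8 nt with 5 C and 0 G.
print(arm_conditions("AGGTGACTGGAGTTCAGACG"))  # (True, True)
print(arm_conditions("CCCTACAC"))              # (False, False)
```

Because the conditions are joined by "and/or", an arm satisfying either element of the returned pair would fall within the description.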
- the adapter is a Y-shaped oligonucleotide adapter comprising first and second strands, wherein (a) the first strand comprises a first arm region and a first stem region; (b) the second strand comprises a second arm region and a second stem region, wherein the second stem region is configured to anneal to the first stem region and the second arm region is configured not to anneal to the first arm region; (c) the first arm region and second arm region each comprise one or more modified nucleobases that are resistant to a conversion procedure; (d) (i) the number of nucleobases with a base-pairing specificity complementary to the modified nucleobases that are resistant to the conversion procedure in the first arm region is greater than the number of modified nucleobases that are resistant to the conversion procedure in the first arm region, and/or (ii) the number of modified nucleobases that are resistant to the conversion procedure in the first arm region is less than 25% of the number of nucleobases in the first arm region; and (e) (i) the number of nucleobases with a base-pairing specificity complementary to the modified nucleobases that are resistant to the conversion procedure in the second arm region is greater than the number of modified nucleobases that are resistant to the conversion procedure in the second arm region and/or (ii) the
- the nucleobase that is resistant to the conversion procedure comprises a modified nucleobase, such as 4-methylcytosine (4mC), 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), N6-methyladenosine (6mA), bromodeoxyuridine (BrdU), 8-oxoguanine (8oxoG), 5-pyrrolo cytosine, 5-glucosylhydroxymethylcytosine (5-ghmC), 5-carboxylcytosine (5-caC), and/or 5-propynyl cytosine.
- the nucleobase that is resistant to the conversion procedure is a modified cytosine.
- the modified cytosine is 5-methylcytosine. In other particular embodiments, the modified cytosine is 5-hydroxymethylcytosine.
- the first arm region, the second arm region, the first stem region, and/or the second stem region comprise one or more modified cytosines, such as 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine (5-caC), and/or 5-propynyl cytosine.
- the adapter is a Y-shaped oligonucleotide adapter comprising first and second strands, wherein (a) the first strand comprises a first arm region and a first stem region; (b) the second strand comprises a second arm region and a second stem region, wherein the second stem region is configured to anneal to the first stem region and the second arm region is configured not to anneal to the first arm region; (c) the first arm region and second arm region each comprise modified cytosines; (d) (i) the number of guanines in the first arm region is greater than the number of modified cytosines in the first arm region, and/or (ii) the number of modified cytosines in the first arm region is less than 25% of the number of nucleobases in the
- the modified cytosine comprises 4-methylcytosine (4mC), 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), 5-pyrrolo cytosine, 5-glucosylhydroxymethylcytosine (5-ghmC), 5-carboxylcytosine (5-caC), and/or 5-propynyl cytosine.
- the modified cytosine is 5-methylcytosine. In other embodiments, the modified cytosine is 5-hydroxymethylcytosine.
- the first arm region, the second arm region, the first stem region, and/or the second stem region of the adapter comprise one or more modified cytosines, optionally wherein the one or more modified cytosines are 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine (5-caC), and/or 5-propynyl cytosine.
- at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% of the cytosines in the first arm region, the second arm region, the first stem region, and/or the second stem region are modified cytosines.
- the first arm region, the second arm region, the first stem region, and/or the second stem region are substantially free of unmodified cytosines.
- the first arm region, the second arm region, the first stem region, and/or the second stem region are at least 90% free of unmodified cytosines, such as 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% free of unmodified cytosines.
- the resistant region is at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19, or at least about 20 nucleotides in length.
- the resistant region is about 10-40, about 10-35, about 10-30, about 10-25, about 10-20, about 10-15, about 15-40, about 15-35, about 15-30, about 15-25, about 15-20, about 20-40, about 20-35, about 20-30, or about 20-25 nucleotides in length.
- the resistant region is 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length.
- [000161] In some embodiments, the resistant region is located 3’ of the converted region.
- the resistant region comprises at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, or at least 15 nucleobases that are resistant to the conversion procedure.
- the resistant region comprises 2-30, 2-25, 2-20, 2-15, 2-10, 2-9, 2-8, 2-7, 2-6, 2-5, 2-4, or 2-3 nucleobases that are resistant to the conversion procedure.
- the sequencing comprises a sequencing by synthesis reaction.
- In a sequencing by synthesis reaction, the sequence of a DNA molecule is determined by detecting the incorporation of each nucleotide into a complementary strand synthesized by a DNA polymerase. As the polymerase synthesizes a copy of a single strand of DNA, the incorporation of each nucleotide is monitored, such as by detection of fluorescently labeled nucleotides.
- the sequencing by synthesis reaction comprises (a) extending a sequencing primer that binds to the converted DNA molecule upstream of a converted region and a resistant region in a converted DNA molecule comprising ligated adapters; (b) calibrating one or more base calling metrics of the sequencing by synthesis instrument based at least in part on data from the resistant region, thereby providing one or more calibrated base calling metrics; and (c) calling at least a portion of nucleobases in the converted region using the one or more calibrated base calling metrics.
- the sequencing by synthesis reaction comprises sequencing the DNA in a manner that distinguishes the first nucleobase from the second nucleobase.
- the sequencing by synthesis reaction comprises next generation sequencing.
- the sequencing by synthesis reaction comprises generating a plurality of sequencing reads and mapping the plurality of sequencing reads to one or more reference sequences to generate mapped sequence reads.
- the one or more base calling metrics comprises clusters passing filter, phasing/pre-phasing, and/or color matrix correction values.
- sequence coverage of the genome may be, for example, less than 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9% or 100%.
- sequence reactions may provide for sequence coverage of, for example, at least 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, or 80% of the genome. Sequence coverage can be performed on, for example, at least 5, 10, 20, 70, 100, 200 or 500 different genes, or up to, for example, 5000, 2500, 1000, 500 or 100 different genes.
- Simultaneous sequencing reactions may be performed using multiplex sequencing. In some cases, cell-free nucleic acids may be sequenced with at least, for example, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions.
- cell-free nucleic acids may be sequenced with less than, for example, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. Sequencing reactions may be performed sequentially or simultaneously. Subsequent data analysis may be performed on all or part of the sequencing reactions. In some cases, data analysis may be performed on at least, for example, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. In other cases, data analysis may be performed on less than, for example, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions.
- An exemplary read depth is 1000-50000 or 1000-10000 or 1000-20000 reads per locus (base).
- sequencing of epigenetic target regions, e.g., to analyze a modified nucleoside profile of DNA
- sequencing of a sequence-variable target region, e.g., for analysis of mutations.
- lesser sequencing depths may in some cases be adequate for the methods described herein.
- nucleic acids corresponding to the sequence-variable target region set are sequenced to a greater depth of sequencing than nucleic acids corresponding to the epigenetic target region set.
- nucleic acids corresponding to the hydroxymethylation-variable target region set are sequenced to a greater depth of sequencing than nucleic acids corresponding to at least one other target region set. For example, the depth of sequencing for nucleic acids corresponding to the sequence-variable and/or hydroxymethylation-variable target region sets may be at least 1.25-, 1.5-, 1.75-, 2-, 2.25-, 2.5-, 2.75-, 3-, 3.5-, 4-, 4.5-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13-, 14-, or 15-fold greater, or 1.25- to 1.5-, 1.5- to 1.75-, 1.75- to 2-, 2- to 2.25-, 2.25- to 2.5-, 2.5- to 2.75-, 2.75- to 3-, 3- to 3.5-, 3.5- to 4-, 4- to 4.5-, 4.5- to 5-, 5- to 5.5-, 5.5- to 6-, 6- to 7-, 7- to 8-, 8- to 9-, 9- to 10-, 10- to 11-, 11- to 12-, 13- to 14-, 14- to 15-fold, or 15- to 100-fold greater, than the depth of sequencing for nucleic acids corresponding to the epigenetic target region set or to at least one
- said depth of sequencing is at least 2-fold greater. In some embodiments, said depth of sequencing is at least 5-fold greater. In some embodiments, said depth of sequencing is at least 10-fold greater. In some embodiments, said depth of sequencing is 4- to 10-fold greater. In some embodiments, said depth of sequencing is 4- to 100-fold greater.
- Each of these embodiments refers to the extent to which nucleic acids corresponding to the sequence-variable target region set are sequenced to a greater depth of sequencing than nucleic acids corresponding to the epigenetic target region set.
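The fold-depth comparisons above reduce to a ratio of two depths. A minimal sketch, with hypothetical per-locus depths and an illustrative helper name:

```python
def depth_fold(sequence_variable_depth, epigenetic_depth):
    """How many times deeper the first target region set is sequenced than the second."""
    if epigenetic_depth <= 0:
        raise ValueError("depth must be positive")
    return sequence_variable_depth / epigenetic_depth


# Hypothetical read depths per locus for the two target region sets.
fold = depth_fold(10_000, 2_000)

# Checks against some of the thresholds and ranges recited above.
print(fold >= 2, fold >= 5, 4 <= fold <= 10, fold >= 10)  # True True True False
```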
- the captured cfDNA corresponding to the sequence-variable target region set and the captured cfDNA corresponding to the epigenetic target region set are sequenced concurrently, e.g., in the same sequencing cell (such as the flow cell of an Illumina sequencer) and/or in the same composition, which may be a pooled composition resulting from recombining separately captured sets or a composition obtained by capturing the cfDNA corresponding to the sequence-variable target region set and the captured cfDNA corresponding to the epigenetic target region set in the same vessel.
- the captured cfDNA corresponding to the hydroxymethylation variable target region set and the captured cfDNA corresponding to the at least one other target region set are sequenced concurrently, e.g., in the same sequencing cell (such as the flow cell of an Illumina sequencer) and/or in the same composition, which may be a pooled composition resulting from recombining separately captured sets or a composition obtained by capturing the cfDNA corresponding to the hydroxymethylation variable target region set and the captured cfDNA corresponding to the at least one other target region set in the same vessel.
- When end repair is performed with unmodified dNTPs prior to methylation-sensitive sequencing (e.g., bisulfite sequencing), the end repair can lead to 5’ overhang filling, nick translation and gap filling with dCTP comprising unmodified cytosines. These unmodified cytosines may not reflect the original methylation status at these positions in the original DNA molecule (i.e., prior to the formation of the overhang, nicks and gaps) and thus the end repair can lead to artifactual methylation information.
- the methods disclosed herein avoid such artifactual information by identifying the sequencing data corresponding to these synthesized regions.
- Such regions can, e.g., then be filtered such that they are not used to classify the methylation status of the DNA molecule.
- the regions synthesized in the end repair can be classified in a variety of ways and the exact approach will depend on the identity of the modified base used in the end repair reaction as well as the modification sensitive sequencing method being used. Moreover, the exact end points of the regions classified as being synthesized during end repair can be determined by the user.
- the basic step of the identification of the regions synthesized during the end repair reaction is the identification of the presence of the base modification in the at least one type of dNTP used in the end repair reaction.
- the end repair is performed with dNTP comprising 5mC or 5hmC. 5mC and 5hmC are both naturally occurring base modifications, so when these modified bases are identified in the sequencing data they may derive from: (i) modified bases present in the original DNA molecule; or (ii) modified bases introduced in the end repair reaction. 5mC and 5hmC can, however, be classified as being introduced in the end repair reaction when they occur in a non-CpG sequence context. While CpH (i.e., CpA, CpT, CpC) methylation has been described in humans, it is thought to comprise 0.02% of total methyl-cytosine in differentiated somatic cells (Jang et al.
- methylated cytosines in a CpH sequence context can confidently be attributed to regions synthesized during end repair. This is particularly the case when the disclosed methods comprise enrichment for a sequence panel wherein the panel does not comprise regions known to contain methylated CpH sites.
- the classification of whether or not a methylated CpH is part of a synthesized region can be made by accounting for: (i) the position of the particular CpH site in a reference sequence; and/or (ii) the methylation status of the surrounding CpH sites.
- methylation detected at that CpH site can be ignored when defining regions synthesized during end repair.
- detected methylation at such CpH sites can be called as the true methylation status in the DNA sample. If a CpH site known to be methylated in nature is detected as being methylated in the sequencing data, but is contained within a string of other methylated CpH sites, some of which are not known to be methylated in nature (e.g., by comparison to reference data), the region may still be classified as being synthesized during end repair.
- a region of the one or more regions of the end-repaired DNA that were synthesized during the end repair is defined as: (i) the sequence between two non-methylated cytosines which span a methylated non-CpG cytosine; and/or (ii) the sequence between a non-methylated cytosine and the end of a sequence read wherein there is no additional non-methylated cytosine between the non-methylated cytosine and the end of the sequence read.
- the one or more regions of the end-repaired DNA that were synthesized during the end repair may be defined as: (i) the sequence from a first methylated non-CpG cytosine to the last methylated non-CpG cytosine in one or more consecutive methylated non-CpG cytosines; and/or (ii) the sequence from a methylated cytosine (5mC or 5hmC) in a non-CpG context to the end of a sequence read wherein there is no non-methylated cytosine between the methylated cytosine in the non-CpG context and the end of the sequence read.
- the “end of the sequence read” refers to the portion of the sequence read which corresponds to the end-repaired DNA molecule and does not include, e.g. adapter sequences.
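One of the definitions above (a methylated non-CpG cytosine followed by no non-methylated cytosine up to the end of the read) can be sketched as a backward scan over per-cytosine calls. The data structure, field names, and orientation are assumptions for illustration: calls are restricted to the insert portion of the read (adapter sequence excluded), and the end-repaired end is taken to lie at the high-index end.

```python
from typing import NamedTuple


class CytosineCall(NamedTuple):
    pos: int          # 0-based position within the insert portion of the read
    context: str      # "CpG" or "CpH"
    methylated: bool  # methylation call at this cytosine


def end_repair_region(calls, read_len):
    """Return the half-open interval (start, read_len) attributed to end repair.

    Walks cytosine calls from the end of the read backwards and stops at the
    first non-methylated cytosine. The region starts at the earliest methylated
    CpH cytosine seen before stopping. Returns None if there is none.
    """
    start = None
    for call in sorted(calls, key=lambda c: c.pos, reverse=True):
        if not call.methylated:
            break
        if call.context == "CpH":
            start = call.pos
    return None if start is None else (start, read_len)


# Hypothetical calls on a 30-nt insert.
calls = [
    CytosineCall(5, "CpG", True),
    CytosineCall(12, "CpH", False),
    CytosineCall(20, "CpH", True),
    CytosineCall(24, "CpG", True),
    CytosineCall(27, "CpH", True),
]
region = end_repair_region(calls, read_len=30)
print(region)  # (20, 30)

# Calls outside the region remain available for methylation classification.
informative = [c for c in calls if region is None or c.pos < region[0]]
print([c.pos for c in informative])  # [5, 12]
```

The methylated CpG at position 24 is inside the region and is therefore excluded, since its methylation may have been introduced during end repair rather than reflecting the original molecule.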
- the end repair is performed with dNTPs comprising base modifications which are not naturally found in the subject the DNA sample derives from, or are present at only very low frequencies. For example, 4mC does not occur in mammals (e.g. humans), whereas 6mA occurs only at very low frequencies (Xiao et al. Molecular Cell Volume 71, Issue 2, 19 July 2018, Pages 306-318.e7).
- the regions of the end-repaired DNA that were synthesized during the end repair can be classified simply as any region wherein the modified base is detected.
- a region of the one or more regions is defined as: (i) the sequence between two non-modified bases spanning a modified base, wherein the bases are of the same identity as the bases present in the at least one type of dNTP comprising the modified base; and/or (ii) the sequence between a non-modified base and the end of a sequence read, wherein there are no additional non-modified bases between the non-modified base and the end of the sequence read, where the non-modified bases are of the same identity as the modified base present in the at least one type of dNTP comprising the modified base.
- the one or more regions of the end-repaired DNA that were synthesized during the end repair may be defined as: (i) the sequence from a first modified base to the last modified base in one or more consecutive modified bases wherein the bases are of the same identity as the bases present in the at least one type of dNTP comprising the modified base; and/or (ii) the sequence from a modified base to the end of a sequence read wherein there is no non-modified base between the modified base and the end of the sequence read where the modified base and non-modified base are of the same identity as the at least one type of dNTP comprising the modified base.
- the regions of the end-repaired DNA classified as being synthesized during the end repair may be filtered out of the sequence data such that they are not used for further analysis, such as variant calling or for determining the modification status of bases in the original DNA molecule (i.e., prior to end repair). Accordingly, in some embodiments the methods disclosed herein further comprise analyzing at least some of the sequence data corresponding to regions that are not identified as being synthesized during the end repair to detect the presence or absence of base modifications or mutations present in the DNA sample.
- a DNA molecule comprising one or more nucleobases that are substrates for a conversion procedure is subjected to end repair.
- the end repair comprises extending a recessed 3’ end of the DNA molecule using a DNA polymerase and deoxyribonucleotides comprising a nucleobase that is resistant to the conversion procedure, thereby generating an end-repaired DNA molecule comprising a resistant region that comprises the nucleobase resistant to the conversion procedure.
- the end repair is performed with a DNA polymerase that does not have 5’-3’ exonuclease activity and/or is not a strand displacing DNA polymerase.
- the DNA polymerase is T4 DNA polymerase, T7 DNA polymerase, or Klenow fragment.
- End repair refers to methods for repairing DNA by the conversion of non-blunt ended DNA into blunt ended DNA. Sequencing workflows typically use end repair to make ends of DNA molecules compatible with adapters, which are subsequently ligated onto the DNA. Fragmented and/or damaged DNA (e.g., cfDNA or DNA from FFPE samples) often contains non-blunt ends, which contain 3’ overhangs and/or 5’ overhangs. A 3’ overhang refers to the 3’ end of a DNA strand which extends beyond the 5’ end of the paired strand, resulting in one or more unpaired nucleotides at the 3’ end of the DNA strand.
- a 5’ overhang refers to the 5’ end of a DNA strand which extends beyond the 3’ end of the paired strand, resulting in one or more unpaired nucleotides at the 5’ end of the DNA strand.
- the process of end repair involves the conversion of double-stranded DNA with 3’ overhangs and/or 5’ overhangs to double-stranded DNA without overhangs. This can be done using an enzyme such as T4 DNA polymerase and/or Klenow fragment.
- end repair is conducted in the presence of dATP, dCTP, dGTP and dTTP.
- End repair can also include a second step, which involves the addition of a phosphate group to the 5' ends of DNA, by an enzyme such as polynucleotide kinase.
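The overhang definitions above can be made concrete with a coordinate model of a duplex. In this sketch each strand is a half-open interval in top-strand coordinates, the top strand runs 5’ to 3’ from left to right, and the bottom strand is antiparallel. The blunt-end result assumes that 5’ overhangs are filled in and 3’ overhangs are removed by the 3’-5’ exonuclease activity of enzymes such as T4 DNA polymerase; that removal step is general background rather than something stated in the text. All names are illustrative.

```python
def end_types(top, bottom):
    """Classify the left and right ends of a duplex.

    `top` and `bottom` are (start, end) half-open intervals in top-strand
    coordinates. The top strand has its 5' end on the left; the bottom
    strand is antiparallel, so its 5' end is on the right.
    """
    (top_start, top_end), (bottom_start, bottom_end) = top, bottom
    if top_start == bottom_start:
        left = "blunt"
    else:
        # The top strand's 5' end is on the left; if it extends further, it overhangs.
        left = "5' overhang" if top_start < bottom_start else "3' overhang"
    if top_end == bottom_end:
        right = "blunt"
    else:
        # The top strand's 3' end is on the right; if it extends further, it overhangs.
        right = "3' overhang" if top_end > bottom_end else "5' overhang"
    return left, right


def blunt_after_repair(top, bottom):
    """Duplex interval after end repair.

    5' overhangs are filled in and 3' overhangs are removed, so each blunt
    end lands at the 5' terminus of the strand whose 5' end is on that side.
    """
    return top[0], bottom[1]


print(end_types((0, 50), (5, 60)), blunt_after_repair((0, 50), (5, 60)))
print(end_types((3, 50), (0, 45)), blunt_after_repair((3, 50), (0, 45)))
```

In the first duplex both ends carry 5’ overhangs and the repaired molecule spans (0, 60); in the second both ends carry 3’ overhangs and the repaired molecule spans (3, 45).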
- A-tailing refers to the addition of a single deoxyadenosine residue to the end of a blunt-ended double-stranded DNA fragment to form a 3' deoxyadenosine single-base overhang.
- A-tailing reactions are conducted with polymerases that have the ability to add a non-templated A to the 3' end of a blunt, double-stranded DNA molecule.
- Polymerases capable of A-tailing typically do not possess 3’-5’ exonuclease activity.
- When A-tailing is performed as a separate reaction to end repair, it can be conducted in the presence of dATP, but in the absence of dCTP, dTTP and dGTP.
- A-tailed fragments are not compatible with self-ligation (i.e., self-circularization and concatenation of the DNA), but they are compatible with 3' T-overhangs, which can be used on adapters.
- Methods comprising end repair, A-tailing, and ligation to adapters with 3' T-overhangs can result in higher efficiency ligation, compared to blunt ended ligation, as blunt ligation can lead to self-ligation of the adapters and/or DNA molecules.
- the methods disclosed herein comprise end repair of the DNA molecules followed by blunt end ligation of adapters.
- the methods disclosed herein comprise end repair of the DNA molecules followed by A-tailing and sticky-end ligation of T-tailed adapters.
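The end repair and A-tailing steps described above can be illustrated with a minimal sketch. It assumes a toy representation that is not part of the disclosure: the top strand is written 5'→3', the bottom strand 3'→5' beneath it, and `-` marks positions where a strand has no base. It also assumes the usual behaviour of enzymes such as T4 DNA polymerase, namely that 5' overhangs are filled in by extending the recessed 3' end and that 3' overhangs are trimmed back. The function names `end_repair` and `a_tail` are hypothetical.

```python
# Toy model of end repair and A-tailing on an aligned duplex.
# Assumed representation (not from the source): top strand 5'->3',
# bottom strand 3'->5', equal length, '-' where a strand has no base.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def end_repair(top: str, bottom: str) -> tuple[str, str]:
    """Return a blunt-ended duplex: 3' overhangs trimmed, 5' overhangs filled in."""
    if len(top) != len(bottom):
        raise ValueError("strands must be given as equal-length aligned strings")
    n = len(top)
    # Leading gap in top: bottom has a 3' overhang on the left, which is trimmed.
    top_lead = n - len(top.lstrip("-"))
    # Trailing gap in bottom: top has a 3' overhang on the right, which is trimmed.
    bot_trail = n - len(bottom.rstrip("-"))
    top, bottom = top[top_lead:n - bot_trail], bottom[top_lead:n - bot_trail]
    # Remaining gaps sit opposite 5' overhangs and are filled by templated synthesis.
    new_top = "".join(PAIR[b] if t == "-" else t for t, b in zip(top, bottom))
    new_bottom = "".join(PAIR[t] if b == "-" else b for t, b in zip(top, bottom))
    return new_top, new_bottom


def a_tail(top: str, bottom: str) -> tuple[str, str]:
    """Add a single non-templated A to the 3' end of each strand of a blunt duplex."""
    return "-" + top + "A", "A" + bottom + "-"


if __name__ == "__main__":
    repaired = end_repair("ACGTAC--", "--CATGTT")  # 5' overhangs on both ends
    print(repaired)
    print(a_tail(*repaired))
```

The A-tailed product carries a single-base 3' A overhang on each end, which is what makes it compatible with T-tailed adapters and incompatible with itself.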
- where an A-tailing step is used, it may be performed separately from the end repair with an intervening reaction clean-up step or it may be performed in the same reaction as the end repair (e.g., using NEBNext® Ultra™ II End Repair/dA-Tailing Module (E7546)).
- the end-repair and the A-tailing reaction are performed in the same reaction mixture, optionally wherein the end-repair and the A-tailing reaction are performed in a single tube and/or optionally wherein the end-repair and the A-tailing reaction are performed without an intervening clean-up step.
- a sticky-end ligation may be performed with a mixture of T-tailed adapters and C-tailed adapters.
- the end-repair and the A-tailing reactions are performed in a single tube. In such cases, the A tailing reaction can be performed at a higher temperature than the end repair.
- the A-tailing reaction can be performed using a thermostable polymerase (e.g., Taq DNA polymerase, Tfl DNA polymerase, Bst DNA Polymerase, Large Fragment or Tth DNA polymerase) and the method further comprises increasing the temperature of the sample after the end repair to inactivate the polymerase used in end repair (e.g., T4 DNA polymerase or Klenow fragment).
- the A-tailing is performed using a DNA polymerase that: (i) does not possess 5’-3’ exonuclease activity; and/or (ii) is not a strand displacing DNA polymerase.
- the A-tailing is performed using Taq DNA polymerase. In other embodiments, the A-tailing is performed using Tfl polymerase, Bst DNA Polymerase, Large Fragment or Tth polymerase. [000184] In some embodiments of the methods disclosed herein the end repair is performed with a polymerase which lacks 5’ to 3’ exonuclease activity and/or strand displacement activity.
- the polymerase used in the end repair reaction may be Q5® High-Fidelity DNA Polymerase, Q5U® Hot Start High-Fidelity DNA Polymerase, Phusion® High-Fidelity DNA Polymerase, Hemo KlenTaq, phi29 DNA Polymerase, T7 DNA Polymerase, DNA Polymerase I (E. coli), DNA Polymerase I, Large (Klenow) Fragment (“Klenow fragment”) or T4 DNA Polymerase.
- the polymerase used in the end repair is T4 DNA Polymerase or Klenow fragment.
- the methods disclosed herein comprise an A tailing reaction after the end repair and before the ligation reaction, wherein the end repair and A tailing reactions are separated by a reaction cleanup.
- the A tailing reaction is typically performed in the presence of dATP, but in the absence of dCTP, dTTP and dGTP.
- the A tailing reaction is performed using Klenow Fragment lacking 3'-5' exonuclease activity.
- the dNTP that comprises a modified base may comprise any modified base wherein the presence or the absence of the modification can be detected by a type of modification sensitive sequencing.
- the modified base may be 4-methylcytosine (4mC), 5-methylcytosine (5mC), 5-hydroxymethyl-cytosine (5hmC), N6-methyladenosine (6mA), bromodeoxyuridine (BrdU), 5-fluorodeoxyuridine (FldU), 5-iododeoxyuridine (IdU), 5-ethynyldeoxyuridine (EdU) and/or 8-oxoguanine (8oxoG).
- where a dNTP comprising a modified base is used, it may be used in place of the equivalent unmodified base in the end repair reaction.
- multiple types of dNTP comprising a modified base are used in the end repair.
- dATP comprising 6mA and dCTP comprising 5mC can be used in the end repair reaction in place of dATP comprising unmodified adenine and dCTP comprising unmodified cytosine.
- the use of multiple types of dNTP comprising a modified base is advantageous because it provides increased resolution in defining the regions of the end-repaired DNA molecule which have been synthesized during the end repair reaction. This is because, in this example, the end of a synthesized region can be defined as the first unmodified adenine or unmodified cytosine after a stretch containing 6mAs and/or 5mCs, rather than relying on the detection of solely an unmodified adenine or solely an unmodified cytosine.
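The resolution argument above can be made concrete with a short sketch. It assumes, as an illustration only, that a strand is available as a 5'→3' base string with a per-base flag indicating whether a modification (e.g., 6mA or 5mC) was detected, and that fill-in synthesis occurred at the 3' end of that strand. The function name `synthesized_upper_bound` and the input format are hypothetical. The value returned is an upper bound, because untracked bases lying between the last modified base and the first unmodified tracked base cannot be assigned to either region.

```python
def synthesized_upper_bound(bases: str, modified: list[bool],
                            tracked: frozenset[str] = frozenset("AC")) -> int:
    """Scan from the 3' end inward and count bases up to, but not including,
    the first unmodified tracked base. Returns 0 if no modified tracked base
    was seen before that point."""
    if len(bases) != len(modified):
        raise ValueError("bases and modified must have the same length")
    count = 0
    seen_modified = False
    for base, is_mod in zip(reversed(bases), reversed(modified)):
        if base in tracked:
            if not is_mod:
                break  # first unmodified tracked base: original DNA starts here
            seen_modified = True
        count += 1
    return count if seen_modified else 0


if __name__ == "__main__":
    # Hypothetical strand: original "CTTA" followed by a filled-in "TGAC"
    # in which the A carries 6mA and the C carries 5mC.
    bases = "CTTATGAC"
    modified = [False, False, False, False, False, False, True, True]
    print(synthesized_upper_bound(bases, modified))                    # A and C tracked
    print(synthesized_upper_bound(bases, modified, frozenset("C")))    # only C tracked
```

In this hypothetical input, tracking both adenine and cytosine bounds the synthesized region at 4 bases, whereas tracking cytosine alone only bounds it at 7, which illustrates why using more than one type of modified dNTP narrows the boundary.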
- the modification sensitive sequencing method used will depend on the type of modified base used in the end-repair reaction such that the specific modification can be detected. Exemplary conversion-based methods are described above alongside the base modification which they can detect.
- nanopore-based sequencing can be used to detect 4mC, 5mC, 5hmC, 6mA, BrdU, FdU, IdU, and EdU
- single-molecule real time (SMRT) sequencing from Pacific Biosciences can be used to detect 4mC, 5mC, 5hmC, 6mA, and 8oxoG.
- Adapter ligation [000189] Some embodiments of the disclosed methods comprise ligating an adapter (such as a modified Y-shaped adapter as disclosed herein) to the DNA.
- DNA molecules can be ligated to adapters at either one end or both ends.
- DNA molecules can be ligated with an at least partially double-stranded adapter (e.g., a Y-shaped or bell-shaped adapter).
- the ligation step can take place before or after the conversion step. In some embodiments, the ligation step is performed after the conversion step.
- DNA ligase and adapters are added to ligate DNA molecules in the sample with an adapter on one or both ends, i.e., to form adapted DNA.
- An adapter is typically a short nucleic acid (e.g., less than about 500, less than about 100 or less than about 50 nucleotides in length, or about 20-30, 20-40, 30-50, 30-60, 40-60, 40-70, 50-60, 50-70, 20-500, or 30-100 bases from end to end) that is typically at least partially double-stranded and can be ligated to the end of a given sample DNA molecule.
- two adapters can be ligated to a single sample DNA molecule, with one adapter ligated to each end of the sample nucleic acid molecule.
- the ligase used in ligation reactions can act on both single strand DNA nicks and double stranded DNA ends.
- the ligase is T4 DNA ligase or T3 DNA ligase.
- Adapters can include nucleic acid primer binding sites to permit amplification of a sample DNA molecule flanked by adapters at both ends, and/or a sequencing primer binding site, including primer binding sites for sequencing applications, such as various next generation sequencing (NGS) applications.
- Adapters can include a sequence for hybridizing to a solid support, e.g., a flow cell sequence. Adapters can also include binding sites for capture probes, such as an oligonucleotide attached to a flow cell support or the like. Adapters can also include sample indexes and/or molecular barcodes. These are typically positioned relative to amplification primer and sequencing primer binding sites, such that the sample index and/or molecular barcode is included in amplicons and sequencing reads of a given DNA molecule. Adapters of the same or different sequence can be linked to the respective ends of a sample DNA molecule.
- adapters of the same or different sequence are linked to the respective ends of the DNA molecule except that the sample index and/or molecular barcode differs in its sequence.
- the adapter is a Y-shaped adapter in which one end is blunt ended or tailed as described herein, for joining to a nucleic acid molecule, which is also blunt ended or tailed with one or more complementary nucleotides to those in the tail of the adapter.
- an adapter is a bell-shaped adapter that includes a blunt or tailed end for joining to a DNA molecule to be analyzed.
- Other exemplary adapters include T-tailed, C-tailed or hairpin shaped adapters.
- a hairpin shaped adaptor can comprise a complementary double stranded portion and a loop portion, where the double stranded portion can be attached (e.g., ligated) to a double-stranded polynucleotide.
- Hairpin shaped sequencing adaptors can be attached to both ends of a polynucleotide fragment to generate a circular molecule, which can be sequenced multiple times.
- the adapters used in the methods of the present disclosure comprise one or more known modified nucleosides, such as methylated nucleosides. In instances where two adapters are ligated to a sample nucleic acid (one at each end), either or both of the adapters may comprise one or more known modified nucleosides.
- the primer binding site(s), sequencing primer binding site(s), sample index(es) and/or molecular barcode(s), if present do not comprise the known modified nucleosides that change base pairing specificity as a result of the conversion procedure.
- adapters may be added to the DNA or a subsample thereof. Adapters can be ligated to DNA at any point in the methods herein. In some embodiments, adapters are ligated to the DNA of a sample or subsample thereof prior to annealing primers to the DNA for capture probe generation.
- the adapter-ligated DNA is amplified prior to annealing primers to the DNA for capture probe generation.
- adapters are ligated to the DNA of a sample or subsample thereof before the DNA is contacted with the capture probes.
- the DNA to which the adapters are ligated is in the same sample or subsample as the DNA used as a template to generate capture probes.
- the DNA to which the adapters are ligated is in a different sample or subsample, e.g., a second sample or a second subsample of a first sample, than the DNA used as a template to generate capture probes.
- the adapters ligated to DNA captured by the capture probes are not complementary to adapters, and the resulting capture probes therefore do not comprise adapters.
- Adapter-ligated DNA can therefore be selectively amplified in the presence of capture probes that do not comprise adapters.
- adapter-ligated DNA can be separated from DNA that does not comprise adapters.
- the disclosed methods comprise analyzing DNA in a sample. In such methods, adapters may be added to the DNA.
- adapters are added by other approaches, such as ligation.
- first adapters are added to the 3’ ends of the nucleic acids by ligation, which may include ligation to single-stranded DNA.
- first adapters are added to the 5’ ends of the nucleic acids by ligation, which may include ligation to single-stranded DNA.
- first adapters are added to the nucleic acids by ligation, which may include ligation to single-stranded DNA (e.g., to the 3’ ends thereof).
- the capture probes can be isolated after partitioning and ligation.
- the hypomethylated partition can be ligated with adapters and a portion of the ligated hypomethylated partition can then be used to generate the capture probes for rearrangements.
- the adapter can be used as a priming site for second-strand synthesis, e.g., using a universal primer and a DNA polymerase.
- a second adapter can then be ligated to at least the 3’
- the first adapter comprises an affinity tag, such as biotin, and nucleic acid ligated to the first adapter is bound to a solid support (e.g., bead), which may comprise a binding partner for the affinity tag such as streptavidin.
- nucleic acids are amplified.
- the single-stranded DNA library preparation is performed in a one-step combined phosphorylation/ligation reaction, e.g., as described in Troll et al., BMC Genomics, 20:1023 (2019), available at doi.org/10.1186/s12864-019-6355-0.
- This method, called Single Reaction Single-stranded LibrarY (“SRSLY”), can be performed without end-polishing.
- SRSLY may be useful for converting short and fragmented DNA molecules, e.g., cfDNA fragments, into sequencing libraries while retaining native lengths and ends.
- the SRSLY method can create sequencing libraries (e.g., Illumina sequencing libraries) from fragmented or degraded template (input) DNA.
- template DNA is first heat denatured and then immediately cold shocked to render the template DNA molecules single-stranded.
- the DNA can be maintained as single-stranded throughout the ligation reaction by the inclusion of a thermostable single-stranded binding protein (SSB).
- the template DNA, which at this point can be single-stranded and coated with SSB, is placed in a phosphorylation/ligation dual reaction with directional dsDNA NGS adapters that contain single-stranded overhangs.
- Both the forward and reverse sequencing adapters can share similar structures but differ in which terminus is unblocked in order to facilitate proper ligations.
- Both sequencing adapters can comprise a dsDNA portion and a single-stranded splint overhang of random nucleotides that occurs on the 3-prime terminus of the bottom strand of the forward adapter and the 5-prime terminus of the bottom strand of the reverse adapter.
- the forward adapter e.g., (P5) Illumina adapter
- the reverse adapter e.g., (P7) Illumina adapter
- the native polarity of input DNA molecules can be retained.
- T4 Polynucleotide Kinase can be used to prepare template DNA termini for ligation by phosphorylating 5-prime termini and dephosphorylating 3-prime termini.
- T4 PNK works on both ssDNA and dsDNA molecules and has no activity on the phosphorylation state of proteins.
- the random nucleotides of the splint adapter can be annealed to the single-stranded template molecule.
- the library DNA can be, e.g., purified and placed directly into standard NGS indexing PCR, compatible with both traditional single or dual index primers.
- the adapters include different tags of sufficient numbers that the number of combinations of tags results in a high probability (e.g., 95, 99 or 99.9%) that two nucleic acids with the same start and stop points receive different combinations of tags.
- Adapters can include the same or different primer binding sites, but preferably adapters include the same primer binding site.
- the nucleic acids are subject to amplification.
- the amplification can use, e.g., universal primers that recognize primer binding sites in the adapters.
- the DNA or a subsample or portion of the DNA is partitioned, comprising contacting the DNA with an agent that preferentially binds to nucleic acids bearing an epigenetic modification.
- the nucleic acids are partitioned into at least two partitioned subsamples differing in the extent to which the nucleic acids bear the modification from binding to the agents.
- nucleic acids overrepresented in the modification preferentially bind to the agent, whereas nucleic acids underrepresented for the modification do not bind or are more easily eluted from the agent.
- the nucleic acids can then be amplified from primers binding to the primer binding sites within the adapters. Partitioning may be performed instead before adapter attachment, in which case the adapters may comprise differential tags that include a component that identifies which partition a molecule occurred in. [000200]
- the nucleic acids are linked at both ends to Y-shaped adapters including primer binding sites and tags. The molecules are amplified. F.
- the DNA molecules of the sample may be tagged with sample indexes and/or molecular barcodes (referred to generally as “tags”).
- tags can be molecules, such as nucleic acids, containing information that indicates a feature of the molecule with which the tag is associated.
- DNA molecules can bear a sample tag or sample index (which distinguishes molecules in one sample from those in a different sample), a partition tag (which distinguishes molecules in one partition from those in a different partition) and/or a molecular tag/molecular barcode (which distinguishes different molecules from one another (in both unique and non-unique tagging scenarios)).
- Tagging strategies can be divided into unique tagging and non-unique tagging strategies. In unique tagging, all or substantially all of the molecules in a sample bear a different tag, so that reads can be assigned to original molecules based on tag information alone. Tags used in such methods are sometimes referred to as “unique tags”.
- a tag can comprise one or a combination of barcodes.
- barcode refers to a nucleic acid molecule having a particular nucleotide sequence, or to the nucleotide sequence, itself, depending on context.
- a barcode can have, for example, between 10 and 100 nucleotides.
- a collection of barcodes can have degenerate sequences or can have sequences having a certain Hamming distance, as desired for the specific purpose. So, for example, a molecular barcode can be comprised of one barcode or a combination of two barcodes, each attached to different ends of a molecule.
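As an illustration of the Hamming-distance criterion mentioned above, the following sketch computes the smallest pairwise Hamming distance in a barcode collection, which indicates how many substitution errors a barcode can tolerate before it becomes indistinguishable from another member of the set. The barcode sequences shown are hypothetical and shorter than the lengths discussed above, purely to keep the example readable; the helper names are likewise illustrative.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(barcodes: list[str]) -> int:
    """Smallest Hamming distance between any two barcodes in the collection."""
    if len(barcodes) < 2:
        raise ValueError("need at least two barcodes")
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


if __name__ == "__main__":
    barcode_set = ["ACGTAC", "TGCATG", "GATCGA"]  # hypothetical sequences
    print(min_pairwise_hamming(barcode_set))
```

A set designed with a minimum distance of d allows detection of up to d - 1 substitution errors within a barcode, which is one reason a minimum distance may be imposed on a barcode collection.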
- tags can be used to label the individual polynucleotide population partitions so as to correlate the tag (or tags) with a specific partition.
- tags can be used in embodiments of the disclosure that do not employ a partitioning step. In some embodiments, a single tag can be used to label a specific partition.
- tags can be used to label a specific partition.
- the set of tags used to label one partition can be readily differentiated from the set of tags used to label other partitions.
- the tags may have additional functions, for example the tags can be used to index sample sources or used as unique molecular identifiers (which can be used to improve the quality of sequencing data by differentiating sequencing errors from mutations, for example as in Kinde et al., Proc Nat’l Acad Sci USA 108: 9530-9535 (2011), Kou et al., PLoS ONE, 11: e0146638 (2016)) or used as non-unique molecule identifiers, for example as described in US Pat.
- tags may have additional functions, for example the tags can be used to index sample sources or used as non-unique molecular identifiers (which can be used to improve the quality of sequencing data by differentiating sequencing errors from mutations).
- Tags may be incorporated into or otherwise joined to adapters by chemical synthesis, ligation (e.g., as described above, e.g., by blunt-end ligation or sticky-end ligation), or overlap extension polymerase chain reaction (PCR), among other methods. Such adapters are ultimately joined to the sample DNA molecule.
- one or more rounds of amplification cycles may be applied to introduce sample indexes to a nucleic acid molecule using conventional nucleic acid amplification methods.
- the amplifications may be conducted in one or more reaction mixtures (e.g., a plurality of microwells in an array).
- Molecular barcodes and/or sample indexes may be introduced simultaneously, or in any sequential order.
- molecular barcodes and/or sample indexes are introduced prior to and/or after any conversion procedure. In the case of molecular barcodes and/or sample indexes being introduced through amplification processes, the conversion step will occur before the molecular barcodes and/or sample indexes are introduced.
- molecular barcodes and/or sample indexes are introduced prior to and/or after sequence capturing steps, if present, are performed. In some embodiments, only the molecular barcodes are introduced prior to probe capturing and the sample indexes are introduced after sequence capturing steps are performed. In some embodiments, both the molecular barcodes and the sample indexes are introduced prior to performing probe-based capturing steps, if present. In some embodiments, the sample indexes are introduced after sequence capturing steps are performed, if present. In some embodiments, sample indexes are incorporated through overlap extension polymerase chain reaction (PCR).
- the tags may be located at one end or at both ends of the sample DNA molecule.
- tags are predetermined or random or semi-random sequence oligonucleotides.
- the tag(s) may together be less than about 500, 200, 100, 50, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotides in length. Typically tags are about 5 to 20 or 6 to 15 nucleotides in length.
- the tags may be linked to sample DNA molecules randomly or non-randomly.
- each sample or partition (discussed below) is uniquely tagged with a sample index or a combination of sample indexes.
- each nucleic acid molecule of a sample or sub-sample is uniquely tagged with a molecular barcode or a combination of molecular barcodes.
- a plurality of molecular barcodes may be used such that molecular barcodes are not necessarily unique to one another in the plurality (e.g., non-unique molecular barcodes).
- molecular barcodes are generally attached (e.g., by ligation as part of an adapter) to individual molecules such that the combination of the molecular barcode and the sequence it may be attached to creates a unique sequence that may be individually tracked.
- Detection of non-unique molecular barcodes in combination with endogenous sequence information (e.g., the beginning (start) and/or end (stop) genomic location/position corresponding to the sequence of the original DNA molecule in the sample, start and stop genomic positions corresponding to the sequence of the original DNA molecule in the sample, the beginning (start) and/or end (stop) genomic location/position of the sequence read that is mapped to the reference sequence, start and stop genomic positions of the sequence read that is mapped to the reference sequence, sub-sequences of sequence reads at one or both ends, length of sequence reads, and/or length of the original DNA molecule in the sample) typically allows for the assignment of a unique identity to a particular molecule.
- This number is a function of the number of molecules falling into the class.
- the class may be all molecules mapping to the same start-stop position on a reference genome.
- the class may be all molecules mapping across a particular genetic locus, e.g., a particular base or a particular region (e.g., up to 100 bases or a gene or an exon of a gene).
- the number of different tags used to uniquely identify a number of molecules, z, in a class can be between any of 2*z, 3*z, 4*z, 5*z, 6*z, 7*z, 8*z, 9*z, 10*z, 11*z, 12*z, 13*z, 14*z, 15*z, 16*z, 17*z, 18*z, 19*z, 20*z or 100*z (e.g., lower limit) and any of 100,000*z, 10,000*z, 1000*z or 100*z (e.g., upper limit).
- molecular barcodes are introduced at an expected ratio of a set of identifiers (e.g., a combination of unique or non-unique molecular barcodes) to molecules in a sample.
- One example format uses from about 2 to about 1,000,000 different molecular barcode sequences, or from about 5 to about 150 different molecular barcode sequences, or from about 20 to about 50 different molecular barcode sequences, ligated to both ends of a target molecule. Alternatively, from about 25 to about 1,000,000 different molecular barcode sequences may be used.
- 20-50 x 20-50 molecular barcode sequences (i.e., one of the 20-50 different molecular barcode sequences can be attached to each end of the target molecule)
- Such numbers of identifiers are typically sufficient for different molecules having the same start and stop points to have a high probability (e.g., at least 94%, 99.5%, 99.99%, or 99.999%) of receiving different combinations of identifiers.
- about 80%, about 90%, about 95%, or about 99% of molecules have the same combinations of molecular barcodes.
- the assignment of unique or non-unique molecular barcodes in reactions is performed using methods and systems described in, for example, U.S. Patent Application Nos. 20010053519, 20030152490, and 20110160078, and U.S. Patent Nos. 6,582,908, 7,537,898, 9,598,731, and 9,902,992, each of which is hereby incorporated by reference in its entirety.
- different nucleic acid molecules of a sample may be identified using only endogenous sequence information (e.g., start and/or stop positions, sub-sequences of one or both ends of a sequence, and/or lengths).
- Tags can be linked to sample nucleic acids randomly or non-randomly.
- the tagged nucleic acids are sequenced after loading into a microwell plate.
- the microwell plate can have 96, 384, or 1536 microwells. In some cases, they are introduced at an expected ratio of unique tags to microwells.
- the unique tags may be loaded so that more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000 unique tags are loaded per genome sample.
- the unique tags may be loaded so that less than about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000 unique tags are loaded per genome sample.
- the average number of unique tags loaded per sample genome is less than, or greater than, about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000 unique tags per genome sample.
- a format uses 20-50 different tags (e.g., barcodes) ligated to both ends of target nucleic acids.
- 35 different tags ligated to both ends of target molecules creating 35 x 35 permutations, which equals 1225 for 35 tags.
- Such numbers of tags are sufficient so that different molecules having the same start and stop points have a high probability (e.g., at least 94%, 99.5%, 99.99%, 99.999%) of receiving different combinations of tags.
- Other barcode combinations include any number between 10 and 500, e.g., about 15x15, about 35x35, about 75x75, about 100x100, about 250x250, about 500x500.
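The arithmetic behind these tag counts can be sketched as follows. With n different tags available for each end, there are n x n ordered combinations (1225 for n = 35, as stated above). The probability calculation below rests on an assumption that is not stated in the text: that each molecule receives one of the combinations uniformly and independently at random. Under that assumption, the chance that all molecules sharing the same start and stop points carry different combinations follows the standard birthday-problem product. The function names are illustrative.

```python
def tag_combinations(tags_per_end: int) -> int:
    """Ordered combinations when one of `tags_per_end` tags is attached to each end."""
    return tags_per_end * tags_per_end


def prob_all_distinct(n_combinations: int, n_molecules: int) -> float:
    """Probability that `n_molecules` molecules all receive different combinations,
    assuming uniform, independent assignment (an assumption of this sketch)."""
    p = 1.0
    for i in range(n_molecules):
        p *= max(0.0, 1.0 - i / n_combinations)
    return p


if __name__ == "__main__":
    n = tag_combinations(35)
    print(n)
    # Chance that 2, 5 or 10 molecules with identical start/stop points are all
    # distinguishable by their tag combination alone.
    for z in (2, 5, 10):
        print(z, round(prob_all_distinct(n, z), 4))
```

The probability falls as more molecules share the same endpoints, which is why the required number of tags is described above as a function of the number of molecules in a class.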
- unique tags may be predetermined or random or semi-random sequence oligonucleotides.
- a plurality of barcodes may be used such that barcodes are not necessarily unique to one another in the plurality.
- barcodes may be ligated to individual molecules such that the combination of the barcode and the sequence it may be ligated to creates a unique sequence that may be individually tracked.
- detection of non-unique barcodes in combination with sequence data of beginning (start) and end (stop) portions of sequence reads may allow assignment of a unique identity to a particular molecule.
- the length, or number of base pairs, of an individual sequence read may also be used to assign a unique identity to such a molecule.
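A minimal sketch of how non-unique barcodes can be combined with start and stop positions to group reads by originating molecule is shown below. The `Read` record, its field names and the example coordinates are hypothetical; real pipelines typically also tolerate barcode sequencing errors and small positional shifts, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record used only for this illustration."""
    name: str
    chrom: str
    start: int
    stop: int
    barcode: str


def group_into_families(reads: list[Read]) -> dict[tuple, list[str]]:
    """Group reads sharing the same (chrom, start, stop, barcode) key.
    Each group is treated as deriving from one original molecule."""
    families: dict[tuple, list[str]] = defaultdict(list)
    for r in reads:
        families[(r.chrom, r.start, r.stop, r.barcode)].append(r.name)
    return dict(families)


if __name__ == "__main__":
    reads = [
        Read("r1", "chr1", 100, 265, "ACGT"),
        Read("r2", "chr1", 100, 265, "ACGT"),  # same molecule as r1
        Read("r3", "chr1", 100, 265, "TTAG"),  # same endpoints, different barcode
        Read("r4", "chr1", 120, 265, "ACGT"),  # same barcode, different start
    ]
    for key, names in group_into_families(reads).items():
        print(key, names)
```

Because the key includes the endpoints, the same barcode can be reused on molecules with different start or stop positions without the reads being merged.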
- the method includes adding one or more internal control DNAs and forward and reverse primers for amplifying the internal control DNAs.
- the internal control DNAs may be added before amplification using the primers that anneal upstream and downstream of the rearrangement breakpoints.
- the forward and reverse primers for amplifying the internal control DNAs may be included with, or added at the same time as, the primers that anneal upstream and downstream of the rearrangement breakpoints.
- the internal control DNAs may comprise or consist of sequences that do not occur in the genome of the subject, or that do not occur in the genome of the species of which the subject is a member (e.g., the human genome).
- the forward and/or reverse primers for amplifying the internal control DNAs may comprise sequences that are not complementary to any sequence in the genome of the subject, e.g., the human genome.
- the internal control DNAs may be used to ensure that the amplification process proceeded as designed. As such, the method may comprise detecting (e.g., sequencing) molecules amplified from and/or captured by the one or more internal control DNAs.
- the method can comprise comparing an amount of internal control DNAs (e.g., number of molecules or reads detected that correspond to an internal control DNA sequence) to a predetermined threshold, and either rejecting sequencing results if the predetermined threshold is not met or accepting sequencing results if the predetermined threshold is met.
- the predetermined threshold may be established, e.g., based on historical data or by testing the method on samples of DNA from test subjects, such as healthy volunteers. For example, amplification and detection of the one or more internal control DNAs provides confirmation that the amplification process proceeded properly, thus reducing the likelihood of a false negative. G.
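The accept/reject decision based on internal control DNAs can be sketched as a simple threshold check. Two choices here are assumptions of the sketch rather than statements from the text: the threshold is treated as "met" when the count is greater than or equal to it, and every internal control is required to meet it. The control names and counts are hypothetical.

```python
def accept_sequencing_results(control_counts: dict[str, int], threshold: int) -> bool:
    """Accept results only if every internal control reaches the threshold.
    'Met' is interpreted as count >= threshold (an assumption of this sketch)."""
    if not control_counts:
        raise ValueError("at least one internal control is required")
    return all(count >= threshold for count in control_counts.values())


if __name__ == "__main__":
    # Hypothetical read counts for two spiked-in control sequences.
    counts = {"control_1": 540, "control_2": 310}
    print(accept_sequencing_results(counts, threshold=200))
    print(accept_sequencing_results(counts, threshold=400))
```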
- methods disclosed herein comprise a step of subjecting DNA (e.g., cell-free DNA or DNA from a sample comprising cells, such as a blood sample (e.g., a whole blood sample, a buffy coat sample, a leukapheresis sample, or a PBMC sample) and/or additional DNA) to a conversion procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA.
- the first nucleobase is a modified or unmodified nucleobase
- the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase
- the first nucleobase and the second nucleobase have the same base pairing specificity.
- the procedure chemically converts the first or second nucleobase such that the base pairing specificity of the converted nucleobase is altered.
- if the first nucleobase is a modified or unmodified adenine, then the second nucleobase is a modified or unmodified adenine; if the first nucleobase is a modified or unmodified cytosine, then the second nucleobase is a modified or unmodified cytosine; if the first nucleobase is a modified or unmodified guanine, then the second nucleobase is a modified or unmodified guanine; and if the first nucleobase is a modified or unmodified thymine, then the second nucleobase is a modified or unmodified thymine (where modified and unmodified uracil are encompassed within modified thymine for the purpose of this step).
- the conversion procedure comprises deamination of unmodified cytosines of the DNA to uracil. In some embodiments, the conversion procedure comprises contacting the DNA or a subsample thereof with a cytosine deaminase, such as an APOBEC enzyme, optionally wherein the APOBEC enzyme is APOBEC3A. In some embodiments, the conversion procedure comprises enzymatic protection of one or more modified nucleobases of the DNA, such as glucosylation of the 5-hydroxymethylcytosines of the DNA, optionally wherein the glucosylation comprises contacting the DNA with beta-glucosyltransferase.
- the first nucleobase is a modified or unmodified cytosine
- the second nucleobase is a modified or unmodified cytosine
- the first nucleobase may comprise unmodified cytosine (C) and the second nucleobase may comprise one or more of 5-methylcytosine (mC) and 5-hydroxymethylcytosine (hmC).
- the second nucleobase may comprise C and the first nucleobase may comprise one or more of mC and hmC.
- Other combinations are also possible, as indicated, e.g., in the Summary above and the following discussion, such as where one of the first and second nucleobases comprises mC and the other comprises hmC.
- the first nucleobase comprises unmodified cytosine (C) and the second nucleobase comprises 5-methylcytosine (mC). In other embodiments, the first nucleobase comprises unmodified cytosine (C) and the second nucleobase comprises 5-hydroxymethylcytosine (hmC).
- the procedure that affects a first nucleobase of the DNA differently from a second nucleobase of the DNA is methylation-sensitive conversion. In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA comprises bisulfite conversion.
- the first nucleobase comprises one or more of unmodified cytosine, 5-formylcytosine, 5-carboxylcytosine, or other cytosine forms affected by bisulfite
- the second nucleobase may comprise one or more of mC and hmC, such as mC and optionally hmC.
- Sequencing of bisulfite-treated DNA identifies positions that are read as cytosine as being mC or hmC positions. Meanwhile, positions that are read as T are identified as being T or a bisulfite-susceptible form of C, such as unmodified cytosine, 5-formylcytosine, or 5-carboxylcytosine.
- Performing bisulfite conversion, such as on a DNA sample as described herein, thus facilitates identifying positions containing mC or hmC using the sequence reads obtained from the sample.
- For an exemplary description of bisulfite conversion, see, e.g., Moss et al., Nat Commun. 2018; 9: 5068.
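- The bisulfite readout described above can be illustrated with a minimal in silico sketch. The single-letter codes for cytosine forms (C, M, H, F, X) and both function names are hypothetical conventions chosen for this example only, and the read is assumed to be on the same strand as the reference with no gaps.

```python
# Hypothetical single-letter codes for cytosine forms (not from the source):
# C = unmodified cytosine, M = 5mC, H = 5hmC, F = 5fC, X = 5caC.
BISULFITE_SUSCEPTIBLE = {"C", "F", "X"}  # read as T after conversion
BISULFITE_RESISTANT = {"M", "H"}         # still read as C after conversion


def bisulfite_read(annotated: str) -> str:
    """Return the base sequence a sequencer would report after bisulfite conversion."""
    out = []
    for base in annotated:
        if base in BISULFITE_SUSCEPTIBLE:
            out.append("T")
        elif base in BISULFITE_RESISTANT:
            out.append("C")
        else:
            out.append(base)  # A, G, T pass through unchanged
    return "".join(out)


def candidate_modified_positions(reference: str, read: str) -> list[int]:
    """Positions where the reference has C and the converted read still shows C.

    Under the bisulfite scheme these are candidate mC or hmC positions.
    """
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "C"
    ]
```

- In this sketch, a retained C cannot be resolved further: mC and hmC give the same readout, which matches the statement above that bisulfite sequencing identifies positions as "mC or hmC".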
- the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA comprises oxidative bisulfite (Ox-BS) conversion.
- This procedure first converts hmC to fC, which is bisulfite susceptible, followed by bisulfite conversion.
- the first nucleobase comprises one or more of unmodified cytosine, fC, caC, hmC, or other cytosine forms affected by bisulfite
- the second nucleobase comprises mC. Sequencing of Ox-BS converted DNA identifies positions that are read as cytosine as being mC positions.
- positions that are read as T are identified as being T, hmC, or a bisulfite-susceptible form of C, such as unmodified cytosine, fC, or caC.
- Performing Ox-BS conversion, such as on a DNA sample as described herein, thus facilitates identifying positions containing mC using the sequence reads obtained from the sample.
- For an exemplary description of oxidative bisulfite conversion, see, e.g., Booth et al., Science 2012; 336: 934-937.
- the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA comprises Tet-assisted bisulfite (TAB) conversion.
- Sequencing of TAB-converted DNA identifies positions that are read as cytosine as being hmC positions. Meanwhile, positions that are read as T are identified as being T, mC, or a bisulfite-susceptible form of C, such as unmodified cytosine, fC, or caC. Performing TAB conversion, such as on a DNA sample as described herein, thus facilitates identifying positions containing hmC using the sequence reads obtained from the sample.
- the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA comprises APOBEC-coupled epigenetic (ACE) conversion.
- an AID/APOBEC family DNA deaminase enzyme such as APOBEC3A (A3A) is used to deaminate unmodified cytosine and mC without deaminating hmC, fC, or caC.
- the first nucleobase comprises unmodified C and/or mC (e.g., unmodified C and optionally mC)
- the second nucleobase comprises hmC.
- Sequencing of ACE-converted DNA identifies positions that are read as cytosine as being hmC, fC, or caC positions. Meanwhile, positions that are read as T are identified as being T, unmodified C, or mC.
- Performing ACE conversion on a DNA sample as described herein thus facilitates distinguishing positions containing hmC from positions containing mC or unmodified C using the sequence reads obtained from the sample.
- For an exemplary description of ACE conversion, see, e.g., Schutsky et al., Nature Biotechnology 2018; 36: 1083–1090.
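- The readouts stated above for bisulfite (BS), Ox-BS, TAB, and ACE conversion can be collected into one lookup table, as in the sketch below. The table entries restate the text; the function names and the idea of intersecting observations from several procedures at the same position are illustrative assumptions, not steps recited in the source.

```python
# Cytosine forms that are still read as "C" after each conversion procedure,
# as stated in the surrounding text. Every other cytosine form is read as "T".
READS_AS_C = {
    "BS": {"mC", "hmC"},
    "Ox-BS": {"mC"},
    "TAB": {"hmC"},
    "ACE": {"hmC", "fC", "caC"},
}
ALL_FORMS = {"C", "mC", "hmC", "fC", "caC"}


def readout(method: str, form: str) -> str:
    """Base reported at a cytosine of the given form after the given procedure."""
    if form not in ALL_FORMS:
        raise ValueError(f"unknown cytosine form: {form}")
    return "C" if form in READS_AS_C[method] else "T"


def compatible_forms(observations: dict[str, str]) -> set[str]:
    """Cytosine forms consistent with every (procedure -> observed base) pair.

    Hypothetical helper: assumes the same genomic cytosine was observed in
    separately converted aliquots of the same DNA.
    """
    return {
        form
        for form in ALL_FORMS
        if all(readout(method, form) == base for method, base in observations.items())
    }
```

- For example, a position read as C after BS but as T after Ox-BS is consistent only with hmC under this table.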
- the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA comprises enzymatic conversion of the first nucleobase, e.g., as in EM-Seq. See, e.g., Vaisvila R, et al. EM-seq: Detection of DNA methylation at single base resolution from picograms of DNA. bioRxiv; DOI: 10.1101/2019.12.20.884692, available at biorxiv.org/content/10.1101/2019.12.20.884692v1.
- TET2 and T4-βGT can be used to convert 5mC and 5hmC into substrates that cannot be deaminated by a deaminase (e.g., APOBEC3A), and then a deaminase (e.g., APOBEC3A) can be used to deaminate unmodified cytosines, converting them to uracils.
- the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA comprises enzymatic conversion of the first nucleobase, e.g., as in SEM-seq. See, e.g., Vaisvila et al.
- SEM-Seq employs a non-specific, modification-sensitive double-stranded DNA deaminase (MsddA) or a modification-sensitive DNA deaminase A (MsddA)-like deaminase in a nondestructive single-enzyme 5-methylcytosine sequencing (SEM-seq) method that deaminates unmodified cytosines. Accordingly, SEM-seq does not require the TET2/T4-βGT protection and denaturing steps that are of use, e.g., in APOBEC3A-based protocols.
- MsddA does not deaminate 5-formylated cytosines (5fC) or 5-carboxylated cytosines (5caC).
- unmodified cytosines in the DNA are deaminated to uracil and are read as “T” during sequencing.
- Modified cytosines e.g., 5mC
- Cytosines that are read as thymines are identified as unmodified (e.g., unmethylated) cytosines or as thymines in the DNA. Performing SEM-seq conversion thus facilitates identifying positions containing 5mC using the sequence reads obtained.
- For an exemplary description of MsddA and MsddA-like deaminases, see, e.g., Vaisvila et al. Mol Cell. 2024 Mar 7;84(5):854-866.e7, which illustrates in Fig. 2A-C that MsddA-like deaminases have reduced activity on each of 5mC, 5hmC, and 5gmC relative to unmodified cytosine in dsDNA, e.g., a reduction of about 75%, 80%, or more on each of 5mC, 5hmC, and 5gmC relative to unmodified cytosine (e.g., using assay conditions as described in Vaisvila et al., such as analysis of deamination of C in E.
- Deamination can be performed by contacting substrate DNA with deaminase and analyzed using NGS as follows: 50 ng of unmodified E. coli C2566 genomic DNA can be combined with the control DNAs (about 1 ng of Lambda, XP12, and T4147, and 0.1 ng of the 5hmC Adenovirus PCR fragment), sheared to about 300 bp and ligated to pyrrolo-dC adapters with 1 µL of in vitro synthesized deaminase (e.g., synthesized using the PURExpress In Vitro Protein Synthesis kit (NEB, Ipswich, MA) following manufacturer’s recommendations with 100-400 ng of PCR fragment template DNA containing codon-optimized deaminase coding sequence and T7 promoter and terminator).
- exemplary deamination reaction conditions are 50 mM Bis-Tris pH 6.0, 0.1% Triton X-100 for 1 hour at 37 degrees C.
- Thermolabile Proteinase K (NEB, Ipswich, MA) can be added and incubated for 30 min at 37 degrees C and then the Proteinase K can be heat inactivated at 60 degrees C for 10 minutes.
- the deaminated product can then be used for library amplification using the NEBNext Q5U Master Mix (New England Biolabs, Ipswich, MA, USA) with 5 mM of NEBNext Unique Dual Index Primers.
- the resulting library can be purified using 1X NEBNext Sample Purification Beads according to the manufacturer’s instructions and the purified library can be analyzed and quantified by an Agilent Bioanalyzer 2100 DNA High Sensitivity chip.
- the libraries can be sequenced using the Illumina NextSeq and NovaSeq platforms. Paired-end sequencing of 75 cycles (2 x 75 bp) can be performed for all the sequencing runs. Base calling and demultiplexing can be carried out with the standard Illumina pipeline.
- the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample comprises separating DNA originally comprising the first nucleobase from DNA not originally comprising the first nucleobase.
- the first nucleobase is hmC.
- DNA originally comprising the first nucleobase may be separated from other DNA using a labeling procedure comprising biotinylating positions that originally comprised the first nucleobase.
- the first nucleobase is first derivatized with an azide-containing moiety, such as a glucosyl-azide containing moiety.
- the azide-containing moiety then may serve as a reagent for attaching biotin, e.g., through Huisgen cycloaddition chemistry.
- the DNA originally comprising the first nucleobase, now biotinylated, can be separated from DNA not originally comprising the first nucleobase using a biotin-binding agent, such as avidin, neutravidin (deglycosylated avidin with an isoelectric point of about 6.3), or streptavidin.
- An example of a procedure for separating DNA originally comprising the first nucleobase from DNA not originally comprising the first nucleobase is hmC-seal, which labels hmC to form β-6-azide-glucosyl-5-hydroxymethylcytosine and then attaches a biotin moiety through Huisgen cycloaddition, followed by separation of the biotinylated DNA from other DNA using a biotin-binding agent.
- For an exemplary description of hmC-seal, see, e.g., Han et al., Mol. Cell 2016; 63: 711-719.
- the method further comprises differentially tagging each of the DNA originally comprising the first nucleobase and the DNA not originally comprising the first nucleobase.
- the method may further comprise pooling the DNA originally comprising the first nucleobase and the DNA not originally comprising the first nucleobase following differential tagging.
- the DNA originally comprising the first nucleobase and the DNA not originally comprising the first nucleobase may then be used in downstream analyses.
- the pooled DNA originally comprising the first nucleobase and the DNA not originally comprising the first nucleobase may be sequenced in the same sequencing cell (such as after being subjected to further treatments, such as those described herein) while retaining the ability to resolve whether a given read came from a molecule of DNA originally comprising the first nucleobase or DNA not originally comprising the first nucleobase using the differential tags.
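- The way differential tags allow pooled reads to be resolved after sequencing can be sketched as follows. The tag sequences, the tag length, the assumption that the tag is the first bases of each read, and the function name are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical tag sequences; real tags would be defined by the library design.
TAGS = {
    "ACGT": "originally_with_first_nucleobase",
    "TGCA": "originally_without_first_nucleobase",
}
TAG_LEN = 4  # assumption: the tag occupies the first 4 bases of each read


def demultiplex(reads: list[str]) -> dict[str, list[str]]:
    """Group pooled reads by their differential tag and strip the tag."""
    groups: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        tag, insert = read[:TAG_LEN], read[TAG_LEN:]
        groups[TAGS.get(tag, "unassigned")].append(insert)
    return dict(groups)
```

- Reads whose tag is not recognized are kept in an "unassigned" group rather than discarded, so that tag errors remain visible in downstream analysis.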
- the first nucleobase is a modified or unmodified adenine
- the second nucleobase is a modified or unmodified adenine.
- the modified adenine is N6-methyladenine (mA).
- methods such as DM-seq may in some cases be preferred over methods such as bisulfite sequencing and EM-seq because they are less destructive (especially important for low yield samples such as cfDNA) and do not require denaturation, meaning that non-conversion errors are theoretically more likely to be random.
- In methods that require denaturation for conversion, failure to denature a DNA molecule will result in non-conversion of all bases in the DNA molecule.
- These non-random (localized) conversion failures can appear as false negatives (non-methylated regions).
- Methods with random non-conversion can affect at most a low percentage of bases within a region, and thus the specificity of methylation change detection can be maximized (reducing false positives) by placing a threshold on the percentage of bases within a region that are methylated/non-methylated. Hence, in some cases, a conversion procedure that does not involve denaturation is preferred.
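- The region-level threshold described above can be sketched as follows. The 0.5 default threshold, the per-position boolean representation, and the function names are hypothetical; the sketch only illustrates why a threshold suppresses isolated (random) errors but not molecule-wide (localized) ones.

```python
def fraction_called(calls: list[bool]) -> float:
    """Fraction of positions in a region that carry a positive call."""
    if not calls:
        raise ValueError("region has no informative positions")
    return sum(calls) / len(calls)


def region_passes(calls: list[bool], min_fraction: float = 0.5) -> bool:
    """Call a region positive only if enough of its positions are positive.

    min_fraction is a hypothetical threshold, not a value from the source.
    """
    return fraction_called(calls) >= min_fraction


# Random error: one spurious call among ten positions stays below the threshold.
random_error_region = [True] + [False] * 9

# Localized error: a molecule that escaped conversion entirely shows the same
# call at every position, so a threshold alone cannot filter it out.
localized_error_region = [True] * 10
```

- Under these assumptions, `region_passes(random_error_region)` is false while `region_passes(localized_error_region)` is true, which is the asymmetry the text relies on when preferring procedures that do not involve denaturation.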
- the conversion procedure used in the methods of the disclosure is one that changes the base pairing specificity of an unmodified nucleoside (e.g., cytosine), but does not change the base pairing specificity of the corresponding modified nucleoside (e.g., methylated cytosine).
- the conversion procedure converts modified nucleosides.
- the conversion procedure which converts modified nucleosides comprises enzymatic conversion, such as DM-seq, for example, as described in WO2023/288222A1.
- In DM-seq, unmodified cytosines in the DNA are enzymatically protected from a subsequent deamination step wherein 5mC in 5mCpG is converted to T.
- the enzymatically protected unmodified (e.g., unmethylated) cytosines are not converted and are read as “C” during sequencing. Cytosines that are read as thymines (in a CpG context) are identified as methylated cytosines in the DNA.
- the first nucleobase comprises unmodified (such as unmethylated) cytosine
- the second nucleobase comprises modified (such as methylated) cytosine. Sequencing of the converted DNA identifies positions that are read as cytosine as being unmodified C positions. Meanwhile, positions that are read as T are identified as being T or 5mC.
- Exemplary cytosine deaminases for use herein include APOBEC enzymes, for example, APOBEC3A.
- AID/APOBEC family DNA deaminase enzymes such as APOBEC3A (A3A) are used to deaminate (unprotected) unmodified cytosine and 5mC.
- the enzymatic protection of unmodified cytosines in the DNA comprises addition of a protective group to the unmodified cytosines.
- Such protective groups can comprise an alkyl group, an alkyne group, a carboxyl group, a carboxyalkyl group, an amino group, a hydroxymethyl group, a glucosyl group, a glucosylhydroxymethyl group, an isopropyl group, or a dye.
- DNA can be treated with a methyltransferase, such as a CpG-specific methyltransferase, which adds the protective group to unmodified cytosines.
- methyltransferase is used broadly herein to refer to enzymes capable of transferring a methyl or substituted methyl (e.g., carboxymethyl) to a substrate (e.g., a cytosine in a nucleic acid).
- the DNA is contacted with a CpG-specific DNA methyltransferase (MTase), such as a CpG-specific carboxymethyltransferase (CxMTase), and a substituted methyl donor, such as a carboxymethyl donor (e.g., carboxymethyl-S-adenosyl-L-methionine).
- the CxMTase can facilitate the addition of a protective carboxymethyl group to an unmethylated cytosine.
- the unmethylated cytosine is unmodified cytosine.
- the carboxymethyl group can prevent deamination of the cytosine during a deamination step (such as a deamination step using an APOBEC enzyme, such as A3A).
- Substituted methyl or carboxymethyl donors useful in the disclosed methods include but are not limited to, S-adenosyl-L-methionine (SAM) analogs, optionally wherein the SAM analog is carboxy-S-adenosyl-L-methionine (CxSAM).
- SAM analogs are described, for example, in WO2022/197593A1.
- the MTase may be, for example, a CpG methyltransferase from Spiroplasma sp.
- the CxMTase may be a CpG methyltransferase from Mycoplasma penetrans (M.MpeI).
- the methyltransferase enzyme is a variant of M.MpeI having an N374R substitution or an N374K substitution.
- the methyltransferase can further comprise one or more amino acid substitutions selected from a) substitution of one or both residues T300 and E305 with S, A, G, Q, D, or N; b) substitution of one or more residues A323, N306, and Y299 with a positively charged amino acid selected from K, R or H; and/or c) substitution of S323 with A, G, K, R or H, which may enhance the activity of the enzyme.
- the conversion procedure further includes enzymatic protection of 5hmCs, such as by glucosylation of the 5hmCs (e.g., using βGT), in the DNA prior to the deamination of unprotected modified cytosines.
- 5hmC can be protected from conversion, for example through glucosylation using β-glucosyltransferase (βGT), forming 5-glucosylhydroxymethylcytosine (5ghmC).
- Glucosylation of 5hmC can reduce or eliminate deamination of 5hmC by a deaminase such as APOBEC3A.
- Treatment with an MTase or CxMTase then adds a protecting group to unmodified (unmethylated) cytosines in the DNA. 5mC (but not protected, unmodified cytosine and not 5ghmC) is then deaminated (converted to T in the case of 5mC) by treatment with a deaminase, for example, an APOBEC enzyme (such as APOBEC3A). Sequencing of the converted DNA identifies positions that are read as cytosine as being either 5hmC or unmodified C positions. Meanwhile, positions that are read as T are identified as being T or 5mC.
- Performing DM-seq conversion with glucosylation of 5hmC on a sample as described herein thus facilitates distinguishing positions containing unmodified C or 5hmC on the one hand from positions containing 5mC using the sequence reads obtained.
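- The DM-seq readout with 5hmC glucosylation can be sketched as a per-CpG call against a reference. The function name and call labels are hypothetical, and the sketch assumes an ungapped read on the same strand as the reference and no true C-to-T sequence variant at the inspected positions.

```python
def call_dmseq_cpgs(reference: str, read: str) -> dict[int, str]:
    """Label each reference CpG cytosine from a DM-seq-converted read.

    Per the text above: a cytosine read as T in a CpG context is identified
    as 5mC, and a cytosine read as C is unmodified C or 5hmC.
    """
    if len(reference) != len(read):
        raise ValueError("this sketch requires an ungapped, equal-length alignment")
    calls: dict[int, str] = {}
    for i in range(len(reference) - 1):
        if reference[i : i + 2] != "CG":
            continue  # only CpG cytosines are inspected
        if read[i] == "T":
            calls[i] = "5mC"
        elif read[i] == "C":
            calls[i] = "C_or_5hmC"
        else:
            calls[i] = "other"  # sequencing error or variant; left uninterpreted
    return calls
```

- Note that the polarity is inverted relative to bisulfite-type readouts: here a C-to-T change marks the methylated position.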
- alternative base conversion schemes are used. For example, unmethylated cytosines can be left intact while methylated cytosines and hydroxymethylcytosines are converted to a base read as a thymine (e.g., uracil, thymine, or dihydrouracil).
- methylating a cytosine in at least one first complementary strand or second complementary strand comprises contacting the cytosine with a methyltransferase such as DNMT1 or DNMT5.
- the step of oxidizing a 5-hydroxymethylated cytosine to 5-formylcytosine can be optional.
- converting the modified cytosine in at least one first or second strand to a thymine or a base read as thymine comprises oxidizing a hydroxymethyl cytosine, e.g., the hydroxymethyl cytosine is oxidized to formylcytosine.
- oxidizing the hydroxymethyl cytosine to formylcytosine comprises contacting the hydroxymethyl cytosine with a ruthenate, such as potassium ruthenate (KRuO4).
- the modified cytosine is converted to thymine, uracil, or dihydrouracil.
- the method comprises converting a formylcytosine and/or a methylcytosine to carboxylcytosine as part of converting the modified cytosine in at least one first or second strand to a thymine or a base read as thymine.
- converting the formylcytosine and/or the methylcytosine to carboxylcytosine can comprise contacting the formylcytosine and/or the methylcytosine with a TET enzyme, such as TET1, TET2, or TET3.
- the method comprises reducing the carboxylcytosine as part of converting the modified cytosine in at least one first or second strand to a thymine or a base read as thymine, and/or the carboxylcytosine is reduced to dihydrouracil.
- reducing the carboxylcytosine comprises contacting the carboxylcytosine with a borane or borohydride reducing agent.
- the borane or borohydride reducing agent comprises pyridine borane, 2-picoline borane, borane, tert-butylamine borane, ammonia borane, sodium borohydride, sodium cyanoborohydride (NaBH3CN), lithium borohydride (LiBH4), ethylenediamine borane, dimethylamine borane, sodium triacetoxyborohydride, morpholine borane, 4-methylmorpholine borane, trimethylamine borane, dicyclohexylamine borane, or a salt thereof.
- the reducing agent comprises lithium aluminum hydride, sodium amalgam, amalgam, sulfur dioxide, dithionate, thiosulfate, iodide, hydrogen peroxide, hydrazine, diisobutylaluminum hydride, oxalic acid, carbon monoxide, cyanide, ascorbic acid, formic acid, dithiothreitol, beta-mercaptoethanol, or any combination thereof.
- TET enzymes may be used in the disclosed methods as appropriate.
- the one or more TET enzymes comprise TETv.
- TETv is described in US Patent 10,260,088 and its sequence is SEQ ID NO: 1 therein.
- the one or more TET enzymes comprise TETcd.
- TETcd is described in US Patent 10,260,088 and its sequence is SEQ ID NO: 3 therein.
- the one or more TET enzymes comprise TET1.
- the one or more TET enzymes comprise TET2.
- TET2 may be expressed and used as a fragment comprising TET2 residues 1129-1480 joined to TET2 residues 1844-1936 by a linker as described, e.g., in US Patent 10,961,525.
- the one or more TET enzymes comprise TET1 and TET2.
- the one or more TET enzymes comprise a V1900 TET mutant, such as a V1900A, V1900C, V1900G, V1900I, or V1900P TET mutant. In some embodiments, the one or more TET enzymes comprise a V1900 TET2 mutant, such as a V1900A, V1900C, V1900G, V1900I, or V1900P TET2 mutant.
- the TET enzyme comprises a mutation that increases formation of 5-caC. Exemplary mutations are set forth above.
- a mutation that increases formation of 5-caC means that the TET enzyme having the mutation produces more 5-caC than a TET enzyme that lacks the mutation but is otherwise identical. 5-caC production can be measured as described, e.g., in Liu et al., Nat Chem Biol 13:181-187 (2017) (see Online Methods section, TET reactions in vitro subsection, “driving” conditions).
- the disclosure relates to methods of performing a sequencing by synthesis reaction on a DNA sample, comprising a step of modified end-repair (such as end repair performed using a modified dCTP, such as d5mCTP) and/or the use of modified NGS adapters (e.g., modified Y-shaped adapters).
- the DNA sample is obtained or has been obtained from a subject.
- the DNA sample may comprise or consist of DNA from a biological sample obtained from a subject.
- the subject may be a human, a mammal, an animal, a primate, rodent (including mice and rats), or other common laboratory, domestic, companion, service or agricultural animal, for example a rabbit, dog, cat, horse, cow, sheep, goat or pig.
- the DNA sample is from a human.
- the subject may in some cases have or be suspected of having a cancer, tumor or neoplasm. In other cases the subject may not have cancer or a detectable cancer symptom.
- the subject may have been treated with one or more cancer therapies, e.g., any one or more of chemotherapies, antibodies, vaccines or biologics.
- the subject may be in remission, e.g., from a tumor, cancer, or neoplasia (e.g., following treatment such as chemotherapy, surgical resection, radiation, or a combination thereof).
- the subject may or may not be diagnosed as being susceptible to cancer or any cancer-associated genetic mutations/disorders.
- the sample is a DNA sample obtained from a tumor tissue biopsy.
- the cancer, tumor, or neoplasm may generally be of any type, for example a cancer, tumor, or neoplasm of the lung, colon, rectum (or colorectum), kidney, breast, prostate, or liver, or other type of cancer as described herein.
- the sample is obtained from a subject in remission from a tumor, cancer, or neoplasia (e.g., following chemotherapy, surgical resection, radiation, or a combination thereof).
- the pre-cancer, cancer, tumor, or neoplasia or suspected pre-cancer, cancer, tumor, or neoplasia may be of the bladder, head and neck, lung, colon, rectum, kidney, breast, prostate, skin, or liver.
- the pre-cancer, cancer, tumor, or neoplasia or suspected pre-cancer, cancer, tumor, or neoplasia is of the lung.
- the pre-cancer, cancer, tumor, or neoplasia or suspected pre-cancer, cancer, tumor, or neoplasia is of the colon or rectum. In some embodiments, the pre-cancer, cancer, tumor, or neoplasia or suspected pre-cancer, cancer, tumor, or neoplasia is of the breast. In some embodiments, the pre-cancer, cancer, tumor, or neoplasia or suspected pre-cancer, cancer, tumor, or neoplasia is of the prostate. In any of the foregoing embodiments, the subject may be a human subject.
- the sample is obtained from a subject having a stage I cancer, stage II cancer, stage III cancer or stage IV cancer.
- the subject may have an infection, a transplant rejection, or other disease or disorder related to changes in the immune system.
- the subject may not have cancer or a detectable cancer symptom.
- the subject may have been treated with one or more cancer therapies, e.g., any one or more of chemotherapies, antibodies, vaccines or biologics.
- the subject may be in remission.
- the subject may or may not be diagnosed as being susceptible to cancer or any cancer-associated genetic mutations/disorders.
- the biological sample can be any biological sample isolated from a subject.
- Biological samples can include body tissues, such as known or suspected solid tumors, whole blood, platelets, serum, plasma, stool, red blood cells, white blood cells or leucocytes, endothelial cells, tissue biopsies, cerebrospinal fluid, synovial fluid, lymphatic fluid, ascites fluid, interstitial or extracellular fluid, the fluid in spaces between cells, including gingival crevicular fluid, bone marrow, pleural effusions, saliva, mucus, sputum, semen, sweat, and urine.
- biological samples are body fluids, particularly blood and fractions thereof, or urine.
- a sample can be in the form originally isolated from a subject or can have been subjected to further processing to remove or add components, such as cells, or enrich for one component relative to another.
- a sample can be isolated or obtained from a subject and transported to a site of sample analysis. The sample may be preserved and shipped at a desirable temperature, e.g., room temperature, 4°C, -20°C, and/or -80°C.
- a sample can be isolated or obtained from a subject at the site of the sample analysis.
- the subject can be a human, a mammal, an animal, a companion animal, a service animal, or a pet.
- the subject may have a cancer, precancer, infection, transplant rejection, or other disease or disorder related to changes in the immune system.
- the subject may not have cancer or a detectable cancer symptom.
- the subject may have been treated with one or more cancer therapies, e.g., any one or more of chemotherapies, antibodies, vaccines, or biologics.
- the subject may be in remission.
- the subject may or may not be diagnosed as being susceptible to cancer or any cancer-associated genetic mutations/disorders.
- DNA to be sequenced is isolated from a tissue sample, such as a tumor sample.
- the DNA to be sequenced is isolated from cells of a blood sample, such as a buffy coat sample, a whole blood sample, a leukapheresis sample, or a PBMC sample.
- the DNA isolated from any type of sample comprising cells including but not limited to a blood sample (e.g., a buffy coat sample, a whole blood sample, a leukapheresis sample, or a PBMC sample) may be DNA isolated from the cells of that sample.
- DNA such as DNA from a tumor sample
- DNA may also be analyzed (e.g., sequenced, captured, converted, and/or partitioned) to provide information, e.g., for quantifying cell contributions to the DNA (such as cancer cell contributions to the DNA and/or immune cell contributions to the DNA); for identifying other cell types contributing to the DNA; for detecting mutations in the DNA; and/or for detecting epigenetic differences, e.g., differential methylation, relative to healthy or normal DNA.
- the volume of plasma obtained can depend on the desired read depth for sequenced regions. Exemplary volumes are 0.4-40 mL, 5-20 mL, 10-20 mL, and 3-5 mL.
- the volume can be 0.5 mL, 1 mL, 2 mL, 3 mL, 4 mL, 5 mL, 6 mL, 7 mL, 8 mL, 9 mL, 10 mL, 20 mL, 30 mL, or 40 mL.
- a volume of sampled plasma may be 5 to 20 mL.
- the sample volume is 3-5 mL of plasma, such as 4 mL of plasma, per 10 mL whole blood.
- the sample comprises whole blood.
- the sample volume is 1-5 mL of whole blood, such as 2.5 mL of whole blood.
- the sample comprises buffy coat separated from whole blood.
- Exemplary volumes of sampled buffy coat are 0.1-20 mL, 1-10 mL, 1-5 mL, 0.2-0.6 mL, and 0.3-0.5 mL.
- the volume can be 0.1 mL, 0.2 mL, 0.3 mL, 0.4 mL, 0.5 mL, 0.6 mL, 0.7 mL, 0.8 mL, 0.9 mL, 1 mL, 2 mL, 3 mL, 4 mL, 5 mL, 10 mL, or 20 mL.
- the volume can be 0.1 mL, 0.2 mL, 0.3 mL, 0.4 mL, 0.5 mL, 0.6 mL, 0.7 mL, 0.8 mL, 0.9 mL, 1 mL, 2 mL, 3 mL, 4 mL, 5 mL, 10 mL, or 20 mL.
- a volume of sampled PBMCs may be 1 to 10 mL.
- the sample volume is 0.1-0.5 mL of PBMCs, such as 0.3 mL of PBMCs, per 10 mL whole blood.
- the sample comprises leukocytes separated from subject blood using leukapheresis.
- the sample volume is 0.1-0.6 mL of leukocytes from leukapheresis, such as 0.4 mL of leukocytes, per 10 mL whole blood.
- a sample can comprise various amounts of nucleic acid that contain genome equivalents. For example, a sample of about 30 ng DNA can contain about 10,000 (10^4) haploid human genome equivalents. Similarly, a sample of about 100 ng of DNA can contain about 30,000 haploid human genome equivalents.
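- The genome-equivalent figures above follow from simple arithmetic, sketched below. The mass of about 3.3 pg per haploid human genome is an assumption introduced for this example (a commonly used approximation, not a value stated in the source), and the function name is hypothetical.

```python
# Assumption: one haploid human genome weighs roughly 3.3 pg.
HAPLOID_GENOME_MASS_PG = 3.3


def haploid_genome_equivalents(
    dna_ng: float, pg_per_genome: float = HAPLOID_GENOME_MASS_PG
) -> float:
    """Approximate number of haploid genome copies in a given DNA mass."""
    if dna_ng < 0 or pg_per_genome <= 0:
        raise ValueError("mass must be non-negative and genome mass positive")
    return dna_ng * 1000.0 / pg_per_genome  # 1 ng = 1000 pg
```

- With this assumed genome mass, 30 ng works out to roughly 9,000 copies and 100 ng to roughly 30,000 copies, consistent with the rounded "about 10,000" and "about 30,000" figures in the text.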
- a sample can comprise nucleic acids from different sources, e.g., nucleic acids from cells and cell-free nucleic acids of the same subject, and nucleic acids from cells and cell-free nucleic acids of different subjects.
- the nucleic acid may be DNA.
- a sample can comprise DNA carrying mutations.
- a sample can comprise DNA carrying germline mutations and/or somatic mutations.
- Germline mutations refer to mutations existing in germline DNA of a subject.
- Somatic mutations refer to mutations originating in somatic cells of a subject, e.g., cancer cells.
- a sample can comprise DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations).
- a sample can comprise an epigenetic variant, wherein the epigenetic variant is associated with the presence of a genetic variant such as a cancer-associated mutation.
- the sample comprises an epigenetic variant associated with the presence of a genetic variant, wherein the sample does not comprise the genetic variant.
- the DNA sample may be or comprise cell free nucleic acids or cfDNA.
- the cfDNA may be obtained from a test subject, for example as described above.
- the sample for analysis may be plasma or serum containing cell-free nucleic acids.
- “Cell-free DNA,” “cfDNA molecules,” or “cfDNA,” for example, include DNA molecules that naturally occur in a subject in extracellular form (e.g., in blood, serum, plasma, or other bodily fluids such as lymph, cerebrospinal fluid, urine, or sputum).
- Exemplary amounts of nucleic acids (e.g., DNA from a buffy coat sample or any other sample comprising cells, such as a blood sample (e.g., a whole blood sample, a leukapheresis sample, or a PBMC sample)) in a sample before amplification range from about 1 fg to about 1 µg, e.g., 1 pg to 200 ng, 1 ng to 100 ng, or 10 ng to 1000 ng.
- the amount can be up to about 600 ng, up to about 500 ng, up to about 400 ng, up to about 300 ng, up to about 200 ng, up to about 100 ng, up to about 50 ng, or up to about 20 ng of nucleic acid molecules.
- the amount can be at least 1 fg, at least 10 fg, at least 100 fg, at least 1 pg, at least 10 pg, at least 100 pg, at least 1 ng, at least 10 ng, at least 100 ng, at least 150 ng, or at least 200 ng of nucleic acid molecules.
- DNA molecules can be linked to adapters at either one end or both ends.
- a method described herein comprises identifying the presence of DNA produced by a tumor (or neoplastic cells, or cancer cells) or by precancer cells.
- the methods of the disclosure may be used to characterize the heterogeneity of an abnormal condition in a subject. Such methods can include, e.g., generating a genetic profile of extracellular polynucleotides derived from the subject, wherein the genetic profile comprises a plurality of data resulting from copy number variation and rare mutation analyses.
- an abnormal condition is cancer or precancer.
- the abnormal condition may be one resulting in a heterogeneous genomic population.
- some tumors are known to comprise tumor cells in different stages of the cancer.
- heterogeneity may comprise multiple foci of disease.
- the present methods can be used to generate a profile, fingerprint, or set of data that is a summation of genetic information derived from different cells in a heterogeneous disease. Such a set of data may comprise copy number variation, epigenetic variation, or other mutation analyses alone or in combination.
- the present methods can be used to diagnose, prognose, monitor or observe cancers, or other diseases. In some embodiments, the methods herein do not involve diagnosing, prognosing, or monitoring a fetus and as such are not directed to non-invasive prenatal testing.
- these methodologies may be employed in a pregnant subject to diagnose, prognose, monitor or observe cancers or other diseases in an unborn subject whose DNA and other polynucleotides may co-circulate with maternal molecules.
- analysis of reads can be performed on a partition-by-partition level, as well as a whole DNA population level. Tags can be used to sort reads from different partitions. Analysis can include in silico analysis to determine genetic and epigenetic variation (one or more of methylation, chromatin structure, etc.) using sequence information, genomic coordinates, length, coverage, and/or copy number.
- An exemplary method for performing a sequencing by synthesis reaction on a converted DNA molecule comprises the following steps:
- 1. Preparing an extracted DNA sample (e.g., cell-free DNA or DNA isolated from a sample comprising cells, such as a blood sample (e.g., a whole blood sample, a buffy coat sample, a leukapheresis sample, or a PBMC sample) and/or additional DNA) by ligating modified Y-shaped adapters (as disclosed herein, such as shown in FIG. 3, right side) comprising molecular tags to the DNA.
- the adapters are ligated to the DNA such that read 1 begins from the 3’ end of an adapter-ligated DNA molecule.
- the adapter comprises modified bases (e.g., modified cytosines) such that the nucleobases of the adapter are resistant to a conversion procedure (e.g., resistant to deamination).
- DNA is fragmented prior to the ligating, e.g., by sonication or restriction digestion.
- 2. The adapted DNA molecules are subjected to an epigenetic base conversion procedure (e.g., a procedure described herein, such as cytosine deamination by DM-seq, EM-seq, or bisulfite).
- 3. Performing a sequencing by synthesis reaction on the converted DNA molecules using a sequencing by synthesis instrument (e.g., an NGS instrument).
- 4. Calibrating one or more base calling metrics of the sequencing by synthesis instrument based at least in part on data from a resistant region of the DNA molecule (such as a resistant region of the Y-shaped adapter), thereby providing one or more calibrated base calling metrics.
- 5. Calling at least a portion of nucleobases in a converted region of the DNA molecule using the one or more calibrated base calling metrics.
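Steps 4 and 5 above can be illustrated with a minimal Python sketch. The text does not specify how the calibration is computed, so everything here is an assumption made for illustration: the function names `calibrate_quality` and `call_converted_bases`, the use of an empirical Phred-scaled error rate per reported quality score, the add-one smoothing, and the `min_q` threshold are all hypothetical. The idea shown is only that bases in a conversion-resistant region of known sequence (such as the adapter) give a ground truth against which reported qualities can be checked, and the resulting calibrated values are then applied to calls in the converted region.

```python
import math
from collections import defaultdict


def calibrate_quality(resistant_observations):
    """Map each reported quality score to an empirical Phred score.

    resistant_observations: iterable of (called_base, expected_base, reported_q)
    tuples from a conversion-resistant region whose true sequence is known
    (hypothetical input format, chosen for this sketch).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for called, expected, reported_q in resistant_observations:
        totals[reported_q] += 1
        if called != expected:
            errors[reported_q] += 1

    calibrated = {}
    for q, n in totals.items():
        # Add-one smoothing (an assumption) so zero observed errors
        # does not produce an infinite quality score.
        error_rate = (errors[q] + 1) / (n + 2)
        calibrated[q] = -10 * math.log10(error_rate)
    return calibrated


def call_converted_bases(converted_calls, calibrated, min_q=20.0):
    """Keep converted-region calls whose calibrated quality reaches min_q.

    converted_calls: iterable of (base, reported_q) tuples.
    Calls below the threshold, or with an unseen reported_q, are masked as 'N'.
    """
    return "".join(
        base if calibrated.get(q, 0.0) >= min_q else "N"
        for base, q in converted_calls
    )
```

A production base caller would calibrate instrument-level metrics rather than post-process per-base qualities; the sketch only shows the direction of information flow from the resistant region to the converted region.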
- Another exemplary method for performing a sequencing by synthesis reaction on a converted DNA molecule comprises the following steps, and is shown in FIG. 1B (an exemplary standard workflow is illustrated in FIG. 1A for comparison purposes):
- 1. Preparing an extracted DNA sample (e.g., cell-free DNA or DNA isolated from a sample comprising cells, such as a blood sample (e.g., a whole blood sample, a buffy coat sample, a leukapheresis sample, or a PBMC sample) and/or additional DNA) by performing end repair using a modified dNTP mix comprising modified cytosines, such as d5mCTP, rather than dCTP, and optionally an A-tailing reaction.
- the modified cytosines of the dNTP mix are incorporated into the DNA molecule during end repair and are resistant to a conversion procedure.
- DNA is fragmented prior to end repair and optional A-tailing, e.g., by sonication or restriction digestion.
- 2. Ligating modified Y-shaped adapters (as disclosed herein, such as shown in FIG. 3, right side) comprising molecular tags to the DNA.
- the adapters are ligated to the DNA such that read 1 begins from the 3’ end of an adapter-ligated DNA molecule.
- the adapter comprises modified bases (e.g., modified cytosines) such that the nucleobases of the adapter are resistant to a conversion procedure (e.g., resistant to deamination).
- 3. The adapted DNA molecules are subjected to an epigenetic base conversion procedure (e.g., a procedure described herein, such as cytosine deamination by DM-seq, EM-seq, or bisulfite).
- 4. Performing a sequencing by synthesis reaction on the converted DNA molecules using a sequencing by synthesis instrument (e.g., an NGS instrument).
- 5.
- molecular tags comprise or consist of nucleotides that are not altered by a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, such as any of those described herein (e.g., mC along with A, T, and G where the procedure is bisulfite conversion or any other conversion that does not affect mC; hmC along with A, T, and G where the procedure is a conversion that does not affect hmC; etc.).
- a nucleic acid sample such as a heterogeneous nucleic acid sample
- partitions e.g., sub-samples
- each partition is differentially tagged.
- Tagged partitions can then be pooled together for collective sample prep and/or sequencing.
- the partitioning-tagging-pooling steps can occur more than once, with each round of partitioning occurring based on one or more different characteristics, and tagged using differential tags that are distinguished from other partitions and partitioning means.
- a partitioning step occurs prior to a step of sequencing by synthesis and prior to a step of performing a conversion procedure on a DNA molecule (such as on an adapted DNA molecule). In some embodiments of the disclosed methods, a partitioning step occurs prior to a step of sequencing by synthesis and after a step of performing a conversion procedure on a DNA molecule (such as on an adapted DNA molecule). In some embodiments, a partitioning step occurs prior to a step of sequencing by synthesis and prior to a step of tagging a DNA molecule. In some embodiments, a partitioning step occurs prior to a step of sequencing by synthesis and after a step of tagging a DNA molecule.
- examples of characteristics that can be used for partitioning include sequence length, methylation level, nucleosome binding, sequence mismatch, immunoprecipitation, and/or proteins that bind to DNA.
- Resulting partitions can include one or more of the following nucleic acid forms: single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), shorter DNA fragments and longer DNA fragments.
- partitioning based on a cytosine modification (e.g., cytosine methylation) or methylation generally is performed and is optionally combined with at least one additional partitioning step, which may be based on any of the foregoing characteristics or forms of DNA.
- a heterogeneous population of nucleic acids is partitioned into nucleic acids with one or more base modifications and without the one or more base modifications. Examples of base modifications are described elsewhere herein.
- a heterogeneous population of nucleic acids can be partitioned into nucleic acid molecules associated with nucleosomes and nucleic acid molecules devoid of nucleosomes.
- a heterogeneous population of nucleic acids may be partitioned into single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA).
- a heterogeneous population of nucleic acids may be partitioned based on nucleic acid length (e.g., molecules of up to 160 bp and molecules having a length of greater than 160 bp).
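The length-based example above (molecules of up to 160 bp versus molecules longer than 160 bp) can be sketched in Python as follows. This is an in silico illustration only; the function name `partition_by_length` and the dictionary keys are hypothetical, and a physical partitioning would be done at the bench rather than in code.

```python
def partition_by_length(fragment_lengths, cutoff_bp=160):
    """Split fragment lengths into two partitions around a cutoff.

    Fragments of up to cutoff_bp go to 'short'; longer ones go to 'long'.
    The 160 bp default follows the example given in the text.
    """
    short = [n for n in fragment_lengths if n <= cutoff_bp]
    long_ = [n for n in fragment_lengths if n > cutoff_bp]
    return {"short": short, "long": long_}
```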
- different procedures are applied to different partitions to determine different characteristics of the initial sample.
- the DNA of at least one partition can be subjected to an end repair procedure according to the methods of the disclosure described herein.
- at least one partition is not subjected to the end repair procedure according to the methods of the disclosure described herein.
- partition tagging comprises tagging molecules in each partition with a partition tag. After re-combining partitions (e.g., to reduce the number of sequencing runs needed and avoid unnecessary cost) and sequencing molecules, the partition tags identify the source partition. In another embodiment, different partitions are tagged with different sets of molecular tags, e.g., comprised of a pair of barcodes.
- each molecular barcode indicates the source partition as well as being useful to distinguish molecules within a partition.
- a first set of 35 barcodes can be used to tag molecules in a first partition, while a second set of 35 barcodes can be used to tag molecules in a second partition.
- the molecules may be pooled for sequencing in a single run.
- a sample tag is added to the molecules, e.g., in a step subsequent to addition of partition tags and pooling. Sample tags can facilitate pooling material generated from multiple samples for sequencing in a single sequencing run.
- partition tags may be correlated to the sample as well as the partition.
- a first tag can indicate a first partition of a first sample; a second tag can indicate a second partition of the first sample; a third tag can indicate a first partition of a second sample; and a fourth tag can indicate a second partition of the second sample.
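The four-tag example above amounts to a lookup from tag to (sample, partition). A minimal Python sketch, assuming hypothetical tag names (`tag_1` to `tag_4`), a hypothetical `(tag, sequence)` read representation, and a hypothetical function name `sort_reads_by_source`:

```python
from collections import defaultdict

# Hypothetical tag names; each tag identifies both the sample and the partition.
TAG_TO_SOURCE = {
    "tag_1": ("sample_1", "partition_1"),
    "tag_2": ("sample_1", "partition_2"),
    "tag_3": ("sample_2", "partition_1"),
    "tag_4": ("sample_2", "partition_2"),
}


def sort_reads_by_source(tagged_reads):
    """Group pooled reads by the (sample, partition) their tag encodes.

    tagged_reads: iterable of (tag, sequence) tuples.
    Reads carrying an unrecognized tag are skipped.
    """
    grouped = defaultdict(list)
    for tag, sequence in tagged_reads:
        source = TAG_TO_SOURCE.get(tag)
        if source is None:
            continue
        grouped[source].append(sequence)
    return dict(grouped)
```

Because the tag alone recovers both identifiers, material from several samples and partitions can share one sequencing run and still be separated afterwards.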
- Although tags may be attached to molecules already partitioned based on one or more characteristics, the final tagged molecules in the library may no longer possess that characteristic. For example, while single stranded DNA molecules may be partitioned and tagged, the final tagged molecules in the library are likely to be double stranded.
- tagged molecules derived from these molecules may be unmethylated. Accordingly, the tag attached to a molecule in the library can indicate the characteristic of the “parent molecule” from which the ultimate tagged molecule is derived, not necessarily the characteristic of the tagged molecule itself.
- barcodes 1, 2, 3, 4, etc. are used to tag and label molecules in the first partition; barcodes A, B, C, D, etc. are used to tag and label molecules in the second partition; and barcodes a, b, c, d, etc. are used to tag and label molecules in the third partition.
- Differentially tagged partitions can be pooled prior to sequencing.
- Differentially tagged partitions can be separately sequenced or sequenced together concurrently, e.g., in the same flow cell of an Illumina sequencer.
- analysis of reads can be performed on a partition-by-partition level, as well as a whole DNA population level. Tags are used to sort reads from different partitions. Analysis can include in silico analysis to determine genetic and epigenetic variation (one or more of methylation, chromatin structure, etc.) using sequence information, genomic coordinates, length, coverage, and/or copy number. In some embodiments, higher coverage can correlate with higher nucleosome occupancy in a genomic region while lower coverage can correlate with lower nucleosome occupancy or a nucleosome depleted region (NDR).
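The coverage observation above can be illustrated with a small Python sketch that flags windows whose coverage is low relative to the median. The function name `flag_low_coverage_windows`, the per-window coverage representation, and the 0.5 threshold are assumptions made for illustration; the text states only that lower coverage can correlate with lower nucleosome occupancy, not how a region would be called.

```python
from statistics import median


def flag_low_coverage_windows(window_coverage, fraction=0.5):
    """Return one boolean per window: True if coverage is below
    fraction * median coverage (a candidate low-occupancy window).

    The 0.5 default is a hypothetical threshold, not a value from the text.
    """
    med = median(window_coverage)
    return [cov < fraction * med for cov in window_coverage]
```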
- Disclosed methods herein comprise performing a sequencing by synthesis reaction on a converted DNA molecule.
- the disclosed methods can comprise partitioning DNA.
- different forms of DNA e.g., hypermethylated and hypomethylated DNA
- This approach can be used to determine, for example, whether certain sequences are hypermethylated or hypomethylated.
- a first subsample or aliquot of a sample is subjected to steps for making capture probes as described elsewhere herein and a second subsample or aliquot of a sample is subjected to partitioning.
- Partitioning nucleic acid molecules in a sample can increase a rare signal, e.g., by enriching rare nucleic acid molecules that are more prevalent in one partition of the sample. For example, a genetic variation present in hypermethylated DNA but less (or not) present in hypomethylated DNA can be more easily detected by partitioning a sample into hypermethylated and hypomethylated nucleic acid molecules. By analyzing multiple partitions of a sample, a multi-dimensional analysis of a single molecule can be performed and hence, greater sensitivity can be achieved.
- Partitioning may include physically partitioning nucleic acid molecules into partitions or subsamples based on the presence or absence of one or more methylated nucleobases.
- a sample may be partitioned into partitions or subsamples based on a characteristic that is indicative of differential gene expression or a disease state.
- a sample may be partitioned based on a characteristic, or combination thereof, that provides a difference in signal between a normal and diseased state during analysis of nucleic acids, e.g., cell free DNA (cfDNA), non-cfDNA, tumor DNA, circulating tumor DNA (ctDNA), and cell free nucleic acids (cfNA).
- hypermethylation and/or hypomethylation variable epigenetic target regions are analyzed to determine whether they show differential methylation characteristic of tumor cells or cells of a type that does not normally contribute to the DNA sample being analyzed (such as cfDNA), and/or particular cell types, such as immune cell types.
- heterogeneous DNA in a sample is partitioned into two or more partitions (e.g., at least 3, 4, 5, 6 or 7 partitions). In some embodiments, each partition is differentially tagged. Tagged partitions can then be pooled together for collective sample prep and/or sequencing.
- the partitioning-tagging-pooling steps can occur more than once, with each round of partitioning occurring based on a different characteristic (examples provided herein), and tagged using differential tags that are distinguished from other partitions and partitioning means. In other instances, the differentially tagged partitions are separately sequenced.
- the agents used to partition populations of nucleic acids within a sample can be affinity agents, such as antibodies with the desired specificity, natural binding partners or variants thereof (Bock et al., Nat Biotech 28: 1106-1114 (2010); Song et al., Nat Biotech 29: 68-72 (2011)), or artificial peptides selected e.g., by phage display to have specificity to a given target.
- the agent used in the partitioning is an agent that recognizes a modified nucleobase.
- partitioning agents include antibodies, such as antibodies that recognize a modified nucleobase, which may be a modified cytosine, such as a methylcytosine (e.g., 5-methylcytosine).
- the partitioning agent is an antibody that recognizes a modified cytosine other than 5-methylcytosine, such as 5-carboxylcytosine (5caC).
- partitioning agents include methyl binding domains (MBDs) and methyl binding proteins (MBPs) as described herein, including proteins such as MeCP2.
- Additional, non-limiting examples of partitioning agents are histone binding proteins which can separate nucleic acids bound to histones from free or unbound nucleic acids. Examples of histone binding proteins that can be used in the methods disclosed herein include RBBP4, RbAp48 and SANT domain peptides.
- partitioning can comprise both binary partitioning and partitioning based on degree/level of modifications.
- methylated fragments can be partitioned by methylated DNA immunoprecipitation (MeDIP), or all methylated fragments can be partitioned from unmethylated fragments using methyl binding domain proteins (e.g., MethylMiner Methylated DNA Enrichment Kit (ThermoFisher Scientific)). Subsequently, additional partitioning may involve eluting fragments having different levels of methylation by adjusting the salt concentration in a solution with the methyl binding domain and bound fragments. As salt concentration increases, fragments having greater methylation levels are eluted.
- Analyzing DNA may comprise detecting or quantifying DNA of interest.
- Analyzing DNA can comprise detecting genetic variants and/or epigenetic features (e.g., DNA methylation and/or DNA fragmentation).
- methylation levels can be determined using partitioning, modification-sensitive conversion such as bisulfite conversion, direct detection during sequencing, methylation-sensitive restriction enzyme digestion, methylation-dependent restriction enzyme digestion, or any other suitable approach.
- different forms of DNA e.g., hypermethylated and hypomethylated DNA
- a methylated DNA binding protein e.g., an MBD such as MBD2, MBD4, or MeCP2
- an antibody specific for 5-methylcytosine as in MeDIP
- DNA fragmentation pattern can be determined based on endpoints and/or centerpoints of DNA molecules, such as cfDNA molecules.
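A minimal Python sketch of tallying fragment endpoints and centerpoints follows. It assumes fragments are given as 0-based, half-open `(start, end)` alignment coordinates on a single reference sequence; that representation and the function name `fragmentation_profile` are hypothetical choices for this illustration.

```python
from collections import Counter


def fragmentation_profile(fragments):
    """Count fragment endpoints and centerpoints per genomic position.

    fragments: iterable of (start, end) with 0-based, half-open coordinates,
    so the last covered base is at position end - 1.
    """
    endpoints = Counter()
    centerpoints = Counter()
    for start, end in fragments:
        last = end - 1
        endpoints[start] += 1
        endpoints[last] += 1
        # Integer midpoint of the first and last covered positions.
        centerpoints[(start + last) // 2] += 1
    return endpoints, centerpoints
```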
- the final partitions are enriched in nucleic acids having different extents of modifications (overrepresentative or underrepresentative of modifications).
- Overrepresentation and underrepresentation can be defined by the number of modifications borne by a nucleic acid relative to the median number of modifications per strand in a population. For example, if the median number of 5-methylcytosine residues in nucleic acid in a sample is 2, a nucleic acid including more than two 5-methylcytosine residues is overrepresented in this modification and a nucleic acid with 1 or zero 5-methylcytosine residues is underrepresented.
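The median-based definition above can be written out as a short Python sketch. The function name `classify_by_modification_count` and the label for molecules exactly at the median are assumptions; the text defines only the over- and underrepresented cases.

```python
from statistics import median


def classify_by_modification_count(counts):
    """Label each molecule relative to the population median.

    counts: number of modified residues (e.g., 5-methylcytosines) per molecule.
    Molecules exactly at the median get the hypothetical label 'at median'.
    """
    med = median(counts)
    labels = []
    for count in counts:
        if count > med:
            labels.append("overrepresented")
        elif count < med:
            labels.append("underrepresented")
        else:
            labels.append("at median")
    return labels
```

With a median of 2, as in the example in the text, molecules carrying 3 or more modified residues are labeled overrepresented and those carrying 1 or 0 are labeled underrepresented.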
- the effect of the affinity separation is to enrich for nucleic acids overrepresented in a modification in a bound phase and for nucleic acids underrepresented in a modification in an unbound phase (i.e., in solution). The nucleic acids in the bound phase can be eluted before subsequent processing.
- When using MeDIP or the MethylMiner® Methylated DNA Enrichment Kit (ThermoFisher Scientific), various levels of methylation can be partitioned using sequential elutions. For example, a hypomethylated partition (no methylation) can be separated from a methylated partition by contacting the nucleic acid population with the MBD from the kit, which is attached to magnetic beads. The beads are used to separate out the methylated nucleic acids from the non-methylated nucleic acids.
- a first set of methylated nucleic acids can be eluted at a salt concentration of 160 mM or higher, e.g., at least 150 mM, at least 200 mM, 300 mM, 400 mM, 500 mM, 600 mM, 700 mM, 800 mM, 900 mM, 1000 mM, or 2000 mM.
- magnetic separation is once again used to separate nucleic acids with a higher level of methylation from those with a lower level of methylation.
- nucleic acids bound to an agent used for affinity separation-based partitioning are subjected to a wash step.
- the wash step washes off nucleic acids weakly bound to the affinity agent.
- nucleic acids can be enriched in nucleic acids having the modification to an extent close to the mean or median (i.e., intermediate between nucleic acids remaining bound to the solid phase and nucleic acids not binding to the solid phase on initial contacting of the sample with the agent).
- the affinity separation results in at least two, and sometimes three or more partitions of nucleic acids with different extents of a modification. While the partitions are still separate, the nucleic acids of at least one partition, and usually two or three (or more) partitions are linked to nucleic acid tags, usually provided as components of adapters, with the nucleic acids in different partitions receiving different tags that distinguish members of one partition from another.
- the tags linked to nucleic acid molecules of the same partition can be the same or different from one another. But if different from one another, the tags may have part of their code in common so as to identify the molecules to which they are attached as being of a particular partition.
- the nucleic acid molecules can be partitioned into different partitions based on the nucleic acid molecules that are bound to a specific protein or a fragment thereof and those that are not bound to that specific protein or fragment thereof.
- Nucleic acid molecules can be partitioned based on DNA-protein binding.
- Protein-DNA complexes can be partitioned based on a specific property of a protein. Examples of such properties include various epitopes, modifications (e.g., histone methylation or acetylation) or enzymatic activity. Examples of proteins which may bind to DNA and serve as a basis for fractionation may include, but are not limited to, protein A and protein G. Any suitable method can be used to partition the nucleic acid molecules based on protein bound regions.
- the partitioning comprises contacting the DNA with a methylation sensitive restriction enzyme (MSRE) and/or a methylation dependent restriction enzyme (MDRE).
- MSRE methylation sensitive restriction enzyme
- MDRE methylation dependent restriction enzyme
- the DNA may be partitioned based on size to generate hypermethylated (longest DNA molecules following MSRE treatment and shortest DNA fragments following MDRE treatment), intermediate (intermediate length DNA molecules following MSRE or MDRE treatment), and hypomethylated (shortest DNA molecules following MSRE treatment and longest DNA fragments following MDRE treatment) subsamples.
- the partitioning is performed by contacting the nucleic acids with a methyl binding domain (“MBD”) of a methyl binding protein (“MBP”).
- the nucleic acids are contacted with an entire MBP.
- an MBD binds to 5-methylcytosine (5mC), and an MBP comprises an MBD and is referred to interchangeably herein as a methyl binding protein or a methyl binding domain protein.
- MBD is coupled to paramagnetic beads, such as Dynabeads® M-280 Streptavidin via a biotin linker. Partitioning into fractions with different extents of methylation can be performed by eluting fractions by increasing the NaCl concentration.
- bound DNA is eluted by contacting the antibody or MBD with a protease, such as proteinase K. This may be performed instead of or in addition to elution steps using NaCl as discussed above.
- agents that recognize a modified nucleobase contemplated herein include, but are not limited to: (a) MeCP2 is a protein that preferentially binds to 5-methyl-cytosine over unmodified cytosine. (b) RPL26, PRP8 and the DNA mismatch repair protein MSH6 preferentially bind to 5-hydroxymethyl-cytosine over unmodified cytosine. (c) FOXK1, FOXK2, FOXP1, FOXP4 and FOXI3 preferably bind to 5-formyl-cytosine over unmodified cytosine (Iurlaro et al., Genome Biol. 14: R119 (2013)).
- elution is a function of the number of modifications, such as the number of methylated sites per molecule, with molecules having more methylation eluting under increased salt concentrations.
- elution buffers can be used to elute the DNA into distinct populations based on the extent of methylation. Salt concentration can range from about 100 mM to about 2500 mM NaCl. In one embodiment, the process results in three (3) partitions.
- Molecules are contacted with a solution at a first salt concentration and comprising a molecule comprising an agent that recognizes a modified nucleobase, which molecule can be attached to a capture moiety, such as streptavidin.
- a population of molecules will bind to the agent and a population will remain unbound.
- the unbound population can be separated as a “hypomethylated” population.
- a first partition enriched in hypomethylated form of DNA is that which remains unbound at a low salt concentration, e.g., 100 mM or 160 mM.
- a second partition enriched in intermediate methylated DNA is eluted using an intermediate salt concentration, e.g., between 100 mM and 2000 mM concentration.
- a third partition enriched in hypermethylated form of DNA is eluted using a high salt concentration, e.g., at least about 2000 mM.
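The three-partition scheme above can be summarized as a mapping from the salt concentration at which a fraction is collected to a partition label. The Python sketch below uses the example concentrations from the text (160 mM as the low-salt condition, 2000 mM as the high-salt condition) as defaults; the function name `partition_for_fraction` and the exact handling of the boundaries are assumptions.

```python
def partition_for_fraction(salt_mm, low_mm=160.0, high_mm=2000.0):
    """Assign a fraction to a partition by NaCl concentration (in mM).

    - at or below low_mm: DNA that remains unbound at low salt (hypomethylated)
    - at or above high_mm: DNA eluted at high salt (hypermethylated)
    - in between: DNA eluted at intermediate salt (intermediate)
    """
    if salt_mm <= low_mm:
        return "hypomethylated"
    if salt_mm >= high_mm:
        return "hypermethylated"
    return "intermediate"
```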
- a monoclonal antibody raised against 5-methylcytidine (5mC) is used to purify methylated DNA.
- DNA is denatured, e.g., at 95°C, in order to yield single-stranded DNA fragments.
- Protein G coupled to standard or magnetic beads as well as washes following incubation with the anti-5mC antibody are used to immunoprecipitate DNA bound to the antibody.
- Such DNA may then be eluted.
- Partitions may comprise unprecipitated DNA and one or more partitions eluted from the beads.
- the partitions of DNA are desalted and concentrated in preparation for enzymatic steps of library preparation.
- Sequences that comprise aberrantly high copy numbers may tend to be hypermethylated.
- the DNA contacted with capture probes specific for members of an epigenetic target region set comprising a plurality of target regions that are both type-specific differentially methylated regions and copy number variants comprises at least a portion of a hypermethylated partition.
- the DNA from or comprising at least a portion of the hypermethylated partition may or may not be combined with DNA from or comprising at least a portion of one or more other partitions, such as an intermediate partition or a hypomethylated partition.
- Amplification
- Adapted DNA can be amplified (e.g., by PCR) prior to, or as part of, the modification-sensitive sequencing. For example, in modification-sensitive sequencing procedures which comprise a conversion step, the adapted DNA may be amplified after the conversion step. In modification-sensitive sequencing procedures which involve single molecule sequencing (such as nanopore-based sequencing or SMRT sequencing), there may be no amplification step.
- Amplification is typically primed by primers binding to primer binding sites in adapters flanking a DNA molecule to be amplified.
- Amplification methods can involve cycles of denaturation, annealing and extension, resulting from thermocycling or can be isothermal as in transcription-mediated amplification.
- Other amplification methods include the ligase chain reaction, strand displacement amplification, nucleic acid sequence based amplification, and self-sustained sequence based replication.
- the present methods perform dsDNA ligations with T-tailed and C-tailed adapters.
- the addition of C-tailed adapters can increase ligation efficiency because the A-tailing reaction can also add G-tails to a small portion of the DNA molecules, when the A tailing is performed in the presence of dGTP, such as when the A-tailing is performed in the same reaction as the end repair.
- the methods herein comprise preparing one or more pools comprising tagged DNA from a plurality of partitioned subsamples.
- a pool comprises at least a portion of the DNA of a hypomethylated partition and at least a portion of the DNA of a hypermethylated partition.
- Target regions, e.g., including epigenetic target regions and/or sequence-variable target regions, may be captured from a pool.
- the methods comprise capturing at least a first set of target regions from the first pool, wherein the first set comprises sequence-variable target regions.
- a step of amplifying DNA in the first pool may be performed before this capture step.
- capturing the first set of target regions from the first pool comprises contacting the DNA of the first pool with a first set of target-specific probes, wherein the first set of target- specific probes comprises target-binding probes specific for the sequence-variable target regions.
- the methods comprise capturing a second plurality of sets of target regions from the second pool, wherein the second plurality comprises sequence-variable target regions and epigenetic target regions. A step of amplifying DNA in the second pool may be performed before this capture step.
- capturing the second plurality of sets of target regions from the second pool comprises contacting the DNA of the second pool with a second set of target-specific probes, wherein the second set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions and target-binding probes specific for the epigenetic target regions.
- sequence-variable target regions are captured from a second portion of a partitioned subsample. The second portion may include some, a majority, substantially all, or all of the DNA of the subsample that was not included in the pool. The regions captured from the pool and from the subsample may be combined and analyzed in parallel.
- sequence-variable target regions and epigenetic target regions may be captured to a lesser extent than one or more of the sequence-variable target regions are captured from the hypermethylated and hypomethylated partitions and/or to a lesser extent than epigenetic target regions are captured from a hypermethylated partition.
- sequence-variable target regions can be captured from a portion of a hypomethylated partition that is not pooled with a hypermethylated partition, and the pool can be prepared with some (e.g., a majority, substantially all, or all) of the DNA from a hypermethylated partition and none or some (e.g., a minority) of the DNA from a hypomethylated partition.
- including a minority of the DNA of a hypomethylated partition in the pool facilitates quantification of one or more epigenetic features (e.g., methylation or other epigenetic feature(s) discussed in detail elsewhere herein), e.g., on a relative basis.
- the pool comprises about 20% of the DNA of a hypomethylated partition.
- the pool comprises a portion of a hypermethylated partition, which may be at least about 50% of the DNA of a hypermethylated partition.
- the pool may comprise at least about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the DNA of a hypermethylated partition.
- the pool comprises 50-55%, 55-60%, 60-65%, 65-70%, 70-75%, 75-80%, 80-85%, 85-90%, 90-95%, or 95-100% of the DNA of a hypermethylated partition.
- the second pool comprises all or substantially all of the DNA of a hypermethylated partition.
- a first pool comprises substantially all or all of the DNA of a hypomethylated partition (e.g., wherein a second pool does not comprise DNA of a hypomethylated partition).
- the second pool does not comprise DNA of a hypomethylated partition (e.g., wherein the first pool comprises substantially all or all of the DNA of a hypomethylated partition).
- a second pool comprises a portion of a hypermethylated partition, which may be any of the values and ranges set forth above with respect to a hypomethylated partition.
- DNA molecules in a sample can be subject to a capture step, in which molecules having target sequences are captured for subsequent analysis.
- methods disclosed herein comprise a step of capturing one or more sets of target regions of DNA, such as cfDNA or DNA from a sample comprising cells (such as a blood sample). Capture may be performed using any suitable approach known in the art.
- Target capture can involve use of a bait set comprising oligonucleotide baits labeled with a capture moiety, such as biotin or the other examples noted below.
- the probes can have sequences selected to tile across a panel of regions, such as genes.
- a capturing step occurs prior to a step of sequencing by synthesis and prior to a step of performing a conversion procedure on a DNA molecule (such as on an adapted DNA molecule).
- a capturing step occurs prior to a step of sequencing by synthesis and after a step of performing a conversion procedure on a DNA molecule (such as on an adapted DNA molecule). In some embodiments, a capturing step occurs prior to a step of sequencing by synthesis and prior to a step of tagging a DNA molecule. In some embodiments, a capturing step occurs prior to a step of sequencing by synthesis and after a step of tagging a DNA molecule. In some embodiments, capturing can be performed before a step of ligating adapters (such as Y-shaped adapters as disclosed herein) to DNA molecules, e.g., to facilitate including partition tags in the one or more captured sets of target DNA.
- capturing comprises contacting the DNA to be captured with a set of target-specific probes.
- the set of target-specific probes may have any of the features described herein for sets of target-specific probes, including but not limited to in the embodiments set forth above and the sections relating to probes below. Capturing may be performed on one or more subsamples prepared during methods disclosed herein.
- DNA is captured from at least the first subsample or the second subsample, e.g., at least the first subsample and the second subsample.
- a method described herein comprises contacting cfDNA obtained from a subject with a set of target-specific probes, wherein the set of target-specific probes is configured to capture cfDNA corresponding to the sequence-variable target region set at a greater capture yield than cfDNA corresponding to the epigenetic target region set.
- the volume of data needed to determine fragmentation patterns (e.g., to test for perturbation of transcription start sites or CTCF binding sites) or fragment abundance (e.g., in hypermethylated and hypomethylated partitions) is generally less than the volume of data needed to determine the presence or absence of cancer-related sequence mutations.
- the methods further comprise sequencing the captured cfDNA, e.g., to different degrees of sequencing depth for the epigenetic and sequence-variable target region sets, consistent with the discussion herein.
- complexes of target-specific probes and DNA are separated from DNA not bound to target-specific probes. For example, where target-specific probes are bound covalently or noncovalently to a solid support, a washing or aspiration step can be used to separate unbound material.
- the concentration of the probes for the sequence-variable target region set is greater than the concentration of the probes for the epigenetic target region set.
- the capturing step is performed with the sequence-variable target region probe set in a first vessel and with the epigenetic target region probe set in a second vessel, or the contacting step is performed with the sequence-variable target region probe set at a first time and a first vessel and the epigenetic target region probe set at a second time before or after the first time.
- compositions comprising captured DNA corresponding to the sequence-variable target region set and captured DNA corresponding to the epigenetic target region set.
- the compositions can be processed separately as desired (e.g., to fractionate based on methylation as described elsewhere herein) and recombined in appropriate proportions to provide material for further processing and analysis such as sequencing.
- a captured set of DNA (e.g., cfDNA)
- the captured set of DNA may be provided, e.g., by performing a capturing step prior to a sequencing step as described herein.
- the captured set may comprise DNA corresponding to a sequence-variable target region set, an epigenetic target region set, or a combination thereof.
- a capture step is performed prior to a conversion step or after a conversion step.
- a first target region set is captured (e.g., from a sample or a first subsample), comprising at least epigenetic target regions.
- the epigenetic target regions captured from the first subsample may comprise hypermethylation variable target regions.
- the hypermethylation variable target regions are CpG-containing regions that are unmethylated or have low methylation in cfDNA from healthy subjects (e.g., below-average methylation relative to bulk cfDNA).
- the hypermethylation variable target regions are regions that show lower methylation in healthy cfDNA than in at least one other tissue type.
- cancer cells may shed more DNA into the bloodstream than healthy cells of the same tissue type.
- the distribution of tissue of origin of cfDNA may change upon carcinogenesis.
- an increase in the level of hypermethylation variable target regions in the first subsample can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer.
- a second target region set is captured from the second subsample, comprising at least epigenetic target regions.
- the epigenetic target regions may comprise hypomethylation variable target regions.
- the hypomethylation variable target regions are CpG-containing regions that are methylated or have high methylation in cfDNA from healthy subjects (e.g., above-average methylation relative to bulk cfDNA).
- the hypomethylation variable target regions are regions that show higher methylation in healthy cfDNA than in at least one other tissue type.
- cancer cells may shed more DNA into the bloodstream than healthy cells of the same tissue type. As such, the distribution of tissue of origin of cfDNA may change upon carcinogenesis.
- an increase in the level of hypomethylation variable target regions in the second subsample can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer.
- the quantity of captured sequence-variable target region DNA is greater than the quantity of the captured epigenetic target region DNA, when normalized for the difference in the size of the targeted regions (footprint size).
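The footprint normalization described above can be illustrated with a minimal sketch. It assumes that captured quantities are available as molecule counts and that footprints are given in base pairs; the function name and the example numbers are hypothetical and are not taken from the disclosure.

```python
def footprint_normalized_ratio(seq_var_quantity, seq_var_footprint_bp,
                               epi_quantity, epi_footprint_bp):
    """Return captured DNA per targeted base pair for the sequence-variable
    set divided by the same density for the epigenetic set."""
    for value in (seq_var_footprint_bp, epi_footprint_bp, epi_quantity):
        if value <= 0:
            raise ValueError("footprints and the epigenetic quantity must be positive")
    seq_var_density = seq_var_quantity / seq_var_footprint_bp
    epi_density = epi_quantity / epi_footprint_bp
    return seq_var_density / epi_density


# Hypothetical inputs: 60,000 molecules over a 50 kb footprint versus
# 300,000 molecules over a 500 kb footprint.
ratio = footprint_normalized_ratio(60_000, 50_000, 300_000, 500_000)
```

In this hypothetical case the raw epigenetic quantity is larger, yet the normalized ratio exceeds 1 because the sequence-variable footprint is ten times smaller.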
- first and second captured sets may be provided, comprising, respectively, DNA corresponding to a sequence-variable target region set and DNA corresponding to an epigenetic target region set. The first and second captured sets may be combined to provide a combined captured set.
- the DNA corresponding to the sequence-variable target region set may be present at a greater concentration than the DNA corresponding to the epigenetic target region set, e.g., a 1.1- to 1.2-fold greater concentration, a 1.2- to 1.4-fold greater concentration, a 1.4- to 1.6-fold greater concentration, a 1.6- to 1.8-fold greater concentration, a 1.8- to 2.0-fold greater concentration, a 2.0- to 2.2-fold greater concentration, a 2.2- to 2.4-fold greater concentration, a 2.4- to 2.6-fold greater concentration, a 2.6- to 2.8-fold greater concentration, a 2.8- to 3.0-fold greater concentration, a 3.0- to 3.5-fold greater concentration, a 3.5- to 4.0-fold greater concentration, a 4.0- to 4.5-fold greater concentration, a 4.5- to 5.0
- the DNA that is captured comprises intronic regions.
- the intronic regions comprise one or more introns likely to differentiate DNA from neoplastic (e.g., tumor or cancer) cells and from healthy cells, e.g., non-neoplastic circulating cells.
- an intron comprising a rearrangement known to be present in some neoplastic cells and absent from healthy cells can be used to differentiate DNA from neoplastic (e.g., tumor or cancer) cells and from healthy cells.
- the rearrangement is a translocation.
- captured intronic regions have a footprint of at least 30 bp, e.g., at least 100 bp, at least 200 bp, at least 500 bp, at least 1 kb, at least 2 kb, at least 5 kb, at least 10 kb, at least 20 kb, at least 50 kb, at least 200 kb, at least 300 kb, or at least 400 kb.
- the intronic target region set has a footprint in the range of 30 bp-1000 kb, e.g., 30 bp-100 bp, 100 bp-200 bp, 200 bp-500 bp, 500 bp-1 kb, 1 kb-2 kb, 2 kb-5 kb, 5 kb-10 kb, 10 kb-20 kb, 20 kb-50 kb, 50 kb-100 kb, 100-200 kb, 200-300 kb, 300-400 kb, 400-500 kb, 500-600 kb, 600-700 kb, 700-800 kb, 800-900 kb, or 900-1,000 kb.
- Exemplary rearrangements, such as intronic translocations that can be detected using the methods described herein include but are not limited to translocations wherein at least one of the two genes involved in the translocation is a receptor tyrosine kinase.
- Exemplary translocation products are the BCR-ABL fusion and fusions comprising any of ALK, FGFR2, FGFR3, NTRK1, RET, or ROS1.
- the DNA that is captured comprises target regions having a type-specific epigenetic variation.
- an epigenetic target region set consists of target regions having a type-specific epigenetic variation.
- nucleic acids captured or enriched using a method described herein comprise captured DNA, such as one or more captured sets of DNA.
- the captured DNA comprises target regions that are differentially methylated in different cell types (such as different immune cell types).
- a captured epigenetic target region set captured from a sample or first subsample comprises hypermethylation variable target regions.
- the hypermethylation variable target regions are differentially or exclusively hypermethylated in one or more related cell or tissue types.
- the hypermethylation variable target regions are differentially or exclusively hypermethylated in one cell type (such as in one cancer cell type and/or in one immune cell type).
- the hypermethylation variable target regions are hypermethylated to an extent that is distinguishably higher or exclusively present in one cell type (such as in one cancer cell type and/or in one immune cell type). Such hypermethylation variable target regions may be hypermethylated in other cell or tissue types but not to the extent observed in the one or more related cell or tissue types. In some embodiments, the hypermethylation variable target regions show lower methylation in healthy cfDNA than in at least one other tissue type. In some embodiments, the hypermethylation variable target regions show even higher methylation in cfDNA from a diseased cell of the one or more related cell or tissue types. In some embodiments, target regions comprise hypermethylated regions with aberrantly high copy number.
- the target regions are hypermethylated in healthy and diseased colon tissue and have aberrantly high copy number in pre-cancerous or cancerous colon tissue. Examples of such target regions are shown in Table 1 below.
- a gene is considered to comprise a DMR when the DMR is located within an untranslated region (UTR), intron, or exon of the gene, or within 5000 nucleotides of either the 5’ end of the sense strand of the 5’ UTR or the 3’ end of the sense strand of the 3’ UTR.
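The rule above for deciding whether a gene comprises a DMR can be sketched as an interval test. This is a minimal illustration under stated assumptions: the gene interval is taken to span from the 5' end of the 5' UTR to the 3' end of the 3' UTR, coordinates are 0-based and half-open, and any overlap with the flanked interval counts as "located within". The names `Interval` and `gene_comprises_dmr` are hypothetical.

```python
from dataclasses import dataclass

FLANK_NT = 5000  # flank on either side of the gene, per the rule above


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def gene_comprises_dmr(gene, dmr, flank=FLANK_NT):
    """True if the DMR overlaps the gene body or lies within `flank`
    nucleotides of either end of the gene (assumption: overlap suffices)."""
    if gene.chrom != dmr.chrom:
        return False
    # The flank is symmetric, so the test does not depend on strand.
    return dmr.start < gene.end + flank and dmr.end > gene.start - flank


gene = Interval("chr1", 10_000, 20_000)
near_dmr = Interval("chr1", 24_000, 24_500)
far_dmr = Interval("chr1", 25_000, 25_100)
```

Because the same flank applies at both ends, the sense strand only determines which end is called 5' or 3'; it does not change the outcome of the test.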
- Table 1. Hypermethylated target regions with aberrantly high copy number in colon cancer or pre-cancer (columns: Chromosomal region; Genes comprising DMRs within the chromosomal region)
- Exemplary Hypermethylation Target Regions based on Lung Cancer studies (columns: Gene Name; Chromosome): DKK3, chr11; LKB1, chr11
- a captured epigenetic target region set captured from a sample or subsample comprises hypomethylation variable target regions.
- the hypomethylation variable target regions are exclusively hypomethylated in one or more related cell or tissue types. In some embodiments, the hypomethylation variable target regions are exclusively hypomethylated in one cell type (such as in one cancer cell type and/or in one immune cell type). In some embodiments, the hypomethylation variable target regions are hypomethylated to an extent that is exclusively present in one cell type (such as in one cancer cell type and/or in one immune cell type). Such hypomethylation variable target regions may be hypomethylated in other cell or tissue types but not to the extent observed in the one or more cell or tissue types. In some embodiments, the hypomethylation variable target regions show higher methylation in healthy cfDNA than in at least one other tissue type.
- proliferating or activated immune cells and/or dying cancer cells may shed more DNA into the bloodstream than immune cells in a healthy individual and/or healthy cells of the same tissue type, respectively.
- the distribution of cell type and/or tissue of origin of cfDNA may change upon carcinogenesis.
- the presence and/or levels of cfDNA originating from certain cell or tissue types can be an indicator of disease. Variations in hypermethylation and/or hypomethylation can be an indicator of disease.
- an increase in the level of hypermethylation variable target regions and/or hypomethylation variable target regions in a subsample following a partitioning step can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer.
- Exemplary hypermethylation variable target regions and hypomethylation variable target regions useful for distinguishing between various cell types have been identified by analyzing DNA obtained from various cell types via whole genome bisulfite sequencing, as described, e.g., in Scott, C.A., Duryea, J.D., MacKay, H. et al., “Identification of cell type-specific methylation signals in bulk whole genome bisulfite sequencing data,” Genome Biol 21, 156 (2020) (doi.org/10.1186/s13059-020-02065-5).
- first and second captured target region sets comprise, respectively, DNA corresponding to a sequence-variable target region set and DNA corresponding to an epigenetic target region set, for example, as described in WO 2020/160414.
- the first and second captured sets may be combined to provide a combined captured set.
- the sequence-variable target region set and epigenetic target region set may have any of the features described for such sets in WO 2020/160414, which is incorporated by reference herein in its entirety.
- the epigenetic target region set comprises a hypermethylation variable target region set.
- the epigenetic target region set comprises a hypomethylation variable target region set. In some embodiments, the epigenetic target region set comprises CTCF binding regions. In some embodiments, the epigenetic target region set comprises fragmentation variable target regions. In some embodiments, the epigenetic target region set comprises transcriptional start sites. In some embodiments, the epigenetic target region set comprises regions that may show focal amplifications in cancer, e.g., one or more of AR, BRAF, CCND1, CCND2, CCNE1, CDK4, CDK6, EGFR, ERBB2, FGFR1, FGFR2, KIT, KRAS, MET, MYC, PDGFRA, PIK3CA, and RAF1.
- the epigenetic target region set comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 of the foregoing targets.
- the sequence-variable target region set comprises a plurality of regions known to undergo somatic mutations in cancer.
- the sequence-variable target region set targets a plurality of different genes or genomic regions (“panel”) selected such that a determined proportion of subjects having a cancer exhibits a genetic variant or tumor marker in one or more different genes or genomic regions in the panel.
- the panel may be selected to limit a region for sequencing to a fixed number of base pairs.
- Probes for detecting the panel of regions can include those for detecting genomic regions of interest (hotspot regions).
- a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, or 70 of the genes of Table 3 of WO 2020/160414.
- a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, or 73 of the genes of Table 4 of WO 2020/160414.
- AKT1, ALK, BRAF, CCND1, CDK2A, CTNNB1, EGFR, ERBB2, ESR1, FGFR1, FGFR2, FGFR3, FOXL2, GATA3, GNA11, GNAQ, GNAS, HRAS, IDH1, IDH2, KIT, KRAS, MED12, MET, MYC, NFE2L2, NRAS, PDGFRA, PIK3CA, PPP2R1A, PTEN, RET, STK11, TP53, and U2AF1.
- the capture yield of the target-binding probes specific for the sequence-variable target region set is higher (e.g., at least 2-fold higher) than the capture yield of the target-binding probes specific for the epigenetic target region set.
- the collection of capture probes is configured to have a capture yield specific for the sequence-variable target region set higher (e.g., at least 2-fold higher) than its capture yield specific for the epigenetic target region set.
- the capture yield of the target-binding probes specific for the sequence-variable target region set is at least 1.25-, 1.5-, 1.75-, 2-, 2.25-, 2.5-, 2.75-, 3-, 3.5-, 4-, 4.5-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13-, 14-, or 15-fold higher than the capture yield of the target-binding probes specific for the epigenetic target region set.
- the capture yield of the target-binding probes specific for the sequence-variable target region set is 1.25- to 1.5-, 1.5- to 1.75-, 1.75- to 2-, 2- to 2.25-, 2.25- to 2.5-, 2.5- to 2.75-, 2.75- to 3-, 3- to 3.5-, 3.5- to 4-, 4- to 4.5-, 4.5- to 5-, 5- to 5.5-, 5.5- to 6-, 6- to 7-, 7- to 8-, 8- to 9-, 9- to 10-, 10- to 11-, 11- to 12-, 13- to 14-, or 14- to 15-fold higher than the capture yield of the target-binding probes specific for the epigenetic target region set.
- the collection of capture probes is configured to have a capture yield specific for the sequence-variable target region set at least 1.25-, 1.5-, 1.75-, 2-, 2.25-, 2.5-, 2.75-, 3-, 3.5-, 4-, 4.5-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13-, 14-, or 15-fold higher than its capture yield for the epigenetic target region set.
- the collection of capture probes is configured to have a capture yield specific for the sequence-variable target region set that is 1.25- to 1.5-, 1.5- to 1.75-, 1.75- to 2-, 2- to 2.25-, 2.25- to 2.5-, 2.5- to 2.75-, 2.75- to 3-, 3- to 3.5-,
- concentration may refer to the average mass per volume concentration of individual probes in each set.
- the capture probes specific for the sequence-variable target region set have a higher affinity for their targets than the capture probes specific for the epigenetic target region set. Affinity can be modulated in any way known to those skilled in the art, including by using different probe chemistries. For example, certain nucleotide modifications, such as cytosine 5-methylation (in certain sequence contexts), modifications that provide a heteroatom at the 2’ sugar position, and LNA nucleotides, can increase stability of double-stranded nucleic acids, indicating that oligonucleotides with such modifications have relatively higher affinity for their complementary sequences.
- nucleotide modifications, such as the substitution of the nucleobase hypoxanthine for guanine, reduce affinity by reducing the amount of hydrogen bonding between the oligonucleotide and its complementary sequence.
- the capture probes specific for the sequence-variable target region set have modifications that increase their affinity for their targets.
- the capture probes are linked to a solid support, e.g., covalently or non-covalently such as through the interaction of a binding pair of capture moieties.
- the solid support is a bead, such as a magnetic bead.
- the capture probes specific for the sequence-variable target region set and/or the capture probes specific for the epigenetic target region set are a capture probe set as discussed above, e.g., probes comprising capture moieties and sequences selected to tile across a panel of regions, such as genes.
- the capture probes are provided in a single composition.
- the single composition may be a solution (liquid or frozen).
- the capture probes may be provided as a plurality of compositions, e.g., comprising a first composition comprising probes specific for the epigenetic target region set and a second composition comprising probes specific for the sequence-variable target region set. These probes may be mixed in appropriate proportions to provide a combined probe composition with any of the foregoing fold differences in concentration and/or capture yield. Alternatively, they may be used in separate capture procedures (e.g., with aliquots of a sample or sequentially with the same sample) to provide first and second compositions comprising captured epigenetic target regions and sequence-variable target regions, respectively.
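Mixing two probe compositions to reach a chosen fold difference in concentration reduces to simple proportion arithmetic. The sketch below assumes each stock composition is characterized by a single average per-probe concentration (in any consistent mass-per-volume unit) and that volumes are additive; the function name and example values are hypothetical.

```python
def mixing_volumes(seq_var_stock_conc, epi_stock_conc, fold_difference, total_volume):
    """Return (seq_var_volume, epi_volume) such that, after mixing, the
    sequence-variable probe concentration is `fold_difference` times the
    epigenetic probe concentration."""
    if min(seq_var_stock_conc, epi_stock_conc, fold_difference, total_volume) <= 0:
        raise ValueError("all inputs must be positive")
    # After mixing, conc_i = stock_conc_i * volume_i / total_volume, so
    # volume_sv / volume_epi = fold_difference * epi_stock_conc / seq_var_stock_conc.
    volume_ratio = fold_difference * epi_stock_conc / seq_var_stock_conc
    epi_volume = total_volume / (1 + volume_ratio)
    seq_var_volume = total_volume - epi_volume
    return seq_var_volume, epi_volume


# Hypothetical example: equal stock concentrations, 2-fold target, 30 uL total.
sv_vol, epi_vol = mixing_volumes(10.0, 10.0, 2.0, 30.0)
```

A fold difference in capture yield need not equal the fold difference in concentration, so a calculation of this kind only addresses the concentration case.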
- the probes for the epigenetic target region set may comprise probes specific for one or more types of target regions likely to differentiate DNA from neoplastic (e.g., tumor or cancer) cells from healthy cells, e.g., non-neoplastic circulating cells. Exemplary types of such regions are discussed in detail herein, e.g., in the sections above concerning captured sets.
- the probes for the epigenetic target region set may also comprise probes for one or more control regions, e.g., as described herein.
- the probes for the epigenetic target region set have a footprint of at least 100 kbp, e.g., at least 200 kbp, at least 300 kbp, or at least 400 kbp.
- the epigenetic target region set has a footprint in the range of 100 kbp-20 Mbp, e.g., 100-200 kbp, 200-300 kbp, 300-400 kbp, 400-500 kbp, 500-600 kbp, 600-700 kbp, 700-800 kbp, 800-900 kbp, 900-1,000 kbp, 1-1.5 Mbp, 1.5-2 Mbp, 2-3 Mbp, 3-4 Mbp, 4-5 Mbp, 5-6 Mbp, 6-7 Mbp, 7-8 Mbp, 8-9 Mbp, 9-10 Mbp, or 10-20 Mbp.
- the epigenetic target region set has a footprint of at least 20 Mbp.
- the probes for the epigenetic target region set comprise probes specific for one or more hypermethylation variable target regions.
- Hypermethylation variable target regions may also be referred to herein as hypermethylated DMRs (differentially methylated regions).
- the hypermethylation variable target regions may be any of those set forth above.
- the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 1, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1.
- the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 2, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 2.
- the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 1 or Table 2, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1 or Table 2.
- for each locus included as a target region, there may be one or more probes with a hybridization site that binds between the transcription start site and the stop codon (the last stop codon for genes that are alternatively spliced) of the gene.
- the one or more probes bind within 300 bp of the listed position, e.g., within 200 or 100 bp.
- a probe has a hybridization site overlapping the position listed above.
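The distance criteria above (a hybridization site within 300, 200, or 100 bp of a listed position, or overlapping it) can be checked with a short helper. This is a sketch only: it assumes 0-based, half-open coordinates for the hybridization site and measures distance to the nearest covered base; both function names are hypothetical.

```python
def distance_to_site(position, site_start, site_end):
    """Distance in bp from `position` to the nearest base of the
    hybridization site [site_start, site_end); 0 if the site overlaps it."""
    if position < site_start:
        return site_start - position
    if position >= site_end:
        return position - (site_end - 1)
    return 0


def binds_within(position, site_start, site_end, max_distance_bp=300):
    """True if the hybridization site lies within `max_distance_bp` of `position`."""
    return distance_to_site(position, site_start, site_end) <= max_distance_bp
```

For example, `binds_within(1000, 1100, 1220, max_distance_bp=100)` evaluates the strictest of the three thresholds, and a distance of 0 corresponds to a site overlapping the listed position.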
- the probes specific for the hypermethylation target regions include probes specific for one, two, three, four, or five subsets
- the probes for the epigenetic target region set comprise probes specific for one or more hypomethylation variable target regions.
- Hypomethylation variable target regions may also be referred to herein as hypomethylated DMRs (differentially methylated regions).
- the hypomethylation variable target regions may be any of those set forth above.
- the probes specific for one or more hypomethylation variable target regions may include probes for regions such as repeated elements, e.g., LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and satellite DNA, and intergenic regions, which are ordinarily methylated in healthy cells but may show reduced methylation in tumor cells.
- probes specific for hypomethylation variable target regions include probes specific for repeated elements and/or intergenic regions.
- probes specific for repeated elements include probes specific for one, two, three, four, or five of LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and/or satellite DNA.
- Exemplary probes specific for genomic regions that show cancer-associated hypomethylation include probes specific for nucleotides 8403565-8953708 and/or 151104701-151106035 of human chromosome 1.
- the probes specific for hypomethylation variable target regions include probes specific for regions overlapping or comprising nucleotides 8403565-8953708 and/or 151104701-151106035 of human chromosome 1.
- CTCF binding regions
- the probes for the epigenetic target region set include probes specific for CTCF binding regions.
- the probes specific for CTCF binding regions comprise probes specific for at least 10, 20, 50, 100, 200, or 500 CTCF binding regions, or 10-20, 20-50, 50-100, 100-200, 200-500, or 500-1000 CTCF binding regions, e.g., CTCF binding regions described above or in one or more of CTCFBSDB or the Cuddapah et al., Martin et al., or Rhee et al. articles cited above.
- the probes for the epigenetic target region set comprise at least 100 bp, at least 200 bp, at least 300 bp, at least 400
- the probes for the epigenetic target region set include probes specific for transcriptional start sites.
- the probes specific for transcriptional start sites comprise probes specific for at least 10, 20, 50, 100, 200, or 500 transcriptional start sites, or 10-20, 20-50, 50-100, 100-200, 200-500, or 500-1000 transcriptional start sites, e.g., transcriptional start sites listed in DBTSS.
- the probes for the epigenetic target region set comprise probes for sequences at least 100 bp, at least 200 bp, at least 300 bp, at least 400 bp, at least 500 bp, at least 750 bp, or at least 1000 bp upstream and downstream of the transcriptional start sites.
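Target windows extending a fixed distance upstream and downstream of a transcriptional start site can be generated as follows. The sketch assumes 0-based coordinates, a half-open output interval, and clamping at the start of the chromosome; `tss_window` and the default flank of 500 bp are illustrative choices, not values prescribed by the text.

```python
def tss_window(chrom, tss, flank_bp=500):
    """Return (chrom, start, end) covering `flank_bp` on each side of the
    TSS, as a half-open interval clamped at position 0."""
    if flank_bp < 0:
        raise ValueError("flank_bp must be non-negative")
    start = max(0, tss - flank_bp)
    end = tss + flank_bp + 1  # +1 so the TSS base and the full downstream flank are included
    return (chrom, start, end)


# Hypothetical TSS positions.
windows = [tss_window("chr1", 1000), tss_window("chr1", 100)]
```

Because the window is symmetric about the TSS, the strand of the gene does not affect the interval; it only determines which side is called upstream.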
- Focal amplifications
- As noted above, although focal amplifications are somatic mutations, they can be detected by sequencing based on read frequency in a manner analogous to approaches for detecting certain epigenetic changes such as changes in methylation. As such, regions that may show focal amplifications in cancer can be included in the epigenetic target region set, as discussed above.
- the probes specific for the epigenetic target region set include probes specific for focal amplifications.
- the probes specific for focal amplifications include probes specific for one or more of AR, BRAF, CCND1, CCND2, CCNE1, CDK4, CDK6, EGFR, ERBB2, FGFR1, FGFR2, KIT, KRAS, MET, MYC, PDGFRA, PIK3CA, and RAF1.
- the probes specific for focal amplifications include probes specific for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 of the foregoing targets.
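Detection of focal amplifications based on read frequency can be illustrated with a simplified depth-ratio calculation. This sketch is not the method of the disclosure: it assumes per-region read counts for a test sample and for a reference, normalizes each by its total, and reports the ratio. The function name, the region labels `ctrl1` and `ctrl2`, and all counts are hypothetical.

```python
def normalized_depth_ratio(sample_counts, reference_counts, region):
    """Ratio of the region's share of reads in the sample to its share
    of reads in the reference; values well above 1 suggest a gain."""
    sample_total = sum(sample_counts.values())
    reference_total = sum(reference_counts.values())
    if sample_total == 0 or reference_total == 0 or reference_counts[region] == 0:
        raise ValueError("totals and the reference count for the region must be non-zero")
    sample_fraction = sample_counts[region] / sample_total
    reference_fraction = reference_counts[region] / reference_total
    return sample_fraction / reference_fraction


# Hypothetical read counts per targeted region.
sample = {"ERBB2": 300, "ctrl1": 100, "ctrl2": 100}
reference = {"ERBB2": 100, "ctrl1": 100, "ctrl2": 100}
erbb2_ratio = normalized_depth_ratio(sample, reference, "ERBB2")
```

Any threshold applied to such a ratio would need to be calibrated empirically; none is implied here.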
- Control regions
- It can be useful to include control regions to facilitate data validation.
- the probes specific for the epigenetic target region set include probes specific for control methylated regions that are expected to be methylated in essentially all samples.
- the probes specific for the epigenetic target region set include probes specific for control hypomethylated regions that are expected to be hypomethylated in essentially all samples.
- Probes specific for sequence-variable target regions
- the probes for the sequence-variable target region set may comprise probes specific for a plurality of regions known to undergo somatic mutations in cancer.
- the probes may be specific for any sequence-variable target region set described herein. Exemplary sequence-variable target region sets are discussed in detail herein, e.g., in the sections above concerning captured sets.
- the sequence-variable target region probe set has a footprint of at least 0.5 kb, e.g., at least 1 kb, at least 2 kb, at least 5 kb, at least 10 kb, at least 20 kb, at least 30 kb, or at least 40 kb.
- the sequence-variable target region probe set has a footprint in the range of 0.5-100 kb, e.g., 0.5-2 kb, 2-10 kb, 10-20 kb, 20-30 kb, 30-40 kb, 40-50 kb, 50-60 kb, 60-70 kb, 70-80 kb, 80-90 kb, or 90-100 kb.
- the sequence-variable target region probe set has a footprint of at least 50 kbp, e.g., at least 100 kbp, at least 200 kbp, at least 300 kbp, or at least 400 kbp.
- the sequence-variable target region probe set has a footprint in the range of 100-2000 kbp, e.g., 100-200 kbp, 200-300 kbp, 300-400 kbp, 400-500 kbp, 500-600 kbp, 600-700 kbp, 700-800 kbp, 800-900 kbp, 900-1,000 kbp, 1-1.5 Mbp or 1.5-2 Mbp. In some embodiments, the sequence-variable target region set has a footprint of at least 2 Mbp.
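A footprint of the kind quoted above can be computed as the number of distinct base pairs covered by the targeted regions. The sketch below assumes regions are given as (chromosome, start, end) tuples with half-open coordinates and that overlapping or abutting regions are merged before counting. The two KIT intervals on chr4 are used purely as illustrative input; treating them as half-open is an assumption, chosen because end minus start then matches the values 140 and 135 that accompany them in this document.

```python
def footprint_bp(regions):
    """Count distinct base pairs covered by (chrom, start, end) regions
    with half-open coordinates; overlapping regions are counted once."""
    total = 0
    current = None  # (chrom, start, end) of the interval being merged
    for chrom, start, end in sorted(regions):
        if current is not None and chrom == current[0] and start <= current[2]:
            # Overlaps or abuts the interval being built: extend it.
            current = (chrom, current[1], max(current[2], end))
        else:
            if current is not None:
                total += current[2] - current[1]
            current = (chrom, start, end)
    if current is not None:
        total += current[2] - current[1]
    return total


kit_regions = [("chr4", 55575579, 55575719), ("chr4", 55589739, 55589874)]
kit_footprint = footprint_bp(kit_regions)
```

Merging before summing matters when probes tile a region with overlap, since a naive sum of probe lengths would overstate the footprint.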
- probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, or 70 of the genes of Table 3.
- probes specific for the sequence-variable target region set comprise probes specific for at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, or 70 of the SNVs of Table 3.
- probes specific for the sequence-variable target region set comprise probes specific for at least 1, at least 2, at least 3, at least 4, at least 5, or 6 of the fusions of Table 3. In some embodiments, probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 1, at least 2, or 3 of the indels of Table 3. In some embodiments, probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, or 73 of the genes of Table 4.
- probes specific for the sequence-variable target region set comprise probes specific for at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, or 73 of the SNVs of Table 4.
- probes specific for the sequence-variable target region set comprise probes specific for at least 1, at least 2, at least 3, at least 4, at least 5, or 6 of the fusions of Table 4.
- probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, or 18 of the indels of Table 4.
- probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 of the genes of Table 5.
- [Target-region table residue; recoverable rows: KIT, chr4, 55575579, 55575719, 140, 7; KIT, chr4, 55589739, 55589874, 135, 8; listed total: 12574.]
- probes specific for the sequence-variable target region set comprise probes specific for target regions from at least 10, 20, 30, or 35 cancer-related genes, such as AKT1, ALK, BRAF, CCND1, CDK2A, CTNNB1, EGFR, ERBB2, ESR1, FGFR1, FGFR2, FGFR3, FOXL2, GATA3, GNA11, GNAQ, GNAS, HRAS, IDH1, IDH2, KIT, KRAS, MED12, MET, MYC, NFE2L2, NRAS, PDGFRA, PIK3CA, PPP2R1A, PTEN, RET, STK11, TP53, and U2AF1.
- FIG.2 shows a computer system 201 that is programmed or otherwise configured to implement the methods of the present disclosure.
- the computer system 201 can regulate various aspects of sample preparation, sequencing, and/or analysis.
- the computer system 201 is configured to perform sample preparation and sample analysis, including nucleic acid sequencing, e.g., according to any of the methods disclosed herein.
- the computer system 201 includes a central processing unit (CPU, also "processor" and "computer processor" herein) 205, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
- the computer system 201 also includes memory or memory location 210 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 215 (e.g., hard disk), communication interface 220 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 225, such as cache, other memory, data storage, and/or electronic display adapters.
- the memory 210, storage unit 215, interface 220, and peripheral devices 225 are in communication with the CPU 205 through a communication network or bus (solid lines), such as a motherboard.
- the storage unit 215 can be a data storage unit (or data repository) for storing data.
- the computer system 201 can be operatively coupled to a computer network 230 with the aid of the communication interface 220.
- the computer network 230 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
- the computer network 230 in some cases is a telecommunication and/or data network.
- the computer network 230 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
- the computer network 230, in some cases with the aid of the computer system 201, can implement a peer-to-peer network, which may enable devices coupled to the computer system 201 to behave as a client or a server.
- the CPU 205 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
- the instructions may be stored in a memory location, such as the memory 210. Examples of operations performed by the CPU 205 can include fetch, decode, execute, and writeback.
- the storage unit 215 can store files, such as drivers, libraries, and saved programs.
- the storage unit 215 can store programs generated by users and recorded sessions, as well as output(s) associated with the programs.
- the storage unit 215 can store user data, e.g., user preferences and user programs.
- the computer system 201 in some cases can include one or more additional data storage units that are external to the computer system 201, such as located on a remote server that is in communication with the computer system 201 through an intranet or the Internet.
- the computer system 201 can communicate with one or more remote computer systems through the network 230.
- the computer system 201 can communicate with a remote computer system of a user (e.g., operator).
- examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smartphones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
- the user can access the computer system 201 via the network 230.
- Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 201, such as, for example, on the memory 210 or electronic storage unit 215.
- the machine executable or machine-readable code can be provided in the form of software.
- the code can be executed by the processor 205.
- the code can be retrieved from the storage unit 215 and stored on the memory 210 for ready access by the processor 205.
- the electronic storage unit 215 can be precluded, and machine-executable instructions are stored on memory 210.
- the present disclosure provides a non-transitory computer-readable medium comprising computer-executable instructions which, when executed by at least one electronic processor, perform at least a portion of a method comprising: (a) performing a sequencing by synthesis reaction on a converted DNA molecule with a sequencing by synthesis instrument, the converted DNA molecule comprising: ligated adapters; a converted region comprising one or more nucleobases that have been converted by a conversion procedure; and a resistant region comprising one or more nucleobases that are resistant to the conversion procedure; wherein the sequencing by synthesis reaction comprises extending a sequencing primer that binds to the converted DNA molecule upstream of the converted region and the resistant region; (b) calibrating one or more base calling metrics of the sequencing by synthesis instrument based at least in part on data from the resistant region, thereby providing one or more calibrated base calling metrics; and (c) calling at least a portion of nucleobases in the converted region using the one or more calibrated base calling metrics.
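Steps (b) and (c) can be pictured with a small sketch. The disclosure does not specify a formula here, so the following is only one possible reading: because bases in the resistant region are expected to be unchanged by the conversion procedure, calls there can be compared with the expected sequence to estimate an empirical error rate for each reported quality value, and that estimate can then replace the reported quality of calls in the converted region. The function names, the input tuples, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_recalibration(resistant_calls):
    """Map each reported quality to an empirical Phred score.

    resistant_calls: iterable of (called_base, expected_base, reported_q)
    taken from the resistant region, where the expected base is known.
    """
    totals, errors = Counter(), Counter()
    for called, expected, reported_q in resistant_calls:
        totals[reported_q] += 1
        if called != expected:
            errors[reported_q] += 1

    table = {}
    for reported_q, n in totals.items():
        # Add-one smoothing (an assumption) avoids a zero error rate.
        p_err = (errors[reported_q] + 1) / (n + 2)
        table[reported_q] = -10 * math.log10(p_err)
    return table


def recalibrate(converted_calls, table):
    """Apply the table to (base, reported_q) calls from the converted region.

    Qualities never observed in the resistant region are left unchanged.
    """
    return [(base, table.get(q, q)) for base, q in converted_calls]


# Hypothetical data: eight correct calls at reported Q30 in the resistant region.
resistant = [("C", "C", 30)] * 8
table = build_recalibration(resistant)
print(recalibrate([("T", 30), ("C", 40)], table))
```

With so few observations the smoothed estimate is deliberately conservative; a real pipeline would use far more resistant-region bases and would likely condition on more than the reported quality alone.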
- the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code or can be compiled during runtime.
- the code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
- Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data.
- the present methods can be used to diagnose the presence of a condition, e.g., cancer or precancer, in a subject, to characterize a condition (such as to determine a cancer stage or to determine heterogeneity of a cancer), to monitor a subject’s response to receiving a treatment for a condition (such as a response to a chemotherapeutic or immunotherapeutic), to assess prognosis of a subject (such as to predict a survival outcome in a subject having a cancer), to determine a subject’s risk of developing a condition, to predict a subsequent course of a condition in a subject, to determine metastasis or recurrence of a cancer in a subject (or a risk of cancer metastasis or recurrence), and/or to monitor a subject’s health as part of a preventative health monitoring program (such as to determine whether and/or when a subject is in need of further diagnostic screening).
- High and low consumption can be defined, e.g., as exceeding or falling below, respectively, recommendations in Dietary Guidelines for Americans 2020-2025, available at dietaryguidelines.gov/sites/default/files/2021- 03/Dietary_Guidelines_for_Americans-2020-2025.pdf.
- the subject has high alcohol consumption, e.g., at least three, four, or five drinks per day on average (where a drink is about one ounce or 30 mL of 80-proof hard liquor or the equivalent).
- the subject has a family history of cancer, e.g., at least one, two, or three blood relatives were previously diagnosed with cancer.
- the relatives are at least third-degree relatives (e.g., great-grandparent, great aunt or uncle, first cousin), at least second-degree relatives (e.g., grandparent, aunt or uncle, or half-sibling), or first-degree relatives (e.g., parent or full sibling).
- the disease under consideration is a type of cancer.
- the cancer is a type of cancer that is not a hematological cancer, e.g., a solid tumor cancer such as a carcinoma, adenocarcinoma, or sarcoma.
- Type and/or stage of cancer can be detected from genetic variations including mutations, rare mutations, indels, rearrangements, copy number variations, transversions, translocations, recombinations, inversion, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, such as 5mC and 5mC profiles.
- determining the level of target regions comprises determining either an increased level or decreased level of target regions, wherein the increased or decreased level of target regions is determined by comparing the level of target regions with a threshold level/value.
- the present methods can be used to diagnose, prognose, monitor or observe cancers, precancers, or other diseases.
- the methods herein do not involve diagnosing, prognosing, or monitoring a fetus and as such are not directed to non-invasive prenatal testing.
- these methodologies may be employed in a pregnant subject to diagnose, prognose, monitor or observe cancers or other diseases in an unborn subject whose DNA and other polynucleotides may co-circulate with maternal molecules.
- Non-limiting examples of other genetic-based diseases, disorders, or conditions that are optionally evaluated using the methods and systems disclosed herein include achondroplasia, alpha-1 antitrypsin deficiency, antiphospholipid syndrome, autism, autosomal dominant polycystic kidney disease, Charcot-Marie-Tooth (CMT), cri du chat, Crohn's disease, cystic fibrosis, Dercum disease, Down syndrome, Duane syndrome, Duchenne muscular dystrophy, Factor V Leiden thrombophilia, familial hypercholesterolemia, familial Mediterranean fever, fragile X syndrome, Gaucher disease, hemochromatosis, hemophilia, holoprosencephaly, Huntington's disease, Klinefelter syndrome, Marfan syndrome, myotonic dys
- Genetic and/or epigenetic data can also be used for characterizing a specific form of cancer. Cancers are often heterogeneous in both composition and staging. Genetic profile data may allow characterization of specific sub-types of cancer that may be important in the diagnosis or treatment of that specific sub-type. This information may also provide a subject or practitioner clues regarding the prognosis of a specific type of cancer and allow either a subject or practitioner to adapt treatment options in accord with the progress of the disease. Some cancers can progress to become more aggressive and genetically unstable. Other cancers may remain benign, inactive or dormant. The system and methods of this disclosure may be useful in determining disease progression.
- the methods of the disclosure may be used to characterize the heterogeneity of an abnormal condition in a subject.
- Such methods can include, e.g., generating a genetic profile of extracellular polynucleotides (e.g., cfDNA) derived from the subject, e.g., wherein the genetic profile comprises a plurality of data resulting from copy number variation and rare mutation analyses.
- an abnormal condition is cancer, e.g., as described herein.
- the abnormal condition may be one resulting in a heterogeneous genomic population.
- heterogeneity may comprise multiple foci of disease, such as where one or more foci (such as one or more tumor foci) are the result of metastases that have spread from a primary site of a cancer.
- the tissue(s) of origin can be useful for identifying organs affected by the cancer, including the primary cancer and/or metastatic tumors.
- the present methods can be used to generate a profile, fingerprint, or set of data that is a summation of genetic information derived from different cells in a heterogeneous disease. This set of data may comprise copy number variation, epigenetic variation, and mutation analyses alone or in combination.
- the sample is obtained from a subject who was previously diagnosed with a cancer and received one or more previous cancer treatments. In some embodiments, the sample is obtained at one or more preselected time points following the one or more previous cancer treatments.
- a method described herein comprises detecting a presence or absence of a nucleic acid originating or derived from a tumor cell at a preselected timepoint following a previous cancer treatment of a subject previously diagnosed with cancer. The method may further comprise determining a cancer recurrence score that is indicative of the presence or levels of DNA and/or RNA originating or derived from the tumor cell for the subject.
- a cancer recurrence score is compared with a predetermined cancer recurrence threshold, and the subject is classified as a candidate for a subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold or not a candidate for therapy when the cancer recurrence score is below the cancer recurrence threshold.
- a cancer recurrence score equal to the cancer recurrence threshold may result in classification as either a candidate for a subsequent cancer treatment or not a candidate for therapy.
- the one or more methods described in the present disclosure may be used to assist in the treatment of a type of cancer.
- the biomarker may include an epigenetic signature, such as a methylation state, methylation score and/or DNA fragmentation pattern/score.
- the epigenetic signature can be determined for one or more regions that include, but are not limited to, transcription start sites, promoter regions, CTCF binding regions and regulatory protein binding regions.
- the epigenetic signature is determined for one or more regions that include, but are not limited to, transcription start sites, promoter regions, intergenic regions and/or intronic regions that are associated with at least one or more genes listed in any one or more of Tables 1, 2, 3, 4, and 5.
- Such treatments may include small-molecule drugs or monoclonal antibodies.
- the methods may also improve biomarker testing in individuals suffering from disease and help determine if the individual is a candidate for a certain drug or combination of drugs based on the presence or absence of the biomarker. Additionally, the methods can improve identification of mutations that contribute to the development of resistance to targeted therapy. Consequently, the analysis techniques may reduce unnecessary or untimely therapeutic interventions, patient suffering, and patient mortality.
- the present methods can also be used to quantify levels of different cell types, such as cancer cell types and/or immune cell types, including rare immune cell types, such as activated lymphocytes and myeloid cells at particular stages of differentiation. Such quantification can be based on the numbers of molecules corresponding to a given cell type in a sample.
- quantities of each of a plurality of cell types are determined based on sequencing and analysis (such as determination of epigenetic and/or genomic signatures) of DNA isolated from at least one sample comprising cells (such as a buffy coat sample or another type of blood sample (e.g., a whole blood sample, a leukapheresis sample, or a PBMC sample)) from a subject.
- a plurality of immune cell types can include, but is not limited to, macrophages (including M1 macrophages and M2 macrophages), activated B cells (including regulatory B cells, memory B cells and plasma cells); T cell subsets, such as central memory T cells, naïve-like T cells, and activated T cells (including cytotoxic T cells, regulatory T cells (Tregs), CD4 effector memory T cells, CD4 central memory T cells, CD8 effector memory T cells, and CD8 central memory T cells); immature myeloid cells (including myeloid-derived suppressor cells (MDSCs), low-density neutrophils, immature neutrophils, and immature granulocytes); and natural killer (NK) cells.
- Sequence information obtained in the present methods may comprise sequence reads of the nucleic acids generated by a nucleic acid sequencer.
- the nucleic acid sequencer performs pyrosequencing, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, sequencing-by-synthesis, 5-letter sequencing, 6-letter sequencing, sequencing-by-ligation or sequencing-by-hybridization on the nucleic acids to generate sequencing reads.
- the method further comprises grouping the sequence reads into families of sequence reads, each family comprising sequence reads generated from a nucleic acid in the sample.
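A minimal sketch of such grouping follows. The disclosure does not fix the grouping key at this point, so the key used here (a molecular barcode together with the mapped chromosome, start, and end) is an assumption, as are the dictionary field names and the function name.

```python
from collections import defaultdict


def group_into_families(reads):
    """Group reads that plausibly derive from the same original molecule.

    reads: iterable of dicts with hypothetical keys
    'barcode', 'chrom', 'start', 'end', and 'seq'.
    """
    families = defaultdict(list)
    for read in reads:
        key = (read["barcode"], read["chrom"], read["start"], read["end"])
        families[key].append(read["seq"])
    return dict(families)


# Hypothetical reads: the first two share a barcode and coordinates.
reads = [
    {"barcode": "AACG", "chrom": "chr4", "start": 100, "end": 260, "seq": "ACGT"},
    {"barcode": "AACG", "chrom": "chr4", "start": 100, "end": 260, "seq": "ACGT"},
    {"barcode": "TTGC", "chrom": "chr4", "start": 100, "end": 260, "seq": "ACGA"},
]
families = group_into_families(reads)
print(len(families))  # 2
```

Each family can then be collapsed to a consensus sequence, which is how family grouping is typically used to suppress amplification and sequencing errors.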
- the methods comprise determining the likelihood that the subject from which the sample was obtained has cancer, precancer, an infection, transplant rejection, or another disease or disorder.
- the disease or disorder is related to changes in proportions of types of immune cells.
- comparisons of cell identities and/or cell quantities/proportions (such as cancer cell identities and/or quantities/proportions, and/or immune cell identities and/or quantities/proportions) between two or more samples collected from a subject at two different time points can allow for monitoring of one or more aspects of a condition in the subject over time, such as a response of the subject to a treatment, the severity of the condition (such as a cancer stage) in the subject, a recurrence of the condition (such as a cancer), and/or the subject’s risk of developing the condition (such as a cancer).
- a method provided herein is a method of determining a risk of cancer recurrence in a subject.
- a method provided herein is or comprises a method of detecting the presence or absence of a metastasis in a subject. In some embodiments, a method provided herein is or comprises a method of classifying a subject as being a candidate for a subsequent cancer treatment.
- Any of such methods may comprise collecting nucleic acids (e.g., DNA originating or derived from a tumor cell) from the subject diagnosed with the cancer at one or more preselected timepoints following one or more previous cancer treatments to the subject.
- the subject may be any of the subjects described herein.
- the DNA may be DNA, such as cfDNA, from a blood sample (e.g., a whole blood sample).
- the DNA may comprise DNA obtained from a tissue sample.
- Any of such methods may comprise enriching for a plurality of sets of target regions from DNA and/or RNA from the subject, wherein the plurality of target region sets comprise a sequence-variable target region set, and/or an epigenetic target region set, whereby a captured set of nucleic acid molecules is produced.
- the enriching step may be performed according to any of the embodiments described elsewhere herein.
- the previous cancer treatment may comprise surgery, administration of a therapeutic composition, and/or chemotherapy.
- Any of such methods may comprise sequencing the enriched DNA molecules or enriched cDNA molecules generated from RNA, whereby a set of sequence information is produced.
- the enriched DNA molecules of a sequence-variable target region set may be sequenced to a greater depth of sequencing than the captured DNA molecules of the epigenetic target region set.
- Any of such methods may comprise detecting a presence or absence of DNA originating or derived from a tumor cell at a preselected timepoint using the set of sequence information.
- the detection of the presence or absence of DNA, such as cfDNA, originating or derived from a tumor cell may be performed according to any of the embodiments thereof described elsewhere herein.
- Methods of determining a risk of cancer recurrence in a subject may comprise determining a cancer recurrence score that is indicative of the presence or absence, or amount, of the DNA, such as cfDNA, originating or derived from the tumor cell for the subject.
- the cancer recurrence score may further be used to determine a cancer recurrence status.
- the cancer recurrence status may be at risk for cancer recurrence, e.g., when the cancer recurrence score is above a predetermined threshold.
- the cancer recurrence status may be at low or lower risk for cancer recurrence, e.g., when the cancer recurrence score is below a predetermined threshold.
- a cancer recurrence score equal to the predetermined threshold may result in a cancer recurrence status of either at risk for cancer recurrence or at low or lower risk for cancer recurrence.
- Methods herein may comprise additional steps to determine whether a metastasis is present.
- Methods of classifying a subject as being a candidate for a subsequent cancer treatment may comprise comparing the cancer recurrence score of the subject with a predetermined cancer recurrence threshold, thereby classifying the subject as a candidate for the subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold or not a candidate for therapy when the cancer recurrence score is below the cancer recurrence threshold.
- a cancer recurrence score equal to the cancer recurrence threshold may result in classification as either a candidate for a subsequent cancer treatment or not a candidate for therapy.
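The comparison described above can be sketched as follows. The text leaves the handling of a score exactly equal to the threshold open, so the sketch exposes it as a parameter; the function and parameter names are hypothetical.

```python
def is_candidate_for_treatment(score, threshold, equal_is_candidate=True):
    """Classify a subject from a cancer recurrence score.

    Scores above the threshold classify as a candidate, scores below as
    not a candidate. A score equal to the threshold may go either way,
    so the caller chooses via `equal_is_candidate`.
    """
    if score > threshold:
        return True
    if score < threshold:
        return False
    return equal_is_candidate


print(is_candidate_for_treatment(0.8, 0.5))  # True
print(is_candidate_for_treatment(0.5, 0.5, equal_is_candidate=False))  # False
```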
- a number of mutations in the sequence-variable target regions chosen from 1, 2, 3, 4, or 5 is sufficient for the first subscore to result in a cancer recurrence score classified as positive for cancer recurrence.
- the number of mutations is chosen from 1, 2, or 3.
- the set of sequence information comprises epigenetic target region sequences
- determining the cancer recurrence score comprises determining a second subscore indicative of the amount of molecules (obtained from the epigenetic target region sequences) that represent an epigenetic state different from DNA found in a corresponding sample from a healthy subject (e.g., DNA, such as cfDNA, from a blood sample (e.g., a whole blood sample), and/or DNA found in a tissue sample from a healthy subject where the tissue sample is of the same type of tissue as was obtained from the subject).
- epigenetic changes associated with cancer e.g., methylation of hypermethylation variable target regions and/or perturbed fragmentation of fragmentation variable target regions, where “perturbed” means different from DNA found in a corresponding sample from a healthy subject.
- a proportion of molecules corresponding to the hypermethylation variable target region set and/or fragmentation variable target region set that indicate hypermethylation in the hypermethylation variable target region set and/or abnormal fragmentation in the fragmentation variable target region set greater than or equal to a value in the range of 0.001%-10% is sufficient for the second subscore to be classified as positive for cancer recurrence.
- the range may be 0.001%-1%, 0.005%-1%, 0.01%-5%, 0.01%-2%, or 0.01%-1%.
- any of such methods may comprise determining a fraction of tumor DNA from the fraction of molecules in the set of sequence information that indicate one or more features indicative of origination from a tumor cell.
- the fraction of tumor DNA may be determined based on a combination of molecules corresponding to epigenetic target regions and molecules corresponding to sequence variable target regions.
- Determination of a cancer recurrence score may be based at least in part on the fraction of tumor DNA, wherein a fraction of tumor DNA greater than a threshold in the range of 10⁻¹¹ to 1 or 10⁻¹⁰ to 1 is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence.
- a fraction of tumor DNA greater than or equal to a threshold in the range of 10⁻¹⁰ to 10⁻⁹, 10⁻⁹ to 10⁻⁸, 10⁻⁸ to 10⁻⁷, 10⁻⁷ to 10⁻⁶, 10⁻⁶ to 10⁻⁵, 10⁻⁵ to 10⁻⁴, 10⁻⁴ to 10⁻³, 10⁻³ to 10⁻², or 10⁻² to 10⁻¹ is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence.
- the fraction of tumor DNA greater than a threshold of at least 10⁻⁷ is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence.
- a determination that a fraction of tumor DNA is greater than a threshold may be made based on a cumulative probability. For example, the sample can be considered positive if the cumulative probability that the tumor fraction is greater than a threshold in any of the foregoing ranges exceeds a probability threshold of at least 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.995, or 0.999. In some embodiments, the probability threshold is at least 0.95, such as 0.99.
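One way to picture the cumulative-probability test is with a discrete distribution over candidate tumor fractions. The sketch below assumes such a distribution is already available as `(tumor_fraction, probability)` pairs; how it would be estimated is not described here, and the function name, the input format, and the default thresholds (10⁻⁷ and 0.95, both values that appear in the text) are chosen only for illustration.

```python
def tumor_fraction_call(distribution, tf_threshold=1e-7, prob_threshold=0.95):
    """Return (is_positive, probability that tumor fraction > tf_threshold).

    distribution: list of (tumor_fraction, probability) pairs; the
    probabilities are normalized here, so they need not sum to exactly 1.
    """
    total = sum(p for _, p in distribution)
    if total <= 0:
        raise ValueError("distribution must carry positive probability mass")
    prob_above = sum(p for tf, p in distribution if tf > tf_threshold) / total
    return prob_above > prob_threshold, prob_above


# Hypothetical distribution: most of the mass lies above the 1e-7 threshold.
dist = [(0.0, 0.02), (1e-5, 0.48), (1e-4, 0.50)]
positive, prob = tumor_fraction_call(dist)
print(positive, round(prob, 2))  # True 0.98
```

Raising `prob_threshold` toward 0.99 or 0.999 trades sensitivity for fewer false-positive calls.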
- the set of sequence information comprises sequence-variable target region sequences and epigenetic target region sequences
- determining the cancer recurrence score comprises determining a first subscore indicative of the amount of SNVs, insertions/deletions, CNVs and/or fusions present in sequence-variable target region sequences and a second subscore indicative of the amount of abnormal molecules in epigenetic target region sequences, and combining the first and second subscores to provide the cancer recurrence score.
- subscores may be combined by applying a threshold to each subscore independently (e.g., greater than a predetermined number of mutations in sequence-variable target regions and greater than a predetermined fraction of abnormal molecules (i.e., molecules with an epigenetic state different from the DNA found in a corresponding sample from a healthy subject; e.g., tumor) in epigenetic target regions), or by training a machine learning classifier to determine status based on a plurality of positive and negative training samples.
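The threshold-based option can be sketched as below. Whether a positive call requires both subscores or either one is not settled by the text, so the sketch makes that a parameter. The function name and the default cut-offs (one mutation, and an abnormal-molecule fraction of 10⁻⁴, i.e. 0.01%) are illustrative assumptions drawn from the ranges mentioned in this document, not prescribed values.

```python
def combined_recurrence_call(n_mutations, abnormal_fraction,
                             min_mutations=1, min_abnormal_fraction=1e-4,
                             require_both=False):
    """Combine two independently thresholded subscores into one call.

    n_mutations: mutations observed in sequence-variable target regions.
    abnormal_fraction: fraction of abnormal molecules in epigenetic target regions.
    require_both: if True, both subscores must be positive; otherwise either suffices.
    """
    first_positive = n_mutations >= min_mutations
    second_positive = abnormal_fraction >= min_abnormal_fraction
    if require_both:
        return first_positive and second_positive
    return first_positive or second_positive


print(combined_recurrence_call(0, 5e-4))                     # True
print(combined_recurrence_call(0, 5e-4, require_both=True))  # False
```

The machine-learning alternative named in the text would replace these fixed cut-offs with a classifier fitted to labeled positive and negative samples.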
- a value for the combined score in the range of -4 to 2 or -3 to 1 is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence.
- the cancer recurrence status of the subject may be at risk for cancer recurrence and/or the subject may be classified as a candidate for a subsequent cancer treatment.
- the cancer is any one of the types of cancer described elsewhere herein, e.g., colorectal cancer.
- the present methods can be used to monitor one or more aspects of a condition in a subject over time, such as a subject’s response to receiving a treatment for a condition (such as a response to a chemotherapeutic or immunotherapeutic), the severity of the condition (such as a cancer stage) in the subject, a recurrence of the condition (such as a cancer), and/or the subject’s risk of developing the condition (such as a cancer) and/or to monitor a subject’s health as part of a preventative health monitoring program (such as to determine whether and/or when a subject is in need of further diagnostic screening).
- monitoring comprises analysis of at least two samples collected from a subject at at least two different time points as described herein.
- the methods according to the present disclosure can be useful in predicting a subject’s response to a particular treatment option, such as over a period of time.
- successful treatment options may increase the amount of cancer-associated DNA sequences detected in a subject's blood, e.g., because more cancer cells may die and shed DNA.
- certain treatment options may be correlated with genetic profiles of cancers over time. This correlation may be useful in selecting a therapy.
- quantities of each of a plurality of cell types are determined based on sequencing and analysis (such as determination of epigenetic and/or genomic signatures) of DNA isolated from at least one sample comprising cells (such as a buffy coat sample or another type of blood sample (e.g., a whole blood sample, a leukapheresis sample, or a PBMC sample)) from a subject.
- differences in levels and/or presence of particular genetic and/or epigenetic signatures in DNA isolated from blood samples from a subject can be used to quantify cell types, such as cancer cell types and/or immune cell types, within the sample.
- a comparison of the disclosed genetic and/or epigenetic signatures in DNA isolated from blood samples collected from a subject at two or more time points can be used to monitor changes in cell type quantities in the subject under different conditions (such as prior to and after a treatment), or over time (e.g., as part of a preventative health monitoring program).
- the disclosed methods can include evaluating (such as quantifying) and/or interpreting cell types (such as cancer cell types and/or immune cell types) present in one or more samples comprising cells (such as a buffy coat sample or another type of blood sample (e.g., a whole blood sample, a leukapheresis sample, or a PBMC sample)) collected from a subject at one or more timepoints in comparison to a selected baseline value or reference standard (or a selected set of baseline values or reference standards).
- a baseline value or reference standard may be a quantity of cell types measured in one or more samples (such as an average quantity or range of quantities of cell types present in at least two samples) collected from the subject at one or more time points, such as prior to receiving a treatment, prior to diagnosis of a condition (such as a cancer), or as part of a preventative health monitoring program.
- a baseline value or reference standard may be a quantity of cell types measured in one or more samples (such as an average quantity or range of quantities of cell types present in at least two samples) collected at one or more timepoints from one or more subjects that do not have the condition (such as a healthy subject that does not have a cancer), one or more subjects that responded favorably to the treatment, or one or more subjects that have not received the treatment.
- the baseline value or reference standard utilized is a standard or profile derived from a single reference subject. In other embodiments, the baseline value or reference standard utilized is a standard or profile derived from averaged data from multiple reference subjects.
- the reference standard, in various embodiments, can be a single value, a mean, an average, a numerical mean or range of numerical means, a numerical pattern, or a graphical pattern created from the cell type quantity data derived from a single reference subject or from multiple reference subjects. Selection of the particular baseline values or reference standards, or selection of the one or more reference subjects, depends upon the use to which the methods described herein are to be put by, for example, a research scientist or a clinician (such as a physician).
- one or more samples comprising cells may be collected from a subject at two or more timepoints, to assess changes in cell types (such as changes in quantities of cell types) between the two or more timepoints.
- a sample collected at a first time point is a tissue sample or a blood sample
- a sample collected at a subsequent time point is a blood sample.
- a sample collected at a first time point is a tissue sample and a sample collected at a subsequent time point is a blood sample.
- the present methods can be used, for example, to determine the presence or absence of a condition (such as a cancer), a response of the subject to a treatment, one or more characteristics of a condition (such as a cancer stage) in the subject, recurrence of a condition (such as a cancer), and/or a subject’s risk of developing a condition (such as a cancer).
- methods are provided wherein quantities of cell types present in at least one sample (such as at least one whole blood sample, buffy coat sample, leukapheresis sample, or PBMC sample) collected from a subject at one or more timepoints (such as prior to receiving a treatment) are compared to quantities of cell types present in at least one sample collected from the subject at one or more different time points (such as after receiving the treatment).
- the disclosed methods can allow for patient-specific monitoring, such that, for example, differences in cell type quantities between samples collected from the subject at different timepoints may indicate changes (such as presence or absence of a condition, response to a treatment, a prognosis, or the like) that are significant with respect to the subject but may yet fall within a normal range of a general healthy population.
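Patient-specific monitoring of this kind can be sketched as a comparison of a follow-up measurement against the subject's own baseline and, separately, against a population range. The example below is a minimal illustration under stated assumptions: the baseline is summarized as a mean and sample standard deviation of at least two earlier samples, a change is flagged when it exceeds `k` standard deviations, and the name `flag_change`, the threshold `k`, and all numeric values are hypothetical.

```python
from statistics import mean, stdev


def flag_change(baseline_samples, follow_up, population_range, k=2.0):
    """Compare a follow-up cell type quantity to the subject's own baseline.

    baseline_samples: quantities from at least two earlier samples of the subject.
    population_range: (low, high) range assumed for a general healthy population.
    k: hypothetical threshold, in baseline standard deviations.
    """
    mu = mean(baseline_samples)
    sd = stdev(baseline_samples)  # requires at least two baseline samples
    low, high = population_range
    return {
        "subject_baseline_mean": mu,
        "deviates_from_own_baseline": abs(follow_up - mu) > k * sd,
        "within_population_range": low <= follow_up <= high,
    }


# Hypothetical fractions of one immune cell type before and after treatment.
result = flag_change([0.20, 0.21, 0.19], follow_up=0.30, population_range=(0.10, 0.40))
```

Here the follow-up value lies inside the assumed population range yet departs clearly from the subject's own baseline, which is the situation the passage describes.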
- methods are provided for monitoring one or more aspects of a condition in a subject over time, such as but not limited to, a subject’s response to receiving a treatment for a condition (such as a response to a chemotherapeutic or immunotherapeutic).
- one or more samples comprising cells are collected from a subject at least once per week, such as on 1-4 days, 1-2 days, or on 1, 2, 3, 4, 5, 6, or 7 days per week.
- one or more samples is collected from the subject at least once per month, such as 1-15 times, 1-10 times, 2-5 times, or 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 times per month.
- the immunotherapy or immunotherapeutic agent is an antagonist of an inhibitory immune checkpoint molecule.
- the inhibitory immune checkpoint molecule is PD-1.
- the inhibitory immune checkpoint molecule is PD-L1.
- the antagonist of the inhibitory immune checkpoint molecule is an antibody (e.g., a monoclonal antibody).
- the antibody or monoclonal antibody is an anti-CTLA4, anti-PD-1, anti-PD-L1, or anti-PD-L2 antibody.
- the antibody is a monoclonal anti-PD-1 antibody.
- the immune checkpoint molecule is a ligand of a co-stimulatory molecule, including, for example, CD80, CD86, B7RP1, B7-H3, B7-H4, CD137L, OX40L, or CD70.
- agonists that target these co-stimulatory checkpoint molecules can be used to enhance antigen-specific T cell responses against certain cancers.
- the immunotherapy or immunotherapeutic agent is an agonist of a co-stimulatory checkpoint molecule.
- the agonist of the co-stimulatory checkpoint molecule is an agonist antibody and preferably is a monoclonal antibody.
- the agonist antibody or monoclonal antibody is an anti-CD28 antibody.
- the agonist antibody or monoclonal antibody is an anti-ICOS, anti-CD137, anti-OX40, or anti-CD27 antibody.
- the agonist antibody or monoclonal antibody is an anti-CD80, anti-CD86, anti-B7RP1, anti-B7-H3, anti-B7-H4, anti-CD137L, anti-OX40L, or anti-CD70 antibody.
- Therapeutic agents such as immunotherapeutic agents can function by helping the immune system destroy cancer cells.
- the therapeutic agents comprise one or more PARP inhibitors, such as Olaparib (LYNPARZA®), Rucaparib (RUBRACA®), Niraparib (ZEJULA®), and Talazoparib (TALZENNA®). These may be used for treating cancers with alterations in BRCA1, BRCA2, ATM, BARD1, BRIP1, CDK12, CHEK1, CHEK2, FANCL, PALB2, RAD51B, RAD51C, RAD51D, and RAD54L, and/or in genes associated with homologous recombination repair (HRR).
- the treatment comprises one or more immunotherapies, immunotherapeutic agents and/or immune checkpoint inhibitors (ICIs).
- Immunotherapies are treatments with one or more agents that act to stimulate the immune system so as to kill or at least to inhibit growth of cancer cells, and preferably to reduce further growth of the cancer, reduce the size of the cancer and/or eliminate the cancer.
- Some such agents bind to a target present on cancer cells; some bind to a target present on immune cells and not on cancer cells; some bind to a target present on both cancer cells and immune cells.
- Such agents include, but are not limited to, checkpoint inhibitors and/or antibodies.
- Checkpoint inhibitors are inhibitors of pathways of the immune system that maintain self-tolerance and modulate the duration and amplitude of physiological immune responses in peripheral tissues to minimize collateral tissue damage (see, e.g., Pardoll, Nature Reviews Cancer 12, 252-264 (2012)).
- Exemplary agents include antibodies against any of PD-1, PD-2, PD-L1, PD-L2, CTLA-4, OX40, B7.1, B7He, LAG3, CD137, KIR, CCR5, CD27, CD40, or CD47.
- Other exemplary agents include proinflammatory cytokines, such as IL-1β, IL-6, and TNF-α.
- Other exemplary agents are T-cells activated against a tumor, such as T-cells activated by expressing a chimeric antigen targeting a tumor antigen recognized by the T-cell.
- anti-PD-1 or anti-PD-L1 therapies comprise pembrolizumab (KEYTRUDA®), nivolumab (OPDIVO®), cemiplimab (LIBTAYO®), atezolizumab (TECENTRIQ®), durvalumab (IMFINZI®), and avelumab (BAVENCIO®). These therapies may be used to treat patients identified as having high microsatellite instability (MSI) status or high tumor mutational burden (TMB).
- a therapeutic agent targets a mutated form of the EGFR protein.
- Therapeutic agents can include osimertinib (TAGRISSO®), erlotinib (TARCEVA®), and gefitinib (IRESSA®).
- Therapeutic agents can include one or more targeted therapeutic agents, including any one or more of abemaciclib (VERZENIO®), abiraterone acetate (ZYTIGA®), acalabrutinib (CALQUENCE®), adagrasib (KRAZATI®), ado-trastuzumab emtansine (KADCYLA®), afatinib dimaleate (GILOTRIF®), alectinib (ALECENSA®), alemtuzumab (CAMPATH®), alitretinoin (PANRETIN®), alpelisib (PIQRAY®), amivantamab-vmjw (RYBREVANT®), anastrozole (ARIMIDEX®), apalutamide (ERLEADA®), asc
- encorafenib (BRAFTOVI®), enfortumab vedotin-ejfv (PADCEV®), entrectinib (ROZLYTREK®), enzalutamide (XTANDI®), erdafitinib (BALVERSA®), erlotinib hydrochloride (TARCEVA®), everolimus (AFINITOR®), exemestane (AROMASIN®), fam-trastuzumab deruxtecan-nxki (ENHERTU®), fedratinib hydrochloride (INREBIC®), fulvestrant (FASLODEX®), futibatinib (LYTGOBI®), gefitinib (IRESSA®), gemtuzumab ozogamicin (MYLOTARG®), gilteritinib fumarate (XOSPATA®), glasdegi
- Table 6 provides an exemplary list of drugs used to treat cancers with mutations observed in target genes associated with certain cancer types.
- the subject has a cancer of a type listed in Table 6 including a mutation in one or more target genes listed in Table 6 for that cancer type, and the therapy administered to the subject comprises the drug listed in Table 6 for that cancer type and mutation.
- Table 6. Exemplary drugs

| Cancer type | Drug | Target genes |
|---|---|---|
| Breast | abemaciclib (VERZENIO®) | CDK4, CDK6 |
| prostate | apalutamide (ERLEADA®) | AR |
| leukemia | asciminib hydrochloride (SCEMBLIX®) | Bcr-Abl |
| bladder | atezolizumab (TECENTRIQ®) | PDL1 |
| liver and bile | cabozantinib-s-malate (CABOMETYX®) | |
| bile | infigratinib phosphate (TRUSELTIQ®) | |
| stomach | ramucirumab (CYRAMZA®) | VEGFR2 |
| gastric | | |
| colorectal | regorafenib (STIVARGA®) | |
| skin | vemurafenib (ZELBORAF®) | BRAF, BRAF V600E, CRAF, ARAF, SRMS, ACK1, MAP4K5, FGR |
| leukemia | venetoclax (VENCLEXTA®) | BCL-2 |

- (i) detecting one or more mutations in the one or more target genes listed in Table 6; and (ii) administering the corresponding one or more drugs listed in Table 6.
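The two-step use of Table 6 (detect a mutation in a listed target gene, then select the corresponding drug) amounts to a keyed lookup. The sketch below is illustrative only: it encodes just three rows whose cancer type, drug, and target gene are all legible in the text above, and the names `TABLE_6_SUBSET` and `select_drugs` are hypothetical. It is not clinical guidance.

```python
# Subset of Table 6 rows that are fully legible in the source text.
# Keys are (cancer type, target gene); values are the listed drug.
TABLE_6_SUBSET = {
    ("breast", "CDK4"): "abemaciclib",
    ("breast", "CDK6"): "abemaciclib",
    ("skin", "BRAF"): "vemurafenib",
    ("leukemia", "BCL-2"): "venetoclax",
}


def select_drugs(cancer_type, mutated_genes):
    """Return the sorted, de-duplicated drugs listed for this cancer type
    and any of the detected mutated target genes."""
    cancer = cancer_type.strip().lower()
    drugs = {
        TABLE_6_SUBSET[(cancer, gene)]
        for gene in mutated_genes
        if (cancer, gene) in TABLE_6_SUBSET
    }
    return sorted(drugs)


# Step (i): mutations detected in a sample; step (ii): look up listed drugs.
skin_choice = select_drugs("Skin", ["BRAF", "TP53"])
```

Genes with no entry for the given cancer type are simply ignored, so an empty list means the subset holds no matching row rather than that no therapy exists.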
- these therapies may be used alone or in combination with other therapies to treat a disease.
- the methods and systems disclosed herein may be used to identify customized or targeted therapies to treat a given disease or condition in patients based on the classification of a nucleic acid variant as being of somatic or germline origin.
- the status of a nucleic acid variant from a sample from a subject as being of somatic or germline origin may be compared with a database of comparator results from a reference population to identify customized or targeted therapies for that subject.
- the reference population includes patients with the same cancer or disease type as the subject and/or patients who are receiving, or who have received, the same therapy as the subject.
- a customized or targeted therapy may be identified when the nucleic variant and the comparator results satisfy certain classification criteria (e.g., are a substantial or an approximate match).
- the customized therapies described herein are typically administered parenterally (e.g., intravenously or subcutaneously).
- Pharmaceutical compositions containing an immunotherapeutic agent are typically administered intravenously.
- Certain therapeutic agents are administered orally.
- customized therapies may also be administered by any method known in the art, for example, buccal, sublingual, rectal, vaginal, intraurethral, topical, intraocular, intranasal, and/or intraauricular, which administration may include tablets, capsules, granules, aqueous suspensions, gels, sprays, suppositories, salves, ointments, or the like.
- therapeutic options for treating specific genetic-based diseases, disorders, or conditions, other than cancer, are generally well-known to those of ordinary skill in the art and will be apparent given the particular disease, disorder, or condition under consideration.
- a kit for use in the methods described herein comprises a modified Y-shaped adapter as disclosed herein.
- a kit comprises a first reagent for end repair to generate end-repaired DNA, wherein the first reagent comprises at least one type of dNTP that comprises a modified base.
- the kit further comprises a second reagent for ligating adapters (such as a modified Y-shaped adapter as disclosed herein) to the end-repaired DNA to generate adapted DNA, wherein the second reagent may also seal nicks present in the end-repaired DNA.
- the kit further comprises a third reagent for modification-sensitive sequencing that is capable of identifying the base modification in the at least one type of dNTP.
- the kit may comprise the first, second, and/or third reagents and additional elements as discussed below and/or elsewhere herein.
- a kit comprises instructions for performing a method described herein.
- Kits may further comprise a plurality of oligonucleotide probes that selectively hybridize to at least 5, 6, 7, 8, 9, 10, 20, 30, 40 or all genes selected from the group consisting of ALK, APC, BRAF, CDKN2A, EGFR, ERBB2, FBXW7, KRAS, MYC, NOTCH1, NRAS, PIK3CA, PTEN, RB1, TP53, MET, AR, ABL1, AKT1, ATM, CDH1, CSF1R, CTNNB1, ERBB4, EZH2, FGFR1, FGFR2, FGFR3, FLT3, GNA11, GNAQ, GNAS, HNF1A, HRAS, IDH1, IDH2, JAK2, JAK3, KDR, KIT, MLH1, MPL, NPM1, PDGFRA, PROC, PTPN11, RET, SMAD4, SMARCB1, SMO, SRC, STK11, VHL, TERT, CCND1, CDK
- the number of genes to which the oligonucleotide probes can selectively hybridize can vary.
- the number of genes can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, or 54.
- the kit can include a container that includes the plurality of oligonucleotide probes and instructions for performing any of the methods described herein.
- the oligonucleotide probes can selectively hybridize to exon regions of the genes, e.g., of the at least 5 genes.
- the oligonucleotide probes can selectively hybridize to at least 30 exons of the genes, e.g., of the at least 5 genes. In some cases, multiple probes can selectively hybridize to each of the at least 30 exons. The probes that hybridize to each exon can have sequences that overlap with at least 1 other probe. In some embodiments, the oligoprobes can selectively hybridize to non-coding regions of genes disclosed herein, for example, intronic regions of the genes. The oligoprobes can also selectively hybridize to regions of genes comprising both exonic and intronic regions of the genes disclosed herein.
- any number of exons can be targeted by the oligonucleotide probes. For example, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 400, 500, 600, 700, 800, 900, 1,000, or more, exons can be targeted.
- the kit can comprise at least 4, 5, 6, 7, or 8 different library adaptors having distinct molecular barcodes and identical sample barcodes.
- the library adaptors may not be sequencing adaptors.
- the library adaptors do not include flow cell sequences or sequences that permit the formation of hairpin loops for sequencing.
- the different variations and combinations of molecular barcodes and sample barcodes are described throughout, and are applicable to the kit.
- the adaptors are not sequencing adaptors.
- the adaptors provided with the kit can also comprise sequencing adaptors.
- a sequencing adaptor can comprise a sequence hybridizing to one or more sequencing primers.
- a sequencing adaptor can further comprise a sequence hybridizing to a solid support, e.g., a flow cell sequence.
- a sequencing adaptor can be a flow cell adaptor.
- the sequencing adaptors can be attached to one or both ends of a polynucleotide fragment.
- the kit can comprise at least 8 different library adaptors having distinct molecular barcodes and identical sample barcodes.
- the library adaptors may not be sequencing adaptors.
- the kit can further include a sequencing adaptor having a first sequence that selectively hybridizes to the library adaptors and a second sequence that selectively hybridizes to a flow cell sequence.
- a sequencing adaptor can be hairpin shaped.
- the hairpin shaped adaptor can comprise a complementary double stranded portion and a loop portion, where the double stranded portion can be attached (e.g., ligated) to a double-stranded polynucleotide.
- Hairpin shaped sequencing adaptors can be attached to both ends of a polynucleotide fragment to generate a circular molecule, which can be sequenced multiple times.
- a sequencing adaptor can be up to 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
- the sequencing adaptor can comprise 20-30, 20-40, 30-50, 30-60, 40-60, 40-70, 50-60, or 50-70 bases from end to end. In a particular example, the sequencing adaptor can comprise 20-30 bases from end to end. In another example, the sequencing adaptor can comprise 50-60 bases from end to end.
- the DNA is sequenced using NGS (sequencing by synthesis) and one or more base calling metrics (such as clusters passing filter, phasing/pre-phasing, and/or color matrix correction values) are calibrated based at least in part on data from the resistant region of the Y-shaped adapter ligated to the DNA molecule.
- At least a portion of nucleobases in a converted region of the DNA molecule are called using the one or more calibrated base calling metrics, resulting in higher quality and higher yield sequencing results as compared to a sequencing by synthesis reaction performed using the same DNA but with current/standard NGS Y-shaped adapters (such as Illumina NGS adapters).
- Example 2 Methylation detection in a dsDNA library preparation-based SSM workflow using end repair as disclosed herein
- a set of samples from healthy subjects and subjects with early-stage colorectal cancer, a type of early-stage cancer (such as lung, breast, prostate, or bladder cancer), or a type of late-stage cancer (such as lung, breast, prostate, or bladder cancer) are analyzed by a blood-based assay to detect differentially methylated regions associated with colorectal cancer.
- DNA is isolated from a buffy coat sample from the subject and is fragmented.
- the DNA is subjected to end repair using a modified dNTP mix comprising modified cytosines, such as d5mCTP, rather than dCTP, and an A-tailing reaction.
- the modified cytosines of the dNTP mix are incorporated into the DNA molecule during end repair and are resistant to a conversion procedure.
- the DNA molecules are subjected to bisulfite sequencing (or, in other examples, DM-seq or EM-seq), wherein unmodified cytosines are converted to uracils.
- the modified bases incorporated into the DNA molecule during end repair are not converted.
- the DNA is sequenced using NGS (sequencing by synthesis) and one or more base calling metrics (such as clusters passing filter, phasing/pre-phasing, and/or color matrix correction values) are calibrated based at least in part on data from the resistant region of the DNA molecule.
- the high complexity of the resistant region allows for high quality and high yield sequencing because it is at or near the start of read 1 and is therefore used for cluster quality control (e.g., for the first approximately 25 cycles). At least a portion of nucleobases in a converted region of the DNA molecule are called using the one or more calibrated base calling metrics, resulting in higher quality and higher yield sequencing results as compared to a sequencing by synthesis reaction performed using the same DNA but without the use of modified cytosines in the end repair (and without the high complexity region of the end repair ‘scar’ being sequenced at the beginning of read 1).
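The contrast between a resistant region and a converted region can be illustrated with a toy simulation. This is a minimal sketch, not the disclosed workflow: it assumes the resistant region is the first `resistant_len` bases of read 1, that every cytosine there was incorporated as a conversion-resistant modified cytosine, and that every cytosine elsewhere is unmodified and reads as T after conversion and amplification (methylated cytosines native to the sample are ignored). The names `simulate_conversion` and `bases_seen` and the example sequence are hypothetical.

```python
def simulate_conversion(seq, resistant_len):
    """Return the sequence as read after conversion.

    Cytosines in the first `resistant_len` positions are treated as protected
    (modified) and stay C; all other cytosines are read as T.
    """
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i >= resistant_len:
            out.append("T")  # unmodified C -> U, read as T
        else:
            out.append(base)  # protected C or any non-C base is unchanged
    return "".join(out)


def bases_seen(seq, start, stop):
    """Set of distinct bases observed in cycles [start, stop)."""
    return set(seq[start:stop])


original = "ACGTCCGATC" + "GACCTAGCCA"  # 10 resistant bases + 10 convertible bases
converted = simulate_conversion(original, resistant_len=10)

early_cycles = bases_seen(converted, 0, 10)   # four-base complexity retained
later_cycles = bases_seen(converted, 10, 20)  # cytosine signal is lost
```

In this toy read the early cycles still contain all four bases while the later cycles contain only three, which is the reduced-complexity condition that the calibration on the resistant region is meant to work around.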
- Example 3 Methylation detection in a dsDNA library preparation-based SSM workflow using Y-shaped adapters and end repair as disclosed herein
- the workflow described in this example is illustrated in FIG. 1B.
- An exemplary standard workflow is illustrated in FIG. 1A for comparison purposes.
- a set of samples from healthy subjects and subjects with early-stage colorectal cancer, a type of early-stage cancer (such as lung, breast, prostate, or bladder cancer), or a type of late-stage cancer (such as lung, breast, prostate, or bladder cancer) are analyzed by a blood-based assay to detect differentially methylated regions associated with colorectal cancer.
- DNA is isolated from a buffy coat sample from the subject and is fragmented.
- the DNA is subjected to end repair using a modified dNTP mix comprising modified cytosines, such as d5mCTP, rather than dCTP, and an A-tailing reaction.
- modified cytosines of the dNTP mix are incorporated into the DNA molecule during end repair and are resistant to a conversion procedure.
- Modified Y-shaped adapters (as disclosed herein, such as shown in Figure 3, left side) comprising molecular tags are ligated to the DNA such that a sequencing read 1 begins from the 3’ end of an adapter-ligated DNA molecule.
- the Y-shaped adapters comprise modified cytosines (such as methylated cytosines) that are resistant to deamination.
- the DNA molecules are subjected to bisulfite sequencing (or, in other examples, DM-seq or EM-seq), wherein unmodified cytosines are converted to uracils.
- the DNA is sequenced using NGS (sequencing by synthesis) and one or more base calling metrics (such as clusters passing filter, phasing/pre-phasing, and/or color matrix correction values) are calibrated based at least in part on data from the resistant region of the DNA molecule.
- the high complexity of the resistant region allows for high quality and high yield sequencing because it is at or near the start of read 1 and is therefore used for cluster quality control (e.g., for the first approximately 25 cycles). At least a portion of nucleobases in a converted region of the DNA molecule are called using the one or more calibrated base calling metrics, resulting in higher quality and higher yield sequencing results as compared to a sequencing by synthesis reaction performed using the same DNA but without the use of modified cytosines in the end repair (and without the high complexity region of the end repair ‘scar’ being sequenced at the beginning of read 1), and/or with current/standard NGS Y-shaped adapters (such as Illumina NGS adapters).
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Methods and compositions are provided for calibrating one or more base calling metrics in a sequencing by synthesis reaction and for calling at least a portion of nucleobases in a converted region of a DNA molecule using the one or more calibrated base calling metrics. Also provided are methods for determining the likelihood that a subject has a disease or condition, such as cancer.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363613592P | 2023-12-21 | 2023-12-21 | |
| US63/613,592 | 2023-12-21 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025137620A1 | 2025-06-26 |
Family
ID=94386158
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/061535 (Pending; WO2025137620A1) | Methods for high quality and high accuracy methylation sequencing | 2023-12-21 | 2024-12-20 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025137620A1 (fr) |
Citations (22)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20010053519A1 (en) | 1990-12-06 | 2001-12-20 | Fodor Stephen P.A. | Oligonucleotides |
| US20030152490A1 (en) | 1994-02-10 | 2003-08-14 | Mark Trulson | Method and apparatus for imaging a sample on a device |
| US7537898B2 (en) | 2001-11-28 | 2009-05-26 | Applied Biosystems, Llc | Compositions and methods of selective nucleic acid isolation |
| US20110160078A1 (en) | 2009-12-15 | 2011-06-30 | Affymetrix, Inc. | Digital Counting of Individual Molecules by Stochastic Attachment of Diverse Labels |
| US8486630B2 (en) | 2008-11-07 | 2013-07-16 | Industrial Technology Research Institute | Methods for accurate sequence data and modified base position determination |
| US9598731B2 (en) | 2012-09-04 | 2017-03-21 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
| US9738894B2 (en) | 2003-03-21 | 2017-08-22 | Roche Innovation Center Copenhagen A/S | Short interfering RNA (siRNA) analogues |
| US9850523B1 (en) | 2016-09-30 | 2017-12-26 | Guardant Health, Inc. | Methods for multi-resolution analysis of cell-free nucleic acids |
| US9902992B2 (en) | 2012-09-04 | 2018-02-27 | Guardant Helath, Inc. | Systems and methods to detect rare mutations and copy number variation |
| WO2018119452A2 | 2016-12-22 | 2018-06-28 | Guardant Health, Inc. | Methods and systems for analyzing nucleic acid molecules |
| US10260088B2 | 2015-10-30 | 2019-04-16 | New England Biolabs, Inc. | Compositions and methods for analyzing modified nucleotides |
| WO2020160414A1 | 2019-01-31 | 2020-08-06 | Guardant Health, Inc. | Compositions and methods for isolating cell-free DNA |
| US10961525B2 | 2017-07-05 | 2021-03-30 | The Trustees Of The University Of Pennsylvania | Hyperactive AID/APOBEC and hmC dominant TET enzymes |
| WO2021236778A2 | 2020-05-19 | 2021-11-25 | The Trustees Of The University Of Pennsylvania | Compositions and methods for DNA cytosine carboxymethylation |
| WO2022197593A1 | 2021-03-15 | 2022-09-22 | Illumina, Inc. | Detection of methylcytosine and its derivatives using S-adenosyl-L-methionine analogs (xSAMs) |
| WO2023288222A1 | 2021-07-12 | 2023-01-19 | The Trustees Of The University Of Pennsylvania | Modified adapters for enzymatic DNA deamination and methods of use thereof for epigenetic sequencing of free and immobilized DNA |
| WO2023003851A1 * | 2021-07-20 | 2023-01-26 | Freenome Holdings, Inc. | Compositions and methods for improved resolution of 5-hydroxymethylated cytosine in nucleic acid sequencing |
| CN116287166A * | 2023-04-19 | 2023-06-23 | 纳昂达(南京)生物科技有限公司 | Methylation sequencing adapter and application thereof |
| WO2024159053A1 * | 2023-01-25 | 2024-08-02 | Guardant Health, Inc. | Method for nucleic acid methylation profiling |
| WO2024229143A1 * | 2023-05-01 | 2024-11-07 | Guardant Health, Inc. | Quality control method for enzymatic conversion procedures |
| WO2024238523A2 * | 2023-05-15 | 2024-11-21 | Foundation Medicine, Inc. | Sequencing adapters for methylation sequencing |
| WO2024235696A1 * | 2023-05-12 | 2024-11-21 | F. Hoffmann-La Roche Ag | Enzymatic conversion of methylated nucleic acids for sequencing |
-
2024
- 2024-12-20 WO PCT/US2024/061535 patent/WO2025137620A1/fr active Pending
Patent Citations (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6582908B2 (en) | 1990-12-06 | 2003-06-24 | Affymetrix, Inc. | Oligonucleotides |
| US20010053519A1 (en) | 1990-12-06 | 2001-12-20 | Fodor Stephen P.A. | Oligonucleotides |
| US20030152490A1 (en) | 1994-02-10 | 2003-08-14 | Mark Trulson | Method and apparatus for imaging a sample on a device |
| US7537898B2 (en) | 2001-11-28 | 2009-05-26 | Applied Biosystems, Llc | Compositions and methods of selective nucleic acid isolation |
| US9738894B2 (en) | 2003-03-21 | 2017-08-22 | Roche Innovation Center Copenhagen A/S | Short interfering RNA (siRNA) analogues |
| US8486630B2 (en) | 2008-11-07 | 2013-07-16 | Industrial Technology Research Institute | Methods for accurate sequence data and modified base position determination |
| US20110160078A1 (en) | 2009-12-15 | 2011-06-30 | Affymetrix, Inc. | Digital Counting of Individual Molecules by Stochastic Attachment of Diverse Labels |
| US9598731B2 (en) | 2012-09-04 | 2017-03-21 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
| US9902992B2 (en) | 2012-09-04 | 2018-02-27 | Guardant Helath, Inc. | Systems and methods to detect rare mutations and copy number variation |
| US10260088B2 (en) | 2015-10-30 | 2019-04-16 | New England Biolabs, Inc. | Compositions and methods for analyzing modified nucleotides |
| US9850523B1 (en) | 2016-09-30 | 2017-12-26 | Guardant Health, Inc. | Methods for multi-resolution analysis of cell-free nucleic acids |
| WO2018119452A2 | 2016-12-22 | 2018-06-28 | Guardant Health, Inc. | Methods and systems for analyzing nucleic acid molecules |
| US10961525B2 | 2017-07-05 | 2021-03-30 | The Trustees Of The University Of Pennsylvania | Hyperactive AID/APOBEC and hmC dominant TET enzymes |
| WO2020160414A1 | 2019-01-31 | 2020-08-06 | Guardant Health, Inc. | Compositions and methods for isolating cell-free DNA |
| WO2021236778A2 | 2020-05-19 | 2021-11-25 | The Trustees Of The University Of Pennsylvania | Compositions and methods for DNA cytosine carboxymethylation |
| WO2022197593A1 | 2021-03-15 | 2022-09-22 | Illumina, Inc. | Detection of methylcytosine and its derivatives using S-adenosyl-L-methionine analogs (xSAMs) |
| WO2023288222A1 | 2021-07-12 | 2023-01-19 | The Trustees Of The University Of Pennsylvania | Modified adapters for enzymatic DNA deamination and methods of use thereof for epigenetic sequencing of free and immobilized DNA |
| WO2023003851A1 * | 2021-07-20 | 2023-01-26 | Freenome Holdings, Inc. | Compositions and methods for improved resolution of 5-hydroxymethylated cytosine in nucleic acid sequencing |
| WO2024159053A1 * | 2023-01-25 | 2024-08-02 | Guardant Health, Inc. | Method for nucleic acid methylation profiling |
| CN116287166A * | 2023-04-19 | 2023-06-23 | 纳昂达(南京)生物科技有限公司 | Methylation sequencing adapter and application thereof |
| WO2024229143A1 * | 2023-05-01 | 2024-11-07 | Guardant Health, Inc. | Quality control method for enzymatic conversion procedures |
| WO2024235696A1 * | 2023-05-12 | 2024-11-21 | F. Hoffmann-La Roche Ag | Enzymatic conversion of methylated nucleic acids for sequencing |
| WO2024238523A2 * | 2023-05-15 | 2024-11-21 | Foundation Medicine, Inc. | Sequencing adapters for methylation sequencing |
Non-Patent Citations (43)
| Title |
|---|
| "MethBank3.0: a database of DNA methylomes across a variety of species", NUCLEIC ACIDS RES, 2018 |
| BOCK ET AL., NAT BIOTECH, vol. 28, 2010, pages 1106 - 1114 |
| BOOTH ET AL., SCIENCE, vol. 336, 2012, pages 934 - 937 |
| BROWN: "Genomes", 2002, ED., JOHN WILEY & SONS, INC., article "Mutation, Repair, and Recombination." |
| CORONEL: "Database Systems: Design, Implementation, & Management", 2014, CENGAGE LEARNING |
| DIETARY GUIDELINES FOR AMERICANS, 2020 |
| ELMASRI: "Fundamentals of Database Systems", 2010, ADDISON WESLEY |
| FREIER ET AL., NUCLEIC ACIDS RES., vol. 25, 1997, pages 4429 - 4443 |
| GAJULA ET AL., NUCLEIC ACIDS RES., vol. 42, no. 15, September 2014 (2014-09-01), pages 9964 - 75 |
| GALE ET AL., PLOS ONE, vol. 13, 2018, pages e0194630 |
| GANSAUGE ET AL., NATURE PROTOCOLS, vol. 8, 2013, pages 737 - 748 |
| GREER ET AL., CELL, vol. 161, 2015, pages 868 - 878 |
| HAN ET AL., MOL. CELL, vol. 63, 2016, pages 711 - 719 |
| IURLARO ET AL., GENOME BIOL., vol. 14, 2013, pages R119 |
| JANG ET AL., GENES (BASEL)., vol. 8, no. 6, June 2017 (2017-06-01), pages 148 |
| KINDE ET AL., PROC NAT'L ACAD SCI USA, vol. 108, 2011, pages 9530 - 9535 |
| KOU ET AL., PLOS ONE, vol. 11, 2016, pages e0146638 |
| KUMAR ET AL., FRONTIERS GENET., vol. 9, 2018, pages 640 |
| KUROSE: "Computer Networking: A Top-Down Approach", 2016, PEARSON |
| LIU ET AL., NAT CHEM BIOL, vol. 13, 2017, pages 181 - 187 |
| LOZANO ET AL., NATURE MEDICINE, vol. 28, 2022, pages 353 - 362 |
| MOSS ET AL., NAT COMMUN., vol. 9, 2018, pages 5068 |
| NABET ET AL., CELL, vol. 183, 2020, pages 363 - 376 |
| PARDOLL, NATURE REVIEWS CANCER, vol. 12, 2012, pages 252 - 264 |
| PETERSON: "Cloud Computing Architected: Solution Design Handbook", 2011, RECURSIVE PRESS |
| SAMBROOK ET AL.: "Molecular Cloning, A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS |
| SCHUTSKY ET AL., NATURE BIOTECHNOLOGY, vol. 36, 2018, pages 1083 - 1090 |
| SCHUTSKY ET AL., NUCLEIC ACIDS RES., vol. 45, no. 13, 27 July 2017 (2017-07-27), pages 7655 - 7665 |
| SCOTT, C.A.DURYEA, J.D.MACKAY, H. ET AL.: "Identification of cell type-specific methylation signals in bulk whole genome bisulfite sequencing data", GENOME BIOL, vol. 21, 2020, pages 156 |
| SEVERIN ET AL., NUCLEIC ACIDS RES., vol. 39, 2011, pages 8740 - 8751 |
| SONG ET AL., NAT BIOTECH, vol. 29, 2011, pages 68 - 72 |
| SUN ET AL., BIOESSAYS, vol. 37, 2015, pages 1155 - 62 |
| TROLL ET AL., BMC GENOMICS, vol. 20, 2019, pages 1023 |
| TUCKER: "Programming Languages", 2006, MCGRAW-HILL SCIENCE/ENGINEERING/MATH |
| VAISVILA ET AL., MOL CELL., vol. 84, no. 5, 7 March 2024 (2024-03-07), pages 854 - 866 |
| VAISVILA ET AL.: "Discovery of novel DNA cytosine deaminase activities enables a nondestructive single-enzyme methylation sequencing method for base resolution high-coverage methylome mapping of cell-free and ultra-low input DNA", BIORXIV, 2023 |
| VAISVILA R ET AL.: "EM-seq: Detection of DNA methylation at single base resolution from picograms of DNA", BIORXIV, 2019 |
| WANG TONG ET AL: "Direct enzymatic sequencing of 5-methylcytosine at single-base resolution", NATURE CHEMICAL BIOLOGY, vol. 19, no. 8, 15 June 2023 (2023-06-15), New York, pages 1004 - 1012, XP093149495, ISSN: 1552-4450, Retrieved from the Internet <URL:https://www.nature.com/articles/s41589-023-01318-1> DOI: 10.1038/s41589-023-01318-1 * |
| WATSON ET AL., SCI. IMMUNOL., vol. 6, 2021, pages eabj8825 |
| WELLS W WU ET AL: "Robust Sub-nanomolar Library Preparation for High Throughput Next Generation Sequencing", BMC GENOMICS, BIOMED CENTRAL LTD, LONDON, UK, vol. 19, no. 1, 4 May 2018 (2018-05-04), pages 1 - 10, XP021255961, DOI: 10.1186/S12864-018-4677-Y * |
| XIAO ET AL., MOLECULAR CELL, vol. 71, 19 July 2018 (2018-07-19), pages 306 - 318 |
| XU CHANG: "A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data", COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, vol. 16, 1 January 2018 (2018-01-01), Sweden, pages 15 - 24, XP055781134, ISSN: 2001-0370, DOI: 10.1016/j.csbj.2018.01.003 * |
| YU ET AL., CELL, vol. 149, 2012, pages 1368 - 80 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20240191290A1 (en) | | Methods for detection and reduction of sample preparation-induced methylation artifacts |
| US20250084464A1 (en) | | Compositions and methods for synthesis and use of probes targeting nucleic acid rearrangements |
| US20240263241A1 (en) | | Methods and compositions for copy-number informed tissue-of-origin analysis |
| WO2024137880A2 (en) | | Methods involving methylation-preserving amplification with error correction |
| WO2025090956A1 (en) | | Methods for detecting nucleic acid variants using capture probes |
| US20250101494A1 (en) | | Methods for analyzing cytosine methylation and hydroxymethylation |
| US20240093292A1 (en) | | Quality control method |
| WO2025029475A1 (en) | | Methods for enriching nucleotide variants by negative selection |
| WO2024229143A1 (en) | | Quality control method for enzymatic conversion procedures |
| EP4655416A1 (en) | | Method for methylation profiling of nucleic acids |
| WO2025137620A1 (en) | | Methods for high-quality, high-accuracy methylation sequencing |
| WO2025038399A1 (en) | | Methylated enrichment methods for single-molecule genetic and epigenetic sequencing |
| WO2025207925A1 (en) | | Methods for methylation enrichment using preferential adapter ligation |
| WO2025160433A1 (en) | | Methods for analyzing sequencing reads |
| WO2025090954A1 (en) | | Method for detecting nucleic acid variants |
| EP4659248A1 (en) | | Non-invasive monitoring of genomic alterations induced by gene-editing therapies |
| JP2025542260A (en) | | Methods involving methylation-preserving amplification with error correction |
| WO2025155895A1 (en) | | Method for profiling nucleic acid modification |
| WO2025235889A1 (en) | | Methods involving multiplexed pooled PCR |
| JP2025542261A (en) | | Integrated targeted and whole genome somatic and DNA methylation sequencing workflows |
| EP4638782A2 (en) | | Integrated targeted and whole genome somatic and DNA methylation sequencing workflows |
| WO2024229433A1 (en) | | Methods for analyzing DNA methylation |
| EP4594522A2 (en) | | Methods and compositions for quantifying immune cell DNA |
| WO2025250656A1 (en) | | Machine learning classification model for cancer detection |
| US20250084469A1 (en) | | Methods for analyzing nucleic acids using sequence read family size distribution |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24847216; Country of ref document: EP; Kind code of ref document: A1 |