
WO2025072326A1 - Methods, systems, and compositions for cfdna analysis - Google Patents


Info

Publication number
WO2025072326A1
Authority
WO
WIPO (PCT)
Prior art keywords
instances
nucleotides
nucleic acids
polymerase
dna
Prior art date
Legal status
Pending
Application number
PCT/US2024/048405
Other languages
French (fr)
Inventor
Swetha Devi VELIVELA
Jon S. ZAWISTOWSKI
Charles GAWAD
Jeff G. BLACKINTON
Emily WISEMAN
Tatiana V. MOROZOVA
Current Assignee
Bioskryb Genomics Inc
Original Assignee
Bioskryb Genomics Inc
Priority date
Filing date
Publication date
Application filed by Bioskryb Genomics Inc
Publication of WO2025072326A1
Legal status: Pending
Anticipated expiration

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions

Definitions

  • methods of amplification comprising: (a) ligating a plurality of sample nucleic acids to form concatemers; (b) contacting the concatemers with at least one amplification primer, at least one nucleic acid polymerase comprising 3'-5' exonuclease activity, and a mixture of nucleotides, wherein the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the at least one polymerase, wherein the at least one terminator nucleotide is an irreversible terminator which inhibits 3'-5' exonuclease activity of the polymerase, and is selected from nucleotides with modification to the alpha group, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2' fluoro nucleotides, 3' phosphorylated nucleotides, 2'-O-Methyl modified nucleotides,
  • the sample nucleic acids comprise cell-free DNA (cfDNA) or circulating tumor DNA (ctDNA). Further provided herein are methods wherein the sample nucleic acids are no more than 300 bases in length. Further provided herein are methods wherein the sample nucleic acids are no more than 200 bases in length. Further provided herein are methods wherein the sample nucleic acids comprise cfDNA. Further provided herein are methods wherein the sample nucleic acids comprise ctDNA. Further provided herein are methods wherein the sample nucleic acids are obtained from an Further provided herein are methods wherein sample. Further provided herein are methods wherein the sample nucleic acids are obtained from spent culture media.
  • cfDNA cell-free DNA
  • ctDNA circulating tumor DNA
  • sample nucleic acids are obtained from spent culture media of an embryo.
  • ligating comprises phosphorylation of sample nucleic acid ends followed by contact with a ligase.
  • the ligase comprises T4 ligase.
  • nucleotides with modification to the alpha group are alpha-thio dideoxynucleotides.
  • the terminator nucleotide comprises modifications of the R group of the 3' carbon of the deoxyribose.
  • the at least one terminator nucleotide is selected from the group consisting of dideoxynucleotides, inverted dideoxynucleotides, 3' biotinylated nucleotides, 3' amino nucleotides, 3'-phosphorylated nucleotides, 3'-O-methyl nucleotides, 3' carbon spacer nucleotides including 3' C3 spacer nucleotides, 3' C18 nucleotides, 3' Hexanediol spacer nucleotides, acyclonucleotides, and combinations thereof.
  • the at least one polymerase comprises a B-type polymerase.
  • the at least one polymerase comprises Phi29 polymerase. Further provided herein are methods further comprising polishing an end of a nucleic acid from the plurality of sample nucleic acids prior to (a). Further provided herein are methods wherein polishing comprises using an end-repair and A-tailing (ERAT) cocktail. Further provided herein are methods wherein the ERAT cocktail comprises one or more polymerases, a kinase, or both. Further provided herein are methods wherein the ERAT cocktail comprises one or more of T4 DNA polymerase, T4 polynucleotide kinase, and Taq-B polymerase.
  • amplification comprising: (a) obtaining a sample comprising nucleic acids from spent culture media (SCM); (b) ligating the nucleic acids to form concatemers; (c) contacting the concatemers with at least one amplification primer, at least one nucleic acid polymerase comprising 3'-5' exonuclease activity, and a mixture of nucleotides, wherein the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the at least one polymerase, wherein the at least one terminator nucleotide is an irreversible terminator which inhibits 3'-5' exonuclease activity of the polymerase, and is selected from nucleotides with modification to the alpha group, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2' fluoro nucleotides, 3' phosphorylated nucleot
  • SCM is obtained by culturing one or more embryonic cells for about 5 to 7 days. Further provided herein are methods further comprising sequencing a cDNA library comprising polynucleotides amplified from mRNA transcripts in the SCM. Further provided herein are methods wherein sequencing the cDNA library comprises reverse transcription. Further provided herein are methods wherein the mRNA transcripts are amplified via template-switching reverse transcription. Further provided herein are methods wherein (b) comprises mixing an end-repair and A-tailing (ERAT) cocktail and a ligation buffer.
  • ERAT end-repair and A-tailing
  • (b) comprises polishing one or more ends of the nucleic acids using an end-repair and A-tailing (ERAT) cocktail and exposing the nucleic acids to a ligation buffer.
  • ERAT cocktail comprises one or more polymerases, a kinase, or both.
  • the ERAT cocktail comprises one or more of T4 DNA polymerase, T4 polynucleotide kinase, and Taq-B polymerase.
  • (b) comprises generating circular molecules using hairpin adapters.
  • the hairpin adapter further comprises a linker.
  • FIGS. 1A-1B DNA yields after ResolveDNA (Bioskryb Genomics) PTA (primary template-mediated amplification) of short cfDNA size proxy fragments (100 bp-1000 bp). There is no detectable amplification of fragments below 200-300 bp and very poor amplification of fragments below 500 bp.
  • FIG. 1B Agilent High Sensitivity D5000 ScreenTape analysis of individual fragments before and after ResolveDNA amplification.
  • FIG. 3. DNA yields after ResolveDNA amplification of short cfDNA size proxy fragments (100 bp-1000 bp) before and after phosphorylation by PNK followed by ligation with T4 DNA ligase. There is no detectable amplification of small-size fragments before ligation. After ligation of small-size fragments, DNA yields after amplification were restored to the expected amounts for the ResolveDNA kit.
  • FIG. 4 ResolveDNA-amplified DNA fragment size before and after phosphorylation by PNK followed by ligation with T4 DNA ligase, when run on an Agilent High Sensitivity D1000 ScreenTape. The size of the amplified DNA after ligation is as expected for the ResolveDNA kit.
  • FIGS. 5A-5B Representative IGV snapshots for cfOnco std (from Horizon): coverage at the BRCA2 (FIG. 5A) and MSH3 (FIG. 5B) genes on chr13 and chr5, respectively, upon lowpass sequencing. The ligated sample covered the genes at some of the regions reported by Horizon, whereas the non-ligated sample did not. In FIG. 5B, the ligated sample covered a region of the MSH3 gene (GRCh38 chr5:80873118-80873118) where a GCA to ACA variation (A1045T) occurred. The unligated sample did not show any coverage in this region. This variant has been reported in Horizon exome data for this sample (120x coverage using the Agilent SureSelect Human All Exon V6 kit and Illumina sequencing data).
  • FIGS. 6A-6B IGV snapshots of ResolveDNA coverage at FLT3-130bp (FIG. 6A) and FLT3-207bp (FIG. 6B) synthetic fragments: Lowpass sequencing data of the ligated sample (bottom) showed more uniform coverage of the targeted regions of FLT3-130 and FLT3-207 relative to the unligated sample (top).
  • FIGS. 7A-7B IGV views of ResolveDNA coverage of a sample containing a 1:1 mixture of FLT3-130bp and FLT3-207bp synthetic fragments: Lowpass sequencing data showed that the ligated sample (bottom panels) of both FLT3-130 (FIG. 7A) and FLT3-207 (FIG. 7B) harbored more uniform coverage at both regions relative to the unligated control (top panels).
  • FIGS. 8A-8B IGV snapshots of representative single nucleotide variants (SNV): FIG. 8A: Both unligated and ligated samples, when deep sequenced, showed the A1045T variation in MSH3; this is the only variation detected in the unligated sample.
  • FIG. 8B IGV view showing detection of dbSNP ID rs2453056 in NOTCH2 in the ligated but not in the unligated sample. Eighty-four variants reported by Horizon exome sequencing (150X) for cfDNA onco were detected only in the ligated sample.
  • SNV single nucleotide variants
  • FIGS. 9A-9D Agilent High Sensitivity D5000 ScreenTape analysis of ligation of a 130 bp synthetic fragment with and without phosphorylation by PNK. Prior phosphorylation by PNK resulted in better ligation by T4 DNA ligase.
  • FIG. 9A depicts an electrophoresis gel of ladder (lane 1), ligated (lane 2), +PNK + ligated (lane 3), and -PNK + ligated (lane 4).
  • FIG. 9B depicts a Bioanalyzer trace of the FLT3-130bp fragment.
  • FIG. 9C depicts a Bioanalyzer trace of the FLT3-130bp fragment treated with PNK and ligated with T4 DNA ligase.
  • FIG. 9D depicts a Bioanalyzer trace of the FLT3-130bp fragment treated directly with T4 DNA ligase.
  • FIG. 10 Cell line proxy system of embryo spent culture media (SCM) ascertainment by PTA, according to some embodiments.
  • the workflow using embryo SCM includes, in some embodiments, retrieval of 5-10 µL of the SCM media, harvested post-culture (5-7 days), ligation, PTA/ResolveOME, followed by PGT-A.
  • the workflow using SCM from a cell line proxy system includes, in some embodiments, 10 mL of SCM, centrifuged and extracellular DNA purified by 1.2X SPRI, which is followed by ligation, PTA/ResolveOME, and copy number analysis.
  • the cell line proxy system can comprise, in some embodiments, MOLM-13 AML cells/NA12878 B lymphoblasts.
  • FIG. 11 Circularization strategy of short DNA fragments to serve as template for PTA, according to some embodiments.
  • the strategy can include using hairpin adapters to generate boundaries for the circular form and including a T-overhang adapter to facilitate efficient concatenation of A-tailed SCM small DNA fragments internal to the hairpin adapters.
  • FIG. 12. SCM ligation workflow options upstream of PTA in ResolveOME multiomic chemistry, according to some embodiments.
  • the SCM ligation modes can include linear concatenation (ligase only) and circularization (hairpin adapter (+ linker) and ligase).
  • the SCM input ligation of the workflow can include ERAT, followed by ligation as two steps, or ERAT + ligation in one step.
  • the ResolveOME section of the workflow can comprise reverse transcription, lysis, PTA, separation of DNA and RNA, followed by PreAmp.
  • FIG. 13 Effect of ligation on PTA yield and on the segment MAD score (an indication of CNV signal-to-noise), according to some embodiments.
  • the left plot illustrates PTA yield (ng) for NA12878 (left) and Molm (right), with higher yields with ligase (“+ ligase”) than without (“- ligase”).
  • the right plot illustrates the segment MAD score for NA12878 (left) and Molm (right), with lower scores with ligase (“+ ligase”) than without ligase (“- ligase”).
  • FIG. 14 CNV profiles in NA12878 and MOLM-13 cells with alternative linear concatenation ligase workflows prior to PTA, according to some embodiments.
  • the y-axis of the plots shows copy number from 0 to 8 in increments of 2, while the x-axis shows chromosomes 1 through 22 followed by X and Y.
  • the left plots show results for NA12878 (with MAD scores of 0.28, 0.29, and 0.39 from top to bottom, respectively), while the right plots show results for MOLM-13 (with MAD scores of 0.56, 0.55, and 0.62 from top to bottom, respectively).
  • the top plots show results from a 2 step 30 minute ligation
  • the middle plots show results from a 2 step 15 minute ligation
  • the bottom plots show results from a 1 step ligation.
  • Results are shown for NA12878; the y-axis of the plots shows copy number from 0 to 8 in increments of 2, while the x-axis shows chromosomes 1 through 22 followed by X and Y.
  • the top plot illustrates results when the concentration of the hairpin was 0 nM and linker was 0 nM, which had an MAD score of 0.55.
  • the middle and bottom plots illustrate results when the concentration of the hairpin was 400 nM and linker was 400 nM, which had an MAD score of 0.37 and 0.39, respectively.
  • PTA primary template-directed amplification
  • the methods provided herein may generally comprise those that make PTA amenable to short DNA fragments as template.
  • amplification of short templates can be inherently challenging because the value of PTA can stem from the low propensity of short amplicon re-copying and a concomitant focusing of amplification machinery to the primary template.
  • one or more beneficial attributes of PTA for the elucidation of genomic variants can be in principle extended to cell-free (cf) DNA, circulating tumor DNA (ctDNA), or potentially to short DNA fragments damaged by the FFPE process.
  • cf cell-free
  • ligation strategies to linearly concatenate short DNA fragments (e.g., <300 bp), resulting in PTA template amenability.
  • the systems and methods provided herein demonstrate enhanced sensitivity of single nucleotide variant (SNV) calling upon concatenation.
  • SNV single nucleotide variant
  • SCM spent culture media
  • Free nucleic acid found in cell culture media may often be degraded to the size of a single nucleosome footprint (e.g., ~150 bp), while DNA species in human embryo SCM samples can, in some instances, approach 400 bp, likely representing multi-nucleosome stretches.
  • PTA amenability e.g., like for ctDNA
  • PGT preimplantation genetic testing
  • techniques that are non-invasive to the embryo e.g., a spent culture media aliquot
  • enabling PTA of SCM DNA via ligation in the context of ResolveDNA genomic amplification, or in the case of the genomic amplification arm in DNA and RNA multiomic chemistry (referred to as “ResolveOME”), provides the opportunity to enhance current PGT-A applications.
  • enabling PTA in these contexts can enhance current PGT-A applications by overcoming the larger input requirements of library preparations lacking upstream genomic amplification and/or by accommodating inputs on the order of tens of picograms. Additionally, utilizing PTA for SCM PGT-A can provide the opportunity to improve copy number profiling quality over existing methodologies due to the inherent beneficial attributes of PTA.
  • nucleic acid encompasses multi-stranded, as well as single-stranded molecules.
  • nucleic acids are short nucleic acids.
  • short nucleic acids are 10-200, 20-200, 50-200, 75-200, 100-200, 125-200, 150-200, 100-300, 150-300, or 200-300 nucleotides in length.
  • the nucleic acid strands need not be coextensive (i.e., a double-stranded nucleic acid need not be double-stranded along the entire length of both strands).
  • Nucleic acid templates described herein may be any size depending on the sample (from small cell-free DNA fragments to entire genomes), including but not limited to 50-300 bases, 100-2000 bases, 100-750 bases, 170-500 bases, 100-5000 bases, 50-10,000 bases, or 50-2000 bases in length.
  • templates are at least 50, 100, 200, 300, 400, 500, 1000, 2000, 5000, 10,000, 20,000, 50,000, 100,000, 200,000, 500,000, 1,000,000 or more than 1,000,000 bases in length.
  • circular templates are 50-300 bases, 100-2000 bases, 100-750 bases, 170-500 bases, 100-5000 bases, 50-10,000 bases, or 50-2000 bases in length.
  • Circular nucleic acids in some instances comprise at least 1, 2, 3, 4, 5, 7, 10, 12, 15, 20 or more linear nucleic acid fragments which have been joined together.
  • Methods described herein provide for the amplification of nucleic acids, such as nucleic acid templates. Methods described herein additionally provide for the generation of isolated and at least partially purified nucleic acids and libraries of nucleic acids.
  • methods described herein provide for extracted nucleic acids (e.g., extracted from tissues, cells, or media). In some embodiments, the nucleic acids are extracted from spent culture media (SCM).
  • Nucleic acids include but are not limited to those comprising DNA, RNA, circular RNA, mtDNA (mitochondrial DNA), cfDNA (cell free DNA), cfRNA (cell free RNA), ctDNA (circulating tumor DNA), siRNA (small interfering RNA), cffDNA (cell free fetal DNA), mRNA, tRNA, rRNA, miRNA (microRNA), extrachromosomal DNAs (ecDNAs), synthetic polynucleotides, polynucleotide analogues, any other nucleic acid consistent with the specification, or any combinations thereof.
  • polynucleotides, when provided, are described by the number of bases and abbreviated, such as nt (nucleotides), bp (base pairs), kb (kilobases), or Gb (gigabases).
  • PTA Primary Template-Directed Amplification
  • amplicons are preferentially generated from the primary template (“direct copies”) using a polymerase (e.g., a strand displacing polymerase). Consequently, errors are propagated at a lower rate from daughter amplicons during subsequent amplifications compared to multiple displacement amplification (MDA).
  • this method can amplify low DNA input including the genomes of single cells with high coverage breadth and uniformity in an accurate and reproducible manner.
  • PTA enables kinetic control of an amplification reaction.
  • PTA results in a pseudo-linear amplification reaction (rather than exponential amplification).
  • the terminated amplification products can undergo direct ligation after removal of the terminators, allowing for the attachment of a cell barcode to the amplification primers so that products from all cells can be pooled after undergoing parallel amplification reactions.
  • nucleic acid polymerases with strand displacement activity for amplification.
  • such polymerases comprise strand displacement activity and low error rate.
  • such polymerases comprise strand displacement activity and proofreading exonuclease activity, such as 3'→5' proofreading activity.
  • nucleic acid polymerases are used in conjunction with other components such as reversible or irreversible terminators, or additional strand displacement factors.
  • the polymerase has strand displacement activity, but does not have exonuclease proofreading activity.
  • such polymerases include bacteriophage phi29 (φ29) polymerase, which also has a very low error rate that is the result of its 3'→5' proofreading exonuclease activity (see, e.g., U.S. Pat. Nos. 5,198,543 and 5,001,050).
  • examples of strand displacing nucleic acid polymerases include, e.g., genetically modified phi29 (φ29) DNA polymerase, Klenow Fragment of DNA polymerase I (Jacobsen et al., Eur. J. Biochem.
  • phage M2 DNA polymerase (Matsumoto et al., Gene 84:247 (1989)), phage phiPRD1 DNA polymerase (Jung et al., Proc. Natl. Acad. Sci. USA 84:8287 (1987); Zhu and Ito, Biochim. Biophys. Acta. 1219:267-276 (1994)), Bst DNA polymerase (e.g., Bst large fragment DNA polymerase (Exo(-) Bst; Aliotta et al., Genet. Anal.
  • T7 DNA polymerase T7-Sequenase
  • T7 gp5 DNA polymerase PRD1 DNA polymerase
  • T4 DNA polymerase Kaboord and Benkovic, Curr. Biol. 5: 149-157 (1995)
  • Additional strand displacing nucleic acid polymerases are also compatible with the methods described herein.
  • the ability of a given polymerase to carry out strand displacement replication can be determined, for example, by using the polymerase in a strand displacement replication assay (e.g., as disclosed in U.S. Pat. No. 6,977,148).
  • Such assays in some instances are performed at a temperature suitable for optimal activity for the enzyme being used, for example, 32°C for phi29 DNA polymerase, from 46°C to 64°C for exo(-) Bst DNA polymerase, or from about 60°C to 70°C for an enzyme from a hyperthermophilic organism.
  • Another useful assay for selecting a polymerase is the primer-block assay described in Kong et al., J. Biol. Chem. 268:1965-1975 (1993).
  • the assay consists of a primer extension assay using an M13 ssDNA template in the presence or absence of an oligonucleotide that is hybridized upstream of the extending primer to block its progress.
  • polymerases incorporate dNTPs and terminators at approximately equal rates.
  • the ratio of rates of incorporation for dNTPs and terminators for a polymerase described herein is about 1:1, about 1.5:1, about 2:1, about 3:1, about 4:1, about 5:1, about 10:1, about 20:1, about 50:1, about 100:1, about 200:1, about 500:1, or about 1000:1.
  • the ratio of rates of incorporation for dNTPs and terminators for a polymerase described herein is 1:1 to 1000:1, 2:1 to 500:1, 5:1 to 100:1, 10:1 to 1000:1, 100:1 to 1000:1, 500:1 to 2000:1, 50:1 to 1500:1, or 25:1 to 1000:1.
  • a polynucleotide mixture used herein for PTA may comprise terminators.
  • terminators comprise ddNTPs.
  • terminators comprise irreversible terminators.
  • irreversible terminators comprise alpha-thio dideoxynucleotides.
  • the concentration of terminators is no more than 1, 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, or no more than 0.001 mM.
  • the concentration of dNTPs is 0.05-1, 0.05-0.5, 0.05-0.3, 0.05-0.25, 0.05-0.2, 0.05-0.15, 0.05-0.1, 0.01-0.5, 0.01-0.3, 0.01-0.3, 0.1-0.3, 0.05-0.25, or 0.1-0.2 mM.
  • strand displacement factors such as, e.g., helicase.
  • additional amplification components such as polymerases, terminators, or other component.
  • a strand displacement factor is used with a polymerase that does not have strand displacement activity.
  • a strand displacement factor is used with a polymerase having strand displacement activity.
  • strand displacement factors may increase the rate that smaller, double stranded amplicons are reprimed.
  • bacterial SSB e.g., E. coli SSB
  • RPA Replication Protein A
  • mtSSB human mitochondrial SSB
  • Recombinases e.g., Recombinase A (RecA) family proteins, T4 UvsX, T4 UvsY, Sak4 of Phage HK620, Rad51, Dmc1, or Radb.
  • RecA Recombinase A family proteins
  • the PTA method comprises use of a single-strand DNA binding protein (SSB, T4 gp32, or other single-stranded DNA binding protein), a helicase, and a polymerase (e.g., SauDNA polymerase, Bsu polymerase, Bst2.0, GspM, GspM2.0, GspSSD, or other suitable polymerase).
  • reverse transcriptases are used in conjunction with the strand displacement factors described herein.
  • amplification is conducted using a polymerase and a nicking enzyme (e.g., “NEAR”), such as those described in US 9,617,586.
  • the nicking enzyme is Nt.BspQI, Nb.BbvCI, Nb.BsmI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nt.BbvCI, Nt.BstNBI, Nt.CviPII, Nb.Bpu10I, or Nt.Bpu10I.
  • amplification methods comprising use of terminator nucleotides, polymerases, and additional factors or conditions.
  • factors are used in some instances to fragment the nucleic acid template(s) or amplicons during amplification.
  • factors comprise endonucleases.
  • factors comprise transposases.
  • mechanical shearing is used to fragment nucleic acids during amplification.
  • nucleotides are added during amplification that may be fragmented through the addition of additional proteins or conditions. For example, uracil is incorporated into amplicons; treatment with uracil-DNA glycosylase fragments nucleic acids at uracil-containing positions.
  • amplification methods comprising use of terminator nucleotides, which terminate nucleic acid replication thus decreasing the size of the amplification products.
  • terminator nucleotides are in some instances used in conjunction with polymerases, strand displacement factors, or other amplification components described herein.
  • terminator nucleotides reduce or lower the efficiency of nucleic acid replication.
  • Such terminators in some instances reduce extension rates by at least 99.9%, 99%, 98%, 95%, 90%, 85%, 80%, 75%, 70%, or at least 65%.
  • Such terminators reduce extension rates by 50%-90%, 60%-80%, 65%-90%, 70%-85%, 60%-90%, 70%-99%, 80%-99%, or 50%- 80%.
  • terminators reduce the average amplicon product length by at least 99.9%, 99%, 98%, 95%, 90%, 85%, 80%, 75%, 70%, or at least 65%. Terminators in some instances reduce the average amplicon length by 50%-90%, 60%-80%, 65%-90%, 70%-85%, 60%-90%, 70%-99%, 80%-99%, or 50%-80%. In some instances, amplicons comprising terminator nucleotides form loops or hairpins which reduce a polymerase’s ability to use such amplicons as templates.
  • terminators slow the rate of amplification at initial amplification sites through the incorporation of terminator nucleotides (e.g., dideoxynucleotides that have been modified to make them exonuclease-resistant to terminate DNA extension), resulting in smaller amplification products.
  • PTA amplification products undergo direct ligation of adapters without the need for fragmentation, allowing for efficient incorporation of cell barcodes and unique molecular identifiers (UMI).
  • UMI unique molecular identifiers
  • the ratio of non-terminator to terminator nucleotides is about 2:1, 5:1, 7:1, 10:1, 20:1, 50:1, 100:1, 200:1, 500:1, 1000:1, 2000:1, or 5000:1. In some instances the ratio of non-terminator to terminator nucleotides is 2:1-10:1, 5:1-20:1, 10:1-100:1, 20:1-200:1, 50:1-1000:1, 50:1-500:1, 75:1-150:1, or 100:1-500:1. In some instances, at least one of the nucleotides present during amplification using a method described herein is a terminator nucleotide.
  • a terminator pairing with guanine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances a terminator pairing with adenine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances a terminator pairing with uracil is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. Any nucleotide capable of terminating nucleic acid extension by a nucleic acid polymerase in some instances is used as a terminator nucleotide in the methods described herein.
  • a reversible terminator is used to terminate nucleic acid replication.
  • a non-reversible terminator is used to terminate nucleic acid replication.
  • non-limiting examples of terminators include reversible and non-reversible nucleic acids and nucleic acid analogs, such as, e.g., 3’ blocked reversible terminator comprising nucleotides, 3’ unblocked reversible terminator comprising nucleotides, terminators comprising 2’ modifications of deoxynucleotides, terminators comprising modifications to the nitrogenous base of deoxynucleotides, or any combination thereof.
  • terminator nucleotides are dideoxynucleotides.
  • terminators are polynucleotides comprising 1, 2, 3, 4, or more bases in length.
  • terminators do not comprise a detectable moiety or tag (e.g., mass tag, fluorescent tag, dye, radioactive atom, or other detectable moiety).
  • terminators do not comprise a chemical moiety allowing for attachment of a detectable moiety or tag (e.g., “click” azide/alkyne, conjugate addition partner, or other chemical handle for attachment of a tag).
  • all terminator nucleotides comprise the same modification that reduces amplification at a region (e.g., the sugar moiety, base moiety, or phosphate moiety) of the nucleotide.
  • At least one terminator has a different modification that reduces amplification.
  • all terminators have substantially similar fluorescence excitation or emission wavelengths.
  • terminators without modification to the phosphate group are used with polymerases that do not have exonuclease proofreading activity. Terminators, when used with polymerases which have 3'→5' proofreading exonuclease activity (such as, e.g., phi29) that can remove the terminator nucleotide, are in some instances further modified to make them exonuclease-resistant.
  • dideoxynucleotides are modified with an alpha-thio group that creates a phosphorothioate linkage which makes these nucleotides resistant to the 3'→5' proofreading exonuclease activity of nucleic acid polymerases.
  • Such modifications in some instances reduce the exonuclease proofreading activity of polymerases by at least 99.5%, 99%, 98%, 95%, 90%, or at least 85%.
  • nucleotides with modification include base-modified nucleic acids comprising free 3’ OH groups (e.g., 2-nitrobenzyl alkylated HOMedU triphosphates, bases comprising modification with large chemical groups, such as beads or other large moiety).
  • a polymerase with strand displacement activity but without 3'→5' exonuclease proofreading activity is used with terminator nucleotides with or without modifications to make them exonuclease-resistant.
  • nucleic acid polymerases include, without limitation, Bst DNA polymerase, Bsu DNA polymerase, Deep Vent (exo-) DNA polymerase, Klenow Fragment (exo-) DNA polymerase, Therminator DNA polymerase, and VentR(exo-).
  • amplicon libraries resulting from amplification of at least one target nucleic acid molecule are in some instances generated using the methods described herein, such as those using terminators. Such methods comprise use of strand displacement polymerases or factors, terminator nucleotides (reversible or irreversible), or other features and embodiments described herein.
  • reversible terminators are capable of removal by an exonuclease (e.g., a polymerase having exonuclease activity).
  • irreversible terminators are not capable of substantial removal by an exonuclease (e.g., a polymerase having exonuclease activity).
  • amplicon libraries generated by use of terminators described herein are further amplified in a subsequent amplification reaction (e.g., PCR). In some instances, subsequent amplification reactions do not comprise terminators. In some instances, amplicon libraries comprise polynucleotides, wherein at least 50%, 60%, 70%, 80%, 90%, 95%, or at least 98% of the polynucleotides comprise at least one terminator nucleotide. In some instances, the amplicon library comprises the target nucleic acid molecule from which the amplicon library was derived.
  • the amplicon library comprises a plurality of polynucleotides, wherein at least some of the polynucleotides are direct copies (e.g., replicated directly from a target nucleic acid molecule, such as genomic DNA, RNA, or other target nucleic acid). For example, at least 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or more than 95% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 5% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule.
  • At least 10% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 15% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 20% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 50% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule.
  • 3%-5%, 3%-10%, 5%-10%, 10%-20%, 20%-30%, 30%-40%, 5%-30%, 10%-50%, or 15%-75% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule.
  • at least some of the polynucleotides are direct copies of the target nucleic acid molecule, or daughter progeny (copies generated from a first copy of the target nucleic acid).
  • at least 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or more than 95% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny.
  • At least 5% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 10% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 20% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 30% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny.
  • 3%-5%, 3%-10%, 5%-10%, 10%-20%, 20%-30%, 30%-40%, 5%-30%, 10%-50%, or 15%-75% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny.
  • direct copies of the target nucleic acid are 50-2500, 75-2000, 50-2000, 25-1000, 50-1000, 500-2000, or 50-2000 bases in length.
  • daughter progeny are 1000-5000, 2000-5000, 1000-10,000, 2000-5000, 1500-5000, 3000-7000, or 2000-7000 bases in length.
  • the average length of PTA amplification products is 25-3000, 50-2500, 75-2000, 50-2000, 25-1000, 50-1000, 500-2000, or 50-2000 bases.
  • amplicons generated from PTA are no more than 5000, 4000, 3000, 2000, 1700, 1500, 1200, 1000, 700, 500, or no more than 300 bases in length.
  • amplicons generated from PTA are 1000-5000, 1000-3000, 200-2000, 200-4000, 500-2000, 750-2500, or 1000-2000 bases in length.
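One way to see how terminator incorporation caps amplicon length is a deliberately simplified model. This model is an illustrative assumption and is not stated in this document: if each nucleotide added by the polymerase is independently a terminator with probability p, amplicon length follows a geometric distribution with mean 1/p. The sketch below samples that distribution. The value of p and the function names are hypothetical. Real PTA length distributions also depend on polymerase processivity, strand displacement, and template length.

```python
import math
import random


def sample_amplicon_length(p_term: float, rng: random.Random) -> int:
    """Draw one length (in bases) from a geometric distribution on {1, 2, ...}.

    Hypothetical model: each incorporated nucleotide is independently a
    terminator with probability p_term (0 < p_term < 1).
    """
    u = 1.0 - rng.random()  # uniform on (0, 1]; avoids log(0)
    # Inverse-transform sampling: P(length > k) = (1 - p_term) ** k
    return math.floor(math.log(u) / math.log(1.0 - p_term)) + 1


def fraction_at_most(p_term: float, max_len: int) -> float:
    """Probability that an amplicon is no longer than max_len under the model."""
    return 1.0 - (1.0 - p_term) ** max_len


rng = random.Random(42)
p_term = 1e-3  # assumed: about one terminator per 1000 incorporated bases
lengths = [sample_amplicon_length(p_term, rng) for _ in range(20_000)]
mean_len = sum(lengths) / len(lengths)
print(f"simulated mean length: {mean_len:.0f} bases (model mean 1/p = {1 / p_term:.0f})")
print(f"fraction of amplicons <= 2000 bases: {fraction_at_most(p_term, 2000):.3f}")
```

Raising p shortens the expected amplicon in proportion (mean 1/p). That is the qualitative lever terminator concentration provides under this toy model.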
  • Amplicon libraries generated using the methods described herein comprise at least 1000, 2000, 5000, 10,000, 100,000, 200,000, 500,000 or more than 500,000 amplicons comprising unique sequences.
  • the library comprises at least 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 2000, 2500, 3000, or at least 3500 amplicons.
  • at least 5%, 10%, 15%, 20%, 25%, 30% or more than 30% of amplicon polynucleotides having a length of less than 1000 bases are direct copies of the at least one target nucleic acid molecule.
  • At least 5%, 10%, 15%, 20%, 25%, 30% or more than 30% of amplicon polynucleotides having a length of no more than 2000 bases are direct copies of the at least one target nucleic acid molecule. In some instances, at least 5%, 10%, 15%, 20%, 25%, 30% or more than 30% of amplicon polynucleotides having a length of 3000-5000 bases are direct copies of the at least one target nucleic acid molecule. In some instances, the ratio of direct copy amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1.
  • the ratio of direct copy amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1, wherein the direct copy amplicons are no more than 700-1200 bases in length. In some instances, the ratio of direct copy amplicons and daughter amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1.
  • the ratio of direct copy amplicons and daughter amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1, wherein the direct copy amplicons are 700-1200 bases in length, and the daughter amplicons are 2500-6000 bases in length.
  • the library comprises about 50-10,000, about 50-5,000, about 50-2500, about 50-1000, about 150-2000, about 250-3000, about 50-2000, about 500-2000, or about 500-1500 amplicons which are direct copies of the target nucleic acid molecule.
  • the library comprises about 50-10,000, about 50-5,000, about 50-2500, about 50-1000, about 150-2000, about 250-3000, about 50-2000, about 500-2000, or about 500-1500 amplicons which are direct copies of the target nucleic acid molecule or daughter amplicons.
  • the number of direct copies may be controlled in some instances by the number of amplification cycles. In some instances, no more than 30, 25, 20, 15, 13, 11, 10, 9, 8, 7, 6, 5, 4, or 3 cycles are used to generate copies of the target nucleic acid molecule. In some instances, about 30, 25, 20, 15, 13, 11, 10, 9, 8, 7, 6, 5, 4, or about 3 cycles are used to generate copies of the target nucleic acid molecule.
  • 2-4, 2-5, 2-7, 2-8, 2-10, 2-15, 3-5, 3-10, 3-15, 4-10, 4-15, 5-10 or 5-15 cycles are used to generate copies of the target nucleic acid molecule.
  • Amplicon libraries generated using the methods described herein are in some instances subjected to additional steps, such as adapter ligation and further amplification. In some instances, such additional steps precede a sequencing step.
  • the cycles are PCR cycles.
  • the cycles represent annealing, extension, and denaturation.
  • the cycles represent annealing, extension, and denaturation which occur under isothermal or essentially isothermal conditions.
  • Methods described herein may additionally comprise one or more enrichment or purification steps.
  • one or more polynucleotides (such as cDNA, PTA amplicons, or other polynucleotide) are enriched during a method described herein.
  • polynucleotide probes are used to capture one or more polynucleotides.
  • probes are configured to capture one or more genomic exons.
  • a library of probes comprises at least 1000, 2000, 5000, 10,000, 50,000, 100,000, 200,000, 500,000, or more than 1 million different sequences.
  • a library of probes comprises sequences capable of binding to at least 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10,000 or more than 10,000 genes.
  • probes comprise a moiety for capture by a bead, such as biotin.
  • an enrichment step occurs after a PTA step.
  • an enrichment step occurs before a PTA step.
  • probes are configured to bind genomic DNA libraries.
  • probes are configured to bind cDNA libraries.
  • Amplicon libraries of polynucleotides generated from the PTA methods and compositions (terminators, polymerases, etc.) described herein in some instances have increased uniformity. Uniformity, in some instances, is described using a Lorenz curve or other such method. Such increases in some instances reduce the number of sequencing reads needed for the desired coverage of a target nucleic acid molecule (e.g., genomic DNA, RNA, or other target nucleic acid molecule). For example, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 80% of a cumulative fraction of sequences of the target nucleic acid molecule.
  • no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 60% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 70% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 90% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, uniformity is described using a Gini index (wherein an index of 0 represents perfect equality of the library and an index of 1 represents perfect inequality).
  • amplicon libraries described herein have a Gini index of no more than 0.55, 0.50, 0.45, 0.40, or 0.30. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50. In some instances, amplicon libraries described herein have a Gini index of no more than 0.40.
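The Lorenz curve and Gini index described above can be computed from per-bin read counts. The following is a minimal Python sketch. It assumes the library has been aligned and reads have been counted in fixed-size genomic bins. The binning, function names, and example counts are hypothetical and are not taken from this document.

```python
from typing import List, Sequence, Tuple


def lorenz_curve(bin_counts: Sequence[int]) -> List[Tuple[float, float]]:
    """Cumulative (fraction of bins, fraction of reads), bins sorted from least
    to most covered. A perfectly uniform library lies on the diagonal."""
    counts = sorted(bin_counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads in any bin")
    n = len(counts)
    points = [(0.0, 0.0)]
    running = 0
    for i, c in enumerate(counts, start=1):
        running += c
        points.append((i / n, running / total))
    return points


def gini_index(bin_counts: Sequence[int]) -> float:
    """Gini index of per-bin read counts: 0 for perfect equality, approaching 1
    as reads concentrate in few bins (maximum (n - 1) / n for n bins).

    Uses the sorted-values formula G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    which equals 1 - 2 * (trapezoidal area under the Lorenz curve).
    """
    counts = sorted(bin_counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads in any bin")
    n = len(counts)
    weighted = sum(i * c for i, c in enumerate(counts, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


counts = [120, 95, 110, 30, 0, 140, 105, 100]  # hypothetical reads per genomic bin
print(f"Gini index: {gini_index(counts):.3f}")
print(lorenz_curve(counts)[:3])
```

Sorting bins from least to most covered makes the Lorenz curve bow below the diagonal in proportion to unevenness. The Gini index summarizes that gap as a single number.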
  • Such uniformity metrics in some instances are dependent on the number of reads obtained. In some instances, no more than 100 million, 200 million, 300 million, 400 million, or no more than 500 million reads are obtained. In some instances, the read length is about 50, 75, 100, 125, 150, 175, 200, 225, or about 250 bases. In some instances, uniformity metrics are dependent on the depth of coverage of a target nucleic acid.
  • the average depth of coverage is about 10X, 15X, 20X, 25X, or about 30X. In some instances, the average depth of coverage is 10-30X, 20-50X, 5-40X, 20-60X, 5-20X, or 10-20X.
  • amplicon libraries described herein have a Gini index of no more than 0.55, wherein about 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein about 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein about 300 million reads were obtained.
  • amplicon libraries described herein have a Gini index of no more than 0.55, wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is about 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is about 15X.
  • amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is about 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is at least 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is at least 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is at least 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is no more than 15X.
  • amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is no more than 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is no more than 15X. Uniform amplicon libraries generated using the methods described herein are in some instances subjected to additional steps, such as adapter ligation and further PCR amplification. In some instances, such additional steps precede a sequencing step.
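The read counts, read lengths, and depths above are linked by simple arithmetic: mean depth equals total sequenced bases divided by genome size. The sketch below makes the following assumptions:
- The haploid human genome is about 3.1 Gb (a hypothetical constant).
- Every read, including each mate of a pair, is counted at its full length.
- Duplicates, unmapped reads, and trimming are ignored, so the depth achieved in practice is lower.

```python
import math


def mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage = total sequenced bases / genome size.

    Simplification: every read is counted at full length; duplicates,
    unmapped reads and adapter trimming are ignored.
    """
    return n_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Number of reads needed to reach target_depth under the same simplification."""
    return math.ceil(target_depth * genome_size / read_length)


GENOME_SIZE = 3_100_000_000  # assumption: approximate haploid human genome, in bases

print(f"{mean_depth(300_000_000, 150, GENOME_SIZE):.1f}X")  # prints 14.5X
print(reads_needed(15, 150, GENOME_SIZE))  # prints 310000000
```

Under these assumptions, 300 million 150-base reads give roughly 14.5X. This is why read count and average depth can serve as interchangeable conditions on a uniformity threshold.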
  • Primers comprise nucleic acids used for priming the amplification reactions described herein.
  • Such primers in some instances include, without limitation, random deoxynucleotides of any length with or without modifications to make them exonuclease resistant, random ribonucleotides of any length with or without modifications to make them exonuclease resistant, modified nucleic acids such as locked nucleic acids, DNA or RNA primers that are targeted to a specific genomic region, and reactions that are primed with enzymes such as primase.
  • a set of primers having random or partially random nucleotide sequences is used.
  • for a nucleic acid sample of significant complexity, the specific nucleic acid sequences present in the sample need not be known, and the primers need not be designed to be complementary to any particular sequence. Rather, the complexity of the nucleic acid sample results in a large number of different hybridization target sequences in the sample, which will be complementary to various primers of random or partially random sequence.
  • the complementary portion of primers for use in PTA is in some instances fully randomized, comprises only a portion that is randomized, or is otherwise selectively randomized.
  • the number of random base positions in the complementary portion of primers in some instances, for example, is from 20% to 100% of the total number of nucleotides in the complementary portion of the primers.
  • the number of random base positions in the complementary portion of primers is 10%-90%, 15%-95%, 20%-100%, 30%-100%, 50%-100%, 75%-100%, or 90%-95% of the total number of nucleotides in the complementary portion of the primers. In some instances, the number of random base positions in the complementary portion of primers is at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or at least 90% of the total number of nucleotides in the complementary portion of the primers.
  • Sets of primers having random or partially random sequences are in some instances synthesized using standard techniques by allowing the addition of any nucleotide at each position to be randomized. In some instances, sets of primers are composed of primers of similar length and/or hybridization characteristics.
  • random primer refers to a primer which can exhibit four-fold degeneracy at each position. In some instances, the term “random primer” refers to a primer which can exhibit three-fold degeneracy at each position.
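The degeneracy and fraction-random definitions above translate directly into primer pool sizes. The sketch below uses IUPAC-style patterns. N is any of the four bases. D (A/G/T) stands in for a hypothetical three-fold degenerate position. The patterns, function names, and 8-base length are illustrative only and are not primer designs from this document.

```python
import random

# IUPAC codes used here: N = any base (four-fold), D = A/G/T (three-fold).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "D": "AGT"}


def pool_diversity(pattern: str) -> int:
    """Number of distinct sequences a primer pattern can encode."""
    size = 1
    for code in pattern:
        size *= len(IUPAC[code])
    return size


def random_fraction(pattern: str) -> float:
    """Fraction of positions in the pattern that are degenerate (randomized)."""
    return sum(len(IUPAC[code]) > 1 for code in pattern) / len(pattern)


def draw_primer(pattern: str, rng: random.Random) -> str:
    """Simulate synthesis: pick any allowed base at each randomized position."""
    return "".join(rng.choice(IUPAC[code]) for code in pattern)


rng = random.Random(0)
for pattern in ("NNNNNNNN", "GTNNNNNN", "DDDDDDDD"):  # hypothetical 8-base complementary portions
    print(pattern, pool_diversity(pattern), f"{random_fraction(pattern):.0%}", draw_primer(pattern, rng))
```

A fully random 8-mer spans 4^8 = 65,536 sequences. Fixing two positions (75% random) drops this to 4^6 = 4096. Three-fold degeneracy at all 8 positions gives 3^8 = 6561.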
  • Random primers used in the methods described herein in some instances comprise a random sequence that is 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more bases in length. In some instances, primers comprise random sequences that are 3-20, 5-15, 5-20, 6-12, or 4-10 bases in length. Primers may also comprise non-extendable elements that limit subsequent amplification of amplicons generated therefrom. For example, primers with non-extendable elements in some instances comprise terminators.
  • primers comprise terminator nucleotides, such as 1, 2, 3, 4, 5, 10, or more than 10 terminator nucleotides. Primers need not be limited to components which are added externally to an amplification reaction. In some instances, primers are generated in situ through the addition of nucleotides and proteins which promote priming.
  • primase-like enzymes in combination with nucleotides are in some instances used to generate random primers for the methods described herein.
  • Primase-like enzymes in some instances are members of the DnaG or AEP enzyme superfamily. In some instances, a primase-like enzyme is TthPrimPol. In some instances, a primase-like enzyme is T7 gp4 helicase-primase.
  • primases are in some instances used with the polymerases or strand displacement factors described herein. In some instances, primases initiate priming with deoxyribonucleotides. In some instances, primases initiate priming with ribonucleotides. In some instances, primers are irreversible primers. In some instances, irreversible primers comprise phosphonothioate linkages.
  • the PTA amplification can be followed by selection for a specific subset of amplicons. Such selections are in some instances dependent on size, affinity, activity, hybridization to probes, or other known selection factor in the art. In some instances, selections precede or follow additional steps described herein, such as adapter ligation and/or library amplification. In some instances, selections are based on size (length) of the amplicons. In some instances, smaller amplicons are selected that are less likely to have undergone exponential amplification, which enriches for products that were derived from the primary template while further converting the amplification from an exponential into a quasi-linear amplification process.
  • amplicons comprising 50-2000, 25-5000, 40-3000, 50-1000, 200-1000, 300-1000, 400-1000, 400-600, 600-2000, or 800-1000 bases in length are selected. Size selection in some instances occurs with the use of protocols, e.g., utilizing solid-phase reversible immobilization (SPRI) on carboxylated paramagnetic beads to enrich for nucleic acid fragments of specific sizes, or other protocol known by those skilled in the art.
  • selection occurs through preferential ligation and amplification of smaller fragments during PCR while preparing sequencing libraries, as well as a result of the preferential formation of clusters from smaller sequencing library fragments during sequencing (e.g., sequencing by synthesis, nanopore sequencing, or other sequencing method).
  • Other strategies to select for smaller fragments are also consistent with the methods described herein and include, without limitation, isolating nucleic acid fragments of specific sizes after gel electrophoresis, the use of silica columns that bind nucleic acid fragments of specific sizes, and the use of other PCR strategies that more strongly enrich for smaller fragments. Any number of library preparation protocols may be used with the PTA methods described herein.
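In silico, selecting a length window is a simple filter. The sketch below uses hypothetical amplicon lengths and the 400-600 base window listed above. It reports which amplicons fall inside the window and the fraction retained. The cutoffs are idealized as sharp. In practice, bead-based (SPRI) selection has soft cutoffs set largely by the bead-to-sample volume ratio. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def size_select(lengths: Iterable[int], min_len: int, max_len: int) -> Tuple[List[int], float]:
    """Keep amplicons whose length lies in [min_len, max_len] (inclusive) and
    return them with the fraction of amplicons retained.

    Idealization: a sharp cutoff. Bead-based selection in practice has soft
    edges that depend on the bead-to-sample ratio.
    """
    lengths = list(lengths)
    kept = [n for n in lengths if min_len <= n <= max_len]
    fraction = len(kept) / len(lengths) if lengths else 0.0
    return kept, fraction


amplicon_lengths = [150, 420, 580, 610, 900, 1800, 450]  # hypothetical lengths in bases
kept, fraction = size_select(amplicon_lengths, 400, 600)
print(kept, f"{fraction:.2f}")  # prints [420, 580, 450] 0.43
```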
  • Amplicons generated by PTA are in some instances ligated to adapters (optionally with removal of terminator nucleotides).
  • amplicons generated by PTA comprise regions of homology generated from transposase-based fragmentation which are used as priming sites.
  • libraries are prepared by fragmenting nucleic acids mechanically or enzymatically.
  • libraries are prepared using tagmentation via transposomes.
  • libraries are prepared via ligation of adapters, such as Y-adapters, universal adapters, or circular adapters.
  • the non-complementary portion of a primer used in PTA can include sequences which can be used to further manipulate and/or analyze amplified sequences.
  • Detection tags have sequences complementary to detection probes and are detected using their cognate detection probes. There may be one, two, three, four, or more than four detection tags on a primer. There is no fundamental limit to the number of detection tags that can be present on a primer except the size of the primer. In some instances, there is a single detection tag on a primer. In some instances, there are two detection tags on a primer. When there are multiple detection tags, they may have the same sequence or they may have different sequences, with each different sequence complementary to a different detection probe. In some instances, multiple detection tags have the same sequence. In some instances, multiple detection tags have a different sequence.
  • a sequence that can be included in the non-complementary portion of a primer is an “address tag” that can encode other details of the amplicons, such as the location in a tissue section.
  • a cell barcode comprises an address tag.
  • An address tag has a sequence complementary to an address probe. Address tags become incorporated at the ends of amplified strands. If present, there may be one, or more than one, address tag on a primer. There is no fundamental limit to the number of address tags that can be present on a primer except the size of the primer. When there are multiple address tags, they may have the same sequence or they may have different sequences, with each different sequence complementary to a different address probe.
  • the address tag portion can be any length that supports specific and stable hybridization between the address tag and the address probe.
  • nucleic acids from more than one source can incorporate a variable tag sequence.
  • This tag sequence can be up to 100 nucleotides in length, preferably 1 to 10 nucleotides in length, most preferably 4, 5 or 6 nucleotides in length and comprises combinations of nucleotides.
  • a tag sequence is 1-20, 2-15, 3-13, 4-12, 5-12, or 1-10 nucleotides in length. For example, if six base-pairs are chosen to form the tag and a permutation of four different nucleotides is used, then a total of 4096 nucleic acid anchors (e.g.
  • tags identify the source of a sample or analyte. In some instances, tags uniquely identify every molecule in a population.
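The 4096 figure above is 4^6: six positions, each filled by one of four nucleotides. The sketch below does three things. It computes the tag space, enumerates every tag, and finds the shortest tag length that gives each of a given number of sources its own tag. The function names are hypothetical. Practical barcode sets are usually a filtered subset of this space (for example, chosen for minimum pairwise distance or balanced base composition), which the sketch does not attempt.

```python
import itertools

BASES = "ACGT"


def tag_space(length: int) -> int:
    """Number of distinct tags of a given length built from four nucleotides."""
    return len(BASES) ** length


def enumerate_tags(length: int) -> list:
    """Every possible tag of the given length, e.g. AAAAAA ... TTTTTT for length 6."""
    return ["".join(p) for p in itertools.product(BASES, repeat=length)]


def min_tag_length(n_sources: int) -> int:
    """Shortest tag length giving each of n_sources its own tag (ignores any
    error-tolerance requirement between tags)."""
    length = 1
    while tag_space(length) < n_sources:
        length += 1
    return length


print(tag_space(6), len(enumerate_tags(6)))  # prints 4096 4096
print(min_tag_length(96), min_tag_length(384))  # prints 4 5
```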
  • Primers described herein may be present in solution or immobilized on a bead. In some instances, primers bearing sample barcodes and/or UMI sequences can be immobilized on a bead. In some instances, individual cells are contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell. In some instances, lysates from individual cells are contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell lysates.
  • extracted nucleic acid from individual cells are contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the extracted nucleic acid from the individual cell.
  • the beads can be manipulated in any suitable manner as is known in the art, for example, using droplet actuators as described herein.
  • the beads may be any suitable size, including for example, microbeads, microparticles, nanobeads and nanoparticles.
  • beads are magnetically responsive; in other embodiments beads are not significantly magnetically responsive.
  • suitable beads include flow cytometry microbeads, polystyrene microparticles and nanoparticles, functionalized polystyrene microparticles and nanoparticles, coated polystyrene microparticles and nanoparticles, silica microbeads, fluorescent microspheres and nanospheres, functionalized fluorescent microspheres and nanospheres, coated fluorescent microspheres and nanospheres, color dyed microparticles and nanoparticles, magnetic microparticles and nanoparticles, superparamagnetic microparticles and nanoparticles (e.g., DYNABEADS® available from Invitrogen Group, Carlsbad, CA), fluorescent microparticles and nanoparticles, coated magnetic microparticles and nanoparticles, ferromagnetic microparticles and nanoparticles, coated ferromagnetic microparticles and nanoparticles, and those described in U.S.
  • Beads may be pre-coupled with an antibody, protein or antigen, DNA/RNA probe or any other molecule with an affinity for a desired target.
  • primers bearing sample barcodes and/or UMI sequences can be in solution.
  • a plurality of droplets can be presented, wherein each droplet in the plurality bears a sample barcode which is unique to a droplet and the UMI which is unique to a molecule such that the UMI are repeated many times within a collection of droplets.
  • individual cells are contacted with a droplet having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell.
  • lysates from individual cells are contacted with a droplet having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell lysates.
  • extracted nucleic acid from individual cells are contacted with a droplet having a unique set of sample barcodes and/or UMI sequences in order to identify the extracted nucleic acid from the individual cell.
  • PTA primers may comprise a sequence-specific or random primer, a cell barcode and/or a unique molecular identifier (UMI) (e.g., linear primer and/or hairpin primer).
  • the PTA primer comprises a hairpin primer as shown for example in Figure 11.
  • a primer comprises a sequence-specific primer.
  • a primer comprises a random primer.
  • a primer comprises a cell barcode.
  • a primer comprises a sample barcode.
  • a primer comprises a unique molecular identifier.
  • primers comprise two or more cell barcodes. Such barcodes in some instances identify a unique sample source, or unique workflow.
  • Such barcodes or UMIs are in some instances 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 25, 30, or more than 30 bases in length.
  • Primers in some instances comprise at least 1000, 10,000, 50,000, 100,000, 250,000, 500,000, 10^6, 10^7, 10^8, 10^9, or at least 10^10 unique barcodes or UMIs.
  • primers comprise at least 8, 16, 96, or 384 unique barcodes or UMIs.
  • a standard adapter is then ligated onto the amplification products prior to sequencing; after sequencing, reads are first assigned to a specific cell based on the cell barcode.
  • Suitable adapters that may be utilized with the PTA method include, e.g., xGen® Dual Index UMI adapters available from Integrated DNA Technologies (IDT). Reads from each cell are then grouped using the UMI, and reads with the same UMI may be collapsed into a consensus read.
  • the use of a cell barcode allows all cells to be pooled prior to library preparation, as they can later be identified by the cell barcode.
  • the use of the UMI to form a consensus read in some instances corrects for PCR bias, improving the copy number variation (CNV) detection.
  • sequencing errors may be corrected by requiring that a fixed percentage of reads from the same molecule have the same base change detected at each position. This approach has been utilized to improve CNV detection and correct sequencing errors in bulk samples.
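The barcode and UMI workflow above can be sketched as follows:
- Assign reads to a cell by barcode.
- Within each cell, group reads by UMI.
- Collapse each group into a consensus, calling a base only when a fixed fraction of reads agree.

The read tuple layout, the 0.7 agreement threshold, and the function names are hypothetical. Reads within a family are assumed to be already aligned to the same start position, which real pipelines must handle explicitly.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str, str]  # hypothetical layout: (cell_barcode, umi, sequence)


def group_reads(reads: Iterable[Read]) -> Dict[Tuple[str, str], List[str]]:
    """Assign reads to cells by barcode, then to molecules by UMI within each cell."""
    families: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for cell, umi, seq in reads:
        families[(cell, umi)].append(seq)
    return dict(families)


def consensus(seqs: List[str], min_fraction: float = 0.7) -> str:
    """Per-position majority base, or N when agreement is below min_fraction.
    Assumes reads in a family are aligned to the same start; zip truncates to
    the shortest read."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


def collapse(reads: Iterable[Read], min_fraction: float = 0.7) -> Dict[Tuple[str, str], str]:
    """One consensus read per (cell barcode, UMI) family."""
    return {key: consensus(seqs, min_fraction) for key, seqs in group_reads(reads).items()}


reads = [
    ("AAAC", "GGT", "ACGTA"),
    ("AAAC", "GGT", "ACGTA"),
    ("AAAC", "GGT", "ACGTA"),
    ("AAAC", "GGT", "ACTTA"),  # simulated error at the third base
    ("AAAC", "TTA", "ACGAA"),
    ("CCGT", "GGT", "TTGCA"),  # same UMI, different cell: a separate molecule
]
for key, seq in collapse(reads).items():
    print(key, seq)
```

In this example, the single-read error in the first family is outvoted 3 to 1 and removed. Keying on the barcode and UMI together keeps molecules separate when the same UMI recurs in different cells or droplets.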
  • UMIs are used with the methods described herein. For example, U.S. Pat. No. 8,835,358 discloses the principle of digital counting after attaching a random amplifiable barcode. Schmitt et al. and Fan et al. disclose similar methods of correcting sequencing errors.
  • a library is generated for sequencing using primers.
  • the library comprises fragments of 200-700 bases, 100-1000, 300-800, 300-550, 300-700, or 200-800 bases in length.
  • the library comprises fragments of at least 50, 100, 150, 200, 300, 500, 600, 700, 800, or at least 1000 bases in length.
  • the library comprises fragments of about 50, 100, 150, 200, 300, 500, 600, 700, 800, or about 1000 bases in length.
  • the methods described herein may further comprise additional steps, including steps performed on the sample or template.
  • samples or templates in some instances are subjected to one or more steps prior to PTA.
  • samples comprising cells are subjected to a pre-treatment step.
  • cells undergo lysis and proteolysis to increase chromatin accessibility using a combination of freeze-thawing, Triton X-100, Tween 20, and Proteinase K.
  • Other lysis strategies may also be suitable for practicing the methods described herein. Such strategies include, without limitation, lysis using other combinations of detergent and/or lysozyme and/or protease treatment and/or physical disruption of cells such as sonication and/or alkaline lysis and/or hypotonic lysis.
  • the primary template or target molecule(s) is subjected to a pre-treatment step.
  • the primary template (or target) is denatured using sodium hydroxide, followed by neutralization of the solution.
  • Other denaturing strategies may also be suitable for practicing the methods described herein. Such strategies may include, without limitation, combinations of alkaline lysis with other basic solutions, increasing the temperature of the sample and/or altering the salt concentration in the sample, addition of additives such as solvents or oils, other modification, or any combination thereof.
  • additional steps include sorting, filtering, or isolating samples, templates, or amplicons by size.
  • cells are lysed with mechanical (e.g., high-pressure homogenizer, bead milling) or non-mechanical (physical, chemical, or biological) methods.
  • physical lysis methods comprise heating, osmotic shock, and/or cavitation.
  • chemical lysis comprises alkali and/or detergents.
  • biological lysis comprises use of enzymes. Combinations of lysis methods are also compatible with the methods described herein. Non-limiting examples of lysis enzymes include recombinant lysozyme, serine proteases, and bacterial lysins.
  • lysis with enzymes comprises use of lysozyme, lysostaphin, zymolase, cellulase, protease or glycanase.
  • amplicon libraries are enriched for amplicons having a desired length.
  • amplicon libraries are enriched for amplicons having a length of 50-2000, 25-1000, 50-1000, 75-2000, 100-3000, 150-500, 75-250, 170-500, 100-500, or 75-2000 bases.
  • amplicon libraries are enriched for amplicons having a length no more than 75, 100, 150, 200, 500, 750, 1000, 2000, 5000, or no more than 10,000 bases.
  • amplicon libraries are enriched for amplicons having a length of at least 25, 50, 75, 100, 150, 200, 500, 750, 1000, or at least 2000 bases.
  • Buffers or other formulations are in some instances used for PTA, RT, or other methods described herein.
  • buffers in some instances comprise surfactants/detergents or denaturing agents (Tween-20, DMSO, DMF, pegylated polymers comprising a hydrophobic group, or other surfactant), salts (potassium or sodium phosphate (monobasic or dibasic), sodium chloride, potassium chloride, Tris-HCl, magnesium chloride or sulfate, ammonium salts such as phosphate, nitrate, or sulfate, EDTA), reducing agents (DTT, THP, DTE, beta-mercaptoethanol, TCEP, or other reducing agent) or other components (glycerol, hydrophilic polymers such as PEG).
  • buffers are used in conjunction with components such as polymerases, strand displacement factors, terminators, or other reaction components described herein. Buffers may comprise one or more crowding agents. In some instances, crowding reagents include polymers. In some instances, crowding reagents comprise polymers such as polyols. In some instances, crowding reagents comprise polyethylene glycol polymers (PEG). In some instances, crowding reagents comprise polysaccharides.
  • crowding reagents include ficoll (e.g., ficoll PM 400, ficoll PM 70, or other molecular weight ficoll), PEG (e.g., PEG1000, PEG 2000, PEG4000, PEG6000, PEG8000, or other molecular weight PEG), dextran (dextran 6, dextran 10, dextran 40, dextran 70, dextran 6000, dextran 138k, or other molecular weight dextran).
  • the nucleic acid molecules amplified according to the methods described herein may be sequenced and analyzed using methods known to those of skill in the art.
  • Examples of sequencing methods include sequencing by hybridization (SBH), sequencing by ligation (SBL) (Shendure et al. (2005) Science 309: 1728), quantitative incremental fluorescent nucleotide addition sequencing (QIFNAS), stepwise ligation and cleavage, fluorescence resonance energy transfer (FRET), molecular beacons, TaqMan reporter probe digestion, pyrosequencing, fluorescent in situ sequencing (FISSEQ), FISSEQ beads (U.S. Pat. No. 7,425,431), wobble sequencing (Int. Pat. Appl. Pub. No. WO2006/073504), multiplex sequencing (U.S. Pat. Appl. Pub. No.
  • allele-specific oligo ligation assays, e.g., oligo ligation assay (OLA), single template molecule OLA using a ligated linear probe and a rolling circle amplification (RCA) readout, ligated padlock probes, and/or single template molecule OLA using a ligated circular padlock probe and a rolling circle amplification (RCA) readout
  • high-throughput sequencing methods such as, e.g., methods using Roche 454, Illumina Solexa, AB-SOLiD, Helicos, Polonator platforms and the like, and light-based sequencing technologies (Landegren et al. (1998) Genome Res.
  • the amplified nucleic acid molecules are shotgun sequenced. Sequencing of the sequencing library is in some instances performed with any appropriate sequencing technology, including but not limited to single-molecule real-time (SMRT) sequencing, Polony sequencing, sequencing by ligation, reversible terminator sequencing, proton detection sequencing, ion semiconductor sequencing, nanopore sequencing, electronic sequencing, pyrosequencing, Maxam-Gilbert sequencing, chain termination (e.g., Sanger) sequencing, +S sequencing, or sequencing by synthesis (array/colony-based or nanoball based).
  • Sequencing libraries generated using the methods described herein may be sequenced to obtain a desired number of sequencing reads.
  • libraries are generated from a single cell or sample comprising a single cell (alone or part of a multiomics workflow).
  • libraries are sequenced to obtain at least 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or at least 10 million reads.
  • libraries are sequenced to obtain no more than 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or no more than 10 million reads.
  • libraries are sequenced to obtain about 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or about 10 million reads. In some instances, libraries are sequenced to obtain 0.1-10, 0.1-5, 0.1-1, 0.2-1, 0.3-1.5, 0.5-1, 1-5, or 0.5-5 million reads per sample. In some instances, the number of reads is dependent on the size of the genome. In some instances, samples comprising bacterial genomes are sequenced to obtain 0.5-1 million reads. In some instances, libraries are sequenced to obtain at least 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or at least 900 million reads.
  • libraries are sequenced to obtain no more than 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or no more than 900 million reads. In some instances, libraries are sequenced to obtain about 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or about 900 million reads. In some instances, samples comprising mammalian genomes are sequenced to obtain 500-600 million reads. In some instances, the type of sequencing library (cDNA library or genomic library) is identified during sequencing. In some instances, cDNA libraries and genomic libraries are identified during sequencing with unique barcodes.
  • SCM (spent culture media)
  • DNA, RNA, and/or proteins from the SCM are analyzed in parallel.
  • the analysis may include identification of epigenetic post-translational (e.g., glycosylation, phosphorylation, acetylation, ubiquitination, histone modification) and/or post-transcriptional (e.g., methylation, hydroxymethylation) modifications.
  • Such methods may comprise PTA to obtain libraries of nucleic acids for sequencing.
  • PTA is combined with additional steps or methods such as RT-PCR or proteome/protein quantification techniques (e.g., mass spectrometry, antibody staining, etc.).
  • various components of a SCM are physically or spatially separated from each other during individual analysis steps.
  • a workflow in some instances comprises labeling proteins with antibodies.
  • the antibodies comprise a tag or marker (e.g., nucleic acid/oligo tag, mass tag, or fluorescent tag).
  • a portion of the antibodies comprise an oligo tag.
  • a portion of the antibodies comprise a fluorescent marker.
  • antibodies are labeled by two or more tags or markers.
  • a portion of the antibodies are sorted based on fluorescent markers. After RT-PCR, first strand mRNA products are generated and then removed for analysis. Libraries are then generated from RT-PCR products and barcodes present on protein-specific antibodies, which are subsequently sequenced.
  • genomic DNA from the SCM is subjected to PTA, a library generated, and sequenced.
  • Sequencing results from the genome, proteome, and transcriptome are in some instances pooled using bioinformatics methods.
  • Methods described herein in some instances comprise any combination of labeling, cell sorting, affinity separation/purification, lysing of specific cell components (e.g., outer membrane, nucleus, etc.), RNA amplification, DNA amplification (e.g., PTA), or other step associated with protein, RNA, or DNA isolation or analysis.
  • methods described herein comprise one or more enrichment steps, such as exome enrichment.
  • the analysis comprises analysis of RNA and DNA from a cell culture media (e.g., SCM).
  • the cell culture media may be obtained by culturing one or more cells (e.g., embryonic cells) or using a cell line proxy system, as shown for example in Figure 10.
  • the method comprises isolation of single cells, lysis of single cells, and reverse transcription (RT).
  • reverse transcription is carried out with template switching oligonucleotides (TSOs).
  • TSOs comprise a molecular TAG such as biotin, which allows subsequent pull-down of cDNA RT products, and PCR amplification of RT products to generate a cDNA library.
  • centrifugation is used to separate RNA in the supernatant from cDNA in the cell pellet.
  • Remaining cDNA is in some instances fragmented and removed with UDG (uracil DNA glycosylase), and alkaline lysis is used to degrade RNA and denature the genome.
  • amplification products are in some instances purified on SPRI (solid phase reversible immobilization) beads, and ligated to adapters to generate a gDNA library.
  • RT products are in some instances isolated by pulldown, such as a pulldown with streptavidin beads.
  • antibodies are labeled with either fluorescent labels, nucleic acid barcodes, or both. Labeled antibodies bind to at least one cell in the population, and such cells are sorted, placing one cell per container (e.g., a tube, vial, microwell, etc.).
  • the container comprises a solvent.
  • a region of a surface of a container is coated with a capture moiety.
  • the capture moiety is a small molecule, an antibody, a protein, or other agent capable of binding to one or more cells, organelles, or other cell component in a SCM.
  • at least one cell, or a single cell, or component thereof binds to a region of the container surface.
  • a nucleus binds to the region of the container.
  • the outer membrane of the cell is lysed, releasing mRNA into a solution in the container.
  • the nucleus of the cell containing genomic DNA is bound to a region of the container surface.
  • RT is often performed using the mRNA in solution as a template to generate cDNA.
  • template switching primers comprise from 5' to 3' a TSS (transcription start site) region, an anchor region, an RNA BC region, and a poly dT tail.
  • the poly dT tail binds to poly A tail of one or more mRNAs.
  • template switching primers comprise from 3’ to 5’ a TSS region, an anchor region, and a poly G region.
  • the poly G region comprises riboG.
  • the poly G region binds to a poly C region on an mRNA transcript.
  • riboG was added to the mRNA transcripts by a terminal transferase. After removal of RT-PCR products for subsequent sequencing, any remaining RNA in the cell is removed by UNG. The nucleus is then lysed, and the released genomic DNA is subjected to the PTA method using random primers with an isothermal polymerase.
  • primers are 6-9 bases in length.
  • PTA generates genomic amplicons of 100-5000, 200-5000, 500-2000, 500-2500, 1000-3000, or 300-3000 bases in length. In some instances, PTA generates genomic amplicons with an average length of 100-5000, 200-5000, 500-2000, 500-2500, 1000-3000, or 300-3000 bases. In some instances, PTA generates genomic amplicons of 250-1500 bases in length. In some instances, the methods described herein generate a short fragment cDNA pool with about 500, about 750, about 1000, about 5000, or about 10,000 fold amplification. In some instances, the methods described herein generate a short fragment cDNA pool with 500-5000, 750-1500, or 250-10,000 fold amplification. PTA products are optionally subjected to additional amplification and sequenced.
  • the single cells may, in some instances, be cultured for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14 days.
  • the single cells comprise embryonic cells.
  • the embryonic cells are cultured for about 5 to 7 days. Any method of single cell isolation may be used with PTA, such as mouth pipetting, micro pipetting, flow cytometry/FACS, microfluidics, methods of sorting nuclei (tetraploid or other), or manual dilution.
  • about 5 to 10 microliters of SCM media is harvested post-culture for analysis.
  • a method of multiomic analysis described herein comprises mechanical or enzymatic dissociation of cells from larger tissues.
  • Methods of multiomic analysis comprising PTA described herein may comprise one or more methods of processing cell components such as DNA, RNA, and/or proteins.
  • the cell components are obtained from the SCM (e.g., from one or more embryonic cells or a cell line proxy system).
  • the nucleus comprising genomic DNA
  • the cytosol comprising mRNA
  • the cytosol is then separated from the nucleus using methods including micro pipetting, centrifugation, or antibody-conjugated magnetic microbeads.
  • an oligo-dT primer coated magnetic bead binds polyadenylated mRNA for separation from DNA.
  • DNA and RNA are preamplified simultaneously, and then separated for analysis.
  • SCM is split into two equal pieces, with mRNA from one half processed, and genomic DNA from the other half processed.
  • PTA may be used as a replacement for any number of other known methods in the art which are used for single cell sequencing (multiomics or the like).
  • PTA may substitute for genomic DNA sequencing methods such as MDA, PicoPlex, DOP-PCR, MALBAC, or target-specific amplifications.
  • PTA replaces the standard genomic DNA sequencing method in a multiomics method including DR-seq (Dey et al., 2015), G&T seq (MacAulay et al., 2015), scMT-seq (Hu et al., 2016), sc-GEM (Cheow et al., 2016), scTrio-seq (Hou et al., 2016), simultaneous multiplexed measurement of RNA and proteins (Darmanis et al., 2016), scCOOL-seq (Guo et al., 2017), CITE-seq (Stoeckius et al., 2017), REAP-seq (Peterson et al., 2017), scNMT-seq (Clark et al., 2018), or SIDR-seq (Han et al., 2018).
  • a method described herein comprises PTA and a method of analyzing polyadenylated mRNA transcripts. In some instances, a method described herein comprises PTA and a method of analyzing non-polyadenylated mRNA transcripts. In some instances, a method described herein comprises PTA and a method of analyzing total (polyadenylated and non-polyadenylated) mRNA transcripts.
  • PTA is combined with a standard RNA sequencing method to obtain genome and transcriptome data.
  • a multiomics method described herein comprises PTA and one of the following: Drop-seq (Macosko, et al.
  • an RT reaction mix is used to generate a cDNA library.
  • the RT reaction mixture comprises a crowding reagent, at least one primer, a template switching oligonucleotide (TSO), a reverse transcriptase, and a dNTP mix.
  • an RT reaction mix comprises an RNAse inhibitor.
  • an RT reaction mix comprises one or more surfactants.
  • an RT reaction mix comprises Tween-20 and/or Triton-X.
  • an RT reaction mix comprises Betaine.
  • an RT reaction mix comprises one or more salts.
  • an RT reaction mix comprises a magnesium salt (e.g., magnesium chloride) and/or tetramethylammonium chloride.
  • an RT reaction mix comprises gelatin.
  • an RT reaction mix comprises PEG (PEG1000, PEG2000, PEG4000, PEG6000, PEG8000, or PEG of other length).
  • Multiomic methods described herein may provide both genomic and RNA transcript information from a SCM (e.g., a combined or dual protocol).
  • genomic information from the SCM is obtained from the PTA method, and RNA transcript information is obtained from reverse transcription to generate a cDNA library.
  • a whole transcript method is used to obtain the cDNA library.
  • 3’ or 5’ end counting is used to obtain the cDNA library.
  • cDNA libraries are not obtained using UMIs.
  • a multiomic method provides RNA transcript information from the SCM for at least 500, 1000, 2000, 5000, 8000, 10,000, 12,000, or at least 15,000 genes.
  • a multiomic method provides RNA transcript information from the SCM for about 500, 1000, 2000, 5000, 8000, 10,000, 12,000, or about 15,000 genes. In some instances, a multiomic method provides RNA transcript information from the SCM for 100-12,000, 1000-10,000, 2000-15,000, 5000-15,000, 10,000-20,000, 8000-15,000, or 10,000-15,000 genes. In some instances, a multiomic method provides genomic sequence information for at least 80%, 90%, 92%, 95%, 97%, 98%, or at least 99% of the genome of the SCM. In some instances, a multiomic method provides genomic sequence information for about 80%, 90%, 92%, 95%, 97%, 98%, or about 99% of the genome of the SCM.
  • Multiomic methods may comprise analysis of SCM from culturing a population of cells (e.g., embryonic cells) or a cell line proxy system (e.g., Figure 10).
  • at least 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, or at least 8000 cells are analyzed.
  • about 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, or about 8000 cells are analyzed.
  • 5-100, 10-100, 50-500, 100-500, 100-1000, 50-5000, 100-5000, 500-1000, 500-10000, 1000-10000, or 5000-20,000 cells are analyzed.
  • Multiomic methods may generate yields of genomic DNA from the PTA reaction based on the SCM.
  • the amount of DNA generated from SCM is about 0.1, 1, 1.5, 2, 3, 5, or about 10 micrograms. In some instances, the amount of DNA generated from SCM is about 0.1, 1, 1.5, 2, 3, 5, or about 10 nanograms. In some instances, the amount of DNA generated from SCM is about 0.1, 1, 1.5, 2, 3, 5, or about 10 femtograms. In some instances, the amount of DNA generated from SCM is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 micrograms. In some instances, the amount of DNA generated from SCM is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 nanograms.
  • the amount of DNA generated from SCM is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 femtograms. In some instances, the amount of DNA generated from SCM is about 0.1-10, 1-10, 1.5-10, 2-20, 2-50, 1-3, or 0.5-3.5 nanograms. In some instances, the amount of DNA generated from SCM is about 0.1-10, 1-10, 1.5-10, 2-20, 2-50, 1-3, or 0.5-3.5 micrograms. In some instances, the amount of DNA generated from SCM is about 0.1-10, 1-10, 1.5-10, 2-20, 2-4, 1-3, or 0.5-4 nanograms.
  • the amount of DNA generated from SCM is about 0.1-10, 1-10, 1.5-10, 2-20, 2-4, 1-3, or 0.5-4 femtograms. In some instances, these amounts of nucleic acids are present in solution, such as no more than 0.1, 1, 1.5, 2, 2.5, 5, 10, 15, 20, 50, or no more than 100 microliters of solution.
  • Methods of detecting methylated genomic bases include selective restriction with methylation-sensitive endonucleases, followed by processing with the PTA method. Sites cut by such enzymes are determined from sequencing, and methylated bases are identified.
  • bisulfite treatment of genomic DNA libraries converts unmethylated cytosines to uracil. Libraries are then in some instances amplified with methylation-specific primers which selectively anneal to methylated sequences.
  • non-methylation-specific PCR is conducted, followed by one or more methods to discriminate between bisulfite-reacted bases, including direct pyrosequencing, MS-SnuPE, HRM, COBRA, MS-SSCA, or base-specific cleavage/MALDI-TOF.
  • genomic DNA samples are split for parallel analysis of the genome (or an enriched portion thereof) and methylome analysis.
  • analysis of the genome and methylome comprises enrichment of genomic fragments (e.g., exome, or other targets) or whole genome sequencing.
  • the data obtained from the analysis methods utilizing PTA described herein may be compiled into a database. Described herein are methods and systems of bioinformatic data integration. Data from the proteome, genome, transcriptome, methylome or other data is in some instances combined/integrated into a database and analyzed. Bioinformatic data integration methods and systems in some instances comprise one or more of protein detection (FACS and/or NGS), mRNA detection, and/or genome variance detection. In some instances, this data is correlated with a disease state or condition. In some instances, data from an SCM is compiled to describe properties of a cell population, such as cells from specific embryonic tissue region. In some instances, protein data is acquired from fluorescently labeled antibodies which selectively bind to proteins.
  • a method of protein detection comprises grouping cells based on fluorescent markers and reporting sample location post-sorting. In some instances, a method of protein detection comprises detecting sample barcodes, detecting protein barcodes, comparing to designed sequences, and grouping cells based on barcode and copy number. In some instances, protein data is acquired from barcoded antibodies which selectively bind to proteins on a cell. In some instances, transcriptome data is acquired from sample and RNA specific barcodes.
  • a mutation is a difference between an analyzed sequence (e.g., using the methods described herein) and a reference sequence.
  • Reference sequences are in some instances obtained from other organisms, other individuals of the same or similar species, populations of organisms, or other areas of the same genome.
  • mutations are identified on a plasmid or chromosome.
  • a mutation is an SNV (single nucleotide variation), SNP (single nucleotide polymorphism), or CNV (copy number variation, or CNA/copy number aberration).
  • a mutation is base substitution, insertion, or deletion. In some instances, a mutation is a transition, transversion, nonsense mutation, silent mutation, synonymous or non-synonymous mutation, non-pathogenic mutation, missense mutation, or frameshift mutation (deletion or insertion).
  • PTA results in higher detection sensitivity and/or lower rates of false positives for the detection of mutations when compared to methods such as in-silico prediction, ChIP-seq, GUIDE-seq, circle-seq, HTGTS (High-Throughput Genome-Wide Translocation Sequencing), IDLV (integration-deficient lentivirus), Digenome-seq, FISH (fluorescence in situ hybridization), or DISCOVER-seq.
  • Example 1 Ligation of short, cell-free DNA fragments into long concatemers for ResolveDNA amplification.
  • cfDNA (cell-free DNA)
  • Other body fluids like saliva, urine, cerebrospinal fluid (CSF), and pleural effusion also contain cfDNA.
  • cfDNA is a promising noninvasive diagnostic tool for cancer, stroke, hypertension, autoimmune, and heart disorders.
  • Blood plasma from cancer patients contains tumor cell-derived circulating tumor DNA (ctDNA) along with cfDNA from healthy cells, which is present in vast excess over ctDNA. The amount of ctDNA present correlates with cancer progression (stage) across a diversity of tumor types [5].
  • ctDNA evaluation to detect disease-relevant mutations present in the primary tumor offers early detection, longitudinal surveillance, and identification of minimal residual disease after therapy, in a noninvasive and cost-effective manner compared to traditional tissue biopsy. It has great potential to become a routine liquid biopsy, not only for cancer screening but also for screening and monitoring other diseases such as stroke and immune and cardiovascular diseases. Furthermore, cfDNA sequencing is being actively investigated for prenatal screening to detect fetal abnormalities.
  • Formalin-fixed paraffin-embedded (FFPE) DNA is fragmented and short (~300-600 bp) compared to intact genomic DNA. FFPE DNA sequencing is challenging: the fragmented genome causes uneven coverage and decreased SNV detection sensitivity, and single-base errors are introduced.
  • Exosomes, extracellular vesicles secreted by cells into the blood, are another component of liquid biopsy that is being explored as a disease biomarker.
  • Exosomal DNA (exo-DNA) is not as fragmented as cf/ctDNA; however, it can be shorter than genomic DNA.
  • the ligation strategy we propose for cf/ctDNA could also be used to increase the effective fragment size of fragmented FFPE DNA and exo-DNA, improving their suitability for ResolveDNA amplification.
  • the primary limitations of using cfDNA as a routine clinical diagnostic marker are its ultra-low amounts in the blood plasma and its small size.
  • cfDNA in healthy individuals is primarily hematopoietic, with an average of 2.5-30 ng/ml of blood.
  • cfDNA concentrations vary in cancer patients as a function of the size, stage, and type of the primary cancer; and can be higher in late stage cancers due to the additive contribution of ctDNA with an average concentration of 10-180 ng/ml in blood.
  • the size of cell-free DNA ranges from 120-220 bp and varies in different types of cancers due to altered nucleosome positioning.
  • ResolveDNA (Bioskryb Genomics)
  • whole genome amplification has been shown to amplify very low amounts of DNA uniformly and with high fidelity.
  • Deep sequencing data (10X) for the OncoSpan cfDNA reference standard amplified by ResolveDNA covered only eleven of the 386 regions reported in Horizon exome sequencing data for this sample (120x coverage using the Agilent SureSelect Human All Exon V6 kit and Illumina sequencing) with at least one read spanning the focal locus, and revealed only one of the 386 verified mutations.
  • Because T4 DNA ligase can act on sticky and blunt ends and ligate short DNA fragments into long concatemers, we envisioned that these long cfDNA concatemers could be ideal templates for efficient ResolveDNA amplification.
  • T4 DNA ligation has not been previously used for cfDNA applications; however, it has been implemented to amplify fragmented DNA from FFPE samples using multiple displacement amplification.
  • Integrative Genomics Viewer (IGV) analysis of the read coverage of the fragments showed that fragments ligated before amplification have more uniform coverage across the length of the template than the unligated samples (FIGS. 5-7).
  • the chimeric percentage is high (~40-50%) in both unligated and ligated samples.
  • The high chimera percentage must be interpreted carefully: the template is not the classical ResolveDNA template (gDNA), and ligation effectively creates chimeras of a small fraction of the genome.
  • this protocol may also be applied to cell-free DNA isolated from the plasma of a late-stage cancer patient with known mutational profile data from the primary tumor, with the aim of identifying in the ctDNA the same mutation(s) present in the primary solid tumor.
  • the ligation efficiency may be adjusted with and without PNK while varying input (10pg-25ng).
  • PNK is used for efficient ligation and amplification (FIG. 9). Ligation efficiency with ~7 ng of input is comparable to that with 50 ng and starts declining slightly from 1 ng onwards (data not shown). However, methods such as using stuffer DNA of a known sequence may increase the DNA concentration for ligation.
  • Example 2 Analysis of cfDNA from culture media
  • cell free DNA from spent culture media of an embryo is used to analyze one or more fetal genetic abnormalities.
  • the embryo is being prepared for implantation.
  • the embryo is a human or nonhuman (e.g., livestock) embryo. This method is expected to provide advantages over more invasive fetal abnormality tests (e.g., amniocentesis or biopsy).
  • Example 3 Analysis of SCM Samples from Embryonic Cultures
  • the end-repair cocktail comprised T4 DNA polymerase, T4 polynucleotide kinase, and Taq-B polymerase, for repair/fill-in, phosphorylation, and fill-in + A-tailing, respectively.
  • ERAT (end-repair and A-tailing)
  • ligation was performed with T4 DNA ligase and then subjected to ResolveDNA or ResolveOME workflows.
  • PTA more efficiently amplifies small circular templates like the mitochondrial genome (mtDNA) relative to other genomic amplification methodologies.
  • the ligation approach was designed to generate circular molecules by the inclusion of hairpin adapters to generate boundaries for the circular form and by the inclusion of a T-overhang adapter to facilitate efficient concatenation of A-tailed SCM small DNA fragments internal to the hairpin adapters (Figure 11).
  • the resulting circular product would be subjected to PTA in the ResolveDNA or ResolveOME workflows.
  • This approach was distinct in that, in addition to (or instead of) the standard randomer priming of PTA, PTA priming could be initiated using a primer directed at the hairpin adapter sequence, which could be engineered to include sequences not represented in the human genome.
  • MAD (mean absolute deviation) segment score
  • the MAD segment score decreased in the presence of ligase, indicating that the signal-to-noise ratio for copy number alteration calling had increased.
  • Corresponding preseq values were indicative of a lack of single-cell contamination, providing confidence that the output metrics reflect concatenation of 150 bp SCM-derived DNA fragments.
  • ligation strategies presented here for SCM can have broad relevance and implications for work aimed at creating workflows for FFPE and ascertaining exosome nucleic acid by PTA due to their small size/fragmented nature.
  • the circularization methodology presented here could bring PTA into realms where rolling circle amplification (RCA) is currently exploited (e.g., sequencing platform flow cells).
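The read-count ranges given earlier in this list (e.g., 0.5-1 million reads for bacterial genomes and 500-600 million reads for mammalian genomes) scale with genome size because mean sequencing depth is roughly reads × read length ÷ genome size. A minimal sketch of that arithmetic, assuming uniformly placed reads (the Lander-Waterman/Poisson idealization, which amplification bias violates in practice), a hypothetical 150 bp read length, and approximate genome sizes of 5 Mb (bacterial) and 3.1 Gb (human); the function names are illustrative only:

```python
import math


def mean_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Expected mean depth C = N * L / G under uniform random read placement."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return num_reads * read_length_bp / genome_size_bp


def fraction_covered(coverage: float) -> float:
    """Poisson estimate of the fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical inputs: 150 bp reads, ~5 Mb bacterial and ~3.1 Gb human genomes.
    for label, reads, genome in [
        ("bacterial, 1 M reads", 1_000_000, 5_000_000),
        ("mammalian, 500 M reads", 500_000_000, 3_100_000_000),
    ]:
        c = mean_coverage(reads, 150, genome)
        print(f"{label}: {c:.1f}x mean depth, {fraction_covered(c):.4f} of bases covered")
```

Under these assumptions both cases land at a few tens of fold mean depth (30x and about 24x), which illustrates why larger genomes need proportionally more reads; actual coverage uniformity depends on the amplification method.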
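Bisulfite treatment, as described earlier in this list, converts unmethylated cytosines to uracil (read as thymine after amplification) while methylated cytosines are protected. A minimal in-silico sketch of that logic, assuming a single top-strand read already aligned base-for-base to its reference with no sequencing errors, SNPs, or incomplete conversion; `bisulfite_convert` and `call_methylated_positions` are hypothetical helper names:

```python
from typing import List, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Simulate top-strand bisulfite conversion followed by PCR.

    Unmethylated C becomes U, which amplifies and reads as T; C at a 0-based
    position listed in `methylated` is protected and stays C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylated_positions(reference: str, read: str) -> List[int]:
    """Reference C read as C => methylated; reference C read as T => unmethylated."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper()))
        if ref_base == "C" and read_base == "C"
    ]


if __name__ == "__main__":
    reference = "ACGTCCGA"
    # Both CpG cytosines (indices 1 and 5) are methylated; the C at index 4 is not.
    read = bisulfite_convert(reference, methylated={1, 5})
    print(read, call_methylated_positions(reference, read))
```

Real pipelines also handle the reverse strand (G to A) and align converted reads against converted references; this sketch only shows the core C/T comparison.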
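The barcode-based protein detection steps listed earlier (detecting sample and protein barcodes, comparing them to designed sequences, and grouping cells by barcode and copy number) amount to an error-tolerant lookup followed by counting. A minimal sketch, assuming fixed-length barcodes already extracted from each read, a whitelist of designed barcodes, a hypothetical tolerance of one mismatch (Hamming distance), and rejection of reads that match two designed barcodes equally well; all names are illustrative:

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed: str, designed: Sequence[str], max_mismatches: int = 1) -> Optional[str]:
    """Return the unique closest designed barcode within tolerance, else None."""
    hits = [bc for bc in designed if hamming(observed, bc) <= max_mismatches]
    if not hits:
        return None
    best = min(hamming(observed, bc) for bc in hits)
    closest = [bc for bc in hits if hamming(observed, bc) == best]
    return closest[0] if len(closest) == 1 else None  # ambiguous matches are rejected


def count_protein_barcodes(reads: Iterable[Tuple[str, str]], sample_bcs: Sequence[str],
                           protein_bcs: Sequence[str]) -> Counter:
    """Count reads per (sample barcode, protein barcode) pair; unassignable reads are dropped."""
    counts: Counter = Counter()
    for sample_obs, protein_obs in reads:
        sample = assign_barcode(sample_obs, sample_bcs)
        protein = assign_barcode(protein_obs, protein_bcs)
        if sample is not None and protein is not None:
            counts[(sample, protein)] += 1
    return counts


if __name__ == "__main__":
    reads = [("AAAA", "GGTT"), ("AAAT", "GGTT"), ("CCCC", "TTGG"), ("ACAC", "GGTT")]
    print(count_protein_barcodes(reads, ["AAAA", "CCCC"], ["GGTT", "TTGG"]))
```

The per-pair counts play the role of the "copy number" used for grouping; designed barcode sets are normally chosen with a minimum pairwise distance large enough that one-mismatch correction is unambiguous.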
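The mutation categories listed earlier (base substitution, insertion, deletion, transition, transversion, frameshift) can be assigned by comparing an analyzed allele to its reference allele. A minimal sketch, assuming variants are already called and given as simple ref/alt strings (indels sharing their anchor base), and noting that the in-frame/frameshift label is only meaningful inside a coding sequence; `classify_variant` is a hypothetical name:

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_variant(ref: str, alt: str) -> str:
    """Classify a ref/alt pair as a transition, transversion, or indel type."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no variant"
    if len(ref) == 1 and len(alt) == 1:
        # Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion.
        pair = {ref, alt}
        if pair <= PURINES or pair <= PYRIMIDINES:
            return "transition"
        return "transversion"
    length_change = len(alt) - len(ref)
    if length_change == 0:
        return "multi-base substitution"
    kind = "insertion" if length_change > 0 else "deletion"
    # Only meaningful in coding sequence: length changes not divisible by 3 shift the frame.
    frame = "in-frame" if length_change % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("C", "A"), ("A", "AT"), ("ATTT", "A")]:
        print(ref, alt, classify_variant(ref, alt))
```

Silent, missense, and nonsense labels additionally require the codon context and a genetic code table, which this sketch leaves out.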
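The cfDNA concentrations quoted in Example 1 (about 2.5-30 ng/ml in healthy individuals and 10-180 ng/ml in late-stage cancer) can be converted into the number of genome copies actually available, which makes the "ultra-low amount" limitation concrete. A minimal sketch, assuming roughly 3.3 pg per haploid human genome, a hypothetical 4 ml sample volume, and a hypothetical 1% tumor fraction; for a heterozygous mutation in a diploid tumor without copy-number change, half of the tumor-derived copies carry the variant:

```python
PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one haploid human genome, in picograms


def haploid_genome_equivalents(conc_ng_per_ml: float, volume_ml: float,
                               pg_per_genome: float = PG_PER_HAPLOID_GENOME) -> float:
    """Number of haploid genome copies (i.e., copies of any given locus) in a sample."""
    total_pg = conc_ng_per_ml * volume_ml * 1000.0  # ng -> pg
    return total_pg / pg_per_genome


def expected_mutant_copies(genome_equivalents: float, tumor_fraction: float,
                           mutant_allele_share: float = 0.5) -> float:
    """Expected mutant copies of one locus; 0.5 assumes a heterozygous, diploid tumor."""
    return genome_equivalents * tumor_fraction * mutant_allele_share


if __name__ == "__main__":
    for conc in (2.5, 30.0):  # ng/ml, the healthy-individual range quoted in Example 1
        ge = haploid_genome_equivalents(conc, volume_ml=4.0)  # hypothetical 4 ml
        mutant = expected_mutant_copies(ge, tumor_fraction=0.01)  # hypothetical 1% ctDNA
        print(f"{conc} ng/ml: ~{ge:.0f} genome copies, ~{mutant:.1f} mutant copies")
```

Even a few thousand genome copies per sample leave only tens of mutant molecules at a low tumor fraction, so every fragment that fails to amplify directly reduces detection sensitivity.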
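The MAD segment score used in Example 3 summarizes bin-to-bin noise in copy-number data, so a lower value means cleaner copy number alteration calls. The exact formula is not given in the document; a minimal sketch, assuming normalized per-bin copy ratios, known segment boundaries, and a score defined (hypothetically) as the unweighted average, over segments, of each segment's mean absolute deviation from its own mean:

```python
from statistics import fmean
from typing import Sequence, Tuple


def mean_absolute_deviation(values: Sequence[float]) -> float:
    """Mean of |x - mean(x)|."""
    if not values:
        raise ValueError("need at least one value")
    center = fmean(values)
    return fmean(abs(v - center) for v in values)


def segment_mad_score(bin_ratios: Sequence[float], segments: Sequence[Tuple[int, int]]) -> float:
    """Average within-segment MAD; each segment is a half-open (start, end) bin range."""
    per_segment = [mean_absolute_deviation(bin_ratios[start:end]) for start, end in segments]
    return fmean(per_segment)


if __name__ == "__main__":
    # Toy, made-up copy ratios: a one-copy gain in bins 3-5 at two noise levels.
    noisy = [1.2, 0.8, 1.1, 1.9, 1.3, 1.6]
    clean = [1.0, 1.05, 0.95, 1.5, 1.45, 1.55]
    segs = [(0, 3), (3, 6)]
    print(segment_mad_score(noisy, segs), segment_mad_score(clean, segs))
```

Measuring deviation within segments, rather than across the whole genome, keeps true copy-number changes from being counted as noise.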

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided herein are methods and systems for amplification of short nucleic acid fragments. Further provided herein are methods of amplifying cfDNA and/or ctDNA.

Description

METHODS, SYSTEMS, AND COMPOSITIONS FOR CFDNA ANALYSIS
CROSS-REFERENCE
[0001] This application claims the benefit of U.S. Provisional Application No. 63/585,497 filed September 26, 2023, which is incorporated herein by reference in its entirety.
BACKGROUND
[0002] Running parallel reactions on a plurality of samples and/or performing parallel screens on a plurality of samples with small volumes in a short period of time has vast applications. Miniaturization of screening systems can reduce the cost of performing reactions and screens, decrease the sample sizes required, and significantly reduce the time required to gather large volumes of data, which can provide invaluable insights into biological systems, diseases, drug effects, and beyond.
SUMMARY
[0003] There is an unmet need for methods and systems for high-throughput cell analysis and high-throughput nucleic acid analysis. Provided herein are methods and systems for analysis of short nucleic acid fragments, such as cfDNA (cell-free DNA).
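Ligating short fragments such as cfDNA into longer concatemers before amplification, as in the methods provided herein, can be pictured with a toy simulation. This is a sketch under stated assumptions, not the ligation chemistry itself: fragments join in random order, each potential junction forms independently with a hypothetical probability `join_probability`, and circularization, adapter dimers, and end-polishing failures are ignored. Every junction in a simulated concatemer is, by construction, a junction between unrelated genomic fragments:

```python
import random
from statistics import fmean
from typing import List, Sequence


def simulate_concatemers(fragment_lengths: Sequence[int], join_probability: float,
                         seed: int = 0) -> List[List[int]]:
    """Toy model: shuffle fragments, then join each neighbor with probability join_probability."""
    if not 0.0 <= join_probability <= 1.0:
        raise ValueError("join_probability must be between 0 and 1")
    rng = random.Random(seed)
    order = list(fragment_lengths)
    rng.shuffle(order)
    concatemers: List[List[int]] = []
    current: List[int] = []
    for length in order:
        # rng.random() is in [0, 1), so p = 1.0 always joins and p = 0.0 never does.
        if current and rng.random() >= join_probability:
            concatemers.append(current)
            current = []
        current.append(length)
    if current:
        concatemers.append(current)
    return concatemers


if __name__ == "__main__":
    fragments = [170] * 10_000  # hypothetical uniform 170 bp cfDNA-like fragments
    products = simulate_concatemers(fragments, join_probability=0.9, seed=1)
    print(f"{len(products)} concatemers, mean length {fmean(sum(c) for c in products):.0f} bp")
```

With `join_probability = 0.9`, concatemers contain on average about ten fragments (the mean of a geometric run length, 1/(1 - 0.9)), i.e., roughly 1.7 kb for 170 bp fragments, a size within the range of the PTA amplicons described in this application.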
[0004] Provided herein are methods of amplification comprising: (a) ligating a plurality of sample nucleic acids to form concatemers; (b) contacting the concatemers with at least one amplification primer, at least one nucleic acid polymerase comprising 3'-5' exonuclease activity, and a mixture of nucleotides, wherein the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the at least one polymerase, wherein the at least one terminator nucleotide is an irreversible terminator which inhibits 3'-5' exonuclease activity of the polymerase, and is selected from nucleotides with modification to the alpha group, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2' fluoro nucleotides, 3' phosphorylated nucleotides, 2'-O-Methyl modified nucleotides, and trans nucleic acids; and (c) amplifying the at least one target nucleic acid molecule to generate a plurality of terminated amplification products, wherein the at least one terminator nucleotide is attached to the 3' terminus of the terminated amplification products, and wherein the replication proceeds by strand displacement replication. Further provided herein are methods wherein the sample nucleic acids comprise cell-free DNA (cfDNA) or circulating tumor DNA (ctDNA). Further provided herein are methods wherein the sample nucleic acids are no more than 300 bases in length. Further provided herein are methods wherein the sample nucleic acids are no more than 200 bases in length. Further provided herein are methods wherein the sample nucleic acids comprise cfDNA. Further provided herein are methods wherein the sample nucleic acids comprise ctDNA. Further provided herein are methods wherein the sample nucleic acids are obtained from an Further provided herein are methods wherein sample. Further provided herein are methods wherein the sample nucleic acids are obtained from spent culture media.
Further provided herein are methods wherein the sample nucleic acids are obtained from spent culture media of an embryo. Further provided herein are methods wherein ligating comprises phosphorylation of sample nucleic acid ends followed by contact with a ligase. Further provided herein are methods wherein the ligase comprises T4 ligase. Further provided herein are methods wherein the nucleotides with modification to the alpha group are alpha-thio dideoxynucleotides. Further provided herein are methods wherein the terminator nucleotide comprises modifications of the R group of the 3' carbon of the deoxyribose. Further provided herein are methods wherein the at least one terminator nucleotide is selected from the group consisting of dideoxynucleotides, inverted dideoxynucleotides, 3' biotinylated nucleotides, 3' amino nucleotides, 3'-phosphorylated nucleotides, 3'-O-methyl nucleotides, 3' carbon spacer nucleotides including 3' C3 spacer nucleotides, 3' C18 nucleotides, 3' hexanediol spacer nucleotides, acyclonucleotides, and combinations thereof. Further provided herein are methods wherein the at least one polymerase comprises a B-type polymerase. Further provided herein are methods wherein the at least one polymerase comprises Phi29 polymerase. Further provided herein are methods further comprising polishing an end of a nucleic acid from the plurality of sample nucleic acids prior to (a). Further provided herein are methods wherein polishing comprises using an end-repair and A-tailing (ERAT) cocktail. Further provided herein are methods wherein the ERAT cocktail comprises one or more polymerases, a kinase, or both. Further provided herein are methods wherein the ERAT cocktail comprises one or more of T4 DNA polymerase, T4 polynucleotide kinase, and Taq-B polymerase.
[0005] Provided herein are methods of amplification comprising: (a) obtaining a sample comprising nucleic acids from spent culture media (SCM); (b) ligating the nucleic acids to form concatemers; (c) contacting the concatemers with at least one amplification primer, at least one nucleic acid polymerase comprising 3'-5' exonuclease activity, and a mixture of nucleotides, wherein the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the at least one polymerase, wherein the at least one terminator nucleotide is an irreversible terminator which inhibits 3'-5' exonuclease activity of the polymerase, and is selected from nucleotides with modification to the alpha group, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2' fluoro nucleotides, 3' phosphorylated nucleotides, 2'-O-Methyl modified nucleotides, and trans nucleic acids; and (d) amplifying the at least one target nucleic acid molecule to generate a plurality of terminated amplification products, wherein the at least one terminator nucleotide is attached to the 3' terminus of the terminated amplification products, and wherein the replication proceeds by strand displacement replication. Further provided herein are methods wherein the SCM is obtained by culturing one or more embryonic cells for about 5 to 7 days. Further provided herein are methods further comprising sequencing a cDNA library comprising polynucleotides amplified from mRNA transcripts from the SCM. Further provided herein are methods wherein sequencing the cDNA library comprises reverse transcription. Further provided herein are methods wherein the mRNA transcripts are amplified via template-switching reverse transcription. Further provided herein are methods wherein (b) comprises mixing an end-repair and A-tailing (ERAT) cocktail and a ligation buffer. 
Further provided herein are methods wherein (b) comprises polishing one or more ends of the nucleic acids using an end-repair and A-tailing (ERAT) cocktail and exposing the nucleic acids to a ligation buffer. Further provided herein are methods wherein the ERAT cocktail comprises one or more polymerases, a kinase, or both. Further provided herein are methods wherein the ERAT cocktail comprises one or more of T4 DNA polymerase, T4 polynucleotide kinase, and Taq-B polymerase. Further provided herein are methods wherein (b) comprises generating circular molecules using hairpin adapters. Further provided herein are methods wherein the hairpin adapter further comprises a linker.
INCORPORATION BY REFERENCE
[0006] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
[0008] FIGS. 1A-1B. FIG. 1A: DNA yields after ResolveDNA (Bioskryb Genomics) PTA (primary template-directed amplification) of short cfDNA size proxy fragments (100 bp-1000 bp). There is no detectable amplification of fragments below 200-300 bp and very poor amplification of fragments below 500 bp. FIG. 1B: Agilent High Sensitivity D5000 ScreenTape analysis of individual fragments before and after ResolveDNA amplification.
[0009] FIG. 2. Fragment size before and after phosphorylation by polynucleotide kinase (PNK) followed by ligation with T4 DNA ligase, run on Agilent High Sensitivity D5000 ScreenTape.
[0010] FIG. 3. DNA yields after ResolveDNA amplification of short cfDNA size proxy fragments (100 bp-1000 bp) before and after phosphorylation by PNK followed by ligation with T4 DNA ligase. There is no detectable amplification of small fragments before ligation. After ligation of small fragments, DNA yields after amplification were restored to the amounts expected for the ResolveDNA kit.
[0011] FIG. 4. ResolveDNA-amplified DNA fragment size before and after phosphorylation by PNK followed by ligation with T4 DNA ligase, run on Agilent High Sensitivity D1000 ScreenTape. The size of the amplified DNA after ligation is as expected for the ResolveDNA kit.
[0012] FIGS. 5A-5B. Representative IGV snapshots for the cfOnco standard (from Horizon): coverage at the BRCA2 (FIG. 5A) and MSH3 (FIG. 5B) genes on chromosomes 13 and 5, respectively, upon lowpass sequencing. The ligated sample covered the genes at some of the regions reported by Horizon, whereas the non-ligated sample did not. In FIG. 5B, the ligated sample covered a region of the MSH3 gene (GRCh38 chr5:80873118-80873118) where a GCA to ACA variation (A1045T) occurred. The unligated sample did not show any coverage in this region. This variant has been reported in Horizon exome data for this sample (120x coverage using the Agilent SureSelect Human All Exon V6 kit and Illumina sequencing).
[0013] FIGS. 6A-6B. IGV snapshots of ResolveDNA coverage at FLT3-130bp (FIG. 6A) and FLT3-207bp (FIG. 6B) synthetic fragments: Lowpass sequencing data of the ligated sample (bottom) showed more uniform coverage of the targeted regions of FLT3-130 and FLT3-207 relative to the unligated sample (top).
[0014] FIGS. 7A-7B. IGV views of ResolveDNA coverage of a sample containing a 1:1 mixture of FLT3-130bp and FLT3-207bp synthetic fragments: Lowpass sequencing data showed that the ligated sample (bottom panels) of both FLT3-130 (FIG. 7A) and FLT3-207 (FIG. 7B) harbored more uniform coverage at both regions relative to the unligated control (top panels).
[0015] FIGS. 8A-8B. IGV snapshots of representative single nucleotide variants (SNVs): FIG. 8A: Both unligated and ligated samples, when deep sequenced, showed the A1045T variation in MSH3; this is the only variation detected in the unligated sample. FIG. 8B: IGV view showing detection of dbSNP ID rs2453056 in NOTCH2 in the ligated but not in the unligated sample. Eighty-four variants reported by Horizon exome sequencing (150X) for cfDNA onco were detected only in the ligated sample.
[0016] FIGS. 9A-9D. Agilent High Sensitivity D5000 ScreenTape analysis of ligation of a 130 bp synthetic fragment with and without phosphorylation by PNK. Prior phosphorylation by PNK resulted in better ligation by T4 DNA ligase. FIG. 9A depicts an electrophoresis gel of ladder (lane 1), ligated (lane 2), +PNK + ligated (lane 3), and -PNK + ligated (lane 4). FIG. 9B depicts a Bioanalyzer trace of the FLT3-130bp fragment. FIG. 9C depicts a Bioanalyzer trace of the FLT3-130bp fragment treated with PNK and ligated with T4 DNA ligase. FIG. 9D depicts a Bioanalyzer trace of the FLT3-130bp fragment treated directly with T4 DNA ligase.
[0017] FIG. 10. Cell line proxy system for PTA-based ascertainment of embryo spent culture media (SCM), according to some embodiments. The workflow using embryo SCM (top) includes, in some embodiments, retrieval of 5-10 µL of SCM harvested post-culture (5-7 days), ligation, PTA/ResolveOME, and PGT-A. The workflow using SCM from a cell line proxy system (bottom) includes, in some embodiments, 10 mL of SCM that is centrifuged and from which extracellular DNA is purified by 1.2X SPRI, followed by ligation, PTA/ResolveOME, and copy number analysis. The cell line proxy system can comprise, in some embodiments, MOLM-13 AML cells and/or NA12878 B lymphoblasts.
[0018] FIG. 11. Circularization strategy for short DNA fragments to serve as template for PTA, according to some embodiments. The strategy can include using hairpin adapters to define the boundaries of the circular form and including a T-overhang adapter to facilitate efficient concatenation of A-tailed SCM small DNA fragments internal to the hairpin adapters.
[0019] FIG. 12. SCM ligation workflow options upstream of PTA in ResolveOME multiomic chemistry, according to some embodiments. The SCM ligation modes can include linear concatenation (ligase only) and circularization (hairpin adapter (+ linker) and ligase). The SCM input ligation step of the workflow can include ERAT followed by ligation as two steps, or ERAT + ligation in one step. The ResolveOME section of the workflow can comprise reverse transcription, lysis, PTA, separation of DNA and RNA, and PreAmp.
[0020] FIG. 13. Effect of ligation on PTA yield and on the MAD segment score, an indicator of CNV signal to noise, according to some embodiments. The left plot illustrates PTA yield (ng) for NA12878 (left) and Molm (right), with higher yields with ligase (“+ ligase”) than without (“- ligase”). The right plot illustrates the MAD segment score for NA12878 (left) and Molm (right), with lower scores with ligase (“+ ligase”) than without (“- ligase”).
[0021] FIG. 14. CNV profiles in NA12878 and MOLM-13 cells with alternative linear concatenation ligase workflows prior to PTA, according to some embodiments. The y-axis of each plot shows copy number from 0 to 8 in increments of 2, and the x-axis shows chromosomes 1 through 22 followed by X and Y. The left plots show results for NA12878 (MAD scores of 0.28, 0.29, and 0.39 from top to bottom, respectively), while the right plots show results for MOLM-13 (MAD scores of 0.56, 0.55, and 0.62 from top to bottom, respectively). The top plots show results from a 2-step 30 minute ligation, the middle plots from a 2-step 15 minute ligation, and the bottom plots from a 1-step ligation.
[0022] FIG. 15. CNV profile improvement upon inclusion of hairpin and linker in the SCM ligation scheme, indicative of template circularization, according to some embodiments. Results are shown for NA12878; the y-axis of each plot shows copy number from 0 to 8 in increments of 2, and the x-axis shows chromosomes 1 through 22 followed by X and Y. The top plot illustrates results with 0 nM hairpin and 0 nM linker, which had a MAD score of 0.55. The middle and bottom plots illustrate results with 400 nM hairpin and 400 nM linker, which had MAD scores of 0.37 and 0.39, respectively.
DETAILED DESCRIPTION
[0023] It can be necessary or beneficial to perform a series of reactions, screens, or measurements on short nucleic acid fragments, and such techniques have broad applications in research and clinical laboratories. In some aspects, provided herein are methods and systems for analysis of short nucleic acid fragments such as cfDNA or ctDNA. Such techniques may have applications in diagnostics, prognostics, point-of-care diagnostics, personalized medicine, drug discovery, and research.
[0024] Provided herein are systems and methods comprising primary template-directed amplification (PTA), as further described herein. The methods provided herein generally include those that make PTA amenable to short DNA fragments as templates. In some embodiments, amplification of short templates can be inherently challenging because the value of PTA stems from the low propensity of short amplicons to be re-copied and the concomitant focusing of the amplification machinery on the primary template. In some embodiments, by making PTA amenable to short template inputs, one or more beneficial attributes of PTA (e.g., completeness and uniformity of coverage with allelic balance) for the elucidation of genomic variants can in principle be extended to cell-free (cf) DNA, circulating tumor DNA (ctDNA), or potentially to short DNA fragments damaged by the FFPE process. Provided herein are methods and systems that illustrate the lower limits of template size (e.g., ~300 bp) for efficient PTA amplification, according to some embodiments. Further provided herein are ligation strategies to linearly concatenate short DNA fragments (e.g., <300 bp), making them amenable as PTA templates. Additionally, the systems and methods provided herein demonstrate enhanced sensitivity of single nucleotide variant (SNV) calling upon concatenation. In some instances, smaller fragments are joined using a ligase. In some instances, smaller fragments are joined using PCA.
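The length logic behind concatenation can be illustrated with a toy model. The sketch below is a minimal illustration only, assuming (hypothetically) that fragments are joined in input order until each concatemer reaches the ~300 bp lower limit noted above; real ligase reactions join fragments in random order and may also form circles. The names `concatenate_fragments` and `MIN_PTA_TEMPLATE_BP` are illustrative and are not part of any product or API.

```python
from typing import List

# Illustrative threshold taken from the ~300 bp lower limit discussed above.
MIN_PTA_TEMPLATE_BP = 300


def concatenate_fragments(fragment_lengths: List[int],
                          min_template_bp: int = MIN_PTA_TEMPLATE_BP) -> List[List[int]]:
    """Group fragment lengths (bp) into concatemers, in input order.

    Fragments already at or above min_template_bp are kept on their own.
    Shorter fragments are accumulated until their summed length reaches the
    threshold. A final group that never reaches it is still returned so that
    no input is dropped; callers can flag it as below threshold.
    """
    concatemers: List[List[int]] = []
    current: List[int] = []
    for length in fragment_lengths:
        if length <= 0:
            raise ValueError("fragment lengths must be positive")
        if length >= min_template_bp:
            concatemers.append([length])
            continue
        current.append(length)
        if sum(current) >= min_template_bp:
            concatemers.append(current)
            current = []
    if current:
        concatemers.append(current)
    return concatemers
```

For example, fragments of 150, 167, 130, 207, 400, and 140 bp yield the groups [150, 167], [130, 207], and [400], plus a leftover [140] that remains below the threshold.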
[0025] Further provided herein are systems and methods to ascertain nucleic acids from spent culture media (SCM) using ligation-mediated methodologies. Free nucleic acid found in cell culture media is often degraded to the size of a single nucleosome footprint (e.g., ~150 bp), while DNA species in SCM from human embryo samples can, in some instances, approach 400 bp, likely representing multi-nucleosome stretches. Thus, provided herein are systems and methods that extend template length for PTA amenability (e.g., as for ctDNA). In some embodiments, there is relevance to preimplantation genetic testing (PGT), whereby techniques that are non-invasive to the embryo (e.g., a spent culture media aliquot) are desired relative to the standard practice of physical embryo biopsy when ascertaining chromosomal copy number. In some embodiments, enabling PTA of SCM DNA via ligation, in the context of ResolveDNA genomic amplification or of the genomic amplification arm in DNA and RNA multiomic chemistry (referred to as “ResolveOME”), provides the opportunity to enhance current PGT-A applications. For example, enabling PTA in these contexts can enhance current PGT-A applications by overcoming the larger input amount requirements of library preparations lacking upstream genomic amplification and/or by accommodating inputs on the order of tens of picograms. Additionally, utilizing PTA for SCM PGT-A can provide the opportunity to improve copy number profiling quality relative to existing methodologies due to the inherent beneficial attributes of PTA.
[0026] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of any embodiment. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0027] Unless specifically stated or obvious from context, as used herein, the term “about” in reference to a number or range of numbers is understood to mean the stated number and numbers +/- 10% thereof, or 10% below the lower listed limit and 10% above the higher listed limit for the values listed for a range.
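As a worked illustration of this definition, the hypothetical helpers below compute the interval that "about" covers for a single value and for a range. For instance, "about 300 bp" spans 270-330 bp, and "about 5 to 7 days" spans 4.5-7.7 days. The function names are illustrative only.

```python
from typing import Tuple


def about(value: float) -> Tuple[float, float]:
    """Interval meant by 'about <value>': the stated number +/- 10%."""
    delta = abs(value) / 10
    return (value - delta, value + delta)


def about_range(low: float, high: float) -> Tuple[float, float]:
    """Interval meant by 'about <low> to <high>': 10% below the lower listed
    limit through 10% above the higher listed limit."""
    if low > high:
        raise ValueError("low must not exceed high")
    return (low - abs(low) / 10, high + abs(high) / 10)
```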
[0028] The term “nucleic acid” encompasses multi-stranded as well as single-stranded molecules. In some instances, nucleic acids are short nucleic acids. In some instances, short nucleic acids are 10-200, 20-200, 50-200, 75-200, 100-200, 125-200, 150-200, 100-300, 150-300, or 200-300 nucleotides in length. In double- or triple-stranded nucleic acids, the nucleic acid strands need not be coextensive (i.e., a double-stranded nucleic acid need not be double-stranded along the entire length of both strands). Nucleic acid templates described herein may be any size depending on the sample (from small cell-free DNA fragments to entire genomes), including but not limited to 50-300 bases, 100-2000 bases, 100-750 bases, 170-500 bases, 100-5000 bases, 50-10,000 bases, or 50-2000 bases in length. In some instances, templates are at least 50, 100, 200, 300, 400, 500, 1000, 2000, 5000, 10,000, 20,000, 50,000, 100,000, 200,000, 500,000, 1,000,000, or more than 1,000,000 bases in length. In some instances, circular templates are 50-300 bases, 100-2000 bases, 100-750 bases, 170-500 bases, 100-5000 bases, 50-10,000 bases, or 50-2000 bases in length. Circular nucleic acids in some instances comprise at least 1, 2, 3, 4, 5, 7, 10, 12, 15, 20, or more linear nucleic acid fragments which have been joined together. Methods described herein provide for the amplification of nucleic acids, such as nucleic acid templates. Methods described herein additionally provide for the generation of isolated and at least partially purified nucleic acids and libraries of nucleic acids. In some instances, methods described herein provide for extracted nucleic acids (e.g., extracted from tissues, cells, or media). In some embodiments, the nucleic acids are extracted from spent culture media (SCM). 
Nucleic acids include but are not limited to those comprising DNA, RNA, circular RNA, mtDNA (mitochondrial DNA), cfDNA (cell free DNA), cfRNA (cell free RNA), ctDNA (circulating tumor DNA), siRNA (small interfering RNA), cffDNA (cell free fetal DNA), mRNA, tRNA, rRNA, miRNA (microRNA), extrachromosomal DNAs (ecDNAs), synthetic polynucleotides, polynucleotide analogues, any other nucleic acid consistent with the specification, or any combinations thereof. The lengths of polynucleotides, when provided, are given as numbers of bases (or base pairs) and abbreviated, such as nt (nucleotides), bp (base pairs), kb (kilobases), or Gb (gigabases).
[0029] Described herein are nucleic acid amplification methods, such as “Primary Template-Directed Amplification (PTA).” With the PTA method, amplicons are preferentially generated from the primary template (“direct copies”) using a polymerase (e.g., a strand displacing polymerase). Consequently, errors are propagated at a lower rate from daughter amplicons during subsequent amplifications compared to multiple displacement amplification (MDA). In some examples, this method can amplify low DNA input, including the genomes of single cells, with high coverage breadth and uniformity in an accurate and reproducible manner. In some instances, PTA enables kinetic control of an amplification reaction. In some instances, PTA results in a pseudo-linear amplification reaction (rather than exponential amplification). Moreover, the terminated amplification products can undergo direct ligation after removal of the terminators, allowing for the attachment of a cell barcode to the amplification primers so that products from all cells can be pooled after undergoing parallel amplification reactions.
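The lower error propagation of direct copying can be made concrete with a deliberately simplified, hypothetical model that does not capture the actual kinetics of PTA or MDA. In idealized exponential amplification every molecule is re-copied each round, so an error made in round k is inherited by a 2^-k share of the final pool; if instead every copy is made only from the primary template, an error stays in a single molecule. Both function names are illustrative.

```python
def exponential_error_fraction(error_round: int, total_rounds: int) -> float:
    """Toy exponential (MDA-like) model starting from one template molecule.

    Every molecule is copied once per round, so the pool doubles each round
    and reaches 2**total_rounds molecules. A copy made with an error in round
    `error_round` (1-based) founds a lineage of 2**(total_rounds - error_round)
    molecules, i.e. a fraction 2**-error_round of the final pool.
    """
    if not 1 <= error_round <= total_rounds:
        raise ValueError("need 1 <= error_round <= total_rounds")
    lineage = 2 ** (total_rounds - error_round)
    pool = 2 ** total_rounds
    return lineage / pool


def direct_copy_error_fraction(total_copies: int) -> float:
    """Toy direct-copy (PTA-like) model: every copy is made from the primary
    template and is never re-copied, so an error in one copy stays in one
    molecule out of total_copies + 1 (the copies plus the template)."""
    if total_copies < 1:
        raise ValueError("total_copies must be >= 1")
    return 1.0 / (total_copies + 1)
```

With a final pool of 1,024 molecules, a first-round error in the exponential model reaches half of the pool, while in the direct-copy model it reaches 1/1024 of it.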
[0030] Described herein are methods employing nucleic acid polymerases with strand displacement activity for amplification. In some instances, such polymerases comprise strand displacement activity and a low error rate. In some instances, such polymerases comprise strand displacement activity and proofreading exonuclease activity, such as 3'->5' proofreading activity. For example, in some instances such polymerases include bacteriophage phi29 (Φ29) polymerase, which has a very low error rate that is the result of its 3'->5' proofreading exonuclease activity (see, e.g., U.S. Pat. Nos. 5,198,543 and 5,001,050). In some instances, nucleic acid polymerases are used in conjunction with other components such as reversible or irreversible terminators, or additional strand displacement factors. In some instances, the polymerase has strand displacement activity but does not have exonuclease proofreading activity. In some instances, examples of strand displacing nucleic acid polymerases include, e.g., genetically modified phi29 (Φ29) DNA polymerase, Klenow fragment of DNA polymerase I (Jacobsen et al., Eur. J. Biochem. 45:623-627 (1974)), phage M2 DNA polymerase (Matsumoto et al., Gene 84:247 (1989)), phage phiPRD1 DNA polymerase (Jung et al., Proc. Natl. Acad. Sci. USA 84:8287 (1987); Zhu and Ito, Biochim. Biophys. Acta. 1219:267-276 (1994)), Bst DNA polymerase (e.g., Bst large fragment DNA polymerase (Exo(-) Bst; Aliotta et al., Genet. Anal. (Netherlands) 12:185-195 (1996))), exo(-)Bca DNA polymerase (Walker and Linn, Clinical Chemistry 42:1604-1608 (1996)), Bsu DNA polymerase, VentR DNA polymerase including VentR (exo-) DNA polymerase (Kong et al., J. Biol. Chem. 268:1965-1975 (1993)), Deep Vent DNA polymerase including Deep Vent (exo-) DNA polymerase, IsoPol DNA polymerase, DNA polymerase I, Therminator DNA polymerase, T5 DNA polymerase (Chatterjee et al., Gene 97:13-19 (1991)), Sequenase (U.S. Biochemicals), T7 DNA polymerase, T7-Sequenase, T7 gp5 DNA polymerase, PRD1 DNA polymerase, and T4 DNA polymerase (Kaboord and Benkovic, Curr. Biol. 5:149-157 (1995)). Additional strand displacing nucleic acid polymerases are also compatible with the methods described herein. The ability of a given polymerase to carry out strand displacement replication can be determined, for example, by using the polymerase in a strand displacement replication assay (e.g., as disclosed in U.S. Pat. No. 6,977,148). Such assays in some instances are performed at a temperature suitable for optimal activity of the enzyme being used, for example, 32°C for phi29 DNA polymerase, from 46°C to 64°C for exo(-) Bst DNA polymerase, or from about 60°C to 70°C for an enzyme from a hyperthermophilic organism. Another useful assay for selecting a polymerase is the primer-block assay described in Kong et al., J. Biol. Chem. 268:1965-1975 (1993). The assay consists of a primer extension assay using an M13 ssDNA template in the presence or absence of an oligonucleotide that is hybridized upstream of the extending primer to block its progress. Other enzymes capable of displacing the blocking primer in this assay are in some instances useful for the disclosed method. In some instances, polymerases incorporate dNTPs and terminators at approximately equal rates. In some instances, the ratio of rates of incorporation for dNTPs and terminators for a polymerase described herein is about 1:1, about 1.5:1, about 2:1, about 3:1, about 4:1, about 5:1, about 10:1, about 20:1, about 50:1, about 100:1, about 200:1, about 500:1, or about 1000:1. In some instances, the ratio of rates of incorporation for dNTPs and terminators for a polymerase described herein is 1:1 to 1000:1, 2:1 to 500:1, 5:1 to 100:1, 10:1 to 1000:1, 100:1 to 1000:1, 500:1 to 2000:1, 50:1 to 1500:1, or 25:1 to 1000:1.
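One way to read these incorporation-rate ratios is through a simple competition model, which is an assumption made here for illustration and not a model stated in this disclosure. At each position, the polymerase incorporates a dNTP or a terminator in proportion to relative efficiency times concentration, and both pools are assumed to be split evenly across the four bases so that the per-position ratio equals the overall ratio. The function name and example concentrations are hypothetical.

```python
def terminator_incorporation_probability(dntp_mM: float,
                                         terminator_mM: float,
                                         rate_ratio: float = 1.0) -> float:
    """Probability that a given incorporation event adds a terminator.

    Assumes simple competition at each position: each species is incorporated
    in proportion to (relative efficiency x concentration), with dNTPs
    `rate_ratio` times more efficient than terminators (1.0 = equal rates).
    """
    if dntp_mM < 0 or terminator_mM < 0 or rate_ratio <= 0:
        raise ValueError("concentrations must be >= 0 and rate_ratio > 0")
    weighted_dntp = rate_ratio * dntp_mM
    total = weighted_dntp + terminator_mM
    if total == 0:
        raise ValueError("at least one nucleotide species must be present")
    return terminator_mM / total
```

With 1 mM dNTPs and 0.01 mM terminators, equal rates give a per-position terminator probability of about 0.0099, whereas a 10:1 preference for dNTPs lowers it to about 0.001.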
[0031] A nucleotide mixture used herein for PTA may comprise dNTPs. In some instances, dNTPs comprise one or more of dA, dT, dG, and dC. In some instances, the concentration of dNTPs is no more than 10, 8, 7, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or 0.01 mM. In some instances, the concentration of dNTPs is 0.5-10, 0.5-5, 0.5-3, 0.5-2.5, 0.5-2, 0.5-1.5, 0.5-1, 0.1-5, 0.1-3, 1-3, or 1-2 mM. Such mixtures in some instances also comprise one or more terminators.
[0032] A nucleotide mixture used herein for PTA may comprise terminators. In some instances, terminators comprise ddNTPs. In some instances, terminators comprise irreversible terminators. In some instances, irreversible terminators comprise alpha-thio dideoxynucleotides. In some instances, the concentration of terminators is no more than 1, 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, or 0.001 mM. In some instances, the concentration of terminators is 0.05-1, 0.05-0.5, 0.05-0.3, 0.05-0.25, 0.05-0.2, 0.05-0.15, 0.05-0.1, 0.01-0.5, 0.01-0.3, 0.1-0.3, or 0.1-0.2 mM.
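The concentration ranges above translate into pipetting volumes via the dilution relation C1·V1 = C2·V2. The sketch below assumes a hypothetical 20 µL reaction, a 10 mM dNTP stock, and a 1 mM terminator stock; these values and the helper names are illustrative choices within the stated ranges, not a validated protocol.

```python
def stock_volume_ul(stock_mM: float, final_mM: float, reaction_ul: float) -> float:
    """Volume of stock (uL) that gives final_mM in reaction_ul, from C1*V1 = C2*V2."""
    if stock_mM <= 0 or final_mM < 0 or reaction_ul <= 0:
        raise ValueError("stock concentration and reaction volume must be positive")
    if final_mM > stock_mM:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_mM * reaction_ul / stock_mM


def nonterminator_to_terminator(dntp_mM: float, terminator_mM: float) -> float:
    """Molar ratio of non-terminator to terminator nucleotides (X in 'X:1')."""
    if terminator_mM <= 0:
        raise ValueError("terminator concentration must be positive")
    return dntp_mM / terminator_mM


# Hypothetical example: 20 uL reaction, 1 mM dNTPs from a 10 mM stock,
# and 0.01 mM terminators from a 1 mM stock.
REACTION_UL = 20.0
dntp_ul = stock_volume_ul(10.0, 1.0, REACTION_UL)        # 2.0 uL of dNTP stock
terminator_ul = stock_volume_ul(1.0, 0.01, REACTION_UL)  # 0.2 uL of terminator stock
ratio = nonterminator_to_terminator(1.0, 0.01)           # 100, i.e. 100:1
```

In this example the resulting non-terminator to terminator ratio of 100:1 falls within the ratios discussed for terminator nucleotides in this specification.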
[0033] Described herein are methods of amplification wherein strand displacement can be facilitated through the use of a strand displacement factor, such as, e.g., helicase. Such factors are in some instances used in conjunction with additional amplification components, such as polymerases, terminators, or other components. In some instances, a strand displacement factor is used with a polymerase that does not have strand displacement activity. In some instances, a strand displacement factor is used with a polymerase having strand displacement activity. Without being bound by theory, strand displacement factors may increase the rate at which smaller, double-stranded amplicons are reprimed. In some instances, any DNA polymerase that can perform strand displacement replication in the presence of a strand displacement factor is suitable for use in the PTA method, even if the DNA polymerase does not perform strand displacement replication in the absence of such a factor. Strand displacement factors useful in strand displacement replication in some instances include (but are not limited to) BMRF1 polymerase accessory subunit (Tsurumi et al., J. Virology 67(12):7648-7653 (1993)), adenovirus DNA-binding protein (Zijderveld and van der Vliet, J. Virology 68(2):1158-1164 (1994)), herpes simplex viral protein ICP8 (Boehmer and Lehman, J. Virology 67(2):711-715 (1993); Skaliter and Lehman, Proc. Natl. Acad. Sci. USA 91(22):10665-10669 (1994)); single-stranded DNA binding proteins (SSB; Rigler and Romano, J. Biol. Chem. 270:8910-8919 (1995)); phage T4 gene 32 protein (Villemain and Giedroc, Biochemistry 35:14395-14404 (1996)); T7 helicase-primase; T7 gp2.5 SSB protein; Tte-UvrD (from Thermoanaerobacter tengcongensis); calf thymus helicase (Siegel et al., J. Biol. Chem. 267:13629-13635 (1992)); bacterial SSB (e.g., E. coli SSB); Replication Protein A (RPA) in eukaryotes; human mitochondrial SSB (mtSSB); and recombinases (e.g., Recombinase A (RecA) family proteins, T4 UvsX, T4 UvsY, Sak4 of Phage HK620, Rad51, Dmc1, or Radb). Combinations of factors that facilitate strand displacement and priming are also consistent with the methods described herein. For example, in some instances a helicase is used in conjunction with a polymerase. In some instances, the PTA method comprises use of a single-strand DNA binding protein (SSB, T4 gp32, or other single-stranded DNA binding protein), a helicase, and a polymerase (e.g., Sau DNA polymerase, Bsu polymerase, Bst2.0, GspM, GspM2.0, GspSSD, or other suitable polymerase). In some instances, reverse transcriptases are used in conjunction with the strand displacement factors described herein. In some instances, amplification is conducted using a polymerase and a nicking enzyme (e.g., “NEAR”), such as those described in US 9,617,586. In some instances, the nicking enzyme is Nt.BspQI, Nb.BbvCI, Nb.BsmI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nt.BbvCI, Nt.BstNBI, Nt.CviPII, Nb.Bpu10I, or Nt.Bpu10I.
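In nicking-based amplification, the positions of nicks follow from the enzyme's recognition site. As an illustration, the sketch below locates nick sites for Nt.BstNBI on one strand, assuming the recognition sequence 5'-GAGTC-3' with the nick 4 nt 3' of the site (GAGTCNNNN^), as listed in vendor documentation. Sites on the complementary strand are ignored, the helper name is hypothetical, and the site definition should be checked against the supplier's datasheet before any real use.

```python
import re
from typing import List

# Assumed Nt.BstNBI site: 5'-GAGTC-3', nick 4 nt downstream on the same
# strand (GAGTCNNNN^). Verify against the supplier's datasheet before use.
RECOGNITION = "GAGTC"
NICK_OFFSET = len(RECOGNITION) + 4


def nick_positions(strand: str) -> List[int]:
    """Return 0-based indices i such that a nick falls between base i-1 and
    base i of the given strand (read 5'->3'). Only sites on this strand are
    considered, and a nick is reported only if at least one base lies 3' of it."""
    seq = strand.upper()
    positions = []
    # Zero-width lookahead so that every occurrence of the site is found.
    for match in re.finditer(f"(?={RECOGNITION})", seq):
        nick = match.start() + NICK_OFFSET
        if nick < len(seq):
            positions.append(nick)
    return positions
```

For example, in `AAGAGTCACGTACGT` the site starts at index 2, so the assumed nick falls before index 11.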
[0034] Described herein are amplification methods comprising use of terminator nucleotides, polymerases, and additional factors or conditions. For example, such factors are used in some instances to fragment the nucleic acid template(s) or amplicons during amplification. In some instances, such factors comprise endonucleases. In some instances, factors comprise transposases. In some instances, mechanical shearing is used to fragment nucleic acids during amplification. In some instances, nucleotides are added during amplification that may be fragmented through the addition of additional proteins or conditions. For example, in some instances uracil is incorporated into amplicons; treatment with uracil-DNA glycosylase fragments nucleic acids at uracil-containing positions. Additional systems for selective nucleic acid fragmentation are also in some instances employed, for example an engineered DNA glycosylase that cleaves modified cytosine-pyrene base pairs (Kwon et al., Chem. Biol. 2003, 10(4), 351).
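The uracil-based fragmentation can be modeled in an idealized way. Uracil-DNA glycosylase removes uracil bases to leave abasic sites, and once those sites are cleaved (for example by heat, alkali, or an AP endonuclease) the strand breaks at each former uracil. The hypothetical helper below treats every U in a strand as such a break and returns the remaining fragments; it ignores cleavage efficiency and the opposite strand.

```python
from typing import List


def fragment_at_uracils(strand: str) -> List[str]:
    """Idealized model: every U becomes a strand break and is lost, leaving the
    intervening stretches as fragments (empty stretches are dropped)."""
    return [piece for piece in strand.upper().split("U") if piece]
```

For example, `fragment_at_uracils("ACGUACGTTUGCA")` returns `["ACG", "ACGTT", "GCA"]`.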
[0035] Described herein are amplification methods comprising use of terminator nucleotides, which terminate nucleic acid replication thus decreasing the size of the amplification products. Such terminators are in some instances used in conjunction with polymerases, strand displacement factors, or other amplification components described herein. In some instances, terminator nucleotides reduce or lower the efficiency of nucleic acid replication. Such terminators in some instances reduce extension rates by at least 99.9%, 99%, 98%, 95%, 90%, 85%, 80%, 75%, 70%, or at least 65%. Such terminators in some instances reduce extension rates by 50%-90%, 60%-80%, 65%-90%, 70%-85%, 60%-90%, 70%-99%, 80%-99%, or 50%-80%. In some instances, terminators reduce the average amplicon product length by at least 99.9%, 99%, 98%, 95%, 90%, 85%, 80%, 75%, 70%, or at least 65%. Terminators in some instances reduce the average amplicon length by 50%-90%, 60%-80%, 65%-90%, 70%-85%, 60%-90%, 70%-99%, 80%-99%, or 50%-80%. In some instances, amplicons comprising terminator nucleotides form loops or hairpins which reduce a polymerase’s ability to use such amplicons as templates. Use of terminators in some instances slows the rate of amplification at initial amplification sites through the incorporation of terminator nucleotides (e.g., dideoxynucleotides that have been modified to make them exonuclease-resistant to terminate DNA extension), resulting in smaller amplification products. By producing smaller amplification products than the currently used methods (e.g., average length of 50-2000 nucleotides in length for PTA methods as compared to an average product length of >10,000 nucleotides for MDA methods), PTA amplification products in some instances undergo direct ligation of adapters without the need for fragmentation, allowing for efficient incorporation of cell barcodes and unique molecular identifiers (UMI).
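How terminators shorten products can be sketched with a geometric-length model, an illustrative assumption rather than a model given in this disclosure. If dNTPs and terminators are incorporated equally well, each added nucleotide is a terminator with probability p = 1/(ratio + 1), so the mean extension is ratio + 1 nucleotides. When only D bases of template remain, the mean of the capped length is (1 - (1 - p)^D)/p, which shows why very short templates such as cfDNA limit product length regardless of the terminator ratio. The function name is hypothetical.

```python
from typing import Optional


def mean_extension_length(nonterminator_to_terminator: float,
                          template_remaining_nt: Optional[int] = None) -> float:
    """Expected number of nucleotides added per extension in a toy model.

    Assumes equal incorporation efficiency for dNTPs and terminators, so each
    added nucleotide is a terminator with probability p = 1 / (ratio + 1) and
    the extension length is geometric with mean 1/p. If the polymerase can add
    at most `template_remaining_nt` bases before running off the template,
    the mean of the capped length is (1 - (1 - p)**D) / p.
    """
    if nonterminator_to_terminator < 0:
        raise ValueError("ratio must be non-negative")
    p = 1.0 / (nonterminator_to_terminator + 1.0)
    if template_remaining_nt is None:
        return 1.0 / p
    if template_remaining_nt < 0:
        raise ValueError("template_remaining_nt must be non-negative")
    return (1.0 - (1.0 - p) ** template_remaining_nt) / p
```

Under these assumptions, a 100:1 ratio gives a mean extension of about 101 nucleotides on a long template, but only about 78 nucleotides when just 150 bases of template remain.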
[0036] Terminator nucleotides are present at various concentrations depending on the polymerase, template, or other factors. For example, the amount of terminator nucleotides in some instances is expressed as a ratio of non-terminator nucleotides to terminator nucleotides in a method described herein. Such concentrations in some instances allow control of amplicon lengths. In some instances, the ratio of terminator to non-terminator nucleotides is modified for the amount of template present or the size of the template. In some instances, the ratio of terminator to non-terminator nucleotides is reduced for smaller sample sizes (e.g., femtogram to picogram range). In some instances, the ratio of non-terminator to terminator nucleotides is about 2:1, 5:1, 7:1, 10:1, 20:1, 50:1, 100:1, 200:1, 500:1, 1000:1, 2000:1, or 5000:1. In some instances, the ratio of non-terminator to terminator nucleotides is 2:1-10:1, 5:1-20:1, 10:1-100:1, 20:1-200:1, 50:1-1000:1, 50:1-500:1, 75:1-150:1, or 100:1-500:1. In some instances, at least one of the nucleotides present during amplification using a method described herein is a terminator nucleotide. Each terminator need not be present at approximately the same concentration; in some instances, ratios of each terminator present in a method described herein are optimized for a particular set of reaction conditions, sample type, or polymerase. Without being bound by theory, each terminator may possess a different efficiency for incorporation into the growing polynucleotide chain of an amplicon, in response to pairing with the corresponding nucleotide on the template strand. For example, in some instances a terminator pairing with cytosine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. 
In some instances a terminator pairing with thymine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances a terminator pairing with guanine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances a terminator pairing with adenine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances a terminator pairing with uracil is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. Any nucleotide capable of terminating nucleic acid extension by a nucleic acid polymerase is in some instances used as a terminator nucleotide in the methods described herein. In some instances, a reversible terminator is used to terminate nucleic acid replication. In some instances, a non-reversible terminator is used to terminate nucleic acid replication. Non-limiting examples of terminators include reversible and non-reversible nucleic acids and nucleic acid analogs, such as, e.g., 3’-blocked reversible terminator nucleotides, 3’-unblocked reversible terminator nucleotides, terminators comprising 2’ modifications of deoxynucleotides, terminators comprising modifications to the nitrogenous base of deoxynucleotides, or any combination thereof. In one embodiment, terminator nucleotides are dideoxynucleotides.
Other nucleotide modifications that terminate nucleic acid replication and may be suitable for practicing the invention include, without limitation, any modification of the R group of the 3’ carbon of the deoxyribose, such as inverted dideoxynucleotides, 3' biotinylated nucleotides, 3' amino nucleotides, 3'-phosphorylated nucleotides, 3'-O-methyl nucleotides, 3' carbon spacer nucleotides including 3' C3 spacer nucleotides, 3' C18 nucleotides, 3' hexanediol spacer nucleotides, acyclonucleotides, and combinations thereof. In some instances, terminators are polynucleotides 1, 2, 3, 4, or more bases in length. In some instances, terminators do not comprise a detectable moiety or tag (e.g., mass tag, fluorescent tag, dye, radioactive atom, or other detectable moiety). In some instances, terminators do not comprise a chemical moiety allowing for attachment of a detectable moiety or tag (e.g., “click” azide/alkyne, conjugate addition partner, or other chemical handle for attachment of a tag). In some instances, all terminator nucleotides comprise the same amplification-reducing modification at the same region of the nucleotide (e.g., the sugar moiety, base moiety, or phosphate moiety). In some instances, at least one terminator has a different modification that reduces amplification. In some instances, all terminators have substantially similar fluorescent excitation or emission wavelengths. In some instances, terminators without modification to the phosphate group are used with polymerases that do not have exonuclease proofreading activity. Terminators, when used with polymerases which have 3’->5’ proofreading exonuclease activity (such as, e.g., phi29) that can remove the terminator nucleotide, are in some instances further modified to make them exonuclease-resistant.
For example, dideoxynucleotides are modified with an alpha-thio group that creates a phosphorothioate linkage, which makes these nucleotides resistant to the 3’->5’ proofreading exonuclease activity of nucleic acid polymerases. Such modifications in some instances reduce the exonuclease proofreading activity of polymerases by at least 99.5%, 99%, 98%, 95%, 90%, or at least 85%. Examples of other terminator nucleotide modifications providing resistance to the 3’->5’ exonuclease activity include in some instances: nucleotides with modification to the alpha group, such as alpha-thio dideoxynucleotides creating a phosphorothioate bond, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2'-fluoro bases, 3' phosphorylation, 2'-O-methyl modifications (or other 2’-O-alkyl modifications), propyne-modified bases (e.g., deoxycytosine, deoxyuridine), L-DNA nucleotides, L-RNA nucleotides, nucleotides with inverted linkages (e.g., 5’-5’ or 3’-3’), 5’ inverted bases (e.g., 5’ inverted 2’,3’-dideoxy dT), methylphosphonate backbones, and trans nucleic acids. In some instances, nucleotides with modification include base-modified nucleic acids comprising free 3’ OH groups (e.g., 2-nitrobenzyl alkylated HOMedU triphosphates, or bases modified with large chemical groups, such as beads or other large moieties). In some instances, a polymerase with strand displacement activity but without 3’->5’ exonuclease proofreading activity is used with terminator nucleotides, with or without modifications to make them exonuclease-resistant. Such nucleic acid polymerases include, without limitation, Bst DNA polymerase, Bsu DNA polymerase, Deep Vent (exo-) DNA polymerase, Klenow Fragment (exo-) DNA polymerase, Therminator DNA polymerase, and VentR (exo-) DNA polymerase.
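The statement that the non-terminator to terminator ratio allows control of amplicon length can be illustrated with a simplified competition model. The sketch below assumes, hypothetically, that at every position the polymerase chooses between a terminator and its natural counterpart in proportion to their concentrations, scaled by a relative incorporation efficiency (`rel_efficiency`, an illustrative parameter not taken from this disclosure), so that extension length is geometrically distributed. Real incorporation kinetics differ by polymerase, base, and terminator chemistry; this is a sketch under stated assumptions, not the method described above.

```python
import random
from typing import List


def termination_probability(ratio: float, rel_efficiency: float = 1.0) -> float:
    """Per-position probability that a terminator is incorporated.

    ratio: non-terminator:terminator nucleotide ratio (e.g. 100.0 for 100:1).
    rel_efficiency: hypothetical incorporation efficiency of the terminator
        relative to the natural nucleotide (1.0 = no discrimination).
    """
    if ratio <= 0 or rel_efficiency <= 0:
        raise ValueError("ratio and rel_efficiency must be positive")
    # Competitive incorporation: p = e*[T] / (e*[T] + [N]) = e / (e + ratio)
    return rel_efficiency / (rel_efficiency + ratio)


def mean_extension_length(ratio: float, rel_efficiency: float = 1.0) -> float:
    """Mean nucleotides added before termination (geometric distribution, mean 1/p)."""
    return 1.0 / termination_probability(ratio, rel_efficiency)


def simulate_lengths(ratio: float, rel_efficiency: float = 1.0,
                     n: int = 10_000, seed: int = 0) -> List[int]:
    """Monte Carlo draw of extension lengths under the same idealized model."""
    rng = random.Random(seed)
    p = termination_probability(ratio, rel_efficiency)
    lengths = []
    for _ in range(n):
        length = 1
        while rng.random() >= p:  # extend until a terminator is incorporated
            length += 1
        lengths.append(length)
    return lengths


if __name__ == "__main__":
    for r in (10, 100, 1000):
        print(f"{r}:1 -> mean ~{mean_extension_length(r):.0f} nt (equal efficiency), "
              f"~{mean_extension_length(r, 0.1):.0f} nt (terminator 10x less efficient)")
```

Under these assumptions a 100:1 ratio with equal efficiency gives a mean of about 101 nucleotides per extension, and any discrimination against the terminator (`rel_efficiency` below 1) lengthens products proportionally, which is one way to reason about why optimal ratios depend on the polymerase and terminator chemistry.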
[0037] Described herein are amplicon libraries resulting from amplification of at least one target nucleic acid molecule. Such libraries are in some instances generated using the methods described herein, such as those using terminators. Such methods comprise use of strand displacement polymerases or factors, terminator nucleotides (reversible or irreversible), or other features and embodiments described herein. In some instances, reversible terminators are capable of removal by an exonuclease (e.g., by a polymerase having exonuclease activity). In some instances, irreversible terminators are not capable of substantial removal by an exonuclease (e.g., by a polymerase having exonuclease activity). In some instances, amplicon libraries generated by use of terminators described herein are further amplified in a subsequent amplification reaction (e.g., PCR). In some instances, subsequent amplification reactions do not comprise terminators. In some instances, amplicon libraries comprise polynucleotides, wherein at least 50%, 60%, 70%, 80%, 90%, 95%, or at least 98% of the polynucleotides comprise at least one terminator nucleotide. In some instances, the amplicon library comprises the target nucleic acid molecule from which the amplicon library was derived. The amplicon library comprises a plurality of polynucleotides, wherein at least some of the polynucleotides are direct copies (e.g., replicated directly from a target nucleic acid molecule, such as genomic DNA, RNA, or other target nucleic acid). For example, at least 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more than 95% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 5% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 10% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule.
In some instances, at least 15% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 20% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 50% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, 3%-5%, 3%-10%, 5%-10%, 10%-20%, 20%-30%, 30%-40%, 5%-30%, 10%-50%, or 15%-75% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least some of the polynucleotides are direct copies of the target nucleic acid molecule or daughter progeny (progeny of a first copy of the target nucleic acid). For example, at least 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more than 95% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 5% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 10% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 20% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 30% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, 3%-5%, 3%-10%, 5%-10%, 10%-20%, 20%-30%, 30%-40%, 5%-30%, 10%-50%, or 15%-75% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, direct copies of the target nucleic acid are 50-2500, 75-2000, 50-2000, 25-1000, 50-1000, or 500-2000 bases in length.
In some instances, daughter progeny are 1000-5000, 2000-5000, 1000-10,000, 1500-5000, 3000-7000, or 2000-7000 bases in length. In some instances, the average length of PTA amplification products is 25-3000, 50-2500, 75-2000, 50-2000, 25-1000, 50-1000, or 500-2000 bases. In some instances, amplicons generated from PTA are no more than 5000, 4000, 3000, 2000, 1700, 1500, 1200, 1000, 700, 500, or no more than 300 bases in length. In some instances, amplicons generated from PTA are 1000-5000, 1000-3000, 200-2000, 200-4000, 500-2000, 750-2500, or 1000-2000 bases in length. Amplicon libraries generated using the methods described herein in some instances comprise at least 1000, 2000, 5000, 10,000, 100,000, 200,000, 500,000, or more than 500,000 amplicons comprising unique sequences. In some instances, the library comprises at least 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 2000, 2500, 3000, or at least 3500 amplicons. In some instances, at least 5%, 10%, 15%, 20%, 25%, 30%, or more than 30% of amplicon polynucleotides having a length of less than 1000 bases are direct copies of the at least one target nucleic acid molecule. In some instances, at least 5%, 10%, 15%, 20%, 25%, 30%, or more than 30% of amplicon polynucleotides having a length of no more than 2000 bases are direct copies of the at least one target nucleic acid molecule. In some instances, at least 5%, 10%, 15%, 20%, 25%, 30%, or more than 30% of amplicon polynucleotides having a length of 3000-5000 bases are direct copies of the at least one target nucleic acid molecule. In some instances, the ratio of direct copy amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1.
In some instances, the ratio of direct copy amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1, wherein the direct copy amplicons are no more than 700-1200 bases in length. In some instances, the ratio of direct copy amplicons and daughter amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1. In some instances, the ratio of direct copy amplicons and daughter amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1, wherein the direct copy amplicons are 700-1200 bases in length, and the daughter amplicons are 2500-6000 bases in length. In some instances, the library comprises about 50-10,000, about 50-5,000, about 50-2500, about 50-1000, about 150-2000, about 250-3000, about 50-2000, about 500-2000, or about 500-1500 amplicons which are direct copies of the target nucleic acid molecule. In some instances, the library comprises about 50-10,000, about 50-5,000, about 50-2500, about 50-1000, about 150-2000, about 250-3000, about 50-2000, about 500-2000, or about 500-1500 amplicons which are direct copies of the target nucleic acid molecule or daughter amplicons. The number of direct copies may be controlled in some instances by the number of amplification cycles. In some instances, no more than 30, 25, 20, 15, 13, 11, 10, 9, 8, 7, 6, 5, 4, or 3 cycles are used to generate copies of the target nucleic acid molecule. In some instances, about 30, 25, 20, 15, 13, 11, 10, 9, 8, 7, 6, 5, 4, or about 3 cycles are used to generate copies of the target nucleic acid molecule. In some instances, 3, 4, 5, 6, 7, or 8 cycles are used to generate copies of the target nucleic acid molecule.
In some instances, 2-4, 2-5, 2-7, 2-8, 2-10, 2-15, 3-5, 3-10, 3-15, 4-10, 4-15, 5-10 or 5-15 cycles are used to generate copies of the target nucleic acid molecule. Amplicon libraries generated using the methods described herein are in some instances subjected to additional steps, such as adapter ligation and further amplification. In some instances, such additional steps precede a sequencing step. In some instances, the cycles are PCR cycles. In some instances, the cycles represent annealing, extension, and denaturation. In some instances, the cycles represent annealing, extension, and denaturation which occur under isothermal or essentially isothermal conditions.
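Why the number of cycles controls the number of direct copies can be made concrete with an idealized count. The sketch below is a hypothetical worst case that ignores terminators entirely: starting from one single-stranded template, every strand is copied exactly once per cycle with 100% efficiency (PCR-like exponential behavior). "Second-generation" here means copies made from a direct copy, which is one reading of the daughter progeny described above; the function name and counting scheme are illustrative assumptions.

```python
def idealized_copy_counts(cycles: int) -> dict:
    """Strand counts derived from ONE single-stranded template after `cycles`.

    Hypothetical assumptions: 100% efficiency, every strand serves as a
    template once per cycle, and no terminator-induced dropout.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    # The original template is copied once in every cycle.
    direct = cycles
    # A direct copy made in cycle k is itself copied in cycles k+1..n,
    # yielding n - k second-generation copies; the sum is n(n-1)/2.
    second_generation = sum(cycles - k for k in range(1, cycles + 1))
    # The strand count doubles each cycle: 2**n strands, minus the original.
    total_new = 2 ** cycles - 1
    fraction_direct = direct / total_new if total_new else 0.0
    return {
        "direct": direct,
        "second_generation": second_generation,
        "total_new": total_new,
        "fraction_direct": fraction_direct,
    }


if __name__ == "__main__":
    for n in (3, 5, 8, 15):
        c = idealized_copy_counts(n)
        print(n, c["direct"], c["second_generation"], c["total_new"],
              f"{c['fraction_direct']:.4f}")
```

Under these assumptions, 3 cycles leave direct copies as 3 of 7 new strands, whereas 15 cycles leave 15 of 32,767. This is one way to see why small cycle numbers favor direct copies, and why terminator-limited, quasi-linear amplification helps keep later-generation copies from dominating.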
[0038] Methods described herein may additionally comprise one or more enrichment or purification steps. In some instances, one or more polynucleotides (such as cDNA, PTA amplicons, or other polynucleotide) are enriched during a method described herein. In some instances, polynucleotide probes are used to capture one or more polynucleotides. In some instances, probes are configured to capture one or more genomic exons. In some instances, a library of probes comprises at least 1000, 2000, 5000, 10,000, 50,000, 100,000, 200,000, 500,000, or more than 1 million different sequences. In some instances, a library of probes comprises sequences capable of binding to at least 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10,000 or more than 10,000 genes. In some instances, probes comprise a moiety for capture by a bead, such as biotin. In some instances, an enrichment step occurs after a PTA step. In some instances, an enrichment step occurs before a PTA step. In some instances, probes are configured to bind genomic DNA libraries. In some instances, probes are configured to bind cDNA libraries.
[0039] Amplicon libraries of polynucleotides generated from the PTA methods and compositions (terminators, polymerases, etc.) described herein in some instances have increased uniformity. Uniformity, in some instances, is described using a Lorenz curve or other such method. Such increases in some instances reduce the number of sequencing reads needed for the desired coverage of a target nucleic acid molecule (e.g., genomic DNA, RNA, or other target nucleic acid molecule). For example, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 80% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 60% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 70% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 90% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, uniformity is described using a Gini index (wherein an index of 0 represents perfect equality of the library and an index of 1 represents perfect inequality). In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, 0.50, 0.45, 0.40, or 0.30. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50. In some instances, amplicon libraries described herein have a Gini index of no more than 0.40. Such uniformity metrics in some instances are dependent on the number of reads obtained. For example, no more than 100 million, 200 million, 300 million, 400 million, or no more than 500 million reads are obtained.
In some instances, the read length is about 50, 75, 100, 125, 150, 175, 200, 225, or about 250 bases. In some instances, uniformity metrics are dependent on the depth of coverage of a target nucleic acid. For example, the average depth of coverage is about 10X, 15X, 20X, 25X, or about 30X. In some instances, the average depth of coverage is 10-30X, 20-50X, 5-40X, 20-60X, 5-20X, or 10-20X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein about 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein about 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein about 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is about 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is about 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is about 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is at least 15X.
In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is at least 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is at least 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is no more than 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is no more than 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is no more than 15X. Uniform amplicon libraries generated using the methods described herein are in some instances subjected to additional steps, such as adapter ligation and further PCR amplification. In some instances, such additional steps precede a sequencing step.
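The Gini index and Lorenz curve referred to above can be computed from read counts per genomic bin. The sketch below assumes coverage has already been summarized into per-bin counts (the bin size and binning scheme are assumptions not specified here) and uses the standard sorted-rank formula for the Gini coefficient, where 0 is perfectly even coverage and values approach 1 as reads concentrate in fewer bins. The example counts are hypothetical.

```python
from typing import List, Sequence, Tuple


def gini_index(counts: Sequence[float]) -> float:
    """Gini coefficient of per-bin read counts (0 = perfectly uniform)."""
    xs = sorted(counts)
    n = len(xs)
    if n == 0 or any(x < 0 for x in xs):
        raise ValueError("counts must be a non-empty sequence of non-negative values")
    total = sum(xs)
    if total == 0:
        raise ValueError("at least one bin must have reads")
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, x ascending, i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def lorenz_curve(counts: Sequence[float]) -> List[Tuple[float, float]]:
    """Points (cumulative fraction of bins, cumulative fraction of reads),
    with bins ordered from least to most covered."""
    xs = sorted(counts)
    n, total = len(xs), float(sum(xs))
    if n == 0 or total <= 0:
        raise ValueError("need at least one bin with reads")
    points, running = [(0.0, 0.0)], 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


if __name__ == "__main__":
    even = [30] * 8                        # hypothetical uniform coverage
    skewed = [1, 2, 3, 5, 8, 13, 21, 187]  # hypothetical skewed coverage
    print(gini_index(even), gini_index(skewed))
    print(lorenz_curve([1, 3]))
```

Because the text notes that these metrics depend on the number of reads and the depth of coverage, comparisons between libraries are most meaningful when the per-bin counts come from a similar read number.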
[0040] Primers comprise nucleic acids used for priming the amplification reactions described herein. Such primers in some instances include, without limitation, random deoxynucleotides of any length with or without modifications to make them exonuclease resistant, random ribonucleotides of any length with or without modifications to make them exonuclease resistant, modified nucleic acids such as locked nucleic acids, DNA or RNA primers that are targeted to a specific genomic region, and reactions that are primed with enzymes such as primase. In the case of whole-genome PTA, it is preferred that a set of primers having random or partially random nucleotide sequences be used. In a nucleic acid sample of significant complexity, specific nucleic acid sequences present in the sample need not be known and the primers need not be designed to be complementary to any particular sequence. Rather, the complexity of the nucleic acid sample results in a large number of different hybridization target sequences in the sample, which will be complementary to various primers of random or partially random sequence. The complementary portion of primers for use in PTA is in some instances fully randomized, comprises only a portion that is randomized, or is otherwise selectively randomized. The number of random base positions in the complementary portion of primers in some instances, for example, is from 20% to 100% of the total number of nucleotides in the complementary portion of the primers. In some instances, the number of random base positions in the complementary portion of primers is 10%-90%, 15%-95%, 20%-100%, 30%-100%, 50%-100%, 75%-100%, or 90%-95% of the total number of nucleotides in the complementary portion of the primers. In some instances, the number of random base positions in the complementary portion of primers is at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or at least 90% of the total number of nucleotides in the complementary portion of the primers.
Sets of primers having random or partially random sequences are in some instances synthesized using standard techniques by allowing the addition of any nucleotide at each position to be randomized. In some instances, sets of primers are composed of primers of similar length and/or hybridization characteristics. In some instances, the term “random primer” refers to a primer which can exhibit four-fold degeneracy at each position. In some instances, the term “random primer” refers to a primer which can exhibit three-fold degeneracy at each position. Random primers used in the methods described herein in some instances comprise a random sequence that is 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more bases in length. In some instances, primers comprise random sequences that are 3-20, 5-15, 5-20, 6-12, or 4-10 bases in length. Primers may also comprise non-extendable elements that limit subsequent amplification of amplicons generated therefrom. For example, primers with non-extendable elements in some instances comprise terminators. In some instances, primers comprise terminator nucleotides, such as 1, 2, 3, 4, 5, 10, or more than 10 terminator nucleotides. Primers need not be limited to components which are added externally to an amplification reaction. In some instances, primers are generated in situ through the addition of nucleotides and proteins which promote priming. For example, primase-like enzymes in combination with nucleotides are in some instances used to generate random primers for the methods described herein. Primase-like enzymes in some instances are members of the DnaG or AEP enzyme superfamily. In some instances, a primase-like enzyme is TthPrimPol. In some instances, a primase-like enzyme is T7 gp4 helicase-primase. Such primases are in some instances used with the polymerases or strand displacement factors described herein. In some instances, primases initiate priming with deoxyribonucleotides.
In some instances, primases initiate priming with ribonucleotides. In some instances, primers are irreversible primers. In some instances, irreversible primers comprise phosphorothioate linkages.
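The degeneracy described above (four-fold at a fully random position, three-fold at a position that excludes one base) determines how many distinct sequences a random or partially random primer set can contain. The sketch below uses standard IUPAC nucleotide codes to count that diversity, report the fraction of degenerate positions, and draw example primers. The patterns and function names are hypothetical illustrations, not primer designs from this disclosure.

```python
import random
from math import prod

# Standard IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def primer_diversity(pattern: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer pattern."""
    return prod(len(IUPAC[c]) for c in pattern.upper())


def degenerate_fraction(pattern: str) -> float:
    """Fraction of positions that allow more than one base."""
    p = pattern.upper()
    if not p:
        raise ValueError("pattern must be non-empty")
    return sum(len(IUPAC[c]) > 1 for c in p) / len(p)


def sample_primer(pattern: str, rng: random.Random) -> str:
    """Draw one concrete primer, choosing uniformly at each degenerate position."""
    return "".join(rng.choice(IUPAC[c]) for c in pattern.upper())


if __name__ == "__main__":
    rng = random.Random(42)
    for pat in ("NNNNNNNN", "HHHHHHHH", "NNNNNNGT"):  # hypothetical patterns
        print(pat, primer_diversity(pat), degenerate_fraction(pat),
              sample_primer(pat, rng))
```

For instance, a fully random 8-mer (NNNNNNNN) encodes 4⁸ = 65,536 sequences, whereas an 8-mer with three-fold degeneracy at every position (H, which excludes G) encodes 3⁸ = 6,561.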
[0041] The PTA amplification can be followed by selection for a specific subset of amplicons. Such selections are in some instances dependent on size, affinity, activity, hybridization to probes, or other selection factors known in the art. In some instances, selections precede or follow additional steps described herein, such as adapter ligation and/or library amplification. In some instances, selections are based on the size (length) of the amplicons. In some instances, smaller amplicons are selected because they are less likely to have undergone exponential amplification, which enriches for products that were derived from the primary template while further converting the amplification from an exponential into a quasi-linear amplification process. In some instances, amplicons 50-2000, 25-5000, 40-3000, 50-1000, 200-1000, 300-1000, 400-1000, 400-600, 600-2000, or 800-1000 bases in length are selected. Size selection in some instances is performed using protocols such as solid-phase reversible immobilization (SPRI) on carboxylated paramagnetic beads to enrich for nucleic acid fragments of specific sizes, or other protocols known to those skilled in the art. Optionally or in combination, selection occurs through preferential ligation and amplification of smaller fragments during PCR while preparing sequencing libraries, as well as through the preferential formation of clusters from smaller sequencing library fragments during sequencing (e.g., sequencing by synthesis, nanopore sequencing, or other sequencing method). Other strategies to select for smaller fragments are also consistent with the methods described herein and include, without limitation, isolating nucleic acid fragments of specific sizes after gel electrophoresis, the use of silica columns that bind nucleic acid fragments of specific sizes, and the use of other PCR strategies that more strongly enrich for smaller fragments.
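Physical size selection such as SPRI has no exact software equivalent, but when fragment or insert lengths are known (for instance, from paired-end alignments), an analogous length window can be applied computationally to estimate how much of a library falls within a chosen range. The sketch below is a hypothetical illustration: the window bounds are taken from one of the ranges listed above, and the input lengths are made up.

```python
from typing import Iterable, List, Tuple


def select_by_length(lengths: Iterable[int], min_len: int,
                     max_len: int) -> Tuple[List[int], float]:
    """Keep fragments whose length lies in [min_len, max_len] (inclusive).

    Returns the retained lengths and the fraction of fragments retained.
    """
    if min_len > max_len:
        raise ValueError("min_len must not exceed max_len")
    all_lengths = list(lengths)
    kept = [n for n in all_lengths if min_len <= n <= max_len]
    fraction = len(kept) / len(all_lengths) if all_lengths else 0.0
    return kept, fraction


if __name__ == "__main__":
    observed = [180, 350, 420, 560, 610, 880, 1250, 2400, 4100]  # hypothetical
    kept, frac = select_by_length(observed, 400, 1000)
    print(kept, f"{frac:.2f}")
```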
Any number of library preparation protocols may be used with the PTA methods described herein. Amplicons generated by PTA are in some instances ligated to adapters (optionally with removal of terminator nucleotides). In some instances, amplicons generated by PTA comprise regions of homology generated from transposase-based fragmentation, which are used as priming sites. In some instances, libraries are prepared by fragmenting nucleic acids mechanically or enzymatically. In some instances, libraries are prepared using tagmentation via transposomes. In some instances, libraries are prepared via ligation of adapters, such as Y-adapters, universal adapters, or circular adapters.
[0042] The non-complementary portion of a primer used in PTA can include sequences which can be used to further manipulate and/or analyze amplified sequences. An example of such a sequence is a “detection tag.” Detection tags have sequences complementary to detection probes and are detected using their cognate detection probes. There may be one, two, three, four, or more than four detection tags on a primer. There is no fundamental limit to the number of detection tags that can be present on a primer except the size of the primer. In some instances, there is a single detection tag on a primer. In some instances, there are two detection tags on a primer. When there are multiple detection tags, they may have the same sequence or they may have different sequences, with each different sequence complementary to a different detection probe. In some instances, multiple detection tags have the same sequence. In some instances, multiple detection tags have different sequences.
[0043] Another example of a sequence that can be included in the non-complementary portion of a primer is an “address tag” that can encode other details of the amplicons, such as the location in a tissue section. In some instances, a cell barcode comprises an address tag. An address tag has a sequence complementary to an address probe. Address tags become incorporated at the ends of amplified strands. If present, there may be one, or more than one, address tag on a primer. There is no fundamental limit to the number of address tags that can be present on a primer except the size of the primer. When there are multiple address tags, they may have the same sequence or they may have different sequences, with each different sequence complementary to a different address probe. The address tag portion can be any length that supports specific and stable hybridization between the address tag and the address probe. In some instances, nucleic acids from more than one source can incorporate a variable tag sequence. This tag sequence can be up to 100 nucleotides in length, preferably 1 to 10 nucleotides in length, most preferably 4, 5, or 6 nucleotides in length, and comprises combinations of nucleotides. In some instances, a tag sequence is 1-20, 2-15, 3-13, 4-12, 5-12, or 1-10 nucleotides in length. For example, if six base pairs are chosen to form the tag and a permutation of four different nucleotides is used, then a total of 4096 (4⁶) nucleic acid anchors (e.g., hairpins), each with a unique 6-base tag, can be made. In some instances, tags identify the source of a sample or analyte. In some instances, tags uniquely identify every molecule in a population.
[0044] Primers described herein may be present in solution or immobilized on a bead.
In some instances, primers bearing sample barcodes and/or UMI sequences can be immobilized on a bead. In some instances, individual cells are contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell. In some instances, lysates from individual cells are contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell lysates. In some instances, nucleic acid extracted from individual cells is contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the extracted nucleic acid from the individual cell. The beads can be manipulated in any suitable manner as is known in the art, for example, using droplet actuators as described herein. The beads may be any suitable size, including, for example, microbeads, microparticles, nanobeads, and nanoparticles. In some embodiments, beads are magnetically responsive; in other embodiments, beads are not significantly magnetically responsive. Examples of suitable beads include flow cytometry microbeads, polystyrene microparticles and nanoparticles, functionalized polystyrene microparticles and nanoparticles, coated polystyrene microparticles and nanoparticles, silica microbeads, fluorescent microspheres and nanospheres, functionalized fluorescent microspheres and nanospheres, coated fluorescent microspheres and nanospheres, color dyed microparticles and nanoparticles, magnetic microparticles and nanoparticles, superparamagnetic microparticles and nanoparticles (e.g., DYNABEADS® available from Invitrogen Group, Carlsbad, CA), fluorescent microparticles and nanoparticles, coated magnetic microparticles and nanoparticles, ferromagnetic microparticles and nanoparticles, coated ferromagnetic microparticles and nanoparticles, and those described in U.S. Pat. Appl. Pub. Nos. US20050260686, US20030132538, US20050118574, US20050277197, and US20060159962.
Beads may be pre-coupled with an antibody, protein or antigen, DNA/RNA probe, or any other molecule with an affinity for a desired target. In some embodiments, primers bearing sample barcodes and/or UMI sequences can be in solution. In certain embodiments, a plurality of droplets can be presented, wherein each droplet in the plurality bears a sample barcode that is unique to that droplet and UMIs that are unique to individual molecules, such that the UMIs are repeated many times within a collection of droplets. In some embodiments, individual cells are contacted with a droplet having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell. In some embodiments, lysates from individual cells are contacted with a droplet having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell lysates. In some embodiments, nucleic acid extracted from individual cells is contacted with a droplet having a unique set of sample barcodes and/or UMI sequences in order to identify the extracted nucleic acid from the individual cell.
[0045] PTA primers may comprise a sequence-specific or random primer, a cell barcode, and/or a unique molecular identifier (UMI) (e.g., a linear primer and/or a hairpin primer). In some instances, the PTA primer comprises a hairpin primer as shown for example in Figure 11. In some instances, a primer comprises a sequence-specific primer. In some instances, a primer comprises a random primer. In some instances, a primer comprises a cell barcode. In some instances, a primer comprises a sample barcode. In some instances, a primer comprises a unique molecular identifier. In some instances, primers comprise two or more cell barcodes. Such barcodes in some instances identify a unique sample source or unique workflow. Such barcodes or UMIs are in some instances 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 25, 30, or more than 30 bases in length.
Primers in some instances comprise at least 1000, 10,000, 50,000, 100,000, 250,000, 500,000, 10⁶, 10⁷, 10⁸, 10⁹, or at least 10¹⁰ unique barcodes or UMIs. In some instances, primers comprise at least 8, 16, 96, or 384 unique barcodes or UMIs. In some instances, a standard adapter is then ligated onto the amplification products prior to sequencing; after sequencing, reads are first assigned to a specific cell based on the cell barcode. Suitable adapters that may be utilized with the PTA method include, e.g., xGen® Dual Index UMI adapters available from Integrated DNA Technologies (IDT). Reads from each cell are then grouped using the UMI, and reads with the same UMI may be collapsed into a consensus read. The use of a cell barcode allows all cells to be pooled prior to library preparation, as they can later be identified by the cell barcode. The use of the UMI to form a consensus read in some instances corrects for PCR bias, improving copy number variation (CNV) detection. In addition, sequencing errors may be corrected by requiring that a fixed percentage of reads from the same molecule have the same base change detected at each position. This approach has been utilized to improve CNV detection and correct sequencing errors in bulk samples. In some instances, UMIs are used with the methods described herein; for example, U.S. Pat. No. 8,835,358 discloses the principle of digital counting after attaching a random amplifiable barcode. Schmitt et al. and Fan et al. disclose similar methods of correcting sequencing errors. In some instances, a library is generated for sequencing using primers. In some instances, the library comprises fragments of 200-700, 100-1000, 300-800, 300-550, 300-700, or 200-800 bases in length. In some instances, the library comprises fragments of at least 50, 100, 150, 200, 300, 500, 600, 700, 800, or at least 1000 bases in length.
In some instances, the library comprises fragments of about 50, 100, 150, 200, 300, 500, 600, 700, 800, or about 1000 bases in length.
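The cell-barcode assignment, UMI grouping, and consensus steps described in this paragraph can be sketched as follows. This is a simplified illustration, not the pipeline used with the PTA method: it assumes reads have already been demultiplexed into (cell barcode, UMI, sequence) tuples and are pre-aligned and of equal length, and the 70% agreement threshold is a hypothetical stand-in for the "fixed percentage" mentioned in the text. Positions that fail the threshold are reported as N. Function names are hypothetical.

```python
from collections import Counter, defaultdict


def group_reads(reads):
    """Group (cell_barcode, umi, sequence) tuples by cell barcode, then by UMI."""
    groups = defaultdict(lambda: defaultdict(list))
    for cell_barcode, umi, seq in reads:
        groups[cell_barcode][umi].append(seq)
    return groups


def consensus(seqs, min_fraction=0.7):
    """Per-position consensus over equal-length reads derived from one molecule.
    A base is kept only if at least min_fraction of the reads agree; otherwise 'N'."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("this sketch assumes equal-length, pre-aligned reads")
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


def collapse(reads, min_fraction=0.7):
    """Return {cell_barcode: {umi: consensus_read}}."""
    return {
        cell: {umi: consensus(seqs, min_fraction) for umi, seqs in by_umi.items()}
        for cell, by_umi in group_reads(reads).items()
    }


reads = [
    ("CELL01", "AACGT", "ACGTAC"),
    ("CELL01", "AACGT", "ACGTAC"),
    ("CELL01", "AACGT", "ACGTAC"),
    ("CELL01", "AACGT", "ACTTAC"),  # simulated sequencing error at the third base
    ("CELL02", "GGTCA", "TTGCAA"),
]
print(collapse(reads))
# {'CELL01': {'AACGT': 'ACGTAC'}, 'CELL02': {'GGTCA': 'TTGCAA'}}
```

A production pipeline would also need to tolerate sequencing errors within the UMI itself (for example, by merging UMIs that differ by one base) and might require a minimum number of reads per UMI before trusting a consensus; in this sketch a single-read UMI family passes trivially.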
[0046] The methods described herein may further comprise additional steps, including steps performed on the sample or template. Such samples or templates in some instances are subjected to one or more steps prior to PTA. In some instances, samples comprising cells are subjected to a pre-treatment step. For example, cells undergo lysis and proteolysis to increase chromatin accessibility using a combination of freeze-thawing, Triton X-100, Tween 20, and Proteinase K. Other lysis strategies may also be suitable for practicing the methods described herein. Such strategies include, without limitation, lysis using other combinations of detergent, lysozyme, and/or protease treatment; physical disruption of cells such as sonication; alkaline lysis; and/or hypotonic lysis. In some instances, the primary template or target molecule(s) is subjected to a pre-treatment step. In some instances, the primary template (or target) is denatured using sodium hydroxide, followed by neutralization of the solution. Other denaturing strategies may also be suitable for practicing the methods described herein. Such strategies may include, without limitation, combinations of alkaline lysis with other basic solutions, increasing the temperature of the sample and/or altering the salt concentration in the sample, addition of additives such as solvents or oils, other modifications, or any combination thereof. In some instances, additional steps include sorting, filtering, or isolating samples, templates, or amplicons by size. In some instances, cells are lysed by mechanical (e.g., high-pressure homogenization, bead milling) or non-mechanical (physical, chemical, or biological) methods. In some instances, physical lysis methods comprise heating, osmotic shock, and/or cavitation. In some instances, chemical lysis comprises alkali and/or detergents. In some instances, biological lysis comprises use of enzymes.
Combinations of lysis methods are also compatible with the methods described herein. Non-limiting examples of lysis enzymes include recombinant lysozyme, serine proteases, and bacterial lysins. In some instances, lysis with enzymes comprises use of lysozyme, lysostaphin, zymolyase, cellulase, protease, or glycanase. As an example of size-based selection, after amplification with the methods described herein, amplicon libraries are enriched for amplicons having a desired length. In some instances, amplicon libraries are enriched for amplicons having a length of 50-2000, 25-1000, 50-1000, 75-2000, 100-3000, 150-500, 75-250, 170-500, or 100-500 bases. In some instances, amplicon libraries are enriched for amplicons having a length of no more than 75, 100, 150, 200, 500, 750, 1000, 2000, 5000, or no more than 10,000 bases. In some instances, amplicon libraries are enriched for amplicons having a length of at least 25, 50, 75, 100, 150, 200, 500, 750, 1000, or at least 2000 bases.
[0047] Methods and compositions described herein may comprise buffers or other formulations. Such buffers are in some instances used for PTA, RT, or other methods described herein. Such buffers in some instances comprise surfactants/detergents or denaturing agents (Tween-20, DMSO, DMF, pegylated polymers comprising a hydrophobic group, or other surfactant), salts (potassium or sodium phosphate (monobasic or dibasic), sodium chloride, potassium chloride, Tris-HCl, magnesium chloride or sulfate, ammonium salts such as phosphate, nitrate, or sulfate, EDTA), reducing agents (DTT, THP, DTE, beta-mercaptoethanol, TCEP, or other reducing agent), or other components (glycerol, hydrophilic polymers such as PEG). In some instances, buffers are used in conjunction with components such as polymerases, strand displacement factors, terminators, or other reaction components described herein. Buffers may comprise one or more crowding agents. In some instances, crowding agents include polymers. In some instances, crowding agents comprise polymers such as polyols. In some instances, crowding agents comprise polyethylene glycol (PEG) polymers. In some instances, crowding agents comprise polysaccharides. Without limitation, examples of crowding agents include Ficoll (e.g., Ficoll PM 400, Ficoll PM 70, or other molecular weight Ficoll), PEG (e.g., PEG1000, PEG2000, PEG4000, PEG6000, PEG8000, or other molecular weight PEG), and dextran (e.g., dextran 6, dextran 10, dextran 40, dextran 70, dextran 6000, dextran 138k, or other molecular weight dextran).
[0048] The nucleic acid molecules amplified according to the methods described herein may be sequenced and analyzed using methods known to those of skill in the art.
Examples of sequencing methods that may be used include, e.g., sequencing by hybridization (SBH), sequencing by ligation (SBL) (Shendure et al. (2005) Science 309:1728), quantitative incremental fluorescent nucleotide addition sequencing (QIFNAS), stepwise ligation and cleavage, fluorescence resonance energy transfer (FRET), molecular beacons, TaqMan reporter probe digestion, pyrosequencing, fluorescent in situ sequencing (FISSEQ), FISSEQ beads (U.S. Pat. No. 7,425,431), wobble sequencing (Int. Pat. Appl. Pub. No. WO2006/073504), multiplex sequencing (U.S. Pat. Appl. Pub. No. US2008/0269068; Porreca et al., 2007, Nat. Methods 4:931), polymerized colony (POLONY) sequencing (U.S. Pat. Nos. 6,432,360, 6,485,944, and 6,511,803, and Int. Pat. Appl. Pub. No. WO2005/082098), nanogrid rolling circle sequencing (ROLONY) (U.S. Pat. No. 9,624,538), allele-specific oligo ligation assays (e.g., oligo ligation assay (OLA), single template molecule OLA using a ligated linear probe and a rolling circle amplification (RCA) readout, ligated padlock probes, and/or single template molecule OLA using a ligated circular padlock probe and a rolling circle amplification (RCA) readout), high-throughput sequencing methods, e.g., methods using Roche 454, Illumina Solexa, AB SOLiD, Helicos, Polonator platforms and the like, and light-based sequencing technologies (Landegren et al. (1998) Genome Res. 8:769-76; Kwok (2000) Pharmacogenomics 1:95-100; and Shi (2001) Clin. Chem. 47:164-172). In some instances, the amplified nucleic acid molecules are shotgun sequenced.
Sequencing of the sequencing library is in some instances performed with any appropriate sequencing technology, including but not limited to single-molecule real-time (SMRT) sequencing, Polony sequencing, sequencing by ligation, reversible terminator sequencing, proton detection sequencing, ion semiconductor sequencing, nanopore sequencing, electronic sequencing, pyrosequencing, Maxam-Gilbert sequencing, chain termination (e.g., Sanger) sequencing, +S sequencing, or sequencing by synthesis (array/colony-based or nanoball-based).
[0049] Sequencing libraries generated using the methods described herein (e.g., PTA or RNA-seq) may be sequenced to obtain a desired number of sequencing reads. In some instances, libraries are generated from a single cell or a sample comprising a single cell (alone or as part of a multiomics workflow). In some instances, libraries are sequenced to obtain at least 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or at least 10 million reads. In some instances, libraries are sequenced to obtain no more than 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or no more than 10 million reads. In some instances, libraries are sequenced to obtain about 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or about 10 million reads. In some instances, libraries are sequenced to obtain 0.1-10, 0.1-5, 0.1-1, 0.2-1, 0.3-1.5, 0.5-1, 1-5, or 0.5-5 million reads per sample. In some instances, the number of reads is dependent on the size of the genome. In some instances, samples comprising bacterial genomes are sequenced to obtain 0.5-1 million reads. In some instances, libraries are sequenced to obtain at least 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or at least 900 million reads. In some instances, libraries are sequenced to obtain no more than 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or no more than 900 million reads. In some instances, libraries are sequenced to obtain about 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or about 900 million reads. In some instances, samples comprising mammalian genomes are sequenced to obtain 500-600 million reads. In some instances, the type of sequencing library (cDNA library or genomic library) is identified during sequencing. In some instances, cDNA libraries and genomic libraries are distinguished during sequencing by unique barcodes.
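The dependence of read count on genome size follows from simple coverage arithmetic: mean depth is approximately (number of reads × read length) / genome size. The sketch below assumes 150-base reads counted individually, roughly 4.6 Mb for a bacterial genome, and roughly 3.1 Gb for a mammalian (human-sized) genome; these values are illustrative assumptions, and duplicates, unmapped reads, and paired-end counting conventions are ignored. Function names are hypothetical.

```python
def mean_depth(n_reads: float, read_length: int, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_for_depth(target_depth: float, read_length: int, genome_size: float) -> float:
    """Number of reads needed to reach target_depth on average."""
    return target_depth * genome_size / read_length


# Assumed values for illustration only (not taken from the source text).
READ_LEN = 150            # bases per read
BACTERIAL_GENOME = 4.6e6  # ~4.6 Mb
MAMMALIAN_GENOME = 3.1e9  # ~3.1 Gb

print(round(mean_depth(1e6, READ_LEN, BACTERIAL_GENOME), 1))    # 32.6
print(round(mean_depth(600e6, READ_LEN, MAMMALIAN_GENOME), 1))  # 29.0
```

Under these assumptions, 1 million reads on a bacterial genome and 600 million reads on a mammalian genome both land near 30× mean depth, which is consistent with the read ranges given for each genome type scaling with genome size.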
[0050] The term “cycle,” when used in reference to a polymerase-mediated amplification reaction, is used herein to describe the steps of dissociation of at least a portion of a double-stranded nucleic acid (denaturation; e.g., separation of a template from an amplicon, or of the two strands of a double-stranded template), hybridization of at least a portion of a primer to a template (annealing), and extension of the primer to generate an amplicon. In some instances, the temperature remains constant during a cycle of amplification (e.g., in an isothermal reaction). In some instances, the number of cycles is directly correlated with the number of amplicons produced. In some instances, the number of cycles for an isothermal reaction is controlled by the amount of time the reaction is allowed to proceed.
[0051] Spent Culture Media Analysis
[0052] Described herein are methods and compositions for analyzing nucleic acids from cell culture media, such as spent culture media (SCM). In some instances, DNA, RNA, and/or proteins from the SCM are analyzed in parallel. The analysis may include identification of epigenetic post-translational (e.g., glycosylation, phosphorylation, acetylation, ubiquitination, histone modification) and/or post-transcriptional (e.g., methylation, hydroxymethylation) modifications. Such methods may comprise PTA to obtain libraries of nucleic acids for sequencing. In some instances, PTA is combined with additional steps or methods such as RT-PCR or proteome/protein quantification techniques (e.g., mass spectrometry, antibody staining, etc.). In some instances, various components of an SCM are physically or spatially separated from each other during individual analysis steps.
[0053] For example, a workflow in some instances comprises labeling proteins with antibodies. In some instances, at least some of the antibodies comprise a tag or marker (e.g., a nucleic acid/oligo tag, a mass tag, or a fluorescent tag). In some instances, a portion of the antibodies comprise an oligo tag. In some instances, a portion of the antibodies comprise a fluorescent marker. In some instances, antibodies are labeled by two or more tags or markers. In some instances, a portion of the antibodies are sorted based on fluorescent markers. After RT-PCR, first-strand cDNA products are generated and then removed for analysis. Libraries are then generated from the RT-PCR products and from barcodes present on protein-specific antibodies, and these libraries are subsequently sequenced. In parallel, genomic DNA from the SCM is subjected to PTA, and a library is generated and sequenced. Sequencing results from the genome, proteome, and transcriptome are in some instances pooled using bioinformatics methods. Methods described herein in some instances comprise any combination of labeling, cell sorting, affinity separation/purification, lysis of specific cell components (e.g., outer membrane, nucleus, etc.), RNA amplification, DNA amplification (e.g., PTA), or other steps associated with protein, RNA, or DNA isolation or analysis. In some instances, methods described herein comprise one or more enrichment steps, such as exome enrichment.
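Pooling genome, proteome, and transcriptome results typically comes down to joining per-sample records on a shared key, here the sample barcode. The sketch below assumes each modality has already been summarized into a dictionary keyed by barcode (a variant list, gene counts, and antibody-tag counts, respectively); the data, field names, and function name are hypothetical, and a real integration would also need normalization within each modality before any joint analysis.

```python
from typing import Any, Dict, Optional


def integrate_modalities(
    genome: Dict[str, Any],
    transcriptome: Dict[str, Any],
    proteome: Dict[str, Any],
    require_all: bool = True,
) -> Dict[str, Dict[str, Optional[Any]]]:
    """Join per-sample results from three modalities on their sample barcode.

    require_all=True keeps only barcodes present in every modality (an inner join);
    require_all=False keeps every barcode and fills missing modalities with None
    (an outer join).
    """
    modalities = {"genome": genome, "transcriptome": transcriptome, "proteome": proteome}
    barcode_sets = [set(table) for table in modalities.values()]
    barcodes = set.intersection(*barcode_sets) if require_all else set.union(*barcode_sets)
    return {
        bc: {name: table.get(bc) for name, table in modalities.items()}
        for bc in sorted(barcodes)
    }


# Hypothetical per-sample summaries keyed by sample barcode.
genome = {"BC01": ["chr1:1000 C>T"], "BC02": []}
transcriptome = {"BC01": {"GAPDH": 412, "ACTB": 390}, "BC02": {"GAPDH": 288}}
proteome = {"BC01": {"CD44": 153}}

print(sorted(integrate_modalities(genome, transcriptome, proteome)))                     # ['BC01']
print(sorted(integrate_modalities(genome, transcriptome, proteome, require_all=False)))  # ['BC01', 'BC02']
```

The inner join is the conservative choice when downstream analysis needs all three data types for each sample; the outer join preserves samples for which one modality failed, which can be useful for quality control.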
[0054] In some embodiments, the analysis comprises analysis of RNA and DNA from a cell culture medium (e.g., SCM). The cell culture medium may be obtained by culturing one or more cells (e.g., embryonic cells) or by using a cell line proxy system, as shown, for example, in Figure 10. In some instances, the method comprises isolation of single cells, lysis of single cells, and reverse transcription (RT). In some instances, reverse transcription is carried out with template switching oligonucleotides (TSOs). In some instances, TSOs comprise a molecular tag such as biotin, which allows subsequent pull-down of cDNA RT products and PCR amplification of the RT products to generate a cDNA library.
[0055] Alternatively or in combination, centrifugation is used to separate RNA in the supernatant from cDNA in the cell pellet. Remaining cDNA is in some instances fragmented and removed with UDG (uracil DNA glycosylase), and alkaline lysis is used to degrade RNA and denature the genome. After neutralization, primers are added and PTA is performed; the amplification products are in some instances purified on SPRI (solid phase reversible immobilization) beads and ligated to adapters to generate a gDNA library.
[0056] In some instances, alkaline lysis is then used to degrade RNA and denature the genome. After neutralization, random primers are added and PTA is performed; the amplification products are in some instances purified on SPRI beads and ligated to adapters to generate a gDNA library. RT products are in some instances isolated by pulldown, such as a pulldown with streptavidin beads.
[0057] In some instances, antibodies are labeled with fluorescent labels, nucleic acid barcodes, or both. Labeled antibodies bind to at least one cell in the population, and such cells are sorted, placing one cell per container (e.g., a tube, vial, microwell, etc.). In some instances, the container comprises a solvent. In some instances, a region of a surface of a container is coated with a capture moiety. In some instances, the capture moiety is a small molecule, an antibody, a protein, or other agent capable of binding to one or more cells, organelles, or other cell components in an SCM. In some instances, at least one cell, a single cell, or a component thereof binds to a region of the container surface. In some instances, a nucleus binds to the region of the container. In some instances, the outer membrane of the cell is lysed, releasing mRNA into a solution in the container. In some instances, the nucleus of the cell containing genomic DNA is bound to a region of the container surface. Next, RT is performed using the mRNA in solution as a template to generate cDNA. In some instances, template switching primers comprise, from 5’ to 3’, a TSS (transcription start site) region, an anchor region, an RNA BC region, and a poly dT tail. In some instances, the poly dT tail binds to the poly A tail of one or more mRNAs. In some instances, template switching primers comprise, from 3’ to 5’, a TSS region, an anchor region, and a poly G region. In some instances, the poly G region comprises riboG. In some instances, the poly G region binds to a poly C region on an mRNA transcript. In some instances, riboG is added to the mRNA transcripts by a terminal transferase. After removal of RT-PCR products for subsequent sequencing, any remaining RNA in the cell is removed by UNG. The nucleus is then lysed, and the released genomic DNA is subjected to the PTA method using random primers with an isothermal polymerase. In some instances, primers are 6-9 bases in length.
In some instances, PTA generates genomic amplicons of 100-5000, 200-5000, 500-2000, 500-2500, 1000-3000, or 300-3000 bases in length. In some instances, PTA generates genomic amplicons with an average length of 100-5000, 200-5000, 500-2000, 500-2500, 1000-3000, or 300-3000 bases. In some instances, PTA generates genomic amplicons of 250-1500 bases in length. In some instances, the methods described herein generate a short fragment cDNA pool with about 500, about 750, about 1000, about 5000, or about 10,000-fold amplification. In some instances, the methods described herein generate a short fragment cDNA pool with 500-5000, 750-1500, or 250-10,000-fold amplification. PTA products are optionally subjected to additional amplification and sequenced.
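Fold amplification is the ratio of product mass to input mass. Expressing it as log2(fold) gives the number of perfect doublings that would produce the same gain, which is a convenient way to compare the 250- to 10,000-fold ranges given here with PCR cycle numbers. PTA is isothermal, so these "doubling equivalents" are only a comparison aid, not actual reaction cycles. The function names are hypothetical.

```python
import math


def fold_amplification(input_mass: float, output_mass: float) -> float:
    """Ratio of product mass to template mass (both expressed in the same units)."""
    if input_mass <= 0:
        raise ValueError("input_mass must be positive")
    return output_mass / input_mass


def doubling_equivalents(fold: float) -> float:
    """Number of perfect doublings that would give the same fold gain (log2)."""
    return math.log2(fold)


print(fold_amplification(5, 5000))  # 1000.0 (e.g., 5 pg in, 5000 pg out)
for fold in (250, 500, 1000, 10_000):
    print(fold, round(doubling_equivalents(fold), 2))
# 250 7.97
# 500 8.97
# 1000 9.97
# 10000 13.29
```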
[0058] Methods described herein may require isolation of single cells for analysis. The single cells may, in some instances, be cultured for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14 days. In some examples, the single cells comprise embryonic cells. In some examples, the embryonic cells are cultured for about 5 to 7 days. Any method of single cell isolation may be used with PTA, such as mouth pipetting, micropipetting, flow cytometry/FACS, microfluidics, methods of sorting nuclei (tetraploid or other), or manual dilution. Such methods can be aided by additional reagents and steps, for example, antibody-based enrichment, other small-molecule or protein-based enrichment methods, or fluorescent labeling. In some examples, about 5 to 10 microliters of SCM is harvested post-culture for analysis. In some instances, a method of multiomic analysis described herein comprises mechanical or enzymatic dissociation of cells from larger tissues.
[0059] Methods of multiomic analysis comprising PTA described herein may comprise one or more methods of processing cell components such as DNA, RNA, and/or proteins. In some instances, the cell components are obtained from the SCM (e.g., from one or more embryonic cells or a cell line proxy system). In some examples, a membrane-selective lysis buffer is used to dissolve the cell membrane while keeping the nucleus intact, and the nucleus (comprising genomic DNA) is then physically separated from the cytosol (comprising mRNA) using methods including micropipetting, centrifugation, or antibody-conjugated magnetic microbeads. In another instance, an oligo-dT primer-coated magnetic bead binds polyadenylated mRNA for separation from DNA. In another instance, DNA and RNA are preamplified simultaneously and then separated for analysis. In another instance, the SCM is split into two equal portions, with mRNA processed from one half and genomic DNA processed from the other half.
[0060] Methods described herein (e.g., PTA) may be used as a replacement for any number of other known methods in the art that are used for single cell sequencing (multiomics or the like). PTA may substitute for genomic DNA amplification methods such as MDA, PicoPlex, DOP-PCR, MALBAC, or target-specific amplifications. In some instances, PTA replaces the standard genomic DNA sequencing method in a multiomics method including DR-seq (Dey et al., 2015), G&T seq (MacAulay et al., 2015), scMT-seq (Hu et al., 2016), sc-GEM (Cheow et al., 2016), scTrio-seq (Hou et al., 2016), simultaneous multiplexed measurement of RNA and proteins (Darmanis et al., 2016), scCOOL-seq (Guo et al., 2017), CITE-seq (Stoeckius et al., 2017), REAP-seq (Peterson et al., 2017), scNMT-seq (Clark et al., 2018), or SIDR-seq (Han et al., 2018). In some instances, a method described herein comprises PTA and a method of analyzing polyadenylated mRNA transcripts. In some instances, a method described herein comprises PTA and a method of analyzing non-polyadenylated mRNA transcripts. In some instances, a method described herein comprises PTA and a method of analyzing total (polyadenylated and non-polyadenylated) mRNA transcripts.
[0061] In some instances, PTA is combined with a standard RNA sequencing method to obtain genome and transcriptome data. In some instances, a multiomics method described herein comprises PTA and one of the following: Drop-seq (Macosko et al., 2015), mRNA-seq (Tang et al., 2009), InDrop (Klein et al., 2015), MARS-seq (Jaitin et al., 2014), Smart-seq2 (Hashimshony et al., 2012; Fish et al., 2016), CEL-seq (Jaitin et al., 2014), STRT-seq (Islam et al., 2011), Quartz-seq (Sasagawa et al., 2013), CEL-seq2 (Hashimshony et al., 2016), CytoSeq (Fan et al., 2015), SuPeR-seq (Fan et al., 2011), RamDA-seq (Hayashi et al., 2018), MATQ-seq (Sheng et al., 2017), or SMARTer (Verboom et al., 2019).
[0062] Various reaction conditions and mixes may be used for generating cDNA libraries for transcriptome analysis. In some instances, an RT reaction mix is used to generate a cDNA library. In some instances, the RT reaction mix comprises a crowding reagent, at least one primer, a template switching oligonucleotide (TSO), a reverse transcriptase, and a dNTP mix. In some instances, an RT reaction mix comprises an RNase inhibitor. In some instances, an RT reaction mix comprises one or more surfactants. In some instances, an RT reaction mix comprises Tween-20 and/or Triton X. In some instances, an RT reaction mix comprises betaine. In some instances, an RT reaction mix comprises one or more salts. In some instances, an RT reaction mix comprises a magnesium salt (e.g., magnesium chloride) and/or tetramethylammonium chloride. In some instances, an RT reaction mix comprises gelatin. In some instances, an RT reaction mix comprises PEG (PEG1000, PEG2000, PEG4000, PEG6000, PEG8000, or PEG of other molecular weight).
[0063] Multiomic methods described herein may provide both genomic and RNA transcript information from an SCM (e.g., a combined or dual protocol). In some instances, genomic information from the SCM is obtained from the PTA method, and RNA transcript information is obtained from reverse transcription to generate a cDNA library. In some instances, a whole transcript method is used to obtain the cDNA library. In some instances, 3’ or 5’ end counting is used to obtain the cDNA library. In some instances, cDNA libraries are not obtained using UMIs. In some instances, a multiomic method provides RNA transcript information from the SCM for at least 500, 1000, 2000, 5000, 8000, 10,000, 12,000, or at least 15,000 genes. In some instances, a multiomic method provides RNA transcript information from the SCM for about 500, 1000, 2000, 5000, 8000, 10,000, 12,000, or about 15,000 genes. In some instances, a multiomic method provides RNA transcript information from the SCM for 100-12,000, 1000-10,000, 2000-15,000, 5000-15,000, 10,000-20,000, 8000-15,000, or 10,000-15,000 genes. In some instances, a multiomic method provides genomic sequence information for at least 80%, 90%, 92%, 95%, 97%, 98%, or at least 99% of the genome of the SCM. In some instances, a multiomic method provides genomic sequence information for about 80%, 90%, 92%, 95%, 97%, 98%, or about 99% of the genome of the SCM.
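Genome recovery figures such as 80-99% are breadth-of-coverage values: the fraction of reference positions covered by at least one (or some minimum number of) reads. The sketch below computes breadth from a per-position depth array and, for comparison, the value expected under ideal uniform sampling, where depth is Poisson-distributed with mean c, so breadth at 1× or more is 1 - e^(-c) (the Lander-Waterman model). Amplification bias tends to push observed breadth below this Poisson expectation, so the comparison is one possible way to gauge uniformity. The depth values are made-up illustrative data, and the function names are hypothetical.

```python
import math
from typing import Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depths must be non-empty")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def poisson_expected_breadth(mean_depth: float, min_depth: int = 1) -> float:
    """Expected breadth if depth were Poisson(mean_depth), i.e., perfectly uniform sampling."""
    p_below = sum(
        math.exp(-mean_depth) * mean_depth ** i / math.factorial(i) for i in range(min_depth)
    )
    return 1.0 - p_below


depths = [0, 3, 5, 1, 0, 8, 2, 2, 0, 4]         # hypothetical per-position depths
print(breadth_of_coverage(depths))              # 0.7
print(round(poisson_expected_breadth(3.0), 3))  # 0.95
```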
[0064] Multiomic methods may comprise analysis of SCM from culturing a population of cells (e.g., embryonic cells) or a cell line proxy system (e.g., Figure 10). In some instances, at least 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, or at least 8000 cells are analyzed. In some instances, about 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, or about 8000 cells are analyzed. In some instances, 5-100, 10-100, 50-500, 100-500, 100-1000, 50-5000, 100-5000, 500-1000, 500-10000, 1000-10000, or 5000-20,000 cells are analyzed.
[0065] Multiomic methods may generate the following yields of genomic DNA from the PTA reaction performed on the SCM. In some instances, the amount of DNA generated from SCM is about 0.1, 1, 1.5, 2, 3, 5, or about 10 micrograms. In some instances, the amount of DNA generated from SCM is about 0.1, 1, 1.5, 2, 3, 5, or about 10 nanograms. In some instances, the amount of DNA generated from SCM is about 0.1, 1, 1.5, 2, 3, 5, or about 10 femtograms. In some instances, the amount of DNA generated from SCM is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 micrograms. In some instances, the amount of DNA generated from SCM is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 nanograms. In some instances, the amount of DNA generated from SCM is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 femtograms. In some instances, the amount of DNA generated from SCM is about 0.1-10, 1-10, 1.5-10, 2-20, 2-50, 1-3, or 0.5-3.5 nanograms. In some instances, the amount of DNA generated from SCM is about 0.1-10, 1-10, 1.5-10, 2-20, 2-50, 1-3, or 0.5-3.5 micrograms. In some instances, the amount of DNA generated from SCM is about 0.1-10, 1-10, 1.5-10, 2-20, 2-4, 1-3, or 0.5-4 nanograms. In some instances, the amount of DNA generated from SCM is about 0.1-10, 1-10, 1.5-10, 2-20, 2-4, 1-3, or 0.5-4 femtograms. In some instances, these amounts of nucleic acids are present in no more than 0.1, 1, 1.5, 2, 2.5, 5, 10, 15, 20, 50, or no more than 100 microliters of solution.
[0066] Described herein are methods comprising PTA, wherein sites of methylated DNA in single cells are determined using the PTA method. In some instances, these methods further comprise parallel analysis of the transcriptome and/or proteome of the SCM. Methods of detecting methylated genomic bases include selective restriction with methylation-sensitive endonucleases, followed by processing with the PTA method. Sites cut by such enzymes are determined from sequencing, and methylated bases are identified. In another instance, bisulfite treatment of genomic DNA libraries converts unmethylated cytosines to uracil. Libraries are then in some instances amplified with methylation-specific primers that selectively anneal to methylated sequences. Alternatively, non-methylation-specific PCR is conducted, followed by one or more methods to discriminate between bisulfite-reacted bases, including direct pyrosequencing, MS-SnuPE, HRM, COBRA, MS-SSCA, or base-specific cleavage/MALDI-TOF. In some instances, genomic DNA samples are split for parallel analysis of the genome (or an enriched portion thereof) and the methylome. In some instances, analysis of the genome and methylome comprises enrichment of genomic fragments (e.g., exome or other targets) or whole genome sequencing.
[0067] The data obtained from the analysis methods utilizing PTA described herein may be compiled into a database. Described herein are methods and systems of bioinformatic data integration. Data from the proteome, genome, transcriptome, methylome, or other sources is in some instances combined/integrated into a database and analyzed. Bioinformatic data integration methods and systems in some instances comprise one or more of protein detection (FACS and/or NGS), mRNA detection, and/or genome variance detection. In some instances, this data is correlated with a disease state or condition. In some instances, data from an SCM is compiled to describe properties of a cell population, such as cells from a specific embryonic tissue region. In some instances, protein data is acquired from fluorescently labeled antibodies that selectively bind to proteins. In some instances, a method of protein detection comprises grouping cells based on fluorescent markers and reporting sample location post-sorting. In some instances, a method of protein detection comprises detecting sample barcodes, detecting protein barcodes, comparing to designed sequences, and grouping cells based on barcode and copy number. In some instances, protein data is acquired from barcoded antibodies that selectively bind to proteins on a cell. In some instances, transcriptome data is acquired from sample- and RNA-specific barcodes. In some instances, a method of mRNA detection comprises detecting sample- and RNA-specific barcodes, aligning to the genome, aligning to RefSeq/ENCODE, reporting exon/intron/intergenic sequences, analyzing exon-exon junctions, grouping cells based on barcode and expression variance, and performing clustering analysis of variance and top variable genes. In some instances, genomic data is acquired from sample- and DNA-specific barcodes.
In some instances, a method of genome variance detection comprises detecting sample- and DNA-specific barcodes, aligning to the genome, determining genome recovery and SNV mapping rate, filtering reads on exon-exon junctions, generating a variant call file (VCF), and performing clustering analysis of variance and top variable mutations.
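One way to implement the "filtering reads on exon-exon junctions" step is to inspect each alignment's CIGAR string: in the SAM format, the N operation marks a region skipped on the reference, which spliced aligners use to represent introns, so a read carrying N most likely derives from spliced mRNA/cDNA rather than genomic DNA. The sketch below assumes alignments are already available as (read name, CIGAR) pairs (a real pipeline would typically read these fields from a BAM file); the function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM specification).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '40M350N60M' into (length, op) pairs."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed or unavailable CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def is_spliced(cigar: str) -> bool:
    """True if the alignment skips reference bases with 'N', i.e., spans an intron."""
    return any(op == "N" for _, op in parse_cigar(cigar))


def drop_junction_reads(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep (read_name, cigar) pairs suitable for genomic variant calling.
    Unmapped reads ('*') and reads spanning exon-exon junctions are removed."""
    return [
        (name, cigar) for name, cigar in records
        if cigar != "*" and not is_spliced(cigar)
    ]


records = [("r1", "100M"), ("r2", "40M350N60M"), ("r3", "10S90M"), ("r4", "*")]
print(drop_junction_reads(records))  # [('r1', '100M'), ('r3', '10S90M')]
```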
[0068] In some instances, the methods (e.g., multiomic PTA) described herein result in higher detection sensitivity and/or lower rates of false positives for the detection of mutations. In some instances, a mutation is a difference between an analyzed sequence (e.g., obtained using the methods described herein) and a reference sequence. Reference sequences are in some instances obtained from other organisms, other individuals of the same or similar species, populations of organisms, or other areas of the same genome. In some instances, mutations are identified on a plasmid or chromosome. In some instances, a mutation is an SNV (single nucleotide variation), SNP (single nucleotide polymorphism), or CNV (copy number variation, or CNA/copy number aberration). In some instances, a mutation is a base substitution, insertion, or deletion. In some instances, a mutation is a transition, transversion, nonsense mutation, silent mutation, synonymous or non-synonymous mutation, non-pathogenic mutation, missense mutation, or frameshift mutation (deletion or insertion). In some instances, PTA results in higher detection sensitivity and/or lower rates of false positives for the detection of mutations when compared to methods such as in-silico prediction, ChIP-seq, GUIDE-seq, CIRCLE-seq, HTGTS (High-Throughput Genome-Wide Translocation Sequencing), IDLV (integration-deficient lentivirus), Digenome-seq, FISH (fluorescence in situ hybridization), or DISCOVER-seq.
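Several of the mutation classes listed here can be assigned from the reference and alternate alleles alone. Transitions exchange a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T); transversions swap between the two classes; and an insertion or deletion inside a coding sequence shifts the reading frame when its length is not a multiple of three. The minimal sketch below assumes simple allele strings and a caller-supplied flag saying whether the variant lies in coding sequence; distinguishing synonymous, missense, and nonsense changes would additionally require codon context and gene annotation, which it does not attempt. The function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_variant(ref: str, alt: str, in_coding: bool = False) -> str:
    """Classify a variant from its reference and alternate alleles."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt alleles are identical")
    if len(ref) == 1 and len(alt) == 1:
        same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
        return "transition" if same_class else "transversion"
    if len(ref) == len(alt):
        return "multi-nucleotide substitution"
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    if in_coding:
        # A length change that is not a multiple of 3 shifts the reading frame.
        kind += " (frameshift)" if abs(len(alt) - len(ref)) % 3 else " (in-frame)"
    return kind


print(classify_variant("C", "T"))                   # transition
print(classify_variant("A", "C"))                   # transversion
print(classify_variant("A", "AT", in_coding=True))  # insertion (frameshift)
```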
Examples
[0069] Example 1: Ligation of short, cell-free DNA fragments into long concatemers for ResolveDNA amplification.
[0070] Short fragments of DNA released into the bloodstream, primarily via cell death (apoptosis, necrosis) and active secretion, are collectively termed cell-free DNA (cfDNA). Other body fluids, such as saliva, urine, cerebrospinal fluid (CSF), and pleural effusion, also contain cfDNA. cfDNA is a promising noninvasive diagnostic tool for cancer, stroke, hypertension, autoimmune disorders, and heart disorders. Blood plasma from cancer patients contains tumor cell-derived circulating tumor DNA (ctDNA) along with cfDNA from healthy cells, which is present in vast excess relative to ctDNA. The amount of ctDNA present correlates with cancer progression (stage) across a diversity of tumor types⁵. Evaluating ctDNA to detect disease-relevant mutations present in the primary tumor enables early detection, longitudinal surveillance, and identification of minimal residual disease after therapy in a noninvasive and cost-effective manner compared to traditional tissue biopsy. ctDNA analysis has enormous potential to become a routine liquid biopsy, not only for cancer screening but also for screening and monitoring other diseases such as stroke, immune disorders, and cardiovascular diseases. Formalin-fixed paraffin-embedded (FFPE) DNA is likewise fragmented and short (~300-600 bp) compared to intact genomic DNA. FFPE DNA sequencing is challenging because the fragmented genome leads to uneven coverage and decreased SNV detection sensitivity, and because single-base errors are introduced. Furthermore, many researchers are actively investigating the potential of cfDNA sequencing in prenatal screening to detect fetal abnormalities. Invasive and potentially harmful methods, such as amniocentesis and embryo biopsy, are currently used for prenatal screening, and this is where cfDNA stands to provide benefit.
In addition, for preimplantation genetic testing in IVF settings, cfDNA from embryo spent culture media provides a safe, noninvasive alternative for prenatal screening. The concatenation-followed-by-PTA strategy described here could potentially be applied to extracellular DNA isolated from culture media for multiple prenatal applications, including not only human IVF but also IVF for livestock in agriculture.
[0071] Exosomes, extracellular vesicles secreted by cells into the blood, are another component of liquid biopsy being explored as disease biomarkers. Exosomal DNA (exo-DNA) is less fragmented than cf/ctDNA but can still be shorter than genomic DNA. The ligation strategy we propose for cf/ctDNA could potentially be used to increase the effective size of fragmented FFPE DNA and exo-DNA templates, improving their suitability for ResolveDNA amplification.
[0072] The primary limitations of cfDNA as a routine clinical diagnostic marker are its ultra-low abundance in blood plasma and its small size. In healthy individuals, cfDNA is primarily of hematopoietic origin, at an average of 2.5-30 ng/ml of blood. In cancer patients, cfDNA concentrations vary with the size, stage, and type of the primary cancer, and can be higher in late-stage cancers owing to the additive contribution of ctDNA, with an average concentration of 10-180 ng/ml of blood. cfDNA fragments range from 120 to 220 bp, and their size varies among cancer types because of altered nucleosome positioning. Methods currently used to evaluate the mutational profile of ctDNA include digital PCR, targeted amplicon sequencing, direct ligation of library adapters to the isolated cfDNA, and gene panels. These workflows often require large amounts of starting material (~5-10 ng) and correspondingly large blood draws from patients (~10-40 ml); they also represent the original cfDNA molecules non-uniformly in the sequencing data and have reduced sensitivity for detecting mutations, which hinders their progress toward a routine clinical diagnostic assay. An assay that can efficiently and uniformly amplify low quantities of cf/ctDNA is needed to overcome these limitations and enable cost-effective longitudinal monitoring of cancer patients.
[0073] ResolveDNA (Bioskryb Genomics) whole genome amplification has been shown to amplify very low amounts of DNA uniformly and with high fidelity. To mimic a ctDNA template, we applied ResolveDNA amplification to the OncoSpan cfDNA reference standard (~120 bp) (Horizon, Catalog: HD833) and to synthetic DNA fragments of 100, 130, 200, 300, 500, and 1000 bp (Azenta Life Sciences).
The lower limit of template size for ResolveDNA/primary template-directed amplification (PTA) had not previously been formally determined, and we surmised that very small templates would not be amplified efficiently, because the low amplification propensity of short PTA-terminated amplicons is a core tenet of ResolveDNA and the purported mechanism for driving primers back to the primary template/genome of interest. This testing showed that ResolveDNA WGA chemistry begins to amplify templates of 300-500 bp at detectable levels, whereas templates shorter than 300 bp (a range that includes ctDNA) amplify poorly (FIG. 1). In addition, sequencing coverage/read pileup was not uniform across the length of the small fragments (FIGS. 6A & 7A, unligated). In deep sequencing data (10X) for the ResolveDNA-amplified OncoSpan cfDNA reference standard, at least one read covered the focal locus in only 11 of the 386 regions reported in Horizon's exome sequencing data for this sample (120x coverage, Agilent SureSelect Human All Exon V6 kit, Illumina sequencing), and only one of the 386 verified mutations was detected. These results are consistent with, and perhaps unsurprising given, the way ResolveDNA achieves complete and uniform genome amplification: by limiting the recopying of short amplicons.
[0074] To improve performance with ResolveDNA, we proposed a strategy to increase the effective length of the cfDNA template input into the ResolveDNA chemistry by ligating the short cfDNA fragments with T4 DNA ligase. T4 DNA ligase acts on both sticky and blunt ends and can ligate short DNA fragments into long concatemers; we envisioned that these long cfDNA concatemers could be ideal templates for efficient ResolveDNA amplification. T4 DNA ligation has not previously been used for cfDNA applications; however, it has been used to enable amplification of fragmented DNA from FFPE samples by multiple displacement amplification.
[0075] We first treated the short DNA fragments with T4 polynucleotide kinase (PNK) (New England Biolabs, catalog: M0201S) to phosphorylate their 5' ends and thereby increase ligation efficiency. Pilot experiments started with 50 ng of template DNA in 50 µl reactions containing PNK (10 U), in which NEB 10X T4 DNA ligase buffer was used to supply ATP at a final concentration of 10 mM. The reaction was incubated at 37 °C for 30 min and heat-inactivated at 65 °C for 20 min. The PNK reaction was cleaned up with a 2X ratio of SPRI beads (BioSkryb bead purification kit) and eluted in 20 µl EB. Approximately 30 ng of PNK-treated DNA was input into the T4 DNA ligase reaction (2000 U per 20 µl reaction) (New England Biolabs, catalog: M0202T). Ligation was performed at 16 °C for 15 h and heat-inactivated at 65 °C for 10 min. Unligated and ligated reactions were run in parallel on an Agilent High Sensitivity D5000 ScreenTape to visualize the increase in size of the concatenated fragments relative to the unligated fragments (FIG. 2). After ligation, the TapeStation gel showed a ladder-like profile, indicating a heterogeneous mix of ligation products in the 400 to 2000 bp size range.
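The volumes and concentrations in paragraph [0075] follow from simple ratios. The sketch below is a minimal illustration, assuming that an "NX" SPRI ratio means N volumes of bead suspension per volume of sample; the function names are hypothetical and not part of any BioSkryb or NEB tool.

```python
def spri_bead_volume(sample_volume_ul: float, ratio: float) -> float:
    """Volume of SPRI bead suspension to add for a given bead:sample ratio.

    Assumes the conventional meaning of an "NX" ratio: N volumes of bead
    suspension per volume of sample.
    """
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return sample_volume_ul * ratio


def concentration_per_ul(amount: float, reaction_volume_ul: float) -> float:
    """Amount (ng, U, ...) per microliter of final reaction volume."""
    if reaction_volume_ul <= 0:
        raise ValueError("reaction volume must be positive")
    return amount / reaction_volume_ul


# Values taken from paragraph [0075].
pnk_cleanup_beads = spri_bead_volume(50.0, 2.0)    # 50 ul PNK reaction, 2X beads
ligation_dna = concentration_per_ul(30.0, 20.0)    # ~30 ng DNA in 20 ul
ligase_units = concentration_per_ul(2000.0, 20.0)  # 2000 U ligase in 20 ul

print(f"SPRI beads for PNK cleanup: {pnk_cleanup_beads:.0f} ul")
print(f"DNA in ligation: {ligation_dna:.1f} ng/ul")
print(f"T4 DNA ligase: {ligase_units:.0f} U/ul")
```

If the same 20 µl ligation volume were kept at the ~1 ng input where paragraph [0078] reports efficiency starting to decline (an assumption; the volume for that condition is not stated), the DNA concentration would fall to 0.05 ng/µl, which illustrates why adding stuffer DNA to raise the concentration may help.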
[0076] Next, we used 100 pg of unligated or ligated DNA (~300 pg per 5 ml of blood is attainable) as input to a ResolveDNA whole genome amplification reaction. The yield from ligated fragments was similar to that of control gDNA, whereas unligated fragments failed to amplify efficiently (FIG. 3), suggesting that the ligated concatemers of cfDNA fragments can serve as a better template for ResolveDNA amplification. We then prepared libraries using 100 ng of amplified DNA from ligated samples as input, or the entire amplification product of unligated samples because their yield was only ~20 ng (for the unligated cell-free OncoSpan reference standard DNA, 100 ng was used) (FIG. 4).
[0077] After low-pass sequencing (2E6 reads per sample), Integrative Genomics Viewer (IGV) analysis of read coverage showed that fragments ligated before amplification had more uniform coverage across the length of the template than unligated samples (FIGS. 5-7). The chimera percentage was high (~40-50%) in both unligated and ligated samples. This high chimera rate must be interpreted carefully: we are not using the classical ResolveDNA template (gDNA), and by ligating we are effectively creating chimeras from small fractions of the genome. Although this high chimera rate would impair the ability to call genomic fusion events, we have previously shown that a high chimera percentage does not affect robust SNV calling. In addition, most clinical applications of cfDNA have focused on detecting oncogenic SNVs rather than structural variation. To validate SNV calling, we deep sequenced (2x150, 400 million total reads per sample) ResolveDNA-amplified ligated and unligated Horizon cell-free OncoSpan reference standard, for which defined variant frequency information is available from the vendor.
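To put the 100 pg input of paragraph [0076] in perspective, DNA mass can be converted to approximate molecule counts and genome equivalents. This sketch assumes an average of ~650 g/mol per base pair of double-stranded DNA and ~3.3 pg per haploid human genome; both are rounded textbook approximations, and the 150 bp fragment length is an illustrative value within the cfDNA size range given above.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
BP_MASS_G_PER_MOL = 650.0     # approx. average mass of one dsDNA base pair
HAPLOID_GENOME_PG = 3.3       # approx. mass of one haploid human genome


def dsdna_molecules(mass_pg: float, length_bp: int) -> float:
    """Approximate number of dsDNA molecules of length_bp contained in mass_pg."""
    grams = mass_pg * 1e-12
    moles = grams / (length_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


def haploid_genome_equivalents(mass_pg: float) -> float:
    """Approximate number of haploid human genome copies represented by mass_pg."""
    return mass_pg / HAPLOID_GENOME_PG


# 100 pg (Example 1 input) and 60 pg (Example 3 SCM input), 150 bp fragments.
for mass in (100.0, 60.0):
    print(f"{mass:.0f} pg: ~{dsdna_molecules(mass, 150):.2e} fragments of 150 bp, "
          f"~{haploid_genome_equivalents(mass):.0f} haploid genome equivalents")
```

By this estimate, 100 pg corresponds to roughly 30 haploid genome equivalents, so, assuming uniform representation, each locus is present in only a few tens of fragment copies on average; the 60 pg SCM input used in Example 3 corresponds to roughly 18.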
Analysis of the sequencing data showed that, of the 386 expected focal loci in the Horizon reference panel, the unligated library had reads covering 11 sites, with one verified variant (A1045T in the MSH3 gene). In contrast, the ligated library had reads covering 261 sites and revealed 85 verified variants supported by at least one read (FIG. 8). The MSH3 variant was also detected in the ligated sample, in both the low-pass and the deep sequencing data, but not in the unligated low-pass sample.
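The locus-level comparison above reduces to simple fractions of the 386-locus panel. A minimal sketch using the counts reported in this example; the data structure and names are illustrative only.

```python
from dataclasses import dataclass

PANEL_LOCI = 386  # focal loci in the Horizon reference panel


@dataclass(frozen=True)
class PanelResult:
    label: str
    sites_covered: int      # focal loci with at least one read
    variants_verified: int  # verified variants supported by >= 1 read


def panel_fraction(count: int, total: int = PANEL_LOCI) -> float:
    """Fraction of the panel represented by count."""
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and total")
    return count / total


results = [
    PanelResult("unligated", sites_covered=11, variants_verified=1),
    PanelResult("ligated", sites_covered=261, variants_verified=85),
]
for r in results:
    print(f"{r.label}: {panel_fraction(r.sites_covered):.1%} of loci covered, "
          f"{panel_fraction(r.variants_verified):.1%} with a verified variant")
```

By this calculation, ligation raised locus coverage from about 2.8% to about 67.6% of the panel, and verified-variant detection from about 0.3% to about 22.0%.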
[0078] These data show that ligating short fragments before ResolveDNA amplification increased amplification yield and improved coverage and SNV calling sensitivity. In some embodiments, this protocol may also be applied to cell-free DNA isolated from the plasma of a late-stage cancer patient with known mutational profile data from the primary tumor, with the aim of identifying in the ctDNA the same mutation(s) present in the primary solid tumor. In some embodiments, ligation efficiency may be evaluated with and without PNK across a range of inputs (10 pg-25 ng). In some embodiments, PNK is used for efficient ligation and amplification (FIG. 9). Ligation efficiency with ~7 ng of input is comparable to that with 50 ng and begins to decline slightly at 1 ng and below (data not shown). However, methods such as adding stuffer DNA of a known sequence may increase the DNA concentration for ligation.
[0079] This example demonstrates that ligation of cfDNA followed by ResolveDNA amplification has the potential to outperform existing methods for detecting rare, low-frequency mutations and, because it is an unbiased survey of genetic variation rather than an approach using probes to target specific loci, to uncover more variation by definition. Namely, we expect to detect more mutations at a given sequencing depth. Commercially available cfDNA kits make libraries from high concentrations of cfDNA (10-100 ng/µl) when using PCR-free approaches (to avoid PCR-introduced bias and errors), and use 1-10 ng when PCR amplification is included in library preparation; PCR, however, introduces more errors. Our approach of ResolveDNA amplification of ligated short fragments should reduce the starting material requirement to picogram quantities, making it better suited to the low amounts of cf/ctDNA that can be obtained from a patient. With our approach, we envision more direct clinical utility in evaluating cfDNA than existing commercial kits offer. We will assess this directly in a rarefaction-type, head-to-head test of ResolveDNA using patient plasma-derived cfDNA.
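Paragraph [0079] proposes a rarefaction-type comparison. A minimal sketch of such a curve follows, assuming each read has already been annotated with the verified variant (if any) it supports; the function name and toy data are hypothetical and do not represent the authors' analysis pipeline.

```python
from __future__ import annotations

import random
from typing import Optional, Sequence


def rarefaction_curve(read_variants: Sequence[Optional[str]],
                      depths: Sequence[int],
                      seed: int = 0) -> list[tuple[int, int]]:
    """Count distinct verified variants detected at each subsampling depth.

    Each entry of read_variants is the ID of the verified variant a read
    supports, or None if it supports none. Reads are shuffled once and
    nested prefixes are taken, so the count can only grow with depth.
    """
    reads = list(read_variants)
    random.Random(seed).shuffle(reads)
    curve = []
    for depth in sorted(depths):
        if not 0 <= depth <= len(reads):
            raise ValueError(f"depth {depth} outside 0..{len(reads)}")
        detected = {v for v in reads[:depth] if v is not None}
        curve.append((depth, len(detected)))
    return curve


# Hypothetical toy data: 100 reads, 8 of which support one of 3 variants.
toy_reads = ["v1"] * 5 + ["v2"] * 2 + ["v3"] + [None] * 92
for depth, n_variants in rarefaction_curve(toy_reads, depths=(10, 50, 100)):
    print(depth, n_variants)
```

Running the same function on reads from each method and comparing variant counts at equal depth gives a head-to-head comparison; read-level variant annotation would come from an upstream alignment and variant-calling step not shown here.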
[0080] Example 2: Analysis of cfDNA from culture media
[0081] Following the general procedures of Example 1, cell-free DNA from the spent culture media of an embryo is used to analyze one or more fetal genetic abnormalities. In some instances, the embryo is being prepared for implantation. In some instances, the embryo is a human embryo or a nonhuman embryo, such as a livestock embryo. This method is expected to provide advantages over more invasive fetal abnormality tests (e.g., amniocentesis or biopsy).
[0082] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
[0083] Example 3: Analysis of SCM Samples from Embryonic Cultures
[0084] To model difficult-to-obtain spent culture media (SCM) samples from actual human embryonic cultures, a proxy/model system was developed to assess PTA-driven copy number profiles from SCM with or without ligation workflows upstream of amplification (Figure 10). Both a euploid model, the NIST Genome in a Bottle B-lymphocyte cell line NA12878/HG001, and an aneuploid acute myeloid leukemia cell line (MOLM-13) were used to confirm the ability to faithfully report discrete chromosomal deletions and gains. To measure nucleic acid content directly from these spent media, 10 ml of spent culture medium was collected after 2-3 days of cell growth and centrifuged (350 × g, 5 min) to separate cells from free nucleic acid. The supernatant was decanted carefully and conservatively to avoid carrying cells over into the spent culture medium; any cell contamination would be indicated by a high preseq value, which free nucleic acid does not achieve. The spent medium was then purified with a 1.2X SPRI bead ratio to obtain a concentrated, pure nucleic acid fraction as input for ligation and PTA. Using this methodology with NA12878 or MOLM-13 cells, TapeStation analysis showed an average fragment size of ~150 bp, indicative of a mononucleosome footprint.
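The centrifugation step in paragraph [0084] is specified as relative centrifugal force (350 × g), while many centrifuges are set in rpm, and the conversion depends on rotor radius. This sketch uses the standard relation RCF = 1.118 × 10^-5 × r(cm) × rpm²; the 15 cm radius is a hypothetical value, and the radius of the rotor actually used should be substituted.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor in RCF = k * r(cm) * rpm^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a target RCF at a given radius."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# 350 x g from paragraph [0084]; the 15 cm radius is a hypothetical rotor.
speed = rpm_for_rcf(350.0, radius_cm=15.0)
print(f"{speed:.0f} rpm")
```

Keeping the force (× g) rather than the speed constant is what makes the cell-removal step reproducible across centrifuges with different rotors.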
[0085] Two general ligation approaches were designed: linear concatenation and circularization.
[0086] Based on prior work, linearly concatenating the ~150 bp fragments derived from the SCM model systems was thought to be necessary to obtain usable PTA yield and a copy number profile with acceptable signal-to-noise. Here, the nucleic acid ends were “polished” before ligation using an end-repair and A-tailing (ERAT) cocktail to make ligation more efficient. The ERAT cocktail comprised T4 DNA polymerase, T4 polynucleotide kinase, and Taq-B polymerase, for repair/fill-in, phosphorylation, and fill-in plus A-tailing, respectively. After ERAT, the fragments were ligated with T4 DNA ligase and then subjected to the ResolveDNA or ResolveOME workflows.
[0087] PTA generally amplifies small circular templates, such as the mitochondrial genome (mtDNA), more efficiently than other genomic amplification methodologies. Thus, a ligation approach was designed to generate circular molecules, including hairpin adapters to form the boundaries of the circular form and a T-overhang adapter to facilitate efficient concatenation of the A-tailed small SCM DNA fragments between the hairpin adapters (Figure 11). The resulting circular product would be subjected to PTA in the ResolveDNA or ResolveOME workflows. This approach was distinct in that, in addition to (or instead of) the standard randomer priming of PTA, PTA priming could be initiated using a primer directed at the hairpin adapter sequence, which could be engineered to include sequences not represented in the human genome.
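The rationale for the T-overhang adapter in paragraph [0087] can be illustrated with a deliberately simplified model of end compatibility, in which each fragment end carries at most a single-base 3' overhang and two ends anneal only if their overhangs are Watson-Crick complements. This is an illustrative assumption, not a model of T4 DNA ligase kinetics; real ligases can join mismatched or blunt ends at low efficiency.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def ends_compatible(overhang_a: Optional[str], overhang_b: Optional[str]) -> bool:
    """Simplified test of whether two single-base 3' overhangs can anneal.

    None denotes a blunt end. In this toy model, blunt ends pair only with
    blunt ends, and a single-base overhang pairs only with its complement.
    """
    if overhang_a is None or overhang_b is None:
        return overhang_a is None and overhang_b is None
    return COMPLEMENT[overhang_a.upper()] == overhang_b.upper()


A_TAILED_FRAGMENT = "A"   # ERAT (Taq A-tailing) leaves a 3' dA overhang
T_OVERHANG_LINKER = "T"   # hypothetical linker end carrying a 3' dT overhang

print(ends_compatible(A_TAILED_FRAGMENT, A_TAILED_FRAGMENT))  # fragment-fragment
print(ends_compatible(A_TAILED_FRAGMENT, T_OVERHANG_LINKER))  # fragment-linker
```

Under this model, two A-tailed fragments have non-complementary overhangs, whereas each A-tailed end pairs with a T overhang, which is consistent with the stated role of the T-overhang adapter in facilitating concatenation.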
[0088] Two ligation workflows that could be used for either the linear or the circular concatenation scheme were devised (Figure 12). The first, a “2-step” workflow, performed ERAT fragment polishing and ligation sequentially with two different reagent cocktails, each addition accompanied by its own thermal cycling or temperature hold conditions. The alternative “1-step” workflow mixed all ERAT and ligation buffers and enzymes together and used one unified cycling scheme providing appropriate temperatures for all enzymes in the cocktail.
[0089] Initial data were generated with 60 pg of NA12878 or MOLM-13 SCM input DNA, a quantity modeling the estimated free DNA present in a 5-10 µl aliquot of a 5-7 day embryo culture, which is the aliquot size typically provided by clients to preimplantation genetic testing (PGT) providers. With this input quantity, running the linear concatenation approach in the presence or absence of T4 DNA ligase produced a marked difference in PTA yield upon executing the ResolveOME protocol (Figure 13). This was consistent with previous data showing that PTA amplification falls off precipitously below a 300 bp template threshold. Besides providing usable PTA yield, the presence of ligase decreased the MAD (mean absolute deviation) segment score, indicating an increased signal-to-noise ratio for copy number alteration calling. Corresponding preseq values indicated a lack of single-cell contamination, providing confidence that the output metrics reflect concatenation of ~150 bp SCM-derived DNA fragments.
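As a rough illustration of how a lower MAD reflects less bin-to-bin noise in a copy number profile, the sketch below computes the mean absolute deviation of per-bin log2 read-count ratios. Copy number pipelines differ in how they define this score (some use the median, or differences between adjacent bins), so this version is an assumption rather than the pipeline used here, and the bin counts are hypothetical.

```python
from __future__ import annotations

import math
from statistics import fmean
from typing import Sequence


def log2_ratios(bin_counts: Sequence[float]) -> list[float]:
    """Per-bin log2 ratio of read count to the mean count across bins."""
    if not bin_counts or min(bin_counts) <= 0:
        raise ValueError("bin counts must be non-empty and positive")
    mean_count = fmean(bin_counts)
    return [math.log2(c / mean_count) for c in bin_counts]


def mean_absolute_deviation(values: Sequence[float]) -> float:
    """Mean absolute deviation of values around their mean."""
    if not values:
        raise ValueError("need at least one value")
    center = fmean(values)
    return fmean(abs(v - center) for v in values)


# Hypothetical read counts per genomic bin for two flat (euploid) profiles.
noisy_bins = [90, 110, 95, 105, 80, 120]
clean_bins = [98, 102, 99, 101, 97, 103]
print(f"noisy MAD: {mean_absolute_deviation(log2_ratios(noisy_bins)):.3f}")
print(f"clean MAD: {mean_absolute_deviation(log2_ratios(clean_bins)):.3f}")
```

Because deviation here is measured from a single genome-wide mean, genuine gains and losses also raise the score, which is one reason an aneuploid sample such as MOLM-13 can show a higher MAD than a euploid one.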
[0090] Copy number profiles generated from NA12878 and MOLM-13 purified SCM DNA using either the “2-step” or the “1-step” linear concatenation methodology upstream of ResolveOME multiomic chemistry were compared (Figure 14). Feasibility was demonstrated for both protocols, with robust MAD values. MOLM-13 chromosomal abnormalities were captured (with a higher mean MAD than NA12878, which this analysis pipeline produces by default for aneuploid samples), while euploidy was faithfully reported for NA12878. In both models, the 1-step ligation protocol resulted in a modest reduction in copy number profile quality relative to the 2-step protocol. The 1-step protocol would be preferred from a user/customer perspective for ease of workflow; however, in some instances it may entail data quality tradeoffs until formulation optimization is complete.
[0091] Initial data showed that including hairpin adapters and a DNA linker (Figure 11) is feasible for generating the anticipated euploid copy number profiles in NA12878 cells, and that such inclusion decreased the MAD value relative to ligation performed without adapter and linker (Figure 15). This experiment used a randomer PTA primer rather than a primer directed at the hairpin (Figure 11). Further work may include definitively characterizing the quantitative benefits of circularization and optimizing linker and hairpin design.
[0092] The ligation strategies presented here for SCM may be broadly relevant to efforts to create PTA workflows for FFPE samples and exosomal nucleic acids, given their small size and fragmented nature. In addition, the circularization methodology presented here could extend PTA to applications where rolling circle amplification (RCA) is currently used (e.g., sequencing platform flow cells).

Claims

WHAT IS CLAIMED IS:
1. A method of amplification comprising:
(a) ligating a plurality of sample nucleic acids to form concatemers;
(b) contacting the concatemers with at least one amplification primer, at least one nucleic acid polymerase comprising 3'-5' exonuclease activity, and a mixture of nucleotides, wherein the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the at least one polymerase, wherein the at least one terminator nucleotide is an irreversible terminator which inhibits 3'-5' exonuclease activity of the polymerase, and is selected from nucleotides with modification to the alpha group, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2' fluoro nucleotides, 3' phosphorylated nucleotides, 2'-O-Methyl modified nucleotides, and trans nucleic acids; and
(c) amplifying the at least one target nucleic acid molecule to generate a plurality of terminated amplification products, wherein the at least one terminator nucleotide is attached to the 3' terminus of the terminated amplification products, and wherein the replication proceeds by strand displacement replication.
2. The method of claim 1, wherein the sample nucleic acids comprise cell-free DNA (cfDNA) or circulating tumor DNA (ctDNA).
3. The method of claim 1, wherein the sample nucleic acids are no more than 300 bases in length.
4. The method of claim 1, wherein the sample nucleic acids are no more than 200 bases in length.
5. The method of claim 1, wherein the sample nucleic acids comprise cfDNA.
6. The method of claim 1, wherein the sample nucleic acids comprise ctDNA.
7. The method of claim 1, wherein the sample nucleic acids are obtained from a formalin-fixed paraffin-embedded (FFPE) sample.
8. The method of claim 1, wherein the sample nucleic acids are obtained from spent culture media.
9. The method of claim 1, wherein the sample nucleic acids are obtained from spent culture media of an embryo.
10. The method of claim 1, wherein ligating comprises phosphorylation of sample nucleic acid ends followed by contact with a ligase.
11. The method of claim 10, wherein the ligase comprises T4 ligase.
12. The method of claim 1, wherein the nucleotides with modification to the alpha group are alpha-thio dideoxynucleotides.
13. The method of claim 1, wherein the terminator nucleotide comprises modifications of the r group of the 3’ carbon of the deoxyribose.
14. The method of claim 1, wherein the at least one terminator nucleotide is selected from the group consisting of dideoxynucleotides, inverted dideoxynucleotides, 3' biotinylated nucleotides, 3' amino nucleotides, 3'-phosphorylated nucleotides, 3'-O-methyl nucleotides, 3' carbon spacer nucleotides including 3' C3 spacer nucleotides, 3' C18 nucleotides, 3' Hexanediol spacer nucleotides, acyclonucleotides, and combinations thereof.
15. The method of claim 1, wherein the at least one polymerase comprises a B-type polymerase.
16. The method of claim 1, wherein the at least one polymerase comprises Phi29 polymerase.
17. The method of claim 1, further comprising polishing an end of a nucleic acid from the plurality of sample nucleic acids prior to (a).
18. The method of claim 17, wherein polishing comprises using an end-repair and A-tailing (ERAT) cocktail.
19. The method of claim 18, wherein the ERAT cocktail comprises one or more polymerases, a kinase, or both.
20. The method of claim 18, wherein the ERAT cocktail comprises one or more of T4 DNA polymerase, T4 polynucleotide kinase, and Taq-B polymerase.
21. A method of amplification comprising:
(a) obtaining a sample comprising nucleic acids from spent culture media (SCM);
(b) ligating the nucleic acids to form concatemers;
(c) contacting the concatemers with at least one amplification primer, at least one nucleic acid polymerase comprising 3'-5' exonuclease activity, and a mixture of nucleotides, wherein the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the at least one polymerase, wherein the at least one terminator nucleotide is an irreversible terminator which inhibits 3'-5' exonuclease activity of the polymerase, and is selected from nucleotides with modification to the alpha group, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2' fluoro nucleotides, 3' phosphorylated nucleotides, 2'-O-Methyl modified nucleotides, and trans nucleic acids; and
(d) amplifying the at least one target nucleic acid molecule to generate a plurality of terminated amplification products, wherein the at least one terminator nucleotide is attached to the 3' terminus of the terminated amplification products, and wherein the replication proceeds by strand displacement replication.
22. The method of claim 21, wherein the SCM is obtained by culturing one or more embryonic cells for about 5 to 7 days.
23. The method of claim 21, further comprising sequencing a cDNA library comprising polynucleotides amplified from mRNA transcripts from the SCM.
24. The method of claim 23, wherein sequencing the cDNA library comprises reverse transcription.
25. The method of claim 23, wherein the mRNA transcripts are amplified via template-switching reverse transcription.
26. The method of claim 21, wherein (b) comprises mixing an end-repair and A-tailing (ERAT) cocktail and a ligation buffer.
27. The method of claim 21, wherein (b) comprises polishing one or more ends of the nucleic acids using an end-repair and A-tailing (ERAT) cocktail and exposing the nucleic acids to a ligation buffer.
28. The method of claim 26 or 27, wherein the ERAT cocktail comprises one or more polymerases, a kinase, or both.
29. The method of claim 26 or 27, wherein the ERAT cocktail comprises one or more of T4 DNA polymerase, T4 polynucleotide kinase, and Taq-B polymerase.
30. The method of claim 21, wherein (b) comprises generating circular molecules using hairpin adapters.
31. The method of claim 30, wherein the hairpin adapter further comprises a linker.
PCT/US2024/048405 2023-09-26 2024-09-25 Methods, systems, and compositions for cfdna analysis Pending WO2025072326A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363585497P 2023-09-26 2023-09-26
US63/585,497 2023-09-26

Publications (1)

Publication Number Publication Date
WO2025072326A1 true WO2025072326A1 (en) 2025-04-03

Family

ID=95202263

Country Status (1)

Country Link
WO (1) WO2025072326A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220277805A1 (en) * 2019-07-31 2022-09-01 BioSkryb Genomics, Inc. Genetic mutational analysis
US20220315984A1 (en) * 2019-06-28 2022-10-06 Cs Genetics Limited Reagents and Methods for the Analysis of Microparticles
US20230212689A1 (en) * 2021-07-06 2023-07-06 Singular Genomics Systems, Inc. Compositions and methods for detecting genetic features

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24873487

Country of ref document: EP

Kind code of ref document: A1