WO2025221722A1 - Methods of selectively tagging nucleic acids for analysis of RNA sequences
- Publication number
- WO2025221722A1 (PCT/US2025/024672)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tag
- binding
- nucleic acid
- molecules
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
- C12Q1/6844—Nucleic acid amplification reactions
Definitions
- the present invention relates to preparation of nucleic acids for sequencing or other analysis.
- the present invention also relates to target enrichment of nucleic acids.
- Biological samples may contain nucleic acid molecules to be analyzed.
- plasma or serum samples used as liquid biopsies may contain DNA and RNA molecules present in relatively low amounts but with potentially high value for diagnostic purposes. In some circumstances, it is desirable to analyze both DNA and RNA molecules which are present in a sample, and some diagnostic assays distinguish between them. Some methods for extracting nucleic acid from a sample isolate both DNA and RNA from the sample, but additional steps must be performed if one wishes to distinguish between or separate DNA and RNA molecules.
- DNA and RNA molecules in plasma or serum are released into circulation by different biological processes. DNA molecules usually come from apoptotic or necrotic cells, whereas RNA molecules are mostly from living and actively dividing cells. Distinguishing between the types of nucleic acids within a sample can provide further information and diagnostic value. Additionally, analysis of nucleic acids, especially by sequencing, typically requires one or more processing steps, and it may be desirable to process the RNA molecules differently than DNA molecules.
- RNA can be separated from DNA in a sample in various ways. For instance, samples can be mixed with a phenol:chloroform mixture and centrifuged. Because the phenol:chloroform mixture is immiscible with water, centrifugation forms two distinct phases: an upper aqueous phase and a lower organic phase. Proteins will remain at the interphase between the two phases, while the nucleic acids (as well as other contaminants such as salts, sugars, etc.) remain in the upper aqueous phase. If the mixture is acidic, DNA precipitates into the organic phase while RNA remains in the aqueous phase. The upper aqueous phase can then be pipetted off, and the RNA can be separated from contaminants.
- nucleic acid target capture methods can allow specific genes, exons, and other genomic regions of interest to be enriched for targeted sequencing or other analysis.
- target capture-based sequencing methods can involve cumbersome lengthy protocols and costly processes, as well as a low on-target rate for a small capture panel (e.g., less than 500 probes).
- current methods for nucleic acid target capture can be ill-suited for low input and damaged DNA because of a low recovery rate.
- Sequencing of nucleic acids often provides extremely valuable information from the sample.
- Next-Generation Sequencing (NGS) methods and systems involve the parallel sequencing of a library of nucleic acids by a sequencing platform.
- Preparation of a sequencing library generally includes various steps such as amplification of the nucleic acids, attachment of adaptors, and/or other preparatory steps.
- Emerging polynucleotide sequencing platforms can enable direct detection and analysis of nucleic acid molecules without the need for amplification, though the nucleic acids often require an adaptor or other moiety to immobilize the nucleic acid for sequencing steps.
- An adaptor can be attached to one or both ends of nucleic acid molecules in order to add sites for primer binding and for immobilization of the nucleic acid on a surface such as a flowcell or a bead, and to add other functional sequences to the fragments.
- Various kinds of adaptors are used in sequencing preparation kits to add these sites or sequences to the nucleic acids from the sample.
- Adaptors can be attached in various ways, such as by ligation, primer extension, tagmentation, and other techniques.
- a sequencing library can be generated in a variety of ways, with different objectives regarding the nucleic acids to be used as inputs. For instance, PCR can be used with target-specific primers to generate a library of amplicons covering regions of interest in the nucleic acid sample. Other methods of library preparation involve random fragmentation of the nucleic acid sample by enzymatic or physical shearing methods, followed by amplification using common adaptor sequences. Enrichment procedures are used to remove or separate sequences of interest from the rest of the sample.
- the present disclosure provides methods of preparing nucleic acid molecules for sequencing or other analysis.
- methods are provided for use with a sample comprising input DNA molecules and input RNA molecules.
- Tagged cDNA molecules are synthesized by hybridizing a tagged primer to the input RNA molecules, and extending the primer to produce the tagged cDNA molecules.
- the tagged primer comprises a first tag
- the first tag comprises a binding partner which is a member of a first binding pair.
- the methods also comprise attaching a second tag to the first tag to produce a dual tagged cDNA complex.
- the second tag comprises a reciprocal binding partner of the first binding pair, thereby facilitating attachment of the second tag.
- the first tag can be digoxigenin (DIG), and the second tag can comprise anti-DIG antibody.
- DIG and anti-DIG antibody constitute the first binding pair.
- the second tag can also comprise a binding partner of a second binding pair, and the present methods can comprise attaching the second tag to a reciprocal binding partner of the second binding pair, which itself may be attached to the solid support.
- the second tag can comprise a biotin moiety attached to an anti-DIG antibody, and the reciprocal binding partner of the second binding pair can be a streptavidin coated bead; in this example, biotin and streptavidin constitute the second binding pair.
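The chain of attachments in this example (first tag on the cDNA, second tag, coated support) can be sketched as a small Python model. This is only an illustrative sketch: the names `FIRST_PAIR`, `SECOND_PAIR`, `SecondTag`, `reciprocal`, and `can_capture` are hypothetical and do not come from the disclosure, and binding is reduced to a simple lookup.

```python
from dataclasses import dataclass

# Binding pairs taken from the example in the text.
FIRST_PAIR = ("DIG", "anti-DIG antibody")
SECOND_PAIR = ("biotin", "streptavidin")


def reciprocal(partner, pair):
    """Return the reciprocal partner within a pair, or None if not a member."""
    a, b = pair
    if partner == a:
        return b
    if partner == b:
        return a
    return None


@dataclass(frozen=True)
class SecondTag:
    binds_first_tag: str  # reciprocal binding partner of the first binding pair
    capture_handle: str   # binding partner from the second binding pair


def can_capture(first_tag, second_tag, support_coating):
    """True if first tag -> second tag -> support forms an unbroken chain."""
    links_to_cdna = reciprocal(first_tag, FIRST_PAIR) == second_tag.binds_first_tag
    links_to_support = (
        reciprocal(second_tag.capture_handle, SECOND_PAIR) == support_coating
    )
    return links_to_cdna and links_to_support


# A biotinylated anti-DIG antibody bridges a DIG-tagged cDNA to a streptavidin bead.
biotinylated_anti_dig = SecondTag("anti-DIG antibody", "biotin")
dig_cdna_captured = can_capture("DIG", biotinylated_anti_dig, "streptavidin")
untagged_dna_captured = can_capture(None, biotinylated_anti_dig, "streptavidin")
```

In this model an untagged input DNA molecule (first tag `None`) has no link to the second tag, so only the tagged cDNA reaches the support.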
- the present methods can further comprise one or more steps for capture or target enrichment of a tagged cDNA molecule.
- the tagged cDNA molecule can be hybridized with a probe, such as a capture probe or a bridge probe which is hybridized with an anchor probe.
- the capture probe or anchor probe can be attached to an enrichment tag such as a biotin moiety.
- the present methods further comprise processing the tagged cDNA molecules and the input DNA molecules separately. For instance, the processing may be amplification to produce amplicons of the input molecules.
- the present methods can also comprise re-combining the processed cDNA molecules and the processed input DNA molecules (or amplicons thereof) for sequencing or other analysis.
- FIG. 1 illustrates synthesis of a tagged cDNA molecule which is complementary to an input RNA molecule.
- a primer comprising a first tag hybridizes to the input RNA molecule.
- the first tag comprises a binding partner (DIG) of a first binding pair (DIG:anti-DIG antibody).
- FIG. 2 illustrates a target enrichment procedure in which a tagged cDNA molecule having an attached first tag is hybridized with bridge probes and the cDNA molecule:bridge probe complex is hybridized with biotinylated universal (or anchor) probes.
- FIG. 3 illustrates how one of the input nucleic acids (e.g., RNA) is differentially tagged with a first tag described here while another of the input nucleic acids (e.g., DNA) is not tagged. Following optional target enrichment, RNA (or cDNA) molecules with the first tag are then isolated.
- the present disclosure provides methods of preparing RNA molecules for analysis.
- the methods comprise providing a sample comprising input DNA molecules and input RNA molecules, and selectively hybridizing a tagged primer to the input RNA molecules.
- the tagged primer comprises a first tag, and the first tag comprises a binding partner of a first binding pair.
- the first tag can be attached to the cDNA by a covalent bond or by non-covalent binding.
- the input RNA molecules can comprise a primer binding site that hybridizes with the tagged primer, and the input DNA molecules do not comprise the primer binding site.
- the primer binding site is polyadenine, or poly(A), which can be naturally present on the input RNA molecules, or may be added to input RNA molecules.
- the primer binding site is ligated to the input RNA molecules before the hybridizing of the tagged primer.
- the present methods also comprise extending the primer to produce tagged cDNA molecules.
- the methods also comprise attaching a second tag to the first tag to produce a dual tagged cDNA complex, wherein the second tag comprises a reciprocal binding partner of the first binding pair.
- the second tag can be attached to the first tag by a covalent bond or by non-covalent binding.
- the second tag also comprises a binding partner of a second binding pair. The binding partners of the first binding pair do not bind with the binding partners of the second binding pair.
- the second tag can include binding partners from two different binding pairs.
Synthesizing a Tagged cDNA Molecule Having a First Tag
- the present methods are particularly advantageous with samples that contain input DNA molecules and input RNA molecules.
- the methods comprise selectively hybridizing a tagged primer to the input RNA molecules, meaning that the primer hybridizes to the input RNA molecules without significant hybridization to the input DNA molecules.
- the tagged primer comprises a first tag, and the first tag includes a binding partner of a first binding pair.
- a first binding pair includes two binding partners which reciprocally bind to each other; the first tag includes one of the binding partners from that binding pair.
- the methods also comprise extending the tagged primer to produce tagged cDNA molecules.
- extending refers to the extension of a primer by the addition of nucleotides using a primer extension enzyme. If a primer that is annealed to a nucleic acid (such as an input RNA molecule) is extended, the nucleic acid acts as a template for an extension reaction. The sequence of nucleotides added during the extension process is determined by the sequence of the nucleic acid template. Primers can be extended by primer extension enzymes such as DNA polymerases and reverse transcriptases.
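The template-directed nature of primer extension can be illustrated with a short Python sketch. It is a deliberately simplified model under stated assumptions: a template is treated as primed only if it ends in a run of A bases (standing in for the poly(A) primer binding site), the cDNA is taken to be the full reverse complement of the template, and the names `synthesize_tagged_cdna` and `min_poly_a` are hypothetical rather than taken from the disclosure.

```python
# Base pairing used to read a cDNA off an RNA (or DNA) template.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def reverse_complement_dna(seq):
    """Return the DNA strand complementary to seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def synthesize_tagged_cdna(template, tag="DIG", min_poly_a=10):
    """Return (tag, cDNA) if the template ends in a poly(A) tract, else None.

    min_poly_a is an illustrative threshold, not a value from the text.
    """
    tail_length = len(template) - len(template.rstrip("A"))
    if tail_length < min_poly_a:
        return None  # no primer binding site, so nothing is primed or tagged
    return (tag, reverse_complement_dna(template))


rna = "GAUCGG" + "A" * 12  # input RNA with a 3' poly(A) tail
dna = "GATCGGTC"           # input DNA fragment lacking the primer binding site

tagged = synthesize_tagged_cdna(rna)
untagged = synthesize_tagged_cdna(dna)
```

Here the cDNA begins with the T bases contributed by the primer and continues with the sequence dictated by the template, while the DNA fragment yields no tagged product.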
- the first tag is any binding partner suitable for covalent or non-covalent attachment to the tagged primer, so that the first tag is attached to a cDNA molecule synthesized by primer extension of a primer hybridized with an input RNA molecule.
- the first tag is selected from the group consisting of digoxigenin, 5-bromo-2’-deoxyuridine (BrdU), 2,4-dinitrophenyl (DNP), nitrilotriacetic acid or a nitrilotriacetate (NTA) such as nickel nitrilotriacetate (Ni-NTA), tris-Nitrilotriacetate (tris-NTA), a tyramine, a thiol, an amine (e.g., a primary amine), an aldehyde, an alkyne or an azide or other groups reacting by click chemistry, and mixtures thereof.
- examples of first binding pairs include DIG:anti-DIG antibody, BrdU:anti-BrdU antibody, DNP:anti-DNP antibody, Ni-NTA:poly-Histidine (His-tag), Tyramide:tyrosine residues, Thiol:Thiol (disulfide bonds), Amine:Aldehyde conjugation, alkyne:azide, or other click chemistry reactants.
- examples of tyramines include Tyramine, N-Methyltyramine, N,N-Dimethyltyramine, and N,N,N-Trimethyltyramine.
- alkynes and azides binding via click chemistry examples include copper-catalyzed reaction of an azide and alkyne to form a triazole (Huisgen 1,3-dipolar cycloaddition) and strain-promoted azide alkyne cycloaddition (SPAAC).
- Digoxigenin is advantageous as the binding partner included in the first tag, due to its very low non-specific binding and the availability of high-affinity anti-digoxigenin antibodies.
- anti-DIG antibodies include Perkin Elmer’s Anti-Digoxigenin biotin conjugate.
- the present methods comprise a second tag that includes the reciprocal binding partner of the first tag (that is, the other binding partner of the first binding pair).
- the first tag can comprise an antigen or hapten
- the second tag can comprise an antibody that selectively binds that antigen or hapten.
- the attachment of the second tag to the tagged cDNA molecule results in dual tagged cDNA complexes, which can facilitate enrichment or isolation of the cDNA.
- the binding partners of the second binding pair do not bind with the binding partners of the first binding pair.
- the present methods comprise forming a dual tagged cDNA complex by binding the second tag to the first tag which was previously attached to the tagged cDNA molecule.
- the second tag also includes a binding partner of a second binding pair (that is, a binding pair that is different from the first binding pair).
- the second tag comprises biotin, 5-bromo-2’-deoxyuridine (BrdU), 2,4-dinitrophenyl (DNP), nitrilotriacetic acid or a nitrilotriacetate (NTA) such as nickel nitrilotriacetate (Ni-NTA), tris-Nitrilotriacetate (tris-NTA), a tyramine, a thiol, an amine (e.g., a primary amine), an aldehyde, an alkyne or an azide or other groups reacting by click chemistry, and mixtures thereof.
- the reciprocal binding partner of the second binding pair can comprise an avidin moiety such as avidin, streptavidin, neutravidin, captavidin, etc., or antibodies that specifically bind the binding partner, His-tags, tyrosine residues, thiols, amines, aldehydes, alkynes, azides, etc.
- second binding pairs include biotin:streptavidin, BrdU:anti-BrdU antibody, DNP:anti-DNP antibody, Ni-NTA:His-tag, Tyramide:tyrosine residues, Thiol:Thiol (disulfide bonds), Amine:Aldehyde conjugation, alkyne:azide, or other click chemistry reactants.
- Click chemistry can be a reaction between an azide and an alkyne that can yield a covalent product (e.g., a 1,5-disubstituted 1,2,3-triazole). Copper can serve as a catalyst.
- the proteins avidin and streptavidin form exceptionally tight complexes with biotin moieties. In general, when a biotin moiety is coupled to a second molecule through its carboxyl side chain, the resulting conjugate is still tightly bound by avidin or streptavidin. The second molecule is said to be "biotinylated" when such conjugates are prepared.
- Emerging nucleic-acid sequencing platforms can enable direct detection and analysis of nucleic acid molecules without the need for traditional sample preparation and amplification steps.
- input nucleic acid molecules can be tagged with a biotin molecule through a simple enzymatic addition step.
- the biotinylated nucleic acid molecule can then be bound to a flowcell containing avidin or streptavidin moieties.
- Subsequent analysis can then be performed, e.g., using fluorescently labelled probes.
- if a biotin moiety is attached to input nucleic acid molecules, as is done in some sample preparation procedures, it would interfere with certain probe-based capture and enrichment protocols that use biotin-tagged oligonucleotide probes.
- addition of a biotin tag to nucleic acids after target enrichment is challenging due to the single stranded nature of the enriched nucleic acid molecules.
- residual components from the target enrichment protocol could also be processed, impacting subsequent analysis.
- the present disclosure provides an approach for selectively tagging cDNA molecules that is compatible with existing target enrichment procedures and supports subsequent biotin-tag addition, enabling compatibility with direct nucleic acid sequence analysis platforms.
- the present methods can be used with samples comprising input nucleic acid molecules of various types, particularly where the sample comprises a mixture of input DNA and RNA molecules.
- Input DNA molecules include genomic DNA (gDNA), mitochondrial DNA, viral DNA, cDNA, cell-free DNA (cfDNA), circulating tumor DNA (ctDNA), cell-free fetal DNA (cffDNA), or synthetic DNA.
- the DNA can be double-stranded DNA, single-stranded DNA, fragmented DNA, or damaged DNA.
- the input nucleic acid molecules can include input RNA, input DNA, or a mixture of input RNA and input DNA.
- the input RNA molecules can be mRNA, pre-mRNA, tRNA, rRNA, microRNA, snRNA, piRNA, small non-coding RNA, polysomal RNA, intron RNA, viral RNA, or cell-free RNA.
- the DNA comprises fragmented genomic DNA and the RNA comprises mRNA or pre-mRNA.
- the input nucleic acid can be naturally occurring or synthetic.
- the input nucleic acid can have modified heterocyclic bases.
- the modification can be methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses, or other heterocycles.
- the input nucleic acid can have modified sugar moieties.
- the modified sugar moieties can include peptide nucleic acid.
- the input nucleic acid can comprise peptide nucleic acid.
- the input nucleic acid can comprise threose nucleic acid.
- the input nucleic acid can comprise locked nucleic acid.
- the input nucleic acid can comprise hexitol nucleic acid.
- the input nucleic acid can be flexible nucleic acid.
- the input nucleic acid can comprise glycerol nucleic acid.
- the input nucleic acid can be captured and enriched from low-input (e.g., 1 ng of nucleic acid materials) samples such as cell-free DNA (cfDNA), circulating tumor DNA (ctDNA), a single cell, or 10 or fewer cells.
- a single cell or other cells for which analysis may be desired include a neuron, a glial cell, a germ cell, a gamete, an embryonic stem cell, a pluripotent stem cell (including an induced pluripotent stem cell), an adult stem cell, a cell of the hematopoietic lineage, a differentiated somatic cell, a microbial cell, a cancer cell (including, for example a cancer stem cell), and a disease cell.
- the input nucleic acids are captured from 10 or fewer cells (such as 1-10 cells, 2-10 cells, 5-10 cells, 1-2 cells, or 2-5 cells).
- the low-input samples can have 1 ng, 2 ng, 3 ng, 4 ng, 5 ng, 6 ng, 7 ng, 8 ng, 9 ng, 10 ng, or more of nucleic acid materials.
- the low-input samples can have less than 10 ng, 9 ng, 8 ng, 7 ng, 6 ng, 5 ng, 4 ng, 3 ng, 2 ng, 1 ng, or less of nucleic acid materials.
- the low-input samples can have from 200 pg to 10 ng of nucleic acid materials.
- the low-input samples can have less than 10 ng of nucleic acid materials.
- the low-input sample can have less than 10 ng, 5 ng, 1 ng, 100 pg, 50 pg, 25 pg, or less of the nucleic acid materials.
- the input samples can have 1 ng, 10 ng, 20 ng, 30 ng, 40 ng, 50 ng, or more of nucleic acid molecule.
- the input samples can have less than 50 ng, 40 ng, 30 ng, 20 ng, 10 ng, 1 ng, or less of nucleic acid materials.
- the capture and enrichment can be done by target probe hybridization.
- the target probe can be a capture probe, bridge probe, and/or anchor probe.
- the target probe can comprise one or more binding moieties.
- the binding moiety can be a biotin.
- the binding moieties can be attached to a support.
- the support can be a bead.
- the bead can be a streptavidin coated bead.
- the input nucleic acid can be damaged.
- the damaged nucleic acid can comprise altered or missing bases, and/or modified backbone.
- the input nucleic acid can be damaged by oxidation, radiation, or random mutation.
- Damaged dsDNA (with a nick) or ssDNA can be used as input nucleic acid for a library construction.
- the dsDNA can be denatured so at least one undamaged strand can be used as an input nucleic acid.
- the input nucleic acid can then be hybridized and attached to a capture probe and amplified using various primers.
- the input nucleic acid can be derived from cell-free DNA (cfDNA) or circulating tumor DNA (ctDNA).
- the cfDNA can be fetal or tumor in source.
- the input nucleic acid can be derived from liquid biopsy, solid biopsy, or fixed tissue of a subject.
- the input nucleic acid can be cDNA and can be generated by reverse transcription.
- the input nucleic acid can be derived from fluid samples, including but not limited to plasma, serum, sputum, saliva, urine, or sweat.
- the input nucleic acid can be derived from liver, esophagus, kidney, heart, lung, spleen, bladder, colon, or brain.
- the input nucleic acid can be derived from a male or female subject.
- the subject can be an infant, a teenager, a young adult or an elderly person.
- the input nucleic acid can originate from human, rat, mouse, other animal, or specific plants, bacteria, algae, viruses, and the like.
- the input nucleic acid can originate from primates, such as chimpanzees or gorillas. Other animals include a rhesus macaque.
- the input nucleic acid can be from a mixture of genomes of different species including host-pathogen, bacterial populations, etc.
- the input nucleic acid can be cDNA made from RNA expressed from genomes of one or more species.
- the input nucleic acid can comprise a target sequence.
- the target sequence can be an exon, an intron, or a promoter.
- the target sequence can be previously known, partially known previously, or previously unknown.
- the target sequence can comprise a chromosome, chromosome arm, or a gene.
- the gene can be a gene associated with a condition, e.g., cancer.
- FIG. 1 illustrates the present method with an input RNA molecule 101.
- the input RNA molecule has a primer binding sequence, which is either present when the molecule is obtained from the sample or added to it afterward.
- a series of adenosine (A) bases 103 are present at the 3' end of the input RNA molecule 101.
- a tagged cDNA molecule 105 is synthesized using the input RNA molecules 101 as a template, by extension of a tagged primer 107 that hybridizes to the series of adenosines 103.
- the tagged primer 107 is conjugated to a binding partner 109 of a first binding pair such that it serves as a first tag.
- the tagged primer 107 can also comprise one or more other functional sequences 111 such as primer binding sites or barcodes.
- a tagged cDNA molecule 105 having a sequence complementary to the input RNA molecule is produced.
- the 3’ end of the cDNA molecule 105 can be attached to an adaptor 113.
- Input RNA molecules can be prepared or treated in other ways that cooperate with the present methods.
- the input RNA molecules can be prepared by a procedure that adds a poly(A) sequence if one is not already present.
- the tagged primer 107 also comprises a sample specific index and/or molecular barcode (for example, as functional sequence 111).
- FIG. 2 illustrates how a tagged cDNA 205 can be further processed in a target enrichment procedure.
- tagged cDNA 205 has been denatured from the input RNA molecule to form a single-stranded nucleic acid, which is hybridized with one or more bridge probes 213, 215.
- Bridge probes 213, 215 are hybridized with universal probe 217 (also referred to as an anchor probe) conjugated with a binding partner 219 (for example, biotin) of a second binding pair (for example, the binding pair of biotin:streptavidin), wherein streptavidin 221 is the reciprocal binding partner.
- the hybridizations proceed at the same time; in other embodiments, the hybridizations can be sequential.
- the binding partners of the first binding pair do not bind with the binding partners of the second binding pair (biotin:streptavidin).
- target enrichment is performed by binding the binding partner 219 (biotin) to its reciprocal binding partner 221 (streptavidin), which is on a solid support 223 (e.g., a magnetic bead).
- bridge probes can be used to hybridize an input nucleic acid molecule and can further allow indirect association between an anchor probe and the input nucleic acid.
- the bridge probe can comprise a target specific region (TSR) that hybridizes to target sequence.
- the bridge probe can comprise an anchor-probe-landing sequence (ALS) that hybridizes to the bridge-binding-sequence of the anchor probe.
- the bridge probe can comprise a linker connecting TSR and ALS.
- the TSR can be located in the 3’-portion of the bridge probe.
- the TSR can be located in the 5’-portion of the bridge probe.
- the bridge probe can comprise DNA.
- the bridge probe can comprise RNA.
- the bridge probe can comprise uracil and methylated cytosine.
- the bridge probe might not comprise uracil.
- the bridge probe can comprise about 400 nucleotides, about 300 nucleotides, about 200 nucleotides, about 120 nucleotides, about 100 nucleotides, about 90 nucleotides, about 80 nucleotides, about 70 nucleotides, about 50 nucleotides, about 40 nucleotides, about 30 nucleotides, about 20 nucleotides, or about 10 nucleotides.
- the bridge probe can comprise one or more molecular barcodes.
- the bridge probe can comprise one or more binding moieties.
- the binding moiety can be a biotin.
- the binding moieties can be attached to a support.
- the support can be a bead.
- the bead can be a streptavidin bead.
- bridge probes can be used to anneal to multiple target sequences in a sample.
- the bridge probes can be designed to have similar melting temperatures.
- the melting temperatures for a set of bridge probes can be within about 15°C, within about 10°C, within about 5°C, or within about 2°C.
- the melting temperature for one or more bridge probes can be about 75°C, about 70°C, about 65°C, about 60°C, about 55°C, about 50°C, about 45°C, or about 40°C.
- the melting temperature for the bridge probe can be about 40°C to about 75°C, about 45°C to about 70°C, about 45°C to about 60°C, or about 52°C to about 58°C.
- a hybridization temperature to form the multiple bridge probe assembly can be higher than the melting temperature of a single bridge probe. The higher temperature can result in a better capture specificity by reducing nonspecific hybridization that can occur at lower temperature.
- the hybridization temperature can be about 5°C, about 10°C, about 15°C, or about 20°C higher than the melting temperature of individual bridge probe.
- the hybridization temperature can be about 5°C to about 20°C higher than the melting temperature of a bridge probe, or about 5°C to about 20°C higher than an average melting temperature of a plurality of bridge probes.
- the hybridization temperature for multiple bridge probes can be about 75°C, about 70°C, about 65°C, about 60°C, about 55°C, or about 50°C.
- the hybridization temperature for multiple bridge probes can be about 50°C to about 75°C, 55°C to about 75°C, 60°C to about 75°C, or 65°C to about 75°C.
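The relationship between bridge probe melting temperatures and the hybridization temperature described above can be worked through numerically. The following Python sketch assumes the melting temperatures are already known (measured or computed elsewhere); the function names `tm_spread`, `within_window`, and `hybridization_range`, and the example Tm values, are hypothetical and used only for illustration.

```python
from statistics import mean


def tm_spread(tms):
    """Difference between the highest and lowest melting temperature (°C)."""
    return max(tms) - min(tms)


def within_window(tms, window_c):
    """True if all melting temperatures fall within window_c °C of each other."""
    return tm_spread(tms) <= window_c


def hybridization_range(tms, low_offset=5.0, high_offset=20.0):
    """Hybridization temperatures 5 to 20 °C above the average probe Tm."""
    average_tm = mean(tms)
    return (average_tm + low_offset, average_tm + high_offset)


# Illustrative melting temperatures (°C) for a set of three bridge probes.
probe_tms = [52.0, 55.0, 58.0]

spread = tm_spread(probe_tms)                # 6.0
fits_10c = within_window(probe_tms, 10.0)    # True
fits_5c = within_window(probe_tms, 5.0)      # False
hyb_low, hyb_high = hybridization_range(probe_tms)  # (60.0, 75.0)
```

With these example values the probe set satisfies the "within about 10°C" criterion but not the "within about 5°C" one, and the resulting 60°C to 75°C window coincides with one of the hybridization temperature ranges listed above.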
- the bridge probe can further comprise a label.
- the label can be fluorescent.
- the fluorescent label can be organic fluorescent dye, metal chelate, carbon nanotube, quantum dot, gold particle, or fluorescent mineral.
- the label can be radioactive.
- the label can be biotin.
- the bridge probe can bind to labeled nucleic acid binder molecule.
- the nucleic acid binder molecule can be an antibody, antibiotic, histone, or nuclease.
- the bridge probe can comprise a linker.
- the linker comprises about 30 nucleotides, about 25 nucleotides, about 20 nucleotides, about 15 nucleotides, about 10 nucleotides, or about 5 nucleotides; any of those numbers can be combined to form a range for the number of nucleotides in a linker.
- the linker can comprise non-nucleic acid polymers (e.g., string of carbons).
- the linker non-nucleotide polymer can comprise about 30 units, about 25 units, about 20 units, about 15 units, about 10 units, or about 5 units; any of those numbers can be combined to form a range for the number of units in a linker.
- the bridge probe can be blocked at the 3’ and/or 5’ end.
- the bridge probe can lack a 5’ phosphate.
- the bridge probe can lack a 3’ OH.
- the bridge probe can comprise a 3’ddC, 3’inverted dT, 3’C3 spacer, 3’ amino, or 3’ phosphorylation.
- the anchor probe or universal anchor probe can comprise one or more bridge-binding-sequences (BBS) that hybridize to the anchor-probe-landing sequence of the one or more bridge probes.
- the anchor probe can comprise spacers in between the BBSs. The presence of the one or more spacers can improve the efficiency of the hybridization capture and increase the specificity of the capture.
- the anchor probe can comprise a molecular barcode (MB).
- the anchor probe can comprise BBSs to which the one or more bridge probes can hybridize.
- the anchor probe can comprise from 1 to 100 BBSs.
- the anchor probe can comprise an index for distinguishing samples.
- the molecular barcode or index can be 5’ of the adaptor sequence and 5’ of the BBS.
- the anchor probe can comprise about 400 nucleotides, about 200 nucleotides, about 120 nucleotides, about 100 nucleotides, about 90 nucleotides, about 80 nucleotides, about 70 nucleotides, about 50 nucleotides, about 40 nucleotides, about 30 nucleotides, about 20 nucleotides, or about 10 nucleotides.
- the anchor probe can be about 20 to about 70 nucleotides.
- the melting temperature of anchor probe to the bridge probe can be about 65°C, about 60°C, about 55°C, about 50°C, about 45°C, or about 45°C to about 70°C.
- the anchor probe can comprise a label.
- the label can be fluorescent.
- the fluorescent label can be an organic fluorescent dye, metal chelate, carbon nanotube, quantum dot, gold particle, or fluorescent mineral.
- the label can be radioactive.
- the label can be biotin.
- the anchor probe can bind to labeled nucleic acid binder molecule.
- the nucleic acid binder molecule can be an antibody, antibiotic, histone, or nuclease.
- the label is selected from the binding partners of one of the second binding pairs discussed above.
- FIG. 2 demonstrates how the present methods can be used for target enrichment on a sample containing the input RNA molecule tagged with a first tag.
- Preparation of nucleic acids for sequencing-by-synthesis or other analysis often employs target enrichment, and one or more target enrichment procedures can be included in the present methods. By enriching for one or more desired targets, sequencing or other analysis can be more focused with reduced effort and expense and/or with high coverage depth.
- target enrichment procedures include hybridization-based capture protocols such as SureSelect Hybrid Capture from Agilent and TruSeq Capture from Illumina.
- Other examples include PCR-based protocols such as HaloPlex from Agilent; AmpliSeq from ThermoFisher; TruSeq Amplicon from Illumina; and emulsion/digital PCR from Raindance.
- the present methods also comprise capture of input DNA molecules comprising target sequences.
- the present methods allow efficient capture and enrichment of both cDNA molecules having the sequence of input RNA molecules, and input DNA molecules as well.
- Target enrichment can be performed after synthesis of tagged cDNA molecules, which preserves the ability to distinguish between sequences from input DNA and input RNA molecules.
- Target enrichment can be performed after attaching an adaptor to an input nucleic acid molecule.
- the present methods can be used to handle low input samples.
- the present methods comprise target enrichment by indirect hybridization of the tagged cDNA molecule with an anchor probe through hybridization of one or more bridge probes to the tagged cDNA molecule.
- the one or more bridge probes can be designed to hybridize to particular target sequences in the tagged cDNA molecule.
- An anchor probe in turn can be designed to hybridize to the one or more bridge probes, thereby creating an assembly of three or more hybridized nucleic acid molecules.
- the multi-structure hybridization assembly can act synergistically to provide more stability to the assembly. Enrichment of an input nucleic acid containing a target sequence can be facilitated by interaction of the input nucleic acid and two or more probes that form a hybridization assembly. The multi-complex assembly can stabilize the hybridization interaction between the input and the enrichment probes, such as bridge probes.
- a bridge probe can comprise a target specific region that hybridizes to a target region of the input nucleic acid and anchor-probe-landing sequence (ALS) that hybridizes to bridge-binding-sequence (BBS) of an anchor probe.
- the present methods comprise hybridizing a first target specific region of a first bridge probe to a first target sequence of a molecule with a sequence corresponding to the genome region, wherein a first anchor-probe-landing sequence of the first bridge probe is bound to a first bridge-binding-sequence of an anchor probe; and hybridizing a second target specific region of a second bridge probe to a second target sequence of the molecule with a sequence corresponding to the genome region, wherein a second anchor-probe-landing sequence of the second bridge probe is bound to a second bridge-binding-sequence of the anchor probe.
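The relationship between a target sequence, a bridge probe, and an anchor probe can be sketched in a few lines. The example below is only an illustration: it assumes that the target specific region sits at the 5' end of the bridge probe and the anchor-probe-landing sequence (ALS) at the 3' end, that hybridization means perfect reverse complementarity, and that all sequences are written 5' to 3'. The function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_bridge_probe(target_region: str, bbs: str, linker: str = "") -> str:
    """Build a bridge probe (5' to 3') as TSR + optional linker + ALS.

    The TSR is the reverse complement of the target region, and the ALS is
    the reverse complement of the anchor probe's bridge-binding-sequence.
    """
    tsr = reverse_complement(target_region)
    als = reverse_complement(bbs)
    return tsr + linker + als


if __name__ == "__main__":
    # Hypothetical sequences, chosen only to keep the example short.
    probe = design_bridge_probe("ATGGCC", "AACCGG", linker="TTT")
    print(probe)
```

With two or more such probes sharing one anchor probe, each probe would use a different target region and a different BBS on the same anchor.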
- the anchor probe may comprise a binding moiety.
- the method generally comprises attaching adaptors to the 5’ end or the 3’ ends of nucleic acid molecules of the plurality of nucleic acid molecules, thereby generating a library of nucleic acid molecules comprising adaptors.
- More than two bridge probes per input nucleic acid molecule can be used in the methods disclosed herein. For example, at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 75, 100, or more bridge probes can be used to bridge the input nucleic acid and the anchor probe.
- the target enrichment can further comprise hybridizing a second target specific region of a second bridge probe to a second target sequence of the input nucleic acid molecule, wherein a second anchor-probe-landing sequence of the second bridge probe can be bound to a second bridge-binding-sequence of the anchor probe.
- the target enrichment can be conducted after attachment of adaptors or other first tags to the input nucleic acid molecules.
- the bridge probes can further comprise linkers that connect the target specific region and the anchor-probe-landing sequence.
- the adaptor anchor can comprise one or more spacers in between the bridge-binding-sequences. The presence of the one or more spacers can improve the efficiency of the hybridization capture and increase the specificity of the capture.
- the input nucleic acid can be captured and enriched from low-input samples that contain cell-free RNA as well as cell-free DNA (cfDNA) and circulating tumor DNA (ctDNA).
- the capture and enrichment can be done by the indirect association with anchor probe through hybridization with bridge probe.
- the bridge probe and/or anchor probe can comprise one or more binding moieties.
- the binding moiety can be a biotin.
- the binding moieties can be attached to a support.
- the support can be a bead.
- the bead can be a streptavidin bead.
- the present methods of capture and enrichment can further include solid phase extraction of the input nucleic acid.
- the bridge probe or anchor probe can be bound to a solid support.
- the bridge probe, or anchor probe can comprise a label.
- the disclosed methods can further comprise capturing the bridge probe, the anchor probe, or the hybridization complex comprising the input nucleic acid molecule, bridge probe, and anchor probe by the label.
- the label can be biotin.
- the label can be a nucleic acid sequence, such as poly A or poly T, or a specific sequence.
- the nucleic acid sequence can be about 5 to 30 bases in length.
- the nucleic acid sequence can comprise DNA and/or RNA.
- the label can be at the 3’ end of the bridge probe, or anchor probe.
- the label can be a peptide, or a modified nucleic acid that can be recognized by an antibody, such as 5-Bromouridine, and biotin.
- the label can be conjugated to the bridge probe, or anchor probe by reactions such as “click” chemistry.
- Click chemistry can allow for the conjugation of a reporter molecule like a fluorescent dye to a biomolecule like DNA. Click chemistry can be a reaction between an azide and an alkyne that can yield a covalent product (e.g., 1,5-disubstituted 1,2,3-triazole). Copper can serve as a catalyst.
- the label can be captured on a solid support.
- the solid support can be magnetic.
- the solid support can comprise a bead, flowcell, glass, plate, device comprising one or more microfluidic channels, or a column.
- the solid support can be a magnetic bead.
- the solid support (e.g., bead) can comprise (e.g., be coated with) one or more capture moieties that can bind the label.
- the capture moiety can be streptavidin, and the streptavidin can bind biotin.
- the capture moiety can be an antibody.
- the antibody can bind the label.
- the capture moiety can be a nucleic acid, e.g., a nucleic acid comprising DNA and/or RNA.
- the nucleic acid capture moiety can bind a sequence on, e.g., an anchor probe or bridge probe.
- an anti-RNA/DNA hybrid antibody bound to a solid surface can be used as a capture moiety.
- the label and the capture moiety can bind through one or more covalent or non-covalent bonds.
- the solid support can be washed to remove, e.g., unbound template from the sample. In some cases, no wash step is performed.
- the wash can be stringent or gentle.
- the capture probe or anchor probe that is hybridized to an input nucleic acid molecule can be eluted, e.g., by adding free biotin to the sample when the label is biotin and the capture moiety is streptavidin.
- Cleanup can be performed using streptavidin beads after the input nucleic acid, bridge probe, and anchor probe hybridization, wherein the 3’ end of the anchor probe is biotinylated.
- the input nucleic acid complex hybridized to the bridge probes (and indirectly with the anchor probe) is bound to the bead.
- the input nucleic acid that has not hybridized to the bridge probe can be washed away.
- the 5’ end or the 3’ end of a first and/or second bridge probe can be biotinylated.
- streptavidin beads can be used to remove and separate the unhybridized input nucleic acid from input nucleic acid having the target sequence.
- a first tag is attached to an input nucleic acid molecule without an adaptor.
- a nucleic acid can be directly tagged using digoxigenin 3’ end oligonucleotide labeling kits, which might improve tagging efficiency over traditional adaptor ligation.
- Such labeling kits are commercially available.
- such an approach potentially limits inclusion of identifiers (sample index / UMI) to support pairing of sequences obtained from original DNA strands.
- the dual tagged cDNA complexes comprising a second tag are immobilized on a solid support such as a plate or a bead. Immobilization of these complexes can facilitate washing to remove any undesired species (e.g., input DNA molecules or contaminants).
- the dual tagged cDNA complex is immobilized on the surface of a flowcell or a glass slide.
- the dual tagged cDNA complex is immobilized on a well or magnetic bead.
- the solid support may be coated with a polymer attached to a functional group or moiety.
- the solid support may carry functional groups such as amino, hydroxyl, or carboxyl groups, or other moieties such as avidin or streptavidin.
- the present methods comprise attaching adaptors to the input DNA molecules or to amplicons thereof.
- an adaptor is attached to at least one strand of a double-stranded DNA molecule, and usually an adaptor can be a molecule that is at least partially double-stranded.
- An adaptor may be 40 to 150 bases in length, e.g., 50 to 120 bases.
- An adaptor can be joined to a 5' end and/or a 3' end of a nucleic acid molecule.
- a Y-adaptor is an adaptor that contains a double-stranded region and a single-stranded region in which the opposing sequences are not complementary.
- the end of the double-stranded region can be joined to target molecules such as double-stranded fragments of genomic DNA, e.g., via a transposase-catalyzed reaction.
- Each strand of a double-stranded DNA molecule that has been joined to a Y-adaptor is asymmetrically tagged in that it has the sequence of one strand of the Y-adaptor at one end and the other strand of the Y-adaptor at the other end.
- Amplification of nucleic acid molecules that have been joined to Y-adaptors at both ends results in an asymmetrically tagged nucleic acid, i.e., a nucleic acid that has a 5' end containing one tag sequence and a 3' end that has another tag sequence.
- the input DNA molecules can be denatured to form single-stranded molecules which are then amplified using amplification primers to form a double-stranded product, and/or processed by other techniques.
- the processing comprises amplifying a nucleic acid, before and/or after it is attached to an adaptor.
- an adaptor is located at a 5'-end of a target sequence in a cDNA or an input nucleic acid, and the adaptor provides a priming site for amplification of the target sequence.
- Nucleic acid can be amplified using a first amplification primer and a second amplification primer.
- the first amplification primer has sequence specificity for a target sequence in the nucleic acid, and is capable of hybridizing to a portion of the target sequence (a nucleic acid of interest).
- the second amplification primer is capable of hybridizing to a priming site of the adaptor or to a target-specific priming site of the input nucleic acid.
- the first amplification primer hybridizes to the target sequence and the second primer hybridizes to the sequence priming site on the adaptor.
- the first amplification primer hybridizes at the 5'-end of the nucleic acid.
- the primers should be sufficiently large to provide adequate hybridization with the target sequence or other primer binding site.
- An input DNA molecule or a cDNA molecule may be amplified using any suitable method.
- the nucleic acid is amplified using polymerase chain reaction (PCR).
- PCR comprises denaturation of polynucleotide strands (e.g., DNA melting), annealing of primers to the denatured polynucleotide strand, and extension of primers with a polymerase to synthesize the complementary polynucleotide.
- the process generally requires a DNA polymerase, forward and reverse primers, deoxynucleoside triphosphates, bivalent cations, and a buffer solution.
- the nucleic acid is amplified by linear amplification. In some embodiments, the nucleic acid is amplified using Emulsion PCR, Bridge-PCR, or Rolling Circle amplification. The amplicons of the nucleic acid may be analyzed to determine the order of base pairs using a suitable sequencing method.
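The difference between exponential amplification by PCR and linear amplification can be made concrete with an idealized yield calculation. The sketch below assumes a fixed per-cycle efficiency for PCR (1.0 means perfect doubling) and one new copy per template per cycle for linear amplification; it ignores plateau effects and reagent depletion. The function names are hypothetical.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency)."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def linear_copies(initial_copies: float, cycles: int) -> float:
    """Idealized linear amplification: only the original templates are copied each cycle."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1 + cycles)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, pcr_copies(100, n), linear_copies(100, n))
```

After 10 ideal cycles starting from 100 templates, the exponential model gives 102,400 copies while the linear model gives 1,100, which is why cycle number and efficiency matter for low-input samples.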
- the present disclosure provides methods for differential processing of input nucleic acid molecules in a sample.
- a sample containing different types of nucleic acids could be processed in a way that certain nucleic acids (for example, cDNA synthesized using input RNA molecules as templates) are selectively tagged with a first tag as described here.
- tagged cDNA molecules with the first tag can be isolated, given a separate index, and/or subjected to different processing steps.
- the separate input DNA molecules and tagged cDNA molecules can then be pooled for combined analysis (such as sequencing) or analyzed separately.
- FIG. 3 illustrates how cDNA made from input RNA molecules in a mixture can be selectively tagged using the present methods.
- a nucleic acid sample contains a mixture of input RNA molecules 301 and input DNA molecules 302.
- a series of adenosine (A) bases 303 are present at the 3' end of the RNA molecules 301 but not to the DNA molecules 302.
- cDNA molecules 305 are synthesized using the input RNA molecules 301 as templates by extension of a tagged primer 307 that hybridizes to the A-tail.
- the tagged primer 307 is conjugated to a binding partner 309 of a first binding pair, so that it serves as a first tag.
- double-stranded adaptors 325 are ligated to the DNA molecules 302.
- a single-stranded adaptor 313 is ligated to the cDNA 305.
- the adaptors 325 are shown without tags or binding partners attached thereto, but in some embodiments, the adaptors comprise one or more tags or binding partners of binding pairs that are different from the first tag or the binding partner 309 of a first binding pair.
- tagged cDNA molecules 305 can be tagged with a second tag 327 by attaching the second tag 327 to the first tag, more specifically to the binding partner 309 of the first binding pair, to form a dual tagged cDNA complex 330.
- the second tag 327 comprises a reciprocal binding partner 329 of the first binding pair.
- the second tag 327 also comprises a binding partner 331 of a second binding pair.
- the binding partner 309 of the first binding pair is DIG.
- the reciprocal binding partner 329 of the first binding pair is an anti-DIG antibody.
- the binding partner 331 of the second binding pair is a biotin moiety.
- the dual tagged cDNA complex 330 can be separated from the DNA constructs 333 by binding the binding partner 331 (biotin) to its reciprocal binding partner 335 (streptavidin) which is on a solid support 337 (e.g., a magnetic bead).
- a cDNA fraction 339 can be treated with a different procedure than the DNA constructs 333 in a DNA fraction 341.
- the DNA fraction 341 and the RNA fraction 339 can be pooled in specific ratios for sequencing or analysis, or analyzed separately.
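The logic of the FIG. 3 workflow can be summarized as a toy model. The sketch below is not a simulation of the chemistry; it only encodes the decision rule described above: RNA molecules with an A-tail are primed by the tagged primer and yield cDNA carrying the first tag (DIG), the second tag (anti-DIG antibody conjugated to biotin) binds the first tag, and the biotin is captured on a streptavidin support, while DNA molecules stay in a separate fraction. The class, field, and function names are hypothetical, and the "unprimed" bucket for RNA without an A-tail is an assumption added for completeness.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Molecule:
    name: str
    kind: str                      # "RNA", "DNA", or "cDNA"
    has_a_tail: bool = False       # series of A bases at the 3' end
    tags: List[str] = field(default_factory=list)


def split_fractions(sample: List[Molecule]) -> Dict[str, List[Molecule]]:
    """Partition a mixed sample into cDNA, DNA, and unprimed fractions."""
    fractions: Dict[str, List[Molecule]] = {"cDNA": [], "DNA": [], "unprimed": []}
    for mol in sample:
        if mol.kind == "RNA" and mol.has_a_tail:
            # The tagged primer anneals to the A-tail; the cDNA carries the first tag.
            cdna = Molecule(f"cDNA_of_{mol.name}", "cDNA", tags=["DIG"])
            # The second tag binds the first tag and brings a biotin moiety.
            cdna.tags.append("anti-DIG-biotin")
            # Biotin is captured by streptavidin on the solid support.
            fractions["cDNA"].append(cdna)
        elif mol.kind == "DNA":
            fractions["DNA"].append(mol)
        else:
            fractions["unprimed"].append(mol)
    return fractions


if __name__ == "__main__":
    sample = [Molecule("r1", "RNA", True), Molecule("d1", "DNA"), Molecule("r2", "RNA")]
    for fraction, members in split_fractions(sample).items():
        print(fraction, [m.name for m in members])
```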
- the present method also comprises processing cDNA molecules by attaching one or more adaptors to a tagged cDNA molecule, to a dual tagged cDNA complex, or to an amplicon thereof.
- An adaptor can be attached before or after the first and/or second tag is removed from the cDNA molecule.
- the adaptor can be attached before or after amplification of the cDNA, and in some embodiments the adaptor is attached before amplification.
- the adaptor can be attached by any suitable technique, such as by ligation, use of a transposase, hybridization, and/or primer extension.
- the cDNA molecule or amplicon thereof is ligated with an adaptor at one or both ends.
- a covalent bond or linkage is formed between the termini of two or more nucleic acid molecules (such as an input and an adaptor).
- the nature of the bond or linkage may vary, and the ligation may be carried out enzymatically or chemically.
- Ligations are usually carried out enzymatically to form a phosphodiester linkage between a 5' carbon of a terminal nucleotide of one polynucleotide or oligonucleotide and a 3' carbon of another polynucleotide or oligonucleotide.
- the adaptor is a Y-adaptor. Other examples of adaptors include linear adaptors, circular adaptors, and bubble adaptors.
Sequencing The Input Nucleic Acid Molecules
- the input nucleic acid molecules can be sequenced using a high-throughput sequencing method such as a Next Generation Sequencing (NGS) method.
- a high-throughput sequencing method comprises three steps: library preparation, immobilization, and sequencing.
- DNA is often subjected to fragmentation, and adaptors are attached to one or both ends of the fragments or other nucleic acids to form a sequencing library.
- the sequencing library molecules are immobilized on a solid support, and sequencing reactions are performed to identify the nucleic acid sequence.
- the high-throughput sequencing method may employ Emulsion PCR, Bridge-PCR, or Rolling Circle amplification to provide colonies or copies of the input nucleic acid molecules.
- the cDNA molecules synthesized from input RNA molecules are sequenced without amplification.
- a dual tagged cDNA complex can be attached or immobilized to a solid substrate which has a reciprocal binding partner for the second tag of the complex.
- when the dual tagged cDNA complex comprises a biotin molecule in its second tag, it can be immobilized on a streptavidin-coated surface of a flowcell or bead.
- the cDNA can be sequenced by any suitable technique such as sequencing-by-synthesis or sequencing-by-hybridization.
- the immobilized cDNA molecule is sequenced using a single-molecule sequencing platform, such as the methods discussed in Wöhrstein et al. US Patent 10,851,411.
- the present methods comprise aligning sequence reads of the input nucleic acids.
- the sequence reads may be processed and grouped in any suitable way.
- the sequence reads may be initially grouped by the fragment sequence and/or the identifier(s).
- initial processing of the sequence reads may include identification of molecular barcodes (including sample identifier sequences or subsample identifier sequences), and/or trimming reads to remove low quality or adaptor sequences.
- quality assessment metrics can be run to ensure that the dataset is of an acceptable quality.
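The initial read processing described above (identifying a molecular barcode and trimming adaptor sequence) can be sketched as follows. The example assumes that the barcode occupies a fixed number of bases at the 5' end of each read and that adaptor read-through appears at the 3' end; both layouts, the function names, and the adaptor sequence used in the demonstration are hypothetical placeholders, and only exact matches are handled.

```python
from typing import Tuple


def extract_umi(read: str, umi_len: int) -> Tuple[str, str]:
    """Split a read into (molecular barcode, remaining insert sequence)."""
    if umi_len < 0 or len(read) < umi_len:
        raise ValueError("read is shorter than the barcode length")
    return read[:umi_len], read[umi_len:]


def trim_adaptor(read: str, adaptor: str, min_overlap: int = 3) -> str:
    """Remove a 3' adaptor, including a partial adaptor at the very end of the read."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    idx = read.find(adaptor)
    if idx != -1:
        return read[:idx]  # full adaptor found: keep only the bases before it
    # Otherwise look for the longest adaptor prefix that ends the read.
    for k in range(min(len(adaptor), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    adaptor = "AGATCGGAAGAGC"  # placeholder adaptor sequence for the demo
    umi, rest = extract_umi("ACGTTTGGCCAGATCGGAAGAGC", umi_len=4)
    print(umi, trim_adaptor(rest, adaptor))
```

Production pipelines typically also tolerate mismatches and trim low-quality bases; those steps are omitted here to keep the sketch short.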
- the cDNA molecules synthesized from input RNA molecules (or amplicons thereof) can be further analyzed using various methods including Southern blotting, polymerase chain reaction (PCR) (e.g., real-time PCR (RT-PCR), digital PCR (dPCR), droplet digital PCR (ddPCR), quantitative PCR (Q-PCR)), nCounter analysis (Nanostring technology), gel electrophoresis, DNA microarray, mass spectrometry (e.g., tandem mass spectrometry, matrix-assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF MS)), chain termination sequencing (Sanger sequencing), or next generation sequencing.
- the next generation sequencing can comprise 454 sequencing (ROCHE) (using pyrosequencing), sequencing using reversible terminator dyes (ILLUMINA sequencing), semiconductor sequencing (THERMOFISHER ION TORRENT), single molecule real time (SMRT) sequencing (PACIFIC BIOSCIENCES), nanopore sequencing (e.g., using technology from OXFORD NANOPORE or GENIA), microdroplet single molecule sequencing using pyrophosphorolysis (BASE4), single molecule electronic detection sequencing, e.g., measuring tunnel current through nanoelectrodes as nucleic acid (DNA/RNA) passes through nanogaps and calculating the current difference (QUANTUM SEQUENCING from QUANTUM BIOSYSTEMS), GenapSys Gene Electronic Nano-Integrated Ultra-Sensitive (GENIUS) technology (GENAPSYS), GENEREADER from QIAGEN, or sequencing using sequential hybridization and ligation of partially random oligonucleotides with a central determined base (ROCHE).
- the performance of a panel or method for capturing targets or preparing an NGS library may be defined by a number of different metrics describing efficiency, accuracy, and precision. Such metrics can be obtained by sequencing the captured nucleic acid molecules or amplicons thereof. For example, coverage percentage region-wide (0.2X or 0.5X), coverage percentage base-wide, target coverage, depth of coverage, fold enrichment, percent mapped, percent on-target, AT or GC dropout rate, fold 80 base penalty, percent zero coverage targets, PF reads, percent selected bases, percent duplication, or other variables can be used to characterize a library.
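A few of the listed metrics can be computed from simple read counts. The sketch below uses one common read-based set of definitions, stated here as assumptions: percent mapped is relative to total reads, percent on-target and percent duplication are relative to mapped reads, and fold enrichment is the on-target fraction divided by the fraction of the genome covered by the panel. Analysis tools differ in their exact definitions (for example, base-based rather than read-based), and the function name and input values are hypothetical.

```python
from typing import Dict


def library_metrics(
    total_reads: int,
    mapped_reads: int,
    on_target_reads: int,
    duplicate_reads: int,
    target_bp: int,
    genome_bp: int,
) -> Dict[str, float]:
    """Compute simple read-based capture metrics from counts."""
    if min(total_reads, mapped_reads, target_bp, genome_bp) <= 0:
        raise ValueError("read totals and region sizes must be positive")
    on_target_fraction = on_target_reads / mapped_reads
    return {
        "percent_mapped": 100.0 * mapped_reads / total_reads,
        "percent_on_target": 100.0 * on_target_fraction,
        "percent_duplication": 100.0 * duplicate_reads / mapped_reads,
        # Enrichment relative to the share of the genome occupied by the targets.
        "fold_enrichment": on_target_fraction / (target_bp / genome_bp),
    }


if __name__ == "__main__":
    metrics = library_metrics(
        total_reads=1_000,
        mapped_reads=900,
        on_target_reads=450,
        duplicate_reads=90,
        target_bp=1_000_000,
        genome_bp=3_000_000_000,
    )
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```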
- the number of target sequences from a sample that can be sequenced using methods described herein can be about 5, 10, 15, 25, 50, 100, 1000, 10,000, 100,000, or 1,000,000, or about 5 to about 100, about 100 to about 1000, about 1000 to about 10,000, about 10,000 to about 100,000, or about 100,000 to about 1,000,000.
- Nucleic acid libraries generated using methods described herein can be generated from more than one sample. Each library can have a different index associated with the sample.
- a capture probe or an anchor probe can comprise an index that can be used to identify nucleic acids as coming from the same sample (e.g., a first set of capture probes or anchor probes comprising the same first index can be used to generate a first library from a first sample from a first subject, and a second set of capture probes or anchor probes comprising the same second index can be used to generate a second library from a second sample from a second subject, the first and second library can be pooled, sequenced, and an index can be used to discern from which sample a sequenced nucleic acid was derived).
- Amplified products generated using the methods described herein can be used to generate libraries from at least 2, 5, 10, 25, 50, 100, 1000, or 10,000 samples, each library with a different index, and the libraries can be pooled and sequenced, e.g., using a next generation sequencing technology.
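Assigning pooled reads back to their samples by index can be sketched as a lookup. The example assumes, purely for illustration, that the index is an exact-match sequence of fixed length at the start of each read; on many platforms the index is read separately, and mismatch-tolerant matching is common. The function name, index sequences, and sample labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(
    reads: Iterable[str], index_to_sample: Dict[str, str], index_len: int
) -> Dict[str, List[str]]:
    """Group reads by sample using a fixed-length index at the start of each read."""
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        # Reads whose index is not recognized are kept in a separate bin.
        sample = index_to_sample.get(index, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


if __name__ == "__main__":
    pooled = ["AAAAGGCT", "CCCCTTAG", "AAAATTTT", "GGGGACGT"]
    indices = {"AAAA": "sample_1", "CCCC": "sample_2"}
    print(demultiplex(pooled, indices, index_len=4))
```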
- the sequencing can generate at least 100, 1000, 5000, 10,000, 100,000, 1,000,000, or 10,000,000 sequence reads.
- the sequencing can generate between about 100 sequence reads to about 1000 sequence reads, between about 1000 sequence reads to about 10,000 sequence reads, between about 10,000 sequence reads to about 100,000 sequence reads, between about 100,000 sequence reads and about 1,000,000 sequence reads, or between about 1,000,000 sequence reads and about 10,000,000 sequence reads.
- the depth of sequencing can be about 1x, 5x, 10x, 50x, 100x, 1000x, or 10,000x.
- the depth of sequencing can be between about 1x and about 10x, between about 10x and about 100x, between about 100x and about 1000x, or between about 1000x and about 10,000x.
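Read counts and sequencing depth are linked by a simple relation: mean depth is approximately the number of on-target bases sequenced divided by the size of the targeted region. The sketch below uses that approximation, which assumes uniform coverage and a single fixed read length; the function names and example numbers are hypothetical.

```python
import math


def mean_depth(
    num_reads: int, read_length: int, target_bp: int, on_target_fraction: float = 1.0
) -> float:
    """Approximate mean depth over the targeted region."""
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    return num_reads * read_length * on_target_fraction / target_bp


def reads_needed(
    depth: float, read_length: int, target_bp: int, on_target_fraction: float = 1.0
) -> int:
    """Approximate number of reads required to reach a desired mean depth."""
    if read_length <= 0 or on_target_fraction <= 0:
        raise ValueError("read_length and on_target_fraction must be positive")
    return math.ceil(depth * target_bp / (read_length * on_target_fraction))


if __name__ == "__main__":
    # A hypothetical 1 Mb panel sequenced with 100-base reads.
    print(mean_depth(1_000_000, 100, 1_000_000))
    print(reads_needed(1000, 100, 1_000_000, on_target_fraction=0.5))
```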
- FIG. 3 illustrates a method of obtaining a DNA fraction 347 and an RNA fraction 346 from the same sample.
- the enriched input nucleic acids may be analyzed by sequencing, or may be bisulfite treated (or enzymatically treated) prior to sequencing to assess methylation.
- a first fraction may be analyzed by sequencing to assess mutations while a second fraction is bisulfite or enzymatically treated prior to sequencing to assess methylation.
- a first fraction and a second fraction are both analyzed by straightforward sequencing to assess genomic alteration; however, the samples may be sequenced at different depths.
- an analysis of a first fraction may be performed prior to performing a second target enrichment step. The results of the analysis of the first fraction sample may be used to select a second panel for the second enrichment step.
- the present methods can be compatible with clinical samples over a large range of amounts of input nucleic acid material.
- the present methods can be used to sequence samples with input nucleic acid molecules of less than 5 ng, less than 4 ng, less than 3 ng, less than 2 ng, or less than 1 ng.
- the target specific sequence or target specific region (TSR) of a capture probe or a bridge probe can be designed based on the target sequence of the input nucleic acid molecule.
- kits are provided which comprise first and second tagging reagents for making tagged cDNA molecules and dual tagged cDNA complexes as described herein.
- a first tagging reagent comprises a tagged primer comprising a first tag (according to any of the embodiments described herein) in a composition that comprises a solvent or other components.
- a second tagging reagent comprises a second tag (according to any of the embodiments described herein) in a composition.
- the kits can comprise the first and second tagging reagents in one or more vessels, such as vials, tubes, etc.
- the present kits comprise one or more tagged primers comprising functional sequences configured to be attached to an end of the cDNA molecules synthesized from input RNA molecules, such as adaptors and/or one or more identifiers such as UMI sequences.
- the first tags comprise a binding partner of a first binding pair, and the present kits can also comprise one or more second tags comprising a reciprocal binding partner of the first binding pair.
- the present kits further comprise one or more bridge probes that comprise a target specific region which hybridizes to a target sequence of an input nucleic acid molecule; and an anchor probe that comprises a bridge-binding-sequence which hybridizes to an anchor-probe-landing sequence of the bridge probe.
- the kit comprises two, three or more bridge probes.
- kits may further include instructions for using the components of the kit to practice the present methods, i.e., to prepare nucleic acids for sequencing.
- the instructions for practicing the present methods are generally recorded on a suitable recording medium.
- the instructions may be printed on a substrate, such as paper or plastic, etc.
- the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or subpackaging) etc.
- the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g., CD-ROM, portable drive, or cloud-based storage, etc.
- the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g., via the internet, are provided.
- An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.
- nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
- the present technology may employ, unless otherwise indicated, techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and immunology, which are within the skill of the art.
- techniques include polymer array synthesis, hybridization, ligation, and detection of hybridization using a label.
- a primer refers to one or more primers, i.e., a single primer and multiple primers.
- a “plurality” contains at least 2 members. In certain cases, a plurality may have at least 10, at least 100, at least 1000, at least 10,000, at least 100,000, at least 10^6, at least 10^7, at least 10^8, or at least 10^9 or more members.
- target refers to a nucleic acid of interest, or which is desired for sequencing and/or other analysis.
- One or more targets may be present within an input nucleic acid, or a construct or complex made from an input nucleic acid.
- a target may be single-stranded or double-stranded, and often is double-stranded DNA when attached to an adaptor to form a nucleic acid construct.
- Target as used herein can refer to a specific sequence or the complement thereof or to both.
- the term target encompasses any nucleic acid molecule of biological or synthetic origin whose sequence or other characteristic is of interest.
- the target sequence does not include identifiers, primer binding regions, or adaptor sequences which may be added to the input nucleic acid molecule to prepare an input nucleic acid construct for sequencing or other analysis.
- a target may be within a nucleic acid in vitro or in vivo within the genome of a cell, or within the cytoplasm of a cell (such as RNA), or within a biological fluid (such as blood, plasma, amniotic fluid, or other biological sample).
- the term “input” refers to a nucleic acid molecule to be processed in accordance with the present methods. For example, an input nucleic acid molecule may be present in a nucleic acid sample.
- the input may include one or more target sequences of interest, or it may include other sequences from which a target is desired to be separated.
- an input nucleic acid comprises one or more sequences complementary to sequences of one or more capture probes, bridge probes, or other types of probes.
- “amplifying” and “amplification” as used herein refer to synthesizing nucleic acid molecules that are complementary to one or both strands of an input nucleic acid.
- Amplifying a nucleic acid molecule may include denaturing a double-stranded input nucleic acid, annealing primers to the input nucleic acid at a temperature that is below the melting temperatures of the primers, and enzymatically elongating from the primers to generate an amplification product.
- “amplicon” or “amplification product” refer to the nucleic acid sequences which are produced from an amplifying process, including the nucleic acid molecules synthesized by amplifying the input nucleic acid or its complementary sequence, as well as the nucleic acid molecules synthesized from other amplicons.
- the denaturing, annealing and elongating steps each can be performed one or more times. Amplification generally does not change the target or input nucleic acid sequence unless errors arise during the amplification.
- Amplification typically requires the presence of deoxyribonucleoside triphosphates, a DNA polymerase enzyme and an appropriate buffer and/or co-factors for optimal activity of the polymerase enzyme.
- Reverse transcription is a linear amplification reaction that employs a specialized DNA polymerase (reverse transcriptase) to copy RNA into cDNA (complementary DNA) using deoxyribonucleoside triphosphates.
- the term “adaptor” generally refers to a nucleic acid molecule that is attached to an input nucleic acid molecule to add a desired structure or function.
- a tag also generally refers to a moiety that can add a desired structure or function, though it is contemplated that a tag may be a nucleic acid molecule, a molecule other than a nucleic acid, or a combination thereof.
- a “tag” as used herein can comprise an adaptor conjugated to a non-nucleic acid binding partner such as DIG.
- a “tag” as used herein can comprise an antibody conjugated to a biotin moiety.
- an adaptor can be attached to an input fragment or an amplicon thereof to add a binding site for a NGS platform.
- an adaptor refers to molecules that are at least partially double-stranded.
- An adaptor or a tag may be any desired length, including but not limited to 40 to 150 bases in length, e.g., 50 to 120 bases, although adaptors and tags outside of this range are envisioned.
- identifier refers to a sequence of nucleotides used to identify the origin of a sequence.
- Identifiers may comprise sample indices or sample barcodes, where the same sequence is shared for all nucleic acids from a particular source, organism, or sample.
- Sample barcodes enable the mixing of nucleic acids from different samples in one sequencing run, as the different sample barcode sequences enable the correct assignment of sequencing reads to each sample.
- One, two, or more sample barcodes may be used.
- Identifiers also comprise molecular barcodes (MBCs) or unique molecular identifier (UMI) sequences, which function to identify copies of individual input nucleic acid molecules.
- UMIs may comprise random nucleotides, known nucleotides, or a mixture of random and known nucleotides. UMIs enable more accurate sequencing by allowing error correction of sequences and more accurate estimation of the original number of input nucleic acids. In some embodiments, a large number of UMIs is used (e.g., 100,000, 1 million, 1 billion, or more possible sequences) such that each input nucleic acid has a unique molecular barcode. Molecular barcodes called degenerate base regions (DBR) are disclosed in US Patent 8,481,292 (Population Genetics Technologies Ltd.). The DBRs are random sequence tags that are attached to molecules that are present in the sample. DBRs and other molecular barcodes allow one to distinguish PCR errors during sample preparation from mutations and other variants that were present in the original input nucleic acid.
- a smaller number of molecular barcodes is used, and the beginning or ending positions (or both) of the sequence read are used together with the molecular barcode to identify copies arising from a unique input nucleic acid.
- Molecular barcodes may be combined with sample barcodes, on the same or different portions of the target nucleic acid. Molecular barcodes may be added to one end of a nucleic acid template (e.g., the 5’ end of the + strand, and the 3’ end of the - strand in a duplex), or to both ends of an input nucleic acid (e.g., to both the 5’ and the 3’ ends of both the + and the - strands of the duplex).
- sample as used herein relates to a material or mixture of materials containing one or more nucleic acids of interest.
- the term refers to any plant, animal or viral material containing DNA, RNA, or other nucleic acid, such as, for example, tissue or fluid isolated from a patient (including without limitation plasma, serum, amniotic fluid, cerebrospinal fluid, lymph, tears, saliva and tissue sections), from preserved tissue (such as FFPE sections) or from in vitro cell culture constituents, as well as samples from the environment.
- Any sample containing nucleic acid, e.g., genomic DNA from tissue culture cells or from a sample of tissue, may be employed in the present technology.
- nucleic acid sample denotes a sample containing nucleic acids.
- the nucleic acid samples may be complex in that they contain multiple different molecules that contain sequences.
- Nucleic acid samples from a mammal, e.g., mouse or human
- Complex samples may have more than 10^4, 10^5, 10^6 or 10^7 different nucleic acid molecules.
- a complex sample may comprise only a few molecules, where the molecules collectively have more than 10^4, 10^5, 10^6 or 10^7 or more nucleotides.
- complexity generally refers to the total number of different sequences in a population, such as in a population of fragments, adaptors, or adaptor-ligated fragments. For example, if a population has 4 different sequences, then that population has a complexity of 4. A population may have a complexity of at least 4, at least 8, at least 16, at least 100, at least 1,000, at least 10,000 or at least 100,000 or more, depending on the desired result.
- nucleotide refers to a phosphate ester of a nucleoside, wherein the esterification site typically corresponds to the hydroxyl group attached to the C-5 position of the pentose sugar. In some cases nucleotides comprise nucleoside polyphosphates. However, the terms “added nucleotide,” “incorporated nucleotide,” “nucleotide added” and “nucleotide after incorporation” all refer to a nucleotide residue that is part of an oligonucleotide or polynucleotide chain.
- nucleotide refers to naturally-occurring nucleotides including guanine, cytosine, adenine, thymine, uracil (G, C, A, T and U respectively), as well as modified pyrimidine and purine derivatives and other non-naturally occurring moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles.
- nucleotide includes those moieties that contain hapten or fluorescent labels and may contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.
- nucleic acid and “polynucleotide” are used interchangeably herein to describe a nucleotide-containing polymer of any length, e.g., greater than about 2 bases, greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, and may be produced naturally, chemically, enzymatically or synthetically.
- the term includes polymers having PNA, LNA or UNA.
- DNA and RNA have a deoxyribose and ribose sugar backbone, respectively, whereas PNA's backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds.
- In PNA, various purine and pyrimidine bases are linked to the backbone by methylene carbonyl bonds.
- a locked nucleic acid (LNA), often referred to as inaccessible RNA, is a modified RNA nucleotide.
- the ribose moiety of an LNA nucleotide is modified with an extra bridge connecting the 2' oxygen and 4' carbon.
- LNA nucleotides can be mixed with DNA or RNA residues in the oligonucleotide whenever desired.
- the term “unstructured nucleic acid”, or “UNA”, is a nucleic acid containing non-natural nucleotides that bind to each other with reduced stability.
- an unstructured nucleic acid may contain a G’ residue and a C’ residue, where these residues correspond to non-naturally occurring forms, i.e., analogs, of G and C that base pair with each other with reduced stability, but retain an ability to base pair with naturally occurring C and G residues, respectively.
- nucleoside is intended to include those moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles.
- deoxynucleotide include those moieties that contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides, nucleotides, deoxynucleosides or deoxynucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.
- Natural nucleotides or nucleosides are defined herein as adenine (A), thymine (T), guanine (G), and cytosine (C). It is recognized that certain modifications of these nucleotides or nucleosides occur in nature. However, modifications of A, T, G, and C that occur in nature that affect hydrogen bonded base pairing are considered to be non-naturally occurring. For example, 2-aminoadenosine is found in nature, but is not a “naturally occurring” nucleotide or nucleoside as that term is used herein.
- nucleotides include any nucleotide or nucleotide analog, whether naturally-occurring or synthetic.
- Exemplary nucleotides include phosphate esters of deoxyadenosine, deoxycytidine, deoxyguanosine, deoxythymidine, adenosine, cytidine, guanosine, and uridine.
- nucleotides include an adenine, cytosine, guanine, thymine base, a xanthine or hypoxanthine, 5-bromouracil, 2-aminopurine, deoxyinosine, or methylated cytosine, such as 5-methylcytosine, and N4-methoxydeoxycytosine.
- bases of polynucleotide mimetics such as methylated nucleic acids, e.g., 2'-O-methRNA, peptide nucleic acids, modified peptide nucleic acids, locked nucleic acids and any other structural moiety that can act substantially like a nucleotide or base, for example, by exhibiting base-complementarity with one or more bases that occur in DNA or RNA and/or by being capable of base-complementary incorporation, and includes chain-terminating analogs.
- a nucleotide corresponds to a specific nucleotide species if they share base-complementarity with respect to at least one base.
- modified nucleotides or analogs include any compound that can form a hydrogen bond with one or more naturally occurring nucleotides or with another nucleotide analog. Any compound that forms at least two hydrogen bonds with T or with a derivative of T is considered to be an analog of A or a modified A. Similarly, any compound that forms at least two hydrogen bonds with A or with a derivative of A is considered to be an analog of T or a modified T. Similarly, any compound that forms at least two hydrogen bonds with G or with a derivative of G is considered to be an analog of C or a modified C.
- any compound that forms at least two hydrogen bonds with C or with a derivative of C is considered to be an analog of G or a modified G. It is recognized that under this scheme, some compounds will be considered for example to be both A analogs and G analogs (purine analogs) or both T analogs and C analogs (pyrimidine analogs).
- nucleic acid construct refers to a nucleic acid that is ligated or otherwise attached to another nucleic acid, such as an adaptor.
- a nucleic acid construct may contain a nucleic acid molecule to be sequenced, a capture site for flowcell attachment, one or more identifier sequences such as SBC and UMI, and primer binding sites for a first and second primer.
- capture site refers to a nucleic acid sequence configured for attachment of a nucleic acid construct to a flowcell or other surface, for NGS sequencing or other analysis processing.
- identifier refers to a nucleic acid sequence that can be used to identify a particular nucleic acid construct.
- An “identifier” may be a “sample barcode” or “SBC” sequence for identifying a particular biological sample.
- An “identifier” may also refer to a “molecular barcode” for identification of unique molecules present in the sample.
- an “identifier” may contain both an SBC and an UMI.
- antibody is well understood by those in the field and is used interchangeably herein with “immunoglobulin”. Those terms refer to a protein consisting of one or more polypeptides that specifically binds an antigen.
- an antibody is the naturally occurring structural unit found in humans and other mammals which comprises a tetramer of two identical pairs of antibody chains, each pair having one light and one heavy chain. In each pair, the light and heavy chain variable regions are together responsible for binding to an antigen, and the constant regions are responsible for the antibody effector functions.
- antibody encompasses monoclonal antibodies, polyclonal antibodies, chimeric antibodies, humanized antibodies, human antibodies, murine antibodies, rabbit antibodies, camelid antibodies, and antibodies from other mammalian and non-mammalian species.
- the term antibody also encompasses single-chain antibodies, bi-specific hybrid antibodies, and fusion proteins comprising an antigen-binding portion of an antibody and a non-antibody protein.
- the term antibody also encompasses antigen-binding fragments of antibodies which retain specific binding to antigen, including, but not limited to, Fab, Fv, scFv, and Fd fragments.
- binding pair refers to a pair of binding partners that exhibit specific binding between them.
- a binding pair can selectively interact through covalent or non-covalent binding. In some embodiments, a binding pair can selectively interact by hybridization, ionic bonding, hydrogen bonding, van der Waals interactions, or any combination of these forces.
- a binding partner can comprise, for example, biotin, avidin, streptavidin, digoxigenin, inosine, GST sequences, modified GST sequences, biotin ligase recognition (BiTag) sequences, S tags, SNAP-tags, enterokinase sites, thrombin sites, antibodies or antibody domains, antibody fragments, antigens, receptors, receptor domains, receptor fragments, or combinations thereof.
- binding pairs include biotin:avidin, biotin:streptavidin, antibody:antigen, complementary nucleic acids, hapten/antibody, lectin/carbohydrate, apoprotein/cofactor and biotin/streptavidin, as well as others set forth above.
- specific binding refers to the ability of a binding partner to preferentially bind to its reciprocal binding partner that is present in a homogeneous mixture of different molecules. In some embodiments, specific binding discriminates between a reciprocal binding partner and other molecules by at least 100-fold, 1000-fold, 10,000-fold, 100,000-fold, or more. In some embodiments, the affinity between binding partners of a binding pair when they are specifically bound in a complex is characterized by a KD (dissociation constant) of less than 10^-6 M, less than 10^-7 M, less than 10^-8 M, less than 10^-9 M, less than 10^-10 M, less than 10^-11 M, or less than about 10^-12 M, or less.
- a “capture binding partner” refers to a binding partner that is configured to capture (e.g., isolate, purify, immobilize, extract) a nucleic acid tagged with its reciprocal binding partner.
- streptavidin coated on a bead would be a capture binding partner for an input nucleic acid complex having a tag comprising a biotin moiety.
- a capture binding partner and its reciprocal binding partner may comprise any suitable binding pair.
- Embodiment 1 A method of preparing RNA molecules for analysis comprising providing a sample comprising input DNA molecules and input RNA molecules; selectively hybridizing a tagged primer to the input RNA molecules, wherein the tagged primer comprises a first tag comprising a binding partner of a first binding pair; and extending the primer to produce tagged cDNA molecules.
- Embodiment 2 The method of embodiment 1, wherein the input RNA molecules comprise a primer binding site that hybridizes with the tagged primer, and the input DNA molecules do not comprise the same primer binding site.
- Embodiment 3 The method of embodiment 2, wherein the primer binding site is poly(A).
- Embodiment 4 The method of embodiment 2, wherein the primer binding site is ligated to the input RNA molecules before the hybridizing of the tagged primer.
- Embodiment 5 The method of any of embodiments 1 to 4, wherein the first tag is selected from digoxigenin, 5-bromo-2’-deoxyuridine (BrdU), 2,4-dinitrophenyl (DNP), nitrilotriacetic acid or a nitrilotriacetate (NTA) such as nickel nitrilotriacetate (Ni-NTA), tris-Nitrilotriacetate (tris-NTA), a tyramine, a thiol, an amine, an aldehyde, an alkyne, and an azide.
- Embodiment 6 The method of embodiment 5, wherein the first tag is digoxigenin.
- Embodiment 7 The method of any of embodiments 1-6, further comprising attaching a second tag to the first tag to produce a dual tagged cDNA complex, wherein the second tag comprises a reciprocal binding partner of the first binding pair.
- Embodiment 8 The method of embodiment 7, wherein the reciprocal binding partner in the second tag is selected from an anti-DIG antibody, an anti-BrdU antibody, an anti-DNP antibody, poly-Histidine, a tyrosine, a thiol, an amine, an aldehyde, an alkyne, and an azide.
- Embodiment 9 The method of embodiment 8, wherein the second tag comprises anti-DIG antibody as the reciprocal binding partner of the first binding pair.
- Embodiment 10 The method of any of embodiments 7 to 9, wherein the second tag further comprises a binding partner of a second binding pair.
- Embodiment 11 The method of embodiment 10, wherein the binding partner of the second binding pair is selected from biotin, 5-bromo-2’-deoxyuridine (BrdU), 2,4-dinitrophenyl (DNP), nitrilotriacetic acid or a nitrilotriacetate (NTA) such as nickel nitrilotriacetate (Ni-NTA), tris-Nitrilotriacetate (tris-NTA), a tyramine, a thiol, an amine, an aldehyde, an alkyne, and an azide, with the proviso that the binding partner of the second binding pair is not the binding partner or the reciprocal binding partner of the first binding pair.
- Embodiment 12 The method of embodiment 11, wherein a reciprocal binding partner of the second binding pair is selected from an avidin moiety, an anti-DIG antibody, an anti-BrdU antibody, an anti-DNP antibody, poly-Histidine, a tyrosine residue, a thiol, an amine, an aldehyde, an alkyne, and an azide.
- Embodiment 13 The method of embodiment 10, wherein the binding partner of the second binding pair is biotin, and a reciprocal binding partner of the second binding pair is an avidin moiety.
- Embodiment 14 The method of embodiment 13, wherein the first tag is digoxigenin (DIG), and the second tag comprises anti-DIG antibody as the reciprocal binding partner of a first binding pair comprising the first tag and the second tag.
- Embodiment 15 The method of any of embodiments 1-14, wherein the first tag is covalently bound to the tagged cDNA molecule.
- Embodiment 16 The method of any of embodiments 1-15, wherein the second tag is non-covalently bound to the first tag.
- Embodiment 17 The method of any of embodiments 10-16, further comprising separating the dual tagged cDNA complex from the input DNA molecules by binding the partner of the second binding pair to its reciprocal binding partner.
- Embodiment 18 The method of any of embodiments 12 to 17, wherein the reciprocal binding partner of the second binding pair is attached to a solid support.
- Embodiment 19 The method of embodiment 18, wherein the second tag comprises an antibody conjugated with biotin, and the reciprocal binding partner of the second binding pair is an avidin moiety coated on a solid support.
- Embodiment 20 The method of embodiment 1, further comprising hybridizing the tagged cDNA molecules to a probe.
- Embodiment 21 The method of embodiment 7, further comprising hybridizing the dual tagged cDNA complex to a probe.
- Embodiment 22 The method of embodiment 20 or 21, wherein the probe is a bridge probe.
- Embodiment 23 The method of embodiment 22, further comprising hybridizing the bridge probe with an anchor probe.
- Embodiment 24 The method of embodiment 23, wherein the anchor probe is attached to a biotin moiety.
- Embodiment 25 The method of any of embodiments 1 to 24, further comprising attaching adaptors to the input DNA molecules, wherein the adaptors do not comprise the first tag.
- Embodiment 26 The method of embodiment 25, further comprising processing the input DNA molecule to produce processed DNA molecules.
- Embodiment 27 The method of embodiment 26, wherein the processing is amplification, and the processed DNA molecules are amplicons of the input DNA molecules.
- Embodiment 28 The method of embodiment 27, further comprising processing the tagged cDNA molecules to produce processed cDNA molecules.
- Embodiment 29 The method of embodiment 28, further comprising combining the processed DNA molecules and the processed cDNA molecules for sequencing.
Landscapes
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Physics & Mathematics (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Methods, kits and compositions for selectively tagging RNA molecules and for separating RNA sequences from DNA sequences. Methods for analysis of RNA sequences by preparing tagged cDNA molecules which can be separated from DNA molecules. The DNA molecules can be subjected to different preparation steps than the tagged cDNA molecules.
Description
METHODS OF SELECTIVELY TAGGING NUCLEIC ACIDS FOR ANALYSIS OF RNA SEQUENCES
CROSS-REFERENCE TO RELATED APPLICATIONS
[001] This application claims the benefit of U.S. Provisional Application No. 63/636,497, filed April 19, 2024, which is incorporated by reference herein in its entirety.
FIELD OF THE INVENTION
[002] The present invention relates to preparation of nucleic acids for sequencing or other analysis. The present invention also relates to target enrichment of nucleic acids.
BACKGROUND
[003] Biological samples may contain nucleic acid molecules to be analyzed. For instance, plasma or serum samples used as liquid biopsies may contain DNA and RNA molecules present in relatively low amounts but with potentially high value for diagnostic purposes. In some circumstances, it is desirable to analyze both DNA and RNA molecules which are present in a sample, and some diagnostic assays distinguish between them. Some methods for extracting nucleic acid from a sample isolate both DNA and RNA from the sample, but additional steps must be performed if one wishes to distinguish between or separate DNA and RNA molecules.
[004] DNA and RNA molecules in plasma or serum are released into circulation by different biological processes. DNA molecules usually come from apoptotic or necrotic cells, whereas RNA molecules are mostly from living and actively dividing cells. Distinguishing between the types of nucleic acids within a sample can provide further information and diagnostic value. Additionally, analysis of nucleic acids, especially by sequencing, typically requires one or more processing steps, and it may be desirable to process the RNA molecules differently than DNA molecules.
[005] RNA can be separated from DNA in a sample in various ways. For instance, samples can be mixed with a phenol:chloroform mixture and centrifuged. Because the phenol:chloroform mixture is immiscible with water, centrifugation forms two distinct phases: an upper aqueous phase and a lower organic phase. Proteins will remain at the interphase between the two phases, while the nucleic acids (as well as other contaminants such as salts, sugars, etc.) remain in the upper aqueous phase. If the mixture is acidic, DNA precipitates into the organic phase while RNA remains in the aqueous phase. The upper aqueous phase can then be pipetted off, and the RNA can be separated from contaminants.
[006] Analysis of nucleic acid from a sample can be performed in many different ways. Some techniques are designed to analyze all the nucleic acid from a sample (e.g., whole genome or whole transcriptome analysis), while others are designed to analyze a portion or only selected genes or transcripts. Nucleic acid target capture methods can allow specific genes, exons, and other genomic regions of interest to be enriched for targeted sequencing or other analysis. However, target capture-based sequencing methods can involve cumbersome lengthy protocols and costly processes, as well as a low on-target rate for a small capture panel (e.g., less than 500 probes). Moreover, current methods for nucleic acid target capture can be ill-suited for low input and damaged DNA because of a low recovery rate.
[007] Sequencing of nucleic acids often provides extremely valuable information from the sample. Next-Generation Sequencing (NGS) methods and systems involve the parallel sequencing of a library of nucleic acids by a sequencing platform. Preparation of a sequencing library generally includes various steps such as amplification of the nucleic acids, attachment of adaptors, and/or other preparatory steps. Emerging polynucleotide sequencing platforms can enable direct detection and analysis of nucleic acid molecules without the need for amplification, though the nucleic acids often require an adaptor or other moiety to immobilize the nucleic acid for sequencing steps. An adaptor can be attached to one or both ends of nucleic acid molecules in order to add sites for primer binding and for immobilization of the nucleic acid on a surface such as a flowcell or a bead, and to add other functional sequences to the fragments. Various kinds of adaptors are used in sequencing preparation kits to add these sites or sequences to the nucleic acids from the sample. Adaptors can be attached in various ways, such as by ligation, primer extension, tagmentation, and other techniques.
[008] A sequencing library can be generated in a variety of ways, with different objectives regarding the nucleic acids to be used as inputs. For instance, PCR can be used with target-specific primers to generate a library of amplicons covering regions of interest in the nucleic acid sample. Other methods of library preparation involve random fragmentation of the nucleic acid sample by enzymatic or physical shearing methods, followed by amplification using common adaptor sequences. Enrichment procedures are used to remove or separate sequences of interest from the rest of the sample.
[009] There remains a need for improved methods for separating RNA molecules from DNA molecules. There is also a need for methods of identifying or enriching RNA molecules for sequencing or other analysis and/or for target enrichment. There is a need for improved methods of determining the sequence of RNA molecules and of preparing copies of RNA for sequencing.
SUMMARY
[0010] The present disclosure provides methods of preparing nucleic acid molecules for sequencing or other analysis. As one aspect, methods are provided for use with a sample comprising input DNA molecules and input RNA molecules. Tagged cDNA molecules are synthesized by hybridizing a tagged primer to the input RNA molecules, and extending the primer to produce the tagged cDNA molecules. The tagged primer comprises a first tag, and the first tag comprises a binding partner which is a member of a first binding pair. In some embodiments, the methods also comprise attaching a second tag to the first tag to produce a dual tagged cDNA complex. The second tag comprises a reciprocal binding partner of the first binding pair, thereby facilitating attachment of the second tag. For example, the first tag can be digoxigenin (DIG), and the second tag can comprise anti-DIG antibody. In this example, DIG and anti-DIG antibody constitute the first binding pair.
[0011] The second tag can also comprise a binding partner of a second binding pair, and the present methods can comprise attaching the second tag to a reciprocal binding partner of the second binding pair, which itself may be attached to a solid support. For example, the second tag can comprise a biotin moiety attached to an anti-DIG antibody, and the reciprocal binding partner of the second binding pair can be a streptavidin-coated bead; in this example, biotin and streptavidin constitute the second binding pair.
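The capture chain in the DIG and biotin example above can be sketched as a small data model. This is only an illustration under stated assumptions: the names `Molecule`, `binds`, `attach_second_tag` and `capture` are hypothetical, and binding is modeled as a simple yes/no lookup rather than as an affinity.

```python
from dataclasses import dataclass, field

# The two binding pairs from the example: each pair holds two reciprocal partners.
FIRST_PAIR = ("DIG", "anti-DIG antibody")
SECOND_PAIR = ("biotin", "streptavidin")


def binds(a: str, b: str) -> bool:
    """True if a and b are the two reciprocal partners of one binding pair."""
    return {a, b} in (set(FIRST_PAIR), set(SECOND_PAIR))


@dataclass
class Molecule:
    name: str
    tags: list[str] = field(default_factory=list)  # moieties displayed on the molecule


def attach_second_tag(mol: Molecule,
                      second_tag: tuple[str, str] = ("anti-DIG antibody", "biotin")) -> Molecule:
    """Attach the second tag only where the first tag's binding partner is present."""
    recognizer, handle = second_tag
    if any(binds(tag, recognizer) for tag in mol.tags):
        mol.tags.append(handle)  # the dual tagged complex now displays biotin
    return mol


def capture(sample: list[Molecule], support: str = "streptavidin"):
    """Split a sample into a support-bound fraction and an unbound fraction."""
    bound = [m for m in sample if any(binds(tag, support) for tag in m.tags)]
    unbound = [m for m in sample if m not in bound]
    return bound, unbound


sample = [Molecule("tagged cDNA", ["DIG"]), Molecule("input DNA")]
sample = [attach_second_tag(m) for m in sample]
bound, unbound = capture(sample)
print([m.name for m in bound], [m.name for m in unbound])
```

In this sketch the input DNA never acquires biotin because it lacks DIG, so only the dual tagged cDNA complex ends up in the bound fraction.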
[0012] The present methods can further comprise one or more steps for capture or target enrichment of a tagged cDNA molecule. For instance, the tagged cDNA molecule can be hybridized with a probe, such as a capture probe or a bridge probe which is hybridized with an anchor probe. The capture probe or anchor probe can be attached to an enrichment tag such as a biotin moiety.
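As a rough illustration of the bridge probe and anchor probe arrangement described above, the following sketch treats hybridization as exact reverse-complement matching. The sequences, the `BridgeProbe` class, and the helper names are hypothetical and chosen only for the example; real hybridization tolerates mismatches and depends on conditions that are not modeled here.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals(probe_region: str, target: str) -> bool:
    """True if probe_region is exactly complementary to some stretch of target."""
    return revcomp(probe_region) in target


@dataclass
class BridgeProbe:
    target_region: str     # complementary to the tagged cDNA
    universal_region: str  # complementary to the anchor probe


def is_enriched(cdna: str, bridge: BridgeProbe, anchor: str, anchor_biotinylated: bool) -> bool:
    """A cDNA is pulled down only if cDNA:bridge and bridge:anchor both form
    and the anchor probe carries the enrichment tag."""
    return (anneals(bridge.target_region, cdna)
            and anneals(bridge.universal_region, anchor)
            and anchor_biotinylated)


on_target = "GGATCCGTTAGCAAGCTTAC"   # contains the site GTTAGCAAGC
off_target = "TTTTCCCCGGGGAAAA"
bridge = BridgeProbe(target_region=revcomp("GTTAGCAAGC"), universal_region="ACGTTGCAAT")
anchor = revcomp("ACGTTGCAAT")

print(is_enriched(on_target, bridge, anchor, True))
print(is_enriched(off_target, bridge, anchor, True))
```

Keeping the universal region separate from the target region is what lets one biotinylated anchor probe serve many different bridge probes in this model.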
[0013] In some embodiments, the present methods further comprise processing the tagged cDNA molecules and the input DNA molecules separately. For instance, the processing may be amplification to produce amplicons of the input molecules. The present methods can also comprise re-combining the processed cDNA molecules and the processed input DNA molecules (or amplicons thereof) for sequencing or other analysis.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] FIG. 1 illustrates synthesis of a tagged cDNA molecule which is complementary to an input RNA molecule. A primer comprising a first tag hybridizes to the input RNA molecule. The first tag comprises a binding partner (DIG) of a first binding pair (DIG:anti-DIG antibody).
[0015] FIG. 2 illustrates a target enrichment procedure in which a tagged cDNA molecule having an attached first tag is hybridized with bridge probes and the cDNA molecule:bridge probe complex is hybridized with biotinylated universal (or anchor) probes.
[0016] FIG. 3 illustrates how one of the input nucleic acids (e.g., RNA) is differentially tagged with a first tag described here while another of the input nucleic acids (e.g., DNA) is not tagged. Following optional target enrichment, RNA (or cDNA) molecules with the first tag are then isolated.
[0017] The present teachings are best understood from the following detailed description when read with the accompanying drawing figures. The features are not necessarily drawn to scale. Wherever practical, like reference numerals refer to like features.
DETAILED DESCRIPTION
[0018] Before the various embodiments are described, it is to be understood that the teachings of this disclosure are not limited to the particular embodiments described, and as such can, of course, vary. The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described in any way.
[0019] As one aspect, the present disclosure provides methods of preparing RNA molecules for analysis. The methods comprise providing a sample comprising input DNA molecules and input RNA molecules, and selectively hybridizing a tagged primer to the input RNA molecules. The tagged primer comprises a first tag, and the first tag comprises a binding partner of a first binding pair. The first tag can be attached to the cDNA by a covalent bond or by non-covalent binding. The input RNA molecules can comprise a primer binding site that hybridizes with the tagged primer, and the input DNA molecules do not comprise the primer binding site. In some embodiments, the primer binding site is polyadenine, or poly(A), which can be naturally present on the input RNA molecules, or may be added to input RNA molecules. In some embodiments, the primer binding site is ligated to the input RNA molecules before the hybridizing of the tagged primer.
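The selectivity described above can be illustrated with a minimal sketch, assuming the primer binding site is a 3' poly(A) tail and the tagged primer is an oligo(dT). The `Molecule` class, the 12-base minimum tail length, and the example sequences are illustrative assumptions, not values from this disclosure.

```python
from dataclasses import dataclass

MIN_TAIL = 12  # assumed minimum poly(A) length for primer annealing (illustrative)


@dataclass
class Molecule:
    kind: str      # "RNA" or "DNA"
    sequence: str  # written 5' to 3'


def has_primer_binding_site(mol: Molecule, min_tail: int = MIN_TAIL) -> bool:
    """True if the molecule ends in a run of at least min_tail A residues."""
    tail_length = len(mol.sequence) - len(mol.sequence.rstrip("A"))
    return tail_length >= min_tail


def select_primed(sample: list[Molecule]) -> list[Molecule]:
    """Molecules to which an oligo(dT) tagged primer would be expected to anneal."""
    return [m for m in sample if has_primer_binding_site(m)]


sample = [
    Molecule("RNA", "AUGGCUUCGA" + "A" * 20),  # polyadenylated transcript
    Molecule("DNA", "ATGGCTTCGATTGCA"),        # lacks the primer binding site
]
primed = select_primed(sample)
print([m.kind for m in primed])
```

Only molecules carrying the primer binding site are selected, which is why the input DNA in this toy sample is never primed and therefore never tagged.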
[0020] The present methods also comprise extending the primer to produce tagged cDNA molecules. In some embodiments, the methods also comprise attaching a second tag to the first tag to produce a dual tagged cDNA complex, wherein the second tag comprises a reciprocal binding partner of the first binding pair. The second tag can be attached to the first tag by a covalent bond or by non-covalent binding. In some embodiments, the second tag also comprises a binding partner of a second binding pair. The binding partners of the first binding pair do not bind with the binding partners of the second binding pair. In other words, the second tag can include binding partners from two different binding pairs.
Synthesizing a Tagged cDNA Molecule Having A First Tag
[0021] The present methods are particularly advantageous with samples that contain input DNA molecules and input RNA molecules. The methods comprise selectively hybridizing a tagged primer to the input RNA molecules, meaning that the primer hybridizes to the input RNA molecules without significant hybridization to the input DNA molecules. The tagged primer comprises a first tag, and the first tag includes a binding partner of a first binding pair. In other words, a first binding pair includes two binding partners which reciprocally bind to each other; the first tag includes one of the binding partners from that binding pair.
[0022] The methods also comprise extending the tagged primer to produce tagged cDNA molecules. The term “extending”, as used herein, refers to the extension of a primer by the addition of nucleotides using a primer extension enzyme. If a primer that is annealed to a nucleic acid (such as an input RNA molecule) is extended, the nucleic acid acts as a template for an extension reaction. The sequence of nucleotides added during the extension process is determined by the sequence of the nucleic acid template. Primers can be extended by primer extension enzymes such as DNA polymerases and reverse transcriptases.
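The selective priming and template-directed extension described above can be sketched in code. The following is a minimal illustration only, not the disclosed method: it assumes sequences are plain 5'-to-3' strings, treats a 3' run of at least 12 adenosines as the primer binding site (the length 12 and the function names are hypothetical), and models extension as producing the reverse complement of the template.

```python
# Illustrative sketch only. The 12-nt poly(A) threshold, the default "DIG"
# tag label, and all function names are assumptions made for this example.
TEMPLATE_TO_CDNA = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def has_primer_binding_site(molecule: str, min_a: int = 12) -> bool:
    """True if the molecule (5'->3') ends in a 3' run of >= min_a adenosines."""
    return molecule.endswith("A" * min_a)


def extend_tagged_primer(template: str, tag: str = "DIG", min_a: int = 12):
    """Return (tag, cDNA 5'->3') if the tagged primer can anneal, else None.

    The cDNA sequence is determined by the template: it is the reverse
    complement of the template, so it begins with the primer's T run.
    """
    if not has_primer_binding_site(template, min_a):
        return None  # no primer binding site: molecule is left untagged
    cdna = "".join(TEMPLATE_TO_CDNA[base] for base in reversed(template))
    return tag, cdna


if __name__ == "__main__":
    rna = "GGCAUC" + "A" * 12   # input RNA with a 3' poly(A) tail
    dna = "GGCATCGGTACC"        # input DNA lacking the primer binding site
    print(extend_tagged_primer(rna))
    print(extend_tagged_primer(dna))
```

In this toy model only the molecule carrying the poly(A) site yields a tagged product, which mirrors the selective tagging of input RNA in the presence of input DNA.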
[0023] In some embodiments, the first tag is any binding partner suitable for covalent or non-covalent attachment to the tagged primer, so that the first tag is attached to a cDNA molecule synthesized by primer extension of a primer hybridized with an input RNA molecule. In some embodiments, the first tag is selected from the group consisting of digoxigenin, 5-bromo-2’-deoxyuridine (BrdU), 2,4-dinitrophenyl (DNP), nitrilotriacetic acid or a nitrilotriacetate (NTA) such as nickel nitrilotriacetate (Ni-NTA), tris-Nitrilotriacetate (tris-NTA), a tyramine, a thiol, an amine (e.g., a primary amine), an aldehyde, an alkyne or an azide or other groups reacting by click chemistry, and mixtures thereof. Accordingly, examples of first binding pairs include DIG:anti-DIG antibody, BrdU:anti-BrdU antibody, DNP:anti-DNP antibody, Ni-NTA:poly-Histidine (His-tag), Tyramide:tyrosine residues, Thiol:Thiol (disulfide bonds), Amine:Aldehyde conjugation, alkyne:azide, or other click chemistry reactants. Examples of tyramines include Tyramine, N-Methyltyramine, N,N-Dimethyltyramine, and N,N,N-Trimethyltyramine. Examples of alkynes and azides binding via click chemistry include copper-catalyzed reaction of an azide and alkyne to form a triazole (Huisgen 1,3-dipolar cycloaddition) and strain-promoted azide alkyne cycloaddition (SPAAC).
[0024] Digoxigenin is advantageous as the binding partner included in the first tag, due to its very low non-specific binding and the availability of high-affinity anti-digoxigenin antibodies. Examples of anti-DIG antibodies include Perkin Elmer’s Anti-Digoxigenin biotin conjugate.
Binding A Second Tag To Produce A Dual Tagged cDNA Complex
[0025] In some embodiments, the present methods comprise a second tag that includes the reciprocal binding partner of the first tag (that is, the other binding partner of the first binding pair). For example, the first tag can comprise an antigen or hapten, and the second tag can comprise an antibody that selectively binds that antigen or hapten. The attachment of the second tag to the tagged cDNA molecule results in dual tagged cDNA complexes, which can facilitate enrichment or isolation of the cDNA. The binding partners of the second binding pair do not bind with the binding partners of the first binding pair. In some embodiments, the present methods comprise forming a dual tagged cDNA complex by binding the second tag to the first tag which was previously attached to the tagged cDNA molecule.
[0026] In some embodiments, the second tag also includes a binding partner of a second binding pair (that is, a binding pair that is different from the first binding pair). In some embodiments, the second tag comprises biotin, 5-bromo-2’-deoxyuridine (BrdU), 2,4-dinitrophenyl (DNP), nitrilotriacetic acid or a nitrilotriacetate (NTA) such as nickel nitrilotriacetate (Ni-NTA), tris-Nitrilotriacetate (tris-NTA), a tyramine, a thiol, an amine (e.g., a primary amine), an aldehyde, an alkyne or an azide or other groups reacting by click chemistry, and mixtures thereof. The reciprocal binding partner of the second binding pair can comprise an avidin moiety such as avidin, streptavidin, neutravidin, captavidin, etc., or antibodies that specifically bind the binding partner, His-tags, tyrosine residues, thiols, amines, aldehydes, alkynes, azides, etc. Accordingly, examples of second binding pairs include biotin:streptavidin, BrdU:anti-BrdU antibody, DNP:anti-DNP antibody, Ni-NTA:His-tag, Tyramide:tyrosine residues, Thiol:Thiol (disulfide bonds), Amine:Aldehyde conjugation, alkyne:azide, or other click chemistry reactants. Click chemistry can be a reaction between an azide and an alkyne that can yield a covalent product (e.g., 1,5-disubstituted 1,2,3-triazole). Copper can serve as a catalyst. The proteins avidin and streptavidin form exceptionally tight complexes with biotin moieties. In general, when a biotin moiety is coupled to a second molecule through its carboxyl side chain, the resulting conjugate is still tightly bound by avidin or streptavidin. The second molecule is said to be "biotinylated" when such conjugates are prepared.
[0027] Emerging nucleic-acid sequencing platforms can enable direct detection and analysis of nucleic acid molecules without the need for traditional sample preparation and amplification steps. In one example of such emerging platforms, input nucleic acid molecules can be tagged with a biotin molecule through a simple enzymatic addition step. The biotinylated nucleic acid molecule can then be bound to a flowcell containing avidin or streptavidin moieties. Subsequent analysis can then be performed, e.g., using fluorescently labelled probes.
[0028] If a biotin moiety is tagged to input nucleic acid molecules, as is done in some sample preparation procedures, it would interfere with certain probe-based capture and enrichment protocols that use biotin-tagged oligonucleotide probes. However, addition of a biotin tag to nucleic acids after target enrichment is challenging due to the single-stranded nature of the enriched nucleic acid molecules. In addition, residual components from the target enrichment protocol could also be processed, impacting subsequent analysis.
[0029] The present disclosure provides an approach for selectively tagging cDNA molecules that is compatible with existing target enrichment procedures and supports subsequent biotin-tag addition, enabling compatibility with direct nucleic acid sequence analysis platforms.
[0030] The present methods can be used with samples comprising input nucleic acid molecules of various types, particularly where the sample comprises a mixture of input DNA and RNA molecules. Input DNA molecules include genomic DNA (gDNA), mitochondrial DNA, viral DNA, cDNA, cell-free DNA (cfDNA), circulating tumor DNA (ctDNA), cell-free fetal DNA (cffDNA), or synthetic DNA. The DNA can be double-stranded DNA, single-stranded DNA, fragmented DNA, or damaged DNA. The input nucleic acid molecules can include input RNA, input DNA, or a mixture of input RNA and input DNA. The input RNA molecules can be mRNA, pre-mRNA, tRNA, rRNA, microRNA, snRNA, piRNA, small non-coding RNA, polysomal RNA, intron RNA, viral RNA, or cell-free RNA. In some embodiments, the DNA comprises fragmented genomic DNA and the RNA comprises mRNA or pre-mRNA.
[0031] The input nucleic acid can be naturally occurring or synthetic. The input nucleic acid can have modified heterocyclic bases. The modification can be methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses, or other heterocycles. The input nucleic acid can have modified sugar moieties. The modified sugar moieties can include peptide nucleic acid. The input nucleic acid can comprise peptide nucleic acid. The input nucleic acid can comprise threose nucleic acid. The input nucleic acid can comprise locked nucleic acid. The input nucleic acid can comprise hexitol nucleic acid. The input nucleic acid can be flexible nucleic acid. The input nucleic acid can comprise glycerol nucleic acid.
[0032] The input nucleic acid can be captured and enriched from low-input (e.g., 1 ng of nucleic acid materials) samples such as cell-free DNA (cfDNA), circulating tumor DNA (ctDNA), a single cell, or 10 or fewer cells. Examples of a single cell or other cells for which analysis may be desired include a neuron, a glial cell, a germ cell, a gamete, an embryonic stem cell, a pluripotent stem cell (including an induced pluripotent stem cell), an adult stem cell, a cell of the hematopoietic lineage, a differentiated somatic cell, a microbial cell, a cancer cell (including, for example a cancer stem cell), and a disease cell. In some embodiments, the input nucleic acids are captured from 10 or fewer cells (such as 1-10 cells, 2-10 cells, 5-10 cells, 1-2 cells, or 2-5 cells). The low-input samples can have 1 ng, 2 ng, 3 ng, 4 ng, 5 ng, 6 ng, 7 ng, 8 ng, 9 ng, 10 ng, or more of nucleic acid materials. The low-input samples can have less than 10 ng, 9 ng, 8 ng, 7 ng, 6 ng, 5 ng, 4 ng, 3 ng, 2 ng, 1 ng, or less of nucleic acid materials. The low-input samples can have from 200 pg to 10 ng of nucleic acid materials. The low-input samples can have less than 10 ng of nucleic acid materials. The low-input sample can have less than 10 ng, 5 ng, 1 ng, 100 pg, 50 pg, 25 pg, or less of the nucleic acid materials. In some cases, the input samples can have 1 ng, 10 ng, 20 ng, 30 ng, 40 ng, 50 ng, or more of nucleic acid molecules. The input samples can have less than 50 ng, 40 ng, 30 ng, 20 ng, 10 ng, 1 ng, or less of nucleic acid materials.
[0033] The capture and enrichment can be done by target probe hybridization. The target probe can be a capture probe, bridge probe, and/or anchor probe. The target probe can comprise one or more binding moieties. The binding moiety can be a biotin. The binding moieties can be attached to a support. The support can be a bead. The bead can be a streptavidin coated bead.
[0034] The input nucleic acid can be damaged. The damaged nucleic acid can comprise altered or missing bases, and/or modified backbone. The input nucleic acid can be damaged by oxidation, radiation, or random mutation.
[0035] Damaged dsDNA (with a nick) or ssDNA can be used as input nucleic acid for a library construction. For the damaged dsDNA, the dsDNA can be denatured so at least one undamaged strand can be used as an input nucleic acid. The input nucleic acid can then be hybridized and attached to a capture probe and amplified using various primers.
[0036] The input nucleic acid can be derived from cell-free DNA (cfDNA) or circulating tumor DNA (ctDNA). The cfDNA can be fetal or tumor in source. The input nucleic acid can be derived from liquid biopsy, solid biopsy, or fixed tissue of a subject. The input nucleic acid can be cDNA and can be generated by reverse transcription. The input nucleic acid can be derived from fluid samples, including but not limited to plasma, serum, sputum, saliva, urine, or sweat. The input nucleic acid can be derived from liver, esophagus, kidney, heart, lung, spleen, bladder, colon, or brain.
[0037] The input nucleic acid can be derived from a male or female subject. The subject can be an infant, a teenager, a young adult or an elderly person. The input nucleic acid can originate from human, rat, mouse, other animal, or specific plants, bacteria, algae, viruses, and the like. The input nucleic acid can originate from primates, such as chimpanzees or gorillas. Other animals include a rhesus macaque. The input nucleic acid can be from a mixture of genomes of different species including host-pathogen, bacterial populations, etc. In some embodiments, the input nucleic acid can be cDNA made from RNA expressed from genomes of one or more species.
[0038] The input nucleic acid can comprise a target sequence. The target sequence can be an exon, an intron, or a promoter. The target sequence can be previously known, partially known previously, or previously unknown. The target sequence can comprise a chromosome, chromosome arm, or a gene. The gene can be a gene associated with a condition, e.g., cancer.
[0039] FIG. 1 illustrates the present method with an input RNA molecule 101. The input RNA molecule has a primer binding sequence, either when obtained from the sample or which is added to it. For instance, a series of adenosine (A) bases 103 are present at the 3' end of the input RNA molecule 101. A tagged cDNA molecule 105 is synthesized using the input RNA molecule 101 as a template, by extension of a tagged primer 107 that hybridizes to the series of adenosines 103. The tagged primer 107 is conjugated to a binding partner 109 of a first binding pair such that it serves as a first tag. The tagged primer 107 can also comprise one or more other functional sequences 111 such as primer binding sites or barcodes. In some embodiments, after extension along the length of the input RNA molecule 101, a tagged cDNA molecule 105 having a sequence complementary to the input RNA molecule is produced. In some embodiments, the 3’ end of the cDNA molecule 105 can be attached to an adaptor 113.
[0040] Input RNA molecules can be prepared or treated in other ways that cooperate with the present methods. For instance, the input RNA molecules can be prepared by a procedure that adds a poly(A) sequence if one is not already present. In some embodiments, the tagged primer 107 also comprises a sample specific index and/or molecular barcode (for example, as functional sequence 111).
[0041] FIG. 2 illustrates how a tagged cDNA 205 can be further processed in a target enrichment procedure. In FIG. 2, tagged cDNA 205 has been denatured from the input RNA molecule to form a single-stranded nucleic acid, which is hybridized with one or more bridge probes 213, 215. Bridge probes 213, 215 are hybridized with universal probe 217 (also referred to as an anchor probe) conjugated with a binding partner 219 (for example, biotin) of a second binding pair (for example, the binding pair of biotin:streptavidin) wherein streptavidin 221 is the reciprocal binding partner. In the illustrated embodiment, the hybridizations proceed at the same time; in other embodiments, the hybridizations can be sequential. The binding partners of the first binding pair (DIG:anti-DIG antibody) do not bind with the binding partners of the second binding pair (biotin:streptavidin). In FIG. 2, target enrichment is performed by binding the binding partner 219 (biotin) to its reciprocal binding partner 221 (streptavidin) which is on a solid support 223 (e.g., a magnetic bead).
[0042] Although the present methods can employ a capture probe that directly hybridizes with a target sequence of a tagged cDNA 205, the use of bridge probes and anchor probes offers several advantages. Bridge probes can be used to hybridize an input nucleic acid molecule and can further allow indirect association between an anchor probe and the input nucleic acid. The bridge probe can comprise a target specific region (TSR) that hybridizes to a target sequence. The bridge probe can comprise an anchor-probe-landing sequence (ALS) that hybridizes to a bridge-binding-sequence of the anchor probe. The bridge probe can comprise a linker connecting the TSR and the ALS. The TSR can be located in the 3’-portion of the bridge probe. The TSR can be located in the 5’-portion of the bridge probe.
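The bridge probe composition described above (a TSR, an optional linker, and an ALS, with the TSR in either the 3’-portion or the 5’-portion) can be represented as a simple data structure. The following is a hedged sketch only; the class name, field names, and example sequences are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeProbe:
    """Toy model of a bridge probe; all names here are illustrative."""
    tsr: str                    # target specific region (TSR)
    als: str                    # anchor-probe-landing sequence (ALS)
    linker: str = ""            # optional linker joining TSR and ALS
    tsr_at_3prime: bool = True  # TSR in the 3'-portion (else 5'-portion)

    def sequence(self) -> str:
        """Return the full probe sequence written 5'->3'."""
        if self.tsr_at_3prime:
            parts = (self.als, self.linker, self.tsr)
        else:
            parts = (self.tsr, self.linker, self.als)
        return "".join(parts)


if __name__ == "__main__":
    # Made-up sequences, used only to show the two orientations.
    probe = BridgeProbe(tsr="ACGTTGCA", als="GGATCC", linker="TTTTT")
    print(probe.sequence())
    flipped = BridgeProbe(tsr="ACGTTGCA", als="GGATCC", linker="TTTTT",
                          tsr_at_3prime=False)
    print(flipped.sequence())
```

Keeping the three regions as separate fields makes it straightforward to vary the TSR per target while reusing a common ALS across a probe set.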
[0043] The bridge probe can comprise DNA. The bridge probe can comprise RNA. The bridge probe can comprise uracil and methylated cytosine. The bridge probe might not comprise uracil. The bridge probe can comprise about 400 nucleotides, about 300 nucleotides, about 200 nucleotides, about 120 nucleotides, about 100 nucleotides, about 90 nucleotides, about 80 nucleotides, about 70 nucleotides, about 50 nucleotides, about 40 nucleotides, about 30 nucleotides, about 20 nucleotides, or about 10 nucleotides. The bridge probe can comprise one or more molecular barcodes. The bridge probe can comprise one or more binding moieties. The binding moiety can be a biotin. The binding moieties can be attached to a support. The support can be a bead. The bead can be a streptavidin bead.
[0044] Multiple bridge probes can be used to anneal to multiple target sequences in a sample. The bridge probes can be designed to have similar melting temperatures. The melting temperatures for a set of bridge probes can be within about 15°C, within about 10°C, within about 5°C, or within about 2°C. The melting temperature for one or more bridge probes can be about 75°C, about 70°C, about 65°C, about 60°C, about 55°C, about 50°C, about 45°C, or about 40°C. The melting temperature for the bridge probe can be about 40°C to about 75°C, about 45°C to about 70°C, about 45°C to about 60°C, or about 52°C to about 58°C.
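As a simple illustration of designing a probe set with similar melting temperatures, the check below tests whether a list of melting temperatures falls within a chosen window. This is a sketch under stated assumptions: the melting temperatures are taken as already computed by some external method, and the function names and example values are hypothetical.

```python
def tm_spread(melting_temps_c):
    """Difference between the highest and lowest Tm in the set (deg C)."""
    return max(melting_temps_c) - min(melting_temps_c)


def within_window(melting_temps_c, window_c):
    """True if all Tm values lie within window_c degrees of each other."""
    return tm_spread(melting_temps_c) <= window_c


if __name__ == "__main__":
    # Made-up Tm values for four hypothetical bridge probes.
    tms = [54.0, 56.5, 57.0, 52.5]
    print(tm_spread(tms))
    print(within_window(tms, 5.0))
    print(within_window(tms, 2.0))
```

A probe set that fails a tight window (for example, about 2°C) may still satisfy one of the wider windows recited above.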
[0045] Use of an anchor probe along with one or more bridge probes around a particular bridge probe can help to stabilize the hybridization of the particular bridge probe to its target sequence through a synergistic effect. A hybridization temperature to form the multiple bridge probe assembly can be higher than the melting temperature of a single bridge probe. The higher temperature can result in a better capture specificity by reducing nonspecific hybridization that can occur at lower temperature. The hybridization temperature can be about 5°C, about 10°C, about 15°C, or about 20°C higher than the melting temperature of an individual bridge probe. The hybridization temperature can be about 5°C to about 20°C higher than the melting temperature of a bridge probe, or about 5°C to about 20°C higher than an average melting temperature of a plurality of bridge probes.
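The relationship between the average melting temperature of a plurality of bridge probes and the hybridization temperature can be worked through numerically. The sketch below is illustrative only: the default offsets of 5°C and 20°C come from the range stated above, while the function name and the example melting temperatures are assumptions.

```python
from statistics import mean


def hybridization_window(melting_temps_c, low_offset_c=5.0, high_offset_c=20.0):
    """Return (low, high) hybridization temperatures in deg C.

    Both bounds are offsets above the average Tm of the probe set.
    """
    average_tm = mean(melting_temps_c)
    return average_tm + low_offset_c, average_tm + high_offset_c


if __name__ == "__main__":
    # Made-up Tm values for three hypothetical bridge probes.
    low, high = hybridization_window([52.0, 55.0, 58.0])
    print(low, high)
```

For these example values the average melting temperature is 55°C, so the window spans 60°C to 75°C.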
[0046] The hybridization temperature for multiple bridge probes can be about 75°C, about 70°C, about 65°C, about 60°C, about 55°C, or about 50°C. The hybridization temperature for multiple bridge probes can be about 50°C to about 75°C, 55°C to about 75°C, 60°C to about 75°C, or 65°C to about 75°C.
[0047] The bridge probe can further comprise a label. The label can be fluorescent. The fluorescent label can be an organic fluorescent dye, metal chelate, carbon nanotube, quantum dot, gold particle, or fluorescent mineral. The label can be radioactive. The label can be biotin. The bridge probe can bind to a labeled nucleic acid binder molecule. The nucleic acid binder molecule can be an antibody, antibiotic, histone, or nuclease.
[0048] The bridge probe can comprise a linker. In some embodiments, the linker comprises about 30 nucleotides, about 25 nucleotides, about 20 nucleotides, about 15 nucleotides, about 10 nucleotides, or about 5 nucleotides; any of those numbers can be combined to form a range for the number of nucleotides in a linker. The linker can comprise non-nucleic acid polymers (e.g., string of carbons). The linker non-nucleotide polymer can comprise about 30 units, about 25 units, about 20 units, about 15 units, about 10 units, or about 5 units; any of those numbers can be combined to form a range for the number of units in a linker.
[0049] The bridge probe can be blocked at the 3’ and/or 5’ end. The bridge probe can lack a 5’ phosphate. The bridge probe can lack a 3’ OH. The bridge probe can comprise a 3’ddC, 3’inverted dT, 3’C3 spacer, 3’ amino, or 3’ phosphorylation.
[0050] The anchor probe or universal anchor probe can comprise one or more bridge-binding-sequences (BBS) that hybridize to the anchor-probe-landing sequence of the one or more bridge probes. The anchor probe can comprise spacers in between the BBSs. The presence of the one or more spacers can improve the efficiency of the hybridization capture and increase the specificity of the capture.
[0051] The anchor probe can comprise a molecular barcode (MB). The anchor probe can comprise BBS to which the one or more bridge probes can hybridize. The anchor probe can comprise from 1 to 100 BBSs. The anchor probe can comprise an index for distinguishing samples. The molecular barcode or index can be 5’ of the adaptor sequence and 5’ of the BBS.
[0052] The anchor probe can comprise about 400 nucleotides, about 200 nucleotides, about 120 nucleotides, about 100 nucleotides, about 90 nucleotides, about 80 nucleotides, about 70 nucleotides, about 50 nucleotides, about 40 nucleotides, about 30 nucleotides, about 20 nucleotides, or about 10 nucleotides. The anchor probe can be about 20 to about 70 nucleotides.
[0053] The melting temperature of anchor probe to the bridge probe can be about 65°C, about 60°C, about 55°C, about 50°C, about 45°C, or about 45°C to about 70°C.
[0054] The anchor probe can comprise a label. The label can be fluorescent. The fluorescent label can be an organic fluorescent dye, metal chelate, carbon nanotube, quantum dot, gold particle, or fluorescent mineral. The label can be radioactive. The label can be biotin. The anchor probe can bind to a labeled nucleic acid binder molecule. The nucleic acid binder molecule can be an antibody, antibiotic, histone, or nuclease. In some embodiments, the label is selected from the binding partners of one of the second binding pairs discussed above.
[0055] FIG. 2 demonstrates how the present methods can be used for target enrichment on a sample containing the input RNA molecule tagged with a first tag. Preparation of nucleic acids for sequencing-by-synthesis or other analysis often employs target enrichment, and one or more target enrichment procedures can be included in the present methods. By enriching for one or more desired targets, sequencing or other analysis can be more focused with reduced effort and expense and/or with high coverage depth. Examples of target enrichment procedures include hybridization-based capture protocols such as SureSelect Hybrid Capture from Agilent and TruSeq Capture from Illumina. Other examples include PCR-based protocols such as HaloPlex from Agilent; AmpliSeq from ThermoFisher; TruSeq Amplicon from Illumina; and emulsion/digital PCR from Raindance.
[0056] In some embodiments, the present methods also comprise capture of input DNA molecules comprising target sequences. The present methods allow efficient capture and enrichment of both cDNA molecules having the sequence of input RNA molecules, and input DNA molecules as well. Target enrichment can be performed after synthesis of tagged cDNA molecules, which preserves the ability to distinguish between sequences from input DNA and input RNA molecules. Target enrichment can be performed after attaching an adaptor to an input nucleic acid molecule. The present methods can be used to handle low input samples.
[0057] In some embodiments, the present methods comprise target enrichment by indirect hybridization of the tagged cDNA molecule with an anchor probe through hybridization of one or more bridge probes to the tagged cDNA molecule. The one or more bridge probes can be designed to hybridize to particular target sequences in the tagged cDNA molecule. An anchor probe in turn can be designed to hybridize to the one or more bridge probes, thereby creating an assembly of three or more hybridized nucleic acid molecules. The multi-structure hybridization assembly can act synergistically to provide more stability to the assembly.
[0058] Enrichment of an input nucleic acid containing a target sequence can be facilitated by interaction of the input nucleic acid and two or more probes that form a hybridization assembly. The multi-complex assembly can stabilize the hybridization interaction between the input nucleic acid and the enrichment probes, such as bridge probes. A bridge probe can comprise a target specific region that hybridizes to a target region of the input nucleic acid and an anchor-probe-landing sequence (ALS) that hybridizes to a bridge-binding-sequence (BBS) of an anchor probe. The hybridizations between the input nucleic acid and the bridge probe and between the bridge probe and the anchor probe can form a multi-complex assembly.
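The indirect association of an input nucleic acid with an anchor probe through bridge probes can be illustrated with a toy model. The sketch below is not a hybridization simulator: it assumes perfect Watson-Crick matching of DNA strings written 5'-to-3', ignores melting temperature and secondary structure, and uses hypothetical function names and made-up sequences.

```python
# Toy perfect-match model. All names and sequences are illustrative only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals(probe_region: str, strand: str) -> bool:
    """True if the probe region can pair with some site on the strand."""
    return reverse_complement(probe_region) in strand


def assembly_members(target: str, anchor: str, bridges: dict) -> list:
    """Names of bridge probes linking the target to the anchor probe.

    bridges maps a probe name to a (TSR, ALS) pair. A probe joins the
    assembly only if its TSR pairs with the target and its ALS pairs
    with a bridge-binding-sequence on the anchor probe.
    """
    return [name for name, (tsr, als) in bridges.items()
            if anneals(tsr, target) and anneals(als, anchor)]


if __name__ == "__main__":
    target = "TTGATTACAGGGCCGTTAGAA"   # contains two target sites
    anchor = "ACCTGATTTTGTCCAT"        # two BBSs separated by a spacer
    bridges = {
        "bridge_1": ("TGTAATC", "TCAGGT"),
        "bridge_2": ("CTAACGG", "ATGGAC"),
        "off_target": ("GGGGGGG", "TCAGGT"),
    }
    print(assembly_members(target, anchor, bridges))
```

In this example two bridge probes connect the target to the same anchor probe, giving an assembly of four hybridized molecules, while the off-target probe pairs with the anchor probe but not with the target.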
[0059] In some embodiments, the present methods comprise hybridizing a first target specific region of a first bridge probe to a first target sequence of a molecule with a sequence corresponding to the genome region, wherein a first anchor-probe-landing sequence of the first bridge probe is bound to a first bridge-binding-sequence of an anchor probe; and hybridizing a second target specific region of a second bridge probe to a second target sequence of the molecule with a sequence corresponding to the genome region, wherein a second anchor-probe-landing sequence of the second bridge probe is bound to a second bridge-binding-sequence of the anchor probe. As described herein the anchor probe may comprise a binding moiety. The method generally comprises attaching adaptors to the 5’ end or the 3’ ends of nucleic acid molecules of the plurality of nucleic acid molecules, thereby generating a library of nucleic acid molecules comprising adaptors.
[0060] More than two bridge probes per input nucleic acid molecule can be used in the methods disclosed herein. For example, at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 75, 100, or more bridge probes can be used to bridge the input nucleic acid and the anchor probe. The target enrichment can further comprise hybridizing a second target specific region of a second bridge probe to a second target sequence of the input nucleic acid molecule, wherein a second anchor-probe-landing sequence of the second bridge probe can be bound to a second bridge-binding-sequence of the anchor probe. In some cases, the target enrichment can be conducted after attachment of adaptors or other first tags to the input nucleic acid molecules.
[0061] The bridge probes can further comprise linkers that connect the target specific region and the anchor-probe-landing sequence. The adaptor anchor can comprise one or more spacers in between the bridge-binding-sequences. The presence of the one or more spacers can improve the efficiency of the hybridization capture and increase the specificity of the capture.
[0062] The input nucleic acid can be captured and enriched from low-input samples that contain cell-free RNA as well as cell-free DNA (cfDNA) and circulating tumor DNA (ctDNA). The capture and enrichment can be done by the indirect association with anchor probe through hybridization with bridge probe. The bridge probe and/or anchor probe can comprise one or more binding moieties. The binding moiety can be a biotin. The binding moieties can be attached to a support. The support can be a bead. The bead can be a streptavidin bead.
[0063] The present methods of capture and enrichment can further include solid phase extraction of the input nucleic acid. The bridge probe or anchor probe can be bound to a solid support. The bridge probe or anchor probe can comprise a label. The disclosed methods can further comprise capturing the bridge probe, the anchor probe, or the hybridization complex comprising input nucleic acid molecule, bridge probe, and anchor probe by the label. The label can be biotin. The label can be a nucleic acid sequence, such as poly A or poly T, or a specific sequence. The nucleic acid sequence can be about 5 to 30 bases in length. The nucleic acid sequence can comprise DNA and/or RNA. The label can be at the 3’ end of the bridge probe or anchor probe. The label can be a peptide, or a modified nucleic acid that can be recognized by an antibody, such as 5-Bromouridine, and biotin. The label can be conjugated to the bridge probe or anchor probe by reactions such as “click” chemistry. “Click” chemistry can allow for the conjugation of a reporter molecule like a fluorescent dye to a biomolecule like DNA. Click chemistry can be a reaction between an azide and an alkyne that can yield a covalent product (e.g., 1,5-disubstituted 1,2,3-triazole). Copper can serve as a catalyst.
[0064] The label can be captured on a solid support. The solid support can be magnetic. The solid support can comprise a bead, flowcell, glass, plate, device comprising one or more microfluidic channels, or a column. The solid support can be a magnetic bead.
[0065] The solid support (e.g., bead) can comprise (e.g., by being coated with) one or more capture moieties that can bind the label. The capture moiety can be streptavidin, and the streptavidin can bind biotin. The capture moiety can be an antibody. The antibody can bind the label. The capture moiety can be a nucleic acid, e.g., a nucleic acid comprising DNA and/or RNA. The nucleic acid capture moiety can bind a sequence on, e.g., an anchor probe or bridge probe. In some cases, an anti-RNA/DNA hybrid antibody bound to a solid surface can be used as a capture moiety.
[0066] The label and the capture moiety can bind through one or more covalent or non-covalent bonds. Following capture of the bridge probe, anchor probe, or the hybridization complex on the solid support, the solid support can be washed to remove, e.g., unbound template from the sample. In some cases, no wash step is performed. The wash can be stringent or gentle. The capture probe or anchor probe that is hybridized to an input nucleic acid molecule can be eluted, e.g., by adding free biotin to the sample when the label is biotin and the capture moiety is streptavidin.
[0067] Cleanup can be performed using streptavidin beads after the input nucleic acid, bridge probe, and anchor probe hybridization, wherein the 3’ end of the anchor probe is biotinylated. The input nucleic acid complex hybridized to the bridge probes (and indirectly with the anchor probe) is bound to the bead. The input nucleic acid that has not hybridized to the bridge probe can be washed away. The 5’ end or the 3’ end of a first and/or second bridge probe can be biotinylated. In this manner, streptavidin beads can be used to remove and separate the unhybridized input nucleic acid from input nucleic acid having the target sequence.
[0068] In some embodiments, a first tag is attached to an input nucleic acid molecule without an adaptor. For instance, a nucleic acid can be directly tagged using digoxigenin 3’ end oligonucleotide labeling kits, which might improve tagging efficiency over traditional adaptor ligation. Such labeling kits are commercially available. However, such an approach potentially limits inclusion of identifiers (sample index / UMI) to support pairing of sequences obtained from original DNA strands.
[0069] In some embodiments, the dual tagged cDNA complexes comprising a second tag are immobilized on a solid support such as a plate or a bead. Immobilization of these complexes can facilitate washing to remove any undesired species (e.g., input DNA molecules or contaminants). In some embodiments, the dual tagged cDNA complex is immobilized on the surface of a flowcell or a glass slide. In some embodiments, the dual tagged cDNA complex is immobilized on a well or magnetic bead. In some embodiments, the solid support may be coated with a polymer attached to a functional group or moiety. In some embodiments, the solid support may carry functional groups such as amino, hydroxyl, or carboxyl groups, or other moieties such as avidin or streptavidin.
[0070] In some embodiments, the present methods comprise attaching adaptors to the input DNA molecules or to amplicons thereof. Generally, an adaptor is attached to at least one strand of a double-stranded DNA molecule, and usually an adaptor can be a molecule that is at least partially double-stranded. An adaptor may be 40 to 150 bases in length, e.g., 50 to 120 bases. An adaptor can be joined to a 5' end and/or a 3' end of a nucleic acid molecule. A Y-adaptor is an adaptor that contains a double-stranded region and a single-stranded region in which the opposing sequences are not complementary. The end of the double-stranded region can be joined to target molecules such as double-stranded fragments of genomic DNA, e.g., via a transposase-catalyzed reaction. Each strand of a double-stranded DNA molecule that has been joined to a Y-adaptor is asymmetrically tagged in that it has the sequence of one strand of the Y-adaptor at one end and the other strand of the Y-adaptor at the other end. Amplification of nucleic acid molecules that have been joined to Y-adaptors at both ends results in an asymmetrically tagged nucleic acid, i.e., a nucleic acid that has a 5' end containing one tag sequence and a 3' end that has another tag sequence.
[0071] The input DNA molecules can be denatured to form single-stranded molecules, which are then amplified using amplification primers to form a double-stranded product, and/or processed by other techniques.
[0072] In some embodiments, the processing comprises amplifying a nucleic acid, before and/or after it is attached to an adaptor. In some embodiments, an adaptor is located at a 5'-end of a target sequence in a cDNA or an input nucleic acid, and the adaptor provides a priming site for amplification of the target sequence. Nucleic acid can be amplified using a first amplification primer and a second amplification primer. In some embodiments, the first amplification primer has sequence specificity for a target sequence in the nucleic acid, and is capable of hybridizing to a portion of the target sequence (a nucleic acid of interest). The second amplification primer is capable of hybridizing to a priming site of the adaptor or to a target-specific priming site of the input nucleic acid. During the amplification step, the first amplification primer hybridizes to the target sequence and the second primer hybridizes to the sequence priming site on the adaptor. In some embodiments, the first amplification primer hybridizes at the 5'-end of the nucleic acid. The primers should be sufficiently large to provide adequate hybridization with the target sequence or other primer binding site.
[0073] An input DNA molecule or a cDNA molecule (or other nucleic acid) may be amplified using any suitable method. In some embodiments, the nucleic acid is amplified using polymerase chain reaction (PCR). In general, PCR comprises denaturation of polynucleotide strands (e.g., DNA melting), annealing of primers to the denatured polynucleotide strand, and extension of primers with a polymerase to synthesize the complementary polynucleotide. The process generally requires a DNA polymerase, forward and reverse primers, deoxynucleoside triphosphates, bivalent cations, and a buffer solution. In some embodiments, the nucleic acid is amplified by linear amplification. In some embodiments, the nucleic acid is amplified using Emulsion PCR, Bridge-PCR, or Rolling Circle amplification. The amplicons of the nucleic acid may be analyzed to determine the order of base pairs using a suitable sequencing method.
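By way of non-limiting illustration, the difference in yield between exponential amplification (such as PCR) and linear amplification can be sketched as follows. The function names and the per-cycle efficiency parameter are hypothetical and are introduced here only for illustration; the model assumes an idealized reaction with constant efficiency and no reagent depletion.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential amplification.

    Each cycle multiplies the copy number by (1 + efficiency), where an
    efficiency of 1.0 means every template is copied in every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def linear_copies(initial_copies: int, cycles: int) -> int:
    """Idealized linear amplification.

    Only the original templates are copied, so each cycle adds one new
    copy per original template.
    """
    return initial_copies * (1 + cycles)


if __name__ == "__main__":
    print(pcr_copies(100, 10))     # exponential growth from 100 templates
    print(linear_copies(100, 10))  # linear growth from 100 templates
```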
[0074] As another aspect, the present disclosure provides methods for differential processing of input nucleic acid molecules in a sample. For instance, a sample containing different types of nucleic acids could be processed in a way that certain nucleic acids (for example, cDNA synthesized using input RNA molecules as templates) are selectively tagged with a first tag as described herein. Following target enrichment, tagged cDNA molecules with the first tag can be isolated, given a separate index, and/or subjected to different processing steps. The separate input DNA molecules and tagged cDNA molecules can then be pooled for combined analysis (such as sequencing) or analyzed separately.
[0075] FIG. 3 illustrates how cDNA made from input RNA molecules in a mixture can be selectively tagged using the present methods. A nucleic acid sample contains a mixture of input RNA molecules 301 and input DNA molecules 302. A series of adenosine (A) bases 303 are present at the 3' end of the RNA molecules 301 but not on the DNA molecules 302. cDNA molecules 305 are synthesized using the input RNA molecules 301 as templates by extension of a tagged primer 307 that hybridizes to the A-tail. The tagged primer 307 is conjugated to a binding partner 309 of a first binding pair, so that it serves as a first tag. In some embodiments, double-stranded adaptors 325 are ligated to the DNA molecules 302. A single-stranded adaptor 313 is ligated to the cDNA 305. In FIG. 3, the adaptors 325 are shown without tags or binding partners attached thereto, but in some embodiments, the adaptors comprise one or more tags or binding partners of binding pairs that are different from the first tag or the binding partner 309 of the first binding pair.
[0076] Following an optional target enrichment, tagged cDNA molecules 305 can be tagged with a second tag 327 by attaching the second tag 327 to the first tag, more specifically to the binding partner 309 of the first binding pair, to form a dual tagged cDNA complex 330. The second tag 327 comprises a reciprocal binding partner 329 of the first binding pair. The second tag 327 also comprises a binding partner 331 of a second binding pair. In some embodiments, the binding partner 309 of the first binding pair is DIG, the reciprocal binding partner 329 of the first binding pair is an anti-DIG antibody, and the binding partner 331 of the second binding pair is a biotin moiety.
[0077] The dual tagged cDNA complex 330 can be separated from the DNA constructs 333 by binding the binding partner 331 (biotin) to its reciprocal binding partner 335 (streptavidin) which is on a solid support 337 (e.g., a magnetic bead). In some embodiments, after separation, a cDNA fraction 339 can be treated with a different procedure than the DNA constructs 333 in a DNA fraction 341. The DNA fraction 341 and the RNA fraction 339 can be pooled in specific ratios for sequencing or analysis, or analyzed separately.
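By way of non-limiting illustration, the selective tagging and separation described for FIG. 3 can be modeled in software as a partition of molecules according to whether a tagged primer could hybridize to an A-tail at the 3' end. The class name, the function names, and the minimum tail length of 10 bases are hypothetical assumptions chosen for the sketch; they are not parameters of the disclosed methods.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    name: str
    sequence: str  # written 5' to 3'


def has_a_tail(sequence: str, min_tail_len: int = 10) -> bool:
    """Return True if the 3' end carries at least min_tail_len consecutive A bases."""
    return sequence.upper().endswith("A" * min_tail_len)


def split_fractions(molecules, min_tail_len: int = 10):
    """Partition molecules into (A-tailed, not A-tailed).

    The first list models molecules that a tagged primer could prime
    (the cDNA fraction); the second models the remaining DNA fraction.
    """
    tailed, untailed = [], []
    for molecule in molecules:
        if has_a_tail(molecule.sequence, min_tail_len):
            tailed.append(molecule)
        else:
            untailed.append(molecule)
    return tailed, untailed


if __name__ == "__main__":
    sample = [
        Molecule("rna_1", "GCUAGCUUAGGC" + "A" * 12),
        Molecule("dna_1", "GCTAGCTTAGGCTTAGGA"),
    ]
    cdna_fraction, dna_fraction = split_fractions(sample)
    print([m.name for m in cdna_fraction], [m.name for m in dna_fraction])
```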
[0078] In some embodiments, the present method also comprises processing cDNA molecules by attaching one or more adaptors to a tagged cDNA molecule, to a dual tagged cDNA complex, or to an amplicon thereof. An adaptor can be attached before or after the first and/or second tag is removed from the cDNA molecule. The adaptor can be attached before or after amplification of the cDNA, and in some embodiments the adaptor is attached before amplification. The adaptor can be attached by any suitable technique, such as by ligation, use of a transposase, hybridization, and/or primer extension. In some embodiments, the cDNA molecule or amplicon thereof is ligated with an adaptor at one or both ends. In a ligation reaction, a covalent bond or linkage is formed between the termini of two or more nucleic acid molecules (such as an input and an adaptor). The nature of the bond or linkage may vary, and the ligation may be carried out enzymatically or chemically. Ligations are usually carried out enzymatically to form a phosphodiester linkage between a 5' carbon of a terminal nucleotide of one polynucleotide or oligonucleotide and a 3' carbon of another polynucleotide or oligonucleotide. In some embodiments, the adaptor is a Y-adaptor. Other examples of adaptors include linear adaptors, circular adaptors, and bubble adaptors.
Sequencing The Input Nucleic Acid Molecules
[0079] The present methods may be used as part of a high-throughput sequencing method such as a Next Generation Sequencing (NGS) method. In some embodiments, a high-throughput sequencing method comprises three steps: library preparation, immobilization, and sequencing. DNA is often subjected to fragmentation, and adaptors are attached to one or both ends of the fragments or other nucleic acids to form a sequencing library. The sequencing library molecules are immobilized on a solid support, and sequencing reactions are performed to identify the nucleic acid sequence. The high-throughput sequencing method may employ Emulsion PCR, Bridge-PCR, or Rolling Circle amplification to provide colonies or copies of the input nucleic acid molecules.
[0080] In some embodiments, the cDNA molecules synthesized from input RNA molecules are sequenced without amplification. A dual tagged cDNA complex can be attached or immobilized to a solid substrate which has a reciprocal binding partner for the second tag of the complex. For example, when the dual tagged cDNA complex comprises a biotin molecule in its second tag, it can be immobilized on a streptavidin-coated surface of a flowcell or bead. After immobilization, the cDNA can be sequenced by any suitable technique such as sequencing-by-synthesis or sequencing-by-hybridization. In some embodiments, the immobilized cDNA molecule is sequenced using a single-molecule sequencing platform, such as the methods discussed in Wöhrstein et al. US Patent 10,851,411.
[0081] In some embodiments, the present methods comprise aligning sequence reads of the input nucleic acids. The sequence reads may be processed and grouped in any suitable way. In some embodiments, the sequence reads may be initially grouped by the fragment sequence and/or the identifier(s). In some implementations, initial processing of the sequence reads may include identification of molecular barcodes (including sample identifier sequences or subsample identifier sequences), and/or trimming reads to remove low quality or adaptor sequences. In addition, quality assessment metrics can be run to ensure that the dataset is of an acceptable quality. With sequencing platforms that require clonal amplification of input nucleic acid molecules, there is a concern that a potential sequence variation is a PCR or amplification error rather than a true variation. An advantage of identifying and sequencing an input nucleic acid molecule without amplification is that it avoids such errors.
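By way of non-limiting illustration, trimming of adaptor sequence from the 3' end of a read can be sketched as follows. The function name, the example sequences, and the minimum overlap of 5 bases are hypothetical; the sketch uses exact string matching only, whereas practical trimming tools typically tolerate mismatches.

```python
def trim_adaptor(read: str, adaptor: str, min_overlap: int = 5) -> str:
    """Remove adaptor sequence from the 3' end of a read (exact matching only).

    A full-length adaptor is removed wherever it first occurs, together with
    everything after it. Otherwise, a prefix of the adaptor of at least
    min_overlap bases is removed if it sits at the very end of the read.
    """
    position = read.find(adaptor)
    if position != -1:
        return read[:position]
    longest = min(len(adaptor) - 1, len(read))
    shortest = max(min_overlap, 1)
    for k in range(longest, shortest - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[: len(read) - k]
    return read


if __name__ == "__main__":
    adaptor = "AGATCGGAAGAGC"  # example sequence, assumed for illustration
    print(trim_adaptor("ACGTACGT" + adaptor + "TTTT", adaptor))
    print(trim_adaptor("ACGTACGTAGATCGG", adaptor))
```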
[0082] The cDNA molecules synthesized from input RNA molecules (or amplicons thereof) can be further analyzed using various methods including Southern blotting, polymerase chain reaction (PCR) (e.g., real-time PCR (RT-PCR), digital PCR (dPCR), droplet digital PCR (ddPCR), quantitative PCR (Q-PCR)), nCounter analysis (Nanostring technology), gel electrophoresis, DNA microarray, mass spectrometry (e.g., tandem mass spectrometry, matrix-assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF MS)), chain termination sequencing (Sanger sequencing), or next generation sequencing. The input DNA molecules (or amplicons thereof) can also be analyzed by such methods.
[0083] The next generation sequencing can comprise 454 sequencing (ROCHE) (using pyrosequencing), sequencing using reversible terminator dyes (ILLUMINA sequencing), semiconductor sequencing (THERMOFISHER ION TORRENT), single molecule real time (SMRT) sequencing (PACIFIC BIOSCIENCES), nanopore sequencing (e.g., using technology from OXFORD NANOPORE or GENIA), microdroplet single molecule sequencing using pyrophosphorolysis (BASE4), single molecule electronic detection sequencing, e.g., measuring tunnel current through nanoelectrodes as nucleic acid (DNA/RNA) passes through nanogaps and calculating the current difference (QUANTUM SEQUENCING from QUANTUM BIOSYSTEMS), GenapSys Gene Electronic Nano-Integrated Ultra-Sensitive (GENIUS) technology (GENAPSYS), GENEREADER from QIAGEN, or sequencing using sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) identified by a specific fluorophore (SOLiD sequencing). The sequencing can be paired-end sequencing.
[0084] The performance of a panel or method for capturing targets or preparing a NGS library may be defined by a number of different metrics describing efficiency, accuracy, and precision. Such metrics can be obtained by sequencing the captured nucleic acid molecules or amplicons thereof. For example, coverage percentage region-wide (0.2X or 0.5X), coverage percentage base-wide, target coverage, depth of coverage, fold enrichment, percent mapped, percent on-target, AT or GC dropout rate, fold 80 base penalty, percent zero coverage targets, PF reads, percent selected bases, percent duplication, or other variables can be used to characterize a library.
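By way of non-limiting illustration, three of the metrics listed above can be computed from simple counts as sketched below. The formulas are commonly used definitions adopted here as assumptions; individual analysis tools may define these metrics differently, and the function and parameter names are hypothetical.

```python
def percent_on_target(on_target_bases: int, mapped_bases: int) -> float:
    """Percentage of mapped bases that fall within the target regions."""
    return 100.0 * on_target_bases / mapped_bases


def fold_enrichment(on_target_bases: int, mapped_bases: int,
                    target_size: int, genome_size: int) -> float:
    """Observed on-target fraction divided by the fraction expected by chance.

    The expected fraction is taken to be target_size / genome_size.
    """
    observed = on_target_bases / mapped_bases
    expected = target_size / genome_size
    return observed / expected


def percent_duplication(total_reads: int, unique_reads: int) -> float:
    """Percentage of reads that are duplicates of another read."""
    return 100.0 * (total_reads - unique_reads) / total_reads


if __name__ == "__main__":
    print(percent_on_target(750, 1000))
    print(fold_enrichment(750, 1000, 1_000_000, 3_000_000_000))
    print(percent_duplication(1000, 800))
```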
[0085] The number of target sequences from a sample that can be sequenced using methods described herein can be about 5, 10, 15, 25, 50, 100, 1000, 10,000, 100,000, or 1,000,000, or about 5 to about 100, about 100 to about 1000, about 1000 to about 10,000, about 10,000 to about 100,000, or about 100,000 to about 1,000,000.
[0086] Nucleic acid libraries generated using methods described herein can be generated from more than one sample. Each library can have a different index associated with the sample. For example, a capture probe or an anchor probe can comprise an index that can be used to identify nucleic acids as coming from the same sample (e.g., a first set of capture probes or anchor probes comprising the same first index can be used to generate a first library from a first sample from a first subject, and a second set of capture probes or anchor probes comprising the same second index can be used to generate a second library from a second sample from a second subject, the first and second library can be pooled, sequenced, and an index can be used to discern from which sample a sequenced nucleic acid was derived). Amplified products generated using the methods described herein can be used to generate libraries from at least 2, 5, 10, 25, 50, 100, 1000, or 10,000 samples, each library with a different index, and the libraries can be pooled and sequenced, e.g., using a next generation sequencing technology.
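By way of non-limiting illustration, assignment of pooled sequence reads to their samples of origin by index can be sketched as follows. The sketch assumes, for simplicity, that the index occupies the first bases of each read and is matched exactly; the function name, the index sequences, and the "undetermined" label are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample, index_len):
    """Group reads by sample using an index at the start of each read.

    reads: iterable of read strings (index followed by insert sequence).
    index_to_sample: dict mapping index sequence to sample name.
    Reads whose index is not recognized are collected under "undetermined".
    """
    by_sample = defaultdict(list)
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        sample = index_to_sample.get(index, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


if __name__ == "__main__":
    indices = {"ACGT": "sample_1", "TGCA": "sample_2"}
    pooled = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAATTT", "NNNNGGGCCC"]
    print(demultiplex(pooled, indices, index_len=4))
```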
[0087] The sequencing can generate at least 100, 1000, 5000, 10,000, 100,000, 1,000,000, or 10,000,000 sequence reads. The sequencing can generate between about 100 sequence reads to about 1000 sequence reads, between about 1000 sequence reads to about 10,000 sequence reads, between about 10,000 sequence reads to about 100,000 sequence reads, between about 100,000 sequence reads and about 1,000,000 sequence reads, or between about 1,000,000 sequence reads and about 10,000,000 sequence reads.
[0088] The depth of sequencing can be about 1x, 5x, 10x, 50x, 100x, 1000x, or 10,000x. The depth of sequencing can be between about 1x and about 10x, between about 10x and about 100x, between about 100x and about 1000x, or between about 1000x and about 10,000x.
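By way of non-limiting illustration, the relationship between the number of sequence reads and the mean depth of sequencing can be sketched as follows. The sketch assumes the common approximation that mean depth equals the total number of sequenced bases divided by the size of the region sequenced, with all reads of equal length and all reads on target; the function names are hypothetical.

```python
import math


def mean_depth(num_reads: int, read_length: int, region_size: int) -> float:
    """Approximate mean depth: total sequenced bases divided by region size."""
    return num_reads * read_length / region_size


def reads_needed(target_depth: float, read_length: int, region_size: int) -> int:
    """Smallest number of reads that reaches the requested mean depth."""
    return math.ceil(target_depth * region_size / read_length)


if __name__ == "__main__":
    print(mean_depth(1_000_000, 150, 1_500_000))
    print(reads_needed(100, 150, 1_500_000))
```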
[0089] The present disclosure provides methods in which separate fractions of a nucleic acid sample can be prepared for sequencing or treated with different procedures. FIG. 3 illustrates a method of obtaining a DNA fraction 341 and an RNA fraction 339 from the same sample. Because the present methods facilitate sequencing and analysis without amplification and enable the separation of different fractions, the enriched input nucleic acids may be analyzed by sequencing, or may be bisulfite treated (or enzymatically treated) prior to sequencing to assess methylation. In some cases, a first fraction may be analyzed by sequencing to assess mutations while a second fraction is bisulfite or enzymatically treated prior to sequencing to assess methylation. In some cases, a first fraction and a second fraction are both assessed by straightforward sequencing to assess genomic alteration; however, the samples may be sequenced at different depths. In some cases, an analysis of a first fraction may be performed prior to performing a second target enrichment step. The results of the analysis of the first fraction sample may be used to select a second panel for the second enrichment step.
[0090] The present methods can be compatible with clinical samples over a large range of amounts of input nucleic acid material. In some embodiments, the present methods can be used to sequence samples with input nucleic acid molecules of less than 5 ng, less than 4 ng, less than 3 ng, less than 2 ng, or less than 1 ng.
[0091] The target specific sequence or target specific region (TSR) of a capture probe or a bridge probe can be designed based on the target sequence of the input nucleic acid molecule.
Kits for Indirect Tagging of Nucleic Acids
[0092] As another aspect of the present invention, kits are provided which comprise first and second tagging reagents for making tagged cDNA molecules and dual tagged cDNA complexes as described herein. A first tagging reagent comprises a tagged primer comprising a first tag (according to any of the embodiments described herein) in a composition that comprises a solvent or other components. Likewise, a second tagging reagent comprises a second tag (according to any of the embodiments described herein) in a composition. The kits can comprise the first and second tagging reagents in one or more vessels, such as vials, tubes, etc.
[0093] In some embodiments, the present kits comprise one or more tagged primers comprising functional sequences configured to be attached to an end of the cDNA molecules synthesized from input RNA molecules, such as adaptors and/or one or more identifiers such as UMI sequences. The first tags comprise a binding partner of a first binding pair, and the present kits can also comprise one or more second tags comprising a reciprocal binding partner of the first binding pair.
[0094] In some embodiments, the present kits further comprise one or more bridge probes that comprise a target specific region which hybridizes to a target sequence of an input nucleic acid molecule; and an anchor probe that comprises a bridge-binding-sequence which hybridizes to an anchor-probe-landing sequence of the bridge probe. In some embodiments, the kit comprises two, three or more bridge probes.
[0095] In addition to above-mentioned components, the kits may further include instructions for using the components of the kit to practice the present methods, i.e., to prepare nucleic acids for sequencing. The instructions for practicing the present methods are generally recorded on a suitable recording medium. For example, the instructions may be printed on a substrate, such as paper or plastic, etc. As such, the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or subpackaging) etc. In other embodiments, the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g., CD-ROM, portable drive, or cloud-based storage, etc. In yet other embodiments, the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g., via the internet, are provided. An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.
TERMINOLOGY
[0096] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present teachings, some exemplary methods and materials are now described.
[0097] All patents and publications, including all sequences disclosed within such patents and publications, referred to herein are expressly incorporated by reference. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present claims are not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided can be different from the actual publication dates, which may need to be independently confirmed.
[0001] Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
[0002] The present technology may employ, unless otherwise indicated, techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and immunology, which are within the skill of the art. Such techniques include polymer array synthesis, hybridization, ligation, and detection of hybridization using a label.
[0003] As used herein, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. For example, the term “a primer” refers to one or more primers, i.e., a single primer and multiple primers. A “plurality” contains at least 2 members. In certain cases, a plurality may have at least 10, at least 100, at least 1000, at least 10,000, at least 100,000, at least 10^6, at least 10^7, at least 10^8 or at least 10^9 or more members.
[0004] It is further noted that the claims can be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
[0005] As used in the specification and appended claims, and in addition to their ordinary meanings, the terms "substantial" or "substantially" mean to within acceptable limits or degree to one having ordinary skill in the art. For example, "substantially inactive" means that one skilled in the art considers the level of activity to be negligible.
[0098] The term “target” as used herein refers to a nucleic acid of interest, or which is desired for sequencing and/or other analysis. One or more targets may be present within an input nucleic acid, or a construct or complex made from an input nucleic acid. A target may be single-stranded or double-stranded, and often is double-stranded DNA when attached to an adaptor to form a nucleic acid construct. Target as used herein can refer to a specific sequence or the complement thereof or to both. The term target encompasses any nucleic acid molecule of biological or synthetic origin whose sequence or other characteristic is of interest. The target sequence does not include identifiers, primer binding regions, or adaptor sequences which may be added to the input nucleic acid molecule to prepare an input nucleic acid construct for sequencing or other analysis. A target may be within a nucleic acid in vitro or in vivo within the genome of a cell, or within the cytoplasm of a cell (such as RNA), or within a biological fluid (such as blood, plasma, amniotic fluid, or other biological sample).
[0099] The term “input” refers to a nucleic acid molecule to be processed in accordance with the present methods. For example, an input nucleic acid molecule may be present in a nucleic acid sample. The input may include one or more target sequences of interest, or it may include other sequences from which a target is desired to be separated. In some embodiments, an input nucleic acid comprises one or more sequences complementary to sequences of one or more capture probes, bridge probes, or other types of probes.
[00100] The terms “amplifying” and “amplification” as used herein refer to synthesizing nucleic acid molecules that are complementary to one or both strands of an input nucleic acid. Amplifying a nucleic acid molecule may include denaturing a double-stranded input nucleic acid, annealing primers to the input nucleic acid at a temperature that is below the melting temperatures of the primers, and enzymatically elongating from the primers to generate an amplification product. The terms “amplicon” or “amplification product” refer to the nucleic acid sequences which are produced from an amplifying process, including the nucleic acid molecules synthesized by amplifying the input nucleic acid or its complementary sequence, as well as the nucleic acid molecules synthesized from other amplicons. The denaturing, annealing and elongating steps each can be performed one or more times. Amplification generally does not change the target or input nucleic acid sequence unless errors arise during the amplification.
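By way of non-limiting illustration, a check that an annealing temperature lies below the melting temperatures of the primers can be sketched as follows. The melting temperature is estimated here with the Wallace rule (2 °C per A or T plus 4 °C per G or C), which is a rough approximation for short oligonucleotides and is used as an assumption; the function names and the example primer are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees Celsius by the Wallace rule."""
    bases = primer.upper()
    at_count = bases.count("A") + bases.count("T")
    gc_count = bases.count("G") + bases.count("C")
    return 2 * at_count + 4 * gc_count


def annealing_below_tm(primers, annealing_temp: float) -> bool:
    """True if the annealing temperature is below every primer's estimated Tm."""
    return all(annealing_temp < wallace_tm(p) for p in primers)


if __name__ == "__main__":
    example_primer = "ACGTACGTACGTACGTACGT"  # 10 A/T and 10 G/C bases
    print(wallace_tm(example_primer))
    print(annealing_below_tm([example_primer], 55))
```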
[00101] Amplification typically requires the presence of deoxyribonucleoside triphosphates, a DNA polymerase enzyme and an appropriate buffer and/or co-factors for optimal activity of the polymerase enzyme. Reverse transcription is a linear amplification reaction that employs a specialized DNA polymerase (reverse transcriptase) to copy RNA into cDNA (complementary DNA) using deoxyribonucleoside triphosphates.
[00102] The term “adaptor” generally refers to a nucleic acid molecule that is attached to an input nucleic acid molecule to add a desired structure or function. The term “tag” also generally refers to a moiety that can add a desired structure or function, though it is contemplated that a tag may be a nucleic acid molecule, a molecule other than a nucleic acid, or a combination thereof. For example, a “tag” as used herein can comprise an adaptor conjugated to a non-nucleic acid binding partner such as DIG. As another example, a “tag” as used herein can comprise an antibody conjugated to a biotin moiety. As another example, an adaptor can be attached to an input fragment or an amplicon thereof to add a binding site for a NGS platform. In some embodiments, an adaptor refers to molecules that are at least partially double-stranded. An adaptor or a tag may be any desired length, including but not limited to 40 to 150 bases in length, e.g., 50 to 120 bases, although adaptors and tags outside of this range are envisioned.
[00103] The terms “identifier” and “barcode” refer to a sequence of nucleotides used to identify the origin of a sequence. Identifiers may comprise sample indices or sample barcodes, where the same sequence is shared for all nucleic acids from a particular source, organism, or sample. Sample barcodes enable the mixing of nucleic acids from different samples in one sequencing run, as the different sample barcode sequences enable the correct assignment of sequencing reads to each sample. One, two, or more sample barcodes may be used. Identifiers also comprise molecular barcodes (MBCs) or unique molecular identifier (UMI) sequences, which function to identify copies of individual input nucleic acid molecules. UMIs may comprise random nucleotides, known nucleotides, or a mixture of random and known nucleotides. UMIs enable more accurate sequencing by allowing error correction of sequences and more accurate estimation of the original number of input nucleic acids. In some embodiments, a large number of UMIs is used (e.g., 100,000, 1 million, 1 billion, or more possible sequences) such that each input nucleic acid has a unique molecular barcode. Molecular barcodes called degenerate base regions (DBR) are disclosed in US Patent 8,481,292 (Population Genetics Technologies Ltd.). The DBRs are random sequence tags that are attached to molecules that are present in the sample. DBRs and other molecular barcodes allow one to distinguish PCR errors during sample preparation from mutations and other variants that were present in the original input nucleic acid.
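By way of non-limiting illustration, the use of molecular barcodes to distinguish amplification errors from variants present in the original molecule can be sketched as a majority vote among reads that share a UMI. The sketch assumes reads within a family are of equal length and already aligned to one another; the function names and example sequences are hypothetical, and ties are resolved in favor of the base seen first.

```python
from collections import Counter, defaultdict


def consensus(sequences):
    """Majority base at each position across equal-length sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def collapse_by_umi(reads):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI."""
    families = defaultdict(list)
    for umi, sequence in reads:
        families[umi].append(sequence)
    return {umi: consensus(seqs) for umi, seqs in families.items()}


if __name__ == "__main__":
    reads = [
        ("AAC", "ACGT"),
        ("AAC", "ACGT"),
        ("AAC", "ACTT"),  # minority base at the third position
        ("GGT", "ACTT"),
    ]
    print(collapse_by_umi(reads))
```

In this sketch, a base seen in only a minority of reads within one UMI family is treated as a likely amplification or sequencing error, whereas a base supported by every read of a family is retained.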
[00104] In other embodiments, a smaller number of molecular barcodes is used, and the beginning or ending positions (or both) of the sequence read are used together with the molecular barcode to identify copies arising from a unique input nucleic acid. Molecular barcodes may be combined with sample barcodes, on the same or different portions of the target nucleic acid. Molecular barcodes may be added to one end of a nucleic acid template (e.g., the 5’ end of the + strand, and the 3’ end of the - strand in a duplex), or to both ends of an input nucleic acid (e.g., to both the 5’ and the 3’ ends of both the + and the - strands of the duplex).
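By way of non-limiting illustration, combining a molecular barcode with the beginning and ending positions of a sequence read to count unique input molecules can be sketched as follows. The tuple layout and function name are hypothetical, and exact matching of barcode and positions is assumed.

```python
def count_unique_molecules(alignments) -> int:
    """Count distinct (umi, start, end) combinations.

    alignments: iterable of (umi, start, end) tuples, where start and end
    are the mapped beginning and ending positions of a sequence read.
    Reads sharing all three values are treated as copies of one molecule.
    """
    return len({(umi, start, end) for umi, start, end in alignments})


if __name__ == "__main__":
    alignments = [
        ("AC", 100, 250),
        ("AC", 100, 250),  # copy of the first molecule
        ("AC", 300, 450),  # same barcode, different positions
        ("GT", 100, 250),  # same positions, different barcode
    ]
    print(count_unique_molecules(alignments))
```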
[0006] The term “sample” as used herein relates to a material or mixture of materials containing one or more nucleic acids of interest. In some embodiments, the term refers to any plant, animal or viral material containing DNA, RNA, or other nucleic acid, such as, for example, tissue or fluid isolated from a patient (including without limitation plasma, serum, amniotic fluid, cerebrospinal fluid, lymph, tears, saliva and tissue sections), from preserved tissue (such as FFPE sections) or from in vitro cell culture constituents, as well as samples from the environment. Any sample containing nucleic acid, e.g., genomic DNA from tissue culture cells or from a sample of tissue, may be employed in the present technology.
[0007] The term “nucleic acid sample” as used herein denotes a sample containing nucleic acids. The nucleic acid samples may be complex in that they contain multiple different molecules that contain sequences. Nucleic acid samples from a mammal (e.g., mouse or human) are types of complex samples. Complex samples may have more than 10^4, 10^5, 10^6 or 10^7 different nucleic acid molecules. Also, a complex sample may comprise only a few molecules, where the molecules collectively have more than 10^4, 10^5, 10^6 or 10^7 or more nucleotides. The term “complexity” generally refers to the total number of different sequences in a population, such as in a population of fragments, adaptors, or adaptor-ligated fragments. For example, if a population has 4 different sequences, then that population has a complexity of 4. A population may have a complexity of at least 4, at least 8, at least 16, at least 100, at least 1,000, at least 10,000 or at least 100,000 or more, depending on the desired result.
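By way of non-limiting illustration, the complexity of a population as defined above can be computed as the number of distinct sequences. The function name is hypothetical, and sequences are compared as exact, case-insensitive strings.

```python
def complexity(population) -> int:
    """Number of different sequences in a population of sequence strings."""
    return len({sequence.upper() for sequence in population})


if __name__ == "__main__":
    population = ["ACGT", "ACGT", "TTGA", "GGCC", "CATG", "catg"]
    print(complexity(population))
```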
[00105] The term “nucleotide” as used herein refers to a phosphate ester of a nucleoside, wherein the esterification site typically corresponds to the hydroxyl group attached to the C-5 position of the pentose sugar. In some cases nucleotides comprise nucleoside polyphosphates. However, the terms “added nucleotide,” “incorporated nucleotide,” “nucleotide added” and “nucleotide after incorporation” all refer to a nucleotide residue that is part of an oligonucleotide or polynucleotide chain.
[0008] The term “nucleotide” refers to naturally-occurring nucleotides including guanine, cytosine, adenine, thymine, uracil (G, C, A, T and U respectively), as well as modified pyrimidine and purine derivatives and other non-naturally occurring moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the term “nucleotide” includes those moieties that contain hapten or fluorescent labels and may contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.
[0009] The terms “nucleic acid” and “polynucleotide” are used interchangeably herein to describe a nucleotide-containing polymer of any length, e.g., greater than about 2 bases, greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, and may be produced naturally, chemically, enzymatically or synthetically. The term includes polymers having PNA, LNA or UNA. DNA and RNA have a deoxyribose and ribose sugar backbone, respectively, whereas PNA's backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. In PNA, various purine and pyrimidine bases are linked to the backbone by methylene carbonyl bonds. A locked nucleic acid (LNA), often referred to as inaccessible RNA, is a modified RNA nucleotide. The ribose moiety of an LNA nucleotide is modified with an extra bridge connecting the 2' oxygen and 4' carbon. The bridge "locks" the ribose in the 3'-endo (North) conformation, which is often found in the A-form duplexes. LNA nucleotides can be mixed with DNA or RNA residues in the oligonucleotide whenever desired. The term “unstructured nucleic acid”, or “UNA”, is a nucleic acid containing non-natural nucleotides that bind to each other with reduced stability. For example, an unstructured nucleic acid may contain a G’ residue and a C’ residue, where these residues correspond to non-naturally occurring forms, i.e., analogs, of G and C that base pair with each other with reduced stability, but retain an ability to base pair with naturally occurring C and G residues, respectively.
[00106] The terms “nucleoside”, “nucleotide”, “deoxynucleoside”, and “deoxynucleotide” are intended to include those moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the terms “nucleoside”, “nucleotide”, “deoxynucleoside”, and “deoxynucleotide” include those moieties that contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides, nucleotides, deoxynucleosides or deoxynucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.
[00107] Natural nucleotides or nucleosides are defined herein as adenine (A), thymine (T), guanine (G), and cytosine (C). It is recognized that certain modifications of these nucleotides or nucleosides occur in nature. However, modifications of A, T, G, and C that occur in nature that affect hydrogen bonded base pairing are considered to be non-naturally occurring. For example, 2-aminoadenosine is found in nature, but is not a “naturally occurring” nucleotide or nucleoside as that term is used herein. Other non-limiting examples of modified nucleotides or nucleosides that occur in nature that do not affect base pairing and are considered to be naturally occurring are 5-methylcytosine, 3-methyladenine, O(6)-methylguanine, and 8-oxoguanine, etc. Nucleotides include any nucleotide or nucleotide analog, whether naturally-occurring or synthetic. Exemplary nucleotides include phosphate esters of deoxyadenosine, deoxycytidine, deoxyguanosine, deoxythymidine, adenosine, cytidine, guanosine, and uridine. Other nucleotides include an adenine, cytosine, guanine, thymine base, a xanthine or hypoxanthine, 5-bromouracil, 2-aminopurine, deoxyinosine, or methylated cytosine, such as 5-methylcytosine, and N4-methoxydeoxycytosine. Also included are bases of polynucleotide mimetics, such as methylated nucleic acids, e.g., 2'-O-methRNA, peptide nucleic acids, modified peptide nucleic acids, locked nucleic acids and any other structural moiety that can act substantially like a nucleotide or base, for example, by exhibiting base-complementarity with one or more bases that occur in DNA or RNA and/or by being capable of base-complementary incorporation, and includes chain-terminating analogs. A nucleotide corresponds to a specific nucleotide species if they share base-complementarity with respect to at least one base.
[00108] In addition to purines and pyrimidines, modified nucleotides or analogs, as those terms are used herein, include any compound that can form a hydrogen bond with one or more naturally occurring nucleotides or with another nucleotide analog. Any compound that forms at least two hydrogen bonds with T or with a derivative of T is considered to be an analog of A or a modified A. Similarly, any compound that forms at least two hydrogen bonds with A or with a derivative of A is considered to be an analog of T or a modified T. Similarly, any compound that forms at least two hydrogen bonds with G or with a derivative of G is considered to be an analog of C or a modified C. Similarly, any compound that forms at least two hydrogen bonds with C or with a derivative of C is considered to be an analog of G or a modified G. It is recognized that under this scheme, some compounds will be considered for example to be both A analogs and G analogs (purine analogs) or both T analogs and C analogs (pyrimidine analogs).
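The classification scheme of the preceding paragraph can be illustrated with a minimal sketch. The input format (a mapping from a natural base to a hydrogen-bond count) and the names `ANALOG_OF` and `analog_classes` are assumptions introduced here for illustration only and do not appear in the disclosure.

```python
# Illustrative sketch only. A compound that forms at least two hydrogen bonds
# with a base (or a derivative of it) is treated as an analog of that base's
# complement, following the scheme described in the paragraph above.
ANALOG_OF = {"T": "A", "A": "T", "G": "C", "C": "G"}


def analog_classes(hbonds_with):
    """Return the set of bases the compound counts as an analog of.

    hbonds_with: dict mapping a base ("A", "T", "G", "C") to the number of
    hydrogen bonds the compound forms with that base or a derivative of it
    (hypothetical input format).
    """
    return {
        ANALOG_OF[base]
        for base, n_bonds in hbonds_with.items()
        if base in ANALOG_OF and n_bonds >= 2
    }


# A hypothetical compound pairing with both T and C is an A analog and a
# G analog at the same time, i.e., a purine analog.
print(analog_classes({"T": 2, "C": 3}))
```

A compound that forms fewer than two hydrogen bonds with every listed base falls into no analog class under this sketch.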
[00109] As used herein, the term “nucleic acid construct” refers to a nucleic acid that is ligated or otherwise attached to another nucleic acid, such as an adaptor. For example, a nucleic acid construct may contain a nucleic acid molecule to be sequenced, a capture site for flowcell attachment, one or more identifier sequences such as SBC and UMI, and primer binding sites for a first and second primer.
[00110] As used herein, the term “capture site” refers to a nucleic acid sequence configured for attachment of a nucleic acid construct to a flowcell or other surface, for NGS sequencing or other analysis processing.
[00111] As used herein, the term “identifier” refers to a nucleic acid sequence that can be used to identify a particular nucleic acid construct. An “identifier” may be a “sample barcode” or “SBC” sequence for identifying a particular biological sample. An “identifier” may also refer to a “molecular barcode” for identification of unique molecules present in the sample. Also, an “identifier” may contain both an SBC and a UMI.
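As a hedged illustration of how an identifier containing both an SBC and a UMI might be read back from sequencing data, the sketch below splits a read into its parts. The layout (SBC first, then UMI, then insert), the lengths `SBC_LEN` and `UMI_LEN`, and the function name `split_identifier` are all assumptions made for this example; the disclosure does not specify them.

```python
from typing import NamedTuple

SBC_LEN = 8   # assumed sample-barcode length (not from the disclosure)
UMI_LEN = 10  # assumed UMI length (not from the disclosure)


class Identifier(NamedTuple):
    sbc: str  # sample barcode: identifies the biological sample
    umi: str  # molecular barcode: identifies the original molecule


def split_identifier(read: str) -> tuple[Identifier, str]:
    """Split a read into (identifier, insert) under the assumed layout."""
    id_len = SBC_LEN + UMI_LEN
    if len(read) < id_len:
        raise ValueError("read is shorter than the assumed identifier")
    sbc = read[:SBC_LEN]
    umi = read[SBC_LEN:id_len]
    return Identifier(sbc, umi), read[id_len:]


identifier, insert = split_identifier("ACGTACGT" + "TTTTGGGGCC" + "AAAC")
print(identifier.sbc, identifier.umi, insert)
```

Reads sharing an SBC would be assigned to the same sample, and reads sharing both SBC and UMI would be treated as copies of one original molecule.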
[00112] The term “antibody” is well understood by those in the field and is used interchangeably herein with “immunoglobulin”. Those terms refer to a protein consisting of one or more polypeptides that specifically binds an antigen. One example of an antibody is the naturally occurring structural unit found in humans and other mammals which comprises a tetramer of two identical pairs of antibody chains, each pair having one light and one heavy chain. In each pair, the light and heavy chain variable regions are together responsible for binding to an antigen, and the constant regions are responsible for the antibody effector functions. The term antibody encompasses monoclonal antibodies, polyclonal antibodies, chimeric antibodies, humanized antibodies, human antibodies, murine antibodies, rabbit antibodies, camelid antibodies, and antibodies from other mammalian and non-mammalian species. The term antibody also encompasses single-chain antibodies, bi-specific hybrid antibodies, and fusion proteins comprising an antigen-binding portion of an antibody and a non-antibody protein. The term antibody also encompasses antigen-binding fragments of antibodies which retain specific binding to antigen, including, but not limited to, Fab, Fv, scFv, and Fd fragments.
[00113] The term “binding pair” as used herein refers to a pair of binding partners that exhibit specific binding between them. In some embodiments, a binding pair can selectively interact through covalent or non-covalent binding. In some embodiments, a binding pair can selectively interact by hybridization, ionic bonding, hydrogen bonding, van der Waals interactions, or any combination of these forces. In some embodiments, a binding partner can comprise, for example, biotin, avidin, streptavidin, digoxigenin, inosine, avidin, GST sequences, modified GST sequences, biotin ligase recognition (BiTag) sequences, S tags, SNAP-tags, enterokinase sites, thrombin sites, antibodies or antibody domains, antibody fragments, antigens, receptors, receptor domains, receptor fragments, or combinations thereof. Examples of binding pairs include biotin:avidin, biotin:streptavidin, antibody:antigen, complementary nucleic acids, hapten/antibody, lectin/carbohydrate, apoprotein/cofactor and biotin/streptavidin, as well as others set forth above.
[00114] The term “specific binding” refers to the ability of a binding partner to preferentially bind to its reciprocal binding partner that is present in a homogeneous mixture of different molecules. In some embodiments, specific binding discriminates between a reciprocal binding partner and other molecules by at least 100-fold, 1000-fold, 10,000-fold, 100,000-fold, or more. In some embodiments, the affinity between binding partners of a binding pair when they are specifically bound in a complex is characterized by a KD (dissociation constant) of less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, less than 10⁻⁹ M, less than 10⁻¹⁰ M, less than 10⁻¹¹ M, or less than about 10⁻¹² M, or less.
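To give a sense of scale for the KD values above, the following sketch uses the standard 1:1 equilibrium binding relation, fraction bound = [P] / ([P] + KD), where [P] is the free concentration of the binding partner. Expressing fold discrimination as a ratio of dissociation constants is an assumption made here for illustration; the disclosure does not define how discrimination is measured, and the function names are hypothetical.

```python
def fraction_bound(free_partner_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of a molecule bound by its partner (1:1 binding).

    Assumes simple reversible binding and that free_partner_molar is the
    free (unbound) partner concentration at equilibrium.
    """
    return free_partner_molar / (free_partner_molar + kd_molar)


def fold_discrimination(kd_other_molar: float, kd_specific_molar: float) -> float:
    """Ratio of KD for another molecule to KD for the reciprocal partner.

    This ratio is only one possible way to express discrimination and is an
    assumption of this sketch.
    """
    return kd_other_molar / kd_specific_molar


# With 1 nM free partner, a pair with KD = 1 nM is half bound, while an
# interaction with KD = 1 uM is bound only about 0.1% of the time.
print(fraction_bound(1e-9, 1e-9))
print(fraction_bound(1e-9, 1e-6))
print(fold_discrimination(1e-6, 1e-9))
```

Under these assumptions, a pair with a KD of 10⁻⁹ M compared against an off-target interaction with a KD of 10⁻⁶ M corresponds to roughly 1000-fold discrimination.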
[00115] As used herein, a “capture binding partner” refers to a binding partner that is configured to capture (e.g., isolate, purify, immobilize, extract) a nucleic acid tagged with its reciprocal binding partner. For example, streptavidin coated on a bead would be a capture binding partner for an input nucleic acid complex having a tag comprising a biotin moiety. A capture binding partner and its reciprocal binding partner may comprise any suitable binding pair.
Exemplary Embodiments
[00116] Embodiment 1. A method of preparing RNA molecules for analysis comprising providing a sample comprising input DNA molecules and input RNA molecules; selectively hybridizing a tagged primer to the input RNA molecules, wherein the tagged primer comprises a first tag comprising a binding partner of a first binding pair; and extending the primer to produce tagged cDNA molecules.
[00117] Embodiment 2. The method of embodiment 1, wherein the input RNA molecules comprise a primer binding site that hybridizes with the tagged primer, and the input DNA molecules do not comprise the same primer binding site.
[00118] Embodiment 3. The method of embodiment 2, wherein the primer binding site is poly(A).
[00119] Embodiment 4. The method of embodiment 2, wherein the primer binding site is ligated to the input RNA molecules before the hybridizing of the tagged primer.
[00120] Embodiment 5. The method of any of embodiments 1 to 4, wherein the first tag is selected from digoxigenin, 5-bromo-2’-deoxyuridine (BrdU), 2,4-dinitrophenyl (DNP), nitrilotriacetic acid or a nitrilotriacetate (NTA) such as nickel nitrilotriacetate (Ni-NTA), tris-Nitrilotriacetate (tris-NTA), a tyramine, a thiol, an amine, an aldehyde, an alkyne, and an azide.
[00121] Embodiment 6. The method of embodiment 5, wherein the first tag is digoxigenin (DIG).
[00122] Embodiment 7. The method of any of embodiments 1-6, further comprising attaching a second tag to the first tag to produce a dual tagged cDNA complex, wherein the second tag comprises a reciprocal binding partner of the first binding pair.
[00123] Embodiment 8. The method of embodiment 7, wherein the reciprocal binding partner in the second tag is selected from an anti-DIG antibody, an anti-BrdU antibody, an anti-DNP antibody, poly-Histidine, a tyrosine, a thiol, an amine, an aldehyde, an alkyne, and an azide.
[00124] Embodiment 9. The method of embodiment 8, wherein the second tag comprises anti-DIG antibody as the reciprocal binding partner of the first binding pair.
[00125] Embodiment 10. The method of any of embodiments 7 to 9, wherein the second tag further comprises a binding partner of a second binding pair.
[00126] Embodiment 11. The method of embodiment 10, wherein the binding partner of the second binding pair is selected from biotin, 5-bromo-2’-deoxyuridine (BrdU), 2,4-dinitrophenyl (DNP), nitrilotriacetic acid or a nitrilotriacetate (NTA) such as nickel nitrilotriacetate (Ni-NTA), tris-Nitrilotriacetate (tris-NTA), a tyramine, a thiol, an amine, an aldehyde, an alkyne, and an azide, with the proviso that the binding partner of the second binding pair is not the binding partner or the reciprocal binding partner of the first binding pair.
[00127] Embodiment 12. The method of embodiment 11, wherein a reciprocal binding partner of the second binding pair is selected from an avidin moiety, an anti-DIG antibody, an anti-BrdU antibody, an anti-DNP antibody, poly-Histidine, a tyrosine residue, a thiol, an amine, an aldehyde, an alkyne, and an azide.
[00128] Embodiment 13. The method of embodiment 10, wherein the binding partner of the second binding pair is biotin, and a reciprocal binding partner of the second binding pair is an avidin moiety.
[00129] Embodiment 14. The method of embodiment 13, wherein the first tag is digoxigenin (DIG), and the second tag comprises anti-DIG antibody as the reciprocal binding partner of a first binding pair comprising the first tag and the second tag.
[00130] Embodiment 15. The method of any of embodiments 1-14, wherein the first tag is covalently bound to the tagged cDNA molecule.
[00131] Embodiment 16. The method of any of embodiments 1-15, wherein the second tag is non-covalently bound to the first tag.
[00132] Embodiment 17. The method of any of embodiments 10-16, further comprising separating the dual tagged cDNA complex from the input DNA molecules by binding the partner of the second binding pair to its reciprocal binding partner.
[00133] Embodiment 18. The method of any of embodiments 12 to 17, wherein the reciprocal binding pair of the second binding pair is attached to a solid support.
[00134] Embodiment 19. The method of embodiment 18, wherein the second tag comprises an antibody conjugated with biotin, and the reciprocal binding partner of the second binding pair is an avidin moiety coated on a solid support.
[00135] Embodiment 20. The method of embodiment 1, further comprising hybridizing the tagged cDNA molecules to a probe.
[00136] Embodiment 21. The method of embodiment 7, further comprising hybridizing the dual tagged cDNA complex to a probe.
[00137] Embodiment 22. The method of embodiment 20 or 21, wherein the probe is a bridge probe.
[00138] Embodiment 23. The method of embodiment 22, further comprising hybridizing the bridge probe with an anchor probe.
[00139] Embodiment 24. The method of embodiment 23, wherein the anchor probe is attached to a biotin moiety.
[00140] Embodiment 25. The method of any of embodiments 1 to 24, further comprising attaching adaptors to the input DNA molecules, wherein the adaptors do not comprise the first tag.
[00141] Embodiment 26. The method of embodiment 25, further comprising processing the input DNA molecule to produce processed DNA molecules.
[00142] Embodiment 27. The method of embodiment 26, wherein the processing is amplification, and the processed DNA molecules are amplicons of the input DNA molecules.
[00143] Embodiment 28. The method of embodiment 27, further comprising processing the tagged cDNA molecules to produce processed cDNA molecules.
[00144] Embodiment 29. The method of embodiment 28, further comprising combining the processed DNA molecules and the processed cDNA molecules for sequencing.
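The flow of Embodiments 1 to 3, 7 to 13 and 17 can be pictured with the toy model below. It is a conceptual sketch under stated assumptions, not a description of any laboratory protocol: hybridization is reduced to a check for a 3' poly(A) run, `POLY_A_MIN` is an arbitrary illustrative threshold, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

POLY_A_MIN = 10  # assumed minimum 3' poly(A) length for priming (illustrative)


@dataclass
class Molecule:
    kind: str      # "RNA", "DNA", or "cDNA"
    sequence: str
    tags: set = field(default_factory=set)


def has_primer_site(m: Molecule) -> bool:
    """Toy stand-in for selective hybridization: RNA ending in poly(A) only."""
    return m.kind == "RNA" and m.sequence.endswith("A" * POLY_A_MIN)


def reverse_transcribe(sample, first_tag="DIG"):
    """Embodiments 1-3: extending the tagged primer yields tagged cDNA."""
    complement = str.maketrans("ACGU", "TGCA")
    return [
        Molecule("cDNA", m.sequence.translate(complement)[::-1], {first_tag})
        for m in sample
        if has_primer_site(m)
    ]


def attach_second_tag(cdnas, first_tag="DIG", second_tag="biotin"):
    """Embodiments 7-13: e.g., a biotin-conjugated anti-DIG antibody."""
    for c in cdnas:
        if first_tag in c.tags:
            c.tags.add(second_tag)
    return cdnas


def capture(molecules, capture_tag="biotin"):
    """Embodiment 17: split into captured molecules and flow-through."""
    bound = [m for m in molecules if capture_tag in m.tags]
    unbound = [m for m in molecules if capture_tag not in m.tags]
    return bound, unbound


sample = [
    Molecule("RNA", "GCU" + "A" * 10),   # polyadenylated RNA: primed
    Molecule("DNA", "GCT" + "A" * 10),   # input DNA: not primed in this model
    Molecule("RNA", "GCUGCU"),           # RNA without poly(A): not primed
]
cdnas = attach_second_tag(reverse_transcribe(sample))
input_dna = [m for m in sample if m.kind == "DNA"]
bound, unbound = capture(cdnas + input_dna)
print([m.sequence for m in bound], [m.kind for m in unbound])
```

In this model only the dual tagged cDNA carries the capture tag, so the untagged input DNA remains in the flow-through and could be processed separately, as in Embodiments 25 to 29.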
[00145] In view of this disclosure it is noted that the methods and kits can be implemented in keeping with the present teachings. Further, the various components, materials, structures and parameters are included by way of illustration and example only and not in any limiting sense.
In view of this disclosure, the present teachings can be implemented in other applications and components, materials, structures and equipment to implement these applications can be determined, while remaining within the scope of the appended claims.
Claims
1. A method of preparing RNA molecules for analysis comprising: providing a sample comprising input DNA molecules and input RNA molecules; selectively hybridizing a tagged primer to the input RNA molecules, wherein the tagged primer comprises a first tag comprising a binding partner of a first binding pair; and extending the primer to produce tagged cDNA molecules.
2. The method of claim 1, wherein the input RNA molecules comprise a primer binding site that hybridizes with the tagged primer, and the input DNA molecules do not comprise the same primer binding site.
3. The method of claim 2, wherein the primer binding site is poly(A).
4. The method of claim 2, wherein the primer binding site is ligated to the input RNA molecules before the hybridizing of the tagged primer.
5. The method of claim 1, wherein the first tag is selected from digoxigenin, 5-bromo-2’-deoxyuridine (BrdU), 2,4-dinitrophenyl (DNP), nitrilotriacetic acid or a nitrilotriacetate (NTA) such as nickel nitrilotriacetate (Ni-NTA), tris-Nitrilotriacetate (tris-NTA), a tyramine, a thiol, an amine, an aldehyde, an alkyne, and an azide.
6. The method of claim 5, wherein the first tag is digoxigenin (DIG).
7. The method of claim 1, further comprising attaching a second tag to the first tag to produce a dual tagged cDNA complex, wherein the second tag comprises a reciprocal binding partner of the first binding pair.
8. The method of claim 7, wherein the reciprocal binding partner in the second tag is selected from an anti-DIG antibody, an anti-BrdU antibody, an anti-DNP antibody, poly-Histidine, a tyrosine, a thiol, an amine, an aldehyde, an alkyne, and an azide.
9. The method of claim 8, wherein the second tag comprises anti-DIG antibody as the reciprocal binding partner of the first binding pair.
10. The method of claim 9, wherein the second tag further comprises a binding partner of a second binding pair.
11. The method of claim 10, wherein the binding partner of the second binding pair is selected from biotin, 5-bromo-2’-deoxyuridine (BrdU), 2,4-dinitrophenyl (DNP), nitrilotriacetic acid or a nitrilotriacetate (NTA) such as nickel nitrilotriacetate (Ni-NTA), tris-Nitrilotriacetate (tris-NTA), a tyramine, a thiol, an amine, an aldehyde, an alkyne, and an azide, with the proviso that the binding partner of the second binding pair is not the binding partner or the reciprocal binding partner of the first binding pair.
12. The method of claim 11, wherein a reciprocal binding partner of the second binding pair is selected from an avidin moiety, an anti-DIG antibody, an anti-BrdU antibody, an anti-DNP antibody, poly-Histidine, a tyrosine residue, a thiol, an amine, an aldehyde, an alkyne, and an azide.
13. The method of claim 10, wherein the binding partner of the second binding pair is biotin, and a reciprocal binding partner of the second binding pair is an avidin moiety.
14. The method of claim 13, wherein the first tag is digoxigenin (DIG), and the second tag comprises anti-DIG antibody as the reciprocal binding partner of a first binding pair comprising the first tag and the second tag.
15. The method of claim 1, wherein the first tag is covalently bound to the tagged cDNA molecule.
16. The method of claim 15, wherein the second tag is non-covalently bound to the first tag.
17. The method of claim 16, further comprising separating the dual tagged cDNA complex from the input DNA molecules by binding the partner of the second binding pair to its reciprocal binding partner.
18. The method of claim 17, wherein the reciprocal binding pair of the second binding pair is attached to a solid support.
19. The method of claim 18, wherein the second tag comprises an antibody conjugated with biotin, and the reciprocal binding partner of the second binding pair is an avidin moiety coated on a solid support.
20. The method of claim 1, further comprising hybridizing the tagged cDNA molecules to a probe.
21. The method of claim 7, further comprising hybridizing the dual tagged cDNA complex to a probe.
22. The method of claim 20 or 21, wherein the probe is a bridge probe.
23. The method of claim 22, further comprising hybridizing the bridge probe with an anchor probe.
24. The method of claim 23, wherein the anchor probe is attached to a biotin moiety.
25. The method of claim 1, further comprising attaching adaptors to the input DNA molecules, wherein the adaptors do not comprise the first tag.
26. The method of claim 25, further comprising processing the input DNA molecule to produce processed DNA molecules.
27. The method of claim 26, wherein the processing is amplification, and the processed DNA molecules are amplicons of the input DNA molecules.
28. The method of claim 27, further comprising processing the tagged cDNA molecules to produce processed cDNA molecules.
29. The method of claim 28, further comprising combining the processed DNA molecules and the processed cDNA molecules for sequencing.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463636497P | 2024-04-19 | 2024-04-19 | |
| US63/636,497 | 2024-04-19 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025221722A1 (en) | 2025-10-23 |
Family
ID=97404222
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2025/024672 (Pending), WO2025221722A1 (en) | 2024-04-19 | 2025-04-15 | Methods of selectively tagging nucleic acids for analysis of rna sequences |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025221722A1 (en) |
-
2025
- 2025-04-15 WO PCT/US2025/024672 patent/WO2025221722A1/en active Pending
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 25790822; Country of ref document: EP; Kind code of ref document: A1 |