WO2024215891A1 - Methods for nucleic acid labeling - Google Patents
Methods for nucleic acid labeling
- Publication number
- WO2024215891A1 (PCT/US2024/024078)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- dna
- sequence
- aco
- click
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K19/00—Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
- C12N9/1241—Nucleotidyltransferases (2.7.7)
- C12N9/1252—DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P19/00—Preparation of compounds containing saccharide radicals
- C12P19/26—Preparation of nitrogen-containing carbohydrates
- C12P19/28—N-glycosides
- C12P19/30—Nucleotides
- C12P19/34—Polynucleotides, e.g. nucleic acids, oligoribonucleotides
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
Definitions
- Nucleic acid sequencing technologies have advanced various fields including genetics, functional genomics, genome editing, transcriptomics, and a range of other basic and medical science disciplines. Despite a dramatic reduction in the per-base cost of sequencing, some individual processes related to the use of these technologies remain inefficient and expensive. For instance, to prepare libraries of nucleic acid samples for sequencing, many approaches require the selective enrichment of target regions of interest via PCR-based amplification (to label the molecules of interest to interact with the sequencing platform and reduce background sequencing; Fig. 1A) [1-4], or, without enrichment, they rely on ultra-deep whole-genome or whole-transcriptome sequencing of the entire sample to achieve sufficient coverage of the region of interest (Fig. 1B).
- the amplification-free CAGE and CAPTURE methods facilitate a variety of applications including more accurate understanding of DNA and RNA editing outcomes, and more scalable and unbiased interrogation of genomic, epigenomic, transcriptomic, and epitranscriptomic biology.
- the present methods use a nuclease with a DNA or RNA template (e.g., either click editor or prime editor) and a DNA- or RNA-dependent polymerase to install a DNA or RNA 3’ overhang onto a target DNA or RNA substrate.
- a DNA or RNA template e.g., either click editor or prime editor
- a DNA- or RNA-dependent polymerase e.g., either click editor or prime editor
- a DNA binding domain nuclease DBDn
- Cas9 clkDNA tethering domain
- gRNAs guide RNAs
- ssDNA, hybrid ssDNA/ssRNA, DNA, hybrid DNA/RNA, or modified DNA or RNA comprising a localization moiety that binds to the clkDNA tethering domain, a sequence complementary to an adaptor-complementary overhang (ACO) sequence (cACO), and a flap binding region (FBR)
- a DNA-dependent DNA polymerase optionally wherein the polymerase is linked or
- a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an adaptor-complementary overhang (ACO) sequence.
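The relationship between an ACO and its complement (cACO) can be illustrated with a minimal Python sketch. It assumes plain A/C/G/T sequences written 5'->3' and uses the 1-100 nt overhang range quoted above as a validation bound; the function names and the example sequence are hypothetical and are not taken from the patent.

```python
# Illustrative sketch only: function names and the example ACO are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_caco(aco: str, min_len: int = 1, max_len: int = 100) -> str:
    """Validate an ACO and return its complement (cACO), written 5'->3'.

    The default length bounds mirror the 1-100 nt overhang range given as an
    example in the text; they are not a hard requirement of the method.
    """
    aco = aco.upper()
    if not min_len <= len(aco) <= max_len:
        raise ValueError(f"ACO length {len(aco)} outside {min_len}-{max_len} nt")
    if set(aco) - set("ACGT"):
        raise ValueError("ACO must contain only A, C, G, T")
    return reverse_complement(aco)


aco = "GATTACAGGCTA"          # hypothetical 12-nt ACO
caco = design_caco(aco)
print(caco)                   # TAGCCTGTAATC
```

Because the cACO is simply the reverse complement, applying `reverse_complement` to it returns the original ACO, which is a convenient sanity check when designing template and adaptor oligonucleotides together.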
- the methods comprise: (A) providing a reaction mixture prepared by a method described herein, comprising: (i) a DNA binding domain nuclease (DBDn), optionally Cas9, optionally linked or fused to a clkDNA tethering domain; (ii) one, two, or more guide RNAs (gRNAs) that bind to the DBDn and comprise a spacer comprising a sequence complementary to a selected target sequence on a substrate nucleic acid; and (iii) one, two, or more clkDNA oligonucleotide templates (ssDNA, hybrid ssDNA/ssRNA, DNA, hybrid DNA/RNA, modified DNA, or modified DNA/RNA bases) comprising a localization moiety that binds to the cl
- the methods for generating a product nucleic acid of DNA having ends comprised of defined adaptor or overhang sequences comprise: providing a sample comprising a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an ACO sequence produced by a method described herein; optionally treating the sample to inactivate any active enzymes, e.g., by heating the sample to above 60°C, e.g., to about 72°C; contacting the modified nucleic acid of DNA with a partially double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo) and a ligase, under conditions sufficient for the ligase to ligate the dsDNA oligo to the modified nucleic acid of DNA (e.g., 20-27°C or room temperature), thereby producing a product nucleic
- the methods for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences comprise: providing a modified nucleic acid of DNA comprising 3’ overhangs comprising an ACO sequence produced by a method described herein; contacting the modified nucleic acid of DNA with a ligase and a single stranded oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo, hybrid ssDNA/ssRNA, DNA, hybrid DNA/RNA, modified DNA, or modified DNA/RNA bases) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid of DNA, thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties with dsDNA oligos
- Also provided herein are methods for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences, the methods comprising: (A) providing a reaction mixture prepared by a method described herein, comprising: (i) a DNA binding domain nuclease (DBDn), optionally Cas9, optionally linked or fused to a clkDNA tethering domain; (ii) one, two, or more guide RNAs (gRNAs) that bind to the DBDn and comprise a spacer comprising a sequence complementary to a selected target sequence; and (iii) one, two, or more clkDNA oligonucleotide templates (i.e., single stranded templates) comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the adaptor-complementary overhang (ACO) sequence, and a flap binding region (FBR), wherein (i)-(iii) are added to the mixture in any order
- the methods comprise: (A) providing a reaction mixture prepared by a method described herein, comprising: (i) a DNA binding domain nuclease (DBDn), optionally Cas9, optionally linked or fused to a clkDNA tethering domain; (ii) one, two, or more guide RNAs (gRNAs) that bind to the DBDn and comprise a spacer comprising a sequence complementary to a selected target sequence; and (iii) one, two, or more clkDNA oligonucleotide templates comprising a localization moiety that binds to the clkDNA tethering domain, a template for the adaptor-complementary overhang (ACO) sequence, and a flap binding region (FBR), wherein (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the g
- DBDn DNA binding domain nuclease
- gRNAs guide RNAs
- methods comprising: preparing a reaction mixture comprising: (i) a DNA binding domain nuclease (DBDn), optionally Cas9; (ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding the complement of the ACO, a scaffold and spacer comprising a sequence complementary to a selected targeted nucleic acid substrate; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn.
- DBDn DNA binding domain nuclease
- PBS primer binding site
- RTT reverse transcriptase template
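How the pegRNA elements named above fit together can be sketched in Python. This is an illustrative model under stated assumptions, not the patent's design procedure: it assumes the conventional prime-editing layout (5'-spacer, scaffold, RTT, PBS-3'), a blunt cut 3 nt from the PAM-proximal end of the protospacer (as is typical for SpCas9), and a user-supplied scaffold string. All function names, the placeholder scaffold, and the example sequences are hypothetical.

```python
# Illustrative sketch only. Assumptions (not from the source text):
#   - pegRNA layout 5'-spacer-scaffold-RTT-PBS-3'
#   - blunt cut 3 nt from the PAM-proximal end of the protospacer
#   - all names and sequences below are hypothetical placeholders
DNA_TO_COMPLEMENTARY_RNA = str.maketrans("ACGT", "UGCA")


def reverse_complement_rna(dna: str) -> str:
    """Reverse complement of a 5'->3' DNA string, written as 5'->3' RNA."""
    return dna.upper().translate(DNA_TO_COMPLEMENTARY_RNA)[::-1]


def build_pegrna_parts(spacer: str, aco: str, scaffold: str,
                       pbs_len: int = 13, cut_offset: int = 3) -> dict:
    """Return the pegRNA elements needed to write `aco` onto the cut 3' end."""
    spacer = spacer.upper()
    # Portion of the protospacer strand that remains 5' of the cut site.
    upstream = spacer[: len(spacer) - cut_offset]
    if not 0 < pbs_len <= len(upstream):
        raise ValueError("pbs_len must fit within the protospacer 5' of the cut")
    primer = upstream[-pbs_len:]          # 3' end of the cleaved strand
    return {
        "spacer": spacer.replace("T", "U"),
        "scaffold": scaffold,
        # Copying the RTT yields the ACO as DNA, so the RTT is its complement.
        "rtt": reverse_complement_rna(aco),
        # The PBS anneals to the 3' end of the cleaved strand.
        "pbs": reverse_complement_rna(primer),
    }


def pegrna_sequence(parts: dict) -> str:
    """Concatenate the elements in 5'->3' order."""
    return parts["spacer"] + parts["scaffold"] + parts["rtt"] + parts["pbs"]


parts = build_pegrna_parts(
    spacer="GACCTTGCATGGCTAACTGA",   # hypothetical 20-nt spacer
    aco="GATTACAGGCTA",              # hypothetical ACO
    scaffold="N" * 10,               # placeholder, not a real scaffold
    pbs_len=8,
)
print(parts["rtt"], parts["pbs"])
```

Keeping the elements in a dictionary rather than returning only the concatenated string makes it easy to vary the PBS length or swap the ACO without recomputing the spacer and scaffold.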
- the methods comprise (A) providing a reaction mixture prepared according to a method described herein, comprising (i) a DNA binding domain nuclease (DBDn), optionally Cas9; (ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding the complement of the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order,
- the methods comprise: providing a sample comprising a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an ACO sequence produced by a method described herein; optionally treating the sample to inactivate any active enzymes, e.g., by heating the sample to about 72°C; contacting the modified nucleic acid of DNA with a partially double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo) and a ligase, under conditions sufficient for the ligase to ligate the dsDNA oligo to the modified nucleic acid of DNA (e.g., 20-27°C or room temperature), thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequence
- the methods comprise: providing a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an ACO sequence produced by a method described herein; contacting the modified nucleic acid of DNA with a ligase and a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid of DNA, thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties with dsDNA oligos comprising defined adaptor sequences
- the methods comprise: (A) providing a reaction mixture prepared by a method described herein, comprising (i) a DNA binding domain nuclease (DBDn), optionally Cas9; (ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding the complement of the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn; (B) providing a reaction mixture prepared by a method described herein, comprising (i) a DNA binding domain nuclease (
- the methods comprise: (A) providing a reaction mixture prepared by a method described herein, comprising (i) a DNA binding domain nuclease (DBDn), optionally Cas9; (ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn; (B) providing a sample comprising
- RNA binding domain nuclease RBDn
- Cas13 RNA binding domain nuclease
- gRNA guide RNA
- Also provided herein are methods for generating a modified nucleic acid of RNA comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence comprising: (A) providing a reaction mixture prepared by a method described herein, comprising: (i) an RNA binding domain nuclease (RBDn), optionally Cas13, optionally linked or fused to a clkDNA tethering domain; (ii) at least one guide RNA (gRNA) that binds to the RBDn and comprises a spacer comprising a sequence complementary to a selected target sequence; (iii) at least one clkDNA oligonucleotide template comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of the adaptor-complementary overhang (ACO) sequence, and a flap binding region (FBR), and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein
- the methods comprise: providing a sample comprising a modified nucleic acid of DNA comprising a 3’ end comprising an ACO sequence produced by a method described herein; optionally treating the sample to inactivate any active enzymes, e.g., by heating the sample to about 72°C; contacting the modified nucleic acid with a partially double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang (e.g., 1-100 bp overhang) comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo) and a ligase, under conditions sufficient for the ligase to ligate the dsDNA oligo to the modified nucleic acid (e.g., 20-27°C or room temperature for non-thermostable ligase), thereby producing a further modified nucleic acid of DNA having ends comprised of
- the methods comprise: providing a modified nucleic acid of DNA comprising a 3’ end comprising an ACO sequence produced by a method described herein; contacting the modified nucleic acid with a ligase and a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ end and be ligated to the modified nucleic acid, thereby producing a further modified nucleic acid of dsDNA having a 5’ end comprising a click-compatible moiety; and contacting the further modified nucleic acid of DNA having a 5’ end comprising a click-compatible moiety with dsDNA oligos comprising defined adaptor sequences and a 3’ click-compatible moiety that reacts with
- RNA binding domain nuclease RBDn
- gRNA guide RNA
- a method for generating a product nucleic acid comprising: (A) providing a reaction mixture prepared by a method described herein, comprising: (i) an RNA binding domain nuclease (RBDn), optionally Cas13, optionally linked or fused to a clkDNA tethering domain; (ii) at least one guide RNA (gRNA) that binds to the RBDn and comprises a spacer comprising a sequence complementary to a selected target sequence; (iii) at least one clkDNA oligonucleotide template comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of the adaptor-complementary overhang (ACO) sequence, and a flap binding region (FBR), and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein the polymerase is linked or fused to the
- RNA binding domain nuclease RBDn
- pegRNA that binds to the RBDn and comprises a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn.
- PBS primer binding site
- RTT reverse transcriptase template
- Also provided herein are methods for generating a further modified nucleic acid of RNA comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence comprising: (A) providing a reaction mixture prepared according to a method described herein, comprising (i) an RNA binding domain nuclease (RBDn), optionally Cas13; (ii) at least one pegRNA that binds to the RBDn and comprises a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding a sequence complementary to the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the g
- methods of generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences comprising: providing a sample comprising a modified nucleic acid of DNA comprising a 3’ end comprising an ACO sequence produced by a method described herein; optionally treating the sample to inactivate any active enzymes, e.g., by heating the sample to about 72°C; contacting the modified nucleic acid of DNA with a double stranded DNA oligonucleotide comprising defined adaptor sequences and a ssDNA 3’ overhang (e.g., 1-100 bp overhang) comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo) and a ligase, under conditions (e.g., 20-27°C or room temperature for non-thermostable ligase) sufficient for the ligase to ligate the dsDNA oligo to the modified nucleic acid of DNA, thereby producing a region of dsDNA
- the methods comprise: providing a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an ACO sequence produced by a method described herein; contacting the modified nucleic acid with a ligase and a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid, thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties with dsDNA oligos comprising defined adaptor sequences and a 3’
- the methods comprise: (A) providing a reaction mixture prepared by a method described herein, comprising (i) an RNA binding domain nuclease (RBDn), optionally Cas13; (ii) one, two, or more pegRNAs that bind to the RBDn and comprise a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn; (B) providing a reaction mixture prepared by a method described herein, comprising (i) an RNA binding domain nuclease (RB
- a method for generating a product nucleic acid, e.g., RNA, having ends comprised of defined adaptor sequences comprising: (A) providing a reaction mixture prepared by a method described herein, comprising (i) an RNA binding domain nuclease (RBDn), optionally Cas13; (ii) one, two, or more pegRNAs that bind to the RBDn and comprise a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding a sequence complementary to the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the R
- Also provided herein are methods comprising: cleaving a nucleic acid using a sequence-specific nuclease (e.g., via a nuclease-active PCE construct comprising a CRISPR-Cas nuclease directed by a gRNA); writing adaptor-complementary overhang (ACO) sequences onto a 3’ end of the cleaved nucleic acid (e.g., in a DNA version of this method via a DNA-dependent DNA polymerase of the PCE using the clkDNA as a template); and ligating an oligonucleotide, e.g., a sequencing adaptor, onto the 3’ ACO.
- a sequence-specific nuclease e.g., via a nuclease-active PCE construct comprising a CRISPR-Cas nuclease directed by a gRNA
- ACO adaptor-complementary overhang
- the nucleic acid is DNA
- the nuclease is a Class II type II CRISPR, optionally CRISPR-Cas9.
- the nucleic acid is RNA
- the nuclease is a Class II type VI CRISPR, optionally CRISPR-Cas13.
- the method further comprises sequencing the nucleic acid using the sequencing adaptor.
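The cleave, write, and ligate sequence described above can be modelled at the string level with a short Python sketch. It is a toy illustration under stated assumptions rather than a description of the enzymology: it tracks only the top strand, assumes a blunt cut 3 nt from the 3' end of the protospacer, and treats ligation as concatenation once the adaptor overhang is confirmed to be the reverse complement of the ACO. All names and sequences are hypothetical.

```python
# Toy string-level model of cleave -> write ACO -> ligate adaptor.
# Assumptions (not from the source text): top strand only, blunt cut 3 nt
# from the protospacer 3' end, hypothetical sequences throughout.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def cleave(target: str, protospacer: str, cut_offset: int = 3) -> tuple:
    """Split `target` at the modelled cut site inside `protospacer`."""
    start = target.find(protospacer)
    if start < 0:
        raise ValueError("protospacer not found in target")
    cut = start + len(protospacer) - cut_offset
    return target[:cut], target[cut:]


def write_aco(fragment: str, aco: str) -> str:
    """Extend the 3' end of the cleaved fragment with the ACO."""
    return fragment + aco


def ligate_adaptor(modified: str, aco: str,
                   adaptor_overhang: str, adaptor_top: str) -> str:
    """Join the adaptor if its 3' overhang is complementary to the ACO."""
    if not modified.endswith(aco):
        raise ValueError("fragment does not end with the ACO")
    if adaptor_overhang.upper() != reverse_complement(aco):
        raise ValueError("adaptor overhang is not complementary to the ACO")
    return modified + adaptor_top


protospacer = "GACCTTGCATGGCTAACTGA"
target = "TTGACCATGG" + protospacer + "AGG" + "CCTTAA"
aco = "GATTACAGGCTA"

left, right = cleave(target, protospacer)
modified = write_aco(left, aco)
product = ligate_adaptor(modified, aco,
                         adaptor_overhang=reverse_complement(aco),
                         adaptor_top="ACACTCTTTCC")
print(product)
```

An adaptor whose overhang does not match the ACO is rejected, which reflects the sequence specificity that the ACO is intended to provide.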
- the DBDn can be a Cas-family enzyme (e.g., Cas9 or Cas12), a TnpB-family enzyme, or an IscB-family enzyme, e.g., as described herein.
- Cas-family enzymes e.g., Cas9 or Cas12
- TnpB-family enzyme e.g., TnpB-family enzyme
- IscB-family enzyme e.g., as described herein.
- linked can include non-covalent attachment or association such as recruitment of the clkDNA through the gRNA (e.g., via RNA hairpin on the gRNA and RNA aptamer binding protein fused to a HUH, enabling recruitment of the clkDNA to Cas9 via the gRNA).
- a plurality of different ACOs are used; in such methods, preferably a plurality of adaptors are used that correspond or are fully or partly complementary in sequence to the different ACOs (e.g., if two different ACO sequences are used, two different adaptor sequences are present in the mix, each complementary or partly complementary (e.g., at least 50%, 60%, 70%, 80%, 90%, or 95%, complementary) to one of the ACO sequences).
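Pairing several ACOs with their adaptors by percent complementarity can be sketched as follows. The sketch assumes an ungapped, full-length antiparallel comparison between each ACO and each adaptor overhang of the same length, which is only one possible way to score partial complementarity; the function names, the 90% threshold used in the example, and the sequences are hypothetical.

```python
# Illustrative sketch only: scoring scheme, names and sequences are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def percent_complementary(aco: str, overhang: str) -> float:
    """Percent of ACO positions paired by `overhang` (ungapped, equal length)."""
    if len(aco) != len(overhang) or not aco:
        raise ValueError("ACO and overhang must be non-empty and equal in length")
    perfect = reverse_complement(aco)
    matches = sum(a == b for a, b in zip(perfect, overhang.upper()))
    return 100.0 * matches / len(aco)


def assign_adaptors(acos: dict, overhangs: dict, threshold: float = 90.0) -> dict:
    """Map each ACO name to the adaptors whose overhang meets the threshold."""
    assignment = {}
    for aco_name, aco_seq in acos.items():
        assignment[aco_name] = [
            name for name, overhang in overhangs.items()
            if len(overhang) == len(aco_seq)
            and percent_complementary(aco_seq, overhang) >= threshold
        ]
    return assignment


acos = {"ACO_1": "GATTACAGGC", "ACO_2": "CCGTTAGCAT"}
overhangs = {
    "adaptor_A": "GCCTGTAATC",   # fully complementary to ACO_1
    "adaptor_B": "ATGCTAACGA",   # one mismatch against ACO_2
}
print(assign_adaptors(acos, overhangs, threshold=90.0))
```

Lowering `threshold` toward 50 admits more loosely matched adaptors, while raising it to 100 requires full complementarity, mirroring the range of values listed in the text.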
- the gRNAs are allowed to bind to the DBDn before the clkDNA oligonucleotide templates are added to the mixture.
- Figure 1A-D Overview of methods for target molecule enrichment and sequencing.
- A Schematic of amplification-based methods to enrich for target regions or to add sequencer-specific adapter sequences.
- B Schematic of ultra-deep sequencing to permit adequate coverage of a region of interest.
- C Schematic of current Cas9-targeted, phosphorylation-based enrichment approach.
- D Schematic of one embodiment of a CAGE sequence-specific enrichment approach, leading to more selective background-free target substrate enrichment without amplification via user-specifiable adapter-complementary overhangs (ACOs).
- FIG. 2A-C Workflow for exemplary CAGE and sequence-specific adaptor ligation to DNA targets.
- A Ribonucleoprotein (RNP) formation of exemplary Click Editors using a DNA-binding domain nuclease (DBDn), HUH endonuclease, and DNA-dependent polymerase complex (where the DNA polymerase can be unfused or fused to the DBDn) in combination with a guide RNA (gRNA) and clkDNA pair.
- B CAGE workflow using adaptors that contain a 3’ overhang complementary to the adaptor-complementary overhang (ACO) region of the modified DNA.
- C CAGE workflow using oligo annealing to the ACO and subsequent adaptor attachment via click chemistry.
- ACO adaptor-complementary overhang
- cACO complement of the ACO
- gRNA guide RNA
- clkDNA click DNA template oligonucleotide
- FBR flap-binding region
- PT polymerization template
- RNP ribonucleoprotein
- DBDn DNA binding domain nuclease
- DNA deoxyribonucleic acid
- HUH histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease.
- FIG. 3A-B Detailed exemplary 6-step and 5-step CAGE workflow.
- A Schematic of an exemplary 6-step CAGE workflow.
- a heat inactivation step (step 4) dissociates Cas9 from the target DNA, liberating the click-edited ACO for adaptor ligation in subsequent steps.
- the DNA Ligase and adaptors are subsequently added in step 5.
- B Schematic of an exemplary 5-step CAGE workflow. DNA ligase and adaptor are added together with other components in a single reaction prior to click editing. No heat inactivation step is included.
- ACO adaptor-complementary overhang
- cACO complement of the ACO
- sgRNA single guide RNA
- gRNA guide RNA
- clkDNA click DNA template oligonucleotide
- FBR flap-binding region
- PT polymerization template
- DBDn DNA binding domain nuclease
- RNP ribonucleoprotein
- DNA deoxyribonucleic acid
- HUH a histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease
- RT room temperature
- dNTPs deoxynucleoside triphosphates
- ATP adenosine triphosphate.
- FIG. 4A-G CAGE-mediated adaptor ligation on PCR-generated fragments and human genomic DNA.
- A Schematic of exemplary single-end CAGE workflow to assess feasibility of sequence-specific ACO addition to a substrate followed by adaptor ligation.
- B Junction PCR across the adaptor:target junction of the product DNA to verify adaptor ligation following CAGE reactions on PCR fragments. Bands illustrating PCR amplification are only present in conditions that have the adaptor successfully ligated; assessed via QIAxcel capillary electrophoresis (Qiagen).
- C Sanger sequencing results of the expected product (from a 6-step CAGE reaction) from PCR substrate targeting experiments.
- E Junction PCR across the adaptor:target junction of the product DNA for human gDNA targeting (HEK site 3) experiments. Bands illustrating PCR amplification are only present in conditions that have the adaptor successfully ligated; assessed via QIAxcel capillary electrophoresis (Qiagen).
- F Sanger sequencing results of the expected product (from 6-step CAGE reactions) from gDNA targeting experiments.
- ACO adaptor-complementary overhang
- cACO complement of the ACO
- gRNA guide RNA
- clkDNA click DNA template oligonucleotide
- FBR flap-binding region
- PT polymerization template
- RNP ribonucleoprotein
- DNA deoxyribonucleic acid
- HUH a histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease
- RT room temperature
- dNTPs deoxynucleoside triphosphates
- ATP adenosine triphosphate
- PCR polymerase chain reaction
- CAGE click-assisted genome enrichment.
- FIG. 5 Use of thermostable DNA ligases in CAGE reactions.
- Exemplary 5-step CAGE workflow similar to that shown in Fig. 3B, except that thermostable DNA ligases are used; thermostable ligases are activated at the same temperature at which the DBDn is inactivated, enabling single-pot click editing and adaptor ligation without user intervention.
- ACO adaptor-complementary overhang
- cACO complement of the ACO
- gRNA guide RNA
- sgRNA single guide RNA
- clkDNA click DNA template oligonucleotide
- FBR flap-binding region
- PT polymerization template
- DBDn DNA binding domain nuclease
- RNP ribonucleoprotein
- DNA deoxyribonucleic acid
- HUH a histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease
- RT room temperature
- dNTPs deoxynucleoside triphosphates
- ATP adenosine triphosphate.
- FIG. 6A-C Workflow for PE-CAGE and sequence-specific adaptor ligation to DNA targets.
- A Ribonucleoprotein (RNP) formation of Prime Editors using a DNA-binding domain nuclease (DBDn) and RNA-dependent polymerase (e.g. reverse transcriptase (RT); where the RT can be unfused or fused to the DBDn) in combination with a prime editing guide RNA (pegRNA) and clkDNA pair.
- B PE-CAGE workflow using adaptors that contain a 3’ overhang complementary to the adaptor-complementary overhang (ACO) region of the target DNA.
- C PE-CAGE workflow using oligo annealing to the ACO and subsequent adaptor attachment via click chemistry.
- cACO complement of the ACO
- PBS primer binding site
- RTT reverse transcriptase template.
- FIG. 7A-C PE-CAGE-mediated overhang installation on human genomic DNA.
- A Schematic of a single-end PE-CAGE experiment to assess feasibility of ACO installation using a human genomic DNA substrate (at HEK site 3).
- B Junction PCR (across the ACO:gDNA target junction) and Sanger sequencing results of the modified nucleic acid showing the expected ACO installation at the target site.
- C ACO installation efficiency on genomic DNA at HEK site 3, as assessed by ddPCR.
- ACO adaptor-complementary overhang
- cACO complement of the ACO
- RT reverse transcriptase
- PBS primer binding site
- RTT reverse transcriptase template
- pegRNA prime editing guide RNA
- DNA deoxyribonucleic acid
- gDNA genomic DNA.
- FIG. 8A-C Workflow for CAPTURE and sequence-specific adaptor ligation to RNA targets.
- A Ribonucleoprotein (RNP) formation of Click Editors using an RNA-binding domain nuclease (RBDn), HUH endonuclease, and DNA-dependent polymerase complex (where the DNA polymerase can be unfused or fused to the RBDn) in combination with a guide RNA (gRNA) and clkDNA pair.
- RNP Ribonucleoprotein
- RBDn RNA-binding domain nuclease
- gRNA guide RNA
- C CAPTURE workflow using oligo annealing to the RNA ACO and subsequent adaptor attachment via click chemistry.
- clkDNA click DNA template oligonucleotide
- cACO complement of the ACO
- FBR flap-binding region
- PT polymerization template
- RNP ribonucleoprotein
- DNA deoxyribonucleic acid
- RNA ribonucleic acid
- HUH histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease
- RT room temperature
- dNTPs deoxynucleoside triphosphates
- ATP adenosine triphosphate
- PCR polymerase chain reaction
- CAGE click-assisted genome enrichment
- pol polymerase.
- FIG. 9A-C Workflow for PE-CAPTURE and sequence-specific adaptor ligation to RNA targets.
- RNP Ribonucleoprotein
- RBDn RNA-binding domain nuclease
- RT reverse transcriptase
- pegRNA prime editing guide RNA
- clkDNA click DNA
- RNA ribonucleic acid
- FIGS. 10A-D Exemplary methodological flowcharts for CAGE and CAPTURE workflows.
- A Exemplary CAGE workflow using adaptors with ACO-complementary 3’ overhangs.
- B Exemplary CAGE workflow using oligo ligation and click-chemistry adaptor attachment.
- C Exemplary CAPTURE workflow using adaptors with ACO-complementary 3’ overhangs.
- D CAPTURE workflow using oligo ligation and click-chemistry adaptor attachment.
- boxed steps in the flowchart indicate potential user intervention.
- ACO adaptor-complementary overhang
- sgRNA single guide RNA
- pegRNA prime editing guide RNA
- clkDNA click DNA template oligonucleotide
- CAGE click-assisted genome enrichment
- CAPTURE click-assisted precise targeting of unaltered RNA for enrichment
- RNP ribonucleoprotein
- DBD DNA binding domain
- RBD RNA binding domain
- PE prime editor
- DNA deoxyribonucleic acid
- RNA ribonucleic acid
- HUH histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease
- dNTPs deoxynucleoside triphosphates
- ATP adenosine triphosphate.
- FIGS. 11A-B CAGE reactions on PCR substrates with and without a tethering domain.
- Reactions were done with a range of clkDNA concentrations (0.1 pM to 10 pM) and a clkDNA containing a 15bp cACO and a 17bp PBS.
- the clkDNA also contained or lacked a PCV2 recognition sequence.
- ACO writing percentage, measured by NGS, is shown for various lengths of ACO installed where 9, 11, and 13bp represent truncations and 15bp represents full ACO addition.
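The read classification behind an "ACO writing percentage" can be illustrated with a short sketch. This is only an assumed, simplified analysis, not the pipeline used for the figures: the function names, the exact-match rule, and the example sequences are all hypothetical. Each read that contains the target junction is scored by how many leading ACO bases follow the junction, so truncated writes (e.g., 9, 11, or 13 bases) are counted separately from full-length writes (15 bases).

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def aco_length_written(read: str, junction: str, aco: str) -> Optional[int]:
    """Count how many leading ACO bases directly follow the junction in a read.

    Returns None when the junction sequence is absent from the read.
    Assumption: exact matching only, no tolerance for sequencing errors.
    """
    pos = read.find(junction)
    if pos == -1:
        return None
    tail = read[pos + len(junction):]
    written = 0
    for read_base, aco_base in zip(tail, aco):
        if read_base != aco_base:
            break
        written += 1
    return written


def writing_percentages(reads: Iterable[str], junction: str, aco: str) -> Dict[int, float]:
    """Percent of junction-containing reads at each written ACO length."""
    lengths = [aco_length_written(r, junction, aco) for r in reads]
    lengths = [n for n in lengths if n is not None]
    if not lengths:
        return {}
    counts = Counter(lengths)
    return {n: 100.0 * c / len(lengths) for n, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical junction and 15-nt ACO, chosen only for illustration.
    junction = "GATTACA"
    aco = "ACGTACGTACGTACG"
    reads = [junction + aco, junction + aco[:9], junction + "TTTT"]
    print(writing_percentages(reads, junction, aco))
```

A real analysis would likely also need error-tolerant matching and handling of reads that end inside the ACO, which this sketch treats as truncated writes.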
- FIG. 12 CAGE reactions on gDNA substrate with and without a tethering domain.
- ACO writing percentage, measured by NGS, is shown - accounting for reads containing the 15bp ACO with no additional PCV2 sequence installation and reads with the ACO and 4, 9, or 13 bp of the PCV2 sequence.
- Targeted nucleic acid enrichment methods enable users to selectively sequence only regions of interest, reducing cost and labor while increasing throughput, sequence coverage, and resolution.
- PCR-based amplification is employed for target enrichment.
- PCR amplification introduces noise from low-level polymerase error and bias, particularly on repetitive/complex templates or for applications that have varying template sizes.
- PCR-based methods may be unfeasible for certain template compositions, and they eliminate native base modifications that are crucial to certain applications.
- Other methods for amplification-free target enrichment typically include the use of enzymes to selectively modify or cleave the target region 5-7 (e.g., whole-sample dephosphorylation followed by target-specific exposure of 5’-phosphates following enzymatic cleavage by a restriction enzyme or CRISPR-Cas nuclease 7; Fig. 1C).
- these methods often lack adequate target-molecule selectivity, require protracted protocols, exhibit inadequate background reduction (e.g.
- an enrichment workflow to capture only the intended nucleic acid regions prior to sequencing would have the following properties; it would: be simple, require minimal user intervention, contain no amplification steps, be compatible with a range of nucleic acid inputs, and facilitate near-100% target molecule enrichment prior to sequencing (e.g., via selective adaptor ligation to target fragment(s)).
- a key premise of these approaches is to selectively cleave the substrate nucleic acid molecule(s) of interest followed by targeted polymerization of an adaptor-complementary overhang (ACO) sequence onto this sequence to create a modified nucleic acid; the sequence complementary to the ACO (cACO) is encoded on a click DNA (clkDNA) oligonucleotide template (Fig. 1D).
- ACO adaptor-complementary overhang
- the sequence of the ACO is user-specifiable and can be comprised of a variety of different lengths and sequence compositions, e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nt and up to 10, 15, 20, 25, 30, 40, 50, 75, or 100 or more nt long, and ranges having any of the foregoing values as end points.
- the modified nucleic acid(s) or regions-of-interest (ROIs) comprising the ACO can then be subjected to downstream reactions that generate sequencing-competent product nucleic acid molecules; for instance, by ligation of a platform-specific sequencing adaptor where the adaptor has an overhang complementary to the ACO (e.g., for Illumina short-read or long-read sequencing, Oxford Nanopore Technologies (ONT) long-read sequencing, or Pacific Biosciences (PacBio) long-read sequencing).
- ONT Oxford Nanopore Technologies
- PacBio Pacific Biosciences
- Described are methods that use a DNA- or RNA-dependent polymerase to install a DNA or RNA 3’ overhang onto a target DNA or RNA molecule, referred to herein as CAGE and CAPTURE, using various polymerases.
- the methods can include installing a 3’ ACO onto a substrate nucleic acid where the ACO has complementarity to a sequencing adaptor, or ligating an oligo with a click chemistry moiety and using an adaptor with a compatible click chemistry moiety to attach an adaptor.
- Any ACO sequence or adaptor sequence can be used that is compatible with any sequencing platform.
- the adaptors can be modified with a moiety that can enable pull down of adaptor-ligated molecules from non-adaptor-ligated molecules (e.g., biotin on adaptor, then use streptavidin beads in final purification step); alternatively, oligoconjugated beads can be used to pull down product nucleic acids with an ACO installed.
- CEs click editors
- exemplary polymerase click editors (PCEs) couple HUH endonucleases with RNA-guided DNA binding domains and DNA polymerases (where the DNA polymerase is either fused or supplied in trans), and can optionally be directed to a target site via a guide RNA (gRNA) (Fig. 2A).
- gRNA guide RNA
- the PCE-gRNA complex nicks the nontarget DNA strand (NTS) of a target site (within a chromosome/genome), exposing the NTS DNA ‘flap’ for modification.
- NTS nontarget DNA strand
- a click DNA (clkDNA) template is provided in trans; the clkDNA template encodes an HUH endonuclease recognition site (e.g., a PCV2 recognition site), a polymerization template (PT), and a NTS flap binding region (FBR); Fig. 2A.
- the HUH enzyme of the PCE cleaves and covalently binds to the clkDNA, permitting local recruitment and annealing of the FBR of the clkDNA with the nicked NTS of the target site, acting as a DNA:DNA duplex to recruit the DNA-dependent DNA polymerase (which can be PCE-fused or unfused) to initiate extension of the NTS using the PT portion of the clkDNA as a template.
- This process enables the selective writing of custom 3’ flaps into regions of genomic DNA in human cells, which through subsequent DNA repair and removal of the native 5’ flap leads to permanent installation of clkDNA-encoded sequence alterations.
- Flaps are the strands exposed after the DNA is cut by the nuclease; for Cas9-gRNA bound target DNAs, the 3’ flaps on the non-target strand are extended by the polymerase, so 3’ flaps are the new sequence written in from the polymerase domain of the PCE.
- PCEs can be deployed for a variety of uses in vitro.
- PCEs that are comprised of nucleases instead of nickases could enable simultaneous substrate cleavage and ‘tagging’ of user-specifiable sequences.
- the polymerase can be decoupled/unfused from the HUH-Cas9 fusion.
- the general principles of this approach are: (1) user-specifiable sequence-specific nucleic acid cleavage (e.g., via a nuclease-active PCE construct comprising a CRISPR-Cas nuclease directed by a gRNA), (2) direct writing of customizable adaptor-complementary 3’ ‘overhang’ (ACO) sequences onto the substrate nucleic acids of interest (e.g., in a DNA version of this method via a DNA-dependent DNA polymerase of the PCE using the clkDNA as a template), where the sequence of the ACO can be of various lengths and sequence compositions, and (3) ligation of custom sequencing adaptors onto the nascent custom 3’ overhang on the modified nucleic acid (Fig.
- CAGE Click-Assisted Genome Enrichment
- nt single nucleotide
- CAGE can be used to install an overhang on a single end of a nucleic acid (using one gRNA), or onto both ends (using two or more gRNAs). While the CAGE method is applicable to DNA substrates (Figs. 10A and 10B), the Click-Assisted Precise Targeting of Unaltered RNA for Enrichment (CAPTURE) method is applicable to RNA substrates (Figs. 10C and 10D).
- sample multiplexing using barcodes can dramatically reduce sequencing costs by permitting the pooling of orthogonally barcoded samples into a single sequencing run.
- sample multiplexing should be permitted by using adaptors that encode barcodes (as is done by various sequencing platforms including Oxford Nanopore Technologies (ONT), Pacific Biosciences (PacBio), Illumina, etc.).
- CAGE can be used, e.g., for single-end labeling of DNA substrates, or dual-end labeling of DNA substrates;
- CAPTURE can be used for single- or dual-end labeling of RNA substrates.
- Single-end labeling of DNA might be preferable for quantifying or sequencing substrates with known or ambiguous ends (e.g., near the end of chromosomes or DNA fragments, for unidirectional sequencing near CRISPR-Cas genome editing target sites, sequencing from ‘bait’ sites to identify genome-scale changes, or for unbiased assessment of large sequence DNA integration or inversion events).
- Dual-end labeling to enrich specific sequences from certain larger DNA substrates (e.g., chromosomes) with known DNA sequences may be preferable to improve sequencing efficiency and termination.
- a single adaptor molecule can be used that harbors a 3’ overhang (complementary to the ACO) and the remainder of the double-stranded region appropriate to the sequencing-platform.
- This single adaptor can be ligated to the ACO of the target site(s) in a single step (Figs. 2B and 10A).
- a two-step reaction is used.
- a single stranded oligonucleotide that is complementary to the ACO and that contains a 5’ modification compatible with click chemistry is annealed and ligated to the ACO/target modified nucleic acid molecule.
- This further modified nucleic acid is then combined in a reaction with a second adaptor containing the appropriate modification to enable click chemistry with the initial ligated oligo, facilitating the rapid attachment of adaptors to the target sequence(s) resulting in a product nucleic acid (Figs. 2C and 10B).
- An example of this type of adaptor is Oxford Nanopore Technologies’ “Rapid Adaptor”.
- the ability to polymerize a DNA sticky-end overhang onto target DNA or RNA sequences represents a powerful approach to enable targeted, amplification-free enrichment of nucleic acid sequences in vitro for various sequencing applications and beyond.
- CAGE offers major advantages compared to current amplification-free targeted enrichment approaches by facilitating highly specific adaptor ligation through overhang-adaptor complementarity instead of phosphorylation states 7 , reducing background in sequencing runs to enable much higher coverage and resolution at significantly lower costs.
- CAPTURE for targeted, amplification-free enrichment of RNA sequences.
- Applications of CAGE and CAPTURE are wide-ranging across basic biology: genomics, epigenomics, transcriptomics, epitranscriptomics, diagnostics, microbial/microbiome studies, etc. Additionally, CAGE can facilitate a more thorough investigation of intended and unintended genomic or epigenetic alterations by genome editing technologies at endogenous loci, particularly as new genome editing technologies capable of larger sequence edits continue to emerge (e.g., insertions, deletions, inversions, etc.). CAPTURE could also be used to profile the potential of RNA editing technologies to edit the transcriptome or alter RNA modifications.
- CAGE enables at or near 100% enrichment (percent of sequencing reads that are expected to be attributable to the desired target region) of target sequences because this method relies on highly specific adaptor ligation, and only adaptor-ligated product nucleic acids are sequenced (whereas other Cas9-based approaches achieve only 0.5-5% enrichment due to inherent non-specific adaptor ligation to non-target molecules 5-7). Even with 10-30% ligation efficiency of the adaptor to the ACO-extended region-of-interest, only the adaptor-ligated molecules should be sequenced (resulting in at or near 100% enrichment). Notably, adaptor ligation efficiency dictates required DNA input amount and does not determine enrichment.
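The distinction between enrichment and ligation efficiency can be made concrete with a small arithmetic sketch. The model below is an assumption made for illustration (the function names and the simple ratio model are not taken from the source): enrichment is the fraction of adaptor-ligated molecules that derive from the target, and the required input scales inversely with ligation efficiency.

```python
def enrichment_fraction(target_molecules: float,
                        ligation_efficiency: float,
                        offtarget_ligated: float = 0.0) -> float:
    """Fraction of adaptor-ligated (sequenceable) molecules that are on-target.

    Assumed model: only adaptor-ligated molecules are sequenced, so the
    ligation efficiency cancels out whenever off-target ligation is zero.
    """
    on_target = target_molecules * ligation_efficiency
    total = on_target + offtarget_ligated
    return on_target / total if total else 0.0


def required_target_input(desired_ligated: float, ligation_efficiency: float) -> float:
    """Target molecules needed to obtain a desired number of ligated products."""
    if ligation_efficiency <= 0:
        raise ValueError("ligation_efficiency must be positive")
    return desired_ligated / ligation_efficiency


if __name__ == "__main__":
    for eff in (0.10, 0.30):
        print(f"ligation efficiency {eff:.0%}: "
              f"enrichment {enrichment_fraction(1000, eff):.0%}, "
              f"input for 300 ligated products {required_target_input(300, eff):.0f}")
```

Under this model, lowering ligation efficiency from 30% to 10% triples the input needed for the same number of ligated products but leaves the enrichment unchanged, provided no adaptor ligates to non-target molecules.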
- the adaptor itself can be customized to be compatible with a broad range of sequencing platforms or for other uses, as well as modified to facilitate further physical isolation of adaptor-ligated products from non-adaptor-ligated molecules (e.g., adaptors containing a functional group like a biotin moiety, that would permit adaptor-ligated molecules to be isolated or pulled down from the population of molecules using streptavidin beads).
- SSB single-strand binding protein
- the enzymes or enzyme fusions used in CAGE and CAPTURE reactions can comprise a variety of architectures, including a domain for tethering (T) the template (also referred to herein as clkDNA tethering domain), a DNA binding domain nuclease (DBDn) for cleaving the substrate, and a polymerase (P) for extending the template, separated by optional linkers (L).
- T domain for tethering
- DBDn DNA binding domain nuclease
- P polymerase
- T-L-DBDn+P trans architecture
- Examples of alternate architectures beyond T-L-DBDn+P include but are not limited to: DBDn-L-T+P, T-L-DBDn-L-P, T-L-P-L-DBDn, DBDn-L-T-L-P, DBDn-L-P-L-T, P-L-DBDn-L-T, or P-L-T-L-DBDn (where DBDn is the nuclease, T is the tethering domain, P is the polymerase, and L is a linker; each of varying compositions).
- the nuclease domain (DBDn) may be SpCas9 or other nucleases; the tethering domain (T) may be an HUH enzyme (e.g. PCV2 or others) or other tethering domains; the polymerase domain (P) may be E. coli Klenow fragment or other polymerases; and the linkers (L) may be of various lengths and amino acid compositions.
- the DNA polymerase can be either fused or unfused from the DBDn-T complex; in addition, the T (tethering domain) can also be recruited or linked to the DBDn via other methods instead of direct fusion, including but not limited to recruitment through the gRNA; if any component is unfused, that component can optionally be recruited or linked to the target site through a protein recruitment domain, e.g., phage coat proteins (CP) (coupled with sgRNAs encoding the corresponding RNA recognition hairpin recognized by a given coat protein).
- CP phage coat proteins
- one method to selectively recruit and link proteins or domains to specific target sites is to fuse or recruit the effector protein-of-interest (e.g., polymerase and/or HUH endonuclease or other tethering domain) to a recruitment domain (e.g., an RNA binding protein such as MCP, PCP, or Com RNA binding protein 41,42, e.g., the MS2 coat protein (MCP)) that then interacts with a specific hairpin sequence encoded within that gRNA (e.g., viral RNA sequences MS2, PP7, and com, e.g., an MS2 hairpin).
- MCP MS2-coat protein
- protein complexes can be formed using protein recruitment domains, such as coiled-coil (CC) protein domains, leucine zippers (LZs), or SunTags, that permit protein:protein interactions (among other types of protein recruitment strategies).
- Coiled-coil domains are known in the art, see, e.g., Woolfson, Adv Protein Chem. 2005;70:79-112 (design of coiled-coil structures and assemblies); Grigoryan and Keating, Curr Opin Struct Biol. 2008 Aug;18(4):477-83 (structural specificity in coiled-coil interactions); Reinke et al., Am. Chem. Soc. (coiled-coil interactome heterospecific modules for molecular engineering); Ljubetic et al., Nature Biotechnology 35:1094-1101 (2017) (coiled-coil protein-origami cages that self-assemble in vitro and in vivo); Fink et al., Nature Chemical Biology 15:115-122 (2019) (orthogonal CC dimerizing domains); Lebar et al., Nature Chemical Biology 16:513-519 (2020) (orthogonal coiled-coil domains); Plaper et al., Scientific Reports 11:9136 (2021) (coiled-coil heterodimers); and Lainscek et al., Nature Communications 13:3604 (2022) (coiled-coil heterodimer-based recruitment of an exonuclease to CRISPR/Cas).
- Exemplary coiled-coil sequences include the following:
- Exemplary combinations of CC domains include P1:P2; P3:P4; P3:P4S; P3S:P4; P3S:P4S; P5:P6; P7:P8; P9:P10; P11:P12; P3:P4; N5:N6; P3:AP4; and P3S:P4S.
- Leucine zippers are also known in the art. See, e.g., Amoutzias et al., Trends Biochem Sci. 2008 May;33(5):220-9; Bader and Vogt, (2006). Leucine Zipper Transcription Factors: bZIP Proteins. In: Encyclopedic Reference of Genomics and Proteomics in Molecular Medicine. Springer, Berlin, Heidelberg, doi.org/10.1007/3-540-29623-9_2180; and Busch and Sassone-Corsi, Trends Genet. 1990 Feb;6(2):36-40 (see, e.g., exemplary LZ domain sequences in Fig. 1 of this paper; examples include: GCN4, yAP-1, C/EBP, CREB, CRE-BP1, c-Jun, JunB, JunD, FosB, Fra-1, and c-Fos).
- SunTags are described in Tanenbaum et al., Cell. 2014 Oct 23; 159(3):635-46.
- Exemplary sequences include: GCN4: LLPKNYHLENEVARLKKLVGER; GCN4 variant: EELLSKNYHLENEVARLKK; and ScFv-GCN4: GPDIVMTQSPSSLSASVGDRVTITCRSSTGAVTTSNYASWVQEKPGKLFKGLIGGTNNRAPGVPSRFSGSLIGDKATLTISSLQPEDFATYFCALWYSNHWVFGQGTKVELKRGGGGSGGGGSGGGGSSGGGSEVKLLESGGGLVQPGGSLKLSCAVSGFSLTDYGVNWVRQAPGRGLEWIGVIWGDGITDYNSALKDRFIISKDNGKNTVYLQMSKVRSDDTALYYCVTGLFDYWGQGTLVTVSS.
- the clkDNA tethering domain can include an HUH endonuclease (see, e.g., Table A) or a different ssDNA localization moiety, such as avidin (when the clkDNA is labeled with biotin), SNAP-tag (when the clkDNA is labeled with benzylguanine derivatives), CLIP-tag (when the clkDNA is labeled with benzylcytosine derivatives) (or other O6-alkylguanine-DNA-alkyltransferase derivatives), and HALO-tag or other haloalkane dehalogenase derivatives (when the clkDNA is labeled with a chloroalkane) (Table B).
- HUH endonucleases are preferred as they do not require specialized and expensive chemical modifications on the clkDNA and bind covalently to their ssDNA substrate with high efficiency.
- Exemplary HUH endonucleases include PCV2 HUH domain; DCV HUH domain; FBNYV HUH domain; RepBm HUH domain; TraI relaxase domain; dPCV2 (Y96F) HUH domain; MSMV HUH domain; TGMV HUH domain; ChiSCV-GT306; ChiSCV-GM510; ChiSCV-GM415; and other HUH domains described in Li, L. et al 15.
- the CEs described herein optionally include an RNA-programmable DNA nickase that nicks the NTS, or a nuclease, including nickases from Cas-family enzymes (e.g., Cas9 or Cas12), TnpB-family, or IscB-family enzymes (Table C). See, e.g., Kapitonov et al., J Bacteriol. 2016 Mar 1; 198(5): 797-807; Karvelis et al., Nature. 2021; 599(7886): 692-696 (TnpB); Koonin and Makarova, PLoS Biol.
- nickases can be generated from wild type RNA-programmable DNA nucleases by the introduction of a mutation of a catalytic RuvC-II residue or a mutation of a catalytic HNH residue (Table C).
- A. warmingii IscB nickases can include an H212A or E157A mutation; IscB nickases from other species can include corresponding mutations; see, e.g., WO 2022/087494.
- the nickase can also include one or more mutations that increase activity, reduce off-target effects, and/or alter protospacer adjacent motif (PAM) or target adjacent motif (TAM) specificity (Tables D and E).
- Exemplary Cas9 and Cas12 nickases and mutations are shown in Tables C-E.
- Table C List of Exemplary Cas9, Cas12a, and IscB Orthologs (see WO2018218166 for references)
- the RuvC domain nicks the non-target strand (NTS) DNA and the HNH domain nicks the target strand (TS) DNA.
- NTS non-target strand
- TS target strand
- the RuvC domain nicks both DNA strands. Mutations abrogate activity.
- sequence of ogeuIscB is as follows (from metagenome genome assembly, contig: NODE_25_length_150080_cov_8.882980; contig accession: OGEU01000025.1):
- Table D List of Exemplary High Fidelity and/or PAM-relaxed RGN Orthologs
- the methods and compositions described herein further include effector proteins that have DNA polymerase activity, e.g., DNA-dependent DNA polymerases, e.g., of family A, B, C, D, X, or Y, or RNA-dependent DNA polymerases (also known as reverse transcriptases) (for polymerase click editors (PCEs)).
- exemplary polymerases include E. coli Klenow (EcKlenow, optionally with the D355A and/or E357A mutations that deactivate its 3’-5’ exonuclease domain); Taq Stoffel; Pol-Beta; Pol-Beta + Sso7d; Phi29 DNA Polymerase (D169A); Sequenase; T4 DNA Polymerase; and E. coli dKlenow (optionally with the D355A, E357A, D705A, and/or D882A mutations).
- Exemplary reverse transcriptases are described in more detail below.
- a ligase is used; exemplary DNA ligases include T3 DNA ligase; T4 DNA ligase; T7 DNA ligase; ChlV (SplintR) DNA ligase; PhiKMV DNA ligase; Vaccinia DNA ligase; E. coli DNA ligase; Taq DNA ligase; 9°N DNA ligase; or Hi-T4 DNA Ligase.
- Methods that use a clkDNA would need a DNA-dependent DNA polymerase, and methods that use a pegRNA would need an RNA-dependent DNA polymerase.
- villin headpiece, supercharged villin headpiece, Sso7d, NeqSSB, etc. fused to the N- or C-terminus of DNA polymerases have been shown to increase the DNA affinity, stability, and processivity of the polymerase 21-23.
- Exchange of the 3’-5’ exonuclease domain of TaqStoffel with that of EcKlenow may also endow TaqStoffel with proofreading capability, as has been done previously 24 .
- the present compositions and methods can use a DNA polymerase, such as EcKlenow or TaqStoffel, which include one or more of these modifications.
- RTs Reverse Transcriptases
- Reduced Size RTs
- Variant RTs
- compositions and methods can use any RT, including Group II introns.
- Group II introns are retroelements that consist of a self-splicing ribozyme and an intron encoded protein (IEP) which functions as a reverse transcriptase (RT), DNA endonuclease, and RNA maturase.
- IEP intron encoded protein
- the pentamutant Moloney Murine Leukemia Virus reverse transcriptase can be used.
- the group II intron RT (commercially available as “MarathonRT”) from Eubacterium rectale (E.r.) has been shown to display superior intrinsic RT processivity compared to Superscript IV.
- substitution of the M-MLV RT in a PE with MarathonRT or other RTs resulted in efficient prime editing in the HEK293T cell line.
- Additional exemplary alternative RTs include those listed in Table F, below. Table F: Alternative reverse transcriptases
- GsI-IIC intron RT (denoted GsI-IIC RT; sold commercially as TGIRT-III; InGex); see Stamos et al., Mol Cell. 2017 Dec 7;68(5):926-939.e4.
- RT sequences include:
- Eubacterium rectale RT (aka Marathon-RT; WT) SEQ ID NO: xx
- Human endogenous retrovirus K consensus (HERV-Kcon) RT SEQ ID NO: xx
- Geobacillus stearothermophilus GsI-IIC intron RT (GsI-IIC RT) pentamutants can also be used, e.g., comprising mutations D11R/N23R/G71R/G113K/P194R (positions bolded in SEQ ID NO:xxx, above).
- Exemplary MMLV RT sequences include the following:
- compositions and methods can make use of variants as known in the art and as provided herein, e.g., MarathonRT, GsI-IIC RT, and MMLV-RT variants, e.g., PE2 MMLV RT (with D200N, T306K, W313F, T330P, L603W mutations), or MMLV or PE2 MMLV RT truncations (truncations 2, 5, and 6;
- Additional effectors can be included in the present proteins and compositions.
- the PCEs, LCEs, or dual-overhang ligation approaches can be used to install a recombinase attachment site (att) at a desired position in the genome.
- Serine recombinases either fused to a PCE/LCE or expressed in trans, integrate a DNA donor containing a corresponding recombinase attachment site and a cargo of interest at the targeted location.
- Serine recombinases can include Bxb1, Pa01, BceINT, etc. (including those discovered from metagenomic mining efforts as described in Ref 25).
- the clkDNA templates used in the present compositions and methods include (i) a localization moiety, (ii) a polymerization template (PT), and (iii) a flap binding region (FBR).
- the clkDNA templates are in the order (i)-(ii)-(iii) from 5’ to 3’, but other configurations are possible (e.g., (ii)-(iii)-(i), e.g., wherein the clkDNA has a 3’ moiety (e.g., chloroalkane, etc. combined with a SNAP tag) rather than an HUH).
- the localization moiety is a sequence or modification that binds or links to the tethering domain on the DBDn, e.g., an HUH endonuclease recognition site (when the CE includes an HUH), biotin (when the CE includes avidin), a label with O6-benzylguanine derivatives (when the CE includes SNAP), a label with O2-benzylcytosine derivatives (when the CE includes CLIP-tag), or a label with a chloroalkane (when the CE includes HALO-tag).
- RNA or DNA hairpins can also be used to localize effectors (when the CE includes an RNA or DNA binding protein, such as a phage coat protein like MCP, PCP, BoxB, or Com).
- the polymerization template (PT) for use with PCEs includes a portion that encodes homology to the target genome, e.g., at least 3, 4, 5, 6, 7, 8, 9, or 10 nt long, and optionally up to 50, 100, 200, 250, or 500 nt long, and a portion that includes the edit that is at least 1 nt long.
- the flap binding region is complementary or partly complementary to the genomic flap released by the nickase (and thus in some embodiments the FBR is complementary or partly complementary to part of the gRNA protospacer sequence).
- the length of the genomic flap is the distance between the DNA nick on the NTS and equivalent NTS position that is analogous to the end of the TS/gRNA spacer, which will often be about 15-20, e.g., 17, nt but it can be target specific.
- the flap can be shorter (e.g., in the case of truncated gRNAs (Fu et al., Nat Biotechnol. 2014 Mar; 32(3): 279-284)) if the gRNA spacer region is shorter.
- the flap can be longer, though such arrangements may be thermodynamically less favorable, if the TS/NTS is unpaired outside of the gRNA spacer/TS region.
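The relationship between the ACO, the polymerization template, and the flap binding region can be sketched as a small design helper. This is a minimal illustration under stated assumptions, not the design procedure used in the source: the function names are hypothetical, the localization site is a placeholder string (not a real HUH recognition sequence), the FBR is taken to be fully complementary to the flap, and any cleavage of the recognition site by the HUH enzyme is ignored. Because the polymerase extends the 3’ end of the flap by copying the PT, a clkDNA in the order (i)-(ii)-(iii) from 5’ to 3’ carries the reverse complement of the ACO followed by the reverse complement of the flap.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase or lowercase DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_clkdna(localization_site: str, aco: str, flap: str) -> str:
    """Assemble a clkDNA 5'->3' as localization moiety + PT + FBR.

    PT  = reverse complement of the desired ACO (i.e., the cACO).
    FBR = reverse complement of the nicked NTS flap (assumed fully complementary).
    """
    polymerization_template = reverse_complement(aco)
    flap_binding_region = reverse_complement(flap)
    return localization_site + polymerization_template + flap_binding_region


def simulate_extension(clkdna: str, localization_site: str) -> str:
    """Strand expected after the flap is extended across the PT (flap + ACO)."""
    template = clkdna[len(localization_site):]
    return reverse_complement(template)


if __name__ == "__main__":
    site = "NNNNNNNNNN"                # placeholder, NOT a real recognition site
    aco = "AACCGGTTAACCGGT"            # hypothetical 15-nt ACO
    flap = "GATTACAGATTACAGAT"         # hypothetical 17-nt NTS flap
    clk = design_clkdna(site, aco, flap)
    print(clk)
    print(simulate_extension(clk, site) == flap + aco)
```

The round trip through `simulate_extension` is a quick consistency check that the designed template would write the intended ACO immediately 3’ of the flap.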
- more than two clkDNAs and gRNAs can be used when doing multiplex labeling of multiple fragments (either single or dual end).
- the labeling can be done all in a single reaction, with different ACOs/adapters depending on the fragment being labeled. Each ACO will have a corresponding adapter sequence.
- the sequence of a protein or nucleic acid used in a composition or method described herein is at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to a reference sequence set forth herein.
- the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes).
- the length of a reference sequence aligned for comparison purposes is at least 80% of the length of the reference sequence, and in some embodiments is at least 90% or 100%.
- amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared.
- when a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, the molecules are identical at that position (as used herein amino acid or nucleic acid “identity” is equivalent to amino acid or nucleic acid “homology”).
- the percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
- the comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm.
- the percent identity between two amino acid sequences can be determined using the Needleman and Wunsch ((1970) J. Mol. Biol. 48:444-453) algorithm that has been incorporated into the GAP program in the GCG software package (available on the world wide web at gcg.com), using the default parameters, e.g., a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
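The identity calculation described above can be illustrated on a pair of sequences that have already been aligned. This is a minimal sketch, not the GAP program: it assumes the alignment is supplied with `-` as the gap character and uses one common convention (identical positions divided by alignment columns). The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical aligned pair with one gap and one mismatch: 7 of 9 columns match.
print(round(percent_identity("ACGT-ACGT", "ACGTTACCT"), 2))
```

Because gapped columns stay in the denominator, each introduced gap lowers the reported identity, which is how the number and length of gaps are taken into account here.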
- PCE polymerase click editor
- PCV2-Cas9 was overexpressed in BL21(DE3) cells by IPTG induction. Protein was purified in an adapted protocol from Anders and Jinek (ref. 8). Cells were pelleted and lysed in 50 mM HEPES, 200 mM NaCl, 20% (w/v) sucrose, 15 mM imidazole, pH 7.4, and eluted from a 5 ml EconoFit Ni-charged IMAC column (BioRad) using a gradient from 15-500 mM imidazole followed by overnight incubation with SUMO protease Ulp1 at 4 °C, while dialyzing into 50 mM HEPES, 150 mM KCl, 5% glycerol, 1 mM DTT, 1 mM EDTA, pH 7.5.
- Cation exchange was performed using a 5 ml HiTrap SP HP column (GE Healthcare), eluting using a gradient from 100 mM to 1 M KCl. Protein was concentrated in a 100 kDa MWCO spin concentrator (Amicon), frozen in liquid nitrogen and stored at -80 °C. For experiments, a 60 µM stock of PCV2-Cas9 in 50% glycerol was used and stored at -20 °C.
- the sequence of the target DNA molecule (PCR substrate), which contains a chosen sgRNA sequence, is provided below.
- the 5-step, single-pot protocol proceeded as follows: (1) RNPs were formed using 100 pmol (1.1 µM final) sgRNA, 1X CutSmart Buffer (New England Biolabs; NEB), and 48 pmol PCV2-Cas9 protein (0.53 µM final) and were incubated at room temperature for 20 min. 100 pmol of clkDNA encoding an ACO complementary sequence (cACO) was then added, and the mixture was incubated an additional 10 min at room temperature. (2) The CAGE reaction mix was assembled using 5 pmol of PCV2-Cas9 RNPs, 10 U of E.
- Ligation efficiency was analyzed by several methods, including junction PCR (containing ~15 ng of template, 0.5 µM primers (Table 1), 0.4 U Q5 High-Fidelity DNA Polymerase (NEB), and 0.5 µM dNTPs, thermocycled at: 1 cycle, 98 °C for 3 min; 35 cycles, 98 °C for 10 sec, 66 °C for 15 sec, 72 °C for 20 sec; and hold at 4 °C), Sanger sequencing (sending approximately 100 ng of PCR product for sequencing), and ddPCR reactions (described below).
- PE-CAGE reactions were performed using in vitro transcribed pegRNAs (transcribed using T7 RiboMax Express Large Scale RNA Production System; Promega) generated from PCR templates that included a T7 promoter, an appropriate gRNA spacer and scaffold, and a 3’ extension including the primer binding site (PBS) and reverse transcriptase template (RTT) encoding the ACO to be installed at HEK site 3.
- PBS primer binding site
- RTT reverse transcriptase template
- Ligation efficiency was assessed via ddPCR reactions containing 100-200 ng of human genomic DNA (for gDNA targeting experiments) or 10 pg of PCR target (for PCR-substrate targeting experiments), ddPCR Supermix for Probes (Bio-Rad), HindIII-HF (0.25 U µl⁻¹, New England Biolabs), RPP30 (for gDNA experiments) or PCR-substrate (for PCR-substrate experiments) control primers and probes (Table 1; 900 nM each primer, 250 nM probe), and target/adaptor specific primers and probes (Table 1; 900 nM each primer, 250 nM probe), according to the manufacturer’s protocol.
- Droplets were generated using a QX200 Automated Droplet Generator (BioRad). Thermal cycling conditions were: 1 cycle, 95 °C for 10 min; 40 cycles, 94 °C for 30 sec, 58 °C for 1 min; 1 cycle, 98 °C for 10 min; and hold at 4 °C.
- PCR products were analyzed using a QX200 Droplet Reader (BioRad) and the number of “adaptor-ligated” target copies and “total” copies (defined by RPP30 copies) was calculated using QuantaSoft (v.1.7.4).
- Adaptor ligation efficiency was defined as the ratio of adaptor-ligated target copies to total copies.
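The efficiency definition above reduces to a single ratio. The following minimal sketch computes it from two ddPCR copy counts; the function name and the example copy numbers are hypothetical and are not data from the experiments described here.

```python
def adaptor_ligation_efficiency(ligated_copies: float, total_copies: float) -> float:
    """Ratio of adaptor-ligated target copies to total copies (e.g., reference copies)."""
    if total_copies <= 0:
        raise ValueError("total copies must be positive")
    if ligated_copies < 0:
        raise ValueError("ligated copies cannot be negative")
    return ligated_copies / total_copies


# Hypothetical counts: 450 adaptor-ligated copies out of 1500 total copies.
efficiency = adaptor_ligation_efficiency(450, 1500)
print(f"{efficiency:.1%}")
```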
- HUH-Cas9 fusion protein (comprised of the PCV2 HUH N-terminally fused to nuclease Streptococcus pyogenes Cas9 (SpCas9) via a 24-aa gsXTENgs linker), purchased E. coli DNA Polymerase I Klenow Fragment (exo-) (EcKlenow) (New England Biolabs), designed a gRNA to target a specific site on a PCR substrate, and designed and ordered a clkDNA containing an appropriate FBR and a 20 nt PT (Fig. 4A).
- Successful polymerization by the PCE using the clkDNA in vitro should lead to the writing of a specific 20 nt 3’ overhang onto the NTS of the PCR-based DNA substrate at the SpCas9 cleavage site encoded on the target substrate (Fig. 4A).
- a mock adaptor that encodes a 20 nt sequence complementary to the ACO sequence on the target substrate.
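The relationship between the genomic flap, the clkDNA, and the written overhang can be sketched with reverse complements. This is a minimal sketch under stated assumptions: the clkDNA is written 5'→3' as polymerization template (PT, i.e., cACO) followed by the FBR, following ordinary antiparallel base pairing. The optional localization site at the 5' end, the function names, and all sequences are hypothetical placeholders.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_clkdna(flap: str, aco: str, localization_site: str = "") -> str:
    """Return a clkDNA (5'->3') whose FBR anneals to the flap and whose PT encodes the ACO."""
    pt = reverse_complement(aco)    # template copied by the polymerase
    fbr = reverse_complement(flap)  # anneals to the 3' genomic flap
    return localization_site + pt + fbr


flap = "ACGTACGTACGTACGTA"      # hypothetical 17-nt flap
aco = "GATTACAGATTACAGATTAC"    # hypothetical 20-nt ACO
clkdna = design_clkdna(flap, aco)

# Extending the annealed flap along the PT should yield flap + ACO.
print(reverse_complement(clkdna) == flap + aco)
# An adaptor overhang that pairs with the written ACO is its reverse complement.
print(reverse_complement(aco))
```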
- thermostable ligases could enable the elimination of steps involving the manual addition of ligase and adaptor after Cas9 heat inactivation.
- PCV2-Cas9 RNPs, EcKlenow, Ligase, adaptor, and buffer could be mixed in a single tube and run on a thermocycler with an appropriate protocol (e.g. click editing at 37 °C for 30-120 min, followed by an elevated temperature at ~65 °C for 30-60 min).
- the incubation step at 65 °C should simultaneously inactivate/dissociate Cas9 while activating the ligase to complete the CAGE reaction.
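The single-tube protocol outlined above can be written down as a small thermocycler program. This is a minimal sketch: the temperatures and time ranges are the example values given in the text, while the `Step` class, the step names, and the helper function are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    min_minutes: int
    max_minutes: int


# Example values from the text: 37 °C for 30-120 min, then about 65 °C for 30-60 min.
ONE_POT_PROGRAM = [
    Step("click editing (cleavage and overhang writing)", 37.0, 30, 120),
    Step("Cas9 inactivation and thermostable ligation", 65.0, 30, 60),
]


def total_runtime_range(program: list[Step]) -> tuple[int, int]:
    """Shortest and longest total incubation time (minutes) for a program."""
    shortest = sum(step.min_minutes for step in program)
    longest = sum(step.max_minutes for step in program)
    return shortest, longest


print(total_runtime_range(ONE_POT_PROGRAM))
```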
- Example 3 Other compositions of CAGE using prime editors instead of click editors
- PEs prime editors
- PCEs in PE-CAGE reactions.
- PEs typically consist of a Cas9 nickase fused to or co-supplied with an RNA-dependent polymerase (e.g. reverse transcriptase (RT)) and an extended prime editing gRNA (pegRNA) that encodes a primer binding site (PBS) that anneals to the nicked NTS and a reverse transcription template (RTT) encoding the edit of interest (ref. 12) (Fig. 6A).
- RT reverse transcriptase
- pegRNA extended prime editing gRNA
- PBS primer binding site
- RTT reverse transcription template
- the PE-extended 3’ flap from the NTS would create an ACO, similar to how the PCE creates a 3’ ACO from the clkDNA.
- a PE-CAGE approach should be compatible with both modes of adaptor ligation described above for PCE-CAGE (Figs. 6B, 6C, 10A, and 10B).
- RT-mediated template extension past the RTT region of the pegRNA could lead to writing of nucleotides of the gRNA scaffold into the ACO (which could inhibit adaptor ligation).
- PE-CAGE may hold advantages for applications that require extensive and simultaneous multiplexed enrichment within a given sample since the ACO template is inherently coupled to the gRNA in the pegRNA (compared to the challenge of complexing gRNA-clkDNA pairs together separately in arrayed reactions, as would be needed for PCE-based approaches).
- PE RNPs can be formed with many pegRNAs in a single tube, whereas PCE RNPs must be complexed with clkDNAs separately to ensure correct linkage between target and ACO-writing template.
- PE-CAGE can be performed using an RNA-dependent RNA polymerase (RdRP), whereby an RNA ACO instead of a DNA ACO would be installed.
- RdRP RNA-dependent RNA polymerase
- a suitable ligase that ligates a 5’ DNA to a 3’ RNA on a DNA splint can be used (e.g., T4 RNA Ligase), leading to adaptor-ligated target products.
- PE-CAGE should enable at or near 100% enrichment, and improvements in efficiency (e.g. altering the sequence composition and length of the pegRNA RTT) could facilitate PE-CAGE enrichment from low-input samples.
- PE-CAGE has potential advantages over CAGE for applications requiring highly multiplexed targeting within a given sample.
- extraneous ACO extension in PE-CAGE could be minimized by: (1) screening RNA hairpins that could be inserted into the pegRNA scaffold at the end of the RTT, which might terminate reverse transcription, (2) using a pool of adaptors containing overhangs which account for scaffold read-through, (3) using synthetic modified pegRNAs that inhibit extension past the RTT (e.g. via use of a chemical linker or abasic site between the RTT and sgRNA scaffold, etc.), or (4) using split pegRNAs where the RTT/PBS molecule is separated from the gRNA.
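Option (2) above, a pool of adaptors that accounts for scaffold read-through, can be sketched by enumerating the overhangs that would be written if the reverse transcriptase copied 0 to k nucleotides of the scaffold after finishing the RTT. This is a minimal sketch, assuming the pegRNA is arranged 5'-spacer-scaffold-RTT-PBS-3' so that read-through copies the 3' end of the scaffold. The function names and sequences are hypothetical placeholders.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp_dna(seq: str) -> str:
    """Reverse complement returned as DNA; accepts RNA (U) or DNA (T) input."""
    return seq.upper().replace("U", "T").translate(_COMPLEMENT)[::-1]


def readthrough_overhangs(aco: str, scaffold: str, max_readthrough: int) -> list[str]:
    """Written 3' overhangs if the RT reads 0..max_readthrough nt into the scaffold."""
    if max_readthrough > len(scaffold):
        raise ValueError("read-through cannot exceed the scaffold length")
    overhangs = []
    for k in range(max_readthrough + 1):
        # The RT copies the last k scaffold bases, starting from the base next to the RTT.
        extra = revcomp_dna(scaffold[len(scaffold) - k:]) if k else ""
        overhangs.append(aco.upper() + extra)
    return overhangs


# Hypothetical ACO and scaffold 3' end; each adaptor overhang in the pool
# would be the reverse complement of one of these written overhangs.
for overhang in readthrough_overhangs("GATTACA", "GAGUCGGUGC", 3):
    print(overhang, revcomp_dna(overhang))
```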
- the enzymes or enzyme fusions used in PE-CAGE reactions may also comprise a variety of architectures, including a nuclease (N) for cleaving the substrate and an RNA-dependent polymerase (P) (e.g. a reverse-transcriptase; RT) for extending the template, separated by optional linkers (L).
- N nuclease
- P RNA-dependent polymerase
- RT reverse-transcriptase
- L optional linkers
- N+P nuclease and polymerase architecture
- RNA primer that through annealing with the FBR of the clkDNA would create an RNA:DNA substrate for extension by a DNA-dependent polymerase (e.g. EcKlenow) (Figs. 8B, 8C, 10C, and 10D).
- a DNA-dependent polymerase e.g. EcKlenow
- Dissociation of the PCV2-Cas13 RNP from the target RNA would leave a hybrid target RNA:DNA molecule tagged on the 3’ end with a DNA ACO whose sequence is defined by the PT of the clkDNA.
- this DNA ACO could then be used as a sequence for annealing and ligating a specific adaptor.
- the adaptor is ligated, and samples can then be further processed or directly loaded onto a sequencing platform (e.g. a nanopore sequencer).
- CAPTURE could also conceivably be performed with a prime editing-like system (PE-CAPTURE), where Cas13 nuclease can be fused to or co-supplied with a reverse transcriptase (or RdRP) and the Cas13 gRNA can be extended at the 3’ or 5’ end to include a PBS and an RTT containing the overhang sequence (Figs. 9A-9C, 10C, and 10D).
- PE-CAPTURE prime editing like system
- Example 5 CAGE using Cas9 nuclease and no ssDNA tethering domain.
- a tethering domain such as PCV2
- CAGE reactions were conducted on a PCR substrate with varying amounts of clkDNA containing or lacking a PCV2 recognition sequence and with PCV2-Cas9 or Cas9 alone (with no tethering domain). Reactions (containing cleaved PCR products with written flaps or cleaved products without flaps) were dG-tailed and amplified using a poly-C reverse primer and a junction-specific forward primer.
- While the combination of PCV2-Cas9 and a clkDNA containing the PCV2 recognition sequence was generally most efficient (measured by ddPCR), the inclusion of the PCV2 sequence alone was generally beneficial and CAGE could be conducted with Cas9 alone (no PCV2 endonuclease) (Figs. 11A-B). Similar trends were also observed on a gDNA substrate at the HEK3 locus.
Abstract
Described are methods for polymerizing nucleic acid overhangs onto target DNA or RNA molecules via DNA-dependent or RNA-dependent polymerization.
Description
METHODS FOR NUCLEIC ACID LABELING
CLAIM OF PRIORITY
This application claims the benefit of U.S. Provisional Patent Application Serial Nos. 63/458,519, filed on April 11, 2023, and 63/590,283, filed on October 13, 2023. The entire contents of the foregoing are hereby incorporated by reference.
FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
This invention was made with Government support under Grant Nos. HL 142494 and CA281401 awarded by the National Institutes of Health. The Government has certain rights in the invention.
TECHNICAL FIELD
Described are methods for polymerizing nucleic acid overhangs onto target DNA or RNA molecules via DNA-dependent or RNA-dependent polymerization.
BACKGROUND
The continued development of nucleic acid sequencing technologies has advanced various fields including genetics, functional genomics, genome editing, transcriptomics, and a range of other basic and medical science disciplines. Despite a dramatic reduction in the per-base cost of sequencing, some individual processes related to the use of these technologies remain inefficient and expensive. For instance, to prepare libraries of nucleic acid samples for sequencing, many approaches require the selective enrichment of target regions of interest via PCR-based amplification (to label the molecules of interest to interact with the sequencing platform and reduce background sequencing; Fig. 1A) (refs. 1-4), or without enrichment they rely on ultra-deep whole-genome or -transcriptome sequencing of the entire sample to achieve sufficient coverage of the region of interest (Fig. 1B). Both methods have caveats, as they can obscure important aspects of the sample (due to PCR bias, PCR error, and the elimination of native base modifications) or add orders-of-magnitude greater cost (due to sequencing the entirety of the sample), respectively.
SUMMARY
Methods that enable amplification-free enrichment of target DNA sequences in vitro would be transformative for DNA sequencing applications. Although PCR-based amplification of specific molecules can lead to high levels of enrichment, this process creates bias in the eventual sequencing library, eliminates native base modifications, and can be prone to intra-molecular template switching and read decomposition, which together result in sequencing data that does not accurately reflect the molecules of interest. Many amplification-free sequencing approaches have sought to solve this challenge, but they suffer from inadequate enrichment for the target molecules, leading to high levels of background (95%+) that reduce sequencing efficiency and throughput. Innovations that improve nucleic acid enrichment would streamline sequencing applications and dramatically reduce cost by avoiding sequencing unwanted off-target nucleic acid molecules. Here we describe methods for nucleic acid labeling or enrichment without amplification by leveraging RNA-programmable nucleases and polymerases. The principles of this approach are (1) user-specifiable sequence-specific nucleic acid cleavage, (2) direct writing of customizable adaptor-complementary ‘overhang’ sequences onto the target substrate nucleic acids of interest to create a modified nucleic acid, and (3) ligation of custom sequencing adaptors onto the nascent custom overhang to create a product nucleic acid (Fig. 1D). The Click-Assisted Genome Enrichment (CAGE) method is applicable to DNA substrates (Figs. 10A and 10B), whereas the Click-Assisted Precise Targeting of Unaltered RNA for Enrichment (CAPTURE) method is applicable to RNA substrates (Figs. 10C and 10D). We demonstrate the efficacy of these approaches to write and ligate DNA adaptor-complementary overhangs onto targeted substrate nucleic acids in various one- or two-step molecular reactions with minimal user intervention.
This approach is amenable to various modifications and extensible to other nucleic acid modalities or templates (e.g., RNA). The selective polymerization of the overhangs onto only the intended targeted substrate nucleic acids should lead to near perfect enrichment of the desired nucleic acid sequences, substantially reducing background sequencing of unwanted off-target molecules that are highly problematic for current amplification-free approaches. When combined with current short and long-read sequencing platforms, the amplification-free CAGE and CAPTURE methods facilitate a variety of applications including more accurate understanding of DNA and RNA editing outcomes, and more scalable and unbiased interrogation of genomic, epigenomic, transcriptomic, and epitranscriptomic biology.
The present methods use a nuclease with a DNA or RNA template (e.g., either click editor or prime editor) and a DNA- or RNA-dependent polymerase to install a DNA or RNA 3’ overhang onto a target DNA or RNA substrate.
Provided herein are methods comprising: preparing a reaction mixture comprising: (i) a DNA binding domain nuclease (DBDn), optionally Cas9, optionally linked or fused to a clkDNA tethering domain; (ii) one, two, or more guide RNAs (gRNAs) that bind to the DBDn and comprise a spacer comprising a sequence complementary to a selected target sequence on a substrate nucleic acid; (iii) one, two, or more clkDNA oligonucleotide templates (ssDNA, hybrid ssDNA/ssRNA, DNA, hybrid DNA/RNA, or modified DNA or RNA) comprising a localization moiety that binds to the clkDNA tethering domain, a sequence complementary to an adaptor-complementary overhang (ACO) sequence (cACO), and a flap binding region (FBR), and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(iii) or (i)-(iv) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the DBDn.
Additionally, provided herein are methods for generating a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an adaptor-complementary overhang (ACO) sequence. The methods comprise: (A) providing a reaction mixture prepared by a method described herein, comprising: (i) a DNA binding domain nuclease (DBDn), optionally Cas9, optionally linked or fused to a clkDNA tethering domain; (ii) one, two, or more guide RNAs (gRNAs) that bind to the DBDn and comprise a spacer comprising a sequence complementary to a selected target sequence on a substrate nucleic acid; and (iii) one, two, or more clkDNA oligonucleotide templates (ssDNA, hybrid ssDNA/ssRNA, DNA, hybrid DNA/RNA, modified DNA, or modified DNA/RNA bases) comprising a localization moiety that binds to the clkDNA tethering domain, a template complementary to the adaptor-complementary overhang (ACO) sequence (cACO), and a flap binding region (FBR), wherein (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the DBDn; (B) providing a sample comprising a substrate nucleic acid DNA, preferably genomic DNA isolated from a cell; and contacting the sample with the reaction mixture of (A) and a DNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, and incubating the sample under conditions wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; and the polymerase extends the 3’ single stranded flaps using the ACO complementary region as a template, thereby producing a modified nucleic acid of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence. In some embodiments, the substrate nucleic acid is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
In some embodiments, the methods for generating a product nucleic acid of DNA having ends comprised of defined adaptor or overhang sequences comprise: providing a sample comprising a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an ACO sequence produced by a method described herein; optionally treating the sample to inactivate any active enzymes, e.g., by heating the sample to above 60°C, e.g., to about 72°C; contacting the modified nucleic acid of DNA with a partially double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo) and a ligase, under conditions sufficient for the ligase to ligate the dsDNA oligo to the modified nucleic acid of DNA (e.g., 20-27°C or room temperature), thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences.
In some embodiments, the methods for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences comprise: providing a modified nucleic acid of DNA comprising 3’ overhangs comprising an ACO sequence produced by a method described herein; contacting the modified nucleic acid of DNA with a ligase and a single stranded oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo, hybrid ssDNA/ssRNA, DNA, hybrid DNA/RNA, modified DNA, or modified DNA/RNA bases) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid of DNA, thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties with dsDNA oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid of DNA (dsDNA click oligo) to attach the dsDNA click oligos to the ends of the further modified nucleic acid of DNA, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences.
Also provided herein are methods for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences. The methods comprise: (A) providing a reaction mixture prepared by a method described herein, comprising: (i) a DNA binding domain nuclease (DBDn), optionally Cas9, optionally linked or fused to a clkDNA tethering domain; (ii) one, two, or more guide RNAs (gRNAs) that bind to the DBDn and comprise a spacer comprising a sequence complementary to a selected target sequence; and (iii) one, two, or more clkDNA oligonucleotide templates (i.e., single stranded templates) comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the adaptor-complementary overhang (ACO) sequence, and a flap binding region (FBR), wherein (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the DBDn; (B) providing a sample comprising a substrate nucleic acid DNA, preferably genomic DNA isolated from a cell; and contacting the sample with: the reaction mixture of (A), a DNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo), and a ligase, optionally a thermostable ligase, incubating the sample under conditions (e.g., 37°C) wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; the polymerase extends the 3’ single stranded flaps using the ACO as a template, thereby producing a modified nucleic acid of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions wherein the thermostable ligase ligates the dsDNA oligo to the modified nucleic acid of DNA (e.g., 65°C when a thermostable ligase is used), thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences. In some embodiments, the substrate nucleic acid of DNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Also provided herein are methods for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences. The methods comprise: (A) providing a reaction mixture prepared by a method described herein, comprising: (i) a DNA binding domain nuclease (DBDn), optionally Cas9, optionally linked or fused to a clkDNA tethering domain; (ii) one, two, or more guide RNAs (gRNAs) that bind to the DBDn and comprise a spacer comprising a sequence complementary to a selected target sequence; and (iii) one, two, or more clkDNA oligonucleotide templates comprising a localization moiety that binds to the clkDNA tethering domain, a template for the adaptor-complementary overhang (ACO) sequence, and a flap binding region (FBR), wherein (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the DBDn; (B) providing a sample comprising a substrate nucleic acid DNA, preferably genomic DNA isolated from a cell; and contacting the sample with: the reaction mixture of (A), a DNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence, and a thermostable ligase, incubating the sample under conditions (e.g., 37°C) wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; the polymerase extends the 3’ single stranded flaps using the ACO complementary region as a template, thereby producing a modified nucleic acid of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions wherein the thermostable ligase is active (e.g., 65°C for a thermostable ligase, 20-27°C or room temperature for a non-thermostable ligase), for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid of DNA, thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties with dsDNA oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid of DNA (dsDNA click oligo) to attach the dsDNA click oligos to the ends of the further modified nucleic acid of DNA, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences. In some embodiments, the substrate nucleic acid of DNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Additionally, provided herein are methods comprising: preparing a reaction mixture comprising: (i) a DNA binding domain nuclease (DBDn), optionally Cas9; (ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding the complement of the ACO, a scaffold and spacer comprising a sequence complementary to a selected targeted nucleic acid substrate; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn.
Additionally, provided herein are methods for generating a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an adaptor-complementary overhang (ACO) sequence. The methods comprise (A) providing a reaction mixture prepared according to a method described herein, comprising (i) a DNA binding domain nuclease (DBDn), optionally Cas9; (ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding the complement of the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn; (B) providing a sample comprising a substrate nucleic acid DNA, preferably genomic DNA isolated from a cell; and contacting the sample with the reaction mixture of (A) and an RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, incubating the sample under conditions wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; and the polymerase extends the 3’ single stranded flaps using the ACO complementary region as a template, thereby producing a modified nucleic acid of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence. In some embodiments, the substrate nucleic acid of DNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Further, provided herein are methods of generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences. The methods comprise: providing a sample comprising a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an ACO sequence produced by a method described herein; optionally treating the sample to inactivate any active enzymes, e.g., by heating the sample to about 72°C; contacting the modified nucleic acid of DNA with a partially double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo) and a ligase, under conditions sufficient for the ligase to ligate the dsDNA oligo to the modified nucleic acid of DNA (e.g., 20-27°C or room temperature), thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences. In some embodiments, the substrate nucleic acid of DNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Also provided herein are methods of generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences. The methods comprise: providing a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an ACO sequence produced by a method described herein; contacting the modified nucleic acid of DNA with a ligase and a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid of DNA, thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties
with dsDNA oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid (dsDNA click oligo) to attach the dsDNA click oligos to the ends of the further modified nucleic acid, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences. In some embodiments, the substrate nucleic acid of DNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Additionally provided herein are methods for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences. The methods comprise: (A) providing a reaction mixture prepared by a method described herein, comprising (i) a DNA binding domain nuclease (DBDn), optionally Cas9; (ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding the complement of the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn; (B) providing a sample comprising a substrate nucleic acid DNA, preferably genomic DNA isolated from a cell; and contacting the sample with: the reaction mixture of (A), a RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang (e.g., 1-100 bp overhangs) comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo), and a ligase, optionally a thermostable ligase, incubating the sample under conditions (e.g., 37°C) wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; the polymerase extends the 3’ single stranded flaps using the ACO complementary region as a template, thereby producing a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions (e.g., 65°C for a thermostable ligase, 20-27°C or room temperature for 
non-thermostable ligase) wherein the thermostable ligase ligates the dsDNA oligo to the nucleic acid of DNA, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor
sequences. In some embodiments, the substrate nucleic acid of DNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
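The two-temperature incubation described above can be written down as a small program. This is only a sketch: the step names and the `single_pot_program` helper are hypothetical, the temperatures are the example values given in the text, and 25°C is an assumed pick from the stated 20-27°C range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float


def single_pot_program(thermostable_ligase: bool) -> list:
    """Return the example incubation steps for a combined reaction.

    37 C is the example temperature for cleavage and overhang writing.
    Ligation uses 65 C for a thermostable ligase; otherwise 25 C is an
    assumed value chosen from the 20-27 C range given in the text.
    """
    ligation_temp = 65.0 if thermostable_ligase else 25.0
    return [
        Step("cleave_and_write_overhang", 37.0),
        Step("ligate_adaptor", ligation_temp),
    ]


for step in single_pot_program(thermostable_ligase=True):
    print(f"{step.name}: {step.temp_c:.0f} C")
```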
Also provided herein are methods for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences. The methods comprise: (A) providing a reaction mixture prepared by a method described herein, comprising (i) a DNA binding domain nuclease (DBDn), optionally Cas9; (ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS) a reverse transcriptase template (RTT) encoding the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn; (B) providing a sample comprising a substrate nucleic acid DNA, preferably genomic DNA isolated from a cell; and contacting the sample with: the reaction mixture of (A), a RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence, and a thermostable ligase, incubating the sample under conditions (e.g., 37°C) wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; the polymerase extends the 3 ’ single stranded flaps using the ACO complementary region as a template, thereby producing a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions wherein the thermostable ligase is active (e.g., 65°C), for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid of DNA, 
thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties with dsDNA oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid of DNA (dsDNA click oligo) to attach the dsDNA click oligos to the ends of the further modified nucleic acid of DNA, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences. In some embodiments, the substrate nucleic acid of DNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Further provided herein are methods comprising: preparing a reaction mixture comprising: (i) an RNA binding domain nuclease (RBDn), optionally Cas13, optionally linked or fused to a clkDNA tethering domain; (ii) at least one guide RNA (gRNA) that binds to the RBDn and comprises a spacer comprising a sequence complementary to a selected target sequence; (iii) at least one clkDNA oligonucleotide template comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of the adaptor-complementary overhang (ACO) sequence, and a flap binding region (FBR); and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(iii) or (i)-(iv) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the RBDn.
Also provided herein are methods for generating a modified nucleic acid of RNA comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence, the method comprising: (A) providing a reaction mixture prepared by a method described herein, comprising: (i) a RNA binding domain nuclease (RBDn), optionally Casl3, optionally linked or fused to a clkDNA tethering domain; (ii) at least one guide RNA (gRNA) that binds to the RBDn and comprises a spacer comprising a sequence complementary to a selected target sequence; (iii) at least one clkDNA oligonucleotide template comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of the adaptor- complementary overhang (ACO) sequence, and a flap binding region (FBR), and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(iii) or (i)-(iv) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the RBDn; (B) providing a sample comprising substrate nucleic acid of RNA, preferably RNA isolated from a cell or sample from an animal (e.g., mRNA or purified RNA); and contacting the
sample with the reaction mixture of (A) and a DNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, incubating the sample under conditions wherein: the clkDNA oligonucleotide template binds to the RNA; and the polymerase extends the 3’ end of the RNA molecule using the ACO complementary region of the clkDNA as a template, thereby producing a modified nucleic acid of RNA comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence. In some embodiments, the substrate nucleic acid of RNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Additionally provided herein are methods for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences. The methods comprise: providing a sample comprising a modified nucleic acid of DNA comprising a 3’ end comprising an ACO sequence produced by a method described herein; optionally treating the sample to inactivate any active enzymes, e.g., by heating the sample to about 72°C; contacting the modified nucleic acid with a partially double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang (e.g., 1-100 bp overhang) comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo) and a ligase, under conditions sufficient for the ligase to ligate the dsDNA oligo to the modified nucleic acid (e.g., 20-27°C or room temperature for non-thermostable ligase), thereby producing a further modified nucleic acid of DNA having ends comprised of defined adaptor sequences. In some embodiments, the substrate nucleic acid of DNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Also provided herein are methods of generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences. The methods comprise: providing a modified nucleic acid of DNA comprising a 3’ end comprising an ACO sequence produced by a method described herein; contacting the modified nucleic acid with a ligase and a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ end and be ligated to the modified nucleic acid, thereby producing a further modified nucleic acid of dsDNA having a 5’ end comprising a click-compatible moiety; and contacting the further modified nucleic acid of DNA having a 5’ end comprising a click-compatible moiety with dsDNA oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid of DNA (dsDNA click oligo) to attach the dsDNA click oligos to the ends of the further modified nucleic acid of DNA, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences. In some embodiments, the substrate nucleic acid of DNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Further provided herein are methods for generating a product nucleic acid of RNA having ends comprised of defined adaptor sequences comprising: (A) providing a reaction mixture prepared by a method described herein, comprising: (i) an RNA binding domain nuclease (RBDn), optionally Cas13, optionally linked or fused to a clkDNA tethering domain; (ii) at least one guide RNA (gRNA) that binds to the RBDn and comprises a spacer comprising a sequence complementary to a selected target sequence;
(iii) at least one clkDNA oligonucleotide template comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of the adaptor-complementary overhang (ACO) sequence, and a flap binding region (FBR), and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)- (iii) or (i)-(iv) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the RBDn; (B) providing a sample comprising RNA; and contacting the sample with: the reaction mixture of (A), a RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang (e.g., 1-100 bp overhang) comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo), and a ligase, optionally a thermostable ligase, incubating the sample under conditions (e.g., 37°C) wherein: the clkDNA oligonucleotide templates bind to the RNA; the polymerase extends the 3’ end using the ACO complementary region as a template, thereby producing a modified nucleic acid of RNA comprising a ds 3’ end comprising an adaptor- complementary overhang (ACO) sequence, and incubating the sample under conditions (e.g., 65°C for a thermostable ligase, 20-27°C or room temperature for non-
thermostable ligase) wherein the thermostable ligase ligates the dsDNA oligo to the modified nucleic acid of RNA, thereby producing a product nucleic acid of RNA having ends comprised of defined adaptor sequences. In some embodiments, the substrate nucleic acid of RNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
A method for generating a product nucleic acid (e.g., RNA) having ends comprised of defined adaptor sequences, the method comprising: (A) providing a reaction mixture prepared by a method described herein, comprising: (i) a RNA binding domain nuclease (RBDn), optionally Cast 3, optionally linked or fused to a clkDNA tethering domain; (ii) at least one guide RNA (gRNA) that binds to the RBDn and comprises a spacer comprising a sequence complementary to a selected target sequence; (iii) at least one clkDNA oligonucleotide template comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of the adaptor-complementary overhang (ACO) sequence, and a flap binding region (FBR), and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(iii) or (i)-(iv) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the RBDn; (B) providing a sample comprising RNA; and contacting the sample with: the reaction mixture of (A), a RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence, and a thermostable ligase, incubating the sample under conditions (e.g., 37°C) wherein: the clkDNA oligonucleotide templates bind to the RNA; the polymerase extends the 3’ end using the ACO complementary region as a template, thereby producing a modified nucleic acid comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions (e.g., 65°C for a thermostable ligase, 20-27°C or room temperature for non-thermostable ligase) wherein the thermostable ligase 
is active, for the ssDNA click oligo to anneal to the 3’ end and be ligated to the modified nucleic acid, thereby producing a region of dsDNA having a 5’ end comprising a click-compatible moiety; and contacting the region of dsDNA
having a 5’ end comprising a click-compatible moiety with dsDNA oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid (dsDNA click oligo) to attach the dsDNA click oligos to the ends of the further modified nucleic acid, thereby producing a product nucleic acid having ends comprised of defined adaptor sequences. In some embodiments, the substrate nucleic acid is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Further, provided herein are methods comprising: preparing a reaction mixture comprising: (i) an RNA binding domain nuclease (RBDn), optionally Cas13; (ii) at least one pegRNA that binds to the RBDn and comprises a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn.
Also provided herein are methods for generating a further modified nucleic acid of RNA comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence, the method comprising: (A) providing a reaction mixture prepared according to a method described herein, comprising (i) an RNA binding domain nuclease (RBDn), optionally Cas13; (ii) at least one pegRNA that binds to the RBDn and comprises a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding a sequence complementary to the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn; (B) providing a sample comprising RNA; and contacting the sample with the reaction mixture of (A) and an RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, incubating the sample under conditions wherein: the clkDNA oligonucleotide templates bind to the RNA; and the polymerase extends the 3’ end using the ACO complementary region as a template, thereby producing a modified nucleic acid of
RNA comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence. In some embodiments, the substrate nucleic acid is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Further, provided herein are methods of generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences comprising: providing a sample comprising a modified nucleic acid of DNA comprising a 3’ end comprising an ACO sequence produced by a method described herein; optionally treating the sample to inactivate any active enzymes, e.g., by heating the sample to about 72°C; contacting the modified nucleic acid of DNA with a double stranded DNA oligonucleotide comprising defined adaptor sequences and a ssDNA 3’ overhang (e.g., 1-100 bp overhang) comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo) and a ligase, under conditions (e.g., 20-27°C or room temperature for non-thermostable ligase) sufficient for the ligase to ligate the dsDNA oligo to the modified nucleic acid of DNA, thereby producing a region of dsDNA having ends comprised of defined adaptor sequences. In some embodiments, the substrate nucleic acid of DNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Additionally provided herein are methods of generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences. The methods comprise: providing a modified nucleic acid of DNA comprising 3’ overhangs (e.g., 1-100 bp overhangs) comprising an ACO sequence produced by a method described herein; contacting the modified nucleic acid with a ligase and a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid, thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties with dsDNA oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid of DNA (dsDNA click oligo) to attach the dsDNA click oligos to the ends of the further modified nucleic acid, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences. In some embodiments, the
substrate nucleic acid of DNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Also provided herein are methods for generating a product nucleic acid of RNA having ends comprised of defined adaptor sequences. The methods comprise: (A) providing a reaction mixture prepared by a method described herein, comprising (i) a RNA binding domain nuclease (RBDn), optionally Casl3; (ii) one, two, or more pegRNAs that bind to the RBDn and comprise a 3’ extension including a primer binding site (PBS) a reverse transcriptase template (RTT) encoding the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn; (B) providing a sample comprising RNA; and contacting the sample with: the reaction mixture of (A), a RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang (e.g., 1-100 bp overhang) comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo), and a ligase, optionally a thermostable ligase, incubating the sample under conditions (e.g., 37°C) wherein: the clkDNA oligonucleotide templates bind to the RNA; the polymerase extends the 3’ end using the ACO complementary region as a template, thereby producing a modified nucleic acid of RNA comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions (e.g., 65°C for a thermostable ligase, 20-27°C or room temperature for non-thermostable ligase) wherein the thermostable ligase ligates the dsDNA oligo to the modified nucleic acid of RNA, thereby producing a modified nucleic acid of RNA having an end comprised of a defined adaptor sequence. 
In some embodiments, the substrate nucleic acid of RNA is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
A method for generating a product nucleic acid, e.g., RNA, having ends comprised of defined adaptor sequences, the method comprising: (A) providing a reaction mixture prepared by a method described herein, comprising (i) an RNA binding domain nuclease (RBDn), optionally Cas13; (ii) one, two, or more pegRNAs that bind to the RBDn and comprise a 3’ extension including a primer binding site (PBS) and a reverse transcriptase template (RTT) encoding a sequence complementary to the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn; (B) providing a sample comprising RNA; and contacting the sample with: the reaction mixture of (A), an RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence, and a thermostable ligase, incubating the sample under conditions (e.g., 37°C) wherein: the clkDNA oligonucleotide templates bind to the RNA; the polymerase extends the 3’ single stranded flaps using the ACO complementary region as a template, thereby producing a modified nucleic acid comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions (e.g., 65°C for a thermostable ligase, 20-27°C or room temperature for non-thermostable ligase) wherein the thermostable ligase is active, for the ssDNA click oligo to anneal to the 3’ end and be ligated to the modified nucleic acid, thereby producing a region of dsDNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid having 5’ ends comprising click-compatible moieties with dsDNA oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid (dsDNA click oligo) to attach the dsDNA click oligos to the ends of the further modified nucleic acid of RNA, thereby producing a further modified nucleic acid having ends comprised of defined adaptor sequences. In some embodiments, the substrate nucleic acid is at least 20, 25, 30, or 50bp, and up to 1, 2, 3, 4, 5, or 6Mbp.
Additionally provided are methods for labeling DNA or RNA as described herein.
Also provided herein are methods comprising: cleaving a nucleic acid using a sequence-specific nuclease (e.g., via a nuclease-active PCE construct comprising a CRISPR-Cas nuclease directed by a gRNA); writing adaptor-complementary
overhang (ACO) sequences onto a 3’ end of a cleaved nucleic acid (e.g., in a DNA version of this method via a DNA-dependent DNA polymerase of the PCE using the clkDNA as a template); and ligating an oligonucleotide, e.g., a sequencing adaptor, onto the 3’ ACO.
In some embodiments, the nucleic acid is DNA, and the nuclease is a Class II type II CRISPR, optionally CRISPR-Cas9.
In some embodiments, the nucleic acid is RNA, and the nuclease is a class II type VI CRISPR, optionally CRISPR-Cas13.
In some embodiments, the method further comprises sequencing the nucleic acid using the sequencing adaptor.
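The cleave, write, and ligate sequence described above can be followed on a single strand with a toy string model. The sketch below is illustrative only: it tracks one strand read 5’ to 3’, requires the adaptor overhang to be the exact reverse complement of the ACO, and uses hypothetical sequences and function names.

```python
# Toy single-strand model of: cleave -> write ACO -> ligate adaptor.
# Sequences and function names are hypothetical illustrations.

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def cleave(strand: str, cut_index: int) -> tuple:
    """Split a strand at cut_index, as a sequence-specific nuclease would."""
    return strand[:cut_index], strand[cut_index:]


def write_aco(fragment: str, aco: str) -> str:
    """Extend the fragment's 3' end with the ACO sequence."""
    return fragment + aco


def ligate_adaptor(fragment: str, aco: str, adaptor: str, overhang: str) -> str:
    """Join the adaptor strand to the ACO-tagged 3' end.

    overhang is the adaptor's 3' overhang; in this model it must be the
    exact reverse complement of the ACO for annealing to occur.
    """
    if not fragment.endswith(aco):
        raise ValueError("fragment does not carry the ACO at its 3' end")
    if overhang != reverse_complement(aco):
        raise ValueError("adaptor overhang is not complementary to the ACO")
    return fragment + adaptor


aco = "GATTACAGG"
left, right = cleave("AAACCCGGGTTT", cut_index=6)
tagged = write_aco(left, aco)
product = ligate_adaptor(tagged, aco, adaptor="TTTTCCCC",
                         overhang=reverse_complement(aco))
```

An adaptor whose overhang does not match the ACO is rejected, which mirrors the sequence specificity that the ACO is meant to provide.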
In some of the methods described herein, the DBDn can be a Cas-family enzyme (e.g., Cas9 or Cas12), a TnpB-family enzyme, or an IscB-family enzyme, e.g., as described herein.
As used herein, “linked” can include non-covalent attachment or association such as recruitment of the clkDNA through the gRNA (e.g., via RNA hairpin on the gRNA and RNA aptamer binding protein fused to a HUH, enabling recruitment of the clkDNA to Cas9 via the gRNA).
In some embodiments of the methods described herein, a plurality of different ACOs are used; in such methods, preferably a plurality of adaptors are used that correspond or are fully or partly complementary in sequence to the different ACOs (e.g., if two different ACO sequences are used, two different adaptor sequences are present in the mix, each complementary or partly complementary (e.g., at least 50%, 60%, 70%, 80%, 90%, or 95%, complementary) to one of the ACO sequences).
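One way to read the percent-complementarity language above is as the fraction of positions at which an adaptor overhang pairs with the ACO when the two are aligned antiparallel. The sketch below uses that reading as an assumption; it handles only equal-length, gap-free sequences, and the function names, sequences, and 90% default threshold are hypothetical choices for illustration.

```python
# Sketch: score adaptor overhangs against ACOs by percent complementarity.
# Assumes equal-length, gap-free, antiparallel pairing; names are hypothetical.

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def percent_complementarity(aco: str, overhang: str) -> float:
    """Percent of positions where overhang pairs with aco (antiparallel)."""
    if not aco or len(aco) != len(overhang):
        raise ValueError("sequences must be non-empty and of equal length")
    perfect_partner = reverse_complement(aco)
    matches = sum(a == b for a, b in zip(perfect_partner, overhang.upper()))
    return 100.0 * matches / len(aco)


def match_adaptors(acos: dict, overhangs: dict, threshold: float = 90.0) -> dict:
    """Map each ACO name to the adaptor names meeting the threshold."""
    return {
        aco_name: [
            adaptor_name
            for adaptor_name, overhang in overhangs.items()
            if percent_complementarity(aco_seq, overhang) >= threshold
        ]
        for aco_name, aco_seq in acos.items()
    }


acos = {"ACO_1": "GATTACAG", "ACO_2": "CCGGAATT"}
overhangs = {
    "adaptor_A": reverse_complement("GATTACAG"),
    "adaptor_B": reverse_complement("CCGGAATT"),
}
pairs = match_adaptors(acos, overhangs)
```

With two ACOs and two adaptors, each adaptor is assigned only to the ACO it complements, matching the two-ACO example in the text.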
In some embodiments of the methods described herein, the gRNAs are allowed to bind to the DBDn before the clkDNA oligonucleotide templates are added to the mixture.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Methods and materials are described herein for use in the present invention; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their
entirety. In case of conflict, the present specification, including definitions, will control.
Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.
DESCRIPTION OF DRAWINGS
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Figure 1A-D. Overview of methods for target molecule enrichment and sequencing. A, Schematic of amplification-based methods to enrich for target regions or to add sequencer-specific adapter sequences. B, Schematic of ultra-deep sequencing to permit adequate coverage of a region of interest. C, Schematic of current Cas9-targeted, phosphorylation-based enrichment approach. D, Schematic of one embodiment of a CAGE sequence-specific enrichment approach, leading to more selective background-free target substrate enrichment without amplification via user-specifiable adapter-complementary overhangs (ACOs). ROI, region of interest; DNA, deoxyribonucleic acid; RNA, ribonucleic acid.
Figure 2A-C. Workflow for exemplary CAGE and sequence-specific adaptor ligation to DNA targets. A, Ribonucleoprotein (RNP) formation of exemplary Click Editors using a DNA-binding domain nuclease (DBDn), HUH endonuclease, and DNA-dependent polymerase complex (where the DNA polymerase can be unfused or fused to the DBDn) in combination with a guide RNA (gRNA) and clkDNA pair. B, CAGE workflow using adaptors that contain a 3’ overhang complementary to the adaptor-complementary overhang (ACO) region of the modified DNA. C, CAGE workflow using oligo annealing to the ACO and subsequent adaptor attachment via click chemistry. ACO, adaptor-complementary overhang; cACO, complement of the ACO; gRNA, guide RNA; clkDNA, click DNA template oligonucleotide; FBR, flap-binding region; PT, polymerization template; RNP, ribonucleoprotein; DBDn, DNA binding domain nuclease; DNA, deoxyribonucleic acid; HUH, a histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease.
Figure 3A-B. Detailed exemplary 6-step and 5-step CAGE workflow. A, Schematic of an exemplary 6-step CAGE workflow. A heat inactivation step (step 4)
dissociates Cas9 from the target DNA, liberating the click-edited ACO for adaptor ligation in subsequent steps. The DNA Ligase and adaptors are subsequently added in step 5. B, Schematic of an exemplary 5-step CAGE workflow. DNA ligase and adaptor are added together with other components in a single reaction prior to click editing. No heat inactivation step is included. ACO, adaptor-complementary overhang; cACO, complement of the ACO; sgRNA, single guide RNA; gRNA, guide RNA; clkDNA, click DNA template oligonucleotide; FBR, flap-binding region; PT, polymerization template; DBDn, DNA binding domain nuclease; RNP, ribonucleoprotein; DNA, deoxyribonucleic acid; HUH, a histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease; RT, room temperature; dNTPs, deoxynucleoside triphosphates; ATP, adenosine triphosphate.
Figure 4A-G. CAGE-mediated adaptor ligation on PCR-generated fragments and human genomic DNA. A, Schematic of exemplary single-end CAGE workflow to assess feasibility of sequence-specific ACO addition to a substrate followed by adaptor ligation. B, Junction PCR across the adaptor:target junction of the product DNA to verify adaptor ligation following CAGE reactions on PCR fragments. Bands illustrating PCR amplification are only present in conditions that have the adaptor successfully ligated; assessed via Qiaxcel capillary electrophoresis (Qiagen). C, Sanger sequencing results of the expected product (from a 6-step CAGE reaction) from PCR substrate targeting experiments. D, Adaptor ligation efficiency assessed by ddPCR for 5- and 6-step reactions (against a PCR substrate) and reactions that varied the click editing incubation time. Individual datapoints, mean, and standard deviation shown for n=3 replicates. E, Junction PCR across the adaptor:target junction of the product DNA for human gDNA targeting (HEK site 3) experiments. Bands illustrating PCR amplification are only present in conditions that have the adaptor successfully ligated; assessed via Qiaxcel capillary electrophoresis (Qiagen). F, Sanger sequencing results of the expected product (from 6-step CAGE reactions) from gDNA targeting experiments. G, Adaptor ligation efficiency from gDNA targeting experiments assessed by ddPCR for 6-step CAGE reactions using a 2 hour click editing time. Individual datapoints, mean, and standard deviation shown for n=3 replicates. ACO, adaptor-complementary overhang; cACO, complement of the ACO; gRNA, guide RNA; clkDNA, click DNA template oligonucleotide; FBR, flap-binding region; PT, polymerization template; RNP, ribonucleoprotein; DNA,
deoxyribonucleic acid; HUH, a histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease; RT, room temperature; dNTPs, deoxynucleoside triphosphates; ATP, adenosine triphosphate; PCR, polymerase chain reaction; CAGE, click-assisted genome enrichment.
Figure 5. Use of thermostable DNA ligases in CAGE reactions. Exemplary 5-step CAGE workflow, similar to that shown in Fig. 3B, except that thermostable DNA ligases are used; thermostable ligases are activated at the same temperature at which the DBDn is inactivated, leading to single-pot click editing and adaptor ligation without user intervention. ACO, adaptor-complementary overhang; cACO, complement of the ACO; gRNA, guide RNA; sgRNA, single guide RNA; clkDNA, click DNA template oligonucleotide; FBR, flap-binding region; PT, polymerization template; DBDn, DNA binding domain nuclease; RNP, ribonucleoprotein; DNA, deoxyribonucleic acid; HUH, a histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease; RT, room temperature; dNTPs, deoxynucleoside triphosphates; ATP, adenosine triphosphate.
Figure 6A-C. Workflow for PE-CAGE and sequence-specific adaptor ligation to DNA targets. A, Ribonucleoprotein (RNP) formation of Prime Editors using a DNA-binding domain nuclease (DBDn) and RNA-dependent polymerase (e.g. reverse transcriptase (RT); where the RT can be unfused or fused to the DBDn) in combination with a prime editing guide RNA (pegRNA) and clkDNA pair. B, PE-CAGE workflow using adaptors that contain a 3’ overhang complementary to the adaptor-complementary overhang (ACO) region of the target DNA. C, PE-CAGE workflow using oligo annealing to the ACO and subsequent adaptor attachment via click chemistry. cACO, complement of the ACO; PBS, primer binding site; RTT, reverse transcriptase template.
Figure 7A-C. PE-CAGE-mediated overhang installation on human genomic DNA. A, Schematic of a single-end PE-CAGE experiment to assess feasibility of ACO installation using a human genomic DNA substrate (at HEK site 3). B, Junction PCR (across the ACO:gDNA target junction) and Sanger sequencing results of the modified nucleic acid showing the expected ACO installation at the target site. C, ACO installation efficiency on genomic DNA at HEK site 3, as assessed by ddPCR. ACO, adaptor-complementary overhang; cACO, complement of the ACO; RT, reverse transcriptase; PBS, primer binding site; RTT, reverse transcriptase
template; pegRNA, prime editing guide RNA; DNA, deoxyribonucleic acid; gDNA, genomic DNA.
Figure 8A-C. Workflow for CAPTURE and sequence-specific adaptor ligation to RNA targets. A, Ribonucleoprotein (RNP) formation of Click Editors using an RNA-binding domain nuclease (RBDn), HUH endonuclease, and DNA-dependent polymerase complex (where the DNA polymerase can be unfused or fused to the RBDn) in combination with a guide RNA (gRNA) and clkDNA pair. B, CAPTURE workflow using adaptors that contain a 3’ overhang complementary to the adaptor-complementary overhang (ACO) region of the target RNA. C, CAPTURE workflow using oligo annealing to the RNA ACO and subsequent adaptor attachment via click chemistry. clkDNA, click DNA template oligonucleotide; cACO, complement of the ACO; FBR, flap-binding region; PT, polymerization template; RNP, ribonucleoprotein; DNA, deoxyribonucleic acid; RNA, ribonucleic acid; HUH, a histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease; RT, room temperature; dNTPs, deoxynucleoside triphosphates; ATP, adenosine triphosphate; PCR, polymerase chain reaction; CAGE, click-assisted genome enrichment; pol, polymerase.
Figure 9A-C. Workflow for PE-CAPTURE and sequence-specific adaptor ligation to RNA targets. A, Ribonucleoprotein (RNP) formation of Prime Editors using an RNA-binding domain nuclease (RBDn) and RNA-dependent polymerase (e.g. reverse transcriptase (RT); where the RT can be unfused or fused to the RBDn) in combination with a prime editing guide RNA (pegRNA) and click DNA (clkDNA) pair. B, PE-CAPTURE workflow using adaptors that contain a 3’ overhang complementary to the adaptor-complementary overhang (ACO) region of the target RNA. C, PE-CAPTURE workflow using oligo annealing to the ACO and subsequent adaptor attachment via click chemistry. cACO, complement of the ACO; PBS, primer binding site; RTT, reverse transcriptase template; RNA, ribonucleic acid.
Figures 10A-D. Exemplary methodological flowcharts for CAGE and CAPTURE workflows. A, Exemplary CAGE workflow using adaptors with ACO-complementary 3’ overhangs. B, Exemplary CAGE workflow using oligo ligation and click-chemistry adaptor attachment. C, Exemplary CAPTURE workflow using adaptors with ACO-complementary 3’ overhangs. D, CAPTURE workflow using oligo ligation and click-chemistry adaptor attachment. For panels A-D, boxed steps in
the flowchart indicate potential user intervention. ACO, adaptor-complementary overhang; sgRNA, single guide RNA; pegRNA, prime editing guide RNA; clkDNA, click DNA template oligonucleotide; CAGE, click-assisted genome enrichment; CAPTURE, click-assisted precise targeting of unaltered RNA for enrichment; RNP, ribonucleoprotein; DBD, DNA binding domain; RBD, RNA binding domain; PE, prime editor; DNA, deoxyribonucleic acid; RNA, ribonucleic acid; HUH, a histidine-hydrophobic residue-histidine (HUH) motif-containing endonuclease; dNTPs, deoxynucleoside triphosphates; ATP, adenosine triphosphate.
Figures 11A-B. CAGE reactions on PCR substrates with and without a tethering domain. CAGE on a PCR substrate containing a HEK3 Cas9 sgRNA target sequence either with (A) PCV2-Cas9 or (B) Cas9. Reactions were done with a range of clkDNA concentrations (0.1 pM to 10 pM) and a clkDNA containing a 15bp cACO and a 17bp PBS. The clkDNA also contained or lacked a PCV2 recognition sequence. ACO writing percentage, measured by NGS, is shown for various lengths of ACO installed where 9, 11, and 13bp represent truncations and 15bp represents full ACO addition.
Figure 12. CAGE reactions on gDNA substrate with and without a tethering domain. CAGE on human (HEK293T) gDNA using an sgRNA targeting HEK3. Reactions were done with a range of clkDNA amounts (1 pmol, 5 pmol, or 10 pmol) and a clkDNA containing a 15bp cACO, a 17bp PBS, and a 13bp PCV2 sequence. ACO writing percentage, measured by NGS, is shown, accounting for reads containing the 15bp ACO with no additional PCV2 sequence installation and reads with the ACO and 4, 9, or 13 bp of the PCV2 sequence.
DETAILED DESCRIPTION
Targeted nucleic acid enrichment methods enable users to selectively sequence only regions of interest, reducing cost and labor while increasing throughput, sequence coverage, and resolution. Typically, PCR-based amplification is employed for target enrichment. However, PCR amplification introduces noise from low-level polymerase error and bias (particularly on repetitive/complex templates or for applications that have varying template sizes). Furthermore, PCR-based methods may be unfeasible for certain template compositions, and they eliminate native base modifications that are crucial to certain applications. Other methods for amplification-free target enrichment typically include the use of enzymes to selectively modify or cleave the target region⁵⁻⁷ (e.g., whole-sample dephosphorylation followed by target-specific exposure of 5’-phosphates following enzymatic cleavage by a restriction enzyme or CRISPR-Cas nuclease⁷; Fig. 1C). However, these methods often lack adequate target-molecule selectivity, require protracted protocols, exhibit inadequate background reduction (e.g. due to incomplete whole-sample dephosphorylation) that results in only about 0.5-5% target enrichment⁵⁻⁷, and/or necessitate the use of high quantities of input DNA, which together lead to low-to-moderate overall enrichment with high residual unwanted ‘background’ sequencing of non-target molecules. Thus, there is a need for methods that can significantly reduce cost by maximizing target region enrichment and sequencing coverage. Ideally, an enrichment workflow to capture only the intended nucleic acid regions prior to sequencing would have the following properties: it would be simple, require minimal user intervention, contain no amplification steps, be compatible with a range of nucleic acid inputs, and facilitate near-100% target molecule enrichment prior to sequencing (e.g. via selective adaptor ligation to target fragment(s)).
Here we describe methods that can dramatically reduce the cost of sequencing while preserving native samples by enabling amplification-free selective enrichment of target molecules. A key premise of these approaches is to selectively cleave the substrate nucleic acid molecule(s) of interest followed by targeted polymerization of an adaptor-complementary overhang (ACO) sequence onto this sequence to create a modified nucleic acid; the sequence complementary to the ACO (cACO) is encoded on a click DNA (clkDNA) oligonucleotide template (Fig. 1D). The sequence of the ACO is user-specifiable and can have a variety of different lengths and sequence compositions, e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nt and up to 10, 15, 20, 25, 30, 40, 50, 75, or 100 or more nt long, and ranges having any of the foregoing values as end points. The modified nucleic acid(s) or regions-of-interest (ROIs) comprising the ACO can then be subjected to downstream reactions that generate sequencing-competent product nucleic acid molecules; for instance, by ligation of a platform-specific sequencing adaptor where the adaptor has an overhang complementary to the ACO (e.g., for Illumina short-read or long-read sequencing, Oxford Nanopore Technologies (ONT) long-read sequencing, or Pacific Biosciences (PacBio) long-read sequencing). The selective addition of custom adaptor-complementary overhangs onto the modified target nucleic acid should virtually eliminate the background inherent to other enrichment methods, leading to maximal sequencing bandwidth and product molecule coverage. This method is compatible with any short- and long-read sequencing platform using appropriate adaptors.
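As a minimal sketch of the ACO/cACO relationship described above, the helper below derives the cACO that would be encoded on the clkDNA from a user-chosen ACO and checks the ACO length. The function names, the example sequence, and the default 1-100 nt bounds (one of the ranges named in the text) are illustrative assumptions, not sequences or software from this disclosure.

```python
# Illustrative sketch only; names and the example ACO are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return seq.translate(_COMPLEMENT)[::-1]


def design_caco(aco: str, min_len: int = 1, max_len: int = 100) -> str:
    """Return the cACO (5'->3') that would template the given ACO (5'->3')."""
    if not (min_len <= len(aco) <= max_len):
        raise ValueError(f"ACO length {len(aco)} outside {min_len}-{max_len} nt")
    return reverse_complement(aco)


example_aco = "ACGTTGCAAGGCTTA"          # hypothetical 15-nt ACO
example_caco = design_caco(example_aco)  # sequence to encode on the clkDNA
```

Because the cACO is simply the reverse complement of the ACO, the same helper can be applied to the cACO to recover the overhang that a polymerase would write.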
Thus, provided herein are methods of using a nuclease with a DNA or RNA template (either a click editor or prime editor) and a DNA- or RNA-dependent polymerase to install a DNA or RNA 3’ overhang onto a target DNA or RNA molecule, referred to herein as CAGE and CAPTURE, using various polymerases.
The methods can include installing a 3’ ACO onto a substrate nucleic acid where the ACO has complementarity to a sequencing adaptor, or ligating an oligo with a click chemistry moiety and using an adaptor with a compatible click chemistry moiety to attach an adaptor.
Any ACO sequence or adaptor sequence can be used that is compatible with any sequencing platform.
In addition, the adaptors can be modified with a moiety that can enable pull down of adaptor-ligated molecules from non-adaptor-ligated molecules (e.g., biotin on adaptor, then use streptavidin beads in final purification step); alternatively, oligo-conjugated beads can be used to pull down product nucleic acids with an ACO installed.
Click-Assisted Genome Enrichment (CAGE) / Click-Assisted Precise Targeting of Unaltered RNA for Enrichment (CAPTURE)
The present inventors recently described the development of click editors (CEs), a new set of technologies that enable the installation of a range of different DNA substrate edits or modifications in cells for genome editing, for eventual use in vivo, or for various uses in vitro. Exemplary polymerase click editors (PCEs) couple HUH endonucleases with RNA-guided DNA binding domains and DNA polymerases (where the DNA polymerase is either fused or supplied in trans), and can optionally be directed to a target site via a guide RNA (gRNA) (Fig. 2A). In PCEs where the DNA binding domain is a CRISPR-Cas9 nickase, the PCE-gRNA complex nicks the non-target DNA strand (NTS) of a target site (within a chromosome/genome), exposing the NTS DNA ‘flap’ for modification. During this reaction, a click DNA (clkDNA) template is provided in trans; the clkDNA template encodes an HUH endonuclease recognition site (e.g., a PCV2 recognition site), a polymerization template (PT), and a
NTS flap binding region (FBR); Fig. 2A. The HUH enzyme of the PCE cleaves and covalently binds to the clkDNA, permitting local recruitment and annealing of the FBR of the clkDNA with the nicked NTS of the target site, acting as a DNA:DNA duplex to recruit the DNA-dependent DNA polymerase (which can be PCE-fused or unfused) to initiate extension of the NTS using the PT portion of the clkDNA as a template. This process enables the selective writing of custom 3’ flaps into regions of genomic DNA in human cells, which through subsequent DNA repair and removal of the native 5’ flap leads to permanent installation of clkDNA-encoded sequence alterations. Flaps are the strands exposed after the DNA is cut by the nuclease; for Cas9-gRNA bound target DNAs, the 3’ flaps on the non-target strand are extended by the polymerase, so 3’ flaps are the new sequence written in from the polymerase domain of the PCE.
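The flap-extension step described above can be sketched in a few lines. This is a simplified model under stated assumptions: the clkDNA parts are taken to be arranged 5'-[HUH site][PT][FBR]-3', the FBR is taken to be exactly complementary to the 3' end of the nicked NTS flap, and extension is assumed to stop at the 5' end of the PT. All sequences and function names are hypothetical.

```python
# Illustrative sketch only; sequences and function names are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_clkdna(huh_site: str, pt: str, fbr: str) -> str:
    """Concatenate clkDNA parts, assumed order 5'-[HUH site][PT][FBR]-3'."""
    return (huh_site + pt + fbr).upper()


def extend_flap(nts_flap: str, pt: str, fbr: str) -> str:
    """Extend the nicked NTS 3' end using the PT as the template.

    The FBR must anneal to (be the reverse complement of) the 3' end of the
    flap; the polymerase then copies the PT, appending revcomp(PT).
    """
    nts_flap = nts_flap.upper()
    if not nts_flap.endswith(revcomp(fbr)):
        raise ValueError("FBR does not anneal to the 3' end of the NTS flap")
    return nts_flap + revcomp(pt)


flap = "TTGACCGATACGGCATC"      # hypothetical nicked NTS, 5'->3'
fbr = revcomp(flap[-10:])       # FBR designed against the last 10 nt
pt = "TAAGCCTTGC"               # hypothetical polymerization template
product = extend_flap(flap, pt, fbr)
```

In this model the newly written 3' sequence is always the reverse complement of the PT, which is why the PT can be designed directly from the desired overhang.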
Beyond the use of PCEs in cells, modified versions of PCEs can be deployed for a variety of uses in vitro. For instance, PCEs that comprise nucleases instead of nickases could enable simultaneous substrate cleavage and ‘tagging’ with user-specifiable sequences. And since the stoichiometries of different reaction components are easily titratable in vitro, the polymerase can be decoupled/unfused from the HUH-Cas9 fusion. The general principles of this approach are: (1) user-specifiable sequence-specific nucleic acid cleavage (e.g., via a nuclease-active PCE construct comprising a CRISPR-Cas nuclease directed by a gRNA), (2) direct writing of customizable adaptor-complementary 3’ ‘overhang’ (ACO) sequences onto the substrate nucleic acids of interest (e.g., in a DNA version of this method via a DNA-dependent DNA polymerase of the PCE using the clkDNA as a template), where the sequence of the ACO can be of various lengths and sequence compositions, and (3) ligation of custom sequencing adaptors onto the nascent custom 3’ overhang on the modified nucleic acid (Fig. 2A). We refer to this method as Click-Assisted Genome Enrichment (CAGE); it should in principle enable near 100% amplification-free enrichment of target DNA (compared to current standards of ~0.5-5%). CAGE permits the installation of long user-specifiable 3’ overhang sequences, offering numerous advantages compared to current Cas9-based enrichment methods that rely on single nucleotide (nt) overhangs for ligation of adaptors (and that, due to incomplete whole-sample dephosphorylation, still lead to ~95%+ background⁷). CAGE can be used to install an overhang on a single end of a nucleic acid (using one gRNA), or
onto both ends (using two or more gRNAs). While the CAGE method is applicable to DNA substrates (Figs. 10A and 10B), the Click-Assisted Precise Targeting of Unaltered RNA for Enrichment (CAPTURE) method is applicable to RNA substrates (Figs. 10C and 10D).
For various sequencing applications, sample multiplexing using barcodes can dramatically reduce sequencing costs by permitting the pooling of orthogonally barcoded samples into a single sequencing run. In the context of CAGE, sample multiplexing should be permitted by using adaptors that encode barcodes (as is done by various sequencing platforms including Oxford Nanopore Technologies (ONT), Pacific Biosciences (PacBio), Illumina, etc.).
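To illustrate the pooling idea, the sketch below assigns reads to samples by an adaptor-encoded barcode. It assumes the barcode sits at the very start of each read and must match exactly; the barcode sequences and sample names are hypothetical, and real platforms define their own barcode sets and typically tolerate sequencing errors.

```python
from collections import defaultdict

# Hypothetical barcodes; real platforms define their own barcode sets.
BARCODES = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by the barcode found at the start of each read.

    Reads with no matching barcode are collected under "unassigned".
    The barcode itself is trimmed from assigned reads.
    """
    bins = defaultdict(list)
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:  # no barcode matched this read
            bins["unassigned"].append(read)
    return dict(bins)


pooled = ["ACGTACGGG", "TGCATGAAA", "CCCCCC"]
by_sample = demultiplex(pooled)
```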
CAGE can be used, e.g., for single-end labeling of DNA substrates, or dual-end labeling of DNA substrates; CAPTURE can be used for single- or dual-end labeling of RNA substrates. Single-end labeling of DNA might be preferable for quantifying or sequencing substrates with known or ambiguous ends (e.g., near the end of chromosomes or DNA fragments, for unidirectional sequencing near CRISPR-Cas genome editing target sites, sequencing from ‘bait’ sites to identify genome-scale changes, or for unbiased assessment of large sequence DNA integration or inversion events). Dual-end labeling to enrich specific sequences from certain larger DNA substrates (e.g., chromosomes) with known DNA sequences may be preferable to improve sequencing efficiency and termination.
Within the CAGE method are multiple variations that differ at the adaptor ligation step. In some embodiments, following writing of the ACO, a single adaptor molecule can be used that harbors a 3’ overhang (complementary to the ACO) and the remainder of the double-stranded region appropriate to the sequencing platform. This single adaptor can be ligated to the ACO of the target site(s) in a single step (Figs. 2B and 10A). In some embodiments, a two-step reaction is used. A single-stranded oligonucleotide that is complementary to the ACO and that contains a 5’ modification compatible with click chemistry is annealed and ligated to the ACO/target modified nucleic acid molecule. This further modified nucleic acid is then combined in a reaction with a second adaptor containing the appropriate modification to enable click chemistry with the initial ligated oligo, facilitating the rapid attachment of adaptors to the target sequence(s) resulting in a product nucleic acid (Figs. 2C and 10B). An example of this type of adaptor is Oxford Nanopore Technologies’ “Rapid Adaptor”.
The ability to polymerize a DNA sticky-end overhang onto target DNA or RNA sequences represents a powerful approach to enable targeted, amplification-free enrichment of nucleic acid sequences in vitro for various sequencing applications and beyond. The use of click editors and prime editors of various compositions underlies CAGE and CAPTURE, making this process simple and cost-effective by requiring only simple sgRNA-clkDNA pairs or pegRNAs. These technologies and methods enable predictable and efficient overhang installation through simple single-pot reactions with only one or two steps of user intervention (for a total of ~1-3 hrs of protocol time). CAGE offers major advantages compared to current amplification-free targeted enrichment approaches by facilitating highly specific adaptor ligation through overhang-adaptor complementarity instead of phosphorylation states⁷, reducing background in sequencing runs to enable much higher coverage and resolution at significantly lower costs. Currently, there are no comparable methods to CAPTURE for targeted, amplification-free enrichment of RNA sequences.
The applications of CAGE and CAPTURE are wide-ranging across basic biology - genomics, epigenomics, transcriptomics, epitranscriptomics, diagnostics, microbial/microbiome studies, etc. Additionally, CAGE can facilitate a more thorough investigation of intended and unintended genomic or epigenetic alterations by genome editing technologies at endogenous loci, particularly as new genome editing technologies capable of larger sequence edits continue to emerge (e.g. insertions, deletions, inversions, etc.). Similarly, CAPTURE could also be used to profile the potential of RNA editing technologies to edit the transcriptome or alter RNA modifications.
In principle, CAGE enables at or near 100% enrichment (percent of sequencing reads that are expected to be attributable to the desired target region) of target sequences because this method relies on highly specific adaptor ligation, and only adaptor-ligated product nucleic acids are sequenced (whereas other Cas9-based approaches achieve only 0.5-5% enrichment due to inherent non-specific adaptor ligation to non-target molecules⁵⁻⁷). Even with 10-30% ligation efficiency of the adaptor to the ACO-extended region-of-interest, only the adaptor-ligated molecules should be sequenced (resulting in at or near 100% enrichment). Notably, adaptor ligation efficiency dictates required DNA input amount and does not determine enrichment. Improvements to the ACO and ligation efficiency will facilitate
compatibility with low and ultra-low input samples (e.g. varying the ACO sequence composition and length, inclusion of other factors like single-strand binding protein (SSB) in the reaction, etc.). Furthermore, the adaptor itself can be customized to be compatible with a broad range of sequencing platforms or for other uses, as well as modified to facilitate further physical isolation of adaptor-ligated products from non-adaptor-ligated molecules (e.g. adaptors containing a functional group like a biotin moiety, that would permit adaptor-ligated molecules to be isolated or pulled down from the population of molecules using streptavidin beads).
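The distinction drawn above between enrichment and ligation efficiency can be made concrete with a small calculation. This is a toy model under stated assumptions: enrichment is taken as the fraction of adaptor-ligated molecules that are on-target, and the molecule counts and ligation rates are hypothetical inputs rather than measured values.

```python
def enrichment(target_molecules, background_molecules,
               on_target_ligation, off_target_ligation):
    """Fraction of adaptor-ligated (sequenceable) molecules that are on-target."""
    on = target_molecules * on_target_ligation
    off = background_molecules * off_target_ligation
    total = on + off
    return on / total if total else 0.0


def required_input(desired_ligated_targets, on_target_ligation):
    """Target molecules needed to yield the desired number of ligated targets."""
    return desired_ligated_targets / on_target_ligation


# With no off-target ligation, enrichment is complete at any ligation efficiency.
e_low = enrichment(1_000, 1_000_000, 0.10, 0.0)
e_high = enrichment(1_000, 1_000_000, 0.30, 0.0)

# Non-specific ligation to background molecules is what lowers enrichment.
e_leaky = enrichment(1_000, 1_000_000, 0.30, 0.01)

# Lower ligation efficiency raises the input needed, not the background.
input_at_10 = required_input(100, 0.10)
input_at_30 = required_input(100, 0.30)
```

In this model, changing the on-target ligation efficiency from 10% to 30% leaves enrichment unchanged when off-target ligation is zero, but reduces the number of input target molecules needed for a given yield.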
Architectures
The enzymes or enzyme fusions used in CAGE and CAPTURE reactions can comprise a variety of architectures, including a domain for tethering (T) the template (also referred to herein as clkDNA tethering domain), a DNA binding domain nuclease (DBDn) for cleaving the substrate, and a polymerase (P) for extending the template, separated by optional linkers (L). Although herein we utilized a tethering-linker-nuclease with polymerase in trans architecture (T-L-DBDn+P), where the tethering domain (for template recruitment) was the HUH enzyme PCV2 and the DNA binding domain nuclease (DBDn) was SpCas9, alternate orientations could also include the optional fusion of a polymerase. Examples of alternate architectures beyond T-L-DBDn+P include but are not limited to: DBDn-L-T+P, T-L-DBDn-L-P, T-L-P-L-DBDn, DBDn-L-T-L-P, DBDn-L-P-L-T, P-L-DBDn-L-T, or P-L-T-L-DBDn (where DBDn is the nuclease, T is the tethering domain, P is the polymerase, and L is a linker; each of varying compositions). The nuclease domain (DBDn) may be SpCas9 or other nucleases; the tethering domain (T) may be an HUH enzyme (e.g. PCV2 or others) or other tethering domains; the polymerase domain (P) may be E. coli Klenow fragment or other polymerases; and the linkers (L) may be of various lengths and amino acid compositions.
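The architectures listed above follow a simple combinatorial pattern, which the sketch below enumerates: every N-to-C ordering of the three fused domains, plus the two-domain fusions with the polymerase supplied in trans. The function names are hypothetical, and the enumeration covers only the listed examples, not every possible architecture.

```python
from itertools import permutations


def fused_architectures(domains=("T", "DBDn", "P"), linker="L"):
    """All N-to-C orderings of the fused domains, joined by a linker."""
    sep = f"-{linker}-"
    return [sep.join(order) for order in permutations(domains)]


def trans_polymerase_architectures(linker="L"):
    """Two-domain fusions with the polymerase supplied in trans (+P)."""
    return [f"{a}-{linker}-{b}+P" for a, b in permutations(("T", "DBDn"))]


all_architectures = trans_polymerase_architectures() + fused_architectures()
```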
Thus the DNA polymerase can be either fused or unfused from the DBDn-T complex; in addition, the T (tethering domain) can also be recruited or linked to the DBDn via other methods instead of direct fusion, including but not limited to recruitment through the gRNA; if any component is unfused, that component can optionally be recruited or linked to the target site through a protein recruitment domain, e.g., phage coat proteins (CP) (coupled with sgRNAs encoding the corresponding RNA recognition hairpin recognized by a given coat protein).
Thus, one method to selectively recruit and link proteins or domains to specific target sites is to fuse or recruit the effector protein-of-interest (e.g., polymerase and/or HUH endonuclease or other tethering domain) to a recruitment domain (e.g., an RNA binding protein such as MCP, PCP, or Com RNA binding protein⁴¹,⁴²; e.g., the MS2-coat protein (MCP)) that then interacts with a specific hairpin sequence encoded within that gRNA (e.g., viral RNA sequences MS2, PP7, and com, e.g., an MS2 hairpin). This permits selective recruitment of the effector to the site bound by the primary gRNA; for example, the gRNA targeting the secondary nicking site would not harbor an MS2 hairpin, preventing recruitment to that site.
To recruit proteins to the target site, protein complexes can be formed using protein recruitment domains such as coiled-coil (CC) protein domains, leucine zippers (LZs), or SunTags, which permit protein:protein interactions (among other types of protein recruitment strategies). Coiled-coil domains are known in the art, see, e.g., Woolfson, Adv Protein Chem. 2005;70:79-112 (design of coiled-coil structures and assemblies); Grigoryan and Keating, Curr Opin Struct Biol. 2008 Aug;18(4):477-83 (structural specificity in coiled-coil interactions); Reinke et al., J. Am. Chem. Soc. 2010, 132, 17, 6025-6031 (synthetic coiled-coil interactome, heterospecific modules for molecular engineering); Ljubetic et al., Nature Biotechnology 35:1094-1101 (2017) (coiled-coil protein-origami cages that self-assemble in vitro and in vivo); Fink et al., Nature Chemical Biology 15:115-122 (2019) (orthogonal CC dimerizing domains); Lebar et al., Nature Chemical Biology 16:513-519 (2020) (orthogonal coiled-coil domains); Plaper et al., Scientific Reports 11:9136 (2021) (coiled-coil heterodimers); and Lainscek et al., Nature Communications 13:3604 (2022) (coiled-coil heterodimer-based recruitment of an exonuclease to CRISPR/Cas). Exemplary coiled-coil sequences include the following:
Name    AA Sequence of Exemplary Coiled-Coil Domain
P1      EIQALEE ENAQLEQ ENAALEE EIAQLEY
P2      KIAQLKE KNAALKE KNQQLKE KIQALKY
P3      EIQQLEE EIAQLEQ KNAALKE KNQALKY
P4      KIAQLKQ KIQALKQ ENQQLEE ENAALEY
P3S     EIQQLEE EISQLEQ KNSQLKE KNQQLKY
P4S     KISQLKQ KIQQLKQ ENQQLEE ENSQLEY
P5      ENAALEE KIAQLKQ KNAALKE EIQALEY
P6      KNAALKE EIQALEE ENQALEE KIAQLKY
P7      EIQALEE KNAQLKQ EIAALEE KNQALKY
P8      KIAQLKE ENQQLEQ KIQALKE ENAALEY
P9      ENQALEQ KNAQLKQ EIAALEQ EIAQLEY
P10     KNAQLKE ENAALEE KIQQLKE KIQALKY
P11     ENQALEQ EIAQLEQ EIAALEQ KNAQLKY
P12     KNAQLKE KIAALKE KIQQLKE ENQALEY
N5      EIAALEA KIAALKA KNAALKA EIAALEA
N6      KIAALKA EIAALEA ENAALEA KIAALKA
AP4     ELAANEE ELQQNEQ KLAQIKQ KLQAIKY
Exemplary combinations of CC domains include P1:P2; P3:P4; P3:P4S; P3S:P4; P3S:P4S; P5:P6; P7:P8; P9:P10; P11:P12; P3:P4; N5:N6; P3:AP4; and P3S:P4S.
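As a small illustration of how the listed peptides are organized, the sketch below splits two of them (P1 and P2, with spacing normalized) into heptads and tallies lysine versus glutamate. The K-minus-E tally is only an illustrative heuristic for seeing that the two partners of a pair carry opposite overall charge; it is an assumption made here and not a design rule stated in this document.

```python
# P1 and P2 as listed above (spacing normalized); the charge tally is an
# illustrative heuristic, not a pairing rule from this document.
COILS = {
    "P1": "EIQALEE ENAQLEQ ENAALEE EIAQLEY",
    "P2": "KIAQLKE KNAALKE KNQQLKE KIQALKY",
}


def heptads(seq):
    """Split a space-delimited coiled-coil sequence into its heptad repeats."""
    return seq.split()


def net_charge(seq):
    """Count lysines (+1) minus glutamates (-1); other residues are ignored."""
    residues = seq.replace(" ", "")
    return residues.count("K") - residues.count("E")


charges = {name: net_charge(seq) for name, seq in COILS.items()}
```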
Leucine zippers (LZs) are also known in the art. See, e.g., Amoutzias et al., Trends Biochem Sci. 2008 May;33(5):220-9; Bader and Vogt, (2006). Leucine Zipper Transcription Factors: bZIP Proteins. In: Encyclopedic Reference of Genomics and Proteomics in Molecular Medicine. Springer, Berlin, Heidelberg, doi.org/10.1007/3-540-29623-9_2180; and Busch and Sassone-Corsi, Trends Genet. 1990 Feb;6(2):36-40 (see, e.g., exemplary LZ domain sequences in Fig. 1 of this paper; examples include: GCN4, yAP-1, C/EBP, CREB, CRE-BP1, c-Jun, JunB, JunD, FosB, Fra-1, and c-Fos).
SunTags are described in Tanenbaum et al., Cell. 2014 Oct 23;159(3):635-46. Exemplary sequences include: GCN4: LLPKNYHLENEVARLKKLVGER; GCN4 variant: EELLSKNYHLENEVARLKK; and ScFv-GCN4: GPDIVMTQSPSSLSASVGDRVTITCRSSTGAVTTSNYASWVQEKPGKLFKGLIGGTNNRAPGVPSRFSGSLIGDKATLTISSLQPEDFATYFCALWYSNHWVFGQGTKVELKRGGGGSGGGGSGGGGSSGGGSEVKLLESGGGLVQPGGSLKLSCAVSGFSLTDYGVNWVRQAPGRGLEWIGVIWGDGITDYNSALKDRFIISKDNGKNTVYLQMSKVRSDDTALYYCVTGLFDYWGQGTLVTVSS.
clkDNA Tethering Domain
The clkDNA tethering domain can include an HUH endonuclease (see, e.g., Table A) or a different ssDNA localization moiety, such as avidin (when the clkDNA is labeled with biotin), SNAP-tag (when the clkDNA is labeled with benzylguanine derivatives), CLIP-tag (when the clkDNA is labeled with benzylcytosine derivatives) (or other O6-alkylguanine-DNA-alkyltransferase derivatives) and HALO-tag or other haloalkane dehalogenase derivatives (when the clkDNA is labeled with a chloroalkane) (Table B).³⁴ In some embodiments, HUH endonucleases are preferred
as they do not require specialized and expensive chemical modifications on the clkDNA and bind covalently to their ssDNA substrate with high efficiency. Exemplary HUH endonucleases (Table A) include PCV2 HUH domain; DCV HUH domain; FBNYV HUH domain; RepBm HUH domain; TraI relaxase domain; dPCV2 (Y96F) HUH domain, MSMV HUH domain, TGMV HUH domain, ChiSCV-GT306, ChiSCV-GM510, ChiSCV-GM415, and other HUH domains described in Li, L. et al.¹⁵
*nonanucleotide sequence is bolded where known, additional sequence from 5' and 3' stem sequence flanking nonanucleotide sequence
Table B. Alternative clkDNA tethering domains
mSA-H, monomeric streptavidin; eMA, enhanced monomeric avidin
DNA binding domains
The CEs described herein optionally include an RNA-programmable DNA nickase that nicks the NTS, or a nuclease, including nickases from Cas-family enzymes (e.g., Cas9 or Cas12), TnpB-family, or IscB-family enzymes (Table C). See, e.g., Kapitonov et al., J Bacteriol. 2016 Mar 1; 198(5): 797-807; Karvelis et al.,
Nature. 2021; 599(7886): 692-696 (TnpB); Koonin and Makarova, PLoS Biol. 2022 Jan; 20(1): e3001481; Mingarro et al., Gene, 852: 147064 (2023); Altae-Tran et al., Science. 2021 Oct;374(6563):57-65 (TnpB and IscB); Meers et al., bioRxiv 2023.03.14.532601 (TnpB and IscB); Schuler et al., Science. 2022 Jun 24;376(6600): 1476-1481; Kato et al., Nat Commun. 2022 Nov 7; 13(1):6719.
Nickases can be generated from wild type RNA-programmable DNA nucleases by the introduction of a mutation of a catalytic RuvC-II residue or a mutation of a catalytic HNH residue (Table C). For example, A. warmingii IscB nickases can include an H212A or E157A mutation; IscB nickases from other species can include corresponding mutations; see, e.g., WO 2022/087494. The nickase can also include one or more mutations that increase activity, reduce off-target effects, and/or alter protospacer adjacent motif (PAM) or target adjacent motif (TAM) specificity (Tables D and E). Exemplary Cas9 and Cas12 nickases and mutations are shown in Tables C-E.
* for Cas9 and IscB enzymes, the RuvC domain nicks the non-target strand (NTS) DNA and the HNH domain nicks the target strand (TS) DNA. For Cas12a/Cpf1 or TnpB enzymes, the RuvC domain nicks both DNA strands. Mutations abrogate activity.
The sequence of ogeuIscB is as follows (from metagenome genome assembly, contig: NODE_25_length_150080_cov_8.882980; contig accession: OGEU01000025.1):
MAVVYVISKSGKPLMPTTRCGHVRILLKEGKARVVERKPFTIQLTYESAEETQ PLVLGIDPGRTNIGMSVVTESGESVFNAQIETRNKDVPKLMKDRKQYRMAHR
RLKRRCKRRRRAKAAGTAFEEGEKQRLLPGCFKPITCKSIRNKEARFNNRKRP VGWLTPTANHLLVTHLNVVKKVQKILPVAKVVLELNRFSFMAMNNPKVQR WQYQRGPLYGKGSVEEAVSMQQDGHCLFCKHGIDHYHHVVPRRKNGSETL ENRVGLCEEHHRLVHTDKEWEANLASKKSGMNKKYHALSVLNQIIPYLADQ
LADMFPGNFCVTSGQDTYLFREEHGIPKDHYLDAYCIACSALTDAKKVSSPK
GRPYMVHQFRRHDRQACHKANLNRSYYMGGKLVATNRHKAMDQKTDSLE
EYRAAHSAADVSKLTVKHPSAQYKDMSRIMPGSILVSGEGKLFTLSRSEGRN
KGQVNYFVSTEGIKYWARKCQYLRNNGGLQIYV
* predicted based on UniRule annotation on the UniProt database.
Effector Proteins
The methods and compositions described herein further include effector proteins that have DNA polymerase activity, e.g., DNA-dependent DNA polymerases, e.g., of family A, B, C, D, X, or Y, or RNA-dependent DNA polymerases (also known as reverse transcriptases) (for polymerase click editors (PCEs)). Exemplary polymerases include E. coli Klenow (EcKlenow, optionally with the D355A and/or E357A mutations that deactivate its 3’-5’ exonuclease domain); Taq Stoffel; Pol-Beta; Pol-Beta + Sso7d; Phi29 DNA Polymerase (D169A); Sequenase; T4 DNA Polymerase, and E. coli dKlenow (optionally with the D355A, E357A, D705A, and/or D882A mutations). Exemplary reverse transcriptases are described in more detail below. In some methods a ligase is used; exemplary DNA Ligases include T3 DNA ligase; T4 DNA ligase; T7 DNA ligase; ChlV (SplintR) DNA ligase; PhiKMV DNA
ligase; Vaccinia DNA ligase; E. coli DNA ligase; Taq DNA ligase; 9°N DNA ligase; or Hi-T4 DNA Ligase. Methods that use a clkDNA would need a DNA-dependent DNA polymerase, and methods that use a pegRNA would need an RNA-dependent DNA polymerase.
Previous literature on Taq DNA polymerase demonstrated activity-enhancing mutations at amino acid positions 732 (D732N)¹⁶, 507¹⁷, 543¹⁸, 605/617¹⁹, 685/686/687 (US 11046939B2), and 742/743 with or without a basic residue insertion of length 3 or length 9 between positions 738 and 739²⁰. EcKlenow is also structurally homologous to TaqStoffel, hinting that analogous mutations may also increase the activity of EcKlenow. Moreover, non-specific DNA binding domains (e.g. villin headpiece, supercharged villin headpiece, Sso7d, NeqSSB, etc.) fused to the N- or C-terminus of DNA polymerases have been shown to increase the DNA affinity, stability, and processivity of the polymerase²¹⁻²³. Exchange of the 3’-5’ exonuclease domain of TaqStoffel with that of EcKlenow may also endow TaqStoffel with proofreading capability, as has been done previously²⁴. Thus, the present compositions and methods can use a DNA polymerase, such as EcKlenow or TaqStoffel, which includes one or more of these modifications.
Reverse Transcriptases (RTs), Reduced Size RTs, and Variant RTs
The present compositions and methods can use any RT, including Group II introns. Group II introns are retroelements that consist of a self-splicing ribozyme and an intron encoded protein (IEP) which functions as a reverse transcriptase (RT), DNA endonuclease, and RNA maturase.
In some embodiments, the pentamutant Moloney Murine Leukemia Virus reverse transcriptase (MMLV-RT) can be used. The group II intron RT (commercially available as “MarathonRT”) from Eubacterium rectale (E.r.) has been shown to display superior intrinsic RT processivity compared to SuperScript IV. As shown herein, substitution of the M-MLV RT in a PE with MarathonRT or other RTs resulted in efficient prime editing in the HEK293T cell line. Additional exemplary alternative RTs include those listed in Table F, below.
Table F: Alternative reverse transcriptases
*Geobacillus stearothermophilus GsI-IIC intron RT (denoted GsI-IIC RT; sold commercially as TGIRT-III; InGex); see Stamos et al., Mol Cell. 2017 Dec 7;68(5):926-939.e4.
Exemplary RT sequences include:
Eubacterium rectale RT (aka Marathon-RT; WT) SEQ ID NO: xx
MDTSNLMEQILSSDNLNRAYLQVVRNKGAEGVDGMKYTELKEHLAK NGETIKGQLRTRKYKPQPARRVEIPKPDGGVRNLGVPTVTDRFIQQAI AQVLTPIYEEQFHDHSYGFRPNRCAQQAILTALNIMNDGNDWIVDIDL EKFFDTVNHDKLMTLIGRTIKDGDVISIVRKYLVSGIMIDDEYEDSIVG
TPQGGNLSPLLANIMLNELDKEMEKRGLNFVRYADDCIIMVGSEMSA NRVMRNISRFIEEKLGLKVNMTKSKVDRPSGLKYLGFGFYFDPRAHQF KAKPHAKSVAKFKKRMKELTCRSWGVSNSYKVEKLNQLIRGWINYF KIGSMKTLCKELDSRIRYRLRMCIWKQWKTPQNQEKNLVKLGIDRNT ARRVAYTGKRIAYVCNKGAVNVAISNKRLASFGLISMLDYYIEKCVTC
Human endogenous retrovirus K consensus (HERV-Kcon) RT SEQ ID NO: xx
MKSRKRRNRVSFLGAATVEPPKPIPLTWKTEKPVWVNQWPLPKQKLE ALHLLANEQLEKGHIEPSFSPWNSPVFVIQKKSGKWRMLTDLRAVNA VIQPMGPLQPGLPSPAMIPKDWPLIIIDLKDCFFTIPLAEQDCEKFAFTIP AINNKEPATRFQWKVLPQGMLNSPTICQTFVGRALQPVREKFSDCYIIH YIDDILCAAETKDKLIDCYTFLQAEVANAGLAIASDKIQTSTPFHYLGM
QIENRKIKPQKIEIRKDTLKTLNDFQKLLGDINWIRPTLGIPTYAMSNLF SILRGDSDLNSKRMLTPEATKEIKLVEEKIQSAQINRIDPLAPLQLLIFAT AHSPTGIIIQNTDLVEWSFLPHSTVKTFTLYLDQIATLIGQTRLRIIKLCG NDPDKIVVPLTKEQVRQAFINSGAWQIGLANFVGIIDNHYPKTKIFQFL KLTTWILPKITRREPLENALTVFTDGSSNGKAAYTGPKERVIKTPYQSA
QRAELVAVITVLQDFDQPINIISDSAYVVQATRDVETALIKYSMDDQL NQLFNLLQQTVRKRNFPFYITHIRAHTNLPGPLTKANEQADLLVSSALI KAQELHA
Geobacillus stearothermophilus GsI-IIC RT (WT) SEQ ID NO: xxx
MALLERILARDNLITALKRVEANQGAPGIDGVSTDQLRDYIRAHWSTI HAQLLAGTYRPAPVRRVEIPKPGGGTRQLGIPTVVDRLIQQAILQELTP IFDPDFSSSSFGFRPGRNAHDAVRQAQGYIQEGYRYVVDMDLEKFFDR VNHDILMSRVARKVKDKRVLKLIRAYLQAGVMIEGVKVQTEEGTPQG GPLSPLLANILLDDLDKELEKRGLKFCRYADDCNIYVKSLRAGQRVKQ SIQRFLEKTLKLKVNEEKSAVDRPWKRAFLGFSFTPERKARIRLAPRSI QRLKQRIRQLTNPNWSISMPERIHRVNQYVMGWIGYFRLVETPSVLQT IEGWIRRRLRLCQWLQWKRVRTRIRELRALGLKETAVMEIANTRKGA WRTTKTPQLHQALGKTYWTAQGLKSLTQRYFELRQG
Geobacillus stearothermophilus GsI-IIC intron RT (GsI-IIC RT) pentamutants can also be used, e.g., comprising mutations D11R/N23R/G71R/G113K/P194R (positions bolded in SEQ ID NO:xxx, above).
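The substitution notation used above (wild-type residue, 1-based position, replacement residue) can be checked against a sequence before a variant is built. The following is a minimal sketch, not the applicants' tooling: the function name `apply_mutations` is hypothetical, and only the first 30 residues of the GsI-IIC RT (WT) sequence listed above are used, so only D11R and N23R can be applied to this fragment.

```python
import re


def apply_mutations(seq: str, mutations: list[str]) -> str:
    """Apply substitutions written as <wild-type><1-based position><new>, e.g. 'D11R'.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the stated wild-type residue does not match the sequence.
    """
    residues = list(seq)
    for mut in mutations:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mut)
        if m is None:
            raise ValueError(f"unrecognized mutation: {mut}")
        wild_type, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{mut}: position {pos} is outside the sequence")
        if residues[pos - 1] != wild_type:
            raise ValueError(
                f"{mut}: expected {wild_type} at {pos}, found {residues[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)


# First 30 residues of the GsI-IIC RT (WT) sequence listed above.
GSI_IIC_N_TERM = "MALLERILARDNLITALKRVEANQGAPGID"

# Two of the five pentamutant substitutions fall inside this fragment.
mutant_fragment = apply_mutations(GSI_IIC_N_TERM, ["D11R", "N23R"])
```

A mismatch between the stated wild-type residue and the sequence raises an error, which helps catch numbering offsets such as a sequence that begins without its initiator methionine.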
Exemplary MMLV RT sequences include the following:
MMLV-RT pentamutant (used in classic PE2), without NLS, starts with T (not M) SEQ ID NO: xx
TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAP LIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTP LLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQW YTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFK NSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTR ALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKET VMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFN WGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLT QKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTM
GQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPV VALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYT DGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQAL KMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEIL ALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPD TSTLLIENSSP
The present compositions and methods can make use of variants as known in the art and as provided herein, e.g., MarathonRT, GsI-IIC RT, and MMLV-RT variants, e.g., PE2 MMLV RT (with D200N, T306K, W313F, T330P, L603W mutations), or MMLV or PE2 MMLV RT truncations (truncations 2, 5, and 6; Grünewald et al. Nat Biotechnol. 2023 Mar;41(3):337-343), as well as RT HFV, HERV, LtrA, HERV-Kcon, Tel4c, Marathon, GsI-IIC, Ma-Int5, engineered Marathon (optionally with D14R, N26R, D74R, N116K, or N197R mutations), etc.
Recombinase
Additional effectors (such as serine recombinases) can be included in the present proteins and compositions. For example, the PCEs, LCEs, or dual-overhang ligation approaches can be used to install a recombinase attachment site (att) at a desired position in the genome. Serine recombinases, either fused to a PCE/LCE or expressed in trans, integrate a DNA donor containing a corresponding recombinase attachment site and a cargo of interest at the targeted location. Serine recombinases can include Bxb1, Pa01, BceINT, etc. (including those discovered from metagenomic mining efforts as described in ref. 25).
clkDNA Templates
The clkDNA templates used in the present compositions and methods include (i) a localization moiety, (ii) a polymerization template (PT), and (iii) a flap binding region (FBR). In some embodiments, the clkDNA templates are in the order (i)-(ii)-(iii) from 5’ to 3’, but other configurations are possible (e.g., (ii)-(iii)-(i), e.g., wherein the clkDNA has a 3’ moiety (e.g., chloroalkane, etc., combined with a SNAP tag) rather than an HUH).
The localization moiety is a sequence or modification that binds or links to the tethering domain on the DBDn, e.g., an HUH endonuclease recognition site (when the CE includes an HUH), biotin (when the CE includes avidin), a label with O6-benzylguanine derivatives (when the CE includes SNAP), a label with O2-benzylcytosine derivatives (when the CE includes CLIP-tag), or a chloroalkane label (when the CE includes HALO-tag). RNA or DNA hairpins can also be used to localize effectors (when the CE includes an RNA or DNA binding protein, such as a phage coat protein like MCP, PCP, BoxB, or Com).
The polymerization template (PT) for use with PCEs includes a portion that encodes homology to the target genome, e.g., at least 3, 4, 5, 6, 7, 8, 9, or 10 nt long, and optionally up to 50, 100, 200, 250, or 500 nt long, and a portion that includes the edit that is at least 1 nt long.
The flap binding region is complementary or partly complementary to the genomic flap released by the nickase (and thus in some embodiments the FBR is complementary or partly complementary to part of the gRNA protospacer sequence). In general, the length of the genomic flap is the distance between the DNA nick on the NTS and the equivalent NTS position that is analogous to the end of the TS/gRNA spacer, which will often be about 15-20, e.g., 17, nt, but it can be target specific. In some embodiments, the flap can be shorter (e.g., in the case of truncated gRNAs (Fu et al., Nat Biotechnol. 2014 Mar; 32(3): 279-284)) if the gRNA spacer region is shorter. In some embodiments, the flap can be longer, though such arrangements may be thermodynamically less favorable, if the TS/NTS is unpaired outside of the gRNA spacer/TS region.
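The (i)-(ii)-(iii) layout described above can be sketched as a small sequence-assembly helper. This is an illustrative sketch under stated assumptions, not the design procedure of the disclosure: the function names are hypothetical, the flap and overhang sequences are made-up placeholders, the PT is taken to encode only the sequence to be written (any homology portion is omitted), and the FBR is taken to be fully complementary to the flap. The PCV2 recognition site is the one given in the Table 1 footnote.

```python
# Complement table for unambiguous DNA bases only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def design_clkdna(localization_site: str, flap: str, overhang: str) -> str:
    """Assemble a clkDNA 5'->3' as localization site + PT + FBR.

    The FBR is the reverse complement of the genomic flap so the two anneal.
    The PT is the reverse complement of the sequence to be written, so that
    extending the flap's 3' end across the PT appends `overhang` to the flap.
    """
    pt = reverse_complement(overhang)   # polymerization template
    fbr = reverse_complement(flap)      # flap binding region
    return localization_site.upper() + pt + fbr


PCV2_SITE = "CTGTAAGTATTACCAGC"          # PCV2 site from the Table 1 footnote
flap = "GGCCCAGACTGAGCACG"               # hypothetical 17 nt genomic flap
overhang = "ACGTTGCAACGTTGCAGTCA"        # hypothetical 20 nt sequence to write

clkdna = design_clkdna(PCV2_SITE, flap, overhang)

# Sanity check: copying the PT + FBR region back gives flap followed by overhang.
extended_flap = reverse_complement(clkdna[len(PCV2_SITE):])
```

Because the polymerase reads the template 3’ to 5’, placing the PT on the 5’ side of the FBR is what lets the annealed flap serve as the primer.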
In some methods, more than two clkDNAs and gRNAs can be used when doing multiplex labeling of multiple fragments (either single or dual end). The labeling can be done all in a single reaction, with different ACOs/adapters depending on the fragment being labeled. Each ACO will have a corresponding adapter sequence.
EXEMPLARY SEQUENCES AND CONSTRUCTS
In some embodiments, the sequence of a protein or nucleic acid used in a composition or method described herein is at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to a reference sequence set forth herein. To determine the percent identity of two amino acid sequences, or of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). In a preferred embodiment, the length of a reference sequence aligned for comparison purposes is at least 80% of the length of the reference sequence, and in some embodiments is at least 90% or 100%. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein amino acid or nucleic acid “identity” is equivalent to amino acid or nucleic acid “homology”). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
The comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. For example, the percent identity between two amino acid sequences can be determined using the Needleman and Wunsch ((1970) J. Mol. Biol. 48:444-453) algorithm that has been incorporated into the GAP program in the GCG software package (available on the world wide web at gcg.com), using the default parameters, e.g., a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
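To make the percent-identity calculation concrete, the sketch below runs a global (Needleman-Wunsch style) alignment and then counts identical aligned positions. It is a simplified illustration, not the GAP program: it uses a flat match/mismatch score and a linear gap penalty instead of a substitution matrix with separate gap-open and gap-extend penalties, and it defines percent identity as identical positions divided by alignment length including gaps, which is one of several conventions. All function names and parameter values are assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str]:
    """Globally align two sequences; return both aligned strings with '-' gaps."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)
```

For production comparisons, an established aligner with a BLOSUM62 matrix and affine gap penalties would be the closer match to the parameters quoted above.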
Amino acid sequence of an exemplary protein expression construct for Sumo-PCV2-gsXTENgs-SpCas9-NLS-6xHis
MSDSEVNQEAKPEVKPEVKPETHINLKVSDGSSEIFFKIKKTTPLRRLM EAFAKRQGKEMDSLRFLYDGIRIQADQTPEDLDMEDNDIIEAHREQIGGSPSK KNGRSGPQPHKRWVFTLNNPSEDERKKIRDLPISLFDYFIVGEEGNEEGRTPH LQGFANFVKKQTFNKVKWYLGARCHIEKAKGTDQQNKEYCSKEGNLLMEC GAPRSQGQRSGGSSGSETPGTSESATPESSGGSDKKYSIGLDIGTNSVGWAVIT DEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTR RKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVA YHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSD VDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKK NGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQY ADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALV RQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVK LNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFR IPYYVGPLARGNSRFAWMTRKSEETITPWNFEEWDKGASAQSFIERMTNFD KNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDL LFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDK DFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYT GWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQK AQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLY YLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGK SDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFI KRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDF QFYI<VREINNYHHAHDAYLNAVVGTALII<I<YPI<LESEFVYGDYI<VYDVRI< MIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVW DKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDW DPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSK YVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILAD ANLDI<VLSAYNI<HRDI<PIREQAENIIHLFTLTNLGAPAAFI<YFDTTIDRI<RYT STKEVLDATLIHQSITGLYETRIDLSQLGGDGGGSGTRLPKKKRKVGGGSHHH HHH
Amino acid sequence of an exemplary tethering domain fused to a DNA-targeting nuclease for CAGE using a PCV2-gsXTENgs-SpCas9 fusion protein
MSPSKKNGRSGPQPHKRWVFTLNNPSEDERKKIRDLPISLFDYFIVGEE GNEEGRTPHLQGFANFVKKQTFNKVKWYLGARCHIEKAKGTDQQNKEYCS
KEGNLLMECGAPRSQGQRSGGSSGSETPGTSESATPESSGGSDKKYSIGLDIGT NSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLK RTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI FGNIVDEVAYHEI<YPTIYHLRI<I<LVDSTDI<ADLRLIYLALAHMII<FRGHFLIE GDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLEN LIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDN LLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQ DLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKM DGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDN REKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQ SFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLS GEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTY HDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVM KQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDD SLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMG RHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQ
LQNEI<LYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLI<DDSIDNI<VL TRSDKNRGKSDNVPSEEWKKMKNYWRQLLNAKLITQRKFDNLTKAERGGL SELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSK LVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGD YKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETN GETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKG NELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFS I<RVILADANLDI<VLSAYNI<HRDI<PIREQAENIIHLFTLTNLGAPAAFI<YFDTTI DRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD
Amino acid sequence of an exemplary DNA-dependent polymerase for CAGE or CAPTURE using E. coli Klenow fragment
VISYDNYVTILDEETLKAWIAKLEKAPVFAFATATDSLDNISANLVGLS FAIEPGVAAYIPVAHDYLDAPDQISRERALELLKPLLEDEKALKVGQNLKYDR GILANYGIELRGIAFDTMLESYILNSVAGRHDMDSLAERWLKHKTITFEEIAG KGKNQLTFNQIALEEAGRYAAEDADVTLQLHLKMWPDLQKHKGPLNVFENI EMPLVPVLSRIERNGVI<IDPI<VLHNHSEELTLRLAELEI<I<AHEIAGEEFNLSST KQLQTILFEKQGIKPLKKTPGGAPSTSEEVLEELALDYPLPKVILEYRGLAKLK STYTDKLPLMINPKTGRVHTSYHQAVTATGRLSSTDPNLQNIPVRNEEGRRIR QAFIAPEDYVIVSADYSQIELRIMAHLSRDKGLLTAFAEGKDIHRATAAEVFG LPLETVTSEQRRSAKAINFGLIYGMSAFGLARQLNIPRKEAQKYMDLYFERYP GVLEYMERTRAQAKEQGYVETLDGRRLYLPDIKSSNGARRAAAERAAINAP MQGTAADIIKRAMIAVDAWLQAEQPRVRMIMQVHDELVFEVHKDDVDAVA KQIHQLMENCTRLDVPLLVEVGSGENWDQAH
Also available commercially as DNA Polymerase I, Large (Klenow) Fragment from New England Biolabs (Catalog #M0210)
Amino acid sequence of Streptococcus pyogenes Cas9 (SpCas9) as an exemplary nuclease protein for CAGE or PE-CAGE
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLI GALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRL EESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLI YLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVD AKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAED AKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITK APLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGG ASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELH AILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETIT PWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSV EISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIE ERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKS DGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGIL QTVI<VVDELVI<VMGRHI<PENIVIEMARENQTTQI<GQI<NSRERMI<RIEEGH<E LGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIV PQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLIT QRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDE NDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTAL IKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQT GGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKS KKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELEN GRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFV EQHI<HYLDEIIEQISEFSI<RVILADANLDI<VLSAYNI<HRDI<PIREQAENIIHLFT LTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD
Amino acid sequence of an exemplary RNA-dependent polymerase for PE-CAGE or PE-CAPTURE using an engineered M-MLV RT (bearing D200N, L603W, T306K, W313F, T330P mutations)
TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAP LIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVK KPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDA FFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDL ADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKA QICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFC RLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLT KPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMV AAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALL LDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDA DHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQ ALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILAL LKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIE NSSPSGGSKRTADGSEFE;
Also available commercially as SuperScript III Reverse Transcriptase from
ThermoFisher Scientific (Catalog # 18080044)
Amino acid sequence of an exemplary tethering domain fused to an RNA-targeting nuclease for CAPTURE using a PCV2-gsXTENgs-Cas13 fusion protein, using Listeria seeligeri serovar 1/2b Cas13 (LseCas13)
MSPSKKNGRSGPQPHKRWVFTLNNPSEDERKKIRDLPISLFDYFIVGEE GNEEGRTPHLQGFANFVI<I<QTFNI<VI<WYLGARCHIEI<AI<GTDQQNI<EYCS KEGNLLMECGAPRSQGQRSGGSSGSETPGTSESATPESSGGSMWISIKTLIHHL GVLFFCDYMYNRREKKIIEVKTMRITKVEVDRKKVLISRDKNGGKLVYENEM QDNTEQIMHHKKSSFYKSVVNKTICRPEQKQMKKLVHGLLQENSQEKIKVSD VTKLNISNFLNHRFKKSLYYFPENSPDKSEEYRIEINLSQLLEDSLKKQQGTFIC WESFSKDMELYINWAENYISSKTKLIKKSIRNNRIQSTESRSGQLMDRYMKDI LNKNKPFDIQSVSEKYQLEKLTSALKATFKEAKKNDKEINYKLKSTLQNHER QIIEELKENSELNQFNIEIRKHLETYFPIKKTNRKVGDIRNLEIGEIQKIVNHRLK NKIVQRILQEGKLASYEIESTVNSNSLQKIKIEEAFALKFINACLFASNNLRNM VYPVCKKDILMIGEFKNSFKEIKHKKFIRQWSQFFSQEITVDDIELASWGLRGA IAPIRNEIIHLKKHSWKKFFNNPTFKVKKSKIINGKTKDVTSEFLYKETLFKDY FYSELDSVPELIINKMESSKILDYYSSDQLNQVFTIPNFELSLLTSAVPFAPSFKR VYLKGFDYQNQDEAQPDYNLKLNIYNEKAFNSEAFQAQYSLFKMVYYQVFL PQFTTNNDLFKSSVDFILTLNKERKGYAKAFQDIRKMNKDEKPSEYMSYIQSQ LMLYQKKQEEKEKINHFEKFINQVFIKGFNSFIEKNRLTYICHPTKNTVPENDN IEIPFHTDMDDSNIAFWLMCKLLDAKQLSELRNEMIKFSCSLQSTEEISTFTKA REVIGLALLNGEKGCNDWKELFDDKEAWKKNMSLYVSEELLQSLPYTQEDG QTP VINRSIDL VKK YGTETILEKLF S S SDD YK VS AKDIAKLHEYD VTEKIAQQE SLHKQWIEKPGLARDSAWTKKYQNVINDISNYQWAKTKVELTQVRHLHQLT IDLLSRLAGYMSIADRDFQFSSNYILERENSEYRVTSWILLSENKNKNKYNDY ELYNLKNASIKVSSKNDPQLKVDLKQLRLTLEYLELFDNRLKEKRNNISHFNY LNGQLGNSILELFDDARDVLSYDRKLKNAVSKSLKEILSSHGMEVTFKPLYQT NHHLKIDKLQPKKIHHLGEKSTVSSNQVSNEYCQLVRTLLTMK
Amino acid sequence of an exemplary polymerase click editor (PCE) comprising PCV2-gsXTENgs-nSpCas9(H840A)-gsNLSgs-EcKlenow
MSPSKKNGRSGPQPHKRWVFTLNNPSEDERKKIRDLPISLFDYFIVGEE GNEEGRTPHLQGFANFVI<I<QTFNI<VI<WYLGARCHIEI<AI<GTDQQNI<EYCS KEGNLLMECGAPRSQGQRSGGSSGSETPGTSESATPESSGGSDKKYSIGLDIGT NSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLK RTARRRYTRRI<NRICYLQEIFSNEMAI<VDDSFFHRLEESFLVEEDI<I<HERHPI FGNIVDEVAYHEI<YPTIYHLRI<I<LVDSTDI<ADLRLIYLALAHMII<FRGHFLIE GDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLEN LIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDN LLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQ DLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKM DGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDN REKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQ
SFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLS GEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTY HDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVM
KQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDD SLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMG RHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQ LQNEI<LYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLI<DDSIDNI<VL TRSDKNRGKSDNVPSEEWKKMKNYWRQLLNAKLITQRKFDNLTKAERGGL SELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSK LVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGD YKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETN GETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKG NELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFS KRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTI
DRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSKRTADGSEF ESPKKKRKVSGGSSGGSVISYDNYVTILDEETLKAWIAKLEKAPVFAFATATD SLDNISANLVGLSFAIEPGVAAYIPVAHDYLDAPDQISRERALELLKPLLEDEK ALKVGQNLKYDRGILANYGIELRGIAFDTMLESYILNSVAGRHDMDSLAERW LKHKTITFEEIAGKGKNQLTFNQIALEEAGRYAAEDADVTLQLHLKMWPDLQ I<HI<GPLNVFENIEMPLVPVLSRIERNGVI<IDPI<VLHNHSEELTLRLAELEI<I<A HEIAGEEFNLSSTKQLQTILFEKQGIKPLKKTPGGAPSTSEEVLEELALDYPLPK VILEYRGLAKLKSTYTDKLPLMINPKTGRVHTSYHQAVTATGRLSSTDPNLQ NIPVRNEEGRRIRQAFIAPEDYVIVSADYSQIELRIMAHLSRDKGLLTAFAEGK DIHRATAAEVFGLPLETVTSEQRRSAKAINFGLIYGMSAFGLARQLNIPRKEA QKYMDLYFERYPGVLEYMERTRAQAKEQGYVETLDGRRLYLPDIKSSNGAR RAAAERAAINAPMQGTAADIIKRAMIAVDAWLQAEQPRVRMIMQVHDELVF EVHKDDVDAVAKQIHQLMENCTRLDVPLLVEVGSGENWDQAH
EXAMPLES
The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.
Methods
The following materials and methods were used in the Examples below.
Plasmids and oligonucleotides
The protein expression plasmid for PCV2-SpCas9 was modified via isothermal assembly from Addgene plasmid #123643 to exchange the linker sequence between PCV2 and SpCas9 for a 24 amino acid gsXTENgs linker comprising the following sequence: SGGSSGSETPGTSESATPESSGGS (SEQ ID NO:X). clkDNAs, synthetic sgRNAs (Synthego), adaptor sequences, primers, and probes are listed in Table 1.
PCV2-Cas9 purification
PCV2-Cas9 was overexpressed in BL21(DE3) cells by IPTG induction. Protein was purified using a protocol adapted from Anders and Jinek (ref. 8). Cells were pelleted and lysed in 50 mM HEPES, 200 mM NaCl, 20% (w/v) sucrose, 15 mM imidazole, pH 7.4. Protein was eluted from a 5 ml EconoFit Ni-charged IMAC column (BioRad) using a gradient from 15 to 500 mM imidazole, followed by overnight incubation with SUMO protease Ulp1 at 4 °C while dialyzing into 50 mM HEPES, 150 mM KCl, 5% glycerol, 1 mM DTT, 1 mM EDTA, pH 7.5. Cation exchange was performed using a 5 ml HiTrap SP HP column (GE Healthcare), eluting using a gradient from 100 mM to 1 M KCl. Protein was concentrated in a 100 kDa MWCO spin concentrator (Amicon), frozen in liquid nitrogen, and stored at -80 °C. For experiments, a 60 µM stock of PCV2-Cas9 in 50% glycerol was used and stored at -20 °C.
CAGE reaction with target PCR substrate
The sequence of the target DNA molecule (PCR substrate), which contains a chosen sgRNA sequence, is provided below. The 5-step, single-pot protocol proceeded as follows: (1) RNPs were formed using 100 pmol (1.1 µM final) sgRNA, 1X CutSmart Buffer (New England Biolabs; NEB), and 48 pmol PCV2-Cas9 protein (0.53 µM final) and were incubated at room temperature for 20 min. 100 pmol of clkDNA encoding an ACO complementary sequence (cACO) was then added, and the mixture was incubated for an additional 10 min at room temperature. (2) The CAGE reaction mix was assembled using 5 pmol of PCV2-Cas9 RNPs, 10 U of E. coli DNA Polymerase I Large Fragment (exo-) (NEB), 0.25 mM dNTPs, 1 mM dATP, and 1X CutSmart. 100 fmol of PCR target substrate (~70 ng) was then added to the reaction mixture. (3) The CAGE reaction was then incubated for 30-120 minutes (min) at 37 °C (click editing reaction). (4) Next, the reaction was incubated at 72 °C for 5 min (to inactivate the PCV2-Cas9 complex and dissociate Cas9 from bound DNA). (5) Finally, 400 U of T4 DNA ligase (New England Biolabs) and 500 fmol of the mock adaptor were added, and the mixture was incubated at room temperature for 20 min (Fig. 3A). A separate four-step version of this protocol (with two steps of user intervention), which omits step #4 and adds the adaptor molecule at step #3 rather than step #5, was also carried out (Fig. 3B). Reactions were bead purified with 0.8x beads, prepared as previously described (refs. 9, 10).
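The amounts quoted in this protocol can be cross-checked with two short conversions. This is a back-of-the-envelope sketch with assumptions labeled: the final concentrations are read as micromolar, an average mass of about 650 g/mol per double-stranded base pair is used, and a round length of 1,100 bp is assumed for the PCR target fragment. The function names are hypothetical.

```python
# Approximate average molar mass of one double-stranded DNA base pair (g/mol).
AVG_BP_MASS_G_PER_MOL = 650.0


def dsdna_mass_ng(amount_fmol: float, length_bp: int) -> float:
    """Convert an amount of dsDNA in fmol to an approximate mass in ng."""
    moles = amount_fmol * 1e-15
    grams = moles * length_bp * AVG_BP_MASS_G_PER_MOL
    return grams * 1e9


def implied_volume_ul(amount_pmol: float, final_conc_um: float) -> float:
    """Volume in µL implied by an amount (pmol) and final concentration (µM).

    1 µM equals 1 pmol/µL, so volume = amount / concentration.
    """
    return amount_pmol / final_conc_um


# Assumed ~1,100 bp target: 100 fmol comes out near the ~70 ng quoted above.
target_mass_ng = dsdna_mass_ng(100, 1100)

# Both RNP components imply roughly the same assembly volume (about 91 µL).
sgrna_volume_ul = implied_volume_ul(100, 1.1)
cas9_volume_ul = implied_volume_ul(48, 0.53)
```

The two implied volumes agreeing with each other is a quick consistency check on the stated amounts and final concentrations.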
Ligation efficiency was analyzed by several methods, including junction PCR (reactions containing ~15 ng of template, 0.5 µM primers (Table 1), 0.4 U Q5 High-Fidelity DNA Polymerase (NEB), and 0.5 µM dNTPs were thermocycled at: 1 cycle, 98 °C for 3 min; 35 cycles, 98 °C for 10 sec, 66 °C for 15 sec, 72 °C for 20 sec; and hold at 4 °C), Sanger sequencing (sending approximately 100 ng of PCR product for sequencing), and ddPCR reactions (described below).
CAGE reaction with genomic DNA
For CAGE reactions targeting genomic DNA (gDNA), a protocol similar to that described above for PCR substrates was performed, with the following exceptions: 1 µg of human gDNA extracted from HEK 293T cells (extracted as previously described (ref. 11)) was added to the CAGE reaction at step #2 instead of 70 ng of target DNA substrate, and a new set of gRNA and clkDNA specific for HEK site 3 was utilized. Click editing incubation time (step #3) was 120 min.
PE-CAGE reaction with genomic DNA
PE-CAGE reactions were performed using in vitro transcribed pegRNAs (transcribed using T7 RiboMax Express Large Scale RNA Production System; Promega) generated from PCR templates that included a T7 promoter, an appropriate gRNA spacer and scaffold, and a 3’ extension including the primer binding site (PBS) and reverse transcriptase template (RTT) encoding the ACO to be installed at HEK site 3. In vitro cleavage and prime editing reactions were performed similarly to those previously described (ref. 12), using 1 µg of HEK 293T genomic DNA, nuclease-active SpCas9-HiFi (IDT), SuperScript III RT (ThermoFisher), the in vitro transcribed pegRNA, and the adaptor molecule. Reactions were incubated for 1 hr at 37 °C.
ddPCR analysis of adaptor ligation efficiency
Ligation efficiency was assessed via ddPCR reactions containing 100-200 ng of human genomic DNA (for gDNA targeting experiments) or 10 pg of PCR target (for PCR-substrate targeting experiments), ddPCR Supermix for Probes (Bio-Rad), HindIII-HF (0.25 U/µl, New England Biolabs), RPP30 (for gDNA experiments) or PCR-substrate (for PCR-substrate experiments) control primers and probes (Table 1; 900 nM each primer, 250 nM probe), and target/adaptor specific primers and probes (Table 1; 900 nM each primer, 250 nM probe), according to the manufacturer’s protocol. Droplets were generated using a QX200 Automated Droplet Generator (BioRad). Thermal cycling conditions were: 1 cycle, 95 °C for 10 min; 40 cycles, 94 °C for 30 sec, 58 °C for 1 min; 1 cycle, 98 °C for 10 min; and hold at 4 °C. PCR products were analyzed using a QX200 Droplet Reader (BioRad), and the numbers of “adaptor-ligated” target copies and “total” copies (defined by RPP30 copies) were calculated using QuantaSoft (v.1.7.4). Adaptor ligation efficiency was defined as the ratio of adaptor-ligated target copies to total copies.
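The efficiency definition above reduces to a single ratio once copy numbers are known. The sketch below shows that ratio together with the standard Poisson correction that converts a fraction of positive droplets into mean copies per droplet. It is an illustration only: copy numbers in the experiments were calculated by QuantaSoft, the function names are hypothetical, and the droplet counts are invented placeholders rather than data.

```python
import math


def copies_per_droplet(positive_droplets: int, total_droplets: int) -> float:
    """Mean target copies per droplet from the positive fraction (Poisson)."""
    if total_droplets <= 0 or not 0 <= positive_droplets < total_droplets:
        raise ValueError("need 0 <= positive_droplets < total_droplets")
    return -math.log(1.0 - positive_droplets / total_droplets)


def ligation_efficiency(adaptor_ligated_copies: float, total_copies: float) -> float:
    """Ratio of adaptor-ligated target copies to total copies."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    return adaptor_ligated_copies / total_copies


# Hypothetical droplet counts from one well (placeholders, not measured data).
ligated = copies_per_droplet(positive_droplets=2000, total_droplets=15000)
reference = copies_per_droplet(positive_droplets=6000, total_droplets=15000)
efficiency = ligation_efficiency(ligated, reference)
```

Because both channels are read from the same droplets, the per-droplet means can be divided directly without converting to copies per microliter.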
* denotes phosphorothioate linkages; the PCV2 site (CTGTAAGTATTACCAGC) is underlined
Target PCR fragment sequence:
CATGGGACTTCAGCATGGCGGTGTTTGCAGATTACGCGAGCGGGTTCTGA CCTGAAGGCTCTGCGCGGATGAGTAAACTTGGTCTGACAGTTACCAATGC TTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATA GTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACC ATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTC CAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAG TGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGA AGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCAT TGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAG CTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCA AAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTG GCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACT GTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAG TCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCA ATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCAT TGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGA GATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTT TTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCC GCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTT CCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGG ATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCA CATTTCCCCGAAAAGTGCCACCTGACGTCGACGGATCGGACGTTCGCGTT TGCCGTGCGTGTAATGTAGTACTAGTATAAACGAACAGGATCAGGCCAAA TTCCTGAATA
Example 1. Click-Assisted Genome Enrichment (CAGE)
For our initial method we envisioned a single-pot workflow that consisted of six main steps with only three areas of user intervention: (1) RNP formation via PCV2-Cas9 protein incubation with the appropriate sgRNA and clkDNA, (2) assembly of the click editing reaction by mixing the PCE RNP with the target DNA substrate, EcKlenow, and reaction buffer (containing dNTPs, CutSmart buffer (NEB), and ATP), (3) incubation of the mixture for 30-120 minutes (min) at 37 °C (click editing reaction), (4) incubation at 72 °C for 5 min (to inactivate the PCV2-Cas9 complex and dissociate Cas9 from bound DNA), (5) addition of T4 DNA ligase and the mock adaptor followed by incubation at room temperature for 20 min, and (6) purification of the product prior to sequencing (Fig. 3A). We also envisioned a separate five-step version of this protocol that lacks step #4 and adds the adaptor molecule at step #2 rather than step #5 (Fig. 3B).
In experiments to test the initial ACO-writing step of a 6-step CAGE reaction on a single end of a substrate, we sought to ligate a mock adaptor onto a specific site of a DNA substrate in vitro (Fig. 4A). To do so, we purified an HUH-Cas9 fusion protein (comprising the PCV2 HUH N-terminally fused to nuclease Streptococcus pyogenes Cas9 (SpCas9) via a 24 aa gsXTENgs linker), purchased E. coli DNA Polymerase I Klenow Fragment (exo-) (EcKlenow) (New England Biolabs), designed a gRNA to target a specific site on a PCR substrate, and designed and ordered a clkDNA containing an appropriate FBR and a 20 nt PT (Fig. 4A). Successful polymerization by the PCE using the clkDNA in vitro should lead to the writing of a specific 20 nt 3’ overhang onto the NTS of the PCR-based DNA substrate at the SpCas9 cleavage site encoded on the target substrate (Fig. 4A). Lastly, we designed and constructed a mock adaptor that encodes a 20 nt sequence complementary to the ACO sequence on the target substrate.
To determine if the PCE-mediated ACO writing and adaptor ligation steps were successful, we conducted a PCR reaction using primers that should amplify across the overhang/ligation junction (where the forward primer is on the PCR substrate and the reverse primer is on the adaptor; Fig. 4B). Reactions containing the correct components led to a PCR product, indicating successful PCE-mediated ACO writing and adaptor ligation (Fig. 4B). Two separate control reactions did not result in PCR amplification: one using a clkDNA with an incorrect FBR but correct PT (so the clkDNA is not expected to anneal to the NTS), and one using a clkDNA encoding a correct FBR but incorrect PT (which would be incompetent for adaptor ligation due to an unmatched ACO) (Fig. 4B). We also found that a protocol lacking step #4 (incubation at 72 °C for 5 min; Fig. 3B) was feasible, though at observably lower efficiency, presumably due to target-bound Cas9 inhibiting adaptor ligation (Fig. 4B). Sanger sequencing of the adaptor-ligated PCR product demonstrated the expected product containing a correctly click-edited ACO installed at the precise location in the PCR substrate (due to Cas9 cleavage) followed by ligation of the adaptor sequence (Fig. 4C).
To determine the efficiency of ACO writing and adaptor ligation, we performed ddPCR on the CAGE reaction conducted using the 6-step protocol. We observed approximately 25.9% efficiency (Fig. 4D), which equates to approximately 20-million adaptor-ligated fragments in a 40 µL CAGE reaction. Increasing the click editing time from 30 min to 2 hours (during step #3 of the 6-step protocol) further increased efficiency to 31.8% (Fig. 4D). As we observed in our junction PCR experiment (Fig. 4B), eliminating step #4 for a 5-step protocol (Fig. 3B) decreased adaptor ligation efficiency to 9.8% as determined by ddPCR (Fig. 4D).
Next, we wondered whether we could perform a CAGE reaction using genomic DNA instead of a PCR-generated substrate. To do so, we designed a PCE RNP and clkDNA to add an ACO to HEK site 3 using genomic DNA from HEK 293T cells (the same ACO as above from our PCR-based tests, permitting us to use the same mock adaptor for ligation; Fig. 4A). We followed the same protocol as our previous 6-step CAGE reaction with a 2 hour click editing incubation step using the same mock adaptor, but using 1 µg of human genomic DNA as the substrate (Fig. 4A). Junction PCR demonstrated successful adaptor ligation at the target site (Fig. 4E), Sanger sequencing of the PCR product showed the expected sequence (HEK3 gDNA sequence - click edited overhang sequence - adaptor sequence; Fig. 4F), and ddPCR revealed approximately 26.6% ACO writing and adaptor ligation (Fig. 4G). These results reveal that CAGE is extensible to genomic DNA substrates.
Together, these results demonstrate that CAGE directs efficient installation of 3’ overhangs (ACOs) at user-defined target sequences in vitro, which can serve as specific and effective handles for ligation of adaptors containing a complementary 3’ overhang.
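The base-pairing logic behind ACO writing and adaptor matching, including the two negative controls described above, can be sketched as follows. This is a simplified model and not the procedure used in the examples: all sequences and function names are hypothetical, and annealing is treated as exact full-length complementarity.

```python
# Simplified model of ACO writing: the flap anneals to the FBR and the
# polymerase copies the PT. Sequences and names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_template(flap_3prime_end: str, aco: str) -> dict:
    """PT and FBR (each 5'->3') of a template that writes `aco` onto the flap."""
    return {"PT": revcomp(aco), "FBR": revcomp(flap_3prime_end)}


def extend_flap(flap_3prime_end: str, pt: str, fbr: str):
    """Return the written overhang, or None if the flap cannot anneal to the FBR."""
    if revcomp(fbr) != flap_3prime_end.upper():
        return None
    return revcomp(pt)


def adaptor_can_ligate(written_overhang, adaptor_overhang: str) -> bool:
    """True only if the adaptor overhang is the exact complement of the overhang."""
    return written_overhang is not None and revcomp(adaptor_overhang) == written_overhang


FLAP_END = "CTGAGCACGTGA"          # hypothetical 3' end of the cleaved NTS
ACO = "TTAGGCATCGAC"               # hypothetical overhang to be written
ADAPTOR_OVERHANG = revcomp(ACO)    # adaptor 3' overhang complementary to the ACO

template = design_template(FLAP_END, ACO)
correct = extend_flap(FLAP_END, template["PT"], template["FBR"])
wrong_fbr = extend_flap(FLAP_END, template["PT"], "AAAAAAAAAAAA")
wrong_pt = extend_flap(FLAP_END, "CCCCCCCCCCCC", template["FBR"])

print(adaptor_can_ligate(correct, ADAPTOR_OVERHANG))    # matched FBR and PT
print(adaptor_can_ligate(wrong_fbr, ADAPTOR_OVERHANG))  # flap does not anneal
print(adaptor_can_ligate(wrong_pt, ADAPTOR_OVERHANG))   # overhang does not match adaptor
```

In this model only the matched FBR and PT yield an overhang that the adaptor can pair with, mirroring the outcome of the junction PCR controls.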
Example 2. CAGE reactions with fewer steps of user intervention
Minimizing user intervention would create a more streamlined workflow (Fig. 5). To do so, we imagined that thermostable ligases could enable the elimination of steps involving the manual addition of ligase and adaptor after Cas9 heat inactivation. Instead, all reagents (PCV2-Cas9 RNPs, EcKlenow, Ligase, adaptor, and buffer) could be mixed in a single tube and run on a thermocycler with an appropriate protocol (e.g. click editing at 37 °C for 30-120 min, followed by an elevated temperature at ~65 °C for 30-60 min). The incubation step at 65 °C should simultaneously inactivate/dissociate Cas9 while activating the ligase to complete the CAGE reaction.
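The single-tube thermocycler protocol described above can be written down as a small data structure. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, the 65 °C setpoint stands in for the approximate "~65 °C" given in the text, and the duration limits simply encode the example ranges (30-120 min and 30-60 min).

```python
# Sketch of the proposed one-pot program as data. Names are hypothetical;
# temperatures and duration ranges are the example values from the text.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: int


def one_pot_program(click_minutes: int = 30, ligation_minutes: int = 30) -> list:
    """Build the two-step program, rejecting durations outside the example ranges."""
    if not 30 <= click_minutes <= 120:
        raise ValueError("click editing time outside the 30-120 min example range")
    if not 30 <= ligation_minutes <= 60:
        raise ValueError("elevated-temperature time outside the 30-60 min example range")
    return [
        Step("click editing", 37.0, click_minutes),
        # One incubation is expected to inactivate/dissociate Cas9 and
        # activate the thermostable ligase at the same time.
        Step("Cas9 inactivation and ligation", 65.0, ligation_minutes),
    ]


program = one_pot_program(click_minutes=120, ligation_minutes=60)
total_minutes = sum(step.minutes for step in program)
for step in program:
    print(f"{step.name}: {step.temp_c} C for {step.minutes} min")
print(f"total: {total_minutes} min")
```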
Example 3. Other compositions of CAGE using prime editors instead of click editors
We also wondered whether CAGE could be performed using prime editors (PEs) instead of PCEs (in PE-CAGE reactions). PEs typically consist of a Cas9 nickase fused to or co-supplied with an RNA-dependent polymerase (e.g. reverse transcriptase (RT)) and an extended prime editing gRNA (pegRNA) that encodes a primer binding site (PBS) that anneals to the nicked NTS and a reverse transcription template (RTT) encoding the edit of interest (ref. 12; Fig. 6A). For PE-CAGE, we envisioned using a Cas9 nuclease (instead of H840A nickase in conventional PEs) with a fused or unfused RT domain (Fig. 6A). During a prime editing reaction, the PE-extended 3’ flap from the NTS (using the RTT of the pegRNA as a template) would create an ACO, similar to how the PCE creates a 3’ ACO from the clkDNA. A PE-CAGE approach should be compatible with both modes of adaptor ligation described above for PCE-CAGE (Figs. 6B, 6C, 10A, and 10B). We note a few potential complications of PE-CAGE. RT-mediated template extension past the RTT region of the pegRNA could lead to writing of nucleotides of the gRNA scaffold into the ACO (which could inhibit adaptor ligation). Furthermore, this method is potentially subject to increased cost if using synthetic pegRNAs (due to increased length and thus cost of the pegRNA compared to a conventional gRNA). However, PE-CAGE may hold advantages for applications that require extensive and simultaneous multiplexed enrichment within a given sample since the ACO template is inherently coupled to the gRNA in the pegRNA (compared to the challenge of complexing gRNA-clkDNA pairs together separately in arrayed reactions as would be needed for PCE-based approaches). PE RNPs can be formed with many pegRNAs in a single tube, whereas PCE RNPs must be complexed with clkDNAs separately to ensure correct linkage between target and ACO-writing template.
We also note that PE-CAGE can be performed using an RNA-dependent RNA polymerase (RdRP), whereby an RNA ACO instead of a DNA ACO would be installed. A suitable ligase that ligates a 5’ DNA to a 3’ RNA on a DNA splint can be used (e.g., T4 RNA Ligase), leading to adaptor-ligated target products.
We performed a PE-CAGE reaction using nuclease active SpCas9-HiFi (IDT), SuperScript III RT provided in trans (ThermoFisher), an in vitro transcribed pegRNA (where the RTT portion encodes the template sequence for the ACO), human genomic DNA as a substrate and an adaptor complementary to the RTT of the pegRNA (Fig. 7A). To determine if the PE-mediated ACO writing was successful, we PCR amplified the resulting product (using primers that should amplify across the ACO junction). Sanger sequencing revealed the anticipated product (Fig. 7B), confirming that prime editing can install 3’ overhangs on target DNA molecules. We also performed ddPCR to quantify PE-mediated ACO extension, revealing approximately 20% ACO installation efficiency (Fig. 7C). These results demonstrate that PE-CAGE can also be used in protocols for selective labeling and enrichment of target nucleic acids.
Together, these results demonstrate that prime editing can also be used to efficiently install 3’ overhangs on target molecules in vitro, which could serve as sequence-specified handles for adaptor ligation. As with CAGE, PE-CAGE should enable at or near 100% enrichment, and improvements in efficiency (e.g. altering the sequence composition and length of the pegRNA RTT) could facilitate PE-CAGE enrichment from low-input samples. PE-CAGE has potential advantages over CAGE for applications requiring highly multiplexed targeting within a given sample. For example, since the RTT and sgRNA sequences are physically linked and a pool of pegRNAs can be complexed with a prime editor in a single tube (as opposed to CAGE, where clkDNA-sgRNA pairs would need to be complexed individually before pooling RNPs), multiple orthogonal PE-CAGE ACO-writing and adapter ligation steps should be possible simultaneously in a single sample. However, with PE-CAGE, adaptor ligation efficiency may be reduced by heterogeneity of ACO writing (due to RT-mediated extension past the RTT into the sgRNA scaffold, which would add unwanted bases to the ACO and thus preclude adaptor ligation). Furthermore, the price per reaction would be higher if using synthetic pegRNAs. However, extraneous ACO extension in PE-CAGE could be minimized by: (1) screening RNA hairpins that could be inserted into the pegRNA scaffold at the end of the RTT, which might terminate reverse transcription, (2) using a pool of adaptors containing overhangs which account for scaffold read-through, (3) using synthetic modified pegRNAs that inhibit extension past the RTT (e.g. via use of a chemical linker or abasic site between the RTT and sgRNA scaffold, etc.), or (4) using split pegRNAs where the RTT/PBS molecule is separated from the gRNA.
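Mitigation (2), a pool of adaptors whose overhangs account for scaffold read-through, can be sketched in code. This is an illustrative model only: the RTT and scaffold sequences and all function names are hypothetical, and it assumes the scaffold lies immediately 5’ of the RTT in the pegRNA, so that an RT running past the RTT copies the 3’-most scaffold bases next.

```python
# Sketch of adaptor-pool design for scaffold read-through. Sequences and names
# are hypothetical; the scaffold is assumed to sit directly 5' of the RTT.

RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rt_copy(rna_template: str) -> str:
    """DNA (5'->3') made by reverse-transcribing an RNA template given 5'->3'."""
    return rna_template.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def revcomp_dna(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def readthrough_acos(scaffold_tail: str, rtt: str, max_readthrough: int) -> list:
    """ACO variants when the RT copies the RTT plus 0..max_readthrough scaffold bases."""
    if not 0 <= max_readthrough <= len(scaffold_tail):
        raise ValueError("max_readthrough must fit within scaffold_tail")
    acos = []
    for k in range(max_readthrough + 1):
        extra = scaffold_tail[len(scaffold_tail) - k:]  # last k bases; empty when k == 0
        acos.append(rt_copy(extra + rtt))
    return acos


def adaptor_pool(acos: list) -> list:
    """One complementary adaptor overhang (5'->3') per ACO variant."""
    return [revcomp_dna(aco) for aco in acos]


RTT = "GUCGAUGCCUAA"      # hypothetical RTT (RNA, 5'->3')
SCAFFOLD_TAIL = "GGUGC"   # hypothetical 3' end of the scaffold (RNA, 5'->3')

acos = readthrough_acos(SCAFFOLD_TAIL, RTT, max_readthrough=2)
pool = adaptor_pool(acos)
for k, (aco, overhang) in enumerate(zip(acos, pool)):
    print(f"read-through {k} nt: ACO {aco} / adaptor overhang {overhang}")
```

Each additional base of read-through lengthens the ACO at its 3’ end, so the matching adaptor overhang gains the corresponding base at its 5’ end.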
We anticipate that the enzymes or enzyme fusions used in PE-CAGE reactions may also comprise a variety of architectures, including a nuclease (N) for cleaving the substrate and an RNA-dependent polymerase (P) (e.g. a reverse-transcriptase; RT) for extending the template, separated by optional linkers (L). Although herein we utilized a separated nuclease and polymerase architecture (N+P), where the nuclease/DNA binding domain was SpCas9 and the polymerase was MMLV-RT, we envision alternate architectures beyond N+P that include but are not limited to: N-L-P or P-L-N (where N is the nuclease, P is the polymerase, and L is a linker; each of varying compositions). The nuclease domain (N) may be SpCas9 or other nucleases; the RNA-dependent polymerase domain (P) may be MMLV reverse-transcriptase or other polymerases; and the linkers (L) may be of various lengths and amino acid compositions.
Example 4. Click-Assisted Precise Targeting of Unaltered RNA for Enrichment (CAPTURE)
We also envisioned that the process of targeted overhang installation could be applied to RNA substrates to enrich for user-specifiable RNA molecules or transcripts from a pool without amplification or cDNA conversion (Fig. 8). To do so, we hypothesized that substituting PCV2-Cas13 for PCV2-Cas9 (ideally, a high-fidelity Cas13 variant that minimizes or eliminates unwanted collateral cleavage of nearby non-target RNA molecules; ref. 13) would enable precise RNA transcript targeting and enrichment as defined by the Cas13 gRNA (Fig. 8A). Cleavage of the target RNA molecule at the user-specified gRNA target site would create an RNA primer that, through annealing with the FBR of the clkDNA, would create an RNA:DNA substrate for extension by a DNA-dependent polymerase (e.g. EcKlenow) (Figs. 8B, 8C, 10C, and 10D). Dissociation of the PCV2-Cas13 RNP from the target RNA would leave a hybrid target RNA:DNA molecule tagged on the 3’ end with a DNA ACO whose sequence is defined by the PT of the clkDNA. As with CAGE, this DNA ACO could then be used as a sequence for annealing and ligating a specific adaptor. The adaptor-ligated samples can then be further processed or directly loaded onto a sequencing platform (e.g. a nanopore sequencer).
CAPTURE could also conceivably be performed with a prime editing-like system (PE-CAPTURE), where Cas13 nuclease can be fused to or co-supplied with a reverse transcriptase (or RdRP) and the Cas13 gRNA can be extended at the 3’ or 5’ end to include a PBS and an RTT containing the overhang sequence (Figs. 9A-9C, 10C, and 10D).
Example 5. CAGE using Cas9 nuclease and no ssDNA tethering domain.
To investigate whether a tethering domain (such as PCV2) was essential, we conducted CAGE reactions on a PCR substrate with varying amounts of clkDNA containing or lacking a PCV2 recognition sequence and with PCV2-Cas9 or Cas9 alone (with no tethering domain). Reactions (containing cleaved PCR products with written flaps or cleaved products without flaps) were dG-tailed and amplified using a poly-C reverse primer and a junction specific forward primer. While the combination of PCV2-Cas9 and a clkDNA containing the PCV2 recognition sequence was generally most efficient (measured by ddPCR), the inclusion of the PCV2 sequence alone was generally beneficial and CAGE could be conducted with Cas9 alone (no PCV2 endonuclease) (Figs. 11A-B). Similar trends were also observed on a gDNA substrate at the HEK3 locus.
On both PCR and genomic substrates, we also observed truncation products when using wild type E. coli Klenow fragment (Figs. 11A-B), which may be due to its relatively low processivity and/or difficulty reading through secondary template structure. We also observed polymerization of the PCV2 recognition sequence - likely from unbound clkDNA during RNP formation (Fig. 12).
References
1. Kozarewa, I., Armisen, J., Gardner, A. F., Slatko, B. E. & Hendrickson, C. L. Overview of Target Enrichment Strategies. CP Molecular Biology 112, (2015).
2. Karamitros, T. & Magiorkinis, G. Multiplexed Targeted Sequencing for Oxford Nanopore MinION: A Detailed Library Preparation Procedure. in Next Generation Sequencing (eds. Head, S. R., Ordoukhanian, P. & Salomon, D. R.) vol. 1712 43-51 (Springer New York, 2018).
3. Leija-Salazar, M. et al. Evaluation of the detection of GBA missense mutations and other variants using the Oxford Nanopore MinION. Mol Genet Genomic Med 7, e564 (2019).
4. Gabrieli, T. et al. Selective nanopore sequencing of human BRCA1 by Cas9-assisted targeting of chromosome segments (CATCH). Nucleic Acids Research 46, e87-e87 (2018).
5. Wallace, A. D. et al. CaBagE: A Cas9-based Background Elimination strategy for targeted, long-read DNA sequencing. PLOS ONE 16, e0241253 (2021).
6. Iyer, S. V., Kramer, M., Goodwin, S. & McCombie, W. R. ACME: an Affinity-based Cas9 Mediated Enrichment method for targeted nanopore sequencing. 2022.02.03.478550 Preprint at https://doi.org/10.1101/2022.02.03.478550 (2022).
7. Gilpatrick, T. et al. Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat Biotechnol 38, 433-438 (2020).
8. Anders, C. & Jinek, M. In Vitro Enzymology of Cas9. in Methods in Enzymology vol. 546 1-20 (Elsevier, 2014).
9. Rohland, N. & Reich, D. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res. 22, 939-946 (2012).
10. Kleinstiver, B. P. et al. Engineered CRISPR-Cas12a variants with increased activities and improved targeting ranges for gene, epigenetic and base editing. Nat Biotechnol 37, 276-282 (2019).
11. Walton, R. T., Christie, K. A., Whittaker, M. N. & Kleinstiver, B. P. Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science 368, 290-296 (2020).
12. Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149-157 (2019).
13. Tong, H. et al. High-fidelity Cas13 variants for targeted RNA degradation with minimal collateral effect. 2021.12.18.473271 https://www.biorxiv.org/content/10.1101/2021.12.18.473271v1 (2021) doi: 10.1101/2021.12.18.473271.
OTHER EMBODIMENTS
It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
Claims
WHAT IS CLAIMED IS:
1. A method comprising: preparing a reaction mixture comprising:
(i) a DNA binding domain nuclease (DBDn), optionally Cas9, optionally linked or fused to a clkDNA tethering domain;
(ii) one, two, or more guide RNAs (gRNAs) that bind to the DBDn and comprise a spacer comprising a sequence complementary to a selected target sequence;
(iii) one, two, or more clkDNA oligonucleotide templates comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of an adaptor-complementary overhang (ACO) sequence (cACO), and a flap binding region (FBR), and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(iii) or (i)-(iv) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the DBDn, and optionally wherein the reaction mixture further comprises:
(v) a substrate nucleic acid comprising a region of interest.
2. A method for generating a modified nucleic acid of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence, the method comprising:
(A) providing a reaction mixture prepared by the method of claim 1, comprising:
(i) a DNA binding domain nuclease (DBDn), optionally Cas9, optionally linked or fused to a clkDNA tethering domain;
(ii) one, two, or more guide RNAs (gRNAs) that bind to the DBDn and comprise a spacer comprising a sequence complementary to a selected target sequence; and
(iii) one, two, or more clkDNA oligonucleotide templates comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of an adaptor-complementary overhang (ACO) sequence (cACO), and a flap binding region (FBR), wherein (i)-(iii) are added to the mixture in any order, and
incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the DBDn;
(B) providing a sample comprising a substrate nucleic acid of DNA, preferably genomic DNA isolated from a cell; and contacting the sample with the reaction mixture of (A) and a DNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, incubating the sample under conditions wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; and the polymerase extends the 3’ single stranded flaps using the cACO portion of the clkDNA as a template, thereby producing a modified nucleic acid of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence.
3. A method of generating a product nucleic acid of DNA having ends comprised of defined adaptor or overhang sequences, the method comprising: providing a sample comprising a modified nucleic acid of DNA comprising 3’ overhangs comprising an ACO sequence produced by the method of claim 2; optionally treating the sample to inactivate any active enzymes, optionally by heating the sample to above 60°C, optionally to about 72°C; contacting the modified nucleic acid of DNA with a partially double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA adapter with ssDNA cACO region) and a ligase, under conditions sufficient for the ligase to ligate the adapter to the modified nucleic acid, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences.
4. A method of generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences, the method comprising: providing a modified nucleic acid of DNA comprising 3’ overhangs comprising an ACO sequence produced by the method of claim 2; contacting the modified nucleic acid of DNA with a ligase and a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid of DNA, thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties with dsDNA click oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid of DNA to attach the dsDNA click oligos to the ends of the further modified nucleic acid of DNA, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences.
5. A method for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences, the method comprising:
(A) providing a reaction mixture prepared by the method of claim 1, comprising:
(i) a DNA binding domain nuclease (DBDn), optionally Cas9, optionally linked or fused to a clkDNA tethering domain;
(ii) one, two, or more guide RNAs (gRNAs) that bind to the DBDn and comprise a spacer comprising a sequence complementary to a selected target sequence; and
(iii) one, two, or more clkDNA oligonucleotide templates comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of an adaptor-complementary overhang (ACO) sequence (cACO), and a flap binding region (FBR), wherein (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the DBDn;
(B) providing a sample comprising a substrate nucleic acid of DNA, preferably genomic DNA isolated from a cell; and contacting the sample with: the reaction mixture of (A), a DNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo), and a ligase, optionally a thermostable ligase, incubating the sample under conditions wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; the polymerase extends the 3’ single stranded flaps using the ACO complementary portion of the clkDNA as a template, thereby producing a modified nucleic acid of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions wherein the thermostable ligase ligates the dsDNA oligo to the modified nucleic acid of DNA, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences.
6. A method for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences, the method comprising:
(A) providing a reaction mixture prepared by the method of claim 1, comprising:
(i) a DNA binding domain nuclease (DBDn), optionally Cas9, optionally linked or fused to a clkDNA tethering domain;
(ii) one, two, or more guide RNAs (gRNAs) that bind to the DBDn and comprise a spacer comprising a sequence complementary to a selected target sequence; and
(iii) one, two, or more clkDNA oligonucleotide templates comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of an adaptor-complementary overhang (ACO) sequence (cACO), and a flap binding region (FBR), wherein (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the DBDn;
(B) providing a sample comprising a substrate nucleic acid of DNA, preferably genomic DNA isolated from a cell; and
contacting the sample with: the reaction mixture of (A), a DNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence, and a thermostable ligase, incubating the sample under conditions wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; the polymerase extends the 3’ single stranded flaps using the ACO complementary portion of the clkDNA as a template, thereby producing a modified fragment of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions wherein the thermostable ligase is active, for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid of DNA, thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties with dsDNA click oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid of DNA to attach the dsDNA click oligos to the ends of the further modified nucleic acid of DNA, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences.
7. A method comprising: preparing a reaction mixture comprising:
(i) a DNA binding domain nuclease (DBDn), optionally Cas9;
(ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS), a reverse transcriptase template (RTT) encoding the complement of the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn.
8. A method for generating a modified nucleic acid of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence, the method comprising:
(A) providing a reaction mixture prepared according to claim 7, comprising (i) a DNA binding domain nuclease (DBDn), optionally Cas9;
(ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS), a reverse transcriptase template (RTT) encoding the complement of the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn;
(B) providing a sample comprising a substrate nucleic acid of DNA, preferably genomic DNA isolated from a cell; and contacting the sample with the reaction mixture of (A) and an RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, incubating the sample under conditions wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; and the polymerase extends the 3’ single stranded flaps using the ACO complementary portion of the clkDNA as a template, thereby producing a modified nucleic acid of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence.
9. A method of generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences, the method comprising: providing a sample comprising a modified nucleic acid of DNA comprising 3’ overhangs comprising an ACO sequence produced by the method of claim 8; optionally treating the sample to inactivate any active enzymes, optionally by heating the sample to about 72°C; contacting the modified nucleic acid of DNA with a double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo) and a ligase, under conditions sufficient for the ligase to ligate the dsDNA oligo to the modified nucleic acid of DNA, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences.
10. A method of generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences, the method comprising: providing a modified nucleic acid of DNA comprising 3’ overhangs comprising an ACO sequence produced by the method of claim 8; contacting the modified nucleic acid of DNA with a ligase and a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid of DNA, thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties with dsDNA click oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid of DNA to attach the dsDNA click oligos to the ends of the further modified nucleic acid of DNA, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences.
11. A method for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences, the method comprising:
(A) providing a reaction mixture prepared by the method of claim 7, comprising
(i) a DNA binding domain nuclease (DBDn), optionally Cas9;
(ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS), a reverse transcriptase template (RTT) encoding the complement of the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn;
(B) providing a sample comprising a substrate nucleic acid of DNA, preferably genomic DNA isolated from a cell; and contacting the sample with: the reaction mixture of (A), a RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo), and a ligase, optionally a thermostable ligase, incubating the sample under conditions wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; the polymerase extends the 3’ single stranded flaps using the ACO complementary portion of the clkDNA as a template, thereby producing a modified nucleic acid of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions wherein the thermostable ligase ligates the dsDNA oligo to the modified nucleic acid of DNA, thereby producing product nucleic acid of DNA having ends comprised of defined adaptor sequences.
12. A method for generating a product nucleic acid of DNA having ends comprised of defined adaptor sequences, the method comprising:
(A) providing a reaction mixture prepared by the method of claim 7, comprising
(i) a DNA binding domain nuclease (DBDn), optionally Cas9;
(ii) one, two, or more pegRNAs that bind to the DBDn and comprise a 3’ extension including a primer binding site (PBS), a reverse transcriptase template (RTT) encoding the complement of the ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the DBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the DBDn;
(B) providing a sample comprising a substrate nucleic acid of DNA, preferably genomic DNA isolated from a cell; and contacting the sample with: the reaction mixture of (A), a RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence, and a thermostable ligase, incubating the sample under conditions wherein: the nuclease induces a pair of double-stranded breaks with 3’ single stranded flaps; the clkDNA oligonucleotide templates bind to the 3’ single stranded flaps; the polymerase extends the 3’ single stranded flaps using the ACO complementary portion of the clkDNA as a template, thereby producing a modified nucleic acid of DNA comprising 3’ overhangs comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions wherein the thermostable ligase is active, for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid of DNA, thereby producing a further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid of DNA having 5’ ends comprising click-compatible moieties with dsDNA click oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid of DNA to attach the dsDNA click oligos to the ends of the further modified nucleic acid of DNA, thereby producing a product nucleic acid of DNA having ends comprised of defined adaptor sequences.
13. A method comprising: preparing a reaction mixture comprising:
(i) a RNA binding domain nuclease (RBDn), optionally Cas13, optionally linked or fused to a clkDNA tethering domain;
(ii) at least one guide RNA (gRNA) that binds to the RBDn and comprises a spacer comprising a sequence complementary to a selected target sequence;
(iii) at least one clkDNA oligonucleotide template comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of the adaptor-complementary overhang (ACO) sequence (cACO), and a flap binding region (FBR), and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(iii) or (i)-(iv) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the RBDn.
14. A method for generating a modified nucleic acid of RNA comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence, the method comprising:
(A) providing a reaction mixture prepared by the method of claim 13, comprising:
(i) a RNA binding domain nuclease (RBDn), optionally Cas13, optionally linked or fused to a clkDNA tethering domain;
(ii) at least one guide RNA (gRNA) that binds to the RBDn and comprises a spacer comprising a sequence complementary to a selected target sequence;
(iii) at least one clkDNA oligonucleotide template comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of the adaptor-complementary overhang (ACO) sequence, and a flap binding region (FBR), and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(iii) or (i)-(iv) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the RBDn;
(B) providing a sample comprising a substrate nucleic acid of RNA, preferably RNA isolated from a cell or sample from an animal; and contacting the sample with the reaction mixture of (A) and a DNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, incubating the sample under conditions wherein: the clkDNA oligonucleotide template binds to the RNA; and the polymerase extends the 3’ end of the RNA molecule using the ACO complementary region of the clkDNA as a template, thereby producing a modified nucleic acid comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence.
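The templated extension recited in claim 14 can be illustrated with a minimal Python sketch. It is a string-level model under stated assumptions, not the claimed method: the clkDNA template is assumed to be laid out 5' to 3' as the cACO template followed by the FBR, the FBR is assumed to be the exact reverse complement of the substrate's 3' terminal bases, and the function name `write_aco`, the parameter `fbr_len`, and every sequence are hypothetical.

```python
from typing import Tuple

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def write_aco(substrate_rna: str, clk_template: str, fbr_len: int) -> Tuple[str, str]:
    """Model polymerase extension of an RNA 3' end along a clkDNA template.

    Assumed template layout (5' to 3'): cACO template, then FBR.
    Returns (substrate RNA, DNA extension appended to its 3' end).
    """
    if not 0 < fbr_len < len(clk_template):
        raise ValueError("fbr_len must leave room for both FBR and cACO template")
    caco_template, fbr = clk_template[:-fbr_len], clk_template[-fbr_len:]

    # The FBR (DNA) must base-pair with the last fbr_len bases of the RNA.
    rna_end_as_dna = substrate_rna[-fbr_len:].replace("U", "T")
    if reverse_complement(fbr) != rna_end_as_dna:
        raise ValueError("FBR does not anneal to the 3' end of the substrate")

    # Copying the cACO template yields its reverse complement: the ACO.
    aco = reverse_complement(caco_template)
    return substrate_rna, aco


if __name__ == "__main__":
    rna = "AUGGCUAGCUUAGC"      # hypothetical substrate RNA
    aco = "GATTACA"             # hypothetical ACO to be written
    fbr = reverse_complement(rna[-6:].replace("U", "T"))
    template = reverse_complement(aco) + fbr
    print(write_aco(rna, template, fbr_len=6))
```

The extension is returned separately from the substrate because the product in this model is an RNA carrying a DNA 3' tail, and keeping the two parts apart avoids mixing alphabets in one string.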
15. A method of generating a product nucleic acid of RNA having an end comprised of a defined adaptor sequence, the method comprising: providing a sample comprising a modified nucleic acid of RNA comprising a 3’ end comprising an ACO sequence produced by the method of claim 14; optionally treating the sample to inactivate any active enzymes, optionally by heating the sample to about 72°C; contacting the modified nucleic acid with a double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo) and a ligase, under conditions sufficient for the ligase to ligate the dsDNA oligo to the modified nucleic acid, thereby producing a product nucleic acid having an end comprised of a defined adaptor sequence.
16. A method of generating a product nucleic acid of RNA having ends comprised of defined adaptor sequences, the method comprising: providing a modified nucleic acid of RNA comprising a 3’ end comprising an ACO sequence produced by the method of claim 14; contacting the modified nucleic acid of RNA with a ligase and a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ end and be ligated to the modified nucleic acid of RNA, thereby producing a further modified nucleic acid of RNA having a 5’ end comprising a click-compatible moiety; and contacting the further modified nucleic acid of RNA having a 5’ end comprising a click-compatible moiety with dsDNA click oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid of RNA to attach the dsDNA click oligos to the ends of the further modified nucleic acid of RNA, thereby producing a product nucleic acid of RNA having ends comprised of defined adaptor sequences.
17. A method for generating a product nucleic acid of RNA having ends comprised of defined adaptor sequences, the method comprising:
(A) providing a reaction mixture prepared by the method of claim 13, comprising:
(i) a RNA binding domain nuclease (RBDn), optionally Cas13, optionally linked or fused to a clkDNA tethering domain;
(ii) at least one guide RNA (gRNA) that binds to the RBDn and comprises a spacer comprising a sequence complementary to a selected target sequence;
(iii) at least one clkDNA oligonucleotide template comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of the adaptor-complementary overhang (ACO) sequence (cACO), and a flap binding region (FBR), and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(iii) or (i)-(iv) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the RBDn;
(B) providing a sample comprising a substrate nucleic acid of RNA; and contacting the sample with: the reaction mixture of (A), a RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo), and a ligase, optionally a thermostable ligase, incubating the sample under conditions wherein: the clkDNA oligonucleotide templates bind to the RNA; the polymerase extends the 3’ end using the ACO complementary portion of the clkDNA as a template, thereby producing a modified nucleic acid comprising a ds 3’ end comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions wherein the thermostable ligase ligates the dsDNA oligo to the modified nucleic acid, thereby producing a product nucleic acid having ends comprised of defined adaptor sequences.
18. A method for generating a product nucleic acid, optionally RNA/DNA, having ends comprised of defined adaptor sequences, the method comprising:
(A) providing a reaction mixture prepared by the method of claim 13, comprising:
(i) a RNA binding domain nuclease (RBDn), optionally Cas13, optionally linked or fused to a clkDNA tethering domain;
(ii) at least one guide RNA (gRNA) that binds to the RBDn and comprises a spacer comprising a sequence complementary to a selected target sequence;
(iii) at least one clkDNA oligonucleotide template comprising a localization moiety that binds to the clkDNA tethering domain, a template encoding the complement of the adaptor-complementary overhang (ACO) sequence (cACO), and a flap binding region (FBR), and optionally (iv) a DNA-dependent DNA polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(iii) or (i)-(iv) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn and the clkDNA oligonucleotide templates to bind to the clkDNA tethering domain on the RBDn;
(B) providing a sample comprising a substrate nucleic acid of RNA; and contacting the sample with: the reaction mixture of (A), a RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence, and a thermostable ligase, incubating the sample under conditions wherein: the clkDNA oligonucleotide templates bind to the RNA; the polymerase extends the 3’ end using the ACO complementary portion of the clkDNA as a template, thereby producing a modified nucleic acid comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions wherein the thermostable ligase is active, for the ssDNA click oligo to anneal to the 3’ end and be ligated to the modified nucleic acid, thereby producing a further modified nucleic acid comprising an RNA with a region of dsDNA having a 5’ end comprising a click-compatible moiety; and contacting the region of dsDNA having a 5’ end comprising a click-compatible moiety with dsDNA click oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid to attach the dsDNA click oligos to the ends of the further modified nucleic acid, thereby producing a product nucleic acid having ends comprised of defined adaptor sequences.
19. A method comprising: preparing a reaction mixture comprising:
(i) a RNA binding domain nuclease (RBDn), optionally Cas13;
(ii) at least one pegRNA that binds to the RBDn and comprises a 3’ extension including a primer binding site (PBS), a reverse transcriptase template (RTT) encoding the complement of an ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn.
20. A method for generating a modified nucleic acid comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence, the method comprising:
(A) providing a reaction mixture prepared according to claim 19, comprising (i) a RNA binding domain nuclease (RBDn), optionally Cas13;
(ii) at least one pegRNA that binds to the RBDn and comprises a 3’ extension including a primer binding site (PBS), a reverse transcriptase template (RTT) encoding the complement of an ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn;
(B) providing a sample comprising a substrate nucleic acid of RNA; and contacting the sample with the reaction mixture of (A) and an RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, incubating the sample under conditions wherein: the clkDNA oligonucleotide templates bind to the RNA; and the polymerase extends the 3’ end using the ACO complementary portion of the clkDNA as a template, thereby producing a modified nucleic acid comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence.
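The relationship between the pegRNA parts named in claims 19 and 20 (spacer, scaffold, RTT, PBS) and the ACO that ends up on the substrate can be sketched in Python. This is an illustrative model only. The 5' to 3' order spacer, scaffold, RTT, PBS is an assumption borrowed from conventional Cas9 prime-editing pegRNAs and is not stated in the claims; the names `PegRNA`, `design_pegrna`, and `reverse_transcribe`, the placeholder scaffold, and all sequences are hypothetical.

```python
from dataclasses import dataclass

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _revcomp_as_dna(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, returned in the DNA alphabet."""
    return seq.upper().replace("U", "T").translate(_DNA_COMPLEMENT)[::-1]


def _revcomp_as_rna(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, returned in the RNA alphabet."""
    return _revcomp_as_dna(seq).replace("T", "U")


@dataclass(frozen=True)
class PegRNA:
    spacer: str    # RNA, complementary to the selected target sequence
    scaffold: str  # RNA, bound by the nuclease (placeholder here)
    rtt: str       # RNA, reverse transcriptase template encoding the cACO
    pbs: str       # RNA, primer binding site that anneals to the 3' end

    @property
    def sequence(self) -> str:
        # Assumed 5' to 3' order: spacer, scaffold, RTT, PBS.
        return self.spacer + self.scaffold + self.rtt + self.pbs


def design_pegrna(spacer: str, scaffold: str, aco_dna: str, substrate_3prime_end: str) -> PegRNA:
    """Build a pegRNA whose RTT templates the ACO and whose PBS pairs with the substrate end."""
    return PegRNA(
        spacer=spacer,
        scaffold=scaffold,
        rtt=_revcomp_as_rna(aco_dna),
        pbs=_revcomp_as_rna(substrate_3prime_end),
    )


def reverse_transcribe(peg: PegRNA) -> str:
    """DNA written onto the substrate 3' end when the RTT is copied."""
    return _revcomp_as_dna(peg.rtt)


if __name__ == "__main__":
    peg = design_pegrna(
        spacer="GACCUGAAGUCAUGCGAUCA",  # hypothetical spacer
        scaffold="NNNNNNNN",            # placeholder, not a real scaffold
        aco_dna="GATTACA",              # hypothetical ACO
        substrate_3prime_end="CUUAGC",  # hypothetical substrate 3' end
    )
    print(peg.sequence, reverse_transcribe(peg))
```

Because the reverse transcriptase copies the RTT into its reverse complement, designing the RTT as the reverse complement of the desired ACO is what makes the written extension equal the ACO.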
21. A method of generating a product nucleic acid having ends comprised of defined adaptor sequences, the method comprising: providing a sample comprising a modified nucleic acid comprising a 3’ end comprising ACO sequence produced by the method of claim 20; optionally treating the sample to inactivate any active enzymes, optionally by heating the sample to about 72°C; contacting the modified nucleic acid with a double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo) and a ligase, under conditions sufficient for the ligase to ligate the dsDNA oligo to the modified nucleic acid, thereby producing a product nucleic acid having ends comprised of defined adaptor sequences.
22. A method of generating a product nucleic acid having ends comprised of defined adaptor sequences, the method comprising: providing a modified nucleic acid comprising 3’ overhangs comprising ACO sequence produced by the method of claim 20; contacting the modified nucleic acid with a ligase and a single stranded DNA oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence under conditions sufficient for the ssDNA click oligo to anneal to the 3’ overhangs and be ligated to the modified nucleic acid, thereby producing a further modified nucleic acid having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid having 5’ ends comprising click-compatible moieties with dsDNA click oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid to attach the dsDNA click oligos to the ends of the further modified nucleic acid, thereby producing a product nucleic acid having ends comprised of defined adaptor sequences.
23. A method for generating a product nucleic acid having ends comprised of defined adaptor sequences, the method comprising:
(A) providing a reaction mixture prepared by the method of claim 19, comprising
(i) a RNA binding domain nuclease (RBDn), optionally Cas13;
(ii) one, two, or more pegRNAs that bind to the RBDn and comprise a 3’ extension including a primer binding site (PBS), a reverse transcriptase template (RTT) encoding the complement of an ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn;
(B) providing a sample comprising a substrate nucleic acid of RNA; and contacting the sample with: the reaction mixture of (A), a RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a double stranded DNA oligonucleotide comprising defined adaptor sequences and a 3’ overhang comprising a sequence complementary to the ACO sequence (dsDNA cACO oligo), and a ligase, optionally a thermostable ligase, incubating the sample under conditions wherein: the clkDNA oligonucleotide templates bind to the RNA; the polymerase extends the 3’ end using the ACO complementary portion of the clkDNA as a template, thereby producing a modified nucleic acid comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions wherein the thermostable ligase ligates the dsDNA oligo to the modified nucleic acid, thereby producing a product nucleic acid having an end comprised of a defined adaptor sequence.
24. A method for generating a product nucleic acid having ends comprised of defined adaptor sequences, the method comprising:
(A) providing a reaction mixture prepared by the method of claim 19, comprising
(i) a RNA binding domain nuclease (RBDn), optionally Cas13;
(ii) one, two, or more pegRNAs that bind to the RBDn and comprise a 3’ extension including a primer binding site (PBS), a reverse transcriptase template (RTT) encoding the complement of an ACO, a scaffold and spacer comprising a sequence complementary to a selected target sequence; and optionally (iii) an RNA-dependent polymerase, optionally wherein the polymerase is linked or fused to the RBDn in the reaction mixture; wherein (i)-(ii) or (i)-(iii) are added to the mixture in any order, and incubating the reaction mixture under conditions for the gRNAs to bind to the RBDn;
(B) providing a sample comprising a substrate nucleic acid of RNA; and contacting the sample with: the reaction mixture of (A), a RNA-dependent DNA polymerase if the polymerase is absent from the reaction mixture, a single stranded DNA click oligonucleotide modified with a click-compatible moiety at the 5’ end (ssDNA click oligo) comprising a sequence complementary to the ACO sequence, and a thermostable ligase, incubating the sample under conditions wherein: the clkDNA oligonucleotide templates bind to the RNA; the polymerase extends the 3’ single stranded flaps using the ACO complementary portion of the clkDNA as a template, thereby producing a modified nucleic acid comprising a 3’ end comprising an adaptor-complementary overhang (ACO) sequence, and incubating the sample under conditions wherein the thermostable ligase is active, for the ssDNA click oligo to anneal to the 3’ end and be ligated to the modified nucleic acid, thereby producing a further modified nucleic acid having 5’ ends comprising click-compatible moieties; and contacting the further modified nucleic acid having 5’ ends comprising click-compatible moieties with dsDNA oligos comprising defined adaptor sequences and a 3’ click compatible moiety that reacts with the click-compatible moieties on the further modified nucleic acid (dsDNA click oligo) to attach the dsDNA click oligos to the ends of the further modified nucleic acid, thereby producing a product nucleic acid having ends comprised of defined adaptor sequences.
25. A method for labeling DNA or RNA as described herein.
26. A method comprising: cleaving a nucleic acid using a sequence-specific nuclease (optionally via a nuclease-active PCE construct comprising a CRISPR-Cas nuclease directed by a gRNA); writing adaptor-complementary overhang (ACO) sequences onto a 3’ end of a cleaved nucleic acid (optionally via a DNA-dependent DNA polymerase of the PCE using the clkDNA as a template); and ligating an oligonucleotide, optionally a sequencing adaptor, onto the 3’ ACO.
27. The method of claim 26, wherein the nucleic acid is DNA, and the nuclease is a Class II type II CRISPR, optionally CRISPR Cas9.
28. The method of claim 26, wherein the nucleic acid is RNA, and the nuclease is a class II type VI CRISPR, optionally CRISPR Cas13.
29. The method of claims 26-28, further comprising sequencing the nucleic acid using the sequencing adaptor.
30. Any of the previous claims, wherein the DBDn is a Cas-family enzyme (optionally Cas9 or Cas12), a TnpB-family enzyme, or a IscB-family enzyme.
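The cleave, write, and ligate sequence of claim 26 can be sketched as three string operations in Python. This is a simplified single-strand model under stated assumptions, not the claimed method: the spacer is written as the matching protospacer on the modelled strand, the default `cut_offset` of 17 assumes a blunt cut 3 nt from the 3' end of a 20-nt protospacer (typical of SpCas9), the write step is modelled as a plain append, and all function names and sequences are hypothetical.

```python
from typing import Tuple

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def cleave(target: str, spacer: str, cut_offset: int = 17) -> Tuple[str, str]:
    """Cut the target cut_offset bases into the first protospacer match."""
    site = target.find(spacer)
    if site < 0:
        raise ValueError("spacer does not match the target")
    cut = site + cut_offset
    return target[:cut], target[cut:]


def write_aco(fragment: str, aco: str) -> str:
    """Append the ACO to the fragment's 3' end (templated extension, simplified)."""
    return fragment + aco


def ligate_adaptor(fragment: str, aco: str, adaptor_top: str, adaptor_bottom: str) -> str:
    """Ligate a dsDNA cACO oligo whose bottom strand carries a 3' cACO overhang."""
    duplex_len = len(adaptor_top)
    if adaptor_bottom[:duplex_len] != reverse_complement(adaptor_top):
        raise ValueError("adaptor strands are not complementary")
    if adaptor_bottom[duplex_len:] != reverse_complement(aco):
        raise ValueError("adaptor overhang is not complementary to the ACO")
    if not fragment.endswith(aco):
        raise ValueError("fragment does not end in the ACO")
    # The top strand's 5' end is joined to the fragment's 3' end.
    return fragment + adaptor_top


if __name__ == "__main__":
    spacer = "GACCTGAAGTCATGCGATCA"          # hypothetical 20-nt protospacer
    target = "TTTT" + spacer + "AGGCCCC"     # hypothetical target strand
    aco = "GATTACA"                          # hypothetical ACO
    top = "CCGGTTAACC"                       # hypothetical adaptor top strand
    bottom = reverse_complement(aco + top)   # duplex region plus 3' cACO overhang

    left, _right = cleave(target, spacer)
    print(ligate_adaptor(write_aco(left, aco), aco, top, bottom))
```

Checking the overhang before joining mirrors the role of the ACO in the claims: only fragments that received the written overhang can accept the adaptor.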
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363458519P | 2023-04-11 | 2023-04-11 | |
| US63/458,519 | 2023-04-11 | | |
| US202363590283P | 2023-10-13 | 2023-10-13 | |
| US63/590,283 | 2023-10-13 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024215891A1 (en) | 2024-10-17 |
Family
ID=93060047
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2024/024078 (Pending; WO2024215891A1 (en)) | 2023-04-11 | 2024-04-11 | Methods for nucleic acid labeling |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024215891A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200080111A1 (en) * | 2015-09-18 | 2020-03-12 | The Regents Of The University Of California | Methods for Autocatalytic Genome Editing and Neutralizing Autocatalytic Genome Editing and Compositions Thereof |
| US20200231952A1 (en) * | 2017-06-13 | 2020-07-23 | Regents Of The University Of Minnesota | Materials and methods for increasing gene editing frequency |
| US20210171985A1 (en) * | 2018-06-28 | 2021-06-10 | Crispr Therapeutics Ag | Compositions and methods for genomic editing by insertion of donor polynucleotides |
| US20230091690A1 (en) * | 2019-12-30 | 2023-03-23 | The Broad Institute, Inc. | Guided excision-transposition systems |
Non-Patent Citations (1)
| Title |
|---|
| FERREIRA DA SILVA JOANA, TOU CONNOR J., KING EMILY M., ELLER MADELINE L., MA LINYUAN, RUFINO-RAMOS DAVID, KLEINSTIVER BENJAMIN P: "Click editing enables programmable genome writing using DNA polymerases and HUH endonucleases", BIORXIV, 13 September 2023 (2023-09-13), pages 1 - 26, XP093225375, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/2023.09.12.557440v1> [retrieved on 20240626], DOI: 10.1101/2023.09.12.557440 * |
Similar Documents
| Publication | Title |
|---|---|
| US20220162688A1 | Enhanced Adaptor Ligation |
| ES2855748T3 | Primer extension target enrichment |
| CN107075511B | Formation of synthons |
| JP6165789B2 | Methods for in vitro linking and combinatorial assembly of nucleic acid molecules |
| CN108060191B | A method for adding a double-stranded nucleic acid fragment to a linker, a library construction method and a kit |
| JP6108494B2 | Template-independent ligation of single-stranded DNA |
| CN104114702B | Template switch is used for the purposes of DNA synthesis |
| US20100222238A1 | Asymmetrical Adapters And Methods Of Use Thereof |
| CN110012671B | Normalization of NGS library concentrations |
| KR102278495B1 | DNA production method and kit for linking DNA fragments |
| CA3054881A1 | Method of replicating or amplifying circular dna |
| CN110914418A | Compositions and methods for sequencing nucleic acids |
| JP2002532085A | Improved method for inserting nucleic acids into circular vectors |
| US20230257805A1 | Methods for ligation-coupled-pcr |
| WO2018005720A1 | Method of determining the molecular binding between libraries of molecules |
| CA3220708A1 | Oligo-modified nucleotide analogues for nucleic acid preparation |
| WO2024215891A1 | Methods for nucleic acid labeling |
| US6641998B2 | Methods and kits to enrich for desired nucleic acid sequences |
| US20210198718A1 | Method of attaching adaptors to single-stranded regions of double-stranded polynucleotides |
| US20250313876A1 | Isolation of dna fragments |
| Martínez-Carrón et al. | Methods of amplifying template nucleic acid using a thermostable Tthprimpol |
| KR20230154078A | Genomic library construction and targeted epigenetic assay using CAS-gRNA ribonucleoprotein |
| WO2023191034A1 | Method for producing double-stranded dna molecules having reduced sequence errors |
| CN117255856A | Genomic library preparation and targeting epigenetic assays using CAS-gRNA ribonucleoprotein |
| CN115803433A | Thermostable ligases with reduced sequence bias |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24789450; Country of ref document: EP; Kind code of ref document: A1 |