WO2023060539A1 - Compositions and methods for detecting target cleavage sites of crispr/cas nucleases and dna translocation - Google Patents
- Publication number
- WO2023060539A1 (PCT/CN2021/124025)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- seq
- variants
- sequence
- reverse transcriptase
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
- C12N9/1241—Nucleotidyltransferases (2.7.7)
- C12N9/1276—RNA-directed DNA polymerase (2.7.7.49), i.e. reverse transcriptase or telomerase
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
Definitions
- the present disclosure relates to compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocations in a genome.
- the present disclosure also discloses compositions and methods for enhancing insertion efficiencies for gene editing.
- CRISPR-based genome editing has shown great potential in both biomedical research and clinical applications.
- CRISPR-based therapies have unique advantages owing to the flexibility of guide RNA (gRNA) design and the ability to directly target the causal nucleic acid sequences.
- gRNA: guide RNA
- non-specific targeting by gRNAs, which causes undesired mutagenesis, may bring unexpected nuclease toxicity to cells.
- a comprehensive survey of off-target sites, including local sequence changes and DNA translocations involving distal sites, is urgently needed given the great translational potential of CRISPR therapies for genetic disorders and other human diseases.
- CIRCLE-seq incorporated an additional enrichment step by circularizing, linearizing, and then selectively amplifying fragmented DNA (Tsai et al., 2017).
- SITE-seq employed biotinylated oligos to tag and enrich the nuclease-digested sites (Cameron et al., 2017) .
- in vitro techniques typically report additional off-target sites that cannot be identified in an actual cellular context.
- DSBs: double-strand breaks
- dsODN: exogenous double-stranded oligodeoxynucleotides
- BLISS is another in cellulo technique; it uses in situ DSB ligation in fixed cells or tissues and has characterized the off-target sites of both SpCas9 and As/LbCpf1 (Yan et al., 2017).
- genome editing may also induce rearrangements of DNA fragments (Zuccaro et al., 2020; Liang et al., 2020; Alanis-Lobato et al., 2021), which are more toxic to genome stability and cell function.
- methodologies for the direct detection of DNA translocations have lagged behind.
- the present disclosure provides a novel method for detecting the editing sites of Cas nucleases and their variants, and DNA translocations at these editing sites, wherein an appropriate sequence (e.g., a label) is inserted into the editing sites and the inserted labels are then enriched for sequencing, e.g., high-throughput sequencing (HTS).
- the present method employs a guide RNA, and a complex comprising a Cas nuclease, a reverse transcriptase, and the guide RNA.
- the methods and compositions disclosed herein diversify the toolkit to evaluate the genotoxicity of CRISPR applications in research and therapeutic applications.
- the detection method disclosed herein is referred to as Prime Editor Assisted off-target Characterization (PEAC-seq) or the “PEAC-seq technique.”
- the PEAC-seq provides a method to detect Cas9 cleavage sites with high accuracy and sensitivity.
- PEAC-seq can be used in vitro and in vivo, as illustrated in the Examples of this disclosure.
- PEAC-seq can also be used to detect DNA translocations at Cas9 cleavage sites.
- PEAC-seq is designed to insert an insertion sequence (e.g., a label) into a Cas9 cleavage site (including both on-target and off-target sites) in the genome.
- insertion sequences function as labels, marking the Cas9 cleavage sites.
- the incorporation of the insertion sequences into the genomic DNA is also referred to herein as “labeling.”
- the insertion sequence can incorporate a tag sequence to represent and enrich the edited sites in the genome.
- the reverse transcriptase and the Cas9 nuclease are fused together as, e.g., a fusion protein.
- the labeling of the genomic DNA by an insertion sequence is performed at the same location of a cleavage site right after the Cas9 nuclease cleaves the genomic DNA at that cleavage site.
- the Cas9 cleavage sites can thus be identified in the genome. The coupled cut-and-insert process ensures consistency between cutting events and insertion events.
- DNA translocations at Cas9 cleavage sites can be identified by a detection method disclosed herein. DNA translocations are typically more toxic but cannot be directly detected by current methods.
- the present disclosure provides a comprehensive and streamlined method to identify CRISPR off-target sites both in vitro and in vivo, as well as DNA translocation events.
- the method employs a guide RNA comprising an insertion sequence reverse transcriptase (RT) template and does not rely on additional exogenous label sequence.
- RT: reverse transcriptase
- the present disclosure provides a guide RNA comprising a spacer, a scaffold, and an insertion sequence reverse transcriptase (RT) template.
- the insertion sequence RT template comprises a nucleotide sequence wherein cytosine is depleted, for example, cytosine represents less than 25% (e.g., less than 20%, less than 15%, less than 10%, less than 5%) of the nucleotides in the nucleotide sequence.
- the insertion sequence template comprises a nucleotide sequence that is SEQ ID NO: 113.
- the insertion sequence template comprises a nucleotide sequence that can be of any length, e.g., from about 10 to 30 nucleotides.
- the insertion sequence can be of any length, including but not limited to 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 nucleotides in length.
- the guide RNA further comprises a primer binding site (PBS) that is capable of binding the non-complementary strand of a target gene.
- PBS: primer binding site
- the insertion sequence RT template encodes one or more tags suitable for hybrid capture.
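The cytosine-depletion criterion for the RT template can be checked with a short script. This is a minimal sketch. It assumes the standard prime-editing orientation, in which the RT template on the guide RNA is the reverse complement (written as RNA) of the DNA strand that gets inserted. The source does not state this orientation. Under that assumption, a G-poor insertion yields a C-poor template. The function names and the example insertion are hypothetical. The 25% threshold comes from the embodiment above.

```python
"""Sketch: derive an insertion RT template and test the cytosine-depletion criterion.

Assumption (not from the source): as in standard prime editing, the RT template
is the reverse complement, written as RNA, of the DNA strand that gets inserted.
"""

# Complement of each DNA base, expressed as the paired RNA base.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def rt_template_for_insertion(insertion_dna: str) -> str:
    """Return the 5'->3' RNA template whose reverse transcript is `insertion_dna`."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(insertion_dna.upper()))


def cytosine_fraction(template_rna: str) -> float:
    """Fraction of C nucleotides in an RNA template."""
    if not template_rna:
        raise ValueError("template must not be empty")
    return template_rna.upper().count("C") / len(template_rna)


def is_cytosine_depleted(template_rna: str, max_fraction: float = 0.25) -> bool:
    """True if cytosine makes up less than `max_fraction` of the template."""
    return cytosine_fraction(template_rna) < max_fraction


if __name__ == "__main__":
    insertion = "CATCATCATCATCATCAT"  # hypothetical G-free insertion
    template = rt_template_for_insertion(insertion)
    print(template, cytosine_fraction(template), is_cytosine_depleted(template))
    # prints: AUGAUGAUGAUGAUGAUG 0.0 True
```

A G-rich insertion such as `GGA` gives the C-rich template `UCC`, which fails the check.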
- the present disclosure provides a complex comprising a Cas nuclease, a reverse transcriptase, and a guide RNA.
- the guide RNA comprises a spacer, a scaffold, and an insertion sequence template, wherein the insertion sequence template is a reverse transcriptase template comprising a nucleotide sequence wherein cytosine is depleted, e.g., cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
- the Cas nuclease disclosed herein is selected from Cas9, its variants and mutated forms.
- the reverse transcriptase disclosed herein is selected from Moloney murine leukemia virus (M-MLV) reverse transcriptase, avian myeloblastosis virus (AMV) reverse transcriptase, and their variants and mutant forms.
- the Cas nuclease and the reverse transcriptase form a fusion protein.
- the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
- the present disclosure provides a vector comprising a guide RNA, a nucleotide sequence encoding a Cas nuclease, and a nucleotide sequence encoding a reverse transcriptase.
- the guide RNA comprises an insertion sequence reverse transcriptase (RT) template, wherein the insertion sequence RT template comprises a nucleotide sequence wherein cytosine is depleted, e.g., cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
- the Cas nuclease is selected from Cas9, its variants and mutated forms.
- the reverse transcriptase is selected from Moloney murine leukemia virus (M-MLV) reverse transcriptase, avian myeloblastosis virus (AMV) reverse transcriptase, and their variants and mutant forms.
- the present disclosure provides a kit for detecting Cas nuclease cleavage sites and DNA translocation in genomic DNA, comprising one or more polynucleotide sequences encoding a Cas nuclease, a reverse transcriptase, and a guide RNA comprising an insertion sequence reverse transcriptase (RT) template, e.g., wherein the insertion sequence RT template comprises a nucleotide sequence wherein cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
- the Cas nuclease is selected from Cas9, its variants and mutated forms.
- the reverse transcriptase is selected from Moloney murine leukemia virus (M-MLV) reverse transcriptase, avian myeloblastosis virus (AMV) reverse transcriptase, and their variants and mutant forms.
- the Cas9 nuclease and the reverse transcriptase form a fusion protein.
- the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
- the present disclosure provides a method for detecting Cas9 nuclease cleavage sites in genomic DNA of a cell, comprising (a) providing a guide RNA, wherein the guide RNA can bind to a target gene in the genomic DNA and comprises an insertion sequence RT template, (b) providing a complex comprising a Cas nuclease, a reverse transcriptase, and the guide RNA, and (c) contacting the genomic DNA with the complex under conditions suitable to obtain labeled genomic DNA, wherein the genomic DNA is cleaved at one or more cleavage sites, and one or more insertion sequences that are reverse transcribed, in part or in whole, from the insertion sequence RT template are inserted into the one or more cleavage sites.
- the insertion sequence RT template is located at the 3’ end of the guide RNA.
- the Cas nuclease is selected from Cas9, its variants and mutated forms.
- the reverse transcriptase is selected from Moloney murine leukemia virus (M-MLV) reverse transcriptase, avian myeloblastosis virus (AMV) reverse transcriptase, and their variants and mutant forms.
- the insertion sequence RT template comprises a nucleotide sequence wherein cytosine is depleted, e.g., cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
- the insertion sequence RT template is of SEQ ID NO: 113.
- the insertion sequence RT template encodes one or more tags suitable for hybrid capture.
- the insertion sequence template is about 10 to 30 nucleotides in length.
- the guide RNA comprises, from the 5’ end to the 3’ end, a spacer-scaffold-insertion sequence template-primer binding site (PBS) structure, in which the spacer is able to bind the target gene in the genome, the scaffold is able to bind the Cas nuclease, the primer binding site is able to bind the non-complementary strand of the target gene, and the insertion sequence template is the reverse transcription template for the reverse transcriptase.
- the Cas and the reverse transcriptase form a Cas-RT fusion protein.
- the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
- the Cas cleavage sites could be either on-target or off-target.
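The 5’-to-3’ guide RNA layout described above (spacer, scaffold, insertion sequence template, PBS) can be sketched as a string assembly. The sketch rests on several assumptions that the source does not state. The protospacer is a 20-nt SpCas9 protospacer given on the PAM-containing (non-complementary) strand. The cut falls between protospacer positions 17 and 18, which is 3 nt 5’ of the PAM. The PBS pairs with the nucleotides immediately 5’ of that cut. The PBS length and the scaffold placeholder are hypothetical parameters; no real scaffold sequence is implied.

```python
"""Sketch: assemble a guide RNA in the 5'->3' order spacer-scaffold-RT template-PBS.

Assumptions (not from the source): a 20-nt SpCas9 protospacer given on the
PAM-containing strand, a cut between protospacer positions 17 and 18
(3 nt 5' of the PAM), and a PBS that pairs with the nucleotides just 5' of
the cut on that strand. The PBS length and scaffold are placeholders.
"""

COMPLEMENT_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}
CUT_INDEX = 17  # the cut falls after the 17th protospacer nucleotide


def revcomp_to_rna(dna: str) -> str:
    """Reverse complement of a DNA sequence, written as RNA."""
    return "".join(COMPLEMENT_RNA[b] for b in reversed(dna.upper()))


def assemble_guide(protospacer: str, insertion: str, scaffold_rna: str, pbs_length: int = 13) -> str:
    """Return spacer + scaffold + insertion RT template + PBS as one RNA string."""
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt protospacer")
    if not 0 < pbs_length <= CUT_INDEX:
        raise ValueError("PBS must fit between the spacer start and the cut")
    spacer = protospacer.upper().replace("T", "U")
    rt_template = revcomp_to_rna(insertion)  # reverse transcribed into the insertion
    pbs = revcomp_to_rna(protospacer[CUT_INDEX - pbs_length:CUT_INDEX])
    return spacer + scaffold_rna + rt_template + pbs


if __name__ == "__main__":
    SCAFFOLD_PLACEHOLDER = "NNNN"  # hypothetical stand-in for the Cas9 scaffold
    print(assemble_guide("ACGTACGTACGTACGTACGT", "CAT", SCAFFOLD_PLACEHOLDER, pbs_length=5))
```

Keeping the RT template limited to the insertion itself mirrors the layout in the text. Standard prime-editing designs usually also include downstream homology in the template.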
- the present disclosure provides a method for detecting Cas9 cleavage sites and detecting DNA translocation at the cleavage sites in genomic DNA of a cell, comprising (a) obtaining labeled genomic DNA with a method described herein, (b) targeting and amplifying one or more labeled portions of the genomic DNA, (c) sequencing the amplified portion of the genomic DNA, and (d) analyzing the sequencing result to identify Cas cleavage sites and/or DNA translocation at the Cas cleavage sites.
- the one or more amplified portions of the genomic DNA in step (b) each comprise a portion of genomic DNA that is immediately upstream or downstream of the one or more insertion sequences.
- the method can be used to identify Cas nuclease off-target sites by comparing the Cas cleavage sites identified by the method disclosed herein with the target sequence; a cleavage site whose sequence is not identical to the target sequence is an off-target site. It will be understood that, based on the information provided by the method disclosed herein, those of ordinary skill in the art are able to locate the cleavage sites in the genome with readily available tools such as the Burrows-Wheeler Aligner (BWA).
- BWA: Burrows-Wheeler Aligner
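After alignment, each candidate cleavage site can be compared with the intended target. The sketch below assumes the 20-nt reference sequence at each candidate site has already been extracted in the spacer's orientation. It counts mismatches and labels a site on-target only when the count is zero, mirroring the mismatch column convention used for Fig. 17. Site names and sequences are hypothetical. DNA/RNA bulges, which real pipelines also consider, are ignored here.

```python
"""Sketch: label candidate cleavage sites as on- or off-target by mismatch count.

Assumptions (not from the source): each candidate's 20-nt protospacer-aligned
sequence has already been extracted from the reference (e.g., after BWA
alignment) in the spacer's orientation; bulges are ignored.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCall:
    site: str
    mismatches: int
    label: str


def count_mismatches(target: str, candidate: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(target) != len(candidate):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(target.upper(), candidate.upper()))


def classify_sites(target: str, candidates: dict[str, str]) -> list[SiteCall]:
    """Zero mismatches = on-target; anything else = off-target. Sorted by mismatches."""
    calls = []
    for site, sequence in candidates.items():
        mm = count_mismatches(target, sequence)
        calls.append(SiteCall(site, mm, "on-target" if mm == 0 else "off-target"))
    return sorted(calls, key=lambda call: (call.mismatches, call.site))
```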
- the labeled genomic DNA is processed by Tn5 tagmentation before amplification.
- the labeled genomic DNA is processed by Tn5 tagmentation before amplification, wherein sequencing adapters that include unique molecular identifiers (UMI) are embedded in the Tn5 transposases.
- UMI: unique molecular identifiers
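Because the Tn5 adapters carry UMIs, reads that share both a UMI and a mapped position can be collapsed into one original molecule before counting. This is a minimal exact-match sketch. The read record and its fields (`chrom`, `start`, `strand`, `umi`) are hypothetical. Production deduplication tools typically also tolerate sequencing errors within the UMI, which is omitted here.

```python
"""Sketch: collapse PCR duplicates using Tn5-adapter UMIs.

Assumptions (not from the source): reads are already aligned and summarized by
hypothetical fields (chrom, start, strand, umi); UMIs must match exactly.
"""
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class AlignedRead(NamedTuple):
    chrom: str
    start: int  # mapped position of the Tn5-derived read end
    strand: str
    umi: str


def deduplicate(reads: Iterable[AlignedRead]) -> list[AlignedRead]:
    """Keep the first read for each (position, strand, UMI) combination."""
    seen: set[tuple[str, int, str, str]] = set()
    unique = []
    for read in reads:
        key = (read.chrom, read.start, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


def molecules_per_position(reads: Iterable[AlignedRead]) -> Counter:
    """Count distinct molecules (not raw reads) at each mapped position."""
    return Counter((r.chrom, r.start, r.strand) for r in deduplicate(reads))
```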
- the genomic DNA comprising the insertion sequence or a portion of the insertion sequence is targeted and enriched by a method selected from PCR and hybrid capture-based target enrichment methods.
- the enrichment of labeled genomic DNA is performed by two rounds of PCR, wherein in the first reaction the insertion sequence is used as the forward primer binding site and in the second reaction the insertion sequence is used as the reverse primer binding site.
- the 3’ ends of the primers that bind to the insertion sequence are at least 2 bp away from the insertion boundary, so that the insertion bases sequenced beyond each primer’s 3’ end can be used to filter out reads arising from random priming.
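The random-priming filter can be sketched as follows. A genuine read primed from the insertion should start with the primer and then continue with the insertion bases between the primer's 3’ end and the insertion boundary, which must be at least 2 nt long in this embodiment. Reads that do not continue this way are discarded. The sketch assumes the read starts exactly at the primer's 5’ end, with no adapter or UMI prefix. It also assumes the primer lies inside the insertion in the same orientation; for the reverse-primer reaction, the same check would be applied to reverse complements. The primer and insertion sequences are hypothetical.

```python
"""Sketch: discard reads that look like random priming.

Assumptions (not from the source): reads begin exactly at the primer's 5' end,
the primer sits inside the insertion in the same orientation, and the example
sequences are hypothetical.
"""
from __future__ import annotations


def expected_extension(primer: str, insertion: str, min_extension: int = 2) -> str:
    """Insertion bases between the primer's 3' end and the insertion boundary."""
    start = insertion.upper().find(primer.upper())
    if start < 0:
        raise ValueError("primer must lie within the insertion")
    extension = insertion.upper()[start + len(primer):]
    if len(extension) < min_extension:
        raise ValueError("primer 3' end is too close to the insertion boundary")
    return extension


def genomic_flank(read: str, primer: str, insertion: str) -> str | None:
    """Return the genomic sequence after the insertion, or None for random priming."""
    prefix = primer.upper() + expected_extension(primer, insertion)
    read = read.upper()
    return read[len(prefix):] if read.startswith(prefix) else None
```

The returned flank is the genomic sequence adjacent to the cleavage site, which is what would be aligned to locate the site.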
- the present disclosure provides a method for determining the relative specificity of a plurality of guide RNAs comprising (a) identifying the off-target sites for Cas cleavage using each of the guide RNAs with a method disclosed herein, and (b) determining the relative specificity of the guide RNAs based on the total number of off-target sites identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites is more specific than a guide RNA having more off-target sites.
- the present disclosure provides a method for determining the relative specificity of a plurality of Cas nuclease variants comprising (a) identifying the off-target cleavage site for each of the Cas nuclease variants with a method disclosed herein, and (b) determining the relative specificity of the Cas nuclease variants based on the total number of off-target sites identified for each of the Cas nuclease variants, wherein a Cas nuclease variant having fewer off-target sites is more specific than a Cas nuclease variant having more off-target sites.
- the present disclosure provides a method for determining the relative genotoxicity of a plurality of guide RNAs comprising (a) identifying the off-target cleavage sites and DNA translocations for each of the guide RNAs with a method disclosed herein, and (b) determining the relative genotoxicity of the guide RNAs based on the total number of off-target sites and DNA translocations identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites and fewer DNA translocations is less genotoxic than a guide RNA having more off-target sites and more DNA translocations.
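The specificity and genotoxicity comparisons above reduce to ranking guide RNAs, or Cas nuclease variants, by their tallied counts. This is a minimal sketch. It assumes the counts have already been obtained per guide. The text requires "fewer off-target sites and fewer DNA translocations" but does not say how to rank guides that are better on one count and worse on the other. Summing the two counts into one genotoxicity number is therefore an illustrative assumption. Guide names and counts are hypothetical.

```python
"""Sketch: rank guide RNAs (or Cas variants) by off-target and translocation counts.

Assumption (not from the source): genotoxicity is summarized as the plain sum
of off-target sites and translocations; names and counts are hypothetical.
"""
from __future__ import annotations

from typing import NamedTuple


class GuideResult(NamedTuple):
    name: str
    off_target_sites: int
    translocations: int = 0


def rank_by_specificity(results: list[GuideResult]) -> list[str]:
    """Most specific (fewest off-target sites) first; ties broken by name."""
    return [r.name for r in sorted(results, key=lambda r: (r.off_target_sites, r.name))]


def rank_by_genotoxicity(results: list[GuideResult]) -> list[str]:
    """Least genotoxic first, using off-target sites + translocations."""
    return [
        r.name
        for r in sorted(results, key=lambda r: (r.off_target_sites + r.translocations, r.name))
    ]


if __name__ == "__main__":
    results = [GuideResult("gRNA-1", 10, 3), GuideResult("gRNA-2", 2, 0), GuideResult("gRNA-3", 5, 10)]
    print(rank_by_specificity(results))   # ['gRNA-2', 'gRNA-3', 'gRNA-1']
    print(rank_by_genotoxicity(results))  # ['gRNA-2', 'gRNA-1', 'gRNA-3']
```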
- Fig. 1 High-throughput screen of optimized insertion template sequences using a Cas9-MMLV system
- Fig. 1A is a schematic representation of the experimental procedure.
- Fig. 1A illustrates a construct comprising a guide RNA (denoted as “gRNA” ) , a nucleotide sequence encoding Cas9, a nucleotide sequence encoding an M-MLV reverse transcriptase, and a nucleotide sequence encoding an enhanced green fluorescent protein (denoted as “EGFP” ) .
- the guide RNA comprises a spacer, a scaffold, an insertion sequence having randomized nucleotides (denoted as “Random” ) , and PBS.
- Fig. 1B shows the nucleotide fraction at each position along the insertion sequences (5’ to 3’).
- the G nucleotide is disfavored across all positions, especially at the first and the last position of the insertion sequences.
- the screen was conducted with insertion sequences comprising 20 random nucleotides. Only full-length insertion was counted.
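Per-position nucleotide fractions of the kind plotted in Fig. 1B can be computed from the recovered insertions. This is a minimal sketch. It assumes the inserted sequences have already been extracted from the reads. Like the screen, it counts only full-length (here 20-nt) insertions. The example sequences are hypothetical.

```python
"""Sketch: per-position nucleotide fractions of recovered insertions (cf. Fig. 1B).

Assumptions (not from the source): insertion sequences have already been
extracted from reads; only full-length insertions are counted, as in the screen.
"""
from __future__ import annotations

from collections import Counter


def positional_fractions(insertions: list[str], length: int = 20) -> list[dict[str, float]]:
    """For each position, the fraction of full-length insertions carrying A, C, G, or T."""
    full_length = [s.upper() for s in insertions if len(s) == length]
    if not full_length:
        raise ValueError("no full-length insertions")
    fractions = []
    for position in range(length):
        counts = Counter(seq[position] for seq in full_length)
        fractions.append({base: counts[base] / len(full_length) for base in "ACGT"})
    return fractions
```

A G fraction that stays low at every position, and especially at the first and last positions, would correspond to the pattern described for Fig. 1B.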
- Fig. 1C shows the percentage of reads representing successful insertion of the consecutive nucleotides (polyC, polyT, and polyG) .
- each homopolymer template is ten consecutive identical nucleotides.
- the y-axis is the percentage of reads representing the different lengths of insertions.
- Fig. 2A is a schematic representation of the experimental procedure.
- Fig. 2A illustrates a construct comprising a guide RNA (denoted as “gRNA” ) that comprises a spacer, a scaffold, an insertion sequence encoding a wild type or one of the modified versions of His tag, LoxP tag or Flag tag (denoted as “Modified Tags” ) , and PBS.
- Fig. 2B shows the insertion efficiencies of the original and the modified His (6X) tags at three target sites (PRNP, RNF2, and EMX1) .
- the His2 comprises more C nucleotides and the His3 comprises more G nucleotides.
- Fig. 2C shows the insertion efficiency comparisons of the original and modified versions of His (6x) tag, LoxP tag, and Flag tag at the HEK3 locus.
- the LoxP2 maintains its G% compared to LoxP1, while LoxP3 comprises more G nucleotides and shows decreased insertion efficiency.
- the Flag4 tag is depleted of G nucleotides and shows dramatically increased efficiency.
- Fig. 2D is a table summarizing the sequences and nucleotide compositions of each tested tag.
- Fig. 3 illustrates an embodiment in this disclosure: PEAC-seq
- Fig. 3A is a schematic representation of the PEAC-seq experimental procedure.
- Fig. 3A illustrates a construct comprising a guide RNA (denoted as “gRNA” ) that comprises a spacer, a scaffold, an insertion sequence (denoted as “PEAC-seq insertion” ) , and PBS.
- PEAC-seq insertion: insertion sequence
- the gDNA was extracted and treated with Tn5 tagmentation.
- the Tn5 was embedded with UMI adaptors, which allow PCR duplicates to be eliminated.
- DNA fragments were amplified by pairs of primers (e.g., one primer priming at the PEAC-seq insertion and the other priming at the Tn5 adaptor).
- Fig. 3B is a schematic representation of the primers designed for enrichment and library preparation of PEAC-seq.
- Fig. 3C is a Venn diagram showing the overlap of off-target sites identified by the PEAC-seq and GUIDE-seq techniques. The results of the two techniques targeting the VEGFA TS1 site are shown.
- Fig. 3D is a visualization of the on-target and off-target sites of VEGFA TS1 identified by PEAC-seq.
- the ‘*’ represented a PEAC-seq site that was also called by the GUIDE-seq.
- the ‘**’ represented a PEAC-seq site validated by Amplicon-NGS but not called by the GUIDE-seq.
- Fig. 3E shows screenshots of PEAC-seq signal tracks from the IGV Genome Browser.
- One on-target site, one off-target site called by both PEAC-seq and GUIDE-seq, and one off-target site called by PEAC-seq only were presented.
- the top two tracks represented signals from the PEAC-seq experiments.
- the first track was from the forward primer (the genomic region downstream to the spacer)
- the second track was from the reverse primer (the genomic region upstream to the spacer) .
- the bottom two tracks represented signals from the wild-type samples (no Cas9-MMLV treatment) .
- the DNA models under each of the signal tracks show the direction of the spacer and PAM of each case.
- Fig. 3F shows that the number of reads from the sites called by the PEAC-seq and GUIDE-seq techniques are highly correlated.
- Fig. 3G shows that the off-target sites called by both PEAC-seq and GUIDE-seq (grey bars) tend to have fewer mismatches, while the PEAC-seq unique sites (slash bars) and the GUIDE-seq unique sites (horizontal bars) tend to have more mismatches.
- Fig. 3H shows mutation frequencies plotted at each position along the gRNA and PAM sequences (from 5’ to 3’). From top to bottom are the profiles of the PEAC-seq and GUIDE-seq techniques targeting VEGFA TS1, TS2, and TS3, respectively.
- Fig. 4A shows signal tracks of one PEAC-seq site with unexpected upstream signals from F-primer amplicons. Dashed bar in the middle: cutting site; Arrows: unexpected upstream signals.
- Fig. 4B shows proposed models of the generation of unexpected upstream signals.
- Both the Receiver site and the Donor site could harbor DSBs and be located in proximity to each other within the nucleus.
- Model (i) and Model (ii) joined the DSB ends from the same Receiver site.
- Model (iii) , Model (iv) and Model (v) joined one donor DSB and one Receiver DSB. If the donor DSB carried the PEAC-seq insertion, the unexpected upstream signal would be observed at the Receiver Site.
- the gRNA location was set on the top strand.
- Fig. 4C shows the design of validation PCR to identify the genomic sequence of the Donor Sites.
- Two specific primers (Nest-F1 and Nest-F2) were designed upstream of the gRNA of the Receiver Site.
- the Nest-F1 and Nest-F2 were sequentially used with the downstream Tn5 primer, and two amplicons were generated.
- the 2nd amplicons were sent for Amplicon-NGS.
- Fig. 4D shows the translocation cases identified by PEAC-seq + Amplicon-NGS.
- Fig. 4E shows translocation scores of all sites.
- the arrow indicates the Receiver Site in Fig. 4D.
- a DNA translocation score was calculated as “translocation reads number” / ( “normal reads number” + “translocation reads number” + 10) .
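The translocation score defined above can be written as a small function. The "+ 10" term acts as a pseudocount. It damps scores at low-coverage sites: 5 translocation reads with no normal reads score 5/15 ≈ 0.33 rather than 1.0. A site needs substantial read support before its score approaches 1. The formula is from the text; the function and parameter names are hypothetical.

```python
"""Sketch: the DNA translocation score described above.

score = translocation_reads / (normal_reads + translocation_reads + 10)
The +10 term is from the source; the function names are hypothetical.
"""


def translocation_score(translocation_reads: int, normal_reads: int, pseudocount: float = 10.0) -> float:
    """Fraction of translocation reads, damped by a pseudocount at low coverage."""
    if translocation_reads < 0 or normal_reads < 0:
        raise ValueError("read counts must be non-negative")
    return translocation_reads / (normal_reads + translocation_reads + pseudocount)


if __name__ == "__main__":
    print(round(translocation_score(5, 0), 3))     # 0.333: few reads keep the score modest
    print(round(translocation_score(500, 0), 3))   # 0.98: well supported, near 1
    print(round(translocation_score(50, 450), 3))  # 0.098
```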
- Fig. 5A is a schematic representation of an in vivo PEAC-seq experiment.
- Fig. 5B is a Venn diagram showing the overlap between the PEAC-seq on-target and off-target sites of Pcsk9 and the top 18 cleavage sites identified by DISCOVER-seq (Wienert et al., 2019). The one site identified by PEAC-seq was also identified by DISCOVER-seq.
- Fig. 5C illustrates the sequence visualization of the Pcsk9 on-target and off-target sites identified by PEAC-seq.
- One off-target was identified from one of the two embryos.
- the site was also reported by DISCOVER-seq and validated by Amplicon-NGS.
- the color scale of “%insertion” represented the indel frequency reported by CRISPResso.
- the star symbol represented on-target site.
- Fig. 5D shows the signal tracks of the on-target and off-target sites identified by PEAC-seq in two different embryos and a wild-type control.
- the signal of the WT control at chr4: 106463845 was 1000-fold lower than the embryo samples and was considered as background.
- Fig. 6A is a comparison of editing efficiency of Cas9, Cas9-MMLV, and Cas9n-MMLV.
- the dinucleotide insertion efficiency of Cas9-MMLV is comparable to the indel frequency of Cas9, and both are much higher than Cas9n-MMLV.
- Fig. 6B and Fig. 6C illustrate the distribution of insertion length when 10N (Fig. 6B) or 20N (Fig. 6C) random sequences were used as the insertion sequence templates.
- Amplicons were enriched by PEAC-seq insertion-specific primers and Tn5 primers. Three forward primers and two reverse primers were used together with the upstream and downstream Tn5 primers. A total of five NGS libraries were generated and sequenced. A modified GUIDE-seq analysis pipeline was applied and six lists of candidate sites were generated from each pair of the forward and the reverse primers.
- Fig. 10A is a Venn diagram showing the overlap of on-target and off-targets of VEGFA TS2 between the PEAC-seq and GUIDE-seq techniques. There was an overlap of eighty-one sites. Seventy-one sites were GUIDE-seq unique and thirty-four sites were PEAC-seq unique.
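Overlap counts like those in the Venn diagrams (shared, GUIDE-seq-unique, PEAC-seq-unique) come from intersecting two lists of called sites. This is a minimal sketch that reduces each site to a (chromosome, cut position) pair. It treats two calls as the same site when they lie on the same chromosome within a tolerance window, using greedy one-to-one matching. The source does not describe how sites were matched, so the matching rule, the 25-bp default window and the coordinates are all hypothetical.

```python
"""Sketch: Venn-style overlap of two site lists (cf. Figs. 3C and 10A-14A).

Assumptions (not from the source): each site is a (chromosome, cut position)
pair, and two calls are the same site if they lie on the same chromosome within
`window` bp; the 25-bp default and the coordinates are hypothetical.
"""
from __future__ import annotations


def overlap_counts(
    sites_a: list[tuple[str, int]], sites_b: list[tuple[str, int]], window: int = 25
) -> tuple[int, int, int]:
    """Return (shared, unique to A, unique to B) using greedy one-to-one matching."""
    matched_b: set[int] = set()
    shared = 0
    for chrom_a, pos_a in sites_a:
        for j, (chrom_b, pos_b) in enumerate(sites_b):
            if j not in matched_b and chrom_b == chrom_a and abs(pos_b - pos_a) <= window:
                matched_b.add(j)  # each B site can pair with at most one A site
                shared += 1
                break
    return shared, len(sites_a) - shared, len(sites_b) - shared
```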
- Fig. 10B is the GUIDE-seq visualization output of PEAC-seq sites at VEGFA TS2.
- the star symbol represents the on-target site.
- Fig. 10C shows the signal tracks of PEAC-seq sites at VEGFA TS2. Chromosome locations and the overlap with GUIDE-seq were also shown.
- Fig. 11A is a Venn diagram showing the overlap of on-target and off-targets of VEGFA TS3 between the PEAC-seq and GUIDE-seq. There was an overlap of thirty-five sites. Twenty-five sites were GUIDE-seq unique, and eight sites were PEAC-seq unique.
- Fig. 11B is a GUIDE-seq visualization output of PEAC-seq sites at VEGFA TS3.
- the star symbol represents the on-target site.
- Fig. 11C shows the signal tracks of PEAC-seq sites at VEGFA TS3. Chromosome locations and the overlap with GUIDE-seq were also shown.
- Fig. 12A is a Venn diagram showing the overlap of on-target and off-targets of EMX1 between the PEAC-seq and GUIDE-seq. Four sites overlapped. Twelve sites were GUIDE-seq unique.
- Fig. 12B is a GUIDE-seq visualization output of PEAC-seq sites at EMX1.
- Fig. 12C shows the signal tracks of PEAC-seq sites at EMX1. Chromosome locations and the overlap with GUIDE-seq were also shown.
- Fig. 13A is a Venn diagram showing the overlap of on-target and off-targets of RNF2 between the PEAC-seq and GUIDE-seq. One site was called by both methods.
- Fig. 13B is a GUIDE-seq visualization output of PEAC-seq sites at RNF2.
- the star symbol represents the on-target site.
- Fig. 13C shows the signal tracks of PEAC-seq sites at RNF2. Chromosome locations and the overlap with GUIDE-seq were also shown.
- Fig. 14A is a Venn diagram showing the overlap of on-target and off-targets of FANCF between the PEAC-seq and GUIDE-seq. There was an overlap of three sites. Six sites were GUIDE-seq unique.
- Fig. 14B is a GUIDE-seq visualization output of PEAC-seq sites at FANCF. The star symbol represents the on-target site.
- Fig. 14C shows the signal tracks of PEAC-seq sites at FANCF. Chromosome locations and the overlap with GUIDE-seq were also shown.
- Fig. 15A is a Venn diagram showing the overlap between the PEAC-seq on-target and off-target sites of Pnpla3 and the top 21 off-targets validated by WGS (Anderson et al., 2018). Three cleavage sites were identified from two different embryos. All three sites had been reported previously.
- Fig. 15B is a sequence visualization of the Pnpla3 on-target and off-targets.
- One off-target site was identified in both embryos, and each embryo also had one embryo-specific off-target site. All three off-targets had been reported previously and were also verified by Amplicon-NGS. The star symbol represents the on-target site.
- Fig. 15C shows the signal tracks of the on-target and off-target sites identified by PEAC-seq in two different embryos and a wild-type control.
- Fig. 16 The translocation events called by PEAC-seq were validated by anchored multiplex PCR (AMP).
- Circos plots show the chromosome rearrangements at the two receiver sites, Translocation validation site1 (chr22: 37266776-37266799) (Fig. 16A) and Translocation validation site2 (chr14: 61612048-61612071) (Fig. 16B). Both sites are off-targets of VEGFA TS3. Arcs represent the rearrangements between the Translocation validation sites and other sites. The receiver sites are marked as diamonds, and the known VEGFA TS3 off-target sites are marked as stars.
- Fig. 17 shows the Cas9 target sequences (i.e., cleavage sites) identified by PEAC-seq at six target sites (VEGFA TS1 (Fig. 17A), VEGFA TS2 (Fig. 17B), VEGFA TS3 (Fig. 17C), EMX1 (Fig. 17D), RNF2 (Fig. 17E) and FANCF (Fig. 17F)), together with their chromosome locations. The number of mismatches is also shown. The cleavage site with 0 mismatches is the on-target site and the others are off-target sites.
- Fig. 18A shows the Cas9 target sequences (i.e., cleavage sites) from Embryo #5 and Embryo #12 identified by PEAC-seq targeting Pcsk9.
- Fig. 18B shows the Cas9 target sequences (i.e., cleavage sites) from Embryo #21 and Embryo #31 identified by PEAC-seq targeting Pnpla3.
- nucleic acids are written left to right in the 5' to 3' orientation; and amino acid sequences are written left to right in amino to carboxy orientation, respectively.
- variant refers to a varied form of a subject, including wild-type forms and naturally occurring or artificially mutated forms.
- CRISPR technology holds significant promise for biological studies and gene therapies because of its high flexibility and efficiency in mammalian cells.
- endonuclease: e.g., Cas9
- Cas9 may generate undesired edits as it scans the genome.
- the detection of off-target is crucial to the biotechnological and clinical applications of the CRISPR technology.
- the present disclosure provides a method in which a label sequence is encoded within the CRISPR-Cas system and is inserted at the cleavage sites.
- a unique feature of the detection method disclosed herein is the simultaneous identification of off-targets and DNA translocations. It can be used with different delivery methods for CRISPR therapy, including in vivo delivery approaches (e.g., lipid nanoparticles and adeno-associated virus).
- the method is also compatible with Cas9 and its variants and mutants, including but not limited to wild-type Cas9 and Cas9 nickase (Cas9n).
- the detection method disclosed herein is referred to as Prime Editor-Assisted Off-Target Characterization by Sequencing (PEAC-seq) .
- the Prime Editing system is a “search-and-replace” genome editing technology that mediates targeted insertions, deletions, and base-to-base conversions and combinations thereof in human cells without the need for double strand breaks (DSBs) or donor DNA templates.
- Prime Editors use a reverse transcriptase (RT) fused to an RNA-programmable nickase (e.g., Cas9 nickase) and a prime editing guide RNA (also known as pegRNA) to copy genetic information directly from an extension on the pegRNA into the target genomic locus (Anzalone et al., 2019) .
- RT reverse transcriptase
- pegRNA prime editing guide RNA
- the template sequence on the pegRNA extension is reverse transcribed into DNA, which hybridizes to the unedited complementary strand with the help of another endonuclease (e.g., FEN1).
- the PEAC-seq method in the present disclosure replaces the Cas9 nickase in the Prime Editing system with a Cas9, which creates DSBs in the genomic DNA.
- a random-sequence screen and a homopolymer (polyN) screen were conducted to identify appropriate insertion sequence compositions, and nucleotide incorporation preferences were characterized.
- the negative impact of G as the last incorporated nucleotide has been reported previously, and a recent study reported that the disfavoring of a terminal G is length-dependent (Kim et al., Nat. Biotechnol. 34, 198-206 (2001)).
- the inventors discovered from an unbiased screen that the G nucleotide is disfavored, sometimes strongly, across the entire insertion sequence, indicating that G incorporation should be avoided throughout the insertion, especially at the 5' and 3' ends.
- this sequence preference can be taken into account, e.g., by using synonymous codons to avoid G in the insertion and thereby increase insertion efficiency.
- the higher insertion efficiency of the His (6X) tag compared with the Flag and LoxP tags, as reported in earlier research, might be due not only to its shorter sequence length; the depletion of G nucleotides could also contribute to its high insertion efficiency.
- a detection method disclosed herein employs a guide RNA comprising an insertion sequence RT template, as well as a reverse transcriptase and a Cas nuclease as a fusion protein.
- the reverse transcribed sequences (based on the RT template) inserted into the cleavage sites function as labels, marking the Cas cleavage sites. Since the reverse transcriptase and the Cas nuclease are fused together, the labeling process, i.e., inserting a sequence into a cleavage site, is performed right after the cleavage event and at the same location. By enriching and sequencing the portions of genomic DNA that contain the insertion sequence, the Cas cleavage sites can be identified on the genome.
- an insertion sequence refers to a DNA sequence that is encoded by the RT template comprised in a guide RNA and the products reverse transcribed from this RT template. Both partial and full-length products may exist in a reverse transcription. When “insertion sequence” is used to refer to the reverse transcription products, it includes both the partial and full-length products.
- the portion of genomic DNA that comprises the insertion sequence needs to be targeted and enriched by PCR.
- two separate reactions are performed, wherein in the first reaction the insertion sequence is used as the forward primer binding site and in the second reaction the insertion sequence is used as the reverse primer binding site.
- the forward primer (e.g., F1, F2, or F3 in Fig. 3B)
- the reverse primer (e.g., R1 or R2 in Fig. 3B)
- amplicon refers to the portion of genomic DNA that is amplified in a PCR.
- the inventors also discovered that the occurrence of DNA translocation is independent of the frequency of DSBs at a particular site, indicating that other factors, e.g., DSB context sequences or local chromatin states, might also contribute to translocation occurrence.
- because a detection method disclosed herein does not rely on exogenous sequences beyond the CRISPR-Cas system (e.g., a guide RNA comprising an insertion sequence RT template, a Cas nuclease, and a reverse transcriptase) to label the Cas cleavage sites, the method can be used not only in vitro and in cellulo but also in vivo. These applications are described in detail below.
- a guide RNA is a short synthetic RNA composed of a scaffold sequence necessary for Cas-binding and a user-defined nucleotide spacer that defines the gene target to be modified.
- the strand of genomic DNA that is bound by the spacer is typically referred to as the complementary strand.
- the other strand of DNA is typically referred to as the non-complementary strand.
- the guide RNA used herein is made up of two RNA molecules which are a crRNA and a tracrRNA, wherein the crRNA is customized to bind a target gene, and the tracrRNA serves as a binding scaffold for a Cas enzyme.
- the guide RNA used herein is a single guide RNA (sgRNA), wherein the single RNA molecule comprises a custom-designed crRNA sequence fused to a scaffold tracrRNA sequence.
- a single guide RNA is used to increase the editing efficiency.
- the guide RNA further comprises an extension arm at its 3’ end.
- the extension arm provides a DNA synthesis template sequence that encodes a single-stranded DNA flap to be inserted into a Cas cleavage site.
- at the 3’ end of the extension arm is a primer binding site (PBS) that binds to the non-complementary strand of the target gene and serves as a primer for the reverse transcriptase.
- PBS primer binding site
- the DNA synthesis template sequence comprised in a guide RNA is referred to as an insertion sequence RT template in the present disclosure.
- the inventors discovered that guanine is disfavored at nearly all positions of the insertion sequence, especially at the 5’ and 3’ ends, and that this preference against guanine is consistent regardless of insertion length.
- guanine should be avoided in the insertion sequence in order to increase insertion efficiency, which means cytosine should be avoided in the insertion sequence RT template comprised in the guide RNA.
- the present disclosure provides a guide RNA comprising a spacer, a scaffold, and an insertion sequence reverse transcriptase (RT) template.
- the insertion sequence RT template comprises a nucleotide sequence wherein cytosine is depleted, for example, cytosine represents less than 25% (e.g., less than 20%, less than 15%, less than 10%, less than 5%) of the nucleotides in the nucleotide sequence.
- an insertion sequence RT template is cytosine-depleted if it comprises a nucleotide sequence wherein cytosine represents not more than 25%, e.g., less than 25%, of the nucleotides in the nucleotide sequence.
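The reverse transcriptase copies the RT template into its complementary DNA. Each C in the template therefore becomes a G in the inserted sequence, so the cytosine-depletion criterion can be checked directly on the template. The following Python sketch is a minimal illustration under several assumptions. The template is written 5' to 3' as RNA. The "less than 25%" threshold is applied. The function names and the example sequence are hypothetical; the example encodes a G-free, His-tag-like insertion.

```python
# Hypothetical helpers (not from the disclosure) for checking cytosine depletion
# of an insertion-sequence RT template written 5'->3' as RNA (A/C/G/U).

# Base written into the DNA product opposite each template base
# (T accepted for templates written as DNA; any other symbol raises KeyError).
TEMPLATE_TO_DNA = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def encoded_insertion(rt_template: str) -> str:
    """Return the DNA insertion (5'->3') reverse transcribed from the template."""
    # The product is antiparallel to the template, so read the template 3'->5'.
    return "".join(TEMPLATE_TO_DNA[base] for base in reversed(rt_template.upper()))


def base_fraction(seq: str, base: str) -> float:
    """Fraction of positions in seq equal to base (0.0 for an empty sequence)."""
    seq = seq.upper()
    return seq.count(base) / len(seq) if seq else 0.0


def is_cytosine_depleted(rt_template: str, threshold: float = 0.25) -> bool:
    """True if cytosine makes up less than `threshold` of the template."""
    return base_fraction(rt_template, "C") < threshold


if __name__ == "__main__":
    template = "AUGAUGAUGAUGAUGAUG"  # hypothetical 18-nt template
    print(encoded_insertion(template))     # CATCATCATCATCATCAT
    print(is_cytosine_depleted(template))  # True
```

Checking the template rather than the insertion gives the same answer, because the C fraction of the template equals the G fraction of the DNA it encodes.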
- the insertion sequence RT template comprises a nucleotide sequence that is SEQ ID NO: 113.
- the insertion sequence template comprises a nucleotide sequence of any length, e.g., from about 10 to 30 nucleotides.
- the insertion sequence could be of any length, including but not limited to 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 nucleotides in length.
- the guide RNA further comprises a primer binding site (PBS) that is capable of binding the non-complementary strand of the target gene.
- the insertion sequence RT template encodes one or more tags suitable for hybrid capture.
- Hybrid capture is a method used in target DNA enrichment, where a “bait” molecule is used to select target regions from DNA libraries.
- hybrid capture methods that could be used herein include, but are not limited to, biotinylated oligonucleotide baits.
- the present disclosure provides a complex comprising a Cas nuclease, a reverse transcriptase, and a guide RNA, wherein the guide RNA comprises a spacer, a scaffold, and an insertion sequence template, wherein the insertion sequence template is a reverse transcriptase template comprising a nucleotide sequence wherein cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
- the Cas nuclease is Cas9 or its variants or mutants of any of the variants.
- the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutants of any of the variants.
- the Cas nuclease and the reverse transcriptase are formed as a fusion protein, optionally with a peptide linker in between the Cas nuclease and the reverse transcriptase.
- a fusion protein can be made from a fusion gene, e.g., created by joining parts of two different genes.
- the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
- the present disclosure provides a vector comprising a guide RNA comprising an insertion sequence reverse transcriptase (RT) template, wherein the insertion sequence RT template comprises a nucleotide sequence wherein cytosine represents less than 25% of the nucleotides in the nucleotide sequence, a nucleotide sequence encoding a Cas nuclease, and a nucleotide sequence encoding a reverse transcriptase.
- the Cas nuclease is Cas9 or its variants or mutants of any of the variants.
- the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutants of any of the variants.
- the present disclosure provides a kit for detecting Cas nuclease cleavage sites and DNA translocation in genomic DNA, comprising one or more polynucleotide sequences encoding a Cas nuclease, a reverse transcriptase, and the guide RNA comprising an insertion sequence reverse transcriptase (RT) template, wherein the insertion sequence RT template comprises a nucleotide sequence wherein cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
- the Cas nuclease is Cas9 or its variants or mutants of any of the variants.
- the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutants of any of the variants.
- the Cas9 nuclease and the reverse transcriptase are encoded as a fusion protein.
- the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
- CRISPR clustered, regularly interspaced, short palindromic repeats
- Cas CRISPR-associated systems
- the present disclosure involves a Cas nuclease or a variant or a mutant of any of the variants thereof.
- All variants and mutants of Cas9 can be used in a method, composition, or kit disclosed herein, including but not limited to a wild-type Cas9 or a Cas9 nickase (Cas9n) .
- the Cas9 nuclease used herein could either be wild type or be genetically modified.
- the Cas9 nucleases to be used herein could be selected from SpCas9 (Cas9 isolated from Streptococcus pyogenes), SaCas9 (Cas9 isolated from Staphylococcus aureus), StCas9 (Cas9 isolated from Streptococcus thermophilus), NmCas9 (Cas9 isolated from Neisseria meningitidis), FnCas9 (Cas9 isolated from Francisella novicida), CjCas9 (Cas9 isolated from Campylobacter jejuni), ScCas9 (Cas9 isolated from Streptococcus canis), and any variants and mutant forms of the Cas9 listed above, such as high-fidelity Cas9 (Kleinstiver et al., Nature. 2016 Jan 28) and enhanced SpCas9 (Slaymaker et al., Science. 2016 Jan 01).
- the present disclosure involves a reverse transcriptase or a variant or a mutant of any of the variants thereof, which can be provided as a fusion protein with a Cas nuclease, or provided in trans.
- Reverse transcriptase, also known as RNA-dependent DNA polymerase, is a DNA polymerase enzyme that transcribes single-stranded RNA into DNA.
- Reverse transcriptases are found in many eukaryotic and prokaryotic systems, such as telomerase, retrotransposons, and retrons, and are abundant in the genomes of plants and animals. Any of the wild-type, variant, and mutant forms of reverse transcriptase that are known in the art or can be made using methods known in the art are contemplated herein.
- the reverse transcriptases that can be used herein include, but are not limited to, Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutants of any of the variants.
- M-MLV Moloney Murine Leukemia Virus
- AMV Avian Myeloblastosis Virus
- the reverse transcriptase is fused directly to the Cas nuclease. In some embodiments, the reverse transcriptase is connected to the Cas nuclease with a linker.
- the present disclosure provides a method for labeling Cas9 nuclease cleavage sites in genomic DNA of a cell, comprising (a) providing a guide RNA that can bind to a target gene in the genomic DNA and comprises an insertion sequence RT template, (b) providing a complex comprising a Cas nuclease, a reverse transcriptase, and the guide RNA, and (c) contacting the genomic DNA with the complex under conditions suitable to obtain a labeled genomic DNA, wherein the genomic DNA is cleaved at one or more cleavage sites, and one or more insertion sequences that are reverse transcribed, in part or in whole, from the insertion sequence RT template are inserted into the one or more cleavage sites.
- the insertion sequence RT template is located at the 3’ end of the guide RNA.
- the Cas nuclease is Cas9 or its variants or mutants of any of the variants.
- the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutant forms.
- the insertion sequence RT template comprises a nucleotide sequence wherein cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
- the insertion sequence RT template is of SEQ ID NO: 113.
- the insertion sequence RT template encodes one or more tags suitable for hybrid capture.
- the insertion sequence template comprises about 10 to 30 nucleotides.
- the guide RNA has a structure of spacer-scaffold-insertion sequence template-primer binding site (PBS), from 5’ to 3’, in which the spacer is able to bind the target gene on the genome, the scaffold is able to bind the Cas nuclease, the primer binding site is able to bind the non-complementary strand of the target gene, and the insertion sequence template is the reverse transcription template for the reverse transcriptase.
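The 5'-to-3' layout above can be illustrated with a short Python sketch that concatenates the four parts and runs a few pre-synthesis checks on the 3' extension. The checks are assumptions drawn from elsewhere in this disclosure:

- the RT template should be cytosine-depleted;
- the template should be roughly 10 to 30 nt;
- poly-T should be avoided, because it terminates Pol III transcription from a U6 promoter.

Treating four consecutive U/T as the terminator is a common design heuristic, not a rule stated here. All function names and sequences are hypothetical placeholders.

```python
def build_guide(spacer: str, scaffold: str, rt_template: str, pbs: str) -> str:
    """Concatenate guide RNA parts 5'->3': spacer-scaffold-RT template-PBS."""
    return spacer + scaffold + rt_template + pbs


def extension_issues(rt_template: str, pbs: str,
                     max_c_fraction: float = 0.25,
                     length_range: tuple = (10, 30)) -> list:
    """Return human-readable warnings for the 3' extension (RT template + PBS)."""
    issues = []
    template = rt_template.upper().replace("T", "U")
    extension = template + pbs.upper().replace("T", "U")
    # Heuristic: a run of four U/T is treated as a possible Pol III terminator.
    # Only the extension is checked, since scaffold sequences vary by design.
    if "UUUU" in extension:
        issues.append("poly-U run in 3' extension")
    if template and template.count("C") / len(template) >= max_c_fraction:
        issues.append("RT template is not cytosine-depleted")
    low, high = length_range
    if not low <= len(template) <= high:
        issues.append("RT template length outside %d-%d nt" % (low, high))
    return issues


if __name__ == "__main__":
    # Hypothetical placeholder sequences, not from the disclosure.
    guide = build_guide("GAUCGAUCGAUCGAUCGAUC", "<scaffold>",
                        "AUGAUGAUGAUGAUGAUG", "ACGUACGUAC")
    print(guide)
    print(extension_issues("AUGAUGAUGAUGAUGAUG", "ACGUACGUAC"))  # []
```

The poly-U check runs on the concatenated template and PBS, so a run split across the template-PBS junction is also caught.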
- the Cas nuclease and the reverse transcriptase form a Cas-RT fusion protein.
- the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
- the Cas cleavage sites could be either on-target or off-target.
- if a Cas nuclease binds to a genetic locus whose sequence is exactly the same as the target gene, the cleavage site created there is an on-target cleavage site; otherwise, the cleavage site is an off-target site.
- the present disclosure provides a method for detecting Cas9 cleavage sites and detecting DNA translocation at those cleavage sites in genomic DNA of a cell, comprising (a) obtaining a labeled genomic DNA with a method described herein, (b) targeting and amplifying one or more labeled portions of the genomic DNA, (c) sequencing the amplified portion of the genomic DNA, and (d) analyzing the sequencing result to identify Cas cleavage sites and/or DNA translocations at the Cas cleavage sites.
- the one or more amplified portions of the genomic DNA in step (b) each comprise a portion of genomic DNA that is immediately upstream or downstream of the one or more insertion sequences.
- the method could be used to identify Cas nuclease off-target sites by comparing the Cas cleavage sites identified by the method disclosed herein with a target sequence; a cleavage site that is not identical to the target sequence is an off-target site. It would be understood that, based on the method disclosed herein, those of ordinary skill in the art are able to locate the cleavage sites on the genome with readily available tools such as the Burrows-Wheeler Aligner (BWA).
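As a minimal illustration of the on-/off-target distinction above, the Python sketch below counts position-wise mismatches between a cleavage-site protospacer (recovered after alignment) and the intended target sequence. It assumes equal-length, gap-free sequences. Real off-target callers also consider the PAM and DNA/RNA bulges, which this sketch ignores. Function names and sequences are hypothetical.

```python
def count_mismatches(site: str, target: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(site) != len(target):
        raise ValueError("site and target must have the same length")
    return sum(a != b for a, b in zip(site.upper(), target.upper()))


def classify_site(site: str, target: str) -> str:
    """'on-target' for an exact match to the target, otherwise 'off-target'."""
    return "on-target" if count_mismatches(site, target) == 0 else "off-target"


if __name__ == "__main__":
    target = "GACTGACTGACTGACTGACT"  # hypothetical 20-nt protospacer
    for site in ("GACTGACTGACTGACTGACT", "GACTGACTGACTGACAGAGT"):
        print(site, count_mismatches(site, target), classify_site(site, target))
```

The same mismatch count is the quantity compared between method-shared and method-unique off-target sites later in this section.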
- BWA Burrows-Wheeler Aligner
- the genomic DNA is processed by Tn5 tagmentation before amplification.
- Tagmentation uses a hyperactive variant of the Tn5 transposase that mediates the fragmentation of double-stranded DNA and ligates synthetic oligonucleotides at both ends (Adey et al. 2010).
- The wild-type Tn5 transposon is a composite transposon in which two near-identical insertion sequences (IS50L and IS50R) flank three antibiotic resistance genes (Reznikoff 2008).
- Each IS50 contains two inverted 19-bp end sequences (ESs), an outside end (OE) and an inside end (IE).
- Tn5 tagmentation platforms or kits, and variants or mutants thereof, could be used in the present disclosure, such as Nextera DNA kits and on-bead tagmentation.
- the genomic DNA is processed by Tn5 tagmentation before amplification, wherein sequencing adapters that include unique molecular identifiers (UMI) are embedded in the Tn5 transposases.
- UMI is a type of molecular barcoding that provides error correction and increased accuracy in sequencing data analysis.
- the molecular barcodes are short sequences used to uniquely tag each molecule in a sample library.
- the UMI-included adapters are embedded into Tn5 so that dsDNA fragments after tagmentation are tagged with these UMI-included adapters, which could be used to eliminate PCR duplicates from the sequencing data.
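The duplicate-removal step can be sketched as follows. Reads that carry the same UMI and map to the same position and strand are assumed to come from one tagmented molecule, so only one of them is kept. The `Read` record, its fields, and exact-match UMI comparison are simplifying assumptions for illustration. Some dedicated deduplication tools additionally merge UMIs that differ by a sequencing error.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for an aligned read."""
    name: str
    umi: str
    chrom: str
    pos: int      # 5' mapping coordinate of the read
    strand: str   # "+" or "-"


def deduplicate(reads: List[Read]) -> List[Read]:
    """Keep the first read seen for each (UMI, chromosome, position, strand)."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.umi, read.chrom, read.pos, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


if __name__ == "__main__":
    reads = [
        Read("r1", "ACGTACGT", "chr1", 1000, "+"),
        Read("r2", "ACGTACGT", "chr1", 1000, "+"),  # PCR duplicate of r1
        Read("r3", "TTGCAAGC", "chr1", 1000, "+"),  # same site, different molecule
    ]
    print([r.name for r in deduplicate(reads)])  # ['r1', 'r3']
```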
- the portion of the genomic DNA comprising the insertion sequence, or a portion of the insertion sequence, is targeted and enriched by a method selected from PCR and hybrid capture-based target enrichment methods.
- Hybrid capture-based target enrichment methods that can be used herein include, but are not limited to, biotinylated oligonucleotide baits.
- PCR polymerase chain reaction
- in PCR-based enrichment, a set of flanking primers anneals to the outer regions of the DNA sequence of interest, so unwanted DNA is not amplified.
- Another available group of methods for target enrichment is hybrid capture-based methods.
- One commonly used hybridization capture tag uses a biotinylated oligonucleotide bait. Any methods that can effectively enrich a targeted portion of the genomic DNA can be used herein.
- the enrichment is performed by two rounds of PCR, wherein in the first reaction the insertion sequence is used as the forward primer binding site and in the second reaction the insertion sequence is used as the reverse primer binding site.
- the 3’ ends of the primers that bind to the insertion sequence are at least 2 bp away from the insertion boundary, so that the extension sequence information can be used to filter out random-priming reads (see Fig. 3B). If the primer correctly binds to the insertion sequence, there will be at least 2 bp at the beginning of the extension sequence that are complementary to the insertion sequence.
- the insertion boundary described herein refers to the first and last base pairs of the insertion sequence.
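The random-priming filter described above can be sketched in Python for one primer orientation. The sketch makes three assumptions:

- the read starts at the primer (adapters already trimmed);
- the read and primer are written on the same strand as the insertion, so a correctly primed read continues with the insertion bases that lie between the primer's 3' end and the boundary (at least 2 bp by design);
- the second, reverse-primer reaction would apply the same logic to the reverse complement.

The function name and sequences are hypothetical.

```python
from typing import Optional


def genomic_flank(read: str, insertion: str, primer: str,
                  min_gap: int = 2) -> Optional[str]:
    """Return the genomic sequence after the insertion, or None if the read
    looks randomly primed (it does not continue into the insertion)."""
    start = insertion.find(primer)
    if start < 0:
        raise ValueError("primer does not match the insertion sequence")
    # Insertion bases between the primer 3' end and the insertion boundary.
    gap = insertion[start + len(primer):]
    if len(gap) < min_gap:
        raise ValueError("primer 3' end must be at least %d bp from the boundary" % min_gap)
    if not read.startswith(primer):
        return None
    extension = read[len(primer):]
    if not extension.startswith(gap):
        return None  # extension does not reproduce the expected insertion bases
    return extension[len(gap):]


if __name__ == "__main__":
    insertion = "ATCACTTAACAT"  # hypothetical insertion (top strand)
    primer = "ATCACTTAA"        # 3' end sits 3 bp from the boundary
    print(genomic_flank("ATCACTTAACATGGTACC", insertion, primer))  # GGTACC
    print(genomic_flank("ATCACTTAAGGTACCGG", insertion, primer))   # None
```

The returned flank is what would be aligned to the genome to locate the cleavage site.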
- DNA translocation is also referred to as chromosome translocation, or chromosome rearrangement.
- in a translocation, a segment from one chromosome is transferred to a nonhomologous chromosome or to a new site on the same chromosome.
- Chromosomal translocations appear to arise from improper repair of DNA double-strand breaks (DSBs), which are highly toxic lesions.
- the “guardians” of genome integrity mostly ensure reliable repair of DSBs; also, unrepaired DSBs can lead to apoptosis or senescence.
- imprecise repair of DSBs has the potential to be highly deleterious, as it can lead to genome instability, including the formation of chromosomal rearrangements.
- chromosomal translocations can arise when DNA ends from DSBs on two heterologous chromosomes are improperly joined. (Scott et al., 2000)
- each DSB may generate three types of ends at each cleavage site (Receiver Site or Donor Site): one original upstream end, one original downstream end, and one upstream end appended with a complete or partial tag insertion. If multiple DSBs occur simultaneously in the nucleus and are physically proximal to each other, DSB ends from different break points might join together and cause genome rearrangements.
- the upstream end of a Donor Site may bring a reversely placed insertion sequence to the upstream end of a Receiver Site (Fig. 4B, model (v)), and this joining would generate an amplicon from the upstream end of the Receiver Site, which usually would not be amplified by the F-primer.
- the present disclosure provides a method for determining the relative specificity of a plurality of guide RNAs comprising (a) identifying the off-target sites for Cas cleavage using each of the guide RNAs with a method disclosed herein, and (b) determining the relative specificity of the guide RNAs based on the total number of off-target sites identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites is more specific than a guide RNA having more off-target sites.
- the present disclosure provides a method for determining the relative specificity of a plurality of Cas nuclease variants and mutants comprising (a) identifying the off-target cleavage site for each of the Cas nuclease variants and mutants with a method disclosed herein, and (b) determining the relative specificity of the Cas nuclease variants and mutants based on the total number of off-target sites identified for each of the Cas nuclease variants and mutants, wherein a Cas nuclease variant or mutant having fewer off-target sites is more specific than a Cas nuclease variant or mutant having more off-target sites.
- the present disclosure provides a method for determining the relative genotoxicity of a plurality of guide RNAs comprising (a) identifying the off-target cleavage sites and DNA translocations for each of the guide RNAs with a method disclosed herein, and (b) determining the relative genotoxicity of the guide RNAs based on the total number of off-target sites and DNA translocations identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites and fewer DNA translocations is less genotoxic than a guide RNA having more off-target sites and DNA translocations.
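The three comparisons above reduce to sorting candidates by counts. A minimal Python sketch follows, assuming the per-guide (or per-variant) counts have already been produced by the method. The unweighted sum used for genotoxicity is an assumption, since the text says only "based on the total number of off-target sites and DNA translocations". The names and counts are hypothetical.

```python
from typing import Dict, List, Tuple

# guide (or Cas variant) name -> (off-target site count, translocation count)
Counts = Dict[str, Tuple[int, int]]


def rank_by_specificity(counts: Counts) -> List[str]:
    """Most specific first: fewer off-target sites ranks higher."""
    return sorted(counts, key=lambda name: counts[name][0])


def rank_by_genotoxicity(counts: Counts) -> List[str]:
    """Least genotoxic first, scoring each candidate as off-targets + translocations."""
    return sorted(counts, key=lambda name: sum(counts[name]))


if __name__ == "__main__":
    counts = {"gRNA-1": (2, 6), "gRNA-2": (1, 0), "gRNA-3": (3, 1)}  # hypothetical
    print(rank_by_specificity(counts))   # ['gRNA-2', 'gRNA-1', 'gRNA-3']
    print(rank_by_genotoxicity(counts))  # ['gRNA-2', 'gRNA-3', 'gRNA-1']
```

In this example the two rankings differ: gRNA-1 has few off-target sites but many translocations.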
- sequencing includes any method of determining the sequence of a nucleic acid. Any method of sequencing can be used in the present disclosure, including chain terminator (Sanger) sequencing and dye terminator sequencing. In preferred embodiments, Next Generation Sequencing (NGS) is used. NGS is a high-throughput sequencing technology that performs thousands or millions of sequencing reactions in parallel. Although different NGS platforms use varying assay chemistries, they all generate sequence data from a large number of sequencing reactions run simultaneously on a large number of templates. Typically, the sequence data is collected using a scanner, and then assembled and analyzed bioinformatically. Thus, the sequencing reactions are performed, read, assembled, and analyzed in parallel. See e.g.
- Some NGS methods require template amplification and some do not.
- Amplification-requiring methods include pyrosequencing, the Solexa/Illumina platform, and the Supported Oligonucleotide Ligation and Detection (SOLiD) platform.
- Methods that do not require amplification include single-molecule sequencing methods, nanopore sequencing, HeliScope, real-time sequencing by synthesis, single molecule real time (SMRT) DNA sequencing methods using zero-mode waveguides (ZMWs) and others.
- SMRT single molecule real time
- ZMWs zero-mode waveguides
- hybridization-based sequencing methods or other high-throughput methods can also be used, e.g., microarray analysis, NANOSTRING, ILLUMINA, or other sequencing platforms.
- the methods described herein can be used in any cell that is capable of repairing a DSB in genomic DNA and synthesizing a new strand of DNA based on a template.
- the two major DSB repair pathways in eukaryotic cells are homologous recombination and non-homologous end joining (NHEJ).
- the methods could be performed in cells capable of any of the repair pathways.
- off-target sites can occur anywhere in the genome, which makes comprehensive detection of these sites challenging without enrichment.
- Strategies such as GUIDE-seq, which apply a unique exogenous sequence to tag and enrich the cleavage sites, have proved effective in identifying off-target sites in cellulo.
- dsODN exogenous double-stranded oligodeoxynucleotides
- the templated information at the 3’ end of the guide RNA was employed to tag and enrich the edited genomic sites, which avoids dsODN transfection.
- Cas9-MMLV should have higher insertion efficiency than Cas9n-MMLV since the double strand breaks can help incorporate the templated reverse transcription sequences.
- the insertion efficiency of Cas9-MMLV was indeed comparable to the indel efficiency of Cas9 and much higher than that of Cas9n-MMLV (Fig. 6A).
- the template sequences were optimized to identify appropriate insertion sequences.
- As shown in Fig. 1A, guide RNAs targeting the widely used VEGFA site 3 (TS3), with random sequences as the insertion template, were designed. Screens were performed on guide RNAs with RT templates of 10 or 20 random nucleotides. Since 20 bp should provide enough nucleotides for specific primer binding during PCR enrichment, longer templates were not tested further, although longer sequences can be used. The length distribution of insertions across the different libraries (Fig. 6B, 6C) was examined first.
- the nucleotide composition of these insertions was then examined. Interestingly, guanine is disfavored at nearly all positions, especially at the 5' and 3' ends of the insertion sequence (Fig. 1B, Fig. 7), and this nucleotide-composition preference was consistent across different insertion lengths.
- a dinucleotide (CA) insertion test was conducted targeting the VEGFA TS3 site (SEQ ID NO: 127).
- the HEK-293T cells were seeded in 12-well plates and grown until ~80% confluency. Each well was transfected with 2.5 μg of plasmids (1.875 μg of the pCMV Cas9 plasmid (SEQ ID NO: 11), the pCMV Cas9-MMLV plasmid (SEQ ID NO: 12), or the pCMV Cas9H840A-PE2 plasmid (SEQ ID NO: 13), plus 0.625 μg of the PE-GUIDE test2 gRNA plasmid (SEQ ID NO: 10)) by Lipofectamine 3000.
- the genomic DNA was extracted 48 h post-transfection. 1 μg of gDNA was used as the template to amplify the insertion regions with primers T3 sanger F (SEQ ID NO: 7) and T3 sanger R (SEQ ID NO: 8).
- the PCR products were sent for Sanger sequencing after gel extraction, and the returned ab1 files were analyzed for gene editing results with Synthego ICE Analysis (https://ice.synthego.com/).
- Oligos with different lengths of random sequence (10N (SEQ ID NO: 2) and 20N (SEQ ID NO: 1)) were synthesized. In the following steps, the 10N and 20N oligos were used as two independent samples.
- Each oligo template was amplified with the opti-oligoF (SEQ ID NO: 3) and opti-oligoR (SEQ ID NO: 4) primers in four 50 μL reactions. The products from the four reactions were combined, size-selected on an agarose gel, and purified with 1.8x AMPure XP beads (Beckman #A63881). The purified PCR products were cloned into a backbone vector (SEQ ID NO: 9) via Golden Gate assembly (GGA).
- GGA Golden Gate assembly
- the GGA reaction was performed in a 50 μL volume with the following components: 50 fmol backbone, 150 fmol insertions, 0.5 μL T4 DNA Ligase (Thermo #EL0014), 1 μL BsmBI (Thermo #ER0451), and 1x T4 Buffer.
- the reaction was conducted as 90 cycles of 37°C for 5 min and 22°C for 5 min, followed by 65°C for 30 min and 37°C for 3 hours.
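The GGA reaction above is specified in femtomoles (50 fmol backbone and 150 fmol insert, a 3:1 insert:backbone molar ratio), whereas DNA is usually quantified by mass. A small Python helper converts between the two using mass = moles × length × average mass per base pair. The 650 g/mol per bp figure is a common approximation for double-stranded DNA. The fragment lengths in the example are hypothetical, because the lengths of SEQ ID NO: 9 and of the amplified oligo inserts are not given here.

```python
AVG_MASS_PER_BP = 650.0  # g/mol per base pair of dsDNA (common approximation)


def fmol_to_ng(fmol: float, length_bp: int,
               mass_per_bp: float = AVG_MASS_PER_BP) -> float:
    """Mass in ng of `fmol` femtomoles of a dsDNA fragment of `length_bp` bp."""
    moles = fmol * 1e-15                     # 1 fmol = 1e-15 mol
    grams = moles * length_bp * mass_per_bp  # molar mass ~ length x mass per bp
    return grams * 1e9                       # 1 g = 1e9 ng


if __name__ == "__main__":
    print(round(fmol_to_ng(50, 5000), 1))  # 162.5 ng of a hypothetical 5-kb backbone
    print(round(fmol_to_ng(150, 120), 1))  # 11.7 ng of a hypothetical 120-bp insert
```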
- the ligation products were purified by 0.8x AMPure XP beads and transformed into the NEB stable electroporation competent cells (NEB #C3040H) following the manufacturer’s instruction.
- the electroporation was performed on Eppendorf Eporator.
- the transformed cells were propagated on 24.5 × 24.5 cm plates at 30°C for 20 hours. Colonies were collected, and plasmids were extracted using the QIAGEN Plasmid Plus Midi Kit (QIAGEN #12943) following the manufacturer’s instructions.
- HEK-293T cells were seeded in T75 flasks, and the transfection was conducted when the confluency was around 80%.
- 80 μg of plasmids was transfected by Lipofectamine 3000 (Thermo #L3000075) following the manufacturer’s instructions.
- Cells were collected 48 h after transfection.
- the gDNA was collected, and 1 μg of gDNA was used as the template in a 50 μL reaction to amplify the targeted sequence (VEGFA Site3).
- site-specific primers T3-ON-F (SEQ ID NO: 5)
- T3-ON-R (SEQ ID NO: 6)
- 2.5 μL of product from the 1st PCR was used as the template in the 2nd PCR and amplified by the universal primers seq-F2 (SEQ ID NO: 128) and seq-R2 (SEQ ID NO: 129).
- the library was sequenced on the Illumina Novaseq platform as paired-end 150bp.
- the polyN library screen was conducted in similar conditions.
- the plasmids expressing polyG (10G) (SEQ ID NO: 14), polyC (10C) (SEQ ID NO: 15), and polyA (10A) (SEQ ID NO: 16) insertion gRNAs were synthesized and transfected into HEK293T cells using the methods described above.
- PolyT (10T) was not included because it terminates Pol III transcription from the U6 promoter.
- the genomic DNA was extracted and used for insertion-region amplification with nested PCR. Site-specific primers were used in the 1st round of PCR, and 2.5 μL of product was used as the template in the 2nd round of PCR.
- the amplicons were purified by AMPure XP beads using 0.5x+0.6x double size selection.
- the library was sequenced on the Illumina Novaseq platform as paired-end 150bp. And the sequencing data was processed following the manual of CRISPResso2.
- the Amplicon-NGS data was processed by CRISPResso2 (Pinello et al., Nat Biotechnol 34, 2016), with parameters “--max_paired_end_reads_overlap 140, --min_paired_end_reads_overlap 10, --exclude_bp_from_left 0, --exclude_bp_from_right 0, --plot_window_size 30, and --min_frequency_alleles_around_cut_to_plot 0.1”.
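For reproducibility, the CRISPResso2 parameters listed above can be assembled into a command line programmatically. This Python sketch only builds and prints the argument list; it does not run CRISPResso2. The `CRISPResso` executable and its `--fastq_r1`, `--fastq_r2`, `--amplicon_seq`, and `--guide_seq` input options are standard CRISPResso2 usage. The six analysis flags are copied from the text, and their availability may depend on the CRISPResso2 version. File names and sequences are hypothetical placeholders.

```python
import shlex
from typing import List

# Parameter values as listed in the text.
ANALYSIS_PARAMS = {
    "--max_paired_end_reads_overlap": "140",
    "--min_paired_end_reads_overlap": "10",
    "--exclude_bp_from_left": "0",
    "--exclude_bp_from_right": "0",
    "--plot_window_size": "30",
    "--min_frequency_alleles_around_cut_to_plot": "0.1",
}


def crispresso_command(fastq_r1: str, fastq_r2: str,
                       amplicon_seq: str, guide_seq: str) -> List[str]:
    """Build a CRISPResso2 argument list for one paired-end amplicon sample."""
    cmd = ["CRISPResso",
           "--fastq_r1", fastq_r1, "--fastq_r2", fastq_r2,
           "--amplicon_seq", amplicon_seq, "--guide_seq", guide_seq]
    for flag, value in ANALYSIS_PARAMS.items():
        cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    cmd = crispresso_command("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                             "AMPLICON_SEQUENCE", "GUIDE_SEQUENCE")
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(...) to execute
```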
- sequencing reads mapped to the targeted regions were used to quantify the base compositions of the integrated sequences in 20-nt windows with the customized scripts.
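The per-position base-composition analysis above can be sketched as follows. The scripts used here are described only as "customized", so this Python function is an assumption about one reasonable implementation. It tabulates A/C/G/T fractions at each position of the recovered insertions, truncated to a 20-nt window, counting only the insertions long enough to reach that position. Example sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def positional_composition(insertions: Iterable[str],
                           window: int = 20) -> List[Dict[str, float]]:
    """Fraction of A, C, G and T at each of the first `window` positions."""
    counts = [Counter() for _ in range(window)]
    for seq in insertions:
        for i, base in enumerate(seq.upper()[:window]):
            counts[i][base] += 1
    table = []
    for position_counts in counts:
        # Denominator: insertions that reach this position (other symbols such
        # as N are counted here but not reported, so fractions may sum to < 1).
        total = sum(position_counts.values())
        table.append({b: (position_counts[b] / total if total else 0.0) for b in "ACGT"})
    return table


if __name__ == "__main__":
    table = positional_composition(["ACA", "ATC", "TCA", "AAG"], window=3)
    for pos, fractions in enumerate(table, start=1):
        print(pos, fractions)
```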
- both the wild-type (unmodified) and the nucleotide-composition-modified forms of the His (6x) tag, Flag tag, and LoxP tag were used as RT templates of guide RNAs, and their insertion efficiencies were quantified by Amplicon-NGS (Fig. 2A).
- the insertion efficiencies of the WT and modified His tags were first tested at three different genomic loci. Compared with His1-WT and His2, which contain no G nucleotides, His3 is composed of 44.4% G, and its insertion efficiency dramatically decreased to 35.4%, 5.9%, and 28.3% of that of its WT form at the PRNP, RNF2, and EMX1 loci, respectively (Fig. 2B & Fig. 2D).
- Oligos were designed to insert both the wild-type and modified forms of the His (6x) tag (SEQ ID NO: 117-119), Flag tag (SEQ ID NO: 120-122), and LoxP tag (SEQ ID NO: 123-126) at four genomic loci (HEK3, PRNP, RNF2, and EMX1). Sequences were modified to achieve different base contents, and the gRNA expression vectors (Table 1) were manufactured by General Biol. All plasmids were prepared with a QIAprep Spin Miniprep Kit (QIAGEN #27106). The HEK-293T cells were seeded in 12-well plates and grown until ~80% confluency.
- Each well was transfected with 5.3 μg of plasmids (4 μg of tag plasmids and 1.3 μg of nicking gRNA plasmids) by Lipofectamine 3000.
- the post-transfection cells were collected after 48 hours.
- the cell sorter (SONY MA900) was used to sort about 100,000 double-positive cells (mKate2-positive, representing the tag plasmids, and GFP-positive, representing the nicking gRNA plasmids).
- gDNA was extracted, and 1 μg of gDNA was used as the template to amplify the insertion region by nested PCR.
- Site-specific primers (Table 1) were used in the 1st round of PCR, and 2.5 μL of product was used as the template in the 2nd round of PCR.
- the primers used in the 2nd round of PCR are chosen from a set of P5 index primers (SEQ ID NO: 48-55) and a set of P7 index primers (SEQ ID NO: 56-67).
- One P5 index primer and one P7 index primer are used in the 2nd round of PCR. Any one of the P5 index primers can be used together with any one of the P7 index primers.
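Any P5 index primer can be combined with any P7 index primer, so the set above supports 8 × 12 = 96 distinct dual-index combinations (SEQ ID NO: 48-55 and 56-67). The Python sketch below enumerates the pairs and assigns one to each library. Pairs are identified only by SEQ ID number, and the sample names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

P5_INDEX_IDS = range(48, 56)  # SEQ ID NO: 48-55 (8 primers)
P7_INDEX_IDS = range(56, 68)  # SEQ ID NO: 56-67 (12 primers)


def index_pairs() -> List[Tuple[int, int]]:
    """All (P5, P7) SEQ ID combinations, P5-major order."""
    return list(product(P5_INDEX_IDS, P7_INDEX_IDS))


def assign_indexes(samples: List[str]) -> Dict[str, Tuple[int, int]]:
    """Give each library a unique (P5, P7) pair for pooled sequencing."""
    pairs = index_pairs()
    if len(samples) > len(pairs):
        raise ValueError("more samples than unique index pairs")
    return dict(zip(samples, pairs))


if __name__ == "__main__":
    print(len(index_pairs()))  # 96
    print(assign_indexes(["HEK3-His1", "HEK3-His3"]))
```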
- the amplicons were purified by AMPure XP beads using 0.5x+0.6x double size selection.
- the library was sequenced on the Illumina Novaseq platform as paired-end 150bp.
- a 21-nt cytosine-depleted sequence was designed based on the findings in Example 1 and Example 2 as an insertion sequence RT template at the 3’ of guide RNAs. It’s reasoned that the C-depleted template could result in high insertion efficiency, and the 21-nt would provide enough length for primer priming, which both will enhance the enrichment for the editing sites.
- the UMI-included adaptors were embedded in Tn5 to eliminate PCR duplications from the sequencing data.
- VEGFA TS1 SEQ ID NO: 130
- VEGFA TS2 SEQ ID NO: 131
- VEGFA TS3 SEQ ID NO: 127
- EMX1 SEQ ID NO: 132
- RNF2 SEQ ID NO: 133
- FANCF
- A modified GUIDE-seq analysis pipeline was used to rank and filter the called cleavage sites, and the lists generated from the different primer sets were compared. Because the GUIDE-seq pipeline uses priming information from both the top and bottom strands of the insertion, six candidate lists were generated from the three forward primers and two reverse primers (Fig. 8). The F1/R2 list was chosen for the subsequent analysis in this Example because it showed the best consistency across all six lists (Fig. 17A-F, Figs. 9-14). Other primer sets can also be used in the method disclosed herein.
- PEAC-seq identified 16 cleavage sites, including the on-target site, 14 of which were also reported by GUIDE-seq (Fig. 3C-E, Fig. 8). Notably, the top candidate sites were highly consistent between PEAC-seq and GUIDE-seq (Fig. 3D), and the numbers of NGS reads, which quantitatively represent editing frequency, were highly correlated (Fig. 3F). Although the PEAC-seq-unique off-target sites (PEAC15 and PEAC16) were represented by fewer reads, their signal tracks were very similar to those of the on-target and shared off-target sites (Fig. 3E).
- The number of mismatches also contributed to off-target editing.
- Off-target sites reported by both PEAC-seq and GUIDE-seq contained fewer mismatches than off-target sites unique to either method (Fig. 3G).
- The inventors also examined whether the position of mismatches along the gRNA sequence might affect off-target identification, especially within the PBS (primer binding site), which is crucial for reverse transcription. To do so, the inventors plotted the mutation frequency at each position along the gRNA and PAM sequences for the sites grouped as “Overlap”, “PEAC-seq unique”, and “GUIDE-seq unique” (Fig. 3H).
- The Cas9 nickase was replaced by wild-type Cas9, and GTATGAGGTTGGTGGATTGGT (SEQ ID NO: 113) was used as the RT template of the gRNA. Both were assembled into a single vector as the PEAC-seq backbone.
- The spacer sequences targeting VEGFA TS1, VEGFA TS2, VEGFA TS3, EMX1, RNF2 and FANCF were individually cloned into the PEAC-seq backbone, yielding the plasmids of SEQ ID NO: 68-73, respectively.
- HEK-293T cells were seeded in a 12-well plate and grown to ~80% confluency.
- Each well was transfected with 3 μg of plasmid using Lipofectamine 3000.
- Cells were collected 48 hours after transfection.
- A cell sorter (SONY MA900) was used to sort about 100,000 GFP-positive cells.
- About 500 ng of extracted gDNA was digested with NotI and then cleaned up with 0.5x AMPure XP beads to remove carryover plasmids.
- The gDNA fragments were retained on the AMPure XP beads, and on-bead Tn5 tagmentation was performed at 55°C for one hour, inserting adaptors at the ends of the fragments.
- Tn5 was expressed and loaded with the adaptors in-house.
- Tn5 primeA: SEQ ID NO: 74
- PegTn5 primeB (8N): SEQ ID NO: 75, used for Tn5 assembly
- 6 μL of 0.2% SDS was added to terminate the reaction.
- The products were purified and size-selected with 1.5x AMPure XP beads and eluted in 50 μL H₂O.
- The 21-bp insertion sequence was used to enrich the editing sites (both on-target and off-target) during NGS library preparation.
- Two separate reactions were performed, each using 20 μL of template in a total volume of 50 μL for ~30 cycles.
- The PEAC-seq insertion sequence served as the forward primer binding site, and the downstream Tn5 adaptor as the reverse primer binding site.
- 2.5 μL of the 1st-round product was used as the template for the 2nd round of amplification in a total volume of 50 μL for 17 cycles, and Illumina adaptors were added.
- The amplicons were purified with AMPure XP beads using 0.6x + 0.25x double size selection.
- The library was sequenced on the Illumina NovaSeq platform with paired-end 150-bp reads.
- The off-target universal primer (SEQ ID NO: 76) binds the downstream Tn5 adaptor.
- The performance of PEAC-seq was evaluated at six sites (VEGFA TS1, VEGFA TS2, VEGFA TS3, EMX1, RNF2, and FANCF).
- Primer pairs: PEAC 15F (F1), PEAC 10F (F2), PEAC 5F (F3), PEAC 18R (R1) and PEAC 18R (R2), each used together with the off-target universal primer.
- PEAC 15F (F1), PEAC 18R (R2) and PEAC 18R (R1) (SEQ ID NO: 77-79) are fully contained in the 21-bp PEAC insertion sequence and were shared across the different sites.
- 15F means a forward primer matching the first 15 bp of the PEAC 21bp insertion sequence.
- 18R means a reverse primer matching the last 18 bp of the PEAC 21bp insertion sequence.
- 19R means a reverse primer matching the last 19 bp of the PEAC 21bp insertion sequence.
- PEAC 5F (F3) and PEAC 10F (F2) are only partially contained in the 21-bp PEAC insertion sequence (5F: 5 bp contained; 10F: 10 bp contained), while the remaining part of each primer matches the specific edit site. Each of the six sites (VEGFA TS1, VEGFA TS2, VEGFA TS3, EMX1, RNF2, and FANCF) therefore had its own PEAC 5F (F3) and PEAC 10F (F2) primers, listed as SEQ ID NO: 80-91.
- The PEAC-seq data were analyzed using a pipeline modified from GUIDE-seq. First, adapters were trimmed using cutadapt (Martin M., EMBnet 17, 2011), and reads without the appropriate adapter were removed. The reads were then mapped to the human or mouse genome (hg38, mm10) using bwa (Li H., Bioinformatics 28, 2012). Reads that mapped to the same location and shared the same UMI were considered PCR duplicates and merged for the subsequent analysis. To fit the target identification pipeline from GUIDE-seq, the read names in the BAM files were modified, and the BAM files from the forward and reverse PCRs were labeled and merged.
- Read counts from the GUIDE-seq output file were normalized to reads per million, and the number of reads with correct primer extension was calculated.
- The forward primer (“F-primer” hereafter; F1, F2, or F3 in Fig. 3B) amplified regions downstream, but not upstream, of the double-strand break (DSB) (Fig. 3A & Fig. 3D).
- A DSB can generate three types of ends at each cleavage site (Receiver Site or Donor Site): the original upstream end, the original downstream end, and an upstream end appended with a complete or partial tag insertion. If multiple DSBs occur simultaneously in the nucleus and are physically close to each other, DSB ends from different break points may join together and cause genome rearrangements.
- For example, the upstream end of a Donor Site can bring a reverse-oriented PEAC-seq insertion to the upstream end of a Receiver Site (Fig. 4B, model (v)); this joining generates an amplicon from the upstream end of the Receiver Site, a region that is usually not amplified by the F-primer.
- Two nested PCR primers were designed upstream of the guide RNA target.
- The site-specific nested PCR primers served as forward primers, and the downstream Tn5 primer served as the reverse primer.
- The nested primers were used sequentially to amplify the sequences adjacent to translocated DSBs.
- About 300 ng of PEAC-seq gDNA was fragmented by Tn5, purified with 1.5x AMPure XP beads, and eluted in 23 μL H₂O.
- About 20 μL of the purified DNA was used as the template for the 1st round of PCR (20 cycles).
- 2.5 μL of the 1st-round product was used as the template for another 20 cycles in the 2nd round of the nested PCR.
- A further 20-cycle PCR was conducted to add the sequencing adaptors.
- The amplicons were purified by 0.6x and then 0.25x double size selection with beads.
- The library was sequenced on the Illumina NovaSeq platform with paired-end 150-bp reads.
- PEAC-seq uses the template information on the gRNA to insert tag sequences rather than relying on exogenous tags. This straightforward procedure allowed us to investigate its application in vivo.
- Mouse embryos were edited at the pronuclear stage by injecting in vitro transcribed Cas9-MMLV mRNA and guide RNAs targeting Pcsk9 and Pnpla3. Embryos were collected at around E14.5 to E21, and off-target lists were generated by PEAC-seq (Fig. 5, Fig. 15).
- Both the guide RNAs and the Cas9-MMLV mRNA were prepared by in vitro transcription.
- The DNA templates of the guide RNAs were amplified from the plasmids “pcsk9-gRNA” (SEQ ID NO: 96) and “mPnpla-gRNA” (SEQ ID NO: 97) with T7 forward primers (SEQ ID NO: 99 for Pcsk9 and SEQ ID NO: 101 for Pnpla3) and T7 reverse primers (SEQ ID NO: 100 for Pcsk9 and SEQ ID NO: 102 for Pnpla3).
- The PCR products were gel-purified using the MinElute Gel Extraction Kit (QIAGEN #28606) and used as templates for in vitro transcription with the HiScribe T7 Quick High Yield RNA Synthesis Kit (NEB #E2050S).
- The pCMV-Cas9-MMLV plasmid (SEQ ID NO: 98) was linearized with MssI (Thermo #FD1344). Following the manufacturer’s instructions, 1 μg of the linearized product was used as the template to generate Cas9-MMLV mRNA by in vitro transcription with the HiScribe T7 ARCA mRNA Kit (NEB #E2060S).
- C57BL/6 and ICR mice were purchased and housed in the Laboratory Animal Resource Center (LARC) at Westlake University.
- The LARC is a certified pathogen-free, environmentally controlled facility (21 ± 2°C, 55 ± 15% humidity, and a 12:12-h light:dark cycle).
- C57BL/6 mice were used for embryo collection, and ICR females were used as recipients. All animal experiments were conducted under a protocol approved by the animal care and ethics committee of Westlake University.
- Embryos were then flushed several times to rinse off the hyaluronidase and cumulus cells. Afterward, embryos were transferred into a dish with prewarmed KSOM medium (Millipore #MR-106-D) covered by mineral oil followed by three additional washes.
- A mixture of Cas9-PE2 mRNA (100 ng/μL) and guide RNA (50 ng/μL) was injected into the cytoplasm of the zygotes in M2 medium.
- Injection was performed using a microinjector (NARISHIGE #IM-400B) with constant flow settings.
- The injected embryos were cultured in KSOM medium with amino acids in a cell culture incubator at 37°C with 5% CO₂ and then transplanted into the oviducts of pseudopregnant ICR females at 0.5 dpc. Pups were sacrificed at E14.5-E21, and organs were collected, dissected and snap-frozen in liquid nitrogen. Samples were stored at -80°C until further analysis.
- gDNA was extracted from the organs using the TIANamp Genomic DNA Kit (TIANGEN #DP304-03) according to the manufacturer’s instructions. PCR was used to amplify the targeted regions and attach Illumina adaptors to the amplicons, using primers SEQ ID NO: 103-112. The in vivo PEAC-seq library was constructed by Tn5 fragmentation as described in Example 3.
- GEO accessions GSE179523, GSE179436, GSE179374
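The primer-set comparison described in the list above (three forward primers, F1 to F3, and two reverse primers, R1 and R2, giving six candidate lists) can be made concrete with a small consistency score. The sketch below is an illustration under stated assumptions, not the pipeline used here: each list is modelled as a set of called site identifiers, and consistency is taken to be the mean pairwise Jaccard index against the other lists. The names `PAIRINGS`, `jaccard` and `rank_by_consistency` and the toy site IDs are hypothetical.

```python
from itertools import product

# Hypothetical labels for the 3 forward x 2 reverse primer pairings.
FORWARD = ["F1", "F2", "F3"]
REVERSE = ["R1", "R2"]
PAIRINGS = [f"{f}/{r}" for f, r in product(FORWARD, REVERSE)]


def jaccard(a, b):
    """Jaccard index of two sets of site IDs (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_by_consistency(lists):
    """Rank candidate lists by their mean Jaccard index against every other list.

    `lists` maps a pairing label (e.g. "F1/R2") to a set of called site IDs.
    Ties keep the input order because Python's sort is stable.
    """
    if len(lists) < 2:
        raise ValueError("need at least two lists to compare")
    scores = {}
    for name, sites in lists.items():
        others = [jaccard(sites, s) for other, s in lists.items() if other != name]
        scores[name] = sum(others) / len(others)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example with hypothetical site IDs.
example = {"A": {"s1", "s2", "s3"}, "B": {"s1", "s2", "s3"}, "C": {"s1"}}
ranking = rank_by_consistency(example)
```

A list that shares most of its sites with every other list scores highest. This is one reasonable reading of "best consistency"; other overlap metrics could serve equally well.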
Abstract
The present disclosure provides compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocations at the cleavage sites in a genome. The present disclosure also discloses compositions and methods for enhancing the insertion efficiency for gene editing.
Description
The present disclosure relates to compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocations in a genome. The present disclosure also discloses compositions and methods for enhancing insertion efficiencies for gene editing.
SEQUENCE LISTING
Please insert the sequence listing, filed herewith in electronic format, into the application before the claims.
CRISPR-based genome editing has shown great potential in both biomedical research and clinical applications. Compared to small-molecule drugs and antibody drugs, CRISPR-based therapies have unique advantages owing to the flexibility of guide RNA (gRNA) design and direct targeting of the causal nucleic acid sequences. However, non-specific targeting by gRNAs, which causes undesired mutagenesis, may bring unexpected nuclease toxicity to cells. Especially for clinical applications, a comprehensive survey of off-target sites, including local sequence changes and DNA translocations involving distal sites, is urgently needed given the great translational potential of CRISPR therapies for genetic disorders and other human diseases.
To date, researchers have developed versatile tools to investigate off-target sites in the genome. In vitro techniques capture nuclease-induced cleavage events directly from purified genomic DNA or chromatin. Kim et al. developed Digenome-seq, which uses nuclease digestion to produce many DNA fragments with identical 5’ ends and identifies off-targets by deep sequencing, typically requiring 400 to 500 million reads per sample (Kim et al., 2015). The same group developed DIG-seq, which follows the same principle but retains chromatin during nuclease digestion (Kim et al., 2018). CIRCLE-seq incorporates an additional enrichment step by circularizing, linearizing, and then selectively amplifying fragmented DNA (Tsai et al., 2017). SITE-seq employs biotinylated oligos to tag and enrich the nuclease-digested sites (Cameron et al., 2017). However, in vitro techniques typically report more off-target sites, many of which cannot be identified in an actual cellular context. In cellulo techniques, such as GUIDE-seq, label and enrich double-strand breaks (DSBs) in the genome of living cells using exogenous double-stranded oligodeoxynucleotides (dsODN) in an end-joining process (Tsai et al., 2015), but the need for exogenous dsODN limits their application in vivo. BLISS is another in cellulo technique, which uses in situ DSB ligation in fixed cells or tissues and has characterized the off-target sites of both SpCas9 and As/LbCpf1 (Yan et al., 2017). Given the clinical potential of CRISPR, in vivo off-target identification is in high demand for evaluating the genotoxicity of CRISPR-based therapies. One strategy is to use in vitro or computational approaches to prioritize a list of genomic regions and validate them one by one by Amplicon-NGS (Newby et al., 2021; Musunuru et al., 2021), which is labor-intensive when the list is long.
DISCOVER-seq, on the other hand, uses immunoprecipitation of MRE11, a protein involved in the DNA repair pathway, to represent and enrich the edited sites (Wienert et al., 2019). However, the dynamic nuclease activity may not be fully captured by the “snapshot” signals from MRE11 immunoprecipitation.
Furthermore, genome editing may also induce rearrangements of DNA fragments (Zuccaro et al., 2020; Liang et al., 2020; Alanis-Lobato et al., 2021), which are more toxic to genome stability and cell function. Methodologies for the direct detection of DNA translocations have lagged behind.
SUMMARY
The present disclosure provides a novel method for detecting the editing sites of a Cas nuclease and its variants, and DNA translocations at these editing sites, wherein an appropriate sequence (e.g., a label) is inserted into the editing sites and the inserted labels are further enriched for sequencing, e.g., high-throughput sequencing (HTS). The present method employs a guide RNA and a complex comprising a Cas nuclease, a reverse transcriptase, and the guide RNA. The methods and compositions disclosed herein diversify the toolkit for evaluating the genotoxicity of CRISPR applications in research and therapy.
In certain embodiments, the detection method disclosed herein is referred to as Prime Editor Assisted off-target Characterization (PEAC-seq) or the “PEAC-seq technique.” PEAC-seq provides a method to detect Cas9 cleavage sites with high accuracy and sensitivity. PEAC-seq can be used in vitro and in vivo, as illustrated in the Examples of this disclosure. PEAC-seq can also be used to detect DNA translocations at Cas9 cleavage sites. For example, PEAC-seq is designed to insert an insertion sequence (e.g., a label) into a Cas9 cleavage site (including both on-target and off-target sites) in the genome. These insertion sequences function as labels, marking the Cas9 cleavage sites. The incorporation of the insertion sequences into the genomic DNA is also referred to herein as “labeling.” The insertion sequence (e.g., a label) can be optimized in composition and length to increase insertion efficiency. For instance, the insertion sequence can incorporate a tag sequence to represent and enrich the edited sites in the genome.
In certain embodiments, the reverse transcriptase and the Cas9 nuclease are fused together, e.g., as a fusion protein. The genomic DNA is labeled with an insertion sequence at a cleavage site immediately after the Cas9 nuclease cleaves the genomic DNA at that site. By sequencing the portions of the genomic DNA that contain the insertion sequence or a fragment of it, the Cas9 cleavage sites can be identified in the genome. Because cutting and insertion are coupled in one process, the insertion events are consistent with the cutting events.
In certain embodiments, DNA translocations at Cas9 cleavage sites can be identified by a detection method disclosed herein. DNA translocations are typically more toxic but cannot be directly detected by current methods.
In certain embodiments, the present disclosure provides a comprehensive and streamlined method to identify CRISPR off-target sites both in vitro and in vivo, as well as DNA translocation events. The method employs a guide RNA comprising an insertion sequence reverse transcriptase (RT) template and does not rely on an additional exogenous label sequence. The method is compatible with various delivery methods.
In an aspect, the present disclosure provides a guide RNA comprising a spacer, a scaffold, and an insertion sequence reverse transcriptase (RT) template. In some embodiments, the insertion sequence RT template comprises a nucleotide sequence wherein cytosine is depleted, for example, cytosine represents less than 25% (e.g., less than 20%, less than 15%, less than 10%, less than 5%) of the nucleotides in the nucleotide sequence.
In some embodiments, the insertion sequence template comprises a nucleotide sequence that is SEQ ID NO: 113.
In some embodiments, the insertion sequence template comprises a nucleotide sequence that can be of any length, e.g., from about 10 to 30 nucleotides. The insertion sequence can be of any length, including but not limited to 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length.
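The composition and length criteria above (cytosine below 25% of the template, and a template of about 10 to 30 nucleotides) can be checked programmatically. This is a minimal sketch, assuming the template is written in DNA letters as in the sequence listing; the helper names are hypothetical, and SEQ ID NO: 113 is used only because the Examples name it as the RT template.

```python
# SEQ ID NO: 113, the 21-nt RT template named in the Examples (DNA notation).
SEQ_113 = "GTATGAGGTTGGTGGATTGGT"


def base_fractions(seq):
    """Return the fraction of A, C, G and T in a DNA-notation sequence."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("expected a non-empty A/C/G/T sequence")
    return {base: seq.count(base) / len(seq) for base in "ACGT"}


def is_candidate_rt_template(seq, max_c_fraction=0.25, min_len=10, max_len=30):
    """True if seq is C-depleted (C fraction below the threshold) and within the length range.

    The 0.25 threshold and 10-30 nt range mirror the embodiments above; both are parameters.
    """
    return min_len <= len(seq) <= max_len and base_fractions(seq)["C"] < max_c_fraction


print(base_fractions(SEQ_113))
print(is_candidate_rt_template(SEQ_113))
```

SEQ ID NO: 113 contains no C at all, so it passes the strictest of the listed thresholds as well.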
In some embodiments, the guide RNA further comprises a primer binding site (PBS) that is capable of binding the non-complementary strand of a target gene.
In some embodiments, the insertion sequence RT template encodes one or more tags suitable for hybrid capture.
In another aspect, the present disclosure provides a complex comprising a Cas nuclease, a reverse transcriptase, and a guide RNA. In certain embodiments, the guide RNA comprises a spacer, a scaffold, and an insertion sequence template, wherein the insertion sequence template is a reverse transcriptase template comprising a nucleotide sequence wherein cytosine is depleted, e.g., cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
In some embodiments, the Cas nuclease disclosed herein is selected from Cas9, its variants and mutated forms.
In some embodiments, the reverse transcriptase disclosed herein is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutant forms.
In some embodiments, the Cas nuclease and the reverse transcriptase are formed as a fusion protein.
In some embodiments, the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
In another aspect, the present disclosure provides a vector comprising a guide RNA, a nucleotide sequence encoding a Cas nuclease, and a nucleotide sequence encoding a reverse transcriptase.
In some embodiments, the guide RNA comprises an insertion sequence reverse transcriptase (RT) template, wherein the insertion sequence RT template comprises a nucleotide sequence wherein cytosine is depleted, e.g., cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
In some embodiments, the Cas nuclease is selected from Cas9, its variants and mutated forms.
In some embodiments, the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutant forms.
In another aspect, the present disclosure provides a kit for detecting Cas nuclease cleavage sites and DNA translocation in genomic DNA, comprising one or more polynucleotide sequences encoding a Cas nuclease, a reverse transcriptase, and a guide RNA comprising an insertion sequence reverse transcriptase (RT) template, e.g., wherein the insertion sequence RT template comprises a nucleotide sequence wherein cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
In some embodiments, the Cas nuclease is selected from Cas9, its variants and mutated forms.
In some embodiments, the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutant forms.
In some embodiments, the Cas9 nuclease and the reverse transcriptase are formed as a fusion protein.
In some embodiments, the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
In an aspect, the present disclosure provides a method for detecting Cas9 nuclease cleavage sites in genomic DNA of a cell, comprising (a) providing a guide RNA, wherein the guide RNA can bind to a target gene in the genomic DNA and comprises an insertion sequence RT template, (b) providing a complex comprising a Cas nuclease, a reverse transcriptase, and the guide RNA, and (c) contacting the genomic DNA with the complex under conditions that yield labeled genomic DNA, wherein the genomic DNA is cleaved at one or more cleavage sites, and one or more insertion sequences, reverse transcribed from all or part of the insertion sequence RT template, are inserted into the one or more cleavage sites.
In some embodiments, the insertion sequence RT template is located at the 3’ end of the guide RNA.
In some embodiments, the Cas nuclease is selected from Cas9, its variants and mutated forms.
In some embodiments, the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutant forms.
In some embodiments, the insertion sequence RT template comprises a nucleotide sequence wherein cytosine is depleted, e.g., cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
In some embodiments, the insertion sequence RT template is of SEQ ID NO: 113.
In some embodiments, the insertion sequence RT template encodes one or more tags suitable for hybrid capture.
In some embodiments, the insertion sequence template is about 10 to 30 nucleotides.
In some embodiments, the guide RNA comprises a structure of spacer-scaffold-insertion sequence template-primer binding site (PBS) , from 5’ to 3’ end, in which the spacer is able to bind the target gene on the genome, the scaffold is able to bind the Cas nuclease, the primer binding site is able to bind the non-complementary strand of the target gene, and the insertion sequence template is the reverse transcription template for the reverse transcriptase.
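The 5'-to-3' layout above (spacer, scaffold, insertion sequence template, PBS) can be sketched as simple string assembly. The PBS rule used here is an assumption borrowed from common prime-editing design rather than a detail stated in this disclosure: a 20-nt SpCas9 spacer is cut between positions 17 and 18, and the PBS is the reverse complement of the 13 nt immediately 5' of that cut on the protospacer strand. The spacer is hypothetical and the scaffold is a placeholder (`NNNN`), not a real scaffold sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def assemble_peac_grna(spacer, scaffold, rt_template, pbs_len=13, cut_offset=17):
    """Return (guide, pbs) for a spacer-scaffold-RT template-PBS layout, 5'->3'.

    Assumptions (not from the source): a 20-nt SpCas9 spacer cut between
    positions 17 and 18, and a PBS complementary to the pbs_len nucleotides
    immediately 5' of the cut on the protospacer strand. DNA letters stand in
    for RNA, as in the sequence listing.
    """
    if len(spacer) != 20:
        raise ValueError("expected a 20-nt spacer")
    if not 0 < pbs_len <= cut_offset:
        raise ValueError("PBS must fit 5' of the cut site")
    pbs = revcomp(spacer[cut_offset - pbs_len:cut_offset])
    guide = spacer.upper() + scaffold + rt_template.upper() + pbs
    return guide, pbs


spacer = "ACGTACGTACGTACGTACGT"  # hypothetical spacer
guide, pbs = assemble_peac_grna(spacer, "NNNN", "GTATGAGGTTGGTGGATTGGT")  # scaffold is a placeholder
print(pbs)
```

Keeping the PBS length and cut offset as parameters makes it easy to try other PBS lengths, which prime-editing designs commonly vary.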
In some embodiments, the Cas and the reverse transcriptase form a Cas-RT fusion protein.
In some embodiments, the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
In some embodiments, the Cas cleavage sites could be either on-target or off-target.
In another aspect, the present disclosure provides a method for detecting Cas9 cleavage sites and detecting DNA translocation at the cleavage sites in genomic DNA of a cell, comprising (a) obtaining labeled genomic DNA with a method described herein, (b) targeting and amplifying one or more labeled portions of the genomic DNA, (c) sequencing the amplified portion of the genomic DNA, and (d) analyzing the sequencing result to identify Cas cleavage sites and/or DNA translocation at the Cas cleavage sites.
In some embodiments, the one or more amplified portions of the genomic DNA in step (b) each comprise a portion of genomic DNA that is immediately upstream or downstream to the one or more insertion sequences.
In some embodiments, the method can be used to identify Cas nuclease off-target sites by comparing the Cas cleavage sites identified by the method disclosed herein with the target sequence; a cleavage site whose sequence is not identical to the target sequence is an off-target site. It will be understood that, based on the information provided by the method disclosed herein, those of ordinary skill in the art are able to locate the cleavage sites in the genome with readily available tools such as the Burrows-Wheeler Aligner (BWA).
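The comparison of a called cleavage site with the target sequence can be illustrated with an ungapped mismatch count over spacer plus PAM. This sketch assumes equal-length sequences already extracted from the genome (e.g., after BWA alignment) and treats `N` in the PAM as a wildcard; it ignores DNA/RNA bulges, which a real off-target caller may need to handle. Function names are hypothetical.

```python
def count_mismatches(target, site):
    """Ungapped mismatch count between target (spacer + PAM) and a genomic site.

    'N' in the target (e.g. the N of an NGG PAM) matches any base.
    """
    if len(target) != len(site):
        raise ValueError("target and site must be the same length")
    return sum(1 for t, s in zip(target.upper(), site.upper()) if t != "N" and t != s)


def classify_site(target, site):
    """Label a called cleavage site as on-target (identical) or off-target."""
    return "on-target" if count_mismatches(target, site) == 0 else "off-target"


print(count_mismatches("AAAANGG", "AAATTGG"))
print(classify_site("AAAANGG", "AAAACGG"))
```

The toy 7-nt sequences stand in for a 20-nt spacer plus a 3-nt PAM.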
In some embodiments, the labeled genomic DNA is processed by Tn5 tagmentation before amplification.
In some embodiments, the labeled genomic DNA is processed by Tn5 tagmentation before amplification, wherein sequencing adapters that include unique molecular identifiers (UMI) are embedded in the Tn5 transposases.
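Collapsing PCR duplicates with UMIs, as done in the Examples, amounts to grouping reads that share a mapping position and UMI. A minimal sketch, assuming each aligned read is reduced to a hypothetical `Read` record (chromosome, position, strand, UMI) and that only exact UMI matches are merged; production tools often also merge UMIs that differ by sequencing errors.

```python
from collections import namedtuple

# Hypothetical minimal record for an aligned read; a real pipeline would parse BAM.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi"])


def collapse_umi_duplicates(reads):
    """Keep the first read per (chrom, pos, strand, UMI) and count family sizes."""
    kept, family_size = {}, {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in kept:
            kept[key] = read
            family_size[key] = 0
        family_size[key] += 1
    return list(kept.values()), family_size


reads = [
    Read("r1", "chr6", 100, "+", "ACGTACGT"),
    Read("r2", "chr6", 100, "+", "ACGTACGT"),  # PCR duplicate of r1
    Read("r3", "chr6", 100, "+", "TTTTGGGG"),  # same position, different molecule
]
unique, sizes = collapse_umi_duplicates(reads)
```

Reads r1 and r2 collapse into one molecule, while r3 is kept because its UMI differs even though it maps to the same position.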
In some embodiments, the genomic DNA comprising the insertion sequence or a portion of the insertion sequence is targeted and enriched by a method selected from PCR and hybrid capture-based target enrichment methods.
In some embodiments, the enrichment of labeled genomic DNA is performed by two rounds of PCR, wherein in the first reaction the insertion sequence is used as the forward primer binding site and in the second reaction the insertion sequence is used as the reverse primer binding site.
In some embodiments where PCR is performed, the 3’ end of each primer that binds to the insertion sequence is at least 2 bp away from the insertion boundary, so that the extended sequence information can be used to filter out randomly primed reads.
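The filtering idea above can be sketched as follows: because the primer's 3' end stops at least 2 bp short of the insertion boundary, a genuine product must carry the remaining insertion bases right after the primer, whereas a randomly primed read generally will not. This assumes the read begins with the primer and is in the same orientation as the insertion. SEQ ID NO: 113 stands in for the inserted sequence purely for illustration (the inserted genomic sequence may be its reverse complement), and the 15-nt primer only mirrors the 15F naming convention. All names are hypothetical.

```python
def has_expected_extension(read, primer, insertion, min_gap=2):
    """True if read starts with the primer followed by the rest of the insertion.

    The primer must lie inside the insertion with its 3' end at least
    `min_gap` bases before the insertion/genome boundary; those extra
    insertion bases distinguish true products from random priming.
    """
    primer, insertion = primer.upper(), insertion.upper()
    start = insertion.find(primer)
    if start < 0:
        raise ValueError("primer is not contained in the insertion")
    remainder = insertion[start + len(primer):]
    if len(remainder) < min_gap:
        raise ValueError("primer 3' end is too close to the insertion boundary")
    return read.upper().startswith(primer + remainder)


INSERT = "GTATGAGGTTGGTGGATTGGT"  # SEQ ID NO: 113 used as a stand-in
PRIMER_15F = INSERT[:15]  # hypothetical primer matching the first 15 bp

good_read = PRIMER_15F + "ATTGGT" + "CCCGGGAAA"  # primer + remaining insertion + genome
random_read = PRIMER_15F + "CCCGGGAAATTT"  # primer followed directly by unrelated sequence
```

With a 15-nt primer, six insertion bases remain between the primer's 3' end and the genome, comfortably above the 2-bp minimum.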
The methods disclosed herein could be used in vitro, in cellulo, or in vivo.
In an aspect, the present disclosure provides a method for determining the relative specificity of a plurality of guide RNAs comprising (a) identifying the off-target sites for Cas cleavage using each of the guide RNAs with a method disclosed herein, and (b) determining the relative specificity of the guide RNAs based on the total number of off-target sites identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites is more specific than a guide RNA having more off-target sites.
In another aspect, the present disclosure provides a method for determining the relative specificity of a plurality of Cas nuclease variants comprising (a) identifying the off-target cleavage site for each of the Cas nuclease variants with a method disclosed herein, and (b) determining the relative specificity of the Cas nuclease variants based on the total number of off-target sites identified for each of the Cas nuclease variants, wherein a Cas nuclease variant having fewer off-target sites is more specific than a Cas nuclease variant having more off-target sites.
In another aspect, the present disclosure provides a method for determining the relative genotoxicity of a plurality of guide RNAs, comprising (a) identifying the off-target cleavage sites and DNA translocations for each of the guide RNAs with a method disclosed herein, and (b) determining the relative genotoxicity of the guide RNAs based on the total number of off-target sites and DNA translocations identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites and fewer DNA translocations is less genotoxic than a guide RNA having more off-target sites and more DNA translocations.
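The ranking rules in the three aspects above can be written down directly. This sketch uses hypothetical counts and names: `rank_by_specificity` orders guide RNAs (or Cas variants) from fewest to most off-target sites, and `less_genotoxic` applies the stated criterion literally, requiring both fewer off-target sites and fewer translocations, so some pairs of guides remain incomparable under it.

```python
def rank_by_specificity(off_target_counts):
    """Order candidates (gRNAs or Cas variants) from fewest to most off-target sites."""
    return sorted(off_target_counts, key=off_target_counts.get)


def less_genotoxic(a, b):
    """True if a = (off_targets, translocations) has fewer of both than b."""
    return a[0] < b[0] and a[1] < b[1]


# Hypothetical counts, for illustration only.
counts = {"gRNA-1": 12, "gRNA-2": 3, "gRNA-3": 7}
order = rank_by_specificity(counts)
```

A guide with fewer off-target sites but more translocations than another is neither more nor less genotoxic under this rule; combining the two counts into one score would be an additional assumption.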
Fig. 1 High-throughput screen of optimized insertion template sequences using a Cas9-MMLV system
Fig. 1A is a schematic representation of the experimental procedure. Fig. 1A illustrates a construct comprising a guide RNA (denoted as “gRNA” ) , a nucleotide sequence encoding Cas9, a nucleotide sequence encoding an M-MLV reverse transcriptase, and a nucleotide sequence encoding an enhanced green fluorescent protein (denoted as “EGFP” ) . The guide RNA comprises a spacer, a scaffold, an insertion sequence having randomized nucleotides (denoted as “Random” ) , and PBS.
Fig. 1B shows the nucleotide fraction at each position along the insertion sequences (5’ to 3’). The G nucleotide is disfavored at all positions, especially at the first and last positions of the insertion sequences. The screen was conducted with insertion sequences comprising 20 random nucleotides. Only full-length insertions were counted.
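The per-position nucleotide fractions in Fig. 1B can be computed from the recovered insertion sequences as sketched below. Following the caption, only full-length insertions are counted. The function name and toy inputs are hypothetical, and the sketch assumes the insertions have already been extracted from the reads in 5'-to-3' orientation.

```python
def position_fractions(insertions, length):
    """Per-position A/C/G/T fractions over full-length insertions only."""
    full = [s.upper() for s in insertions if len(s) == length]
    if not full:
        raise ValueError("no full-length insertions")
    return [
        {base: sum(s[i] == base for s in full) / len(full) for base in "ACGT"}
        for i in range(length)
    ]


# Toy 3-nt example; the truncated "AC" insertion is excluded.
profile = position_fractions(["ACG", "ATG", "AC"], 3)
```

For the 20-nt randomized screen, `length` would be 20.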
Fig. 1C shows the percentage of reads representing successful insertion of the consecutive nucleotides (polyC, polyT, and polyG) . The template is ten consecutive nucleotides. The y-axis is the percentage of reads representing the different lengths of insertions.
Fig. 2 Validation of the insertion nucleotide preference using a Cas9n-MMLV system
Fig. 2A is a schematic representation of the experimental procedure. Fig. 2A illustrates a construct comprising a guide RNA (denoted as “gRNA” ) that comprises a spacer, a scaffold, an insertion sequence encoding a wild type or one of the modified versions of His tag, LoxP tag or Flag tag (denoted as “Modified Tags” ) , and PBS.
Fig. 2B shows the insertion efficiencies of the original and the modified His (6X) tags at three target sites (PRNP, RNF2, and EMX1) . Compared to the His1 (WT) , the His2 comprises more C nucleotides and the His3 comprises more G nucleotides.
Fig. 2C shows the insertion efficiency comparisons of the original and modified versions of His (6x) tag, LoxP tag, and Flag tag at the HEK3 locus. LoxP2 maintains the same G% as LoxP1, while LoxP3 comprises more G nucleotides and shows decreased insertion efficiency. The Flag4 tag is depleted of G nucleotides and shows dramatically increased efficiency.
Fig. 2D is a table summarizing the sequences and nucleotide compositions of each tested tag.
Fig. 3 illustrates an embodiment in this disclosure: PEAC-seq
Fig. 3A is a schematic representation of the PEAC-seq experimental procedure. Fig. 3A illustrates a construct comprising a guide RNA (denoted as “gRNA” ) that comprises a spacer, a scaffold, an insertion sequence (denoted as “PEAC-seq insertion” ) , and PBS. The gDNA was extracted and treated with Tn5 tagmentation. The Tn5 was embedded with UMI-adaptors, which could eliminate PCR duplications. After tagmentation, DNA fragments were amplified by pairs of primers (e.g., one pair priming at the PEAC-seq insertion, another pair priming with the Tn5 adaptor) .
Fig. 3B is a schematic representation of the primers designed for enrichment and library preparation of PEAC-seq.
Fig. 3C is a Venn diagram showing the overlap of off-targets identified by the PEAC-seq and GUIDE-seq techniques targeting the VEGFA TS1 site.
Fig. 3D is a visualization of the on-target and off-target sites of VEGFA TS1 identified by PEAC-seq. ‘*’ marks a PEAC-seq site that was also called by GUIDE-seq; ‘**’ marks a PEAC-seq site validated by Amplicon-NGS but not called by GUIDE-seq.
Fig. 3E shows screenshots of PEAC-seq signal tracks from the IGV Genome Browser. One on-target site, one off-target site called by both PEAC-seq and GUIDE-seq, and one off-target site called by PEAC-seq only are presented. For each site, the top two tracks represent signals from the PEAC-seq experiments: the first track is from the forward primer (the genomic region downstream of the spacer), and the second track is from the reverse primer (the genomic region upstream of the spacer). The bottom two tracks represent signals from the wild-type samples (no Cas9-MMLV treatment). The DNA models under each signal track show the direction of the spacer and PAM in each case.
Fig. 3F shows that the read counts at the sites called by the PEAC-seq and GUIDE-seq techniques are highly correlated.
Fig. 3G shows that the off-target sites called by both PEAC-seq and GUIDE-seq (grey bars) tend to have fewer mismatches, while the PEAC-seq unique sites (slash bars) and the GUIDE-seq unique sites (horizontal bars) tend to have more mismatches.
Fig. 3H shows mutation frequencies plotted at each position along the gRNA and PAM sequences (from 5’ to 3’). From top to bottom are profiles of the PEAC-seq and GUIDE-seq techniques targeting VEGFA TS1, TS2, and TS3, respectively.
Fig. 4 Identification of DNA translocations in the PEAC-seq embodiment
Fig. 4A shows signal tracks of one PEAC-seq site with unexpected upstream signals from F-primer amplicons. Dashed bar in the middle: cutting site; Arrows: unexpected upstream signals.
Fig. 4B shows proposed models for the generation of unexpected upstream signals. Both the Receiver site and the Donor site could generate DSBs and be proximal to each other within the nucleus. Model (i) and Model (ii) join DSB ends from the same Receiver site. Model (iii), Model (iv), and Model (v) join one Donor DSB and one Receiver DSB. If the Donor DSB carries the PEAC-seq insertion, an unexpected upstream signal would be observed at the Receiver site. In the models, the gRNA is placed on the top strand.
Fig. 4C shows the design of the validation PCR used to identify the genomic sequence of the Donor sites. Two specific primers (Nest-F1 and Nest-F2) were designed upstream of the gRNA of the Receiver site. Nest-F1 and Nest-F2 were used sequentially with the downstream Tn5 primer, generating two rounds of amplicons. The second-round amplicons were sent for Amplicon-NGS.
Fig. 4D shows the translocation cases identified by PEAC-seq + Amplicon-NGS.
Fig. 4E shows translocation scores of all sites. The arrow indicates the Receiver site in Fig. 4D. A DNA translocation score was calculated as (translocation read count) / (normal read count + translocation read count + 10).
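The score definition above can be written as a small function. This is a sketch of the stated formula only; the function name and the `pseudocount` parameter name are illustrative. The constant 10 in the denominator appears to damp scores at low-coverage sites: a site with 5 translocation reads and no normal reads scores 5/15, not 1.0.

```python
def translocation_score(translocation_reads: int, normal_reads: int, pseudocount: int = 10) -> float:
    """Score = T / (N + T + pseudocount), following the Fig. 4E definition."""
    if translocation_reads < 0 or normal_reads < 0:
        raise ValueError("read counts must be non-negative")
    return translocation_reads / (normal_reads + translocation_reads + pseudocount)


print(translocation_score(30, 60))  # 30 / (60 + 30 + 10)
print(translocation_score(5, 0))    # 5 / 15, damped by the pseudocount
```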
Fig. 5 Identification of pcsk9 off-targets from an edited mouse embryo using PEAC-seq
Fig. 5A is a schematic representation of an in vivo PEAC-seq experiment.
Fig. 5B is a Venn diagram showing the overlap between the PEAC-seq on-target and off-targets of Pcsk9 and the top 18 cleavage sites identified by DISCOVER-seq (Wienert et al., 2019). The single site identified by PEAC-seq was also identified by DISCOVER-seq.
Fig. 5C illustrates the sequence visualization of the Pcsk9 on-target and off-targets identified by PEAC-seq. One off-target was identified in one of the two embryos; this site was also reported by DISCOVER-seq and validated by Amplicon-NGS. The color scale of “%insertion” represents the indel frequency reported by CRISPResso. The star symbol represents the on-target site.
Fig. 5D shows the signal tracks of the on-target and off-target sites identified by PEAC-seq in two different embryos and the wild-type control. The signal of the WT control at chr4:106463845 was 1000-fold lower than that of the embryo samples and was considered background.
Fig. 6 Pilot study for insertion nucleotide preference
Fig. 6A compares the editing efficiencies of Cas9, Cas9-MMLV, and Cas9n-MMLV. The dinucleotide insertion efficiency of Cas9-MMLV is comparable to the indel frequency of Cas9, and both are much higher than that of Cas9n-MMLV.
Fig. 6B and Fig. 6C illustrate the distribution of insertion length when 10N (Fig. 6B) or 20N (Fig. 6C) random sequences were used as the insertion sequence templates.
Fig. 7 Preferences of nucleotides on the insertion sequence
At each position of the insertion sequence, the fraction of each of the four nucleotides is plotted.
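A per-position nucleotide profile of the kind plotted in Fig. 7 can be computed as follows. This sketch assumes the inserted sequences have already been extracted from reads and are aligned from a common start. Shorter insertions (partial reverse transcription products) simply do not contribute at positions beyond their length. The input list is a toy example.

```python
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def positional_fractions(insertions: List[str]) -> List[Dict[str, float]]:
    """For each position, the fraction of insertions carrying A, C, G, or T there."""
    if not insertions:
        return []
    length = max(len(s) for s in insertions)
    profile: List[Dict[str, float]] = []
    for i in range(length):
        # Count only insertions long enough to cover position i.
        counts = Counter(s[i] for s in insertions if i < len(s))
        total = sum(counts[b] for b in BASES)
        profile.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return profile


profile = positional_fractions(["ACT", "ATT", "CCT", "AC"])
print(profile[0])  # A in 3 of 4 insertions, C in 1 of 4
```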
Fig. 8 Library preparation and modified GUIDE-seq pipeline to generate six lists of candidate sites
Amplicons were enriched with PEAC-seq insertion-specific primers and Tn5 primers. Three forward primers and two reverse primers were used together with the upstream and downstream Tn5 primers, generating a total of five NGS libraries that were sequenced. A modified GUIDE-seq analysis pipeline was applied, and six lists of candidate sites were generated, one from each pairing of a forward and a reverse primer.
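The arithmetic behind five libraries and six candidate lists can be made explicit: one library per insertion-specific primer, and one list per forward/reverse pairing. This sketch reflects one reading of the caption. The primer names follow Fig. 3B; the pairing logic is an assumption.

```python
from itertools import product

forward_primers = ["F1", "F2", "F3"]  # insertion-specific forward primers (Fig. 3B)
reverse_primers = ["R1", "R2"]        # insertion-specific reverse primers (Fig. 3B)

# One NGS library per insertion-specific primer, each used with its Tn5 primer.
libraries = forward_primers + reverse_primers

# One candidate-site list per forward/reverse pairing: 3 x 2 combinations.
candidate_lists = list(product(forward_primers, reverse_primers))

print(len(libraries), len(candidate_lists))
```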
Fig. 9 Signal tracks of PEAC-seq at VEGFA TS1
The signal tracks of PEAC-seq at VEGFA TS1 are shown. Chromosome locations and the overlap with GUIDE-seq are shown.
Fig. 10 Signal tracks of PEAC-seq at VEGFA TS2
Fig. 10A is a Venn diagram showing the overlap of on-target and off-targets of VEGFA TS2 between the PEAC-seq and GUIDE-seq techniques. There was an overlap of eighty-one sites. Seventy-one sites were GUIDE-seq unique and thirty-four sites were PEAC-seq unique.
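The Venn counts reported for these comparisons are set operations on the two lists of called sites. This sketch assumes each site is represented by an exact coordinate string. A real comparison would likely merge sites falling within a tolerance window. The coordinates below are toy values, not data from this disclosure.

```python
from typing import Dict, Set


def venn_counts(peac_sites: Set[str], guide_sites: Set[str]) -> Dict[str, int]:
    """Partition two site sets into shared and method-unique counts (exact matching)."""
    return {
        "both": len(peac_sites & guide_sites),
        "peac_only": len(peac_sites - guide_sites),
        "guide_only": len(guide_sites - peac_sites),
    }


# Toy coordinates for illustration only.
peac = {"chr1:1000", "chr2:2000", "chr3:3000"}
guide = {"chr2:2000", "chr3:3000", "chr4:4000"}
print(venn_counts(peac, guide))  # {'both': 2, 'peac_only': 1, 'guide_only': 1}
```

Read in reverse, the VEGFA TS2 counts (81 shared, 34 PEAC-seq unique, 71 GUIDE-seq unique) imply that PEAC-seq called 115 sites and GUIDE-seq called 152 sites in total.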
Fig. 10B is the GUIDE-seq visualization output of PEAC-seq sites at VEGFA TS2. The star symbol represents the on-target site.
Fig. 10C shows the signal tracks of PEAC-seq sites at VEGFA TS2. Chromosome locations and the overlap with GUIDE-seq are also shown.
Fig. 11 Signal tracks of PEAC-seq at VEGFA TS3
Fig. 11A is a Venn diagram showing the overlap of on-target and off-targets of VEGFA TS3 between the PEAC-seq and GUIDE-seq. There was an overlap of thirty-five sites. Twenty-five sites were GUIDE-seq unique, and eight sites were PEAC-seq unique.
Fig. 11B is a GUIDE-seq visualization output of PEAC-seq sites at VEGFA TS3. The star symbol represents the on-target site.
Fig. 11C shows the signal tracks of PEAC-seq sites at VEGFA TS3. Chromosome locations and the overlap with GUIDE-seq are also shown.
Fig. 12 Signal tracks of PEAC-seq at EMX1
Fig. 12A is a Venn diagram showing the overlap of on-target and off-targets of EMX1 between the PEAC-seq and GUIDE-seq. Four sites were called by both methods, and twelve sites were GUIDE-seq unique.
Fig. 12B is a GUIDE-seq visualization output of PEAC-seq sites at EMX1.
Fig. 12C shows the signal tracks of PEAC-seq sites at EMX1. Chromosome locations and the overlap with GUIDE-seq are also shown.
Fig. 13 Signal tracks of PEAC-seq at RNF2
Fig. 13A is a Venn diagram showing the overlap of on-target and off-targets of RNF2 between the PEAC-seq and GUIDE-seq. One site was called by both methods.
Fig. 13B is a GUIDE-seq visualization output of PEAC-seq sites at RNF2. The star symbol represents the on-target site.
Fig. 13C shows the signal tracks of PEAC-seq sites at RNF2. Chromosome locations and the overlap with GUIDE-seq are also shown.
Fig. 14 Signal tracks of PEAC-seq at FANCF
Fig. 14A is a Venn diagram showing the overlap of on-target and off-targets of FANCF between the PEAC-seq and GUIDE-seq. There was an overlap of three sites. Six sites were GUIDE-seq unique.
Fig. 14B is a GUIDE-seq visualization output of PEAC-seq sites at FANCF. The star symbol represents the on-target site.
Fig. 14C shows the signal tracks of PEAC-seq sites at FANCF. Chromosome locations and the overlap with GUIDE-seq are also shown.
Fig. 15 Identification of mPnpla3 off-targets from edited mouse embryo in the PEAC-seq embodiment
Fig. 15A is a Venn diagram showing the overlap between the PEAC-seq on-target and off-targets of Pnpla3 and the top 21 off-targets validated by WGS (Anderson et al., 2018). Three cleavage sites were identified from two different embryos, and all three had been reported previously.
Fig. 15B is a sequence visualization of the Pnpla3 on-target and off-targets. One off-target site was identified in both embryos, and each embryo also had one embryo-specific off-target. All three off-targets had been reported previously and were also verified by Amplicon-NGS. The star symbol represents the on-target site.
Fig. 15C shows the signal tracks of the on-target and off-target sites identified by PEAC-seq in two different embryos and the wild-type control.
Fig. 16 The translocation events called by PEAC-seq were validated by anchored multiplex PCR (AMP)
Circos plots show the chromosome rearrangements at two Receiver sites: Translocation validation site1 (chr22:37266776-37266799) (Fig. 16A) and Translocation validation site2 (chr14:61612048-61612071) (Fig. 16B). Both sites are off-targets of VEGFA TS3. Arcs represent the rearrangements between the Translocation validation sites and other sites. The Receiver sites are marked as diamonds, and the known VEGFA TS3 off-target sites are marked as stars.
Fig. 17 Target sites identified by PEAC-seq
The Cas9 target sequences (i.e., cleavage sites) identified by PEAC-seq at six target sites (VEGFA TS1 (Fig. 17A), VEGFA TS2 (Fig. 17B), VEGFA TS3 (Fig. 17C), EMX1 (Fig. 17D), RNF2 (Fig. 17E), and FANCF (Fig. 17F)) are shown together with their chromosome locations. The number of mismatches is also shown. The cleavage site with 0 mismatches is the on-target site, and the others are off-target sites.
Fig. 18 Target sites identified by PEAC-seq in vivo
Fig. 18A shows the Cas9 target sequences (i.e., cleavage sites) from Embryo # 5 and Embryo # 12 identified by PEAC-seq targeting Pcsk9.
Fig. 18B shows the Cas9 target sequences (i.e., cleavage sites) from Embryo # 21 and Embryo # 31 identified by PEAC-seq targeting Pnpla3.
All publications cited in this specification are herein incorporated by reference as though fully set forth. If certain content of a reference cited herein contradicts or is inconsistent with the present disclosure, the present disclosure controls.
For the purposes of promoting an understanding of the principles of the present disclosure, reference will now be made to the embodiments illustrated in the drawings, and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of this disclosure is thereby intended.
In the present disclosure, unless otherwise specified, the scientific and technical terms used herein have the meanings generally understood by a person skilled in the art. Although any methods and materials similar or equivalent to those described herein find use in the practice of the present disclosure, the preferred methods and materials are described herein. Accordingly, the terms defined herein are more fully described by reference to the Specification as a whole.
As used herein, the singular terms “a,” “an,” and “the” include the plural reference unless the context clearly indicates otherwise.
Unless otherwise indicated, nucleic acids are written left to right in the 5' to 3' orientation; and amino acid sequences are written left to right in amino to carboxy orientation, respectively.
It is to be understood that this disclosure is not limited to the particular methodology, protocols, and reagents described, as these may vary depending upon the context in which they are used by those of skill in the art.
Unless the context requires otherwise, the terms “comprise,” “comprises,” and “comprising,” or similar terms, are intended to mean a non-exclusive inclusion, such that a recited list of elements or features does not include only those stated or listed elements, but may include other elements or features that are not listed or stated.
As used herein, the term “variant” refers to a varied form of a subject, including wild-type, naturally occurring, or artificially mutated forms.
CRISPR technology holds significant promise for biological studies and gene therapies because of its high flexibility and efficiency in mammalian cells. However, endonucleases such as Cas9 may generate undesired edits at unintended genomic sites. Thus, there is an urgent need to comprehensively identify off-target sites so that genotoxicity can be accurately assessed. To date, it remains challenging to streamline the process of specifically labeling and efficiently enriching cleavage sites at unknown genomic locations, and in particular there is a lack of approaches compatible with in vivo CRISPR delivery. The detection of off-targets is crucial to the biotechnological and clinical applications of CRISPR technology. Over the past years, many designs have been applied to depict off-target profiles in vitro and in cellulo. The common strategy is to label and enrich the cleavage sites without knowing their genomic locations, but the requirement for exogenous sequences or chemicals in addition to the CRISPR-Cas system per se limits their applications. Besides experimental approaches, computational algorithms have incorporated various features of the gRNA to generate lists of candidate off-targets.
To bypass the addition of extra exogenous agents, which need to be delivered into cells by transfection, the present disclosure provides a method in which a label sequence is encoded within the CRISPR-Cas system and is inserted along with the cleavage sites.
In some embodiments, a unique feature of the detection method disclosed herein is the simultaneous identification of off-targets and DNA translocations. It can be used with different delivery methods for CRISPR therapy, including in vivo delivery approaches (e.g., lipid nanoparticles and adeno-associated virus). The method is also compatible with all variants of Cas9 and mutants of any of the variants, including but not limited to wild-type Cas9 and Cas9 nickase (Cas9n).
In some embodiments, the detection method disclosed herein is referred to as Prime Editor-Assisted Off-Target Characterization by Sequencing (PEAC-seq). The Prime Editing system is a “search-and-replace” genome editing technology that mediates targeted insertions, deletions, base-to-base conversions, and combinations thereof in human cells without the need for double-strand breaks (DSBs) or donor DNA templates. Prime Editors (PEs) use a reverse transcriptase (RT) fused to an RNA-programmable nickase (e.g., Cas9 nickase) and a prime editing guide RNA (pegRNA) to copy genetic information directly from an extension on the pegRNA into the target genomic locus (Anzalone et al., 2019). The template sequence on the pegRNA extension is reverse transcribed into DNA, which hybridizes to the unedited complementary strand with the help of another endonuclease (e.g., FEN1).
In some embodiments, the PEAC-seq method in the present disclosure replaces the Cas9 nickase in the Prime Editing system with a Cas9 that creates DSBs in the genomic DNA. Because DSBs are created, the newly reverse-transcribed DNA sequences are inserted into the cleavage site with higher efficiency.
In some embodiments, to further increase the label insertion efficiency in the detection method disclosed herein, random-sequence and homopolymer screens were conducted to identify suitable insertion sequence compositions, and nucleotide incorporation preferences were characterized. The negative impact of G as the last incorporated nucleotide has been reported earlier, and a recent study reported that the disfavoring of a last G is length dependent (Kim et al., Nat. Biotechnol. 39, 198-206 (2021)). The inventors discovered from an unbiased screen that G is disfavored, sometimes strongly, across the entire insertion sequence, indicating that G incorporation should be avoided throughout the insertion, especially at the 5' and 3' ends. This sequence preference can be considered when designing guide RNAs for use with the present detection method, e.g., by using synonymous codons to avoid G in the insertion and thereby increase insertion efficiency. For instance, the higher insertion efficiency of the His (6X) tag compared to the Flag and LoxP tags, as reported in earlier research, might be due not only to its shorter sequence length; its depletion of G nucleotides could also contribute.
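Synonymous codon choice to reduce G in a tag-encoding insertion can be sketched as a per-residue minimization. Several assumptions apply. The insertion is read as the coding strand of the tag. The codon table is restricted to the residues of the His and Flag (DYKDDDDK) tags; these are standard genetic-code assignments. The helper name is hypothetical. Because both Asp codons begin with G, the Flag tag cannot be made G-free this way, and the minimum is one G per Asp.

```python
from typing import Dict, List

# Standard synonymous DNA codons for the residues in the His and Flag tags only.
SYNONYMOUS_CODONS: Dict[str, List[str]] = {
    "H": ["CAT", "CAC"],
    "D": ["GAT", "GAC"],
    "Y": ["TAT", "TAC"],
    "K": ["AAA", "AAG"],
}


def low_g_coding_sequence(peptide: str) -> str:
    """For each residue, pick the synonymous codon with the fewest G (first listed wins ties)."""
    codons = []
    for residue in peptide:
        options = SYNONYMOUS_CODONS[residue]  # KeyError for residues outside this subset
        codons.append(min(options, key=lambda codon: codon.count("G")))
    return "".join(codons)


print(low_g_coding_sequence("HHHHHH"))    # His (6X): no G needed
print(low_g_coding_sequence("DYKDDDDK"))  # Flag: one unavoidable G per Asp
```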
In some embodiments, a detection method disclosed herein employs a guide RNA comprising an insertion sequence RT template, as well as a reverse transcriptase and a Cas nuclease as a fusion protein. The reverse transcribed sequences (based on the RT template) inserted into the cleavage sites function as labels, marking the Cas cleavage sites. Since the reverse transcriptase and the Cas nuclease are fused together, the labeling process, i.e., inserting a sequence into a cleavage site, is performed right after the cleavage event and at the same location. By enriching and sequencing the portions of genomic DNA that contain the insertion sequence, the Cas cleavage sites can be identified on the genome.
As used herein, an insertion sequence refers to a DNA sequence that is encoded by the RT template comprised in a guide RNA and the products reverse transcribed from this RT template. Both partial and full-length products may exist in a reverse transcription. When “insertion sequence” is used to refer to the reverse transcription products, it includes both the partial and full-length products.
Compared to the indels caused at off-targets, DNA translocation is much more harmful to genome stability. Recently, several papers reported that DNA translocations occur more frequently than previously thought during Cas9 editing in vivo (Zuccaro et al., 2020; Liang et al., 2020; Alanis-Lobato et al., 2021). Because of the potentially severe consequences of DNA rearrangements, both translocation profiling methods and genotoxicity assessment need to be developed for CRISPR translational applications. None of the widely used off-target detection techniques can detect DNA translocations simultaneously. In contrast, the directional insertion sequence in the present method makes it possible to identify aberrant joining of ends from different DSB sites.
To detect DNA translocation, the portion of genomic DNA that comprises the insertion sequence is targeted and enriched by PCR. In some embodiments, two separate reactions are performed: in the first reaction, the insertion sequence is used as the forward primer binding site, and in the second reaction, it is used as the reverse primer binding site. Under this design, the forward primer (e.g., F1, F2, or F3 in Fig. 3B) amplifies regions downstream, but not upstream, of the Cas9 cleavage site, and the reverse primer (e.g., R1 or R2 in Fig. 3B) amplifies regions upstream, but not downstream, of the Cas9 cleavage site. When the sequencing result shows upstream signals from the forward primer amplicon, or downstream signals from the reverse primer amplicon, DNA translocation has occurred at this Cas9 cleavage site. As used herein, an amplicon refers to the portion of genomic DNA that is amplified in a PCR.
The inventors also discovered that the occurrence of DNA translocation is independent of the frequency of DSBs at a particular site, indicating that other factors, e.g., DSB context sequences or local chromatin states, might also contribute to translocation occurrence.
In some embodiments, because a detection method disclosed herein does not rely on exogenous sequences beyond the CRISPR-Cas system (e.g., a guide RNA comprising an insertion sequence RT template, a Cas nuclease, and a reverse transcriptase) to label the Cas cleavage sites, the method can be used not only in vitro and in cellulo but also in vivo. These applications are described in detail below.
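The decision rule above can be sketched as a per-read test followed by a per-site tally. The following are assumptions, not taken from the disclosure: reads are given as half-open intervals `[start, end)`, and the gRNA lies on the top strand, as in the Fig. 4B models, so "downstream" means coordinates at or beyond the cut site. Reads spanning the cut site are counted as normal. The two counts could feed a translocation score such as the one defined for Fig. 4E.

```python
from typing import Iterable, Literal, Tuple

Primer = Literal["forward", "reverse"]


def is_unexpected_signal(primer: Primer, read_start: int, read_end: int, cut_site: int) -> bool:
    """True if a read lies entirely on the side of the cut site that its primer should not amplify."""
    if primer == "forward":
        # Forward-primer amplicons should extend downstream; a purely upstream read is unexpected.
        return read_end <= cut_site
    if primer == "reverse":
        # Reverse-primer amplicons should extend upstream; a purely downstream read is unexpected.
        return read_start >= cut_site
    raise ValueError(f"unknown primer orientation: {primer!r}")


def tally_reads(reads: Iterable[Tuple[Primer, int, int]], cut_site: int) -> Tuple[int, int]:
    """Return (translocation-like reads, normal reads) at one cleavage site."""
    unexpected = normal = 0
    for primer, start, end in reads:
        if is_unexpected_signal(primer, start, end, cut_site):
            unexpected += 1
        else:
            normal += 1
    return unexpected, normal


reads = [("forward", 1000, 1150), ("forward", 820, 990), ("reverse", 850, 1000)]
print(tally_reads(reads, cut_site=1000))  # (1, 2)
```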
Guide RNA
In a CRISPR/Cas system, a guide RNA (gRNA) is a short synthetic RNA composed of a scaffold sequence necessary for Cas binding and a user-defined nucleotide spacer that defines the gene target to be modified. The strand of genomic DNA that is bound by the spacer is typically referred to as the complementary strand, and the other strand as the non-complementary strand.
In some embodiments, the guide RNA used herein is made up of two RNA molecules, a crRNA and a tracrRNA, wherein the crRNA is customized to bind a target gene and the tracrRNA serves as a binding scaffold for a Cas enzyme. In some embodiments, the guide RNA used herein is a single guide RNA (sgRNA), wherein the single RNA molecule comprises a custom-designed crRNA sequence fused to a scaffold tracrRNA sequence. In some embodiments, a single guide RNA is used to increase the editing efficiency.
In the present disclosure, the guide RNA further comprises an extension arm at its 3’ end. In some embodiments, the extension arm provides a DNA synthesis template sequence that encodes a single-stranded DNA flap to be inserted into a Cas cleavage site. In some embodiments, at the 3’ end of the extension arm is a primer binding site (PBS) that binds to the non-complementary strand of the target gene, whose 3’ end then serves as the primer for the reverse transcriptase.
The DNA synthesis template sequence comprised in a guide RNA is referred to as an insertion sequence RT template in the present disclosure. The inventors discovered that guanine is disfavored at nearly all positions of the insertion sequence, especially at the 5’ end and the 3’ end, and that this preference against guanine is consistent regardless of the length of the insertion. Preferably, guanine is avoided in the insertion sequence in order to increase the insertion efficiency, which means that cytosine is avoided in the insertion sequence RT template comprised in the guide RNA.
In an aspect, the present disclosure provides a guide RNA comprising a spacer, a scaffold, and an insertion sequence reverse transcriptase (RT) template. In some embodiments, the insertion sequence RT template comprises a nucleotide sequence wherein cytosine is depleted, for example, cytosine represents less than 25% (e.g., less than 20%, less than 15%, less than 10%, less than 5%) of the nucleotides in the nucleotide sequence.
As used herein, an insertion sequence RT template is cytosine-depleted if it comprises a nucleotide sequence wherein cytosine represents not more than 25%, e.g., less than 25%, of the nucleotides in the nucleotide sequence.
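The link between a cytosine-depleted RT template and a guanine-depleted insertion can be checked directly. Every C in the RNA template is copied as a G in the DNA insertion, because the reverse transcriptase synthesizes the reverse complement of the template. This sketch uses the strict "less than 25%" form that appears in most embodiments; the function names and the example template are hypothetical.

```python
# RNA template base -> DNA base incorporated opposite it by the reverse transcriptase.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def cytosine_fraction(rt_template: str) -> float:
    """Fraction of C in an RNA template."""
    seq = rt_template.upper()
    if not seq:
        raise ValueError("empty template")
    return seq.count("C") / len(seq)


def is_cytosine_depleted(rt_template: str, threshold: float = 0.25) -> bool:
    """True if C makes up less than `threshold` of the template nucleotides."""
    return cytosine_fraction(rt_template) < threshold


def insertion_from_template(rt_template: str) -> str:
    """DNA copied from an RNA template written 5'->3': its reverse complement, 5'->3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(rt_template.upper()))


template = "AUAUCUAAUA"  # hypothetical 10-nt template with a single C
print(is_cytosine_depleted(template), insertion_from_template(template))
```

The single C in the template (10%) becomes the single G in the insertion `TATTAGATAT`.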
In some embodiments, the insertion sequence RT template comprises a nucleotide sequence that is SEQ ID NO: 113.
In some embodiments, the insertion sequence template comprises a nucleotide sequence of any length, e.g., from about 10 to 30 nucleotides. The insertion sequence could be of any length, including but not limited to 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length.
In some embodiments, the guide RNA further comprises a primer binding site (PBS) that is capable of binding the non-complementary strand of the target gene.
In some embodiments, the insertion sequence RT template encodes one or more tags suitable for hybrid capture. Hybrid capture is a method used in target DNA enrichment, in which a “bait” molecule is used to select target regions from DNA libraries. Hybrid capture methods that could be used herein include, but are not limited to, those using biotinylated oligonucleotide baits.
In another aspect, the present disclosure provides a complex comprising a Cas nuclease, a reverse transcriptase, and a guide RNA, wherein the guide RNA comprises a spacer, a scaffold, and an insertion sequence template, wherein the insertion sequence template is a reverse transcriptase template comprising a nucleotide sequence wherein cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
In some embodiments, the Cas nuclease is Cas9 or its variants or mutants of any of the variants.
In some embodiments, the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutants of any of the variants.
In some embodiments, the Cas nuclease and the reverse transcriptase are formed as a fusion protein, optionally with a peptide linker in between the Cas nuclease and the reverse transcriptase. A fusion protein can be made from a fusion gene, e.g., created by joining parts of two different genes.
In some embodiments, the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
In another aspect, the present disclosure provides a vector comprising a guide RNA comprising an insertion sequence reverse transcriptase (RT) template, wherein the insertion sequence RT template comprises a nucleotide sequence wherein cytosine represents less than 25% of the nucleotides in the nucleotide sequence, a nucleotide sequence encoding a Cas nuclease, and a nucleotide sequence encoding a reverse transcriptase.
In some embodiments, the Cas nuclease is Cas9 or its variants or mutants of any of the variants.
In some embodiments, the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutants of any of the variants.
In another aspect, the present disclosure provides a kit for detecting Cas nuclease cleavage sites and DNA translocation in genomic DNA, comprising one or more polynucleotide sequences encoding a Cas nuclease, a reverse transcriptase, and a guide RNA comprising an insertion sequence reverse transcriptase (RT) template, wherein the insertion sequence RT template comprises a nucleotide sequence wherein cytosine represents less than 25% of the nucleotides in the nucleotide sequence.
In some embodiments, the Cas nuclease is Cas9 or its variants or mutants of any of the variants.
In some embodiments, the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutants of any of the variants.
In some embodiments, the Cas9 nuclease and the reverse transcriptase are encoded as a fusion protein.
In some embodiments, the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
CRISPR-Cas nucleases
Research has shown that clustered, regularly interspaced, short palindromic repeats (CRISPR) /CRISPR-associated (Cas) systems can serve as the basis of a simple and highly efficient method for performing genome editing in bacteria, yeast and human cells, as well as in vivo in whole organisms such as fruit flies, zebrafish, and mice.
In various embodiments, the present disclosure involves a Cas nuclease, a variant thereof, or a mutant of any of the variants. All variants and mutants of Cas9 can be used in a method, composition, or kit disclosed herein, including but not limited to wild-type Cas9 and Cas9 nickase (Cas9n). The Cas9 nuclease used herein could be either wild type or genetically modified. The Cas9 nucleases to be used herein could be selected from SpCas9 (Cas9 isolated from Streptococcus pyogenes), SaCas9 (Cas9 isolated from Staphylococcus aureus), StCas9 (Cas9 isolated from Streptococcus thermophilus), NmCas9 (Cas9 isolated from Neisseria meningitidis), FnCas9 (Cas9 isolated from Francisella novicida), CjCas9 (Cas9 isolated from Campylobacter jejuni), ScCas9 (Cas9 isolated from Streptococcus canis), and any variants and mutant forms of the Cas9 listed above, such as high-fidelity Cas9 (Kleinstiver et al., Nature. 2016 Jan 28) and enhanced SpCas9 (Slaymaker et al., Science. 2016 Jan 01). This list provides several exemplary options and is not exhaustive.
Reverse Transcriptase
In various embodiments, the present disclosure involves a reverse transcriptase, a variant thereof, or a mutant of any of the variants, which can be provided as a fusion protein with a Cas nuclease or provided in trans. Reverse transcriptase (RT), also known as RNA-dependent DNA polymerase, is a DNA polymerase enzyme that transcribes single-stranded RNA into DNA. Reverse transcriptases are found in many eukaryotic and prokaryotic systems, such as telomerase, retrotransposons, and retrons, and are abundantly encoded in the genomes of plants and animals. Any of the wild-type, variant, and mutant forms of reverse transcriptase that are known in the art or that can be made using methods known in the art are contemplated herein.
The reverse transcriptases that can be used herein include, but are not limited to, Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variant forms or mutants of any of the variants.
In some embodiments, the reverse transcriptase is fused directly to the Cas nuclease. In some embodiments, the reverse transcriptase is connected to the Cas nuclease with a linker.
It would be understood that a person skilled in the art is able to select conditions (e.g. optimal temperature, pH, reaction time, concentration) suitable for a reverse transcriptase to form the insertion double strand DNA and the like.
Method for labeling and detecting Cas9 nuclease cleavage sites
In an aspect, the present disclosure provides a method for labeling Cas9 nuclease cleavage sites in genomic DNA of a cell, comprising (a) providing a guide RNA that can bind to a target gene on the genomic DNA and comprises an insertion sequence RT template, (b) providing a complex comprising a Cas nuclease, a reverse transcriptase, and the guide RNA, and (c) contacting the genomic DNA with the complex under a condition suitable to obtain a labeled genomic DNA, wherein the genomic DNA is cleaved at one or more cleavage sites, and one or more insertion sequences reverse transcribed, in part or in whole, from the insertion sequence RT template are inserted into the one or more cleavage sites.
It would be understood that a person skilled in the art is able to select a condition (e.g., optimal temperature, pH, reaction time, concentration) suitable for the Cas nuclease to cleave the DNA and for the reverse transcriptase to synthesize a new strand of DNA.
In some embodiments, the insertion sequence RT template is located at the 3’ end of the guide RNA.
In some embodiments, the Cas nuclease is Cas9 or its variants or mutants of any of the variants.
In some embodiments, the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants and mutant forms.
In some embodiments, the insertion sequence RT template comprises a nucleotide sequence wherein cytosine represents less than 25%of the nucleotides in the nucleotide sequence.
In some embodiments, the insertion sequence RT template is of SEQ ID NO: 113.
In some embodiments, the insertion sequence RT template encodes one or more tags suitable for hybrid capture.
In some embodiments, the insertion sequence template comprises about 10 to 30 nucleotides.
In some embodiments, the guide RNA has a structure of spacer-scaffold-insertion sequence template-primer binding site (PBS), from 5’ to 3’ end, in which the spacer is able to bind the target gene on the genome, the scaffold is able to bind the Cas nuclease, the primer binding site is able to bind the non-complementary strand of the target gene, and the insertion sequence template is the reverse transcription template for the reverse transcriptase.
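The 5’-to-3’ layout described above can be represented as an ordered set of segments, with the coordinates of each segment in the assembled RNA. The class below is a hypothetical helper. The `N` placeholders and their lengths (20-nt spacer, 76-nt scaffold, 13-nt PBS) are illustrative assumptions, not sequences from this disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class GuideRNADesign:
    """Segments of an extended guide RNA, each written 5'->3' (hypothetical helper)."""
    spacer: str
    scaffold: str
    rt_template: str
    pbs: str

    def assemble(self) -> str:
        """Full guide RNA in spacer-scaffold-RT template-PBS order."""
        return self.spacer + self.scaffold + self.rt_template + self.pbs

    def segments(self) -> Dict[str, Tuple[int, int]]:
        """Half-open (start, end) coordinates of each segment in the assembled RNA."""
        coords: Dict[str, Tuple[int, int]] = {}
        start = 0
        for name in ("spacer", "scaffold", "rt_template", "pbs"):
            end = start + len(getattr(self, name))
            coords[name] = (start, end)
            start = end
        return coords


design = GuideRNADesign(spacer="N" * 20, scaffold="N" * 76, rt_template="AUAUCUAAUA", pbs="N" * 13)
print(design.segments())
```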
In some embodiments, the Cas nuclease and the reverse transcriptase form a Cas-RT fusion protein.
In some embodiments, the Cas-RT fusion protein is encoded by a sequence of SEQ ID NO: 116.
In some embodiments, the Cas cleavage sites can be either on-target or off-target. When a Cas nuclease binds to a genetic locus whose sequence is exactly the same as the target gene, the resulting cleavage site is an on-target cleavage site; otherwise, the cleavage site is an off-target site.
In another aspect, the present disclosure provides a method for detecting Cas9 cleavage sites and detecting DNA translocation at those cleavage sites in genomic DNA of a cell, comprising (a) obtaining a labeled genomic DNA with a method described herein, (b) targeting and amplifying one or more labeled portions of the genomic DNA, (c) sequencing the amplified portion of the genomic DNA, and (d) analyzing the sequencing result to identify Cas cleavage sites and/or DNA translocations at the Cas cleavage sites.
In some embodiments, each of the one or more portions of the genomic DNA amplified in step (b) comprises genomic DNA immediately upstream or downstream of the one or more insertion sequences.
In some embodiments, the method can be used to identify Cas nuclease off-target sites by comparing the Cas cleavage sites identified by the method disclosed herein with the target sequence; a cleavage site whose sequence is not identical to the target sequence is an off-target site. It will be understood that, based on the method disclosed herein, those of ordinary skill in the art are able to locate the cleavage sites in the genome with readily available tools such as the Burrows-Wheeler Aligner (BWA).
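Once a cleavage site has been located in the genome, the on-target/off-target call reduces to a sequence comparison. The sketch below assumes the site sequence has already been extracted at the same length as the target (so DNA or RNA bulges are not modelled); the 20-nt target is hypothetical.

```python
def count_mismatches(target: str, site: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(target) != len(site):
        raise ValueError("target and site must be the same length (no bulges modelled)")
    return sum(a != b for a, b in zip(target.upper(), site.upper()))


def classify_site(target: str, site: str) -> str:
    """A cleavage site identical to the target is on-target; any difference makes it off-target."""
    return "on-target" if count_mismatches(target, site) == 0 else "off-target"


TARGET = "ACGTACGTACGTACGTACGT"                       # hypothetical 20-nt target sequence
print(classify_site(TARGET, TARGET))                  # on-target
print(classify_site(TARGET, "ACGTACGTACGTACGTAGGA"))  # off-target (2 mismatches)
```

The mismatch count is also the quantity compared between shared and method-unique sites in Example 3.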
In some embodiments, the genomic DNA is processed by Tn5 tagmentation before amplification. Tagmentation uses a hyperactive variant of the Tn5 transposase that fragments double-stranded DNA and ligates synthetic oligonucleotides to both ends of the fragments (Adey et al. 2010). The wild-type Tn5 transposon is a composite transposon in which two near-identical insertion sequences (IS50L and IS50R) flank three antibiotic resistance genes (Reznikoff 2008). Each IS50 contains two inverted 19-bp end sequences (ESs), an outside end (OE) and an inside end (IE). Wild-type ESs have relatively low activity and were replaced in vitro by hyperactive mosaic end (ME) sequences. A complex of the transposase with the 19-bp ME is therefore all that is necessary for transposition, provided that the intervening DNA is long enough to bring two of these sequences close together to form an active Tn5 transposase homodimer (Reznikoff 2003). Transposition works through a “cut-and-paste” mechanism, in which Tn5 excises itself from the donor DNA and inserts into a target sequence, creating a 9-bp duplication of the target (Schaller 1979; Reznikoff 2008). After tagmentation, reduced-cycle PCR is performed to amplify the fragments tagged by Tn5 at both ends. Any Tn5 tagmentation platform or kit, including variants thereof and mutants of any such variant, can be used in the present disclosure, such as Nextera DNA kits and on-bead tagmentation.
In some embodiments, the genomic DNA is processed by Tn5 tagmentation before amplification, wherein sequencing adapters that include unique molecular identifiers (UMIs) are embedded in the Tn5 transposases. UMIs are a form of molecular barcoding that provides error correction and increased accuracy in sequencing data analysis: each UMI is a short sequence used to uniquely tag an individual molecule in a sample library. Because the UMI-containing adapters are embedded in Tn5, the dsDNA fragments produced by tagmentation are tagged with these adapters, which can be used to eliminate PCR duplicates from the sequencing data.
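The 9-bp target duplication can be illustrated with a toy model in which each Tn5 event leaves the same 9 bp at the end of one fragment and the start of the next. This is a simplified sketch under stated assumptions: insertion sites are drawn uniformly at random, gap filling is assumed complete, and adapters are not modelled; the function name is hypothetical.

```python
import random


def tagment(seq: str, n_events: int, seed: int = 0, dup: int = 9) -> list[str]:
    """Toy model of Tn5 tagmentation.

    Each insertion event cuts the target with a 9-bp stagger; after gap filling,
    the 9 bp at the cut are present on both flanking fragments (the target-site
    duplication). Adapters are not modelled.
    """
    rng = random.Random(seed)
    sites = sorted(rng.sample(range(1, len(seq) - dup), n_events))
    fragments, start = [], 0
    for s in sites:
        fragments.append(seq[start:s + dup])  # left fragment ends with the duplicated 9 bp
        start = s                             # right fragment starts with the same 9 bp
    fragments.append(seq[start:])
    return fragments


rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(300))
frags = tagment(genome, n_events=4)
# Adjacent fragments share exactly the 9-bp duplicated segment.
print([a[-9:] == b[:9] for a, b in zip(frags, frags[1:])])  # [True, True, True, True]
```

Removing the first 9 bp of every fragment after the first and concatenating recovers the original sequence, which is a quick check that the model is consistent.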
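A minimal sketch of UMI-based duplicate removal, assuming each aligned read has been reduced to its chromosome, mapped 5' position, strand, and UMI (8 nt here, matching the 8N adapter in Example 3). Reads sharing all four values are treated as PCR copies of one molecule; the record type and function name are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    pos: int      # mapped 5' position of the read
    strand: str   # "+" or "-"
    umi: str      # UMI carried by the Tn5 adapter


def collapse_pcr_duplicates(reads: list[Read]) -> Counter:
    """Group reads by mapping position and UMI; each key is one original molecule."""
    return Counter((r.chrom, r.pos, r.strand, r.umi) for r in reads)


reads = [
    Read("chr1", 1000, "+", "ACGTACGT"),
    Read("chr1", 1000, "+", "ACGTACGT"),  # PCR duplicate of the read above
    Read("chr1", 1000, "+", "TTGCAGCA"),  # same position, different UMI: distinct molecule
    Read("chr1", 2500, "-", "ACGTACGT"),
]
molecules = collapse_pcr_duplicates(reads)
print(len(molecules))  # 3
```

Without the UMI, the first three reads would collapse into a single molecule, which is why position-only deduplication undercounts at highly edited sites.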
In some embodiments, the portion of the genomic DNA that comprises the insertion sequence, or a portion of the insertion sequence, is targeted and enriched by a method selected from PCR and hybrid capture-based target enrichment. Hybrid capture-based target enrichment methods that can be used herein include, but are not limited to, methods using biotinylated oligonucleotide baits.
Enriching only the portion of the genomic DNA that is of interest before sequencing can significantly reduce cost and increase the accuracy of the sequencing result. Polymerase chain reaction (PCR) is currently widely used for target enrichment: a set of flanking primers anneals to the outer regions of the DNA sequence of interest, so unwanted DNA is not amplified. Another group of target enrichment methods is based on hybrid capture; a commonly used hybridization capture approach employs biotinylated oligonucleotide baits. Any method that can effectively enrich a targeted portion of the genomic DNA can be used herein.
In some embodiments, the enrichment is performed by two rounds of PCR, wherein in the first reaction the insertion sequence is used as the forward primer binding site and in the second reaction the insertion sequence is used as the reverse primer binding site.
In some embodiments where PCR is performed, the 3’ end of each primer that binds to the insertion sequence is at least 2 bp away from the insertion boundary, so that the extension sequence information can be used to filter out reads arising from random priming (see Fig3B). If the primer correctly binds to the insertion sequence, at least the first 2 bp of the extension sequence will be complementary to the insertion sequence. The insertion boundary described herein is the first or last base pair of the insertion sequence.
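The filtering logic can be sketched for a forward primer that matches the first k bases of the insertion: a read counts as genuine only if the bases right after the primer continue the insertion. This sketch assumes the inserted strand is the reverse complement of the RT template (SEQ ID NO: 113), as the reverse transcriptase copies the template, and that reads begin with the primer sequence; the function names and the trailing "genomic" sequences are hypothetical. Reverse primers would be handled symmetrically.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_correct_extension(read: str, primer: str, insertion: str, min_ext: int = 2) -> bool:
    """True if the read starts with the primer and the next `min_ext` bases continue the insertion."""
    k = len(primer)
    if insertion[:k] != primer or len(insertion) - k < min_ext:
        raise ValueError("primer must match the insertion start and end >= min_ext bp from its boundary")
    return read.startswith(primer) and read[k:k + min_ext] == insertion[k:k + min_ext]


RT_TEMPLATE = "GTATGAGGTTGGTGGATTGGT"  # SEQ ID NO: 113
INSERTION = revcomp(RT_TEMPLATE)        # assumed inserted-strand sequence
F_PRIMER = INSERTION[:15]               # a "15F"-style forward primer

genuine = INSERTION + "GATTACAGATTACA"  # primer, rest of insertion, then hypothetical genomic DNA
mispriming = F_PRIMER + "GGCATTGCAA"    # primer followed by unrelated sequence
print(has_correct_extension(genuine, F_PRIMER, INSERTION))     # True
print(has_correct_extension(mispriming, F_PRIMER, INSERTION))  # False
```

The guard clause enforces the design rule itself: a primer ending closer than `min_ext` bases to the insertion boundary leaves nothing to check.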
The methods disclosed herein could be used in vitro, in cellulo, or in vivo.
DNA translocation at Cas cleavage sites
DNA translocation is also referred to as chromosomal translocation or chromosomal rearrangement. In a translocation, a segment of one chromosome is transferred to a nonhomologous chromosome or to a new site on the same chromosome. Chromosomal translocations appear to arise from improper repair of DNA double-strand breaks (DSBs), which are highly toxic lesions. The “guardians” of genome integrity usually ensure reliable repair of DSBs, and unrepaired DSBs can lead to apoptosis or senescence. However, imprecise repair of DSBs can be highly deleterious, as it can lead to genome instability, including the formation of chromosomal rearrangements. In particular, chromosomal translocations can arise when DNA ends from DSBs on two heterologous chromosomes are improperly joined (Scott et al., 2000).
As shown in certain proposed models (Fig4B), in the presently disclosed methods, where an insertion sequence is added at the cleavage site, a DSB may generate three types of ends at each cleavage site (Receiver Site or Donor Site): one original upstream end, one original downstream end, and one upstream end appended with a complete or partial tag insertion. If multiple DSBs occur simultaneously and are physically close to each other in the nucleus, DSB ends from different break points may join together and cause genome rearrangements. In the proposed models, the upstream end of a Donor Site may bring a reversely oriented insertion sequence to the upstream end of a Receiver Site (Fig4B, model (v)); this joining would generate an amplicon from the upstream end of the Receiver Site, which normally would not be amplified by the F-primer.
These rearrangements, which are not detectable by other CRISPR off-target identification techniques, can cause severe chromosomal aberrations, including large fragment deletions, inversions, and translocations. The directional insertion in the methods disclosed herein and the resulting PCR amplicons can serve as indicators of chromosomal rearrangements, because the orientation of the insertion distinguishes whether an amplicon came from the expected DSB end.
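The orientation logic can be sketched as a small classifier for F-primer reads already assigned to a cleavage site. This is a simplified model, not the disclosed pipeline: `cut` is taken as the 0-based reference coordinate of the first base to the right of the break, `read_5p` as the coordinate of the first genomic base after the insertion, and `read_strand` as the direction the read continues along the reference; all names are hypothetical.

```python
def classify_f_primer_read(cut: int, site_strand: str, read_5p: int, read_strand: str) -> str:
    """Flag F-primer reads whose genomic part extends upstream instead of downstream of the cut."""
    if site_strand == "+":
        downstream = read_strand == "+" and read_5p >= cut
    else:  # for a minus-strand site, "downstream" runs toward lower coordinates
        downstream = read_strand == "-" and read_5p < cut
    return "expected downstream end" if downstream else "upstream end: candidate rearrangement"


print(classify_f_primer_read(1000, "+", 1000, "+"))  # expected downstream end
print(classify_f_primer_read(1000, "+", 999, "-"))   # upstream end: candidate rearrangement
```

A read matching model (v) continues leftward from just upstream of the Receiver Site cut, so it falls into the second category.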
Other applications
In an aspect, the present disclosure provides a method for determining the relative specificity of a plurality of guide RNAs comprising (a) identifying the off-target sites for Cas cleavage using each of the guide RNAs with a method disclosed herein, and (b) determining the relative specificity of the guide RNAs based on the total number of off-target sites identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites is more specific than a guide RNA having more off-target sites.
In another aspect, the present disclosure provides a method for determining the relative specificity of a plurality of Cas nuclease variants and mutants comprising (a) identifying the off-target cleavage sites for each of the Cas nuclease variants and mutants with a method disclosed herein, and (b) determining the relative specificity of the Cas nuclease variants and mutants based on the total number of off-target sites identified for each of the Cas nuclease variants and mutants, wherein a Cas nuclease variant or mutant having fewer off-target sites is more specific than a Cas nuclease variant or mutant having more off-target sites.
In another aspect, the present disclosure provides a method for determining the relative genotoxicity of a plurality of guide RNAs comprising (a) identifying the off-target cleavage sites and DNA translocations for each of the guide RNAs with a method disclosed herein, and (b) determining the relative genotoxicity of the guide RNAs based on the total number of off-target sites and DNA translocations identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites and fewer DNA translocations is less genotoxic than a guide RNA having more off-target sites and DNA translocations.
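The ranking steps in these aspects can be sketched as simple sorts over per-guide counts (the same functions apply to Cas nuclease variants). The disclosure does not say how off-target sites and translocations are combined, so weighting them equally is an assumption of this sketch; guide names and counts are hypothetical.

```python
def rank_by_specificity(off_target_counts: dict[str, int]) -> list[str]:
    """Most specific first: fewer identified off-target sites ranks higher."""
    return sorted(off_target_counts, key=off_target_counts.get)


def rank_by_genotoxicity(results: dict[str, tuple[int, int]]) -> list[str]:
    """Least genotoxic first, scoring each guide as off-target sites + translocations.

    Equal weighting of the two counts is an assumption made for this sketch.
    """
    return sorted(results, key=lambda g: results[g][0] + results[g][1])


print(rank_by_specificity({"gRNA-A": 12, "gRNA-B": 3, "gRNA-C": 40}))
# ['gRNA-B', 'gRNA-A', 'gRNA-C']
print(rank_by_genotoxicity({"gRNA-A": (12, 1), "gRNA-B": (3, 5), "gRNA-C": (2, 0)}))
# ['gRNA-C', 'gRNA-B', 'gRNA-A']
```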
Sequencing
As used herein, “sequencing” includes any method of determining the sequence of a nucleic acid. Any sequencing method can be used in the present disclosure, including chain-terminator (Sanger) sequencing and dye-terminator sequencing. In preferred embodiments, next-generation sequencing (NGS) is used. NGS is a high-throughput sequencing technology that performs thousands or millions of sequencing reactions in parallel. Although different NGS platforms use varying assay chemistries, they all generate sequence data from a large number of sequencing reactions run simultaneously on a large number of templates. Typically, the sequence data are collected using a scanner and then assembled and analyzed bioinformatically. Thus, the sequencing reactions are performed, read, assembled, and analyzed in parallel. See e.g. Voelkerding et al., Clinical Chem., 55: 641-658 (2009); MacLean et al., Nature Rev. Microbiol., 7: 287-296 (2009). Some NGS methods require template amplification and some do not. Amplification-requiring methods include pyrosequencing, the Solexa/Illumina platform, and the Supported Oligonucleotide Ligation and Detection (SOLiD) platform. Methods that do not require amplification include single-molecule sequencing methods, nanopore sequencing, HeliScope, real-time sequencing by synthesis, single-molecule real-time (SMRT) DNA sequencing using zero-mode waveguides (ZMWs), and others. Alternatively, hybridization-based sequencing methods or other high-throughput methods can also be used, e.g., microarray analysis, NANOSTRING, ILLUMINA, or other sequencing platforms.
It would also be understood that those of ordinary skill in the art are able to select suitable conditions, reagents, and instruments to carry out the sequencing.
Cells
The methods described herein can be used in any cell that is capable of repairing a DSB in genomic DNA and synthesizing a new DNA strand from a template. The two major DSB repair pathways in eukaryotic cells are homologous recombination (HR) and non-homologous end joining (NHEJ). The methods can be performed in cells capable of either repair pathway.
Examples
Example 1 Identification of preferred insertion sequences using a PEAC-seq technique
Off-target sites can occur anywhere in the genome, which makes comprehensive detection challenging without enrichment. Strategies such as GUIDE-seq, which apply a unique exogenous sequence to tag and enrich cleavage sites, have proven effective for identifying off-target sites in cellulo. However, the requirement for exogenous double-stranded oligodeoxynucleotides (dsODNs) limits their application in vivo. In the present disclosure, the template encoded at the 3’ end of the guide RNA was used to tag and enrich the edited genomic sites, avoiding dsODN transfection. It was contemplated that Cas9-MMLV would have higher insertion efficiency than Cas9n-MMLV, because double-strand breaks can help incorporate the reverse-transcribed templated sequence. Starting with a two-nucleotide insertion, the insertion efficiency of Cas9-MMLV was indeed found to be comparable to the indel efficiency of Cas9 and much higher than that of Cas9n-MMLV (Fig6A). To increase the length of the insertion sequence, the template sequences were then optimized to identify appropriate insertion sequences.
To systematically identify preferred insertion sequences, an unbiased screen was conducted to investigate the sequence features that might influence insertion efficiency (Fig1A). Guide RNAs targeting the widely used VEGFA site 3 (TS3) were designed with random sequences as the insertion template, and screens were performed on guide RNAs with RT templates of 10 or 20 random nucleotides. Because 20 bp should provide enough nucleotides for specific primer binding during PCR enrichment, longer templates were not tested further, although longer sequences can be used. The length distribution of insertions across the different libraries (Fig6B, 6C) was examined first. In the 10N library (a template comprising 10 random nucleotides), 4~6 bp insertions were the most abundant and accounted for more than half of the insertion reads. For the longer 20N template (a template comprising 20 random nucleotides), 1-bp insertions were the most abundant, and about 38.1% of reads represented insertions of around 15~22 bp. Next, the nucleotide composition of these insertions was examined. Interestingly, guanine was disfavored at nearly all positions, especially at the 5' and 3’ ends of the insertion sequence (Fig1B, Fig7), and this nucleotide preference was consistent across the different insertion lengths.
To further validate the nucleotide preferences, homopolymers composed of a single kind of nucleotide were compared directly. Ten consecutive nucleotides (polyG, polyC, and polyA) were used as RT templates in different guide RNAs. PolyT was not included because it terminates RNA polymerase III transcription from the U6 promoter. As expected, G insertion (from the polyC template) was significantly less efficient than C and T insertions (Fig1C), echoing the G depletion observed in the random-sequence screen. Together, these results demonstrate strong sequence preferences for the inserted nucleotides; when designing guide RNA templates, C nucleotides should preferably be avoided in the RT template.
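The length summaries above (for example, the share of 4~6 bp insertions in the 10N library) amount to a histogram over per-read insertion lengths. A minimal sketch, assuming insertion lengths have already been extracted from the aligned reads; the function names and toy lengths are hypothetical and are not data from the screen.

```python
from collections import Counter


def length_distribution(insert_lengths: list[int]) -> dict[int, float]:
    """Fraction of insertion reads at each insertion length."""
    counts = Counter(insert_lengths)
    total = len(insert_lengths)
    return {length: n / total for length, n in sorted(counts.items())}


def fraction_between(insert_lengths: list[int], lo: int, hi: int) -> float:
    """Fraction of insertion reads whose length lies in [lo, hi]."""
    return sum(lo <= x <= hi for x in insert_lengths) / len(insert_lengths)


toy_lengths = [1, 4, 5, 5, 6, 10]  # hypothetical per-read insertion lengths
print(length_distribution(toy_lengths))
print(fraction_between(toy_lengths, 4, 6))
```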
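Because templates are expressed from a U6 (Pol III) promoter, a candidate RT template can also be screened for T runs before cloning. The threshold of four consecutive T's used below is the commonly cited Pol III termination signal and is an assumption of this sketch, as is the helper name.

```python
import re


def poly_t_runs(template: str, min_run: int = 4) -> list[tuple[int, int]]:
    """Return (start, length) for every run of at least `min_run` consecutive T's."""
    pattern = f"T{{{min_run},}}"  # e.g. "T{4,}"
    return [(m.start(), len(m.group())) for m in re.finditer(pattern, template.upper())]


print(poly_t_runs("TTTTTTTTTT"))             # [(0, 10)]  a 10T template would terminate transcription
print(poly_t_runs("GTATGAGGTTGGTGGATTGGT"))  # []  SEQ ID NO: 113 contains only TT runs
```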
Preparation of insertion sequence library
A dinucleotide (CA) insertion test was conducted at the VEGFA TS3 site (SEQ ID NO: 127). HEK-293T cells were seeded in a 12-well plate and grown to ~80% confluency. Each well was transfected using Lipofectamine 3000 with 2.5 μg of plasmids: 1.875 μg of the pCMV Cas9 plasmid (SEQ ID NO: 11), the pCMV Cas9-MMLV plasmid (SEQ ID NO: 12), or the pCMV Cas9H840A-PE2 plasmid (SEQ ID NO: 13), plus 0.625 μg of the PE-GUIDE test2 gRNA plasmid (SEQ ID NO: 10). Genomic DNA was extracted 48 h post-transfection, and 1 μg of gDNA was used as the template to amplify the insertion regions with primers T3 sanger F (SEQ ID NO: 7) and T3 sanger R (SEQ ID NO: 8). The PCR products were gel-extracted and sent for Sanger sequencing, and the returned ab1 files were analyzed for gene editing results with Synthego ICE Analysis (https://ice.synthego.com/).
Oligos with random sequences of different lengths (10N (SEQ ID NO: 2) and 20N (SEQ ID NO: 1)) were synthesized; in the following steps, 10N and 20N were processed as two independent samples. Each oligo template was amplified with the opti-oligoF (SEQ ID NO: 3) and opti-oligoR (SEQ ID NO: 4) primers in four 50 μL reactions. The products of the four reactions were combined, size-selected on an agarose gel, and purified with 1.8x AMPure XP beads (Beckman #A63881). The purified PCR products were cloned into a backbone vector (SEQ ID NO: 9) via Golden Gate assembly (GGA). The GGA reaction was performed in a 50 μL volume with the following components: 50 fmol backbone, 150 fmol insert, 0.5 μL T4 DNA Ligase (Thermo #EL0014), 1 μL BsmBI (Thermo #ER0451), and 1x T4 buffer. The reaction was run as 90 cycles of 37℃ for 5 min and 22℃ for 5 min, followed by 65℃ for 30 min and 37℃ for 3 hours. The ligation products were purified with 0.8x AMPure XP beads and transformed into NEB Stable electroporation competent cells (NEB #C3040H) on an Eppendorf Eporator, following the manufacturer’s instructions. The transformed cells were grown on 24.5 × 24.5 cm plates at 30℃ for 20 hours. Colonies were collected, and plasmids were extracted using the QIAGEN Plasmid Plus Midi Kit (QIAGEN #12943) following the manufacturer’s instructions.
HEK-293T cells were seeded in T75 flasks and transfected at around 80% confluency. For each T75 flask, 80 μg of plasmids were transfected with Lipofectamine 3000 (Thermo #L3000075) following the manufacturer’s instructions. Cells were collected 48 h after transfection. The gDNA was extracted, and 1 μg of gDNA was used as the template in a 50 μL reaction to amplify the targeted sequence (VEGFA Site3). In the nested PCR, site-specific primers T3-ON-F (SEQ ID NO: 5) and T3-ON-R (SEQ ID NO: 6) were used in the 1st step; then 2.5 μL of the 1st-step product was used as the template in the 2nd PCR and amplified with the universal primers seq-F2 (SEQ ID NO: 128) and seq-R2 (SEQ ID NO: 129). The library was sequenced on the Illumina NovaSeq platform as paired-end 150 bp reads.
The polyN library screen was conducted under similar conditions. Plasmids expressing the polyG (10G) (SEQ ID NO: 14), polyC (10C) (SEQ ID NO: 15), and polyA (10A) (SEQ ID NO: 16) insertion gRNAs were synthesized and transfected into HEK293T cells as described above. PolyT (10T) was not included because it terminates Pol III transcription from the U6 promoter. Genomic DNA was extracted and used for amplification of the insertion region by nested PCR: site-specific primers were used in the 1st round, and 2.5 μL of the product was used as the template in the 2nd round. The amplicons were purified with AMPure XP beads using 0.5x+0.6x double size selection. The library was sequenced on the Illumina NovaSeq platform as paired-end 150 bp reads, and the sequencing data were processed following the CRISPResso2 manual.
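The GGA mix is specified in fmol (a 3:1 insert-to-backbone molar ratio), so the corresponding masses depend on fragment length. A worked sketch, assuming an average of 650 g/mol per base pair of dsDNA; the backbone and insert lengths below are hypothetical, since the lengths of SEQ ID NO: 9 and the amplified inserts are not restated here.

```python
def fmol_to_ng(fmol: float, length_bp: int, g_per_mol_per_bp: float = 650.0) -> float:
    """Mass in ng of a dsDNA amount given in fmol.

    ng = fmol x 1e-15 mol/fmol x (length_bp x 650 g/mol) x 1e9 ng/g
       = fmol x length_bp x 650 / 1e6
    """
    return fmol * length_bp * g_per_mol_per_bp / 1e6


backbone_ng = fmol_to_ng(50, 3000)  # hypothetical 3,000-bp backbone
insert_ng = fmol_to_ng(150, 200)    # hypothetical 200-bp insert
print(backbone_ng, insert_ng)       # 97.5 19.5
```

Even at three times the molar amount, a short insert requires far less mass than the backbone.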
Data analysis
The Amplicon-NGS data were processed with CRISPResso2 (Pinello et al., Nat Biotechnol 34, 2016) using the parameters “--max_paired_end_reads_overlap 140, --min_paired_end_reads_overlap 10, --exclude_bp_from_left 0, --exclude_bp_from_right 0, --plot_window_size 30, and --min_frequency_alleles_around_cut_to_plot 0.1”.
For the library insertion assay, to visualize the patterns of sequences integrated into the genome, sequencing reads mapped to the targeted regions were used to quantify the base composition of the integrated sequences in 20-nt windows with custom scripts.
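A sketch of how the listed parameters might be passed to the CRISPResso2 command-line tool from Python. The flag names and values come from the text above; the input flags (`--fastq_r1`, `--fastq_r2`, `--amplicon_seq`, `--guide_seq`) are standard CRISPResso2 options, but the file names and sequences are placeholders, and flag availability should be checked against the installed CRISPResso2 version. The command is only built here, not executed.

```python
import shlex

# Analysis parameters exactly as listed above.
PARAMS = [
    ("--max_paired_end_reads_overlap", "140"),
    ("--min_paired_end_reads_overlap", "10"),
    ("--exclude_bp_from_left", "0"),
    ("--exclude_bp_from_right", "0"),
    ("--plot_window_size", "30"),
    ("--min_frequency_alleles_around_cut_to_plot", "0.1"),
]


def build_crispresso_cmd(fastq_r1: str, fastq_r2: str, amplicon: str, guide: str) -> list[str]:
    """Build an argument list for the CRISPResso2 command-line tool (not executed here)."""
    cmd = ["CRISPResso", "--fastq_r1", fastq_r1, "--fastq_r2", fastq_r2,
           "--amplicon_seq", amplicon, "--guide_seq", guide]
    for flag, value in PARAMS:
        cmd += [flag, value]
    return cmd


cmd = build_crispresso_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                           "AMPLICON_SEQUENCE", "SPACER_SEQUENCE")
print(shlex.join(cmd))
# With CRISPResso2 installed, the list can be run via subprocess.run(cmd, check=True).
```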
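The custom scripts are not disclosed. One plausible reading of the 20-nt window analysis is a per-position base-composition profile over the first 20 positions of each integrated sequence, sketched below under that assumption. The function name and toy insertions are hypothetical.

```python
from collections import Counter


def base_composition_by_position(insertions: list[str], window: int = 20) -> list[dict[str, float]]:
    """Per-position A/C/G/T fractions over the first `window` positions of the insertions.

    Only insertions long enough to cover a position contribute to it.
    """
    profile = []
    for i in range(window):
        bases = Counter(s[i] for s in insertions if len(s) > i)
        total = sum(bases.values())
        profile.append({b: (bases[b] / total if total else 0.0) for b in "ACGT"})
    return profile


toy = ["ACT", "ATT", "CCA"]  # hypothetical insertion sequences read off the amplicons
profile = base_composition_by_position(toy, window=3)
print(round(profile[0]["A"], 3))  # 0.667
```

A G depletion like the one reported in Fig1B would show up as consistently low values under the "G" key across positions.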
Example 2 Identification of preferred insertion sequences using Cas9n-MMLV
This experiment investigated whether the sequence preferences of the insertion also apply to the native Cas9n-MMLV prime editor system. Liu and colleagues tested the insertion efficiencies of three tags of varied lengths, the His (6x) tag, Flag tag, and LoxP tag, and the His (6x) tag exhibited the highest insertion efficiency at some sites (~80%). Because it is also the shortest of the three tags, tag length was considered one contributor to insertion efficiency. Surprisingly, examination of the three tags' nucleotide composition showed that the His (6x) tag contains no guanine. To further examine the impact of sequence composition, both the wild-type (unmodified) and composition-modified forms of the His (6x), Flag, and LoxP tags were used as RT templates of guide RNAs, and their insertion efficiencies were quantified by Amplicon-NGS (Fig2A). The insertion efficiencies of the WT and modified His tags were first tested at three different genomic loci (Fig2B). Whereas His1-WT and His2 contain no G nucleotides, His3 was composed of 44.4% G, and its insertion efficiency dramatically decreased to 35.4%, 5.9%, and 28.3% of that of its WT form at the PRNP, RNF2, and EMX1 loci, respectively (Fig2B & Fig2D). In comparison, increasing the C content (His2) did not change the efficiencies significantly. To further validate the influence of G nucleotides, the inventors modified the LoxP and Flag tags and compared the efficiencies of their WT and modified forms at the HEK3 locus (Fig2C). The efficiencies of the His and LoxP tags echoed the patterns at the previous sites. For the Flag tag, the inventors tuned the G content by slightly decreasing it (Flag2), increasing it (Flag3), and completely depleting it (Flag4).
Interestingly, completely removing G nucleotides in Flag4 dramatically boosted the insertion efficiency to 2-fold that of its WT form, again indicating that a C-depleted RT template in the guide RNA tends to be highly efficient. The insertion efficiency of Flag3 at the HEK3 site also increased, implying that other, as yet unidentified factors might also contribute to the insertion outcome. Together, these results confirm that G-depleted insertion sequences are preferred at different genomic locations and in different sequence contexts, a feature that can be used to design insertion sequences for enriching off-target sites at unknown genomic loci. Notably, this conclusion can be applied to other PE-mediated genome insertions and is not limited to off-target detection applications.
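"G-depleted insertion" and "C-depleted RT template" describe the same design, because the inserted strand is copied from the template. Every G in the insertion therefore corresponds to a C in the template. The sketch below assumes the template, written in the DNA alphabet, is the reverse complement of the desired insertion strand; the His(6x)-style coding sequence is hypothetical and is not one of SEQ ID NO: 117-119.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def fraction(seq: str, base: str) -> float:
    """Fraction of positions in `seq` occupied by `base`."""
    return seq.upper().count(base) / len(seq)


def template_for_insertion(insertion: str) -> str:
    """RT template (DNA alphabet) that would encode the desired insertion strand."""
    return revcomp(insertion)


insertion = "CATCACCATCACCATCAC"  # hypothetical His(6x)-style coding sequence, no G
template = template_for_insertion(insertion)
print(fraction(insertion, "G"), fraction(template, "C"))  # 0.0 0.0
```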
Tag insertion NGS library preparation
Oligos were designed to insert both the wild-type and modified forms of the His (6x) tag (SEQ ID NO: 117-119), Flag tag (SEQ ID NO: 120-122), and LoxP tag (SEQ ID NO: 123-126) at four genomic loci (HEK3, PRNP, RNF2, and EMX1). Sequences were modified to achieve different base contents, and the gRNA expression vectors (Table 1) were manufactured by General Biol. All plasmids were prepared with a QIAprep Spin Miniprep Kit (QIAGEN #27106). HEK-293T cells were seeded in a 12-well plate and grown to ~80% confluency. Each well was transfected with 5.3 μg of plasmids (4 μg tag plasmids and 1.3 μg nicking gRNA plasmids) using Lipofectamine 3000. Cells were collected 48 hours post-transfection, and a cell sorter (SONY MA900) was used to sort about 100,000 double-positive cells (mKate2 positive, representing the tag plasmids, and GFP positive, representing the nicking gRNA plasmids). For each sample, gDNA was extracted, and 1 μg of gDNA was used as the template to amplify the insertion region by nested PCR. Site-specific primers (Table 1) were used in the 1st round of PCR, and 2.5 μL of the product was used as the template in the 2nd round. The 2nd-round primers were chosen from a set of P5 index primers (SEQ ID NO: 48-55) and a set of P7 index primers (SEQ ID NO: 56-67): one P5 index primer and one P7 index primer were used in each 2nd-round PCR, and any P5 index primer could be paired with any P7 index primer. The amplicons were purified with AMPure XP beads using 0.5x+0.6x double size selection. The library was sequenced on the Illumina NovaSeq platform as paired-end 150 bp reads.
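Because any P5 index primer can be paired with any P7 index primer, the 8 P5 primers (SEQ ID NO: 48-55) and 12 P7 primers (SEQ ID NO: 56-67) give 8 × 12 = 96 distinct combinations. A minimal sketch of assigning one unique pair per sample; the labels and sample names are hypothetical.

```python
from itertools import product

P5_INDEXES = [f"SEQ{n}" for n in range(48, 56)]  # P5 index primers, SEQ ID NO: 48-55 (8)
P7_INDEXES = [f"SEQ{n}" for n in range(56, 68)]  # P7 index primers, SEQ ID NO: 56-67 (12)


def assign_index_pairs(samples: list[str]) -> dict[str, tuple[str, str]]:
    """Give each sample a unique (P5, P7) combination; 8 x 12 = 96 combinations available."""
    pairs = list(product(P5_INDEXES, P7_INDEXES))
    if len(samples) > len(pairs):
        raise ValueError(f"only {len(pairs)} unique index pairs available")
    return dict(zip(samples, pairs))


layout = assign_index_pairs([f"sample_{i}" for i in range(1, 25)])  # hypothetical 24 samples
print(len(list(product(P5_INDEXES, P7_INDEXES))))  # 96
print(layout["sample_1"])                          # ('SEQ48', 'SEQ56')
```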
Example 3 Unbiased detection of Cas9 off-target sites by a PEAC-seq technique
A 21-nt cytosine-depleted sequence was designed, based on the findings in Example 1 and Example 2, as the insertion sequence RT template at the 3’ end of the guide RNAs. It was reasoned that the C-depleted template would give high insertion efficiency and that 21 nt would provide enough length for primer binding, both of which would enhance enrichment of the editing sites. To efficiently enrich on-target and off-target sites at unknown locations across the genome, a priming strategy similar to that of GUIDE-seq was adopted, but Tn5 tagmentation was used instead of sonication for a streamlined workflow and a low DNA input requirement (Fig3A). UMI-containing adaptors were embedded in Tn5 to eliminate PCR duplicates from the sequencing data.
During library preparation, one of the biggest challenges was to effectively enrich insertion sequences whose length might vary. The insertion is reverse transcribed and extended along the RT template, so both partial and full-length reverse-transcribed products may exist. Primers therefore need to be carefully designed to enrich sites with different insertion lengths. To address this challenge, different primer sets were designed and their performance and consistency were evaluated: three forward primers and two reverse primers with different extension starting points on the insertion were included (Fig3B). Notably, all the primers were designed at least 2 bp away from the insertion boundary so that the extension sequence information could be used to filter out random-priming reads (Fig3B). PCR enrichment was conducted with all of these primers, and their priming efficiency and specificity were evaluated.
The performance of PEAC-seq was evaluated at six sites (VEGFA TS1 (SEQ ID NO: 130), VEGFA TS2 (SEQ ID NO: 131), VEGFA TS3 (SEQ ID NO: 127), EMX1 (SEQ ID NO: 132), RNF2 (SEQ ID NO: 133), and FANCF (SEQ ID NO: 134)) that have been tested in multiple studies (Kim et al., Nat Methods 12, 2015; Kim et al., Genome Res 28, 2018; Tsai et al., Nat Methods 14, 2017; Cameron et al., Nat Methods 14, 2017; Tsai et al., Nat Biotechnol 33, 2015). A modified GUIDE-seq analysis pipeline was used to rank and filter the called cleavage sites, and the lists generated with the different primer sets were compared. Because the GUIDE-seq pipeline uses priming information from both the top and bottom strands of the insertion, six candidate lists were generated from the combinations of the three forward primers and two reverse primers (Fig. 8). The F1/R2 list was chosen for the following analysis in this Example because it showed the best consistency across all six lists (Fig 17A-F, Fig9-14). Other primer sets can also be used for the method disclosed herein.
At VEGFA site 1 (TS1), PEAC-seq identified 16 cleavage sites, including the on-target site, 14 of which were also reported by GUIDE-seq (Fig3C-E, Fig8). Notably, the top candidate sites were highly consistent between PEAC-seq and GUIDE-seq (Fig3D), and the numbers of NGS reads, which quantitatively represent editing frequency, were highly correlated (Fig3F). Although the PEAC-seq-unique off-target sites (PEAC15 and PEAC16) were represented by fewer reads, their signal tracks were very similar to those of the on-target and shared off-target sites (Fig3E). Consistent with previous reports, the number of mismatches also contributed to off-target editing: off-target sites reported by both PEAC-seq and GUIDE-seq contained fewer mismatches than off-target sites unique to one method (Fig3G). The inventors also examined whether the position of mismatches along the gRNA sequence might affect off-target identification, especially in the PBS (primer binding site), which is crucial for reverse transcription. To do so, the inventors plotted the mutation frequency along the gRNA and PAM sequences for sites grouped as “Overlap”, “PEAC-seq unique”, and “GUIDE-seq unique” (Fig3H). The patterns across the groups were quite consistent for TS2 (81 sites) and TS3 (35 sites) but fluctuated somewhat for TS1 (24 sites). Although TS1 had a smaller sample size than the other two, this result indicated that the sensitivity of PEAC-seq might be affected by mismatches in the PBS region. Nevertheless, off-target identification with the TS3 gRNA seemed more tolerant of PBS mutations, indicating that the extent of this influence might be site-specific.
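The disclosure does not state how consistency among the six candidate lists was scored. One simple option, assumed here purely for illustration, is the mean pairwise Jaccard similarity of each list's site set to every other list. The site identifiers are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two site sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_consistency(site_lists: dict[str, set]) -> dict[str, float]:
    """Mean Jaccard similarity of each candidate list to every other list."""
    scores = {}
    for name, sites in site_lists.items():
        others = [jaccard(sites, other) for o, other in site_lists.items() if o != name]
        scores[name] = sum(others) / len(others)
    return scores


toy_lists = {  # hypothetical site identifiers
    "F1/R2": {"s1", "s2", "s3"},
    "F2/R2": {"s1", "s2"},
    "F3/R1": {"s1", "s4"},
}
scores = mean_consistency(toy_lists)
print(max(scores, key=scores.get))  # F2/R2
```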
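The read-count agreement between the two methods (Fig3F) can be expressed as a correlation over the sites both methods report. The measure and transform used in the figure are not stated; Pearson correlation on log10(reads + 1) is an assumption of this sketch, and the counts are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shared_site_correlation(reads_a: dict[str, int], reads_b: dict[str, int]) -> float:
    """Correlate log10(reads + 1) over sites reported by both methods."""
    shared = sorted(reads_a.keys() & reads_b.keys())
    x = [math.log10(reads_a[s] + 1) for s in shared]
    y = [math.log10(reads_b[s] + 1) for s in shared]
    return pearson(x, y)


peac = {"on": 5000, "ot1": 800, "ot2": 120, "peac15": 3}  # hypothetical read counts
guide = {"on": 9000, "ot1": 1500, "ot2": 200, "g1": 10}
print(round(shared_site_correlation(peac, guide), 3))
```

Method-unique sites (here "peac15" and "g1") are excluded, because a correlation needs a value from both methods at each site.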
PEAC-seq cell line data generation and NGS library preparation
The Cas9 nickase was replaced by wild-type Cas9, and GTATGAGGTTGGTGGATTGGT (SEQ ID NO: 113) was used as the RT template of the gRNA; both were assembled into a single vector as the PEAC-seq backbone. Spacer sequences targeting VEGFA TS1, VEGFA TS2, VEGFA TS3, EMX1, RNF2, and FANCF were individually cloned into the PEAC-seq backbone, yielding the plasmids of SEQ ID NO: 68-73, respectively. To conduct PEAC-seq in living cells, HEK-293T cells were seeded in a 12-well plate and grown to ~80% confluency. Each well was transfected with 3 μg of plasmids using Lipofectamine 3000. Cells were collected 48 hours post-transfection, and a cell sorter (SONY MA900) was used to sort about 100,000 GFP-positive cells. About 500 ng of extracted gDNA was digested with NotI and then cleaned up with 0.5x AMPure XP beads to remove carryover plasmids. The gDNA fragments were retained on the AMPure XP beads, on-bead Tn5 tagmentation was performed at 55℃ for one hour, and adaptors were inserted at the ends of the fragments. The Tn5 was expressed and embedded with the adaptors in-house; Tn5 primeA (SEQ ID NO: 74) and PegTn5 primeB (8N) (SEQ ID NO: 75) were used for Tn5 assembly. At the end of tagmentation, 6 μL of 0.2% SDS was added to terminate the reaction. The products were purified and size-selected with 1.5x AMPure XP beads and eluted in 50 μL H2O.
The 21-bp insertion sequence was used to enrich the editing sites (both on-target and off-target) during NGS library preparation. In the 1st round of PCR, two separate reactions were performed, each using 20 μL of template in a total volume of 50 μL for ~30 cycles. One reaction used the PEAC-seq insertion sequence as the forward primer binding site and the downstream Tn5 adaptor as the reverse primer binding site; the other used the upstream Tn5 adaptor as the forward primer binding site and the PEAC-seq insertion sequence as the reverse primer binding site. Then 2.5 μL of the 1st-round product was used as the template for 2nd-round amplification in a total volume of 50 μL for 17 cycles, which added the Illumina adaptors. The amplicons were purified with AMPure XP beads using 0.6x+0.25x double size selection. The library was sequenced on the Illumina NovaSeq platform as paired-end 150 bp reads.
Primers
The Off target universal primer (SEQ ID NO: 76) binds the downstream Tn5 adaptor.
The performance of PEAC-seq was evaluated at six sites (VEGFA TS1, VEGFA TS2, VEGFA TS3, EMX1, RNF2, and FANCF). For each site, five primer sets were used for library preparation: PEAC 15F (F1) and Off target universal primer; PEAC 18R (R2) and Off target universal primer; PEAC 18R (R1) and Off target universal primer; PEAC 5F (F3) and Off target universal primer; and PEAC 10F (F2) and Off target universal primer.
PEAC 15F (F1), PEAC 18R (R2), and PEAC 18R (R1) (SEQ ID NO: 77-79) are fully contained in the PEAC 21-bp insertion sequence and are shared across the different sites. 15F denotes a forward primer matching the first 15 bp of the PEAC 21-bp insertion sequence; 18R denotes a reverse primer matching the last 18 bp; and 19R denotes a reverse primer matching the last 19 bp.
PEAC 5F (F3) and PEAC 10F (F2) are partially contained in the PEAC 21-bp insertion sequence (5F: 5 bp contained; 10F: 10 bp contained), while the remaining part of each primer matches the specific edited site. Each of the six sites (VEGFA TS1, VEGFA TS2, VEGFA TS3, EMX1, RNF2, and FANCF) therefore has its own PEAC 5F (F3) and PEAC 10F (F2) primers, listed as SEQ ID NO: 80-91.
Data analysis
The PEAC-seq data were analyzed using a pipeline modified from GUIDE-seq. First, adapters were trimmed using cutadapt (Martin M., EMBnet 17, 2011), and reads without an appropriate adapter were removed. The reads were then mapped to the human or mouse genome (hg38, mm10) using bwa (Li H., Bioinformatics 28, 2012). Reads that mapped to the same location and shared the same UMI were considered PCR duplicates and merged for subsequent analysis. To fit the target identification pipeline from GUIDE-seq, the read names in the BAM files were modified, and the BAM files from the forward and reverse PCRs were labeled and merged. The pipeline was further modified to remove reads originating from random priming. In summary, read counts from the GUIDE-seq output file were normalized to reads per million, and the number of reads with correct primer extension was calculated. Candidate sites had to meet the following criteria: (1) no signal in the wild-type control sample; (2) at least one read with the correct primer extension sequence in at least one direction, and a geometric mean of the primer extension reads > 0; and (3) correct read strand information both upstream and downstream of the putative gRNA cutting site.
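The deduplication, normalization, and filtering steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the modified GUIDE-seq pipeline itself: reads are represented as hypothetical `(chrom, pos, strand, umi)` tuples, and each candidate site is reduced to a few counts and flags in an illustrative `Candidate` record. Criterion (2) is coded literally; read that way, a geometric mean of the forward and reverse primer-extension counts is > 0 only when both counts are nonzero, which makes the "at least one direction" clause redundant. Whether the original pipeline applies a pseudocount there is not stated.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt


def dedup_umi(reads):
    """Collapse PCR duplicates: reads with the same mapping position and UMI.

    `reads` is an iterable of (chrom, pos, strand, umi) tuples (an assumed,
    simplified record). Returns unique-molecule counts per (chrom, pos, strand).
    """
    unique = set(reads)
    return Counter((chrom, pos, strand) for chrom, pos, strand, _umi in unique)


def to_rpm(counts):
    """Normalize raw counts to reads per million (RPM)."""
    total = sum(counts.values())
    return {key: n * 1_000_000 / total for key, n in counts.items()} if total else {}


@dataclass
class Candidate:
    wt_reads: int         # reads at this site in the wild-type control
    ext_forward: int      # correct primer-extension reads, forward PCR
    ext_reverse: int      # correct primer-extension reads, reverse PCR
    strand_ok_up: bool    # correct strand info upstream of the putative cut
    strand_ok_down: bool  # correct strand info downstream of the putative cut


def passes_filters(c: Candidate) -> bool:
    """Criteria (1)-(3) as worded in the text, read literally."""
    no_wt_signal = c.wt_reads == 0                               # (1)
    one_direction = max(c.ext_forward, c.ext_reverse) >= 1       # (2a)
    geo_mean_positive = sqrt(c.ext_forward * c.ext_reverse) > 0  # (2b)
    strand_ok = c.strand_ok_up and c.strand_ok_down              # (3)
    return no_wt_signal and one_direction and geo_mean_positive and strand_ok


reads = [
    ("chr1", 100, "+", "ACGT"),
    ("chr1", 100, "+", "ACGT"),  # PCR duplicate of the read above
    ("chr1", 100, "+", "TTAG"),
    ("chr2", 500, "-", "ACGT"),
]
counts = dedup_umi(reads)
print(counts)
print(to_rpm(counts))
print(passes_filters(Candidate(0, 4, 2, True, True)))
print(passes_filters(Candidate(0, 4, 0, True, True)))
```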
Example 4 Detection of DNA translocation at Cas9 Cleavage Sites by PEAC-seq
According to the PEAC-seq design, the forward primer ("F-primer" hereafter; F1, F2, or F3 in Fig3B) amplifies regions downstream, but not upstream, of the double-strand break (DSB) (Fig3A & Fig3D). Unexpectedly, in some cases upstream signals were observed in the F-primer amplicons (Fig4A, Fig8). Further analysis of these sites suggested that these signals might have come from the joining of DSB ends from different cleavage sites. As shown in the proposed models (Fig4B), each cleavage site (Receiver Site or Donor Site) can generate three types of DSB ends: the original upstream end, the original downstream end, and an upstream end appended with a complete or partial tag insertion. If multiple DSBs occur simultaneously in the nucleus and are physically proximal to each other, DSB ends from different breakpoints might join together and cause genome rearrangements. In one proposed model, the upstream end of a Donor Site would bring a reversely oriented PEAC-seq insertion to the upstream end of a Receiver Site (Fig4B, model (v)); this joining would generate an amplicon from the upstream end of the Receiver Site, which would not normally be amplified by the F-primer.
These rearrangements, which are not detectable by other CRISPR off-target identification techniques, could cause severe chromosomal aberrations, including large fragment deletions, inversions, and translocations. Theoretically, the directional PEAC-seq insertion and the resulting PCR amplicons could serve as indicators of chromosome rearrangements, because they can distinguish whether an amplicon came from the expected DSB ends. To validate this, primers located upstream of the F-primer were designed (SEQ ID NO: 92-95), which could pair with downstream Tn5 primers to identify the sequences of the Donor sites (Fig4C). Notably, successful amplification bridging the Donor and Receiver sites does not require the presence of the PEAC-seq insertion, which allowed us to comprehensively estimate the various rearrangement patterns between the Donor and Receiver sites. We conducted validation experiments by anchored multiplex PCR (AMP) at two sites (Fig16) and identified large-scale chromosome rearrangements between the Receiver sites and the PEAC-seq off-target sites. Among these rearrangements, we identified three types of translocations that matched the proposed models (Fig4D). Both the upstream end (Fig4D (iii)) and the downstream end (Fig4D (iv)) of the Donor sites could join with the upstream end of the Receiver site, either with or without the PEAC-seq insertion (Fig4D (iii), (iv) & (v)). Interestingly, the frequency of DNA translocation varied across sites (Fig4E), and translocations did not necessarily occur between DSB ends with high indel frequencies. For example, among the PEAC-seq sites of VEGFA TS3, the on-target site (chr6: 43769716-43769739) showed no detectable translocation in our data, whereas at an off-target site (chr22: 37266776-37266799), 34.7% of reads came from DNA translocations.
These results indicate that PEAC-seq can successfully identify chromosome translocations and help evaluate CRISPR off-target effects more comprehensively, which could further enable the safety evaluation of CRISPR applications.
NGS library preparation
To identify the translocated sequences, two nested PCR primers upstream of the guide RNA target site were designed. The site-specific nested PCR primers served as forward primers, and the downstream Tn5 primer served as the reverse primer. The nested primers were used sequentially to amplify the sequences adjacent to translocated DSBs. About 300 ng of PEAC-seq gDNA was fragmented by Tn5, purified with 1.5x AMPure XP beads, and eluted with 23 μL H2O. About 20 μL of the purified DNA was used as the template for the 1st round of PCR (20 cycles), and 2.5 μL of the 1st-round product was used as the template for another 20 cycles in the 2nd round of the nested PCR. A further 20-cycle PCR was conducted to add the sequencing adaptors. The amplicons were purified by double size selection with 0.6x and then 0.25x beads. The library was sequenced on the Illumina NovaSeq platform as 150 bp paired-end reads.
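The bead volumes for the 0.6x then 0.25x double size selection can be worked out as below. This sketch assumes the common convention in which both ratios are expressed relative to the original sample volume (so the second step runs at an effective 0.85x), and it uses a hypothetical 50 μL reaction; the text states neither detail for this step.

```python
def double_size_selection(sample_ul: float, first_ratio: float = 0.6,
                          second_ratio: float = 0.25):
    """Bead volumes for a two-step (double-sided) bead size selection.

    ASSUMPTION: both ratios are relative to the original sample volume, so
    the effective ratio in step 2 is first_ratio + second_ratio.
      Step 1: add first_ratio x beads; long fragments bind and are discarded
              with the beads, and the supernatant is kept.
      Step 2: add second_ratio x beads to that supernatant; mid-sized
              fragments now bind and are eluted, short ones stay unbound.
    Returns (step-1 bead volume, step-2 bead volume, effective ratio).
    """
    if sample_ul <= 0 or first_ratio <= 0 or second_ratio <= 0:
        raise ValueError("volumes and ratios must be positive")
    step1_ul = sample_ul * first_ratio
    step2_ul = sample_ul * second_ratio
    return step1_ul, step2_ul, first_ratio + second_ratio


# Hypothetical 50 uL PCR reaction.
step1, step2, effective = double_size_selection(50)
print(f"step 1: {step1:.1f} uL beads, step 2: {step2:.1f} uL beads, "
      f"effective ratio {effective:.2f}x")
```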
Data analysis
In the DNA translocation analysis, the number and orientation of reads were summarized from the forward PCR libraries around the on-target and candidate off-target sites. A DNA translocation score was calculated as “translocation reads number” / (“normal reads number” + “translocation reads number” + pseudocount), using a pseudocount of 10 in the denominator.
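The translocation score is a simple ratio with a pseudocount of 10 in the denominator, which shrinks scores at sites with few reads. A minimal sketch follows; the read counts are hypothetical, and `score_from_labels` is an illustrative helper that assumes each read has already been classified as normal or translocated from its orientation, a step not shown here.

```python
from collections import Counter


def translocation_score(translocation_reads: int, normal_reads: int,
                        pseudocount: float = 10) -> float:
    """translocation / (normal + translocation + pseudocount)."""
    if translocation_reads < 0 or normal_reads < 0:
        raise ValueError("read counts must be non-negative")
    return translocation_reads / (normal_reads + translocation_reads + pseudocount)


def score_from_labels(orientations):
    """Score a site from per-read labels ('normal' or 'translocation').

    The labels are assumed to have been assigned upstream from read
    orientation around the cut site; that classification is not shown.
    """
    counts = Counter(orientations)
    return translocation_score(counts["translocation"], counts["normal"])


print(translocation_score(5, 5))    # 5 / (5 + 5 + 10)
print(translocation_score(90, 0))   # 90 / (0 + 90 + 10)
print(score_from_labels(["normal"] * 30 + ["translocation"] * 10))
```

With the pseudocount, a site with 5 translocation and 5 normal reads scores 0.25 rather than 0.5, so low-coverage sites are less likely to be over-called.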
Example 5 Detection of off-target Cas9 cleavage sites in vivo
PEAC-seq uses information templated on the gRNA to insert tag sequences, rather than relying on exogenous tags. This straightforward procedure allowed us to investigate its application in vivo. To evaluate in vivo performance, mouse embryos were edited at the pronuclear stage by injecting in vitro transcribed Cas9-MMLV mRNA and guide RNAs targeting Pcsk9 and Pnpla3. Embryos were collected between E14.5 and E21, and off-target lists were generated by PEAC-seq (Fig5, Fig15). One Pcsk9 on-target site and one off-target site were identified from Embryo #5 and Embryo #12, both of which had previously been reported by DISCOVER-seq (Wienert B., Science 364, 2019) (Fig5B-D, Fig18A). Using the same strategy, PEAC-seq was also applied to Pnpla3, another in vivo CRISPR therapy target. Three cleavage sites, including the on-target site, were identified by PEAC-seq in each embryo (Fig15, Fig18B). The small number of cleavage sites called by PEAC-seq might be related to the short editing window of mRNA injection into embryos. In addition, because the cells examined all derived from the clonal expansion of a small number of cells during embryogenesis, the number of off-target editing events is limited, so the editing spectrum may not be fully represented. Although other systems remain to be tested, these data serve as a proof of concept that PEAC-seq could be used directly for in vivo off-target identification.
RNA preparation
Both the guide RNAs and the Cas9-MMLV mRNA were prepared by in vitro transcription. The DNA templates of the guide RNAs were amplified from the plasmids “pcsk9-gRNA” (SEQ ID NO: 96) and “mPnpla-gRNA” (SEQ ID NO: 97) using T7 forward primers (SEQ ID NO: 99 for Pcsk9 and SEQ ID NO: 101 for Pnpla3) and T7 reverse primers (SEQ ID NO: 100 for Pcsk9 and SEQ ID NO: 102 for Pnpla3). The PCR products were gel-purified using the MinElute Gel Extraction Kit (QIAGEN #28606) and used as templates for in vitro transcription with the HiScribe T7 Quick High Yield RNA Synthesis Kit (NEB #E2050S). The pCMV-Cas9-MMLV plasmid (SEQ ID NO: 98) was linearized with MssI (Thermo #FD1344). Following the manufacturer’s instructions, 1 μg of the linearized product was used as the template to generate Cas9-MMLV mRNA by in vitro transcription with the HiScribe T7 ARCA mRNA Kit (NEB #E2060S).
Superovulation and embryo collection
C57BL/6 and ICR mice were purchased and housed in the Laboratory Animal Resource Center (LARC) at Westlake University. The LARC is a certified pathogen-free and environment-controlled facility (21±2℃, 55±15% humidity, and a 12:12-h light:dark cycle). The C57BL/6 mice were used for embryo collection, and ICR females were used as recipients. All animal experiments were conducted under a protocol approved by the animal care and ethics committee of Westlake University.
Six-week-old C57BL/6 female mice were superovulated by injection of 5 IU of PMSG (Pregnant Mare Serum Gonadotropin; ProSpec #HOR-272), followed 48 hours later by 5 IU of hCG (human chorionic gonadotropin; ProSpec #HOR-250). The C57BL/6 females were then mated with 8-week-old C57BL/6 males. After 16 hours, fertilized embryos were collected and placed in EmbryoMax M2 Medium with Hyaluronidase (Millipore #MR-051-F). After the cumulus cells had detached, the embryos were transferred into a dish containing 2 mL of fresh M2 medium (Millipore #MR-015-D) and flushed several times to rinse off the hyaluronidase and cumulus cells. The embryos were then transferred into a dish of prewarmed KSOM medium (Millipore #MR-106-D) covered with mineral oil and washed three more times.
Zygote injection, embryo culturing, and embryo transplantation
A mixture of Cas9-PE2 mRNA (100 ng/μL) and guide RNA (50 ng/μL) was injected into the cytoplasm of zygotes in M2 medium using a microinjector (NARISHIGE #IM-400B) with constant-flow settings. The injected embryos were cultured in KSOM medium with amino acids in a cell culture incubator at 37℃ with 5% CO2, and then transplanted into the oviducts of pseudopregnant ICR females at 0.5 dpc. Pups were sacrificed at E14.5-E21, and organs were collected, dissected, and snap-frozen in liquid nitrogen. Samples were stored at -80℃ until further analysis.
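Preparing the injection mix at the stated final concentrations is a C1V1 = C2V2 calculation. In the sketch below, the stock concentrations (1000 ng/μL mRNA, 500 ng/μL guide RNA), the 20 μL final volume, and the use of nuclease-free water as the diluent are hypothetical; only the target concentrations (100 ng/μL and 50 ng/μL) come from the text.

```python
def injection_mix(final_ul, components):
    """Volumes for an injection mix by C1 * V1 = C2 * V2.

    `components` maps a name to (stock_ng_per_ul, target_ng_per_ul).
    The remainder is made up with diluent (assumed nuclease-free water).
    """
    volumes = {}
    for name, (stock, target) in components.items():
        if target > stock:
            raise ValueError(f"{name}: stock is too dilute for the target")
        volumes[name] = final_ul * target / stock
    water = final_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute to fit in the final volume")
    volumes["nuclease-free water"] = water
    return volumes


# HYPOTHETICAL stock concentrations; targets match the text.
mix = injection_mix(20, {
    "Cas9-PE2 mRNA": (1000, 100),
    "guide RNA": (500, 50),
})
for name, ul in mix.items():
    print(f"{name}: {ul:.1f} uL")
```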
NGS library preparation
Genomic DNA from organs was extracted using the TIANamp Genomic DNA Kit (TIANGEN #DP304-03) according to the manufacturer’s instructions. PCR was used to amplify the target regions and attach Illumina adaptors to the amplicons; the primers used are SEQ ID NO: 103-112. The in vivo PEAC-seq library was constructed by Tn5 fragmentation as described in Example 3.
Data access for Examples 1-5
GEO accessions: GSE179523, GSE179436, GSE179374
REFERENCE LIST
1. Kim, D. et al. Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells. Nat Methods 12, 237-243 (2015).
2. Kim, D. & Kim, J.S. DIG-seq: a genome-wide CRISPR off-target profiling method using chromatin DNA. Genome Res 28, 1894-1900 (2018).
3. Tsai, S.Q. et al. CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets. Nat Methods 14, 607-614 (2017).
4. Cameron, P. et al. Site-seq: Mapping the genomic landscape of CRISPR-Cas9 cleavage. Nat Methods 14, 600-606 (2017).
5. Tsai, S.Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol 33, 187-197 (2015).
6. Yan, W.X. et al. BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks. Nat Commun 8, 15058 (2017).
7. Newby, G.A. et al. Base editing of haematopoietic stem cells rescues sickle cell disease in mice. Nature (2021).
8. Musunuru, K. et al. In vivo CRISPR base editing of PCSK9 durably lowers cholesterol in primates. Nature 593, 429-434 (2021).
9. Wienert, B. et al. Unbiased detection of CRISPR off-targets in vivo using DISCOVER-Seq. Science 364, 286-289 (2019).
10. Zuccaro, M.V. et al. Allele-Specific Chromosome Removal after Cas9 Cleavage in Human Embryos. Cell 183, 1650-1664 e1615 (2020).
11. Liang, G. et al. Frequent gene conversion in human embryos induced by double strand breaks. bioRxiv (2020).
12. Alanis-Lobato, G. et al. Frequent loss of heterozygosity in CRISPR-Cas9-edited early human embryos. Proc Natl Acad Sci U S A 118 (2021).
13. Anzalone, A.V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149-157 (2019).
14. Anderson, K.R. et al. CRISPR off-target analysis in genetically engineered rats and mice. Nat Methods 15, 512-514 (2018).
15. Kim, H.K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat Biotechnol 39, 198-206 (2021).
16. Pinello, L. et al. Analyzing CRISPR genome-editing experiments with CRISPResso. Nat Biotechnol 34, 695-697 (2016).
17. Martin, M. Cutadapt Removes Adapter Sequences From High-Throughput Sequencing Reads. EMBnet 17 (2011).
18. Li, H. Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Bioinformatics 28, 1838-1844 (2012).
19. Kleinstiver, B.P. et al. High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490-495 (2016).
20. Slaymaker, I.M. et al. Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84-88 (2016).
21. Adey, A. et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol 11, R119 (2010).
22. Reznikoff, W.S. Tn5 as a model for understanding DNA transposition. Mol Microbiol 47, 1199-1206 (2003).
23. Reznikoff, W.S. Transposon Tn5. Annu Rev Genet 42, 269-286 (2008).
24. Schaller, H. The intergenic region and the origins for filamentous phage DNA replication. Cold Spring Harb Symp Quant Biol 43, 401-408 (1979).
25. Voelkerding, K.V., Dames, S.A. & Durtschi, J.D. Next-Generation Sequencing: From Basic Research to Diagnostics. Clin Chem 55, 641-658 (2009).
26. MacLean et al. Application of ‘next-generation’ sequencing technologies to microbial genetics. Nat Rev Microbiol 7, 287-296 (2009).
27. Scott et al. Chromosomal integration of the green fluorescent protein gene in lactic acid bacteria and the survival of marked strains in human gut simulations. FEMS Microbiol Lett 182, 23-27 (2000).
Claims (41)
- A guide RNA comprising a spacer, a scaffold, and an insertion sequence reverse transcriptase (RT) template, wherein the insertion sequence RT template comprises a nucleotide sequence wherein cytosine represents less than 25%of the nucleotides in the nucleotide sequence.
- The guide RNA of claim 1, wherein the insertion sequence RT template is located at the 3’ end of the guide RNA.
- The guide RNA of claim 1, wherein the insertion sequence RT template comprises a nucleotide sequence that is SEQ ID NO: 113.
- The guide RNA of claim 1, wherein the insertion sequence RT template is about 10 to 30 nucleotides.
- The guide RNA of claim 1, further comprising a primer binding site (PBS) that is capable of binding the non-complementary strand of a target gene.
- The guide RNA of claim 1, wherein the insertion sequence RT template encodes one or more tags suitable for hybrid capture.
- A complex comprising a Cas nuclease, a reverse transcriptase, and a guide RNA of any of claims 1-6.
- The complex of claim 7, wherein the Cas nuclease is selected from Cas9, its variants, and mutants of any of the variants.
- The complex of claim 7, wherein the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, variants thereof and mutants of any of the variants.
- The complex of claim 7, wherein the Cas nuclease and the reverse transcriptase are formed as a fusion protein.
- The complex of claim 10, wherein the fusion protein is encoded by a sequence of SEQ ID NO: 116.
- A vector comprising a guide RNA of any of claims 1-6, a nucleotide sequence encoding a Cas nuclease, and a nucleotide sequence encoding a reverse transcriptase.
- The vector in claim 12, wherein the Cas nuclease is selected from Cas9, its variants and mutants of any of the variants.
- The vector in claim 12, wherein the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, variants thereof and mutants of any of the variants.
- A kit for detecting Cas nuclease cleavage sites and DNA translocation in genomic DNA, comprising one or more polynucleotide sequences encoding a Cas nuclease, a reverse transcriptase, and the guide RNA in any of the claims 1-6.
- The kit of claim 15, wherein the Cas nuclease is selected from Cas9, its variants and mutants of any of the variants.
- The kit of claim 15, wherein the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, variants thereof and mutants of any of the variants.
- The kit in claim 15, wherein the Cas nuclease and the reverse transcriptase are encoded as a fusion protein.
- The kit of claim 18, wherein the fusion protein is encoded by a sequence of SEQ ID NO: 116.
- A method for labeling Cas9 nuclease cleavage sites in genomic DNA, comprising: a. providing a guide RNA, wherein the guide RNA can bind to a target gene in the genomic DNA and comprises an insertion sequence RT template; b. providing a complex comprising a Cas nuclease, a reverse transcriptase, and the guide RNA; and c. contacting the genomic DNA with the complex to obtain a labeled genomic DNA, wherein the genomic DNA is cleaved at one or more cleavage sites, and one or more insertion sequences that are reverse transcribed from the insertion sequence RT template in part or in whole are inserted into the one or more cleavage sites.
- The method in claim 20, wherein the Cas nuclease used is selected from Cas9, its variants, and mutants of any of the variants.
- The method in claim 20, wherein the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, variants thereof and mutants of any of the variants.
- The method in claim 20, wherein the insertion sequence RT template comprises a nucleotide sequence wherein cytosine represents less than 25%of the nucleotides in the nucleotide sequence.
- The method in claim 20, wherein the insertion sequence RT template is located at the 3’ end of the guide RNA.
- The method in claim 23, wherein the insertion sequence template has a sequence of SEQ ID NO: 113.
- The method in claim 20, wherein the insertion sequence RT template encodes one or more tags suitable for hybrid capture.
- The method in claim 20, wherein the insertion sequence template is about 10 to 30 nucleotides.
- The method in claim 20, wherein the guide RNA comprises, from the 5’ end to the 3’ end, a spacer, a scaffold, an insertion sequence template, and a primer binding site (PBS) .
- The method in claim 20, wherein the Cas nuclease and the reverse transcriptase are formed as a fusion protein.
- The method in claim 29, wherein the fusion protein is encoded by a sequence of SEQ ID NO: 116.
- The method in claim 20, wherein the Cas9 cleavage sites are on-target or off-target.
- A method of detecting Cas9 cleavage sites and detecting DNA translocation in genomic DNA, comprising: a. obtaining a labeled genomic DNA in accordance with any of the claims 20-31; b. targeting and amplifying one or more labeled portions of the genomic DNA; c. sequencing the amplified portion of the genomic DNA; and d. analyzing the sequencing result to identify Cas cleavage sites and/or DNA translocations at the Cas cleavage sites.
- A method of claim 32, wherein the one or more amplified portions of the genomic DNA in step (b) each comprise a portion of genomic DNA that is immediately upstream or downstream to the one or more insertion sequences.
- A method for identifying off-target Cas cleavage sites, comprising comparing the Cas cleavage sites identified by the method in claim 32 with a target sequence, and the cleavage site that is not identical to the target sequence is an off-target site.
- The method of claim 32, wherein the labeled genomic DNA is processed by Tn5 tagmentation before enrichment, wherein sequencing adapters that include unique molecular identifiers (UMI) are embedded in the Tn5 transposases.
- The method of claim 32, wherein the labeled genomic DNA is targeted and enriched by PCR or a hybrid capture-based target enrichment method.
- The method of claim 36, wherein the enrichment is performed by two rounds of PCR, wherein in the first reaction the insertion sequence is used as the forward primer binding site and in the second reaction the insertion sequence is used as the reverse primer binding site.
- The method of claim 37, wherein the 3’ end of the primers that bind to the insertion sequence is at least 2-bp away from the insertion boundary.
- A method for determining the relative specificity of a plurality of guide RNAs, comprising: a. identifying the off-target sites for Cas cleavage using each of the guide RNAs by a method in accordance with any of claims 32-38; and b. determining the relative specificity of the guide RNAs based on the total number of off-target sites identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites is more specific than a guide RNA having more off-target sites.
- A method for determining the relative specificity of a plurality of Cas nuclease variants or mutants of any of the variants, comprising: a. identifying the off-target cleavage sites for each of the Cas nuclease variants or mutants of any of the variants by a method in accordance with any of the claims 32-38; and b. determining the relative specificity of the Cas nuclease variants or mutants of any of the variants based on the total number of off-target sites identified for each of the Cas nuclease variants or mutants of any of the variants, wherein a Cas nuclease variant or mutant having fewer off-target sites is more specific than a Cas nuclease variant or mutant having more off-target sites.
- A method for determining the relative genotoxicity of a plurality of guide RNAs, comprising: a. identifying the off-target cleavage sites and DNA translocations for each of the guide RNAs by a method in accordance with any of claims 32-38; and b. determining the relative genotoxicity of the guide RNAs based on the total number of off-target sites and DNA translocations identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites and fewer DNA translocations is more specific than a guide RNA having more off-target sites and more DNA translocations.
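Two of the claimed criteria are directly computable: the cytosine content and length of the insertion sequence RT template (claims 1 and 4) and the ranking of guide RNAs or Cas variants by off-target count (claims 39 and 40). The sketch below is illustrative only: the sequences and counts are hypothetical, the function names are illustrative, and treating "about 10 to 30 nucleotides" as hard bounds is an assumption.

```python
def cytosine_fraction(template: str) -> float:
    """Fraction of C in a nucleotide sequence (RNA or DNA letters)."""
    if not template:
        raise ValueError("empty sequence")
    return template.upper().count("C") / len(template)


def meets_template_criteria(template: str, max_c: float = 0.25,
                            min_len: int = 10, max_len: int = 30) -> bool:
    """C < 25% (claim 1) and a 10-30 nt length (claim 4 says 'about';
    the hard bounds here are an assumption)."""
    return (cytosine_fraction(template) < max_c
            and min_len <= len(template) <= max_len)


def rank_by_specificity(off_target_counts: dict) -> list:
    """Order guide RNAs (or Cas variants) from fewest to most off-target
    sites; fewer sites means more specific (claims 39-40)."""
    return sorted(off_target_counts, key=off_target_counts.get)


# HYPOTHETICAL inputs, for illustration only.
print(meets_template_criteria("AGUGAUGUUAGAGUUGAUGGA"))  # no C, 21 nt
print(meets_template_criteria("CCCCCAGUAGUAGUAGUAGU"))   # exactly 25% C
print(rank_by_specificity({"gRNA-A": 12, "gRNA-B": 3, "gRNA-C": 7}))
```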
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2021/124025 WO2023060539A1 (en) | 2021-10-15 | 2021-10-15 | Compositions and methods for detecting target cleavage sites of crispr/cas nucleases and dna translocation |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023060539A1 true WO2023060539A1 (en) | 2023-04-20 |
Family
ID=78463327
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2021/124025 Ceased WO2023060539A1 (en) | 2021-10-15 | 2021-10-15 | Compositions and methods for detecting target cleavage sites of crispr/cas nucleases and dna translocation |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2023060539A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2024119461A1 (en) * | 2022-12-09 | 2024-06-13 | Westlake University | Compositions and methods for detecting target cleavage sites of crispr/cas nucleases and dna translocation |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019150203A1 (en) * | 2018-02-05 | 2019-08-08 | Crispr Therapeutics Ag | Materials and methods for treatment of hemoglobinopathies |
Non-Patent Citations (36)
| Title |
|---|
| ADEY A, MORRISON HG, ASAN, XUN X, KITZMAN JO, TURNER EH, STACKHOUSE B, MACKENZIE AP, CARUCCIO NC, ZHANG X ET AL.: "Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition", GENOME BIOL, vol. 11, 2010, pages R119, XP021091768, DOI: 10.1186/gb-2010-11-12-r119 |
| ALANIS-LOBATO, G. ET AL.: "Frequent loss of heterozygosity in CRISPR-Cas9-edited early human embryos", PROC NATL ACAD SCI U S A, vol. 118, 2021 |
| ANDERSON, K.R. ET AL.: "CRISPR off-target analysis in genetically engineered rats and mice", NAT METHODS, vol. 15, 2018, pages 512 - 514, XP036542157, DOI: 10.1038/s41592-018-0011-5 |
| ANZALONE ANDREW V ET AL: "Search-and-replace genome editing without double-strand breaks or donor DNA", NATURE, NATURE PUBLISHING GROUP UK, LONDON, vol. 576, no. 7785, 21 October 2019 (2019-10-21), pages 149 - 157, XP036953141, ISSN: 0028-0836, [retrieved on 20191021], DOI: 10.1038/S41586-019-1711-4 * |
| ANZALONE ANDREW V. ET AL: "Search-and-replace genome editing without double-strand breaks or donor DNA - SUPPLEMENTARY INFORMATION", NATURE, vol. 576, no. 7785, 21 October 2019 (2019-10-21), London, pages 149 - 157, XP055899878, ISSN: 0028-0836, Retrieved from the Internet <URL:http://www.nature.com/articles/s41586-019-1711-4> DOI: 10.1038/s41586-019-1711-4 * |
| ANZALONE, A.V. ET AL.: "Search-and-replace genome editing without double-strand breaks or donor DNA", NATURE, vol. 576, 2019, pages 149 - 157, XP036953141, DOI: 10.1038/s41586-019-1711-4 |
| CAMERON, P. ET AL.: "Site-seq: Mapping the genomic landscape of CRISPR-Cas9 cleavage", NAT METHODS, vol. 14, 2017, pages 600 - 606 |
| KARL V VOELKERDING, SHALE A DAMES, JACOB D DURTSCHI: "Next-Generation Sequencing: From Basic Research to Diagnostics", CLINICAL CHEMISTRY, vol. 55, 2009, pages 641 - 658, XP055057879, DOI: 10.1373/clinchem.2008.112789 |
| KIM ET AL., GENOM RES, vol. 28, 2018 |
| KIM, D. ET AL.: "Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells", NAT METHODS, vol. 12, 2015, pages 237 - 243,234 |
| KIM, D., KIM, J.S.: "DIG-seq: a genome-wide CRISPR off-target profiling method using chromatin DNA", GENOME RES, vol. 28, 2018, pages 1894 - 1900 |
| KIM, H.K. ET AL.: "Predicting the efficiency of prime editing guide RNAs in human cells", NAT BIOTECHNOL, vol. 39, 2021, pages 198 - 206, XP037365130, DOI: 10.1038/s41587-020-0677-y |
| KLEINSTIVER BP, PATTANAYAK V, PREW MS ET AL.: "High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects", NATURE, vol. 529, no. 7587, 2016, pages 490 - 495, XP055650074, DOI: 10.1038/nature16526 |
| KLEINSTIVER ET AL., NATURE, 28 January 2016 (2016-01-28) |
| LI YANG ET AL: "A Tale of Two Moieties: Rapidly Evolving CRISPR/Cas-Based Genome Editing", TRENDS IN BIOCHEMICAL SCIENCES, vol. 45, no. 10, 1 October 2020 (2020-10-01), AMSTERDAM, NL, pages 874 - 888, XP055754337, ISSN: 0968-0004, DOI: 10.1016/j.tibs.2020.06.003 * |
| LI, H.: "Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly", BIOINFORMATICS, vol. 28, 2012, pages 1838 - 1844, XP055449241, DOI: 10.1093/bioinformatics/bts280 |
| LIANG, G. ET AL.: "Frequent gene conversion in human embryos induced by double strand breaks", BIORXIV, 2020 |
| MACLEAN ET AL., NATURE REV. MICROBIOL., vol. 7, 2009, pages 287 - 296 |
| MACLEAN ET AL.: "Application of 'next-generation' sequencing technologies to microbial genetics", NAT REV MICROBIOL., vol. 7, no. 4, April 2009 (2009-04-01), pages 287 - 96, XP008161044, DOI: 10.1038/nrmicro2088 |
| MARTIN, M.: "Cutadapt Removes Adapter Sequences From High-Throughput Sequencing Reads", EMBNET, vol. 17, 2011, XP055737194, DOI: 10.14806/ej.17.1.200 |
| MUSUNURU, K. ET AL.: "In vivo CRISPR base editing of PCSK9 durably lowers cholesterol in primates", NATURE, vol. 593, 2021, pages 429 - 434, XP037513148, DOI: 10.1038/s41586-021-03534-y |
| NEWBY, G.A. ET AL.: "Base editing of haematopoietic stem cells rescues sickle cell disease in mice", NATURE, 2021 |
| PETERKA MARTIN ET AL: "Harnessing DSB repair to promote efficient homology-dependent and -independent prime editing", BIORXIV, 10 August 2021 (2021-08-10), XP055901468, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/2021.08.10.455572v1.full.pdf> [retrieved on 20220315], DOI: 10.1101/2021.08.10.455572 * |
| PINELLO, L. ET AL.: "Analyzing CRISPR genome-editing experiments with CRISPResso", NAT BIOTECHNOL, vol. 34, 2016, pages 695 - 697 |
| REZNIKOFF WS.: "Tn5 as a model for understanding DNA transposition", MOL MICROBIOL, vol. 47, 2003, pages 1199 - 1206 |
| REZNIKOFF WS.: "Transposon Tn5", ANNU REV GENET, vol. 42, 2008, pages 269 - 286, XP055252347, DOI: 10.1146/annurev.genet.42.110807.091656 |
| SCHALLER H.: "The intergenic region and the origins for filamentous phage DNA replication", COLD SPRING HARB SYMP QUANT BIOL, vol. 43, 1979, pages 401 - 408 |
| SCOTT ET AL.: "Chromosomal integration of the green fluorescent protein gene in lactic acid bacteria and the survival of marked strains in human gut simulations", FEMS MICROBIOL LETT, vol. 182, 2000, pages 23 - 27 |
| SLAYMAKER ET AL., SCIENCES, 1 January 2016 (2016-01-01) |
| SLAYMAKER IM, GAO L, ZETSCHE B, SCOTT DA, YAN WX, ZHANG F: "Rationally engineered Cas9 nucleases with improved specificity", SCIENCE, vol. 351, no. 6268, 2016, pages 84 - 88, XP055551663, DOI: 10.1126/science.aad5227 |
| TSAI, S.Q. ET AL.: "CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets", NAT METHODS, vol. 14, 2017, pages 607 - 614, XP055424040, DOI: 10.1038/nmeth.4278 |
| TSAI, S.Q. ET AL.: "GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases", NAT BIOTECHNOL, vol. 33, 2015, pages 187 - 197, XP055555627, DOI: 10.1038/nbt.3117 |
| VOELKERDING ET AL., CLINICAL CHEM., vol. 55, 2009, pages 641 - 658 |
| WIENERT, B. ET AL.: "Unbiased detection of CRISPR off-targets in vivo using DISCOVER-Seq", SCIENCE, vol. 364, 2019, pages 286 - 289, XP055787709, DOI: 10.1126/science.aav9023 |
| YAN, W.X. ET AL.: "BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks", NAT COMMUN, vol. 8, 2017, pages 15058, XP055485619, DOI: 10.1038/ncomms15058 |
| ZUCCARO, M.V. ET AL.: "Allele-Specific Chromosome Removal after Cas9 Cleavage in Human Embryos", CELL, vol. 183, 2020, pages 1650 - 1664 |
Similar Documents
| Publication | Title |
|---|---|
| Durrant et al. | Systematic discovery of recombinases for efficient integration of large DNA sequences into the human genome |
| JP7229923B2 (en) | Methods for assessing nuclease cleavage |
| CN107586835B (en) | Single-chain-linker-based construction method and application of next-generation sequencing library |
| KR102425438B1 (en) | Genomewide unbiased identification of dsbs evaluated by sequencing (guide-seq) |
| JP2018532419A (en) | CRISPR-Cas sgRNA library |
| JP7426370B2 (en) | Preparative electrophoresis method for targeted purification of genomic DNA fragments |
| US20190002920A1 (en) | Methods and kits for cloning-free genome editing |
| US20230056763A1 (en) | Methods of targeted sequencing |
| EP3812472B1 (en) | A truly unbiased in vitro assay to profile off-target activity of one or more target-specific programmable nucleases in cells (abnoba-seq) |
| US20230257799A1 (en) | Methods of identifying and characterizing gene editing variations in nucleic acids |
| Xu et al. | Randomly broken fragment PCR with 5′ end-directed adaptor for genome walking |
| US20140065616A1 (en) | Isoltation of Factors Associated with Nucleic Acid |
| WO2023060539A1 (en) | Compositions and methods for detecting target cleavage sites of crispr/cas nucleases and dna translocation |
| WO2024119461A1 (en) | Compositions and methods for detecting target cleavage sites of crispr/cas nucleases and dna translocation |
| Schick et al. | CRISPR-Cas9 enables conditional mutagenesis of challenging loci |
| US20190218544A1 (en) | Gene editing, identifying edited cells, and kits for use therein |
| Arbab et al. | Self‐Cloning CRISPR |
| US20210222171A1 (en) | Crispr/cas9 systems, and methods of use thereof |
| JP2023538537A (en) | Methods for targeted removal of nucleic acids |
| US20210395813A1 (en) | Multimer for sequencing and methods for preparing and analyzing the same |
| Gómez-Saldivar et al. | Tissue-specific DamID protocol using nanopore sequencing |
| US20250163407A1 | Methods selectively depleting nucleic acid using rnase h |
| Li et al. | Enrichment of prime-edited mammalian cells with surrogate PuroR reporters |
| US20240287609A1 (en) | Compositions and methods for large-scale in vivo genetic screening |
| EP4321630A1 | Method of parallel, rapid and sensitive detection of dna double strand breaks |
Legal Events
| Code | Title | Description |
|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21785733; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 21785733; Country of ref document: EP; Kind code of ref document: A1 |