
US20230357737A1 - Engineered cas9 variants - Google Patents


Info

Publication number
US20230357737A1
Authority
US
United States
Prior art keywords
cas9
hnh domain
catalytic
protein
amino acid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/201,537
Inventor
Jin Liu
Zhicheng ZUO
Yu-Chieh Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of North Texas Health Science Center
Original Assignee
University of North Texas Health Science Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of North Texas Health Science Center
Priority to US18/201,537
Assigned to UNIVERSITY OF NORTH TEXAS HEALTH SCIENCE CENTER. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LIU, JIN; ZUO, Zhicheng
Publication of US20230357737A1
Current legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/80 Vectors containing sites for inducing double-stranded breaks, e.g. meganuclease restriction sites
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/10 Numerical modelling

Definitions

  • the invention generally concerns an engineered Cas9 protein and method for producing and/or using the same.
  • CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
  • Cas9 CRISPR-associated protein 9
  • dsDNA double-stranded DNA
  • sgRNA chimeric single-guide RNA
  • recognition and cleavage of dsDNA strictly require the presence of a protospacer adjacent motif (PAM) in the non-target DNA strand (ntDNA) and depend on the base-pair complementarity of the target DNA strand (tDNA) to the RNA guide template (Jinek et al.
  • PAM protospacer adjacent motif
  • Cas9 adopts an overall bi-lobed architecture, in which the sgRNA:tDNA heteroduplex resides within the central channel between the α-helical recognition (REC) and nuclease (NUC) lobes, while the displaced ntDNA threads into a side channel within the NUC lobe (FIG. 7) (Jiang et al. Science 351, 867-71, 2016; Jiang et al. Science 348, 1477-81, 2015; Nishimasu et al.
  • REC α-helical recognition
  • NUC nuclease
  • the NUC lobe comprises two metal-ion-dependent nuclease domains, dubbed HNH and RuvC, which are responsible for cutting the tDNA (via a one-metal-ion mechanism) (Yang, Q. Rev. Biophys. 44, 1-93, 2011; Yang, Nat. Struct. Mol. Biol. 15, 1228-31, 2008) and the ntDNA (via a two-metal-ion mechanism) (Yang, Q. Rev. Biophys. 44, 1-93, 2011; Yang, Nat. Struct. Mol. Biol. 15, 1228-31, 2008; Yang et al., Mol. Cell 22, 5-13, 2006), respectively.
  • CRISPR-Cas9 induced an unexpectedly high number of new mutations in a mouse model of gene therapy, involving thousands of single-nucleotide variants (SNVs) and hundreds of insertions and deletions (indels) (Schaefer et al. Nat. Methods 14, 547-548, 2017). Therefore, much effort is needed to increase the fidelity of CRISPR-Cas9 with regard to off-target mutation generation, especially in the clinical setting (Schaefer et al.
  • the Cas9 variants of the current invention provide a solution to the off-target/fidelity problems associated with native and current Cas9 variants.
  • the amino acid variants are in the HNH domain region of Cas9.
  • the inventors have discovered a process to model the structure of Cas9 in an appropriate active state, which results in the identification and design of additional variants of Cas9 having appropriate activity that enhance fidelity. Without wishing to be bound by theory, it is believed that the use of these additional variants alone or in combination with other variants results in a high fidelity Cas9 protein for use in genetic engineering methods.
  • MD Molecular dynamics
  • the enhanced specificity conferred on Cas9 by site-specific mutations stems from reduced binding affinities for the off-target sites.
  • the inventors propose that mutations designed for attenuating the activation of the Cas9 HNH nuclease domain could also be employed for improving the Cas9 targeting accuracy, given the observation that the HNH domain undergoes a substantial rotation of ~180 degrees during the inactive-to-active state transition.
  • the Cas9 residues (except the HNH domain) forming non-specific contacts with the HNH domain or the HNH domain residues forming non-specific contacts with other Cas9 domains and/or nucleic acids (target DNA and/or gRNA) comprise the additional promising mutation sites for rational Cas9 engineering. From a physicochemical perspective, these amino acid substitutions raise the threshold energy underlying HNH conformational activation against the off-target substrates, thereby requiring more stringent Watson-Crick base pair complementarity.
  • the substitution can be one or more of alanine (Ala, A), arginine (Arg, R), asparagine (Asn, N), aspartic acid (Asp, D), cysteine (Cys, C), glutamic acid (Glu, E), glutamine (Gln, Q), glycine (Gly, G), histidine (His, H), isoleucine (Ile, I), leucine (Leu, L), lysine (Lys, K), methionine (Met, M), phenylalanine (Phe, F), proline (Pro, P), serine (Ser, S), threonine (Thr, T), tryptophan (Trp, W), tyrosine (Tyr, Y), or valine (Val, V) in place of the native amino acid.
  • alanine Ala, A
  • arginine Arg, R
  • asparagine Asn, N
  • aspartic acid Asp, D
  • the modified Cas9 protein comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23 modifications including one or more modification or variant corresponding to Thr58, Glu60, Glu223, Glu396, Glu370, Glu371, Asp406, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has at least two amino acid modifications.
  • the modified Cas9 protein can further comprise one or more modification that includes modification of Asn14, Lys268, Glu370, Arg447, Tyr450, Asn497, Lys500, Lys526, Lys528, Lys558, Asn588, Arg661, Asn692, Gln695, Arg780, Arg783, Asn803, Gln805, Lys810, Tyr812, Asp829, Asn831, Arg832, Asp835, Gln844, Lys848, Lys862, Arg925, Gln926, Lys929, His930, Lys961, Lys968, Tyr1013, Lys1031, Lys1244, or Lys1246 corresponding to SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Thr58 of SEQ ID NO:1 in combination with one or more modification corresponding to Glu60, Glu223, Glu396, Glu370, Glu371, Asp406, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Glu60 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu223, Glu396, Glu370, Glu371, Asp406, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Glu223 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu396, Glu370, Glu371, Asp406, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Glu396 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Glu371, Asp406, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Glu370 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu371, Asp406, Glu396, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Glu371 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Asp406 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Glu371, Glu396, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Glu584 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Asp585 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Arg586 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Arg765 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Asn767 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Arg778 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Glu779 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Ser845 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Gln844 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Arg859 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Arg780 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Arg783 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Gln807 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modified Cas9 protein has a modification corresponding to Tyr812 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • the modification can be of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58Lys, Thr58Arg, Glu60Ala, Glu223Ala, Glu370Ala, Asp406Ala, Glu396Ala, Glu371Ala, Glu584Ala, Asp585Ala, Arg586Ala, Arg765Ala, Asn767Ala, Arg778Ala, Glu779Ala, Ser845Asp, Gln844Glu, Arg859Ala, Arg780Ala, Arg783Ala, Asn803Ala, Gln807Ala, Tyr812Ala, Lys918Ala, Arg864Ala or Lys866Ala modification corresponding to SEQ ID NO: 1.
  • Certain embodiments are directed to a modified Cas9 protein having the Cas9 modification selected from K526A/N588A/R765A/N767A; N588A/K929A/H930A/Y1013A; R447A/K526A/K929A; N588A/N767A/Y1013A/K866A; N588A/N767A/Y1013A/S845D; K268A/K526A/N588A/N767A; N14A/K526A/K866A/K1246A; N14A/R447A/Y1013A/K1246A; N588A/R765A/D835A/K1246A; or N14A/R447A/R765A/S845D.
  • the Cas9 modification is N588A/R765A/D835A/K1246A or N14A/R447A/R765
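The text uses two notations for substitutions: three-letter labels such as Thr58Lys and slash-separated one-letter labels such as N588A/R765A/D835A/K1246A. The following is a minimal sketch of how such labels could be parsed and applied to a sequence. The function names and the five-residue toy sequence are hypothetical and chosen only for illustration; SEQ ID NO:1 is not reproduced here.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(label):
    """Convert a three-letter label such as 'Thr58Lys' to 'T58K'."""
    m = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", label)
    if m is None:
        raise ValueError(f"not a three-letter substitution: {label!r}")
    return f"{THREE_TO_ONE[m.group(1)]}{m.group(2)}{THREE_TO_ONE[m.group(3)]}"

def parse_variant(label):
    """Split 'N588A/R765A' into [('N', 588, 'A'), ('R', 765, 'A')]."""
    substitutions = []
    for token in label.split("/"):
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token.strip())
        if m is None:
            raise ValueError(f"not a one-letter substitution: {token!r}")
        substitutions.append((m.group(1), int(m.group(2)), m.group(3)))
    return substitutions

def apply_variant(sequence, label):
    """Apply substitutions (1-based positions), checking the native residue."""
    residues = list(sequence)
    for native, position, replacement in parse_variant(label):
        if position < 1 or position > len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != native:
            raise ValueError(f"expected {native} at {position}, "
                             f"found {residues[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)

# Hypothetical toy sequence, NOT SEQ ID NO:1.
toy_sequence = "MNKRA"
toy_variant = apply_variant(toy_sequence, "N2A/K3A")
```

Checking the native residue before substituting guards against off-by-one numbering errors when a label is applied to a sequence with a different offset.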
  • the native Cas9/gRNA complex is able to cleave the target DNA and all the off-target DNA sequences.
  • the modified Cas9 protein reduces the cleavage of the off-target DNA sequence.
  • the specificity (fidelity) can be determined by measuring the number of off-target cleavage events. The lower the number of off-target site cleavages, the higher the specificity (fidelity). For example, if a designed Cas9 mutant yields cleavage at only 10% of the off-target sites compared to the wild type protein, meaning 90% fewer off-target events, the gene editing specificity can be regarded as improved by 90%.
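The arithmetic in the example above can be written out as a short sketch. The function name and the site counts are hypothetical; only the 10% to 90% relationship comes from the text.

```python
def specificity_improvement(wt_off_target_sites, variant_off_target_sites):
    """Fractional reduction in cleaved off-target sites relative to wild type."""
    if wt_off_target_sites <= 0:
        raise ValueError("wild type must cleave at least one off-target site")
    return 1.0 - variant_off_target_sites / wt_off_target_sites

# Hypothetical counts: wild type cleaves 50 off-target sites, the variant 5 (10%).
improvement = specificity_improvement(50, 5)
print(f"specificity improved by {improvement:.0%}")
```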
  • the on-target activities of Cas9 proteins can be assessed using the human cell-based enhanced GFP (EGFP) disruption assay.
  • EGFP enhanced GFP
  • if the wild type Cas9 guided by a fully matched gRNA induces 90% EGFP disruption, a certain Cas9 variant exhibiting a disruption percentage around that value (80% or 95%, for example) is considered as possessing the wild-type or near wild-type cleavage efficiency.
  • the criterion of >70% of wild-type activity is used for screening potential Cas9 variants for subsequent tests on a whole-genome level.
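A minimal sketch of the screening criterion described above, assuming that "activity" is taken as the EGFP disruption percentage of a variant divided by that of the wild type. The variant names and disruption values are hypothetical placeholders, not measured data.

```python
def relative_activity(variant_disruption, wt_disruption):
    """On-target activity of a variant as a fraction of wild-type activity."""
    if wt_disruption <= 0:
        raise ValueError("wild-type disruption must be positive")
    return variant_disruption / wt_disruption

def passes_screen(variant_disruption, wt_disruption, threshold=0.70):
    """True if the variant retains more than `threshold` of wild-type activity."""
    return relative_activity(variant_disruption, wt_disruption) > threshold

# Hypothetical EGFP disruption percentages.
wt_disruption = 90.0
candidates = {"variant_A": 80.0, "variant_B": 55.0}
selected = [name for name, value in candidates.items()
            if passes_screen(value, wt_disruption)]
```

With these placeholder numbers, variant_A retains about 89% of wild-type activity and is kept for whole-genome testing, while variant_B at about 61% is not.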
  • Certain embodiments are directed to a fusion protein comprising the modified Cas9 protein fused to a heterologous peptide or protein, with an optional intervening linker.
  • inventions are directed to an expression cassette encoding the modified Cas9 protein or fusion protein comprising the modified Cas9 protein.
  • Certain embodiments are directed to a host cell expressing an expression cassette of the invention.
  • the host cell is an isolated host cell or a host in culture.
  • Certain embodiments are directed to methods of using such a modified Cas9 protein. Certain aspects include methods of altering the genome of a cell, the method comprising expressing in the cell or contacting the cell with the modified Cas9 protein described herein. In a further aspect the modified Cas9 protein is linked to a guide RNA having a region complementary to a selected portion of the genome of the cell. The method results in the alteration of the genome of the cell.
  • inventions are directed to an active state model of the HNH domain of Cas9 comprising a divalent cation at the interface of a ββα motif and a scissile phosphate.
  • the divalent cation is Mg, Mn, Ca, or Co.
  • Still other embodiments are directed to methods of modeling an active state of a Cas9 HNH domain.
  • the methods can comprise at least the steps of (a) aligning a scissile phosphate and flanking nucleotides of a T4 Endo VII system (2QNC) to the corresponding tDNA stretch in the Cas9 complex of the pre-catalytic state (5F9R); (b) calculating a tDNA transformation matrix from the paired ββα motifs in the two nucleases, resulting in a model of the HNH domain docked at the cleavage site; (c) repeating (a) and (b), replacing the crystal structure (5F9R) with snapshot structures from the sets of long cMD trajectories; (d) replacing the ⁇ segment of the ββα-Me motif in the optimized Cas9 complex from (c) with the corresponding part in the Mg2+-bound apo-Cas9 structure (4CMP); (e) performing long cMD simulations to obtain the active state of Cas9.
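Steps (a) and (b) describe superimposing one set of atoms onto another and reusing the resulting transformation matrix to place a domain. The sketch below shows one common way to obtain such a rigid-body transformation, the Kabsch least-squares superposition, using NumPy. It is offered as an illustration under that assumption; the source does not state which algorithm or software was used, and the function names are hypothetical.

```python
import numpy as np

def kabsch_transform(mobile, reference):
    """Rotation R and translation t that best map `mobile` onto `reference`.

    Both arrays have shape (n_atoms, 3) and list equivalent atoms in the
    same order, e.g. the scissile phosphate and flanking nucleotides.
    """
    mobile_center = mobile.mean(axis=0)
    reference_center = reference.mean(axis=0)
    # Cross-covariance of the centered coordinate sets.
    H = (mobile - mobile_center).T @ (reference - reference_center)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = reference_center - R @ mobile_center
    return R, t

def apply_transform(coords, R, t):
    """Apply the rigid-body transformation to any (n, 3) coordinate array."""
    return coords @ R.T + t
```

In the workflow outlined above, `R` and `t` would be computed from the paired atoms and then applied to the coordinates of the domain being docked; any resulting steric clashes would still have to be resolved by the subsequent simulation steps.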
  • dsDNA double stranded DNA
  • Other embodiments are directed to methods of altering a double stranded DNA (dsDNA) molecule, the method comprising contacting the dsDNA molecule with the modified Cas9 protein described herein.
  • the modified Cas9 protein can be linked to a guide RNA having a region complementary to a selected portion of the dsDNA molecule, resulting in the alteration of the dsDNA molecule.
  • polypeptide refers to a polymer of the protein amino acids, or amino acid analogs, regardless of its size or function.
  • although “protein” is often used in reference to relatively large polypeptides, and “peptide” is often used in reference to small polypeptides, usage of these terms in the art overlaps and varies.
  • polypeptide refers to peptides, polypeptides, and proteins, unless otherwise noted.
  • the terms “protein,” “polypeptide,” and “peptide” are used interchangeably herein when referring to a gene product.
  • exemplary polypeptides include gene products, homologs, orthologs, paralogs, fragments and other equivalents, variants, and analogs of the foregoing.
  • variant or mutant refers to an amino acid sequence that is different from the reference polypeptide by one or more amino acids, e.g., one or more amino acid substitutions.
  • a modified or variant Cas9 polypeptide differs from wild-type Cas9 (e.g., SEQ ID NO:1) by one or more amino acid substitutions, i.e., mutations.
  • Polynucleotide synonymously referred to as “nucleic acid molecule” or “nucleic acids,” refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA.
  • Polynucleotides include, without limitation, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, RNA that is a mixture of single- and double-stranded regions, and hybrid molecules comprising DNA and RNA that may be single-stranded, double-stranded, or a mixture of single- and double-stranded regions.
  • “Substantially similar” with respect to nucleic acid or amino acid sequences means at least about 65% identity between two or more sequences.
  • the term refers to at least about 70% identity between two or more sequences, more preferably at least about 75% identity, more preferably at least about 80% identity, more preferably at least about 85% identity, more preferably at least about 90% identity, more preferably at least about 91% identity, more preferably at least about 92% identity, more preferably at least about 93% identity, more preferably at least about 94% identity, more preferably at least about 95% identity, more preferably at least about 96% identity, more preferably at least about 97% identity, more preferably at least about 98% identity, and more preferably at least about 99% or greater identity.
  • identity can be determined using algorithms known in the art, such as the mBLAST algorithm.
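As a simple illustration of what a percent-identity figure measures, the sketch below scores two sequences that have already been aligned (gaps written as "-"). It is not the mBLAST algorithm and performs no alignment itself. The convention of dividing by the number of alignment columns is an assumption, since tools differ on the denominator, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same residue.

    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

# Hypothetical aligned fragments: 4 of 6 columns match.
identity = percent_identity("ACDE-G", "ACDQWG")
substantially_similar = identity >= 65.0
```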
  • isolated can refer to a nucleic acid or polypeptide that is substantially free of cellular material, bacterial material, viral material, or culture medium (when produced by recombinant DNA techniques) of their source of origin, or chemical precursors or other chemicals (when chemically synthesized).
  • an isolated polypeptide refers to one that can be administered to a cell or a subject; in other words, the polypeptide may not simply be considered “isolated” if it is adhered to a column or embedded in an agarose gel.
  • an “isolated nucleic acid fragment” or “isolated peptide” is a nucleic acid or protein fragment that is not naturally occurring as a fragment and/or is not typically in the functional state.
  • the term “providing” is used according to its ordinary meaning “to supply or furnish for use.”
  • the protein is provided directly by administering the protein, while in other embodiments, the protein is effectively provided by administering a nucleic acid that encodes the protein.
  • the invention contemplates compositions comprising various combinations of nucleic acid, and/or peptides.
  • the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps.
  • compositions and methods of making and using the same of the present invention can “comprise,” “consist essentially of,” or “consist of” particular ingredients, components, blends, method steps, etc., disclosed throughout the specification.
  • FIGS. 1a-1d Cas9 HNH domain motions and conformational flexibility characterized by principal component analysis.
  • (a-c) Visualization of the top three dominant motions for the HNH domain.
  • the first motional mode depicts a rotation motion around an axis perpendicular to the plane, while the second and third modes describe a translational movement toward the tDNA and REC2 domain, respectively.
  • the Cα atoms of the three HNH catalytic residues are represented as van der Waals spheres.
  • the pre-catalytic state (PDB code: 5F9R), its modeled “catalytic” state from the crystal structure of T4 Endo VII complex with a DNA substrate, and the start and end points for the targeted MD (tMD) simulations were also projected onto the subspace defined by the first two PCA modes, along with the targeted MD (tMD)- and ensemble conventional MD (cMDens)-derived catalytic states (an average over 100 data points is reported). All the trajectories were best-fitted to the Cas9 protein (excluding the HNH domain) of the pre-catalytic crystal structure (PDB code: 5F9R), and the coordinate covariance matrix was computed over the HNH domain for subsequent analysis.
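The analysis described above (covariance matrix over domain coordinates, then projection onto the leading modes) can be sketched with NumPy as follows. The sketch assumes the frames have already been superposed on a common reference, as the text describes. The function names and the random toy trajectory are hypothetical, and no particular MD analysis package is implied.

```python
import numpy as np

def pca_modes(frames, n_modes=2):
    """Principal modes of a trajectory.

    frames: array of shape (n_frames, n_atoms, 3), already best-fitted to a
    reference. Returns the mean structure (flattened), the variances of the
    leading modes, and the modes as columns of a (3 * n_atoms, n_modes) array.
    """
    n_frames = frames.shape[0]
    X = frames.reshape(n_frames, -1)
    mean = X.mean(axis=0)
    Xc = X - mean
    covariance = Xc.T @ Xc / (n_frames - 1)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1][:n_modes]  # largest variance first
    return mean, eigenvalues[order], eigenvectors[:, order]

def project(frames, mean, modes):
    """Project structures onto the subspace spanned by `modes`."""
    X = frames.reshape(frames.shape[0], -1)
    return (X - mean) @ modes

# Hypothetical toy trajectory: 50 frames of 10 atoms.
rng = np.random.default_rng(0)
toy_frames = rng.normal(size=(50, 10, 3))
mean, variances, modes = pca_modes(toy_frames, n_modes=2)
toy_projection = project(toy_frames, mean, modes)  # shape (50, 2)
```

Reference structures such as a crystal structure or a modeled state can be passed to `project` with the same `mean` and `modes` to place them in the same two-dimensional subspace as the trajectory frames.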
  • FIGS. 2a-2d Catalytic state coordination at the interface of the HNH ββα fold and tDNA (a, b) and comparison with the one-metal-ion catalysis by T4 Endo VII (c).
  • tMD post targeted MD
  • cMDens conventional ensemble MD
  • FIGS. 2a-2d Schematic representation of the one-metal-ion-dependent catalysis in ββα-metal nucleases.
  • the pro-Sp and pro-Rp oxygens of the scissile phosphate are indicated with Sp and Rp, respectively.
  • the putative active residues in Cas9 HNH domain and the corresponding residues in T4 Endo VII are labeled in boldface.
  • Mg2+ is shown as a cyan sphere and the potential nucleophilic water is denoted by an arrow.
  • the dashed lines indicate the coordination bonds involving Mg2+ and hydrogen bonds.
  • FIGS. 4a-4c Mg2+-aided conformational transition to catalytic state.
  • FIGS. 5a-5f New interactions established between the catalytic HNH domain and other components in the complex system identified from the post targeted MD (tMD) simulations.
  • f Interactions with the sgRNA.
  • the HNH domain residues are highlighted. The dashed lines denote the salt bridges and/or hydrogen bonds. Due to space limits, only interacting pairs with relatively high occupancy throughout the simulations are shown here, and the complete residue list is presented in Table 4. This figure is comparable to FIG. 14.
  • FIG. 6 Conformational activation pathway of Cas9 HNH nuclease domain.
  • the HNH domain and flanking linker regions (i.e., L1 and L2) are highlighted.
  • the PAM and the three putative catalytic residues of HNH domain are represented as blue spheres.
  • the dashed lines denote the disordered linker loops.
  • All of the solved Cas9 crystal structures in different binding forms assume an inactive state for both the RuvC and HNH nuclease domains.
  • Dagdas et al. (Dagdas et al., bioRxiv, 122242, 2017) identified three distinct conformational states of Cas9, designated states “R”, “I” and “D”, respectively.
  • FIGS. 7a-7c spCas9-sgRNA complexed with PAM-containing double-stranded DNA (dsDNA) substrate.
  • dsDNA PAM-containing double-stranded DNA
  • the Cas9 NUC lobe comprises two nuclease domains (RuvC and HNH), a C-terminal domain (CTD) and a topoisomerase homology (Topo) domain, and the REC lobe is spatially divided into three domains (REC1, REC2 and REC3); the two lobes are connected by an arginine-rich (bridge) helix (BH).
  • the target and non-target DNA strands are colored dark and light green, respectively, with the PAM duplex highlighted in crimson.
  • FIGS. 8a-8f Molecular dynamics (MD) simulations.
  • (a-c) Projections of the conventional MD simulations without ntDNA (cMD w/o ntDNA) (a), conventional MD simulations with ntDNA (cMD with ntDNA) (b) and accelerated MD simulations without ntDNA (aMD w/o ntDNA) (c) onto the first two eigenvectors calculated from the whole trajectories for the HNH domain.
  • FIGS. 9a-9b (a) FRET-labeled residue pairs shown with 5F9R and (b) scatter plot of the distances for the labeled residue pairs calculated from conventional MD simulations without ntDNA (cMD w/o ntDNA, black dots) and with ntDNA (cMD with ntDNA, green dots). Ser355, Ser867 and Asn1054 are located in the REC1, HNH and RuvC domains, respectively. These residues were previously selected to characterize different conformational states of HNH domain in FRET experiments (Dagdas et al., bioRxiv, 122242, 2017).
  • FIGS. 11a-11f Putative catalytic state of Cas9 HNH domain modeled from T4 Endonuclease VII (Endo VII)/DNA complex (PDB code: 2QNC).
  • Endo VII T4 Endonuclease VII
  • PDB code: 2QNC T4 Endonuclease VII
  • the Cα atoms of the active residues, i.e., Asp40, His840 and Asn62
  • the coordinated Mg2+ is depicted as a sphere.
  • Cas9 HNH domain opposite to the target DNA strand (PDB code: 5F9R).
  • the Cα atoms of the putative catalytic residues are represented as spheres and the HNH domain ββα-metal motif is shown.
  • the scissile phosphate and flanking nucleotides of the DNA substrate in (a) superimposed onto the corresponding stretch in (b), alongside the ββα-metal motif.
  • the Cα RMSD of the equivalent residues (shown as spheres) between the two nucleases is 1.2 Å.
  • the catalytic residues appear to be spatially superimposed well.
  • Cas9 HNH domain oriented toward the target DNA strand based on the transformation matrix obtained from d.
  • Direct “docking” of the HNH domain starting from the pre-catalytic state results in a number of steric clashes with other components in the system.
  • the overlapping heavy atoms are shown as van der Waals spheres, using a distance cutoff of 1.4 Å.
  • the pairwise RMSD for the HNH domain backbone is 25 Å here.
  • FIG. 12 Relative binding strength of the residues on the HNH ββα fold and the opposite tDNA with the coordinated Mg2+, computed via the MM-GBSA approach. The energetic contribution of each residue is expressed relative to Asp861, which is set to 100% binding strength. Positive and negative values indicate favorable and unfavorable binding, respectively.
  • FIG. 14 a - 14 f New interactions established between the catalytic HNH domain and other components in the complex system identified from the conventional ensemble MD (cMDens) simulations.
  • f Interactions with the sgRNA.
  • the HNH domain residues are highlighted. The dashed lines denote the salt bridges and/or hydrogen bonds. Note that not all of the interacting pairs necessarily appear in one single snapshot, and the complete residue list is presented in Table 4. This figure is comparable to FIG. 5 .
  • FIG. 15 a - 15 b Illustrates that an active state of the Cas9 HNH domain identified by computer modeling and simulations can be responsible for the tDNA cleavage.
  • Site-directed mutagenesis experiments with four single mutations (D837A, D839A, D861A, and N863A) plus one double mutation (D861A/N863A) suggest that D839 and N863 are residues involved in Cas9 activity by directly coordinating the catalytic Mg2+ at the interface between the HNH domain and tDNA, validating the newly identified active state.
  • FIG. 16 a - 16 c The gene-editing activity of two tetramutant variants of Cas9.
  • (a) The expression of different Cas9 variants in HEK293T-EGFP cells. WT: wild-type. Mut1.8: N588A/R765A/D835A/K1246A. Mut1.9: N14A/R447A/R765A/S845D.
  • (b) The representative histograms of flow cytometry analysis of EGFP-positive cells in the HEK293T-EGFP cells expressing the indicated Cas9 variants and EGFP gene-targeting sgRNA.
  • (c) The quantitative data of EGFP-positive cells in each sample. Similar to the wild-type Cas9, the Mut1.8 and Mut1.9 Cas9 variants were highly active in gene editing that led to the loss of EGFP expression in the cells. *P < 0.05 (Student's t-test).
  • the bacterial CRISPR-Cas9 system has been adapted as a powerful and versatile genome-editing toolbox.
  • the system holds immense promise for future therapeutic applications.
  • Regarding Cas9 structure/function, little is known about the catalytic state of the Cas9 HNH nuclease domain, and it remains elusive how the divalent metal ions affect the HNH domain conformational transition.
  • a deep understanding of Cas9 activation and cleavage mechanism can enable further optimization of Cas9-based genome-editing specificity and efficiency.
  • Using two distinct molecular dynamics simulation techniques, the inventors obtained a cross-validated catalytically active state of the Cas9 HNH domain primed for cutting the target DNA strand.
  • the inventors demonstrate at the atomic level the essential roles of the catalytic Mg2+ for the active state formation and stability. Furthermore, the inventors show that the derived catalytic conformation of the HNH domain can be exploited for rational engineering of Cas9 variants with enhanced specificity.
  • the ntDNA was not included in the inventors' simulations.
  • cleavage assays suggest that a single-stranded tDNA substrate was cleaved two orders of magnitude more slowly than a dsDNA substrate, despite comparable binding affinities of both substrates to Cas9-gRNA (Sternberg et al.
  • ntDNA accelerates the reaction rates probably by promoting the HNH domain rotation during strand unwinding ( FIG. 6 ) and/or by facilitating rapid interrogation and loading of the DNA target via PAM recognition (Sternberg et al. Nature 507, 62-67, 2014).
  • the Cas9 catalytic state might adopt a somewhat different conformation from that captured here.
  • the global orientation of HNH domain relative to the REC lobe and tDNA, and the coordination configuration at the binding interface should vary little.
  • the inventors have bridged the missing link of how the HNH domain transitions from the pre-catalytic to catalytic state.
  • another fundamental question remains open: what factors trigger the ~180° rigid-body rotation of L1-HNH-L2 during the previously identified intermediate (“I”) to pre-catalytic (“P”) state transition ( FIG. 6 ).
  • the inventors contemplate that there likely exists a functionally relevant transition state between the I and P states, which acts as a conformational checkpoint determining the fates (cleaved or not) of bound on- or off-target substrates. By introducing a certain number of mismatches, this state might be captured through smFRET or crystallography, or identified with molecular dynamics free energy simulations (Giulia et al. Proc. Natl. Acad. Sci. U.S.A. 2017).
  • Mg2+ is indispensable for the catalytic state formation and stability.
  • Without Mg2+, it is conceivable that the HNH domain swings repeatedly toward and away from the tDNA but fails to visit an active conformation ( FIG. 4 ), as demonstrated by the smFRET experiments (Dagdas et al. bioRxiv, 122242, 2017). If Mg2+ diffuses into the binding interface, the HNH domain readily docks onto and associates stably with the opposite tDNA, accompanied by new interactions formed with other components (especially the REC lobe and sgRNA) in the system.
  • Mg2+ also acts as a facilitator and stabilizer of the functional conformational state.
  • the inventors hold that the roles of Mg2+ revealed here are common in other divalent metal ion dependent nucleases (Yang, Q. Rev. Biophys. 44, 1-93, 2011; Yang et al. Mol. Cell 22, 5-13, 2006).
  • Besides Mg2+, other metal ions like Mn2+, Ca2+ and Co2+ are also able to activate the HNH conformation and stabilize its catalytic state (Zuo and Liu, Sci. Rep.
  • the derived catalytic state provides a different perspective on the sources of enhanced Cas9 specificity through alanine mutagenesis.
  • the four basic residues of L1 linker and HNH domain, Lys775, Arg832, Lys862 and Lys848, whose single alanine substitution was shown to reduce Cas9 off-target effects ( FIG. 7 c ), were previously supposed to make contacts with the phosphate backbone of ntDNA (Slaymaker et al. Science 351, 84-88, 2016).
  • Lys775, Arg832 and Lys862 form ionic/hydrogen-bonding interactions with the negatively charged residues on the REC3 (Glu584 and Asp585), REC2 (Glu223) and REC1 (Glu370 and Glu396) domain, respectively, while another residue Lys848 is simultaneously engaged to the residues on BH (Thr68 and Glu60) and sgRNA backbone ( FIG. 5 , FIG. 14 and Table 4).
  • these new interactions directly contribute to HNH domain docking onto tDNA, and neutralization of the basic residues could destabilize formation of the active HNH conformation, thereby entailing more stringent Watson-Crick base pair complementarity with sgRNA.
  • modified proteins can be tested using a human cell-based EGFP-disruption assay.
  • In a human cell-based EGFP-disruption assay, successful cleavage of a target site in the coding sequence of a single integrated, constitutively expressed EGFP gene leads to the induction of mutations and disruption of EGFP activity, which can be quantitatively assessed by flow cytometry (see, for example, Reyon et al., Nat Biotechnol. 30(5):460-5, 2012).
  • Substitutional variants typically contain the exchange of one amino acid for another at one or more sites within the protein, and may be designed to modulate one or more properties of the polypeptide, with or without the loss of other functions or properties. Substitutions may be conservative, that is, one amino acid is replaced with one of similar shape and charge.
  • Conservative substitutions are well known in the art and include, for example, the changes of: alanine to serine; arginine to lysine; asparagine to glutamine or histidine; aspartate to glutamate; cysteine to serine; glutamine to asparagine; glutamate to aspartate; glycine to proline; histidine to asparagine or glutamine; isoleucine to leucine or valine; leucine to valine or isoleucine; lysine to arginine; methionine to leucine or isoleucine; phenylalanine to tyrosine, leucine or methionine; serine to threonine; threonine to serine; tryptophan to tyrosine; tyrosine to tryptophan or phenylalanine; and valine to isoleucine or leucine.
  • substitutions may be non-conservative such that a function or activity of the polypeptide is affected.
  • Non-conservative changes typically involve substituting a residue with one that is chemically dissimilar, such as a polar or charged amino acid for a nonpolar or uncharged amino acid, and vice versa.
  • Proteins may be recombinant, or synthesized in vitro.
  • a non-recombinant or recombinant protein may be isolated from bacteria or other host cell expression system.
  • Codons include: Alanine (Ala, A) GCA, GCC, GCG, and GCU; Cysteine (Cys, C) UGC and UGU; Aspartic acid (Asp, D) GAC and GAU; Glutamic acid (Glu, E) GAA and GAG; Phenylalanine (Phe, F) UUC and UUU; Glycine (Gly, G) GGA, GGC, GGG, and GGU; Histidine (His, H) CAC and CAU; Isoleucine (Ile, I) AUA, AUC, and AUU; Lysine (Lys, K) AAA and AAG; Leucine (Leu, L) UUA, UUG, CUA, CUC, CUG, and CUU; Methi
  • amino acid and nucleic acid sequences may include additional residues, such as additional N- or C-terminal amino acids, or 5′ or 3′ sequences, respectively, and yet still be essentially as set forth in one of the sequences disclosed herein, so long as the sequence meets the criteria set forth above, including the maintenance of biological protein activity where protein expression is concerned.
  • the addition of terminal sequences particularly applies to nucleic acid sequences that may, for example, include various non-coding sequences flanking either of the 5′ or 3′ portions of the coding region.
  • amino acids of a protein may be substituted for other amino acids in a protein structure without appreciable loss of interactive binding capacity with structures such as binding sites on substrate molecules. Since it is the interactive capacity and nature of a protein that defines that protein's biological functional activity, certain amino acid substitutions can be made in a protein sequence, and in its underlying DNA coding sequence, and nevertheless produce a protein with like properties.
  • the hydropathic index of amino acids may be considered.
  • the importance of the hydropathic amino acid index in conferring interactive biologic function on a protein is generally understood in the art (Kyte and Doolittle, 1982). It is accepted that the relative hydropathic character of the amino acid contributes to the secondary structure of the resultant protein, which in turn defines the interaction of the protein with other molecules, for example, enzymes, substrates, and the like.
  • amino acid substitutions generally are based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like.
  • substitutions that take into consideration the various foregoing characteristics are well known and include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine.
  • Embodiments involve polypeptides, peptides, proteins and fragments thereof for use in various aspects described herein.
  • all or part of proteins described herein can also be synthesized in solution or on a solid support in accordance with conventional techniques.
  • Various automatic synthesizers are commercially available and can be used in accordance with known protocols.
  • recombinant DNA technology may be employed wherein a nucleotide sequence that encodes a peptide or polypeptide is inserted into an expression vector, transformed or transfected into an appropriate host cell and cultivated under conditions suitable for expression.
  • One embodiment includes the use of gene transfer to cells, including microorganisms, for the production and/or presentation of proteins.
  • the gene for the protein of interest may be transferred into appropriate host cells followed by culture of cells under the appropriate conditions.
  • fusion proteins can include a protein fused with heterologous sequences that provide purification tags, for example: β-galactosidase, glutathione-S-transferase, green fluorescent protein (GFP), or epitope tags such as FLAG, myc tag, or polyhistidine.
  • the comparison of sequences and determination of percent identity between two sequences can be accomplished using a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
  • Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine.
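  • As a minimal sketch, the groups listed above can be encoded as sets of one-letter codes and used to classify a proposed substitution. The function name and the choice to treat an unchanged residue as "not a substitution" are illustrative assumptions, not part of the disclosure.

```python
# Groups copied from the list above, written with one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"G", "A"},            # glycine, alanine
    {"V", "I", "L"},       # valine, isoleucine, leucine
    {"D", "E", "N", "Q"},  # aspartic acid, glutamic acid, asparagine, glutamine
    {"S", "T"},            # serine, threonine
    {"K", "R"},            # lysine, arginine
    {"F", "Y"},            # phenylalanine, tyrosine
]


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if both residues fall within the same group (hypothetical helper)."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # assumption: an unchanged residue is not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

  • Under this grouping, lysine to arginine is classified as conservative, whereas lysine to alanine (the kind of change used in the alanine mutagenesis discussed herein) is not.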
  • an amino acid designated as “X” refers to any amino acid residue. However, when in the context of an amino acid substitution it is to be understood that “X” followed by a number refers to an amino acid residue at a particular location in a reference sequence.
  • an amino acid residue of an amino acid sequence of interest that “corresponds to” or is “corresponding to” or in “correspondence with” an amino acid residue of a reference amino acid sequence indicates that the amino acid residue of the sequence of interest is at a location homologous or equivalent to an enumerated residue in the reference amino acid sequence.
  • One skilled in the art can determine whether a particular amino acid residue position in a polypeptide corresponds to that of a homologous reference sequence.
  • the sequence of a modified or related Cas9 protein can be aligned with that of a reference sequence (e.g., SEQ ID NO: 1) using known techniques (e.g., basic local alignment search tool (BLAST), ClustalW2, Structure based sequences alignment program (STRAP), or the like).
  • crystal structure coordinates of a reference sequence may be used as an aid in determining a homologous polypeptide residue's three dimensional structure.
  • the amino acid residues of a polypeptide can be numbered according to the corresponding amino acid residue position numbering of the reference sequence.
  • the amino acid sequence of SEQ ID NO: 1 may be used for determining amino acid residue position numbering of each amino acid residue of a variant of interest.
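  • The position numbering described above is what substitution labels such as N588A refer to. The following sketch, under the assumption that a label has the form wild-type letter, 1-based reference position, new letter, and that multiple labels are joined by "/", parses such labels and applies them to a reference sequence. The helper names and the short toy sequence are hypothetical; SEQ ID NO: 1 is not reproduced here.

```python
import re

# Pattern for labels such as "N588A": wild-type residue, position, new residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label like 'N588A' into ('N', 588, 'A')."""
    match = _MUTATION.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutations(reference, labels):
    """Apply '/'-separated labels to a reference sequence using 1-based numbering."""
    residues = list(reference)
    for label in labels.split("/"):
        wild_type, position, new = parse_mutation(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the reference")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: reference has {residues[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)
```

  • Checking the wild-type letter against the reference catches numbering offsets, for example when a variant was numbered against a different reference sequence.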
  • “identity” in the context of two nucleic acid or polypeptide sequences refers to the residues in the two sequences that are the same when aligned for maximum correspondence, as measured using one of the following sequence comparison or analysis algorithms.
  • the percent sequence identity between a reference sequence and a test sequence of interest may be readily determined by one skilled in the art.
  • the percent identity shared by polynucleotide or polypeptide sequences is determined by direct comparison of the sequence information between the molecules by aligning the sequences and determining the identity by methods known in the art.
  • An example of an algorithm that is suitable for determining sequence similarity is the BLAST algorithm, (see Altschul, et al., J. Mol. Biol., 215:403-410 [1990]).
  • Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information.
  • This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence.
  • These initial neighborhood word hits act as starting points to find longer HSPs containing them.
  • the word hits are expanded in both directions along each of the two sequences being compared for as far as the cumulative alignment score can be increased. Extension of the word hits is stopped when: the cumulative alignment score falls off by the quantity X from a maximum achieved value; the cumulative score goes to zero or below; or the end of either sequence is reached.
  • the BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment.
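  • The extension step described above can be sketched as follows. This is a simplified, ungapped illustration and not the BLAST implementation: it scores identities and mismatches with fixed values (5 and -4 are assumed here), extends a seed word in both directions, and stops when the running score falls X below its maximum, when the cumulative score drops to zero or below, or when a sequence end is reached. All function and parameter names are illustrative.

```python
def extend_hit(query, subject, q_start, s_start, word_len,
               match=5, mismatch=-4, x_drop=20):
    """Extend a seed word hit without gaps; return (score, q_begin, q_end)."""

    def pair_score(a, b):
        return match if a == b else mismatch

    # Score of the seed word itself.
    seed = sum(pair_score(query[q_start + i], subject[s_start + i])
               for i in range(word_len))

    def extend(q_pos, s_pos, step):
        best = running = 0
        best_len = length = 0
        # Stop condition 3: the end of either sequence is reached.
        while 0 <= q_pos < len(query) and 0 <= s_pos < len(subject):
            running += pair_score(query[q_pos], subject[s_pos])
            length += 1
            if running > best:
                best, best_len = running, length
            # Stop conditions 1 and 2: X-drop, or cumulative score <= 0.
            elif best - running >= x_drop or seed + running <= 0:
                break
            q_pos += step
            s_pos += step
        return best, best_len

    right_score, right_len = extend(q_start + word_len, s_start + word_len, 1)
    left_score, left_len = extend(q_start - 1, s_start - 1, -1)
    total = seed + left_score + right_score
    return total, q_start - left_len, q_start + word_len + right_len
```

  • The returned interval is half-open in query coordinates, so `query[q_begin:q_end]` is the extended segment. A larger X tolerates longer runs of mismatches before giving up, which increases sensitivity at the cost of speed.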
  • the BLAST program uses as defaults a wordlength (W) of 11, the BLOSUM62 scoring matrix (see Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 [1992]), alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
  • the BLAST algorithm then performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, supra).
  • One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
  • Percent “identical” or “identity” in the context of two or more nucleic acid or polypeptide sequences refers to two or more sequences that are the same or have a specified percentage of nucleic acid residues or amino acid residues, respectively, that are the same, when compared and aligned for maximum similarity, as determined using a sequence comparison algorithm or by visual inspection.
  • “Percent sequence identity” or “% identity” or “% sequence identity” or “% amino acid sequence identity” of a subject amino acid sequence to a reference amino acid sequence means that the subject amino acid sequence is identical (i.e., on an amino acid-by-amino acid basis) by a specified percentage to the reference amino acid sequence over a comparison length when the sequences are optimally aligned.
  • 80% amino acid sequence identity or 80% identity with respect to two amino acid sequences means that 80% of the amino acid residues in two optimally aligned amino acid sequences are identical.
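  • A minimal sketch of this calculation, assuming the two sequences have already been optimally aligned (with "-" marking gaps) by a separate tool. The choice of denominator, here every alignment column that is not a gap in both sequences, is an assumption; other conventions divide by the length of the shorter or the reference sequence.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # a column that is a gap in both carries no information
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared
```

  • For two aligned ten-residue sequences that differ at two positions, the function returns 80.0, matching the 80% identity example above.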
  • the biggest challenge is to sample enough conformational space in a reasonably short time-scale. From initial MD simulations and structural observation (Jiang et al. Science 351, 867-871, 2016; Zuo and Liu, Sci. Rep. 5, 2016), the inventors contemplated that the ntDNA might impose spatial constraints on the conformational dynamics of HNH domain in the pre-catalytic state ( FIG. 7 a ). In other words, the HNH domain could exhibit enhanced flexibility in the absence of ntDNA, thereby increasing the probability to reach or get closer to the catalytic state.
  • Compared to cMD, aMD explored much broader conformational space, especially along the first PC ( FIG. 1 d and FIG. 8 ) that depicts a rotation motion of the HNH domain ( FIG. 1 a ). However, the third motional mode is more prominent in cMD than in aMD ( FIG. 8 f ), suggesting that the HNH domain displays a larger-scale translation toward the REC lobe in cMD ( FIG. 1 c ).
  • HNH Domain samples larger conformational space in the absence of ntDNA and cMD is more appropriate in searching for HNH domain active state as aMD brings appreciable internal structural distortion ( FIG. 10 and Table 2).
  • As the microsecond time-scale samplings with cMD and aMD were unable to obtain an HNH conformation in sufficiently close proximity to the cleavage site on tDNA for catalysis, in the following sections, two different strategies to capture the converged catalytically active state of the HNH domain are presented.
  • Targeted-MD Revealed the Catalytically Active State of HNH domain.
  • One of the strategies used is the targeted MD (tMD) simulation (Schlitter et al., J. Mol. Graphics 12, 84-89, 1994; Schlitter et al., Mol. Simul. 10, 291-308, 1993).
  • This approach can enable conformational transition between two known states by application of external forces.
  • the homologous T4 Endonuclease VII (Endo VII) in complex with a DNA Holliday junction (Biertumpfel et al., Nature 449, 616-U614, 2007) was selected as the template to build the target conformation of the HNH domain, which is the putative “active” conformation model ( FIG. 11 ).
  • the Mg2+ at the catalytic center formed a favorable octahedral coordination with six surrounding oxygen atoms from different species ( FIG. 2 a ).
  • the residues Asp839 and Asp861 on the ββα motif and the scissile phosphate (pro-Sp oxygen involved) between the nucleotides +3 and +4 of tDNA each contribute a coordination ligand ( FIG. 2 a ).
  • the above observation is consistent with the per-residue energy decomposition data by MM-GBSA approach ( FIG.
  • His840 contributes marginally to Mg2+ binding, which is in line with its major role as the general base activating the nucleophile.
  • the His840 side chain hydrogen-bonded to a potential nucleophilic water molecule that is aligned for in-line attack on the scissile bond.
  • Tyr823 and Arg864 appeared to play a structural role in stabilizing the catalytic Asp839 side chain by hydrogen-bonding. Such interactions were presumed to aid proper orientation of Asp839 for coordination and catalysis.
  • amino acid Tyrosine is strictly conserved among different types of CRISPR-Cas9 by primary sequence analysis, while the basic amino acid Arginine (or Lysine) is highly conserved among the Type II-A Cas9 orthologs (Jinek et al., Science 343, 1247997, 2014).
  • the coordination composition and geometry captured here closely match those present in the T4 Endo VII/DNA complex, indicating the formation of the catalytically active state of the Cas9 HNH domain and consistent with the previously identified tDNA cleavage site being 3 nucleotides from the PAM (Jinek et al., Science 337, 816-821, 2012; Gasiunas et al., Proc. Natl. Acad. Sci. U.S.A. 109, E2579-2586, 2012).
  • the basic idea behind this method is to extract the structure that most closely resembles the active state from a set of MD simulations as the starting point for a new set of simulations. Step by step, one can efficiently sample the desired conformational space without any artificial forces.
  • Because the actual catalytic state is not known, it is challenging to choose the structure that most closely resembles the catalytic state.
  • the inventors used the geometric mean of the distances of +4P (the scissile phosphate) to two catalytic residues His840 and Asp861 ( FIG. 4 a ) as a metric to monitor the conformational transition of HNH domain. Apparently the smaller this value is, the closer the conformation is to the target active conformation ( FIG. 4 b ).
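  • A minimal sketch of this metric is given below. Which atom of His840 and of Asp861 is used for each distance is not specified in the text above, so the coordinates passed in are placeholders, and the function names are hypothetical.

```python
import math


def catalytic_metric(p_scissile, atom_h840, atom_d861):
    """Geometric mean of the +4P-His840 and +4P-Asp861 distances.

    Each argument is an (x, y, z) coordinate in angstroms; the choice of
    representative atom per residue is an assumption made by the caller.
    """
    d_h840 = math.dist(p_scissile, atom_h840)
    d_d861 = math.dist(p_scissile, atom_d861)
    return math.sqrt(d_h840 * d_d861)


def closest_frame(frames):
    """Pick the frame (a tuple of the three coordinates) with the smallest metric."""
    return min(frames, key=lambda frame: catalytic_metric(*frame))
```

  • Unlike an arithmetic mean, the geometric mean stays large if only one of the two distances shrinks slightly while the other remains long, so low values favor frames in which both residues approach the scissile phosphate.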
  • the Cas9 protein underwent prominent conformational changes, as observed from either of the post-tMD and cMDens simulations.
  • the overall Cα RMSD from the initial crystal structure is close to 6 Å, in which the HNH domain displayed the largest RMSD of ~11 Å as expected, followed by the CTD and REC2 domains with an RMSD of around 7-8 Å (Table 3).
  • the CTD domain moved outward markedly, resulting in wide opening of the side channel within the NUC lobe poised for substrate loading ( FIG. 3 c and FIG. 13 ).
  • Mg2+ is Indispensable for Activation of the Catalytic State.
  • the inventors' previous work with Cas9 RuvC domain revealed that Mg2+ is able to induce the formation of the active state for cleaving the ntDNA (Zuo and Liu, Sci. Rep. 5, 2016).
  • Mg2+ could also facilitate conformational activation of the HNH domain.
  • the inventors removed the coordinated Mg2+ from the above catalytic conformation ( FIG. 2 a ) and performed microsecond-level conventional MD simulations (G7, Table 2). In the absence of Mg2+, two distinct consequences on the HNH domain are envisioned, i.e., either departing from the tDNA or staying docked at the tDNA without noticeable reorganization.
  • the inventors first monitored the changes in the distance pair of +4P to His840 (d+4P-H840) and to Asp861 (d+4P-D861) at the cleavage interface ( FIG. 4 a ). Their geometric mean increased from 6.0 Å in the catalytic state simulations to 10.5 Å on average, indicating detachment of the HNH domain from the tDNA. Further comparison with the cMD simulations starting from the pre-catalytic state clearly showed that absence of Mg2+ leads the HNH domain to a transition state between the catalytic and pre-catalytic state ( FIG. 4 b ).
  • the Catalytic State Provides New Structural Information for Specificity Enhancement.
  • the HNH domain established many new interactions with the REC lobe (including REC1, REC2 and REC3), bridge helix (BH), tDNA and sgRNA, predominantly involving the charged and polar residues ( FIG. 5 , FIG. 14 and Table 4).
  • the two basic residues of the HNH ββα motif, Lys862 and Lys866, formed alternative ionic interactions with the three acidic residues Glu370, Glu371 and Glu396 on REC1, respectively ( FIG. 5 a and FIG. 14 a ).
  • Lys775, Arg778 and Glu779 (on HNH flanking linker 1, L1) competed for binding to Glu584, Asp585, Arg586 and Lys558 of REC3, respectively ( FIG. 5 c and FIG. 14 c ).
  • the HNH loop immediately preceding the ββα motif made numerous side chain and backbone hydrogen bonds with REC2, such as Asn831 with Thr249/Asn251, and Ser834 with Gly247/Thr249 ( FIG. 5 b and FIG. 14 b ).
  • Asp835 alone hydrogen-bonded to one helical turn of Ser217, Lys218 and Ser219 on REC2.
  • Arg832 and/or Arg859 (on the ββα motif) formed charged interactions with the REC2 Glu223 ( FIG. 5 b and FIG. 14 b ). Lying on the long loop between the two β elements of the ββα motif, Gln844 and Lys848 were engaged to Glu60 on BH and Thr58 (on the loop linking BH and RuvC) via hydrogen-bond and ionic interactions, respectively ( FIG. 5 d and FIG. 14 d ). Another adjacent residue Ser845 was implicated in hydrogen-bonding to the +3P of tDNA, a position only 1-nt from the cleavable site ( FIG. 5 e and FIG. 14 e ).
  • the HNH domain formed a number of polar contacts with the backbone of sgRNA (primarily at its middle guide segment). Located on the N-terminal ββα motif flanking helices, the residue pair of Asn803 and Gln807, and the triplet of Arg780, Arg783 and Tyr812 firmly caught the two nucleotides 8 and 9 of sgRNA (numbered 1 from the most PAM-distal end), respectively, through hydrogen-bonds and/or salt bridges ( FIG. 5 f and FIG. 14 f ).
  • the initial configurations of the two Cas9 complex systems viz. Cas9-sgRNA-dsDNA (with ntDNA) and Cas9-sgRNA-tDNA (without ntDNA) were derived from the recently solved crystal structure at 3.4 Å resolution (PDB accession code: 5F9R (Jiang et al., Science 351, 867-871, 2016)).
  • the ntDNA-free system was built by removing the entire non-target DNA strand from the intact structure, while for the dsDNA-bound system, the ntDNA 5′-end cleavage product was excluded based on previous study (Zuo and Liu, Sci. Rep. 5, 2016).
  • the TIP3P model (Jorgensen et al., J. Chem. Phys. 79, 926-35, 1983) was selected for water and the recently developed ion parameter sets optimized in TIP3P water were employed for the mono- and divalent ions (Li et al., J. Chem. Theory Comput. 11, 1645-57, 2015; Li et al., J. Chem. Theory Comput. 9, 2733-48, 2013). It should be mentioned that none of the available non-bonded models for metal ions, especially the multivalent ions, is able to reproduce various experimental properties simultaneously (Panteva et al., J. Comput. Chem.
  • the Mg2+ parameter set here, as previously used for the same enzyme (Zuo and Liu, Sci. Rep. 5, 2016), represents the best possible compromise targeting the experimental coordination number, Mg2+-O distance and hydration free energy (Li et al., J. Chem. Theory Comput. 9, 2733-48, 2013).
  • the short-range non-bonded interactions were truncated at 10 Å, and the long-range electrostatics were treated via the particle mesh Ewald summation (PME) method (Darden et al., J. Chem. Phys. 98, 10089-92, 1993) using a grid spacing of 1 Å.
  • aMD Accelerated Molecular Dynamics
  • aMD is an enhanced sampling technique that adds a non-negative boost potential [ΔV(r)] to the original potential energy surface [V(r)] whenever V(r) falls below a threshold energy (E).
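  • For illustration, the boost is commonly written in the aMD literature as ΔV(r) = (E − V(r))² / (α + E − V(r)) for V(r) < E and zero otherwise. The sketch below assumes this form; the tuning parameter α and the numerical values are not taken from the present disclosure.

```python
def amd_boost(v, e, alpha):
    """Boost potential dV for potential energy v, threshold e and tuning parameter alpha.

    Assumes the commonly used form dV = (e - v)^2 / (alpha + e - v) for v < e,
    and dV = 0 when v >= e. alpha is an illustrative parameter name.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if v >= e:
        return 0.0  # above the threshold the surface is left unmodified
    gap = e - v
    return gap * gap / (alpha + gap)


def boosted_potential(v, e, alpha):
    """Modified potential V*(r) = V(r) + dV(r)."""
    return v + amd_boost(v, e, alpha)
```

  • Smaller values of alpha flatten the modified surface more strongly below the threshold, which speeds up barrier crossing but can also distort the structure, consistent with the internal distortion noted for aMD herein.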
  • the aMD simulations were started from the last snapshots of the above short cMD simulations and were performed also in NVT ensemble, lasting 650 ns and 1000 ns for the dihedral and dual modes, respectively (G3 and G4, Table 1).
  • the new variant GaMD (Gaussian accelerated MD) was also tested, but appreciable loss of protein secondary structures was found in the results, so this approach was not applied herein.
  • tMD Targeted Molecular Dynamics
  • tMD induces conformational transition between two known states by means of steering forces (Schlitter et al., J. Mol. Graphics 12, 84-89, 1994; Schlitter et al., Mol. Simul. 10, 291-308, 1993).
  • RMSD(t) is the instantaneous best-fit RMSD of the current coordinates from the target conformation
  • RMSD*(t) evolves linearly from the initial RMSD at the first tMD step to the final value at the last step.
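  • The two quantities defined above can be combined into a steering energy. The sketch below assumes a harmonic form, U = ½ (k/N) [RMSD(t) − RMSD*(t)]², as used in some MD packages; the force constant k, the atom count N and the helper names are illustrative assumptions rather than values from the simulations described herein.

```python
def target_rmsd(step, n_steps, rmsd_initial, rmsd_final):
    """RMSD*(t): linear schedule from the initial to the final target value."""
    if n_steps < 2:
        raise ValueError("need at least two steps to define a schedule")
    fraction = step / (n_steps - 1)
    return rmsd_initial + fraction * (rmsd_final - rmsd_initial)


def tmd_energy(rmsd_now, rmsd_star, k, n_atoms):
    """Assumed harmonic steering energy: 0.5 * (k / N) * (RMSD - RMSD*)^2."""
    return 0.5 * k / n_atoms * (rmsd_now - rmsd_star) ** 2
```

  • Because the energy depends only on the difference between the instantaneous and scheduled RMSD, the system is free to choose its own path toward the target as long as it keeps pace with the schedule.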
  • the two start structures for tMD were extracted from the replicated long cMD simulations (Table 1), based on the HNH domain closeness to the putative catalytic state modeled from the crystal structure of T4 endonuclease VII (Endo VII) complexed with a DNA Holliday junction (See below and FIG. 11 )(Biertumpfel et al., Nature 449, 616-U614, 2007).
  • the inventors did not employ the tMD end structures (i.e., at 100 ns) as the start points for Mg2+ introduction, given that the modeled target coordinates used in tMD do not necessarily represent a true catalytic state, and importantly, that the Mg2+ might assist further conformation change to bridge the distance gap for catalysis as the inventors previously demonstrated (Zuo and Liu, Sci. Rep. 5, 2016).
  • This consideration allowed for spontaneous adaptation of the system to the catalytic conformation, thereby eliminating the potential artifacts from tMD.
  • the inventors proceeded to perform a set of conventional simulations started from the derived catalytic state, in which the above placed Mg2+ was moved from the active center to the bulk solution (G7, Table 1).
  • PCA Principal Component Analysis
  • step 1 the scissile phosphate and flanking nucleotides in the T4 Endo VII system (2QNC) were aligned to the corresponding tDNA stretch in the Cas9 complex of the pre-catalytic state (5F9R).
  • step 2 Cas9 HNH domain was moved toward the tDNA with the transformation matrix calculated from the paired ββα motifs in the two nucleases, resulting in a model of the HNH domain docked at the cleavage site.
  • the equivalent residues between the above ββα motifs for transformation matrix calculation were determined based on topology-independent structure superposition by the CLICK algorithm (Nguyen et al., Nucleic Acids Res.
  • the backbone RMSD of HNH domain between the pre-catalytic Cas9 state (5F9R) and the modeled “active” state is 25 Å ( FIG. 11 f ).
  • step 3 the inventors repeated step 1 and step 2, replacing the crystal structure (5F9R) with snapshot structures from the sets of long cMD trajectories (G1 and G2, Table 1).
  • a modeled “active” state was obtained for every snapshot of the simulations.
  • the inventors calculated RMSD between the snapshot structure and its corresponding “active” state and used it as a metric to evaluate how close the snapshot conformation to its putative active state.
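  • The two operations used in steps 2 and 3, applying a rigid-body transformation and measuring RMSD against the resulting model, can be sketched as below. The rotation matrix and translation are assumed to come from a separate superposition step (not shown), the RMSD is computed without any additional fitting, and all names are illustrative.

```python
import math


def apply_transform(coords, rotation, translation):
    """Apply x' = R x + t to each (x, y, z) coordinate.

    rotation is a 3x3 nested sequence and translation a length-3 sequence,
    both assumed to be supplied by an external superposition step.
    """
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return moved


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    total = sum((a - b) ** 2
                for p, q in zip(coords_a, coords_b)
                for a, b in zip(p, q))
    return math.sqrt(total / len(coords_a))
```

  • Computing `rmsd(snapshot, apply_transform(snapshot, R, t))` for every snapshot then gives a per-frame distance to the modeled "active" pose, which is the kind of metric described above for ranking snapshots.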
  • the backbone RMSD differences for the HNH domain from the target structures are about 10 ⁇ , which are remarkably reduced as compared with that of 25 ⁇ if using the pre-catalytic crystal structure (5F9R) as the starting point. Accordingly, the tMD stating points were much closer to the corresponding end points in the subspace defined by the first two principal components with regard to the crystal structure ( FIG. 1 d ). Docking of the HNH domain toward the putative catalytic state inevitably brings about numerous steric clashes with the other components in the complex system ( FIG. 11 f ), indicating considerable conformational rearrangements in Cas9 must be implicated during the pre-catalytic to catalytic state transition.
  • the RMSD between the initial and target coordinates declined to ⁇ 0.8 ⁇ , indicating completion of the anticipated conformation change.
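
The RMSD metric used above to compare a snapshot with its modeled target can be sketched as follows. This is a minimal illustration only: it assumes the two coordinate sets are already superimposed on a common frame and list the same atoms in the same order, and the function names `rmsd` and `closest_snapshot` are hypothetical, not taken from the source.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points.

    Assumption: both structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def closest_snapshot(snapshots, targets):
    """Return (index, rmsd) of the snapshot that lies nearest to its own modeled target."""
    values = [rmsd(s, t) for s, t in zip(snapshots, targets)]
    best = min(range(len(values)), key=values.__getitem__)
    return best, values[best]
```

In practice the coordinates would be backbone atoms of the HNH domain extracted from trajectory frames; here they are plain tuples so the sketch stays self-contained.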
  • the inventors selected two structure snapshots near the end of tMD for subsequent cMD (G6, Table 1), in which one Mg2+ was introduced at the interface between the HNH domain and tDNA in the framework of the one-metal-ion mechanism ( FIG. 2 d ) (Yang, Q. Rev. Biophys. 44:1-93, 2011; Yang, Nat. Struct. Mol. Biol. 15:1228-31, 2008).
  • the inventors did not employ the tMD end structures (i.e., at 100 ns) as the start points for Mg 2+ introduction, given that the modeled target coordinates used in tMD do not necessarily represent a true catalytic state, and importantly, that the Mg 2+ may assist further conformation change to bridge the distance gap for catalysis as previously demonstrated with the RuvC domain (Zuo and Liu, Sci. Rep. 5, 2016).
  • This consideration allowed for spontaneous adaptation of the system to the catalytic conformation.
  • the deliberate building procedures minimize perturbation of the system and hence limit the potential artifacts of tMD, whose results are readily open to question.
  • the basic idea is as follows: (i) pre-define an a priori metric (or several if necessary) such as a distance, an angle or an RMSD; (ii) use this metric to track the conformational transition and screen for the structure closest to the expected target state; (iii) perform ensemble conventional MD simulations (cMD ens ) starting from the extracted structure; (iv) screen for the closest structure snapshot from the previous cMD ens and initiate a new cycle of ensemble simulations. Ideally, the simulations approach or even reach the target conformation after several or more cycles, depending on the height of the energetic barrier between the initial and target states and the sampling length accessible to each independent run.
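
The cycle described above (run an ensemble, keep the snapshot closest to the target, restart from it) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the inventors' protocol: a one-dimensional random walk stands in for a conventional MD run, and the names `run_replica` and `adaptive_search` and all parameter values are hypothetical.

```python
import random

def run_replica(start, n_steps, rng, step_size=0.5):
    """Toy stand-in for one conventional MD run: a 1D random walk.

    Returns the visited 'conformations' (floats), beginning with `start`.
    """
    traj = [start]
    x = start
    for _ in range(n_steps):
        x += rng.uniform(-step_size, step_size)
        traj.append(x)
    return traj

def adaptive_search(start, metric, n_cycles=5, n_replicas=8, n_steps=200, seed=0):
    """Repeat: run an ensemble from the current best structure, then keep the
    snapshot with the smallest metric as the seed for the next cycle."""
    rng = random.Random(seed)
    best = start
    history = [metric(best)]
    for _ in range(n_cycles):
        snapshots = []
        for _ in range(n_replicas):
            snapshots.extend(run_replica(best, n_steps, rng))
        # Every replica starts at `best`, so the metric can never get worse.
        best = min(snapshots, key=metric)
        history.append(metric(best))
    return best, history
```

Because the starting frame is always among the candidates, the metric is non-increasing from cycle to cycle; how quickly it falls depends on the barrier height and on the sampling length per run, as the text notes.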
  • the inventors used the geometric mean of the distances of +4P (the scissile phosphate) to the two active residues His840 and Asp861 ($\sqrt{d_{+4P\text{-}H840} \cdot d_{+4P\text{-}D861}}$) as a metric to monitor the HNH domain conformational change: the smaller this value, the closer to the target active state ( FIGS. 4 a and 4 b ). From the sets of long cMD trajectories (G1 and G2, Table 1), a structure bearing a minimum value of ~9 Å was extracted as the starting point for ensemble simulations ( FIG. 3 a ), where one Mg2+ was placed around the reaction center as done for the post tMD simulations.
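
As a minimal sketch of the geometric-mean metric just described, the following computes it for a single frame and picks the frame with the smallest value. Which atom represents each residue is not stated here, so the three input points are assumptions, and the function names are hypothetical.

```python
import math

def activation_metric(p_scissile, his840_atom, asp861_atom):
    """Geometric mean of the +4P-to-His840 and +4P-to-Asp861 distances.

    Each argument is an (x, y, z) tuple in angstroms; the choice of
    representative atom per residue is an assumption of this sketch.
    """
    d_h840 = math.dist(p_scissile, his840_atom)  # math.dist needs Python 3.8+
    d_d861 = math.dist(p_scissile, asp861_atom)
    return math.sqrt(d_h840 * d_d861)

def pick_start_frame(frames):
    """frames: list of (p_scissile, his840_atom, asp861_atom) triples.

    Returns (index, metric) of the frame with the smallest metric value.
    """
    values = [activation_metric(*frame) for frame in frames]
    best = min(range(len(values)), key=values.__getitem__)
    return best, values[best]
```

A geometric mean only becomes small when both distances shrink together, which suits a metric meant to track simultaneous approach of the two residues to the scissile phosphate.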
  • MM-GBSA Molecular Mechanics-Generalized Born Surface Area
  • Non-bonded Interaction Energy Calculation. The non-bonded interaction energies of the HNH ββα motif with the scissile phosphate and flanking nucleotides (+3P to +5P) were calculated by the software NAMD (version 2.12) (Phillips et al., J. Comput. Chem. 26:1781-1802, 2005), employing the same structural ensemble as mentioned above.
  • NAMD version 2.12
  • the truncation cutoff was set to 10 Å, consistent with that used in MD simulations.
  • the inventors have identified two states, the pseudo active state and the active state, using computational techniques. These two states have similar global conformations. The major distinction lies in the local conformation involving the residues N863 and D861.
  • the active state of the Cas9 HNH domain identified by computer modeling and simulations is responsible for the tDNA cleavage.
  • the inventors have performed site-directed mutagenesis experiments to validate this newly identified active state. Four single mutations (D837A, D839A, D861A, and N863A) plus one double mutation (D861A/N863A) were tested ( FIG. 15 ).
  • the combined experimental and computational data suggest that D839 and N863 are the essential residues for Cas9 activity by directly coordinating the catalytic Mg 2+ at the interface between the HNH domain and tDNA, validating the newly identified active state.
  • the initial model for the active Cas9 complex was constructed by replacing the α segment of the ββα-Me motif in the optimized catalytic Cas9 complex with the corresponding part in the Mg2+-bound apo-Cas9 structure (PDB code: 4CMP).
  • the catalytic Cas9 complex structure was taken from the above production simulation, as described in ¶ [137], near 100 ns (i.e., about half of the simulation time), and the Mg2+-bound apo-Cas9 structure from the simulation trajectory was selected based on the observation of reasonable bonding with the connecting residues and minimal steric clashes after replacement of the α segment.
  • the structural model was subjected to multi-stage equilibration: an initial 20-ns relaxation of the α segment and surrounding residues, another 20-ns equilibration with the inter-atomic distances within the metal center restrained relative to the T4 Endo VII system, followed by a 20-ns equilibration with the restraints gradually released. Subsequently, two independent replicas were performed (250 ns/run) under the same simulation conditions set for the pseudo-active system above.
  • Cas9 variants were designed and synthesized to test their activity and specificity (Table 5). The mutations designed in each variant followed a combination of five rationales: (1) weakening Cas9 binding affinity with tDNA; (2) weakening Cas9 binding affinity with ntDNA; (3) weakening Cas9 binding affinity with sgRNA; (4) raising the threshold energy for Cas9 HNH domain conformational activation; (5) destabilizing the formation of the Cas9 HNH domain active conformation.
  • Two variants include mutations designed on all of the five rationales. These two mutants are N588A/R765A/D835A/K1246A (Mut1.8) and N14A/R447A/R765A/S845D (Mut1.9) (Table 5, FIG. 16 a - 16 c ).
  • the gene-editing activities and specificity assays of these two tetramutant variants of Cas9 ( FIG. 16 a - 16 c ) were performed. Using HEK293T-EGFP cells, the above two tetramutants exhibit similar protein expression level and comparable gene-editing efficiency compared to the wild type Cas9 ( FIG. 16 a - 16 c ), indicating these two designed variants do not significantly alter the on-target activity.
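
Variant labels such as N588A/R765A/D835A/K1246A follow the usual wild-type residue, position, new residue notation. A minimal sketch of parsing such a label and applying it to a sequence is shown below. The helper names are hypothetical, and the short sequence in the usage note is a toy string, not SEQ ID NO:1.

```python
import re

# One substitution: wild-type letter, 1-based position, replacement letter.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_variant(label):
    """Split 'N588A/R765A' into [('N', 588, 'A'), ('R', 765, 'A')]."""
    mutations = []
    for token in label.split("/"):
        match = _MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        mutations.append((match.group(1), int(match.group(2)), match.group(3)))
    return mutations

def apply_variant(sequence, label):
    """Apply the substitutions to a 1-indexed protein sequence.

    Raises ValueError if a position is out of range or the wild-type
    residue in the label does not match the sequence.
    """
    residues = list(sequence)
    for wild_type, position, replacement in parse_variant(label):
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at {position}, found {residues[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)
```

For example, `apply_variant("MDKN", "D2A/N4A")` returns `"MAKA"`. Checking the wild-type residue before substituting catches labels that were numbered against a different reference sequence.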

Abstract

Certain embodiments are directed to modified or variant Cas9 proteins, and/or methods of using the same.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. patent application Ser. No. 16/645,254 filed Mar. 6, 2020, which is a national phase under 35 U.S.C. § 371 of International Application No. PCT/US2018/050279 filed Mar. 14, 2019, which claims priority to U.S. Provisional Application No. 62/555,873 filed Sep. 8, 2017. Each disclosure is incorporated herein by reference in its entirety.
  • SEQUENCE LISTING
  • The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety.
  • BACKGROUND OF THE INVENTION
  • A. Field of the Invention
  • The invention generally concerns an engineered Cas9 protein and method for producing and/or using the same.
  • B. Description of Related Art
  • The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) system from Streptococcus pyogenes has recently been repurposed as a powerful and versatile genome-editing toolbox used in various living cells and organisms, demonstrating an enormous potential toward future therapeutic applications (Jiang and Doudna, Annu. Rev. Biophys., 2017; Charpentier and Doudna, Nature 495, 50-51, 2013; Mali et al. Science 339, 823-26, 2013; Cong et al. Science 339, 819-23, 2013). Guided by a chimeric single-guide RNA (sgRNA), the endonuclease Cas9 generates site-specific breaks in the double-stranded DNA (dsDNA) target (Jinek et al. Science 337, 816-21, 2012; Gasiunas et al. Proc. Natl. Acad. Sci. U.S.A. 109, E2579-86, 2012). Recognition and cleavage of dsDNA strictly require the presence of a protospacer adjacent motif (PAM) in the non-target DNA strand (ntDNA) and depend on the base-pair complementarity of the target DNA strand (tDNA) to the RNA guide template (Jinek et al. Science 337, 816-21, 2012; Gasiunas et al. Proc. Natl. Acad. Sci. U.S.A. 109, E2579-86, 2012). Cas9 adopts an overall bi-lobed architecture, in which the sgRNA:tDNA heteroduplex resides within the central channel between the α-helical recognition (REC) and nuclease (NUC) lobes, while the displaced ntDNA threads into a side channel within the NUC lobe (FIG. 7 ) (Jiang et al. Science 351, 867-71, 2016; Jiang et al. Science 348, 1477-81, 2015; Nishimasu et al. Cell 156, 935-49, 2014; Anders et al. Nature 513, 569-73, 2014). The NUC lobe comprises two metal-ion-dependent nuclease domains, dubbed HNH and RuvC, which are responsible for cutting the tDNA (via one-metal-ion mechanism) (Yang, Q. Rev. Biophys. 44, 1-93, 2011; Yang, Nat. Struct. Mol. Biol. 15, 1228-31, 2008) and ntDNA (via two-metal-ion mechanism) (Yang, Q. Rev. Biophys. 44, 1-93, 2011; Yang, Nat. Struct. Mol. Biol. 15, 1228-31, 2008; Yang et al., Mol. Cell 22, 5-13, 2006), respectively.
  • Capturing catalytic metal ion-containing nuclease/substrate complexes has been nontrivial for experimental means like X-ray crystallography and NMR spectroscopy, as the reaction generally occurs instantly (Yang et al., Mol. Cell 22, 5-13, 2006). It is thus not surprising that none of the Cas9 crystal structures in different binding forms solved over the past few years assumes a fully active state for either RuvC or HNH domain (Jiang et al. Science 351, 867-71, 2016; Jiang et al. Science 348, 1477-81, 2015; Nishimasu et al. Cell 156, 935-49, 2014; Anders et al. Nature 513, 569-73, 2014; Jinek et al. Science 343, 1247997, 2014).
  • In the inventors' recent work, using molecular dynamics simulations, the catalytically competent state of RuvC domain primed for cleaving the ntDNA was reported (Zuo and Liu, Sci. Rep. 5, 2016). However, the inventors were unable to capture the catalytic conformation of the HNH domain for cleaving the tDNA in the previous study (Zuo and Liu, Sci. Rep. 5, 2016). In contrast with the RuvC domain, the active center of HNH domain is surprisingly distant from the scissile phosphate on the tDNA in all available structures (Jiang et al. Science 351, 867-71, 2016; Jiang et al. Science 348, 1477-81, 2015; Nishimasu et al. Cell 156, 935-49, 2014; Anders et al. Nature 513, 569-73, 2014), with a separation of ˜13 Å in the complete DNA duplex bound pre-catalytic state (FIG. 7 a-7 b ) to ˜46 Å in the RNA-only bound inactive state. In this respect, how to obtain a reliable catalytic state of Cas9 HNH domain has been of special focus to the experimental biologists and the computational biophysicists, as this structure can bridge one important missing link in understanding the Cas9 binding, activation and cleavage mechanism and guide structure-based Cas9 engineering with enhanced specificity (Slaymaker et al. Science 351, 84-88, 2016; Kleinstiver et al. Nature 529, 490-95, 2016). A most recent single-molecule Förster resonance energy transfer (smFRET) study suggested that divalent metal ions are necessary for Cas9 conformational activation toward catalysis (Dagdas et al. bioRxiv, 122242, 2017). At the atomic level, however, how the metal ions aid HNH domain transition to the catalytic state remains elusive.
  • The knowledge of structure and dynamics of the catalytic state of HNH domain is critical for Cas9 specificity improvement. The off-target effects pose a major challenge for Cas9-mediated genome-editing applications requiring a high level of precision. Remarkably, a recent study found that CRISPR-Cas9 induced an unexpectedly high number of new mutations in a mouse model of gene therapy, involving thousands of single-nucleotide variants (SNVs) and hundreds of insertions and deletions (indels) (Schaefer et al. Nat. Methods 14, 547-548, 2017). Therefore, much effort is needed to increase the fidelity of CRISPR-Cas9 with regard to off-target mutation generation, especially in the clinical setting (Schaefer et al. Nat. Methods 14, 547-548, 2017). Recently, two works proposed that Cas9-guide RNA possesses more energy than needed for optimal recognition of its intended target sequence, thereby enabling cleavage at mismatched off-target sites (Slaymaker et al. Science 351, 84-88, 2016; Kleinstiver et al. Nature 529, 490-95, 2016). Based on the inactive structure of Cas9-sgRNA complex with a partial dsDNA target (Anders et al. Nature 513, 569-573, 2014), several high-fidelity Cas9 variants have been designed and validated for elimination of off-target effects, demonstrating the structure-guided Cas9 engineering as a robust strategy for specificity improvement (Slaymaker et al. Science 351, 84-88, 2016; Kleinstiver et al. Nature 529, 490-95, 2016). Given that all the previous efforts were based on an inactive structure, structural information of other Cas9 conformational states, especially the catalytic state, could enable further optimization of the CRISPR-Cas9 genome-editing toolbox.
  • SUMMARY OF THE INVENTION
  • The Cas9 variants of the current invention provide a solution to the off-target/fidelity problems associated with native and current Cas9 variants. In particular aspects, the amino acid variants are in the HNH domain region of Cas9. By way of example, the inventors have discovered a process to model the structure of Cas9 in an appropriate active state, which results in the identification and design of additional variants of Cas9 having appropriate activity that enhance fidelity. Without wishing to be bound by theory, it is believed that the use of these additional variants alone or in combination with other variants results in a high fidelity Cas9 protein for use in genetic engineering methods.
  • Molecular dynamics (MD) is a powerful computer simulation method and has been proven to be especially useful for elucidating the structure-function relationships of biological macromolecules (Shaw et al. Science 330, 341-46, 2010). With two distinct MD simulation techniques, the inventors show a cross-validated catalytically active state of Cas9 HNH nuclease domain not amenable to experiments. Meanwhile, the inventors demonstrate at the atomic level the roles of Mg2+ for formation and stability of the catalytic state. The derived catalytic model provides novel valuable structure information that can be exploited for rational engineering of high-fidelity Cas9 variants.
  • Generally, it has been assumed that the enhanced specificity conferred on Cas9 by site-specific mutations stems from reduced binding affinities for the off-target sites. In this invention, the inventors propose that mutations designed for attenuating the activation of Cas9 HNH nuclease domain could also be employed for improving the Cas9 targeting accuracy, given the observation that HNH domain undergoes a substantial rotation of ˜180 degrees during the inactive to active state transition. Thus, the Cas9 residues (except the HNH domain) forming non-specific contacts with the HNH domain or the HNH domain residues forming non-specific contacts with other Cas9 domains and/or nucleic acids (target DNA and/or gRNA) comprise the additional promising mutation sites for rational Cas9 engineering. From a physicochemical perspective, these amino acid substitutions raise the threshold energy underlying HNH conformational activation against the off-target substrates, thereby requiring more stringent Watson-Crick base pair complementarity.
  • Remarkably, the concept described herein expands the mutation range and mutation types for Cas9. For instance, the residues beyond the previously identified DNA-binding regions can be considered for modifications. Hence, the residues of interest are no longer limited to the polar and positively charged types. In some embodiments here, the Cas9 variants contain alterations to the acidic residues, and also, the substitutions are not limited to alanine, depending on design needs. In certain aspects the substitution can be one or more of alanine (Ala, A), arginine (Arg, R), asparagine (Asn, N), aspartic acid (Asp, D), cysteine (Cys, C), glutamic acid (Glu, E), glutamine (Gln, Q), glycine (Gly, G), histidine (His, H), isoleucine (Ile, I), leucine (Leu, L), lysine (Lys, K), methionine (Met, M), phenylalanine (Phe, F), proline (Pro, P), serine (Ser, S), threonine (Thr, T), tryptophan (Trp, W), tyrosine (Tyr, Y), or valine (Val, V) in place of the native amino acid.
  • In certain embodiments, the spCas9 variants comprise one, two, three, four or more simultaneous mutations at the following positions of SEQ ID NO:1: T13, N14, S15, S55, T58, E60, R63, R66, T67, R70, R71, Y72, R74, R78, Y136, K163, R165, H167, S217, K218, S219, E223, N235, K234, D261, K263, Q265, S267, K268, T249, N251, T270, E370, E371, E396, Q402, R403, T404, D406, N407, S409, H415, R447, Y450, Y451, R461, R494, T496, N497, K500, K510, Y515, T519, N522, K526, K528, K558, S581, E584, D585, R586, N588, T624, Y656, T657, R661, N692, Q695, H698, S730, K734, R765, N767, Q768, T769, T770, Q771, K772, Q774, K775, N776, S777, R778, E779, R780, K782, R783, N803, Q805, Q807, K810, Y812, D829, N831, R832, S834, D835, Q844, S845, K848, R859, K862, R864, K866, K890, T893, Q894, R895, D898, N899, K902, K913, K918, R919, Q920, T924, R925, Q926, T928, K929, H930, S960, K961, S964, K968, R976, H982, H983, Y1013, K1031, T1033, S1106, K1107, S1109, Y1237, Y1242, K1244, and/or K1246.
  • Certain embodiments are directed to modified or variant Cas9 proteins. The modified Cas9 protein comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23 modifications including one or more modification or variant corresponding to Thr58, Glu60, Glu223, Glu396, Glu370, Glu371, Asp406, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1. In certain aspects the modified Cas9 protein has at least two amino acid modifications. The modified Cas9 protein can further comprise one or more modification that includes modification of Asn14, Lys268, Glu370, Arg447, Tyr450, Asn497, Lys500, Lys526, Lys528, Lys558, Asn588, Arg661, Asn692, Gln695, Arg780, Arg783, Asn803, Gln805, Lys810, Tyr812, Asp829, Asn831, Arg832, Asp835, Gln844, Lys848, Lys862, Arg925, Gln926, Lys929, His930, Lys961, Lys968, Tyr1013, Lys1031, Lys1244, or Lys1246 corresponding to SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Thr58 of SEQ ID NO:1 in combination with one or more modification corresponding to Glu60, Glu223, Glu396, Glu370, Glu371, Asp406, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Glu60 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu223, Glu396, Glu370, Glu371, Asp406, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Glu223 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu396, Glu370, Glu371, Asp406, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Glu396 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Glu371, Asp406, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Glu370 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu371, Asp406, Glu396, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Glu371 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Asp406 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Glu371, Glu396, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Glu584 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Asp585 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Arg586 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Arg765 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Asn767 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Arg778 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Glu779 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Ser845 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Gln844 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Arg859 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Arg780 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg783, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Arg783 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Asn803, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Asn803 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Gln807, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Gln807 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Tyr812, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Tyr812 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Arg864, Lys866, or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Lys866 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864 or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Arg864 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Lys866 or Lys918 of SEQ ID NO:1.
  • In certain embodiments the modified Cas9 protein has a modification corresponding to Lys918 of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58, Glu60, Glu223, Glu370, Asp406, Glu396, Glu371, Glu584, Asp585, Arg586, Arg765, Asn767, Arg778, Glu779, Ser845, Gln844, Arg859, Arg780, Arg783, Asn803, Gln807, Tyr812, Arg864 or Lys866 of SEQ ID NO:1.
  • The modification can be any amino acid other than the amino acid present in a corresponding position in SEQ ID NO:1. In a further aspect the modification can be an alanine, glycine, lysine, arginine, aspartic acid, or glutamic acid substitution. In certain aspects the modification can be of SEQ ID NO:1 in combination with one or more modification corresponding to Thr58Lys, Thr58Arg, Glu60Ala, Glu223Ala, Glu370Ala, Asp406Ala, Glu396Ala, Glu371Ala, Glu584Ala, Asp585Ala, Arg586Ala, Arg765Ala, Asn767Ala, Arg778Ala, Glu779Ala, Ser845Asp, Gln844Glu, Arg859Ala, Arg780Ala, Arg783Ala, Asn803Ala, Gln807Ala, Tyr812Ala, Lys918Ala, Arg864Ala or Lys866Ala modification corresponding to SEQ ID NO:1.
  • In certain embodiments, the spCas9 variants include, but are not limited to, the following combination of mutations: N588A/R765A/N767A; N588A/Q695A/R765A/N767A; N588A/N692A/R765A/N767A; N588A/N692A/R765A/R925A; N588A/N692A/N767A/R925A; N692A/R765A/N767A/R925A; Q695A/R765A/N767A/R925A; N588A/N692A/R765A/K929A; N588A/N692A/N767A/K929A; N692A/R765A/N767A/K929A; Q695A/R765A/N767A/K929A; N497A/Q695A/R765A/N767A; K526A/K528A/N497A/Q926A; K526A/K528A/K929A; K526A/R765A/N767A/Y1013A; K528A/R765A/N767A/Y1013A; K526A/R765A/N767A/Q926A; N497A/K526A/R765A/N767A; N497A/K528A/R765A/N767A; N497A/K526A/R765A/Q926A; N497A/K528A/R765A/Q926A; N588A/R765A/N767A/S845D; N588A/R765A/N767A/R832A; N588A/R765A/N767A/K862A; N588A/R765A/N767A/K866A; N588A/R765A/N767A/R859A; N588A/R765A/N767A/Q844A; N588A/R765A/N767A/K810A; N588A/R765A/N767A/K848A; N588A/R765A/N767A/E370A; N588A/R765A/N767A/E223A; N497A/N692A/K1031A/S845D; N497A/N692A/K1031A/R832A; N497A/N692A/K1031A/K862A; N497A/N692A/K1031A/K866A; N497A/N692A/K1031A/R859A; N497A/N692A/K1031A/Q844A; N497A/N692A/K1031A/K810A; N497A/N692A/K1031A/K848A; N497A/N692A/K1031A/E370A; N497A/N692A/K1031A/E223A; N497A/N695A/K1031A/S845D; N497A/N695A/K1031A/R832A; N497A/N695A/K1031A/K862A; N497A/N695A/K1031A/K866A; N497A/N695A/K1031A/R859A; N497A/N695A/K1031A/Q844A; N497A/N695A/K1031A/K810A; N497A/N695A/K1031A/K848A; N497A/N695A/K1031A/E370A; N497A/N695A/K1031A/E223A; K526A/N695A/K1031A/S845D; K526A/N695A/K1031A/R832A; K526A/N695A/K1031A/K862A; K526A/N695A/K1031A/K866A; K526A/N695A/K1031A/R859A; K526A/N695A/K1031A/Q844A; K526A/N695A/K1031A/K810A; K526A/N695A/K1031A/K848A; K526A/N695A/K1031A/E370A; K526A/N695A/K1031A/E223A; K528A/N695A/K1031A/S845D; K528A/N695A/K1031A/R832A; K528A/N695A/K1031A/K862A; K528A/N695A/K1031A/K866A; K528A/N695A/K1031A/R859A; K528A/N695A/K1031A/Q844A; K528A/N695A/K1031A/K810A; K528A/N695A/K1031A/K848A; K528A/N695A/K1031A/E370A; K528A/N695A/K1031A/E223A; N692A/R765A/Y1013A; N692A/R765A/S845D/Y1013A; N692A/R765A/R832A/Y1013A; 
N692A/R765A/K862A/Y1013A; N692A/R765A/K866A/Y1013A; N692A/R765A/R859A/Y1013A; N692A/R765A/Q844A/Y1013A; N692A/R765A/K810A/Y1013A; N692A/R765A/K848A/Y1013A; N692A/R765A/E370A/Y1013A; N692A/R765A/E223A/Y1013A; N692A/R765A/Y1013A; N692A/Q695A/K810A/Y1013A; N692A/Q695A/K848A/Y1013A; K526A/K528A/Y1013A; K526A/K528A/K268A/Y1013A; R447A/K526A/K528A/Y1013A; R765A/K929A/H930A; R765A/K929A/S845D/Y1013A; R765A/K929A/R832A/Y1013A; R765A/K929A/K862A/Y1013A; R765A/K929A/K866A/Y1013A; R765A/K929A/R859A/Y1013A; R765A/K929A/Q844A/Y1013A; R765A/K929A/K810A/Y1013A; R765A/K929A/K848A/Y1013A; R765A/K929A/E370A/Y1013A; R765A/K929A/E223A/Y1013A; R765A/Q926A/K929A/H930A; R447A/K500A/R661A; K500A/N695A/K929A/S845D; K500A/N695A/K929A/R832A; K500A/N695A/K929A/K862A; K500A/N695A/K929A/K866A; K500A/N695A/K929A/R859A; K500A/N695A/K929A/Q844A; K500A/N695A/K929A/K810A; K500A/N695A/K929A/K848A; K500A/N695A/K929A/E370A; K500A/N695A/K929A/E223A; R765A/R925/Q926A; R765A/R925/Q926/Y1013A; N14A/K961A/K968A; N14A/K961A/K968A/S845D; N14A/K961A/K968A/K848A; R447A/R765A/Y1013A; K526A/N588A/R765A/N767A; N588A/K929A/H930A/Y1013A; R447A/K526A/K929A; N588A/N767A/Y1013A/K866A; N588A/N767A/Y1013A/S845D; K268A/K526A/N588A/N767A; N14A/K526A/K866A/K1246A; N14A/R447A/Y1013A/K1246A; N588A/R765A/D835A/K1246A; N14A/R447A/R765A/S845D; K1244A/K1246A/K848A; K1244A/K1246A/K810A; K1244A/K1246A/R832A; K1244A/K1246A/K862A; K1244A/K1246A/K866A; K1244A/K1246A/R859A; K1244A/K1246A/E370A; K1244A/K1246A/E223A; K1244A/K1246A/S845D; K1244A/K1246A/Q844A; K1244A/K1246A/Q844A/K1031A; K1244A/K1246A/Q844A/Y1013A; K1244A/K1246A/Q844A/N695A; K1244A/K1246A/Q844A/N692A; K1244A/K1246A/Q844A/N588A; K1244A/K1246A/Q844A/N767A; K1244A/K1246A/Q844A/Q926A; K268A/R447A/Y450A/K1031A; K268A/R447A/Y450A/Y1013A; K268A/R447A/Y450A/N695A; K268A/R447A/Y450A/N692A; K268A/R447A/Y450A/N588A; K268A/R447A/Y450A/N767A; K268A/R447A/Y450A/Q926A; N14A/K268A/R447A/Y450A; N14A/Y450A/K526A/K528A; N14A/Y450A/R765A/S845D; N14A/Y450A/R765A/R832A; N14A/Y450A/R765A/K862A; 
N14A/Y450A/R765A/K866A; N14A/Y450A/R765A/R859A; N14A/Y450A/R765A/Q844A; N14A/Y450A/R765A/K810A; N14A/Y450A/R765A/K848A; N14A/Y450A/R765A/E370A; N14A/Y450A/R765A/E223A; R447A/Y450A/R765A/S845D; R447A/Y450A/R765A/R832A; R447A/Y450A/R765A/K862A; R447A/Y450A/R765A/K866A; R447A/Y450A/R765A/R859A; R447A/Y450A/R765A/Q844A; R447A/Y450A/R765A/K810A; R447A/Y450A/R765A/K848A; R447A/Y450A/R765A/E370A; R447A/Y450A/R765A/E223A; K268A/R447A/R765A/S845D; K268A/R447A/R765A/R832A; K268A/R447A/R765A/K862A; K268A/R447A/R765A/K866A; K268A/R447A/R765A/R859A; K268A/R447A/R765A/Q844A; K268A/R447A/R765A/K810A; K268A/R447A/R765A/K848A; K268A/R447A/R765A/E370A; K268A/R447A/R765A/E223A; Q805A/D829A/N831A/D835A; R765A/D829A/D835A/Y1013A; R918A/D829A/D835A/Y1013A; R895A/D829A/D835A/Y1013A; K500A/D829A/D835A/Y1013A; K929A/D829A/D835A/Y1013A; R780A/D829A/D835A/Y1013A; R783A/D829A/D835A/Y1013A; R765A/D829A/D835A/N695A; R918A/D829A/D835A/N695A; R895A/D829A/D835A/N695A; K500A/D829A/D835A/N695A; K929A/D829A/D835A/N695A; R780A/D829A/D835A/N695A; R783A/D829A/D835A/N695A; N695A/R780A/R783A/S845D; N695A/R780A/R783A/R832A; N695A/R780A/R783A/K862A; N695A/R780A/R783A/K866A; N695A/R780A/R783A/R859A; N695A/R780A/R783A/Q844A; N695A/R780A/R783A/K810A; N695A/R780A/R783A/K848A; N695A/R780A/R783A/E370A; N695A/R780A/R783A/E223A; N692A/R780A/R783A/S845D; N692A/R780A/R783A/R832A; N692A/R780A/R783A/K862A; N692A/R780A/R783A/K866A; N692A/R780A/R783A/R859A; N692A/R780A/R783A/Q844A; N692A/R780A/R783A/K810A; N692A/R780A/R783A/K848A; N692A/R780A/R783A/E370A; N692A/R780A/R783A/E223A; N692A/R780A/N803A/S845D; N692A/R780A/N803A/R832A; N692A/R780A/N803A/K862A; N692A/R780A/N803A/K866A; N692A/R780A/N803A/R859A; N692A/R780A/N803A/Q844A; N692A/R780A/N803A/K810A; N692A/R780A/N803A/K848A; N692A/R780A/N803A/E370A; N692A/R780A/N803A/E223A; N692A/R783A/N803A/S845D; N692A/R783A/N803A/R832A; N692A/R783A/N803A/K862A; N692A/R783A/N803A/K866A; N692A/R783A/N803A/R859A; N692A/R783A/N803A/Q844A; N692A/R783A/N803A/K810A; 
N692A/R783A/N803A/K848A; N692A/R783A/N803A/E370A; N692A/R783A/N803A/E223A; N695A/R783A/N803A/S845D; N695A/R783A/N803A/R832A; N695A/R783A/N803A/K862A; N695A/R783A/N803A/K866A; N695A/R783A/N803A/R859A; N695A/R783A/N803A/Q844A; N695A/R783A/N803A/K810A; N695A/R783A/N803A/K848A; N695A/R783A/N803A/E370A; N695A/R783A/N803A/E223A; N695A/R783A/Y812A/S845D; N695A/R783A/Y812A/R832A; N695A/R783A/Y812A/K862A; N695A/R783A/Y812A/K866A; N695A/R783A/Y812A/R859A; N695A/R783A/Y812A/Q844A; N695A/R783A/Y812A/K810A; N695A/R783A/Y812A/K848A; N695A/R783A/Y812A/E370A; N695A/R783A/Y812A/E223A; K500A/N588A/S845D/Y1013A; K500A/N588A/R832A/Y1013A; K500A/N588A/K862A/Y1013A; K500A/N588A/K866A/Y1013A; K500A/N588A/R859A/Y1013A; K500A/N588A/Q844A/Y1013A; K500A/N588A/K810A/Y1013A; K500A/N588A/K848A/Y1013A; K500A/N588A/E370A/Y1013A; K500A/N588A/E223A/Y1013A; K500A/N588A/S845D/Y1013A; N588A/N692A/K1244A/K1246A; R447A/R765A/N497A; R447A/R765A/K929A; R447A/R765A/N767A; R447A/R765A/N767A/K558A; R447A/R765A/N767A/R586A; R447A/R765A/N767A/K1244A; R447A/R765A/N767A/K1246A; R447A/R765A/N767A; R447A/N695A/R765A/N767A; R447A/R765A/N695A/K558A; R447A/R765A/N695A/R586A; R447A/R765A/N695A/K1244A; R447A/R765A/N695A/K1246A; R447A/R765A/N767A/K1246A; R447A/N695A/R765A/N767A; R447A/R765A/N695A/K558A; R447A/R765A/N695A/R586A; R447A/R765A/N695A/K1244A; R447A/R765A/N695A/K1246A; R447A/N692A/R765A/N767A; R447A/R765A/N692/K558A; R447A/R765A/N692/R586A; R447A/R765A/N692/K1244A; or R447A/R765A/N692/K1246A.
  • Certain embodiments are directed to a modified Cas9 protein having the Cas9 modification selected from K526A/N588A/R765A/N767A; N588A/K929A/H930A/Y1013A; R447A/K526A/K929A; N588A/N767A/Y1013A/K866A; N588A/N767A/Y1013A/S845D; K268A/K526A/N588A/N767A; N14A/K526A/K866A/K1246A; N14A/R447A/Y1013A/K1246A; N588A/R765A/D835A/K1246A; or N14A/R447A/R765A/S845D. In particular aspects the Cas9 modification is N588A/R765A/D835A/K1246A or N14A/R447A/R765A/S845D.
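  • The slash-separated labels used above (for example, N588A/R765A/D835A/K1246A) follow the usual convention of wild-type residue, 1-based position, and replacement residue. The following is a minimal sketch, not part of the disclosed methods, of how such a label could be parsed and applied to a reference sequence. The function names and the short toy sequence are hypothetical; the toy sequence is not SEQ ID NO:1.

```python
import re

# One substitution such as "N588A": wild-type residue, 1-based position, new residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variant(label):
    """Split a label like 'N588A/R765A' into (wild_type, position, new_residue) tuples."""
    substitutions = []
    for token in label.split("/"):
        match = _SUBSTITUTION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        substitutions.append((match.group(1), int(match.group(2)), match.group(3)))
    return substitutions


def apply_variant(sequence, label):
    """Return a copy of `sequence` carrying every substitution named in `label`."""
    residues = list(sequence)
    for wild_type, position, new_residue in parse_variant(label):
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = new_residue
    return "".join(residues)


# Hypothetical toy reference (NOT SEQ ID NO:1), used only to show the mechanics.
toy_reference = "MNKRDE"
print(parse_variant("N588A/R765A/D835A/K1246A"))
print(apply_variant(toy_reference, "N2A/R4A"))
```

  • Checking the stated wild-type residue against the reference before substituting guards against numbering mismatches between a label and the sequence it is applied to.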
  • The modified Cas9 protein can be coupled or fused with a heterologous polypeptide or peptide. In certain aspects the modified Cas9 protein can include a nuclear localization signal, a cell penetrating amino acid sequence, or an affinity tag.
  • In certain aspects the modified Cas9 protein is a modified Streptococcus pyogenes Cas9 protein. In a further aspect the modified Cas9 protein can be 70, 75, 80, 85, 90, 95, 96, 97, 98, or 99% identical to SEQ ID NO:1, while retaining at least some of the Cas9 function of the protein of SEQ ID NO:1. The modified Cas9 protein can have at least 20, 30, 40, 50, 60, 70, 80, or 90% fewer off-target events as compared to non-modified Cas9. Furthermore, the modified Cas9 protein can cleave at least 60, 65, 70, 75, 80, 85, 90, 95, to 100%, including all values and ranges there between, of the target sites as compared to non-modified Cas9, thus maintaining sufficient activity. The modified Cas9 protein can have a frequency of off-site events that is at least 20, 30, 40, 50, 60, 70, 80, or 90% lower than that of non-modified Cas9. The specificity (fidelity) and cleavage activity of a Cas9 variant are quantified relative to the wild-type protein. A gRNA targets a specific gene sequence, and therefore there is a certain number of known off-target sequences. The native Cas9/gRNA complex is able to cleave the target DNA and all the off-target DNA sequences. The modified Cas9 protein reduces the cleavage of the off-target DNA sequences. The specificity (fidelity) can be determined by measuring the number of off-target cleavage events: the lower the number of off-target site cleavages, the higher the specificity (fidelity). For example, if a designed Cas9 mutant yields cleavage at only 10% of the off-target sites compared to the wild-type protein, meaning 90% fewer off-target events, the gene editing specificity can be regarded as improved by 90%. The on-target activities of Cas9 proteins can be assessed using the human cell-based enhanced GFP (EGFP) disruption assay. For example, if the wild-type Cas9 guided by a fully matched gRNA induces 90% EGFP disruption, a Cas9 variant exhibiting a disruption percentage around that value (80% or 95%, for example) is considered to possess wild-type or near wild-type cleavage efficiency. In certain aspects of the invention, the criterion of >70% of wild-type activity is used for screening potential Cas9 variants for subsequent tests on a whole-genome level.
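  • The arithmetic behind the two screening measures described above can be sketched as follows. This is an illustrative calculation only; the function names are hypothetical, and the input numbers simply restate the examples given in the text (cleavage at 10% of the off-target sites, 90% wild-type EGFP disruption, and the >70% activity criterion).

```python
def off_target_reduction(variant_sites, wild_type_sites):
    """Percent fewer off-target cleavage events for a variant relative to wild type."""
    if wild_type_sites <= 0:
        raise ValueError("wild type must cleave at least one off-target site")
    return 100.0 * (wild_type_sites - variant_sites) / wild_type_sites


def relative_activity(variant_disruption, wild_type_disruption):
    """On-target EGFP disruption of a variant as a percentage of wild type."""
    if wild_type_disruption <= 0:
        raise ValueError("wild-type disruption must be positive")
    return 100.0 * variant_disruption / wild_type_disruption


def passes_screen(variant_disruption, wild_type_disruption, threshold=70.0):
    """Apply the >70%-of-wild-type activity criterion named in the text."""
    return relative_activity(variant_disruption, wild_type_disruption) > threshold


# A variant cleaving 1 of 10 known off-target sites has 90% fewer off-target events.
print(off_target_reduction(1, 10))
# With wild type at 90% EGFP disruption, a variant at 80% retains about 89% activity.
print(round(relative_activity(80.0, 90.0), 1), passes_screen(80.0, 90.0))
```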
  • Certain embodiments are directed to a fusion protein comprising the modified Cas9 protein fused to a heterologous peptide or protein, with an optional intervening linker.
  • Other embodiments are directed to an expression cassette encoding the modified Cas9 protein or fusion protein comprising the modified Cas9 protein.
  • Still other embodiments are directed to an expression vector comprising the expression cassette encoding the modified Cas9 protein or fusion protein comprising the modified Cas9 protein.
  • Certain embodiments are directed to a host cell expressing an expression cassette of the invention. In certain aspects the host cell is an isolated host cell or a host in culture.
  • Other embodiments are directed to a host cell comprising a modified Cas9 protein described herein.
  • Certain embodiments are directed to methods of using such a modified Cas9 protein. Certain aspects include methods of altering the genome of a cell, the method comprising expressing in the cell or contacting the cell with the modified Cas9 protein described herein. In a further aspect the modified Cas9 protein is linked to a guide RNA having a region complementary to a selected portion of the genome of the cell. The method results in the alteration of the genome of the cell.
  • Other embodiments are directed to an active state model of the HNH domain of Cas9 comprising a divalent cation at the interface of a ββα motif and a scissile phosphate. In certain aspects the divalent cation is Mg, Mn, Ca, or Co.
  • Still other embodiments are directed to methods of modeling an active state of a Cas9 HNH domain. The methods can comprise at least the steps of (a) aligning a scissile phosphate and flanking nucleotides of a T4 Endo VII system (2QNC) to the corresponding tDNA stretch in the Cas9 complex of the pre-catalytic state (5F9R); (b) calculating a tDNA transformation matrix from the paired ββα motifs in the two nucleases, resulting in a model of the HNH domain docked at the cleavage site; (c) repeating (a) and (b), replacing the crystal structure (5F9R) with snapshot structures from the sets of long cMD trajectories; (d) replacing the α segment of the ββα-Me motif in the optimized Cas9 complex from (c) with the corresponding part in the Mg2+-bound apo-Cas9 structure (4CMP); and (e) performing long cMD simulations to obtain the active state of Cas9.
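  • Steps (a) and (b) rely on deriving a rigid-body transformation from paired atoms and applying it to a larger set of coordinates. The specification does not state which fitting procedure was used, so the sketch below substitutes the Kabsch least-squares superposition as one common choice. It assumes NumPy is available; the function names and the toy coordinates are hypothetical and are not taken from 2QNC or 5F9R.

```python
import numpy as np


def kabsch_transform(mobile, target):
    """Return rotation R and translation t that best map `mobile` onto `target`.

    Both inputs are (N, 3) arrays of paired atom coordinates. The fit minimizes
    the RMSD of R @ x + t against the target points (Kabsch algorithm).
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    # Covariance between the centered point sets.
    covariance = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(covariance)
    # Flip the last axis if needed so the result is a proper rotation.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    translation = target_center - rotation @ mobile_center
    return rotation, translation


def apply_transform(coords, rotation, translation):
    """Apply the fitted transformation to any (M, 3) coordinate array."""
    return np.asarray(coords, dtype=float) @ rotation.T + translation


# Toy paired atoms: the "target" is the "mobile" set rotated 30 degrees about z and shifted.
angle = np.deg2rad(30.0)
true_rotation = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                          [np.sin(angle), np.cos(angle), 0.0],
                          [0.0, 0.0, 1.0]])
mobile_atoms = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)
target_atoms = mobile_atoms @ true_rotation.T + np.array([1.0, -2.0, 0.5])

rotation, translation = kabsch_transform(mobile_atoms, target_atoms)
print(np.allclose(apply_transform(mobile_atoms, rotation, translation), target_atoms))
```

  • In a workflow of this kind, the transformation would be fitted on the paired motif atoms only and then applied to the coordinates of the whole domain being docked.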
  • Other embodiments are directed to methods of altering a double stranded DNA (dsDNA) molecule, the method comprising contacting the dsDNA molecule with the modified Cas9 protein described herein. The modified Cas9 protein can be linked to a guide RNA having a region complementary to a selected portion of the dsDNA molecule, resulting in the alteration of the dsDNA molecule.
  • Other embodiments of the invention are discussed throughout this application. Any embodiment discussed with respect to one aspect of the invention applies to other aspects of the invention as well and vice versa. Each embodiment described herein is understood to be embodiments of the invention that are applicable to all aspects of the invention. It is contemplated that any embodiment discussed herein can be implemented with respect to any method or composition of the invention, and vice versa.
  • The terms “polypeptide”, “protein”, and “peptide”, which are used interchangeably herein, refer to a polymer of the protein amino acids, or amino acid analogs, regardless of its size or function. Although “protein” is often used in reference to relatively large polypeptides, and “peptide” is often used in reference to small polypeptides, usage of these terms in the art overlaps and varies. The term “polypeptide” as used herein refers to peptides, polypeptides, and proteins, unless otherwise noted. The terms “protein”, “polypeptide”, and “peptide” are used interchangeably herein when referring to a gene product. Thus, exemplary polypeptides include gene products, homologs, orthologs, paralogs, fragments and other equivalents, variants, and analogs of the foregoing.
  • The term “variant” or “mutant” refers to an amino acid sequence that differs from the reference polypeptide by one or more amino acids, e.g., one or more amino acid substitutions. For example, a modified or variant Cas9 polypeptide differs from wild-type Cas9 (e.g., SEQ ID NO:1) by one or more amino acid substitutions, i.e., mutations.
  • “Polynucleotide,” synonymously referred to as “nucleic acid molecule” or “nucleic acids,” refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. “Polynucleotides” include, without limitation, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, RNA that is a mixture of single- and double-stranded regions, and hybrid molecules comprising DNA and RNA that may be single-stranded, double-stranded, or a mixture of single- and double-stranded regions.
  • “Substantially similar” with respect to nucleic acid or amino acid sequences, means at least about 65% identity between two or more sequences. Preferably, the term refers to at least about 70% identity between two or more sequences, more preferably at least about 75% identity, more preferably at least about 80% identity, more preferably at least about 85% identity, more preferably at least about 90% identity, more preferably at least about 91% identity, more preferably at least about 92% identity, more preferably at least about 93% identity, more preferably at least about 94% identity, more preferably at least about 95% identity, more preferably at least about 96% identity, more preferably at least about 97% identity, more preferably at least about 98% identity, and more preferably at least about 99% or greater identity. Such identity can be determined using algorithms known in the art, such as the mBLAST algorithm.
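  • As an illustration of the percent-identity thresholds listed above, the sketch below computes identity for two sequences that have already been aligned. It is a simplified calculation, not the mBLAST algorithm mentioned in the text. The function names, the choice to count every column except those where both sequences have a gap, and the toy sequences are assumptions made for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap and residue_b == gap:
            continue  # a column with gaps in both sequences carries no information
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def is_substantially_similar(aligned_a, aligned_b, threshold=65.0):
    """Apply the 'at least about 65% identity' floor from the definition above."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical ten-residue toy sequences differing at one position.
print(percent_identity("MDKKYSIGLD", "MDKKYSIGLA"))
print(is_substantially_similar("MDKKYSIGLD", "MDKKYSIGLA"))
```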
  • The term “isolated” can refer to a nucleic acid or polypeptide that is substantially free of cellular material, bacterial material, viral material, or culture medium (when produced by recombinant DNA techniques) of their source of origin, or chemical precursors or other chemicals (when chemically synthesized). Moreover, an isolated polypeptide refers to one that can be administered to a cell or a subject; in other words, the polypeptide may not simply be considered “isolated” if it is adhered to a column or embedded in an agarose gel. Moreover, an “isolated nucleic acid fragment” or “isolated peptide” is a nucleic acid or protein fragment that is not naturally occurring as a fragment and/or is not typically in the functional state.
  • The term “providing” is used according to its ordinary meaning “to supply or furnish for use.” In some embodiments, the protein is provided directly by administering the protein, while in other embodiments, the protein is effectively provided by administering a nucleic acid that encodes the protein. In certain aspects the invention contemplates compositions comprising various combinations of nucleic acids and/or peptides.
  • The use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.”
  • The use of the term “or” in the claims is used to mean “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and/or.”
  • As used in this specification and claim(s), the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps.
  • The compositions and methods of making and using the same of the present invention can “comprise,” “consist essentially of,” or “consist of” particular ingredients, components, blends, method steps, etc., disclosed throughout the specification.
  • Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating specific embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of the specification embodiments presented herein.
  • FIGS. 1a-1d. Cas9 HNH domain motions and conformational flexibility characterized by principal component analysis. (a-c) Visualization of the top three dominant motions for the HNH domain. The first motional mode depicts a rotational motion around an axis perpendicular to the plane, while the second and third modes describe a translational movement toward the tDNA and REC2 domain, respectively. The Cα atoms of the three HNH catalytic residues are represented as van der Waals spheres. (d) Overlap of the projections of the conventional MD simulations without ntDNA (cMD w/o ntDNA, black dots), conventional MD simulations with ntDNA (cMD with ntDNA, green dots) and accelerated MD simulations without ntDNA (aMD w/o ntDNA, red dots) onto the first two eigenvectors calculated from the whole trajectories for the HNH domain. The pre-catalytic state (PDB code: 5F9R), its modeled “catalytic” state from the crystal structure of T4 Endo VII complex with a DNA substrate, and the start and end points for the targeted MD (tMD) simulations were also projected onto the subspace defined by the first two PCA modes, along with the targeted MD (tMD)- and ensemble conventional MD (cMDens)-derived catalytic states (an average over 100 data points is reported). All the trajectories were best-fitted to the Cas9 protein (excluding the HNH domain) of the pre-catalytic crystal structure (PDB code: 5F9R), and the coordinate covariance matrix was computed over the HNH domain for subsequent analysis.
  • FIGS. 2a-2d. Catalytic state coordination at the interface of HNH ββα fold and tDNA (a,b) and comparison with the one-metal-ion catalysis by T4 Endo VII (c). (a,b) The representative coordination configurations derived from post targeted MD (tMD) simulations (a) and conventional ensemble MD (cMDens) simulations (b) through cluster analysis. (c) Close-up view of the active center of T4 Endo VII (N62D) resolving a DNA Holliday junction (PDB code: 2QNC). (d) Schematic representation of the one-metal-ion dependent catalysis in ββα-metal nucleases. The pro-Sp and pro-Rp oxygens of the scissile phosphate are indicated with Sp and Rp, respectively. The putative active residues in Cas9 HNH domain and the corresponding residues in T4 Endo VII are labeled in boldface. Mg2+ is shown as a cyan sphere and the potential nucleophilic water is denoted by an arrow. The dashed lines indicate the coordination bonds involving Mg2+ and hydrogen bonds.
  • FIGS. 3a-3d. Comparison of the targeted MD (tMD)-derived and ensemble conventional MD (cMDens)-derived catalytic Cas9 conformations and comparison with the crystal structure in pre-catalytic state. (a) Variation of the minimum geometric mean of the distances of +4P to His840 (d+4P-H840) and to Asp861 (d+4P-D861) as a function of the cycle number in the course of the ensemble simulations. (b) Variation of minimum binding interface RMSDs from the tMD-derived catalytic state as the cycle number increases. An average over fifty data points is reported in (a) and (b). (c) Structural superposition between the tMD-derived catalytic state and crystal pre-catalytic state (PDB code: 5F9R). The crystal structure is represented and the largest domain movement (involving HNH, CTD and REC2) is denoted by an arrow. See also FIG. 13 for the result with cMDens. (d) Structural alignment between the tMD- and cMDens-derived catalytic Cas9 conformations.
  • FIGS. 4a-4c. Mg2+-aided conformational transition to catalytic state. (a) Comparison of the representative HNH conformations from the cMD simulations with Mg2+ bound (left) and with Mg2+ removed (right) at the reaction interface. The bound Mg2+ is shown as a sphere and the HNH active residues are represented in a stick model. (b) Scatter plot of the +4P distances to His840 (d+4P-H840) and Asp861 (d+4P-D861) calculated from different sets of cMD simulations. The Cγ and P atoms were selected for measurement. (c) Scatter plot of the distance pair for Ser867/Asn1054 (dS867-N1054) and Ser355/Ser867 (dS355-S867) from different sets of cMD simulations. The Cα atoms were used for the calculation here. The residue pairs of Ser867/Asn1054 and Ser355/Ser867 were used to characterize the conformational states of HNH domain in previous FRET experiments (FIG. 9). If available, the corresponding distance pairs obtained from different Cas9 complex crystal structures are mapped on each plot. Of note, in 4UN3, the loop where Asn1054 resides is disordered and we report an average of the distances calculated from respective modeled structures using 4ZT0, 4ZT9 and 5F9R as a template. The pentagrams indicate the catalytic state derived from the conventional ensemble MD (cMDens) simulations (an average over 100 data points is reported).
  • FIGS. 5a-5f. New interactions established between the catalytic HNH domain and other components in the complex system identified from the post targeted MD (tMD) simulations. (a) Interactions with the REC1 domain. (b) Interactions with the REC2 domain. (c) Interactions with the REC3 domain. (d) Interactions with the bridge helix (BH) and sgRNA. (e) Interactions with the tDNA. (f) Interactions with the sgRNA. The HNH domain residues are highlighted. The dashed lines denote the salt bridges and/or hydrogen bonds. Due to space limits, only interacting pairs with relatively high occupancy throughout the simulations are shown here, and the complete residue list is presented in Table 4. This figure is comparable to FIG. 14.
  • FIG. 6. Conformational activation pathway of Cas9 HNH nuclease domain. The HNH domain and flanking linker regions (i.e., L1 and L2) are highlighted. The PAM and the three putative catalytic residues of HNH domain are represented as blue spheres. The dashed lines denote the disordered linker loops. All of the solved Cas9 crystal structures in different binding forms assume an inactive state for both the RuvC and HNH nuclease domains. Using single-molecule FRET, Dagdas et al. (Dagdas et al., bioRxiv, 122242, 2017) identified three distinct conformational states of Cas9, designated state “R”, “I” and “D”, respectively. In the inventors' study, with the involvement of Mg2+ and absence of ntDNA, they addressed how the HNH domain is “docked” toward the catalytically competent state. However, another fundamental question remains open: what factors trigger the ˜180° rigid-body rotation of the HNH domain alongside its two flanking linkers during the I→P state transition? The inventors propose that there likely exists a functionally important transition state (T1) between the I and P states that acts as a conformational checkpoint determining the fates (cleaved or not) of bound on- or off-target substrates.
  • FIGS. 7a-7c. spCas9-sgRNA complexed with PAM-containing double-stranded DNA (dsDNA) substrate. (a) Overall architecture of the ternary complex of spCas9, sgRNA and dsDNA (PDB code: 5F9R). The Cas9 NUC lobe comprises two nuclease domains (RuvC and HNH), a C-terminal domain (CTD) and a topoisomerase homology (Topo) domain, and the REC lobe is spatially divided into three domains (REC1, REC2 and REC3); the two lobes are connected by an arginine-rich (bridge) helix (BH). The target and non-target DNA strands (tDNA and ntDNA) are colored dark and light green, respectively, with the PAM duplex highlighted in crimson. (b) Close-up view of the HNH domain active center (PDB code: 5F9R). The putative catalytic residues are depicted in a stick model. (c) Structure-guided protein engineering to improve spCas9 specificity (PDB code: 4UN3). Neutralization of the selected basic residues on the HNH domain was shown to reduce spCas9 off-target effects while maintaining on-target activity. The cleavage site on tDNA is denoted with a scissor.
  • FIGS. 8a-8f. Molecular dynamics (MD) simulations. (a-c) Projections of the conventional MD simulations without ntDNA (cMD w/o ntDNA) (a), conventional MD simulations with ntDNA (cMD with ntDNA) (b) and accelerated MD simulations without ntDNA (aMD w/o ntDNA) (c) onto the first two eigenvectors calculated from the whole trajectories for the HNH domain. (d-f) Overlap of the histograms of the first (d), second (e) and third (f) PC projections for the conventional MD simulations without ntDNA (cMD w/o ntDNA, black line), conventional MD simulations with ntDNA (cMD with ntDNA, green line) and accelerated MD simulations without ntDNA (aMD w/o ntDNA, red line). See also FIG. 1.
  • FIGS. 9a-9b. (a) FRET-labeled residue pairs shown with 5F9R and (b) scatter plot of the distances for the labeled residue pairs calculated from conventional MD simulations without ntDNA (cMD w/o ntDNA, black dots) and with ntDNA (cMD with ntDNA, green dots). Ser355, Ser867 and Asn1054 are located in the REC1, HNH and RuvC domains, respectively. These residues were previously selected to characterize different conformational states of HNH domain in FRET experiments (Dagdas et al., bioRxiv, 122242, 2017).
  • FIG. 10. Cα RMSD distributions for the HNH domain and ββα fold calculated from the conventional and accelerated MD simulations relative to the starting crystal structure (PDB code: 5F9R). The average pairwise RMSDs for the HNH domain and ββα motif among the available Cas9 crystal structures in different binding forms are 1.4±0.6 and 1.4±0.7 Å, respectively (Table 2a-2b), which are comparable to the corresponding peak values calculated from the cMD simulations. In contrast, aMD shows significantly larger RMSD values peaking at 4 Å, indicating that the enhanced sampling is accompanied by considerable internal structural change.
  • FIGS. 11a-11f. Putative catalytic state of Cas9 HNH domain modeled from T4 Endonuclease VII (Endo VII)/DNA complex (PDB code: 2QNC). (a) T4 Endo VII ββα-metal motif complexed with a DNA substrate. The Cα atoms of the active residues (i.e., Asp40, His41 and Asn62) are rendered as spheres, and the coordinated Mg2+ is depicted as a sphere. (b) Cas9 HNH domain opposite to the target DNA strand (PDB code: 5F9R). The Cα atoms of the putative catalytic residues (i.e., Asp839, His840 and Asp861) are represented as spheres and the HNH domain ββα-metal motif is shown. (c) The scissile phosphate and flanking nucleotides of the DNA substrate in (a) superimposed onto the corresponding stretch in (b) alongside the ββα-metal motif. (d) Topology-independent structural alignment between Cas9 and Endo VII ββα-metal motifs (PDB codes: 5F9R and 2QNC) using the CLICK algorithm (Nguyen et al., Nucleic Acids Res. 39, W24-W28, 2011). The Cα RMSD of the equivalent residues (shown as spheres) between the two nucleases is 1.2 Å. The catalytic residues appear to be spatially well superimposed. (e) Cas9 HNH domain oriented toward the target DNA strand based on the transformation matrix obtained from (d). (f) Direct “docking” of the HNH domain starting from the pre-catalytic state (PDB code: 5F9R) results in a number of steric clashes with other components in the system. The overlapping heavy atoms are shown as van der Waals spheres, using a distance cutoff of 1.4 Å. The pairwise RMSD for the HNH domain backbone is 25 Å here.
  • FIG. 12. Relative binding strength of the residues on the HNH ββα fold and opposite tDNA with the coordinated Mg2+ computed via the MM-GBSA approach. The energetic contribution of each residue is expressed relative to that of Asp861, which is set to 100% binding strength. Positive and negative values indicate favorable and unfavorable binding, respectively.
  • FIG. 13. Structural superposition between the cMDens-derived catalytic state and crystal pre-catalytic state (PDB code: 5F9R). The crystal structure is represented, and the largest domain movement (involving HNH, CTD and REC2) is denoted by an arrow. This figure is comparable to FIG. 3c.
  • FIGS. 14a-14f. New interactions established between the catalytic HNH domain and other components in the complex system identified from the conventional ensemble MD (cMDens) simulations. (a) Interactions with the REC1 domain. (b) Interactions with the REC2 domain. (c) Interactions with the REC3 domain. (d) Interactions with the bridge helix (BH) and sgRNA. (e) Interactions with the tDNA. (f) Interactions with the sgRNA. The HNH domain residues are highlighted. The dashed lines denote the salt bridges and/or hydrogen bonds. Note that not all the interacting pairs necessarily appear in one single snapshot, and the complete residue list is presented in Table 4. This figure is comparable to FIG. 5.
  • FIGS. 15a-15b. Illustrates that an active state of the Cas9 HNH domain identified by computer modeling and simulations can be responsible for the tDNA cleavage. Site-directed mutagenesis experiments with four single mutations (D837A, D839A, D861A, and N863A) plus one double mutation (D861A/N863A) suggest that D839 and N863 are residues involved in Cas9 activity by directly coordinating the catalytic Mg2+ at the interface between the HNH domain and tDNA, validating the newly identified active state.
  • FIGS. 16a-16c. The gene-editing activity of two tetramutant variants of Cas9. (a) The expression of different Cas9 variants in HEK293T-EGFP cells. WT: wild-type. Mut1.8: N588A/R765A/D835A/K1246A. Mut1.9: N14A/R447A/R765A/S845D. (b) The representative histograms of flow cytometry analysis of EGFP-positive cells in the HEK293T-EGFP cells expressing the indicated Cas9 variants and EGFP gene-targeting sgRNA. (c) The quantitative data of EGFP-positive cells in each sample. Similar to the wild-type Cas9, the Mut1.8 and Mut1.9 Cas9 variants were highly active in gene editing that led to the loss of EGFP expression in the cells. *P<0.05 (Student's t-test).
  • DETAILED DESCRIPTION OF THE INVENTION
  • The bacterial CRISPR-Cas9 system has been adapted as a powerful and versatile genome-editing toolbox. The system holds immense promise for future therapeutic applications. Despite recent advances in understanding Cas9 structure and function, little is known about the catalytic state of the Cas9 HNH nuclease domain, and it remains unclear how divalent metal ions affect the HNH domain conformational transition. A deep understanding of the Cas9 activation and cleavage mechanism can enable further optimization of Cas9-based genome-editing specificity and efficiency. Using two distinct molecular dynamics simulation techniques, the inventors obtained a cross-validated catalytically active state of the Cas9 HNH domain primed for cutting the target DNA strand. Moreover, the inventors demonstrate at the atomic level the essential roles of the catalytic Mg2+ in the formation and stability of the active state. Furthermore, the inventors show that the derived catalytic conformation of the HNH domain can be exploited for rational engineering of Cas9 variants with enhanced specificity.
  • Cas9 crystal structures in different binding forms have been solved over the past few years (Jiang et al. Science 351, 867-871, 2016; Jiang et al. Science 348, 1477-1481, 2015; Nishimasu et al. Cell 156, 935-949, 2014; Anders et al. Nature 513, 569-573, 2014; Jinek et al. Science 343, 1247997, 2014); however, none of them adopts a fully active functional state for either of the two nuclease domains (FIG. 6 ). In recent work, the inventors reported the catalytically competent state of the Cas9 RuvC domain primed for cutting the ntDNA, obtained by molecular dynamics (MD) simulations (Zuo and Liu, Sci. Rep. 5, 2016). Using two distinct sampling strategies, i.e., the biased tMD and the unbiased cMDens, well-converged catalytic conformations for the HNH domain were obtained, especially in terms of HNH domain orientation (FIG. 1 and FIG. 4 ), Mg2+ coordination geometry (FIG. 2 ) and newly established interactions with the HNH domain (FIG. 5 and FIG. 14 ). The success of cMDens here can be ascribed to: (i) enhanced flexibility of the HNH domain upon removal of ntDNA (FIG. 1 and FIG. 8-9 ); (ii) Mg2+-mediated electrostatic attraction at the binding interface (FIG. 2 and FIG. 4 ); and (iii) favorable charged and polar interactions between the HNH domain and other components (FIG. 5 and FIG. 14 ). These factors apparently lower the energetic barrier between the pre-catalytic and catalytic states substantially, making a large conformational change of the HNH domain (FIG. 3 and FIG. 13 and Table 3) accessible within dozens of microseconds (Table 1). The cMDens-based sampling approach might be applied to other systems provided the conformational transition pathway can be defined.
  • In order to enhance the conformational dynamics of the HNH domain, the ntDNA was not included in the inventors' simulations. The inventors contemplate that the ntDNA might stabilize the catalytic conformation through interactions with the linker 2 (L2) region flanking the C-terminus of the HNH domain (Jiang et al. Science 351, 867-871, 2016; Zuo and Liu, Sci. Rep. 5, 2016; Palermo et al. ACS Cent. Sci., 2016). Notably, cleavage assays suggest that a single-stranded tDNA substrate was cleaved two orders of magnitude slower than a dsDNA substrate, despite comparable binding affinities of both substrates to Cas9-gRNA (Sternberg et al. Nature 507, 62-67, 2014). Concerning the cleavage of tDNA in the duplex context, the inventors reason that the ntDNA accelerates the reaction rate, probably by promoting the HNH domain rotation during strand unwinding (FIG. 6 ) and/or by facilitating rapid interrogation and loading of the DNA target via PAM recognition (Sternberg et al. Nature 507, 62-67, 2014). In the presence of ntDNA, the Cas9 catalytic state might adopt a somewhat different conformation from that captured here. Yet the global orientation of the HNH domain relative to the REC lobe and tDNA, and the coordination configuration at the binding interface, should vary little. The inventors have bridged the missing link of how the HNH domain transitions from the pre-catalytic to the catalytic state. However, another fundamental question remains open: what factors trigger the ~180° rigid-body rotation of L1-HNH-L2 during the previously identified immediate (“I”) to pre-catalytic (“P”) state transition (FIG. 6 )? The inventors contemplate that there likely exists a functionally relevant transition state between the I and P states, which acts as a conformational checkpoint determining the fates (cleaved or not) of bound on- or off-target substrates. By introducing a certain number of mismatches, this state might be captured through smFRET or crystallography, or identified with molecular dynamics free energy simulations (Giulia et al. Proc. Natl. Acad. Sci. U.S.A. 2017).
  • The two distinct conformational activation pathways for the HNH domain, implemented respectively by tMD and cMDens, strongly suggest that Mg2+ is indispensable for catalytic state formation and stability. In the absence of Mg2+, it is conceivable that the HNH domain swings repeatedly toward and away from the tDNA but fails to visit an active conformation (FIG. 4 ), as demonstrated by the smFRET experiments (Dagdas et al. bioRxiv, 122242, 2017). If Mg2+ diffuses into the binding interface, the HNH domain readily docks onto and stably associates with the opposite tDNA, accompanied by new interactions formed with other components (especially the REC lobe and sgRNA) in the system. Therefore, beyond its catalytic role, Mg2+ also acts as a facilitator and stabilizer of the functional conformational state. Taken together with the inventors' previous study of the Cas9 RuvC domain (Zuo and Liu, Sci. Rep. 5, 2016), the inventors hold more generally that the roles of Mg2+ revealed here are common to other divalent metal ion dependent nucleases (Yang, Q. Rev. Biophys. 44, 1-93, 2011; Yang et al. Mol. Cell 22, 5-13, 2006). Besides Mg2+, other metal ions like Mn2+, Ca2+ and Co2+ are also able to activate the HNH conformation and stabilize its catalytic state (Zuo and Liu, Sci. Rep. 5, 2016; Dagdas et al. bioRxiv, 122242, 2017), which might be explained by the fact that these ions can assume an octahedral coordination geometry and an effective radius similar to those of Mg2+ as observed here (FIG. 2 )(Shannon, Acta crystallographica section A: crystal physics, diffraction, theoretical and general crystallography 32, 751-767, 1976). Intriguingly, Co2+ does not support HNH nuclease activity (Zuo and Liu, Sci. Rep. 5, 2016; Dagdas et al. bioRxiv, 122242, 2017). Hence the catalytic conformation might be crystallized with wild-type Cas9 and Co2+. This strategy could be more effective than using Cas9 nickase mutants and Mg2+, as substitution of the active residues inevitably destabilizes the enzyme/substrate complex.
  • The derived catalytic state provides a different perspective on the sources of enhanced Cas9 specificity through alanine mutagenesis. The four basic residues of the L1 linker and HNH domain, Lys775, Arg832, Lys862 and Lys848, whose single alanine substitution was shown to reduce Cas9 off-target effects (FIG. 7 c ), were previously supposed to make contacts with the phosphate backbone of ntDNA (Slaymaker et al. Science 351, 84-88, 2016). In the simulations, Lys775, Arg832 and Lys862 form ionic/hydrogen-bonding interactions with the negatively charged residues on the REC3 (Glu584 and Asp585), REC2 (Glu223) and REC1 (Glu370 and Glu396) domains, respectively, while Lys848 is simultaneously engaged with residues on the BH (Thr68 and Glu60) and with the sgRNA backbone (FIG. 5 , FIG. 14 and Table 4). These new interactions apparently contribute directly to HNH domain docking onto tDNA, and neutralization of the basic residues could destabilize formation of the active HNH conformation, thereby requiring more stringent Watson-Crick base pair complementarity with the sgRNA. This view is in contrast with the hypothesis that the improved specificity results exclusively from diminished interactions with the ntDNA (Slaymaker et al. Science 351, 84-88, 2016). Remarkably, the catalytic model described herein accounts for why the identified Cas9 (K848A/K1003A/R1060A) variant [referred to as eSpCas9(1.1) in Slaymaker et al. Science 351, 84-88, 2016] exhibits high genome-wide editing specificity, which is rooted in a combined effect involving simultaneously weakened binding with the two DNA strands, the sgRNA and the Cas9 BH. Meanwhile, the inventors highlight that it cannot be ruled out that the basic residues of the HNH domain change interacting partners (e.g. from ntDNA to tDNA) during different stages of conformational activation, given the striking flexibility of the HNH domain (see FIG. 6 ). Moreover, the model could also explain the decrease in specificity upon the converse Ser845Lys replacement (Slaymaker et al. Science 351, 84-88, 2016), which arises from strengthened interaction of the HNH domain with the tDNA backbone at a position only 1 bp from the cleavage site (FIG. 5 e and FIG. 14 e ).
  • Likewise, in the framework of the “excess energy” hypothesis proposed for Cas9-sgRNA (Slaymaker et al. Science 351, 84-88, 2016; Kleinstiver et al. Nature 529, 490-495, 2016), the new structural information here can be exploited to rationally design more Cas9 variants with improved specificity. After careful inspection of the locations of the identified residues and their interactions within the whole complex, the inventors suggest more than a dozen sites to be mutated (See Table 4). When these sites are further combined with previously screened candidate sites, it is believed that different versions of high-fidelity Cas9 mutants could be customized specifically for minimizing the off-target effects occurring at the PAM proximal or distal ends, or even at non-standard repetitive sites. Such customization is a sensible approach, as there is no single versatile Cas9 nuclease capable of eliminating all sorts of off-target cleavage.
  • In summary, a cross-validated catalytically active model of the Cas9 HNH nuclease domain poised for cutting the tDNA was discovered, and the essential roles of divalent metal ions in facilitating and stabilizing formation of the active conformation were demonstrated. More importantly, the derived catalytic state provides novel structural information for Cas9 specificity enhancement. Further studies on additional conformational states, as well as on the binding and cleavage mechanism of Cas9, would contribute to additional refinement of the CRISPR-Cas9 genome-editing toolbox.
  • Activities of modified Cas9 polypeptides can be assessed in a bacterial cell-based system, with survival percentages between 50% and 100% usually indicating robust cleavage, whereas 0% survival indicates that the enzyme has been functionally compromised.
  • To further determine whether the Cas9 variants described herein function efficiently in human cells, modified proteins can be tested using a human cell-based EGFP-disruption assay. In this assay, successful cleavage of a target site in the coding sequence of a single integrated, constitutively expressed EGFP gene leads to the induction of mutations and disruption of EGFP activity, which can be quantitatively assessed by flow cytometry (see, for example, Reyon et al., Nat Biotechnol. 30(5):460-5, 2012).
  • All of the variants described herein can be incorporated into existing vectors
  • Substitutional variants typically contain the exchange of one amino acid for another at one or more sites within the protein, and may be designed to modulate one or more properties of the polypeptide, with or without the loss of other functions or properties. Substitutions may be conservative, that is, one amino acid is replaced with one of similar shape and charge. Conservative substitutions are well known in the art and include, for example, the changes of: alanine to serine; arginine to lysine; asparagine to glutamine or histidine; aspartate to glutamate; cysteine to serine; glutamine to asparagine; glutamate to aspartate; glycine to proline; histidine to asparagine or glutamine; isoleucine to leucine or valine; leucine to valine or isoleucine; lysine to arginine; methionine to leucine or isoleucine; phenylalanine to tyrosine, leucine or methionine; serine to threonine; threonine to serine; tryptophan to tyrosine; tyrosine to tryptophan or phenylalanine; and valine to isoleucine or leucine. Alternatively, substitutions may be non-conservative such that a function or activity of the polypeptide is affected. Non-conservative changes typically involve substituting a residue with one that is chemically dissimilar, such as a polar or charged amino acid for a nonpolar or uncharged amino acid, and vice versa.
  • Proteins may be recombinant, or synthesized in vitro. Alternatively, a non-recombinant or recombinant protein may be isolated from bacteria or other host cell expression system.
  • The term “functionally equivalent codon” is used herein to refer to codons that encode the same amino acid, such as the six codons for arginine or serine, and also refers to codons that encode biologically equivalent amino acids. Codons include: Alanine (Ala, A) GCA, GCC, GCG, and GCU; Cysteine (Cys, C) UGC and UGU; Aspartic acid (Asp, D) GAC and GAU; Glutamic acid (Glu, E) GAA and GAG; Phenylalanine (Phe, F) UUC and UUU; Glycine (Gly, G) GGA, GGC, GGG, and GGU; Histidine (His, H) CAC and CAU; Isoleucine (Ile, I) AUA, AUC, and AUU; Lysine (Lys, K) AAA and AAG; Leucine (Leu, L) UUA, UUG, CUA, CUC, CUG, and CUU; Methionine (Met, M) AUG; Asparagine (Asn, N) AAC and AAU; Proline (Pro, P) CCA, CCC, CCG, and CCU; Glutamine (Gln, Q) CAA and CAG; Arginine (Arg, R) AGA, AGG, CGA, CGC, CGG, and CGU; Serine (Ser, S) AGC, AGU, UCA, UCC, UCG, and UCU; Threonine (Thr, T) ACA, ACC, ACG, and ACU; Valine (Val, V) GUA, GUC, GUG, and GUU; Tryptophan (Trp, W) UGG; and Tyrosine (Tyr, Y) UAC and UAU.
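  • As an illustrative aid only, the codon listing above can be written as a lookup table. The following Python sketch transcribes the listed RNA codons; the names `CODONS`, `CODON_TO_AA`, `encode_same_amino_acid` and `translate` are hypothetical and are not part of the disclosure. Stop codons do not appear in the listing, so the sketch leaves them out and `translate` raises a `KeyError` if it meets one.

```python
# Codons exactly as listed in the text (RNA alphabet, one-letter amino acid codes).
CODONS = {
    "A": ["GCA", "GCC", "GCG", "GCU"],
    "C": ["UGC", "UGU"],
    "D": ["GAC", "GAU"],
    "E": ["GAA", "GAG"],
    "F": ["UUC", "UUU"],
    "G": ["GGA", "GGC", "GGG", "GGU"],
    "H": ["CAC", "CAU"],
    "I": ["AUA", "AUC", "AUU"],
    "K": ["AAA", "AAG"],
    "L": ["UUA", "UUG", "CUA", "CUC", "CUG", "CUU"],
    "M": ["AUG"],
    "N": ["AAC", "AAU"],
    "P": ["CCA", "CCC", "CCG", "CCU"],
    "Q": ["CAA", "CAG"],
    "R": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGU"],
    "S": ["AGC", "AGU", "UCA", "UCC", "UCG", "UCU"],
    "T": ["ACA", "ACC", "ACG", "ACU"],
    "V": ["GUA", "GUC", "GUG", "GUU"],
    "W": ["UGG"],
    "Y": ["UAC", "UAU"],
}

# Reverse lookup: codon -> amino acid.
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def encode_same_amino_acid(codon_1, codon_2):
    """Return True if two codons encode the same amino acid."""
    return CODON_TO_AA[codon_1] == CODON_TO_AA[codon_2]


def translate(rna):
    """Translate an RNA coding sequence (no stop codon) into one-letter codes."""
    if len(rna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna), 3))
```

For example, `encode_same_amino_acid("AGA", "CGU")` is true because both codons are listed under arginine.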
  • It also will be understood that amino acid and nucleic acid sequences may include additional residues, such as additional N- or C-terminal amino acids, or 5′ or 3′ sequences, respectively, and yet still be essentially as set forth in one of the sequences disclosed herein, so long as the sequence meets the criteria set forth above, including the maintenance of biological protein activity where protein expression is concerned. The addition of terminal sequences particularly applies to nucleic acid sequences that may, for example, include various non-coding sequences flanking either of the 5′ or 3′ portions of the coding region.
  • The following is a discussion based upon changing of the amino acids of a protein to create an equivalent, or even an improved, second-generation molecule. For example, certain amino acids may be substituted for other amino acids in a protein structure without appreciable loss of interactive binding capacity with structures such as binding sites on substrate molecules. Since it is the interactive capacity and nature of a protein that defines that protein's biological functional activity, certain amino acid substitutions can be made in a protein sequence, and in its underlying DNA coding sequence, and nevertheless produce a protein with like properties.
  • In making such changes, the hydropathic index of amino acids may be considered. The importance of the hydropathic amino acid index in conferring interactive biologic function on a protein is generally understood in the art (Kyte and Doolittle, 1982). It is accepted that the relative hydropathic character of the amino acid contributes to the secondary structure of the resultant protein, which in turn defines the interaction of the protein with other molecules, for example, enzymes, substrates, and the like.
  • It also is understood in the art that the substitution of like amino acids can be made effectively on the basis of hydrophilicity. It is understood that an amino acid can be substituted for another having a similar hydrophilicity value and still produce a biologically equivalent protein.
  • As outlined above, amino acid substitutions generally are based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like. Examples of substitutions that take into consideration the various foregoing characteristics are well known and include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine.
  • Embodiments involve polypeptides, peptides, proteins and fragments thereof for use in various aspects described herein. In specific embodiments, all or part of proteins described herein can also be synthesized in solution or on a solid support in accordance with conventional techniques. Various automatic synthesizers are commercially available and can be used in accordance with known protocols. Alternatively, recombinant DNA technology may be employed wherein a nucleotide sequence that encodes a peptide or polypeptide is inserted into an expression vector, transformed or transfected into an appropriate host cell and cultivated under conditions suitable for expression.
  • One embodiment includes the use of gene transfer to cells, including microorganisms, for the production and/or presentation of proteins. The gene for the protein of interest may be transferred into appropriate host cells followed by culture of cells under the appropriate conditions.
  • Also included are fusion proteins. Embodiments can include fusion proteins in which the protein is fused to heterologous sequences, such as purification tags, for example: β-galactosidase, glutathione-S-transferase, green fluorescent proteins (GFP), epitope tags such as FLAG, myc tag, or polyhistidine.
  • For purposes of the present invention, the comparison of sequences and determination of percent identity between two sequences can be accomplished using a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
  • Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. As used herein an amino acid designated as “X” refers to any amino acid residue. However, when in the context of an amino acid substitution it is to be understood that “X” followed by a number refers to an amino acid residue at a particular location in a reference sequence.
  • As used herein, an amino acid residue of an amino acid sequence of interest that “corresponds to” or is “corresponding to” or in “correspondence with” an amino acid residue of a reference amino acid sequence indicates that the amino acid residue of the sequence of interest is at a location homologous or equivalent to an enumerated residue in the reference amino acid sequence. One skilled in the art can determine whether a particular amino acid residue position in a polypeptide corresponds to that of a homologous reference sequence. For example, the sequence of a modified or related Cas9 protein can be aligned with that of a reference sequence (e.g., SEQ ID NO: 1) using known techniques (e.g., basic local alignment search tool (BLAST), ClustalW2, Structure based sequences alignment program (STRAP), or the like). In addition, crystal structure coordinates of a reference sequence may be used as an aid in determining a homologous polypeptide residue's three dimensional structure. Using such methods, the amino acid residues of a polypeptide can be numbered according to the corresponding amino acid residue position numbering of the reference sequence. For example, the amino acid sequence of SEQ ID NO: 1 may be used for determining amino acid residue position numbering of each amino acid residue of a variant of interest.
  • The term “identical” in the context of two nucleic acids or polypeptide sequences refers to the residues in the two sequences that are the same when aligned for maximum correspondence, as measured using one of the following sequence comparison or analysis algorithms.
  • The percent sequence identity between a reference sequence and a test sequence of interest may be readily determined by one skilled in the art. The percent identity shared by polynucleotide or polypeptide sequences is determined by direct comparison of the sequence information between the molecules by aligning the sequences and determining the identity by methods known in the art. An example of an algorithm that is suitable for determining sequence similarity is the BLAST algorithm (see Altschul, et al., J. Mol. Biol., 215:403-410 [1990]). Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. These initial neighborhood word hits act as starting points to find longer HSPs containing them. The word hits are expanded in both directions along each of the two sequences being compared for as far as the cumulative alignment score can be increased. Extension of the word hits is stopped when: the cumulative alignment score falls off by the quantity X from a maximum achieved value; the cumulative score goes to zero or below; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLAST program uses as defaults a wordlength (W) of 11, the BLOSUM62 scoring matrix (see Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 [1992]), alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
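  • The three stopping rules for word-hit extension described above can be illustrated with a short Python sketch. It is a deliberately simplified, ungapped, one-direction extension and is not the BLAST implementation. The function name `extend_right`, the use of 5 and -4 as a nucleotide match reward and mismatch penalty, and the default drop-off value of 20 are assumptions made for illustration.

```python
def extend_right(query, subject, q_start, s_start,
                 match=5, mismatch=-4, x_drop=20):
    """Extend a word hit to the right without gaps.

    Returns (best_score, best_length): the maximum cumulative score reached
    and the number of aligned positions at which it was reached.
    """
    score = 0
    best_score = 0
    best_length = 0
    i = 0
    # Stopping rule 3: the end of either sequence is reached.
    while q_start + i < len(query) and s_start + i < len(subject):
        same = query[q_start + i] == subject[s_start + i]
        score += match if same else mismatch
        i += 1
        if score > best_score:
            best_score, best_length = score, i
        # Stopping rule 1: score fell off by X from the maximum achieved value.
        # Stopping rule 2: cumulative score went to zero or below.
        elif best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_length
```

With `query = "ACGTACGTTTTT"` and `subject = "ACGTACGTAAAA"`, extension from position 0 of both sequences accumulates eight matches and then stops improving, so the best-scoring segment covers the first eight positions.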
  • The BLAST algorithm then performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, supra). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
  • Percent “identical” or “identity” in the context of two or more nucleic acid or polypeptide sequences refers to two or more sequences that are the same or have a specified percentage of nucleic acid residues or amino acid residues, respectively, that are the same, when compared and aligned for maximum similarity, as determined using a sequence comparison algorithm or by visual inspection. “Percent sequence identity” or “% identity” or “% sequence identity” or “% amino acid sequence identity” of a subject amino acid sequence to a reference amino acid sequence means that the subject amino acid sequence is identical (i.e., on an amino acid-by-amino acid basis) by a specified percentage to the reference amino acid sequence over a comparison length when the sequences are optimally aligned. Thus, 80% amino acid sequence identity or 80% identity with respect to two amino acid sequences means that 80% of the amino acid residues in two optimally aligned amino acid sequences are identical.
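  • As a worked illustration of the definition above, the following Python sketch computes percent identity for two sequences that have already been optimally aligned by some other tool. The function name `percent_identity` is hypothetical, and the choice to take the comparison length as the number of aligned columns in which neither sequence has a gap is an assumption; other conventions for the denominator exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns containing a gap in either sequence are excluded from the
    comparison length (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared
```

For instance, `percent_identity("MKTAYIAKQR", "MKTAYLAKQK")` gives 80.0, since eight of the ten aligned residues are identical.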
  • EXAMPLES
  • The following examples as well as the figures are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples or figures represent techniques discovered by the inventors to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.
  • Example 1 A. Results
  • HNH Domain Samples Larger Conformational Space in the Absence of ntDNA. To obtain the HNH domain active state from the inactive state structure using molecular dynamics simulations, the biggest challenge is to sample enough conformational space in a reasonably short time-scale. From initial MD simulations and structural observation (Jiang et al. Science 351, 867-871, 2016; Zuo and Liu, Sci. Rep. 5, 2016), the inventors contemplated that the ntDNA might impose spatial constraints on the conformational dynamics of the HNH domain in the pre-catalytic state (FIG. 7 a ). In other words, the HNH domain could exhibit enhanced flexibility in the absence of ntDNA, thereby increasing the probability of reaching or approaching the catalytic state. To confirm this, the inventors performed three groups of long time-scale conventional MD (cMD) simulations starting from the pre-catalytic structure (Jiang et al. Science 351, 867-871, 2016), in which the ntDNA was removed (G1 and G2, Table 1) or retained (G9, Table 1). Meanwhile, the accelerated MD (aMD) method (Hamelberg et al. J. Chem. Phys. 120, 11919-11929, 2004; Pierce et al. J. Chem. Theory Comput. 8, 2997-3002, 2012) was implemented to enhance the sampling of the system without ntDNA at two different boost levels (G3 and G4, Table 1). The effective sampling time adds up to 14.3 μs, including 11 μs of cMD and 3.3 μs of aMD.
  • To compare the conformational spaces sampled with the two different systems and the two different simulation approaches, the inventors first performed principal component analysis (PCA) to determine the dominant motions of the HNH domain. PCA is a multivariate statistical technique applied to systematically reduce the number of dimensions needed to describe protein essential dynamics (David and Jacobs, Methods Mol. Biol. 1084, 193-226, 2014; Amadei et al., Proteins 17, 412-425, 1993). The first three PCA modes, accounting for 70% (37%+23%+10%) of the overall motion, revealed a rotational motion along an axis perpendicular to the central channel between the two Cas9 lobes (FIG. 1 a ), and translational movements toward the tDNA (FIG. 1 b ) and the REC2 domain (FIG. 1 c ), respectively. A combination of these dominant motions toward the REC lobe and tDNA would evidently bring the HNH domain toward the cleavage site on the tDNA. Subsequently, the inventors projected individual sets of simulation trajectories onto the subspace defined by these three PCA vectors (FIG. 8 a-8 c ). As contemplated, the accessible conformational space of the HNH domain in the ntDNA-bound system was approximately a subset of that in the ntDNA-free system (FIG. 1 d and FIG. 8 ). Moreover, the distances from Ser867 (on the HNH domain) to Ser355 (on the REC1 domain) and to Asn1054 (on the RuvC domain), residues that were selected for labeling in previous smFRET experiments (Dagdas et al., bioRxiv, 122242, 2017), were calculated and gave results similar to those of the PCA (FIG. 9 ). In the inventors' dsDNA-bound model, the ntDNA 5′-end cleavage product was not included, and thus the sampling space is likely to be further confined in the context of full-length ntDNA due to interactions between the 5′-end stretch and the HNH domain (Jiang et al., Science 351, 867-871, 2016; Zuo and Liu, Sci. Rep. 5, 2016; Palermo et al., ACS Cent. Sci., 2016).
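  • The PCA workflow described above (finding dominant modes, reporting the fraction of motion each explains, and projecting trajectories onto the leading modes) can be sketched in Python with NumPy. This is a minimal illustration on synthetic data, not the inventors' analysis pipeline. The function name `pca_modes` and the synthetic trajectory are assumptions, and the frames are assumed to have been superimposed on a common reference beforehand.

```python
import numpy as np


def pca_modes(coords, n_modes=3):
    """PCA of a trajectory given as an (n_frames, 3 * n_atoms) array.

    Returns the fraction of total variance explained by each of the first
    n_modes modes, the mode vectors, and the per-frame projections.
    """
    centered = coords - coords.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal modes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (centered.shape[0] - 1)
    fractions = variances / variances.sum()
    modes = vt[:n_modes]
    projections = centered @ modes.T
    return fractions[:n_modes], modes, projections


# Synthetic "trajectory": three dominant collective motions plus small noise.
rng = np.random.default_rng(0)
n_frames, n_atoms = 200, 10
directions = np.linalg.qr(rng.normal(size=(3 * n_atoms, 3)))[0].T
amplitudes = np.array([6.0, 4.0, 2.0])
trajectory = (rng.normal(size=(n_frames, 3)) * amplitudes) @ directions
trajectory = trajectory + 0.1 * rng.normal(size=(n_frames, 3 * n_atoms))

fractions, modes, projections = pca_modes(trajectory)
```

Projecting a second trajectory onto the same `modes` (after subtracting the same mean) is what allows the conformational spaces of two systems to be compared in one subspace.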
  • Compared to cMD, aMD explored a much broader conformational space, especially along the first PC (FIG. 1 d and FIG. 8 ), which depicts a rotational motion of the HNH domain (FIG. 1 a ). However, the third motional mode is more prominent in cMD than in aMD (FIG. 8 f ), suggesting that the HNH domain displays a larger-scale translation toward the REC lobe in cMD (FIG. 1 c ).
  • Thus, the inventors demonstrated that the HNH domain samples a larger conformational space in the absence of ntDNA, and that cMD is more appropriate than aMD for searching for the HNH domain active state because aMD introduces appreciable internal structural distortion (FIG. 10 and Table 2). Because the microsecond time-scale sampling with cMD and aMD did not yield an HNH conformation sufficiently close to the cleavage site on the tDNA for catalysis, two different strategies to capture the converged catalytically active state of the HNH domain are presented in the following sections.
  • Targeted-MD Revealed the Catalytically Active State of HNH domain. One of the strategies used is the targeted MD (tMD) simulation (Schlitter et al., J. Mol. Graphics 12, 84-89, 1994; Schlitter et al., Mol. Simul. 10, 291-308, 1993). This approach can enable conformational transition between two known states by application of external forces. First, the homologous T4 Endonuclease VII (Endo VII) complex with a DNA Holliday junction (Biertumpfel et al., Nature 449, 616-U614, 2007) was selected as the template to build the target conformation of the HNH domain, which is the putative “active” conformation model (FIG. 11 ). Instead of a single static target, multiple targets were built based on each snapshot structure from the above sets of long cMD simulations. The snapshot structure with the minimum root-mean-square deviation (RMSD) (~10 Å) from its own “target” was selected as the starting point of the tMD. tMD simulations were carried out with a small force constant and a low RMSD decreasing rate, and the expected conformational transition of the HNH domain was observed, largely due to its intrinsic global flexibility as well as internal structural rigidity (FIG. 10 and Table 2). In the framework of the one-metal-ion mechanism (FIG. 2 d )(Yang, Q. Rev. Biophys. 44, 1-93, 2011; Yang, Nat. Struct. Mol. Biol. 15, 1228-1231, 2008), one Mg2+ was then introduced at the reaction interface between the HNH domain and tDNA. After thorough post-tMD simulations using conventional MD (G6, Table 1), a reasonable catalytically active conformation was obtained.
  • The Mg2+ at the catalytic center formed a favorable octahedral coordination with six surrounding oxygen atoms from different species (FIG. 2 a ). In addition to the three water molecules, the residues Asp839 and Asp861 on the ββα motif and the scissile phosphate (pro-Sp oxygen involved) between the nucleotides +3 and +4 of tDNA each contribute a coordination ligand (FIG. 2 a ). The above observation is consistent with the per-residue energy decomposition data obtained by the MM-GBSA approach (FIG. 12 ), confirming the role of these residues in stabilizing Mg2+. In contrast, His840 contributes marginally to Mg2+ binding, which is in line with its major role as the general base activating the nucleophile. Notably, the His840 side chain is hydrogen-bonded to a potential nucleophilic water molecule that is aligned for in-line attack on the scissile bond. In particular, Tyr823 and Arg864 appeared to play a structural role in stabilizing the catalytic Asp839 side chain by hydrogen-bonding. Such interactions were presumed to aid proper orientation of Asp839 for coordination and catalysis. Indeed, primary sequence analysis shows that the tyrosine is strictly conserved among different types of CRISPR-Cas9, while the basic amino acid arginine (or lysine) is highly conserved among the Type II-A Cas9 orthologs (Jinek et al., Science 343, 1247997, 2014).
  • Overall, the three active residues Asp839, His840 and Asp861, and the other two residues, Tyr823 and Arg864 (FIG. 2 a ), are spatially and functionally analogous to the corresponding residues, Asp40, His41, Asn62, Tyr94 (on the other subunit) and Arg54, in the T4 Endo VII (FIG. 2 c )(Biertumpfel et al., Nature 449, 616-U614, 2007). Despite the similarities, the Mg2+ here was not positioned as close to the leaving group 3′-O as in the Endo VII system (Biertumpfel et al., Nature 449, 616-U614, 2007), which was also observed at the reaction interface between the Cas9 RuvC domain and ntDNA in a prior study by the inventors with the same force fields (Zuo and Liu, Sci. Rep. 5, 2016). Apart from the potential issue with Mg2+ parameters, this deviation might be partly related to the subtle differences between the two enzymes beyond the coordination center. In Endo VII, for instance, there exists an additional acidic residue (Glu65) hydrogen-bonded to a coordinating water molecule above the bound Mg2+ (FIG. 3 c ). In summary, the coordination composition and geometry captured here closely match those present in the T4 Endo VII/DNA complex, indicating the formation of the catalytically active state of the Cas9 HNH domain, consistent with the previously identified tDNA cleavage site located 3 nucleotides from the PAM (Jinek et al., Science 337, 816-821, 2012; Gasiunas et al., Proc. Natl. Acad. Sci. U.S.A. 109, E2579-2586, 2012).
  • Conventional Ensemble MD Simulations Revealed the Same Catalytic State as tMD Derived. The above tMD-based strategy to capture the catalytic state is, in essence, based on a modeled putative “target” state. Although the building process was treated with special considerations, potential artificial effects underlying the tMD-derived catalytic model cannot be definitely ruled out. Therefore, the inventors performed a series of conventional MD ensemble simulations (cMDens) starting from the original pre-catalytic crystal structure (PDB code: 5F9R) to check whether the same catalytic state could be reached using the unbiased MD approach. The inventors developed a method called “Step-by-step MD”. The basic idea behind this method is to extract, from a set of MD simulations, the structure that most closely resembles the active state and to use it as the starting point for a new set of simulations. Step by step, one can efficiently sample the desired conformational space without any artificial forces. As the actual catalytic state is not known, it is challenging to choose the structure that most closely resembles the catalytic state. Here, the inventors used the geometric mean of the distances from +4P (the scissile phosphate) to the two catalytic residues His840 and Asp861 (FIG. 4 a ) as a metric to monitor the conformational transition of the HNH domain. The smaller this value is, the closer the conformation is to the target active conformation (FIG. 4 b ). From the sets of long cMD trajectories (G1 and G2, Table 1), a structure bearing a minimum value of ~9 Å was extracted as the starting point for the ensemble simulations (FIG. 3 a ), where one Mg2+ was located at the reaction center. In each cycle, the ensemble simulations were seeded from the structure snapshot of the previous cycle bearing the lowest value of the above geometric mean (FIG. 3 a ), which is the core of the sampling approach here.
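  • The monitoring metric and the seeding rule described above can be sketched in Python with NumPy. This is an illustration under stated assumptions rather than the inventors' code: the function names `geometric_mean_distance` and `select_seed` are hypothetical, and the text does not specify which atoms of +4P, His840 and Asp861 define the distances, so each is represented here by a single 3D point in Å.

```python
import numpy as np


def geometric_mean_distance(p_scissile, his840, asp861):
    """Geometric mean of the +4P-His840 and +4P-Asp861 distances."""
    p = np.asarray(p_scissile, dtype=float)
    d_his = np.linalg.norm(p - np.asarray(his840, dtype=float))
    d_asp = np.linalg.norm(p - np.asarray(asp861, dtype=float))
    return float(np.sqrt(d_his * d_asp))


def select_seed(snapshots):
    """Pick the snapshot with the lowest metric as the seed for the next cycle.

    snapshots: sequence of (p_scissile, his840, asp861) coordinate triples.
    Returns (index_of_seed, metric_value).
    """
    values = [geometric_mean_distance(*snap) for snap in snapshots]
    best = int(np.argmin(values))
    return best, values[best]


# Toy example with made-up coordinates (not simulation data).
snapshots = [
    ((0, 0, 0), (9, 0, 0), (0, 9, 0)),   # metric 9.0
    ((0, 0, 0), (4, 0, 0), (0, 9, 0)),   # metric 6.0
    ((0, 0, 0), (8, 0, 0), (0, 8, 0)),   # metric 8.0
]
seed_index, seed_value = select_seed(snapshots)
```

In a full cycle, the selected snapshot would be handed to the MD engine to launch the next ensemble of simulations, and the selection would be repeated on the new trajectories until the metric stops decreasing.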
  • Through four cycles (G8.1-G8.4, Table 1), the above geometric mean stabilized at ~6 Å (FIG. 3 a ), which is comparable to that observed for the tMD-derived catalytic state (FIG. 4 b ). Accordingly, the RMSD of the reaction interface from the tMD-derived catalytic state declined from an initial ~3 Å to ~1 Å (FIG. 3 b ). Moreover, the Mg2+-involved coordination composition and configuration here (FIG. 2 b ) are essentially the same as those derived from tMD (FIG. 2 a ), except that Tyr823 was engaged to Asp839 via an intercalated water molecule, again confirming the structural role of Tyr823 around the reaction center. These observations therefore demonstrated formation of the cMDens-derived catalytic state.
  • With the active state formation, the Cas9 protein underwent prominent conformational changes, as observed from both the post-tMD and cMDens simulations. The overall Cα RMSD from the initial crystal structure is close to 6 Å, in which the HNH domain displayed the largest RMSD of ~11 Å as expected, followed by the CTD and REC2 domains with an RMSD of around 7-8 Å (Table 3). In the absence of ntDNA, the CTD domain moved outward markedly, resulting in wide opening of the side channel within the NUC lobe poised for substrate loading (FIG. 3 c and FIG. 13 ). When RMS-fitted to themselves, the HNH and REC2 domains exhibited a much smaller RMSD of less than 2 Å (Table 3), indicating concerted motion of the REC2 domain with the HNH domain. By contrast, the RMSD of the CTD domain decreased by a relatively small amount (3.5 Å), suggesting considerable variation in its internal conformation in addition to the above large-scale reorientation. Taken together, the results reveal the highly mobile nature of individual Cas9 domains, consistent with previous experimental and computational studies (Jiang et al., Science 351, 867-871, 2016; Jiang et al., Science 348, 1477-1481, 2015; Nishimasu et al., Cell 156, 935-949, 2014; Anders et al., Nature 513, 569-573, 2014; Jinek et al. Science 343, 1247997, 2014; Palermo et al., ACS Cent. Sci., 2016).
  • Overall, the two different derived catalytic conformations were well superimposable (FIG. 3 d ). The global RMSD between them is around 2.6 Å (Table 3), partly contributed by the flexible CTD domain and relative domain movements. In line with these results, the HNH domain assumed a similar orientation and conformational state between the two catalytic states, as characterized by the principal component analysis (FIG. 1 d ) and the distance pair between the FRET-labeled residues (FIG. 4 c ). Furthermore, the vast majority of newly formed interactions with the HNH domain are common between the two catalytic conformations (FIG. 5 , FIG. 14 and Table 4) as mentioned below. In aggregate, all these data suggest good convergence of the tMD- and cMDens-derived catalytic models.
  • Mg2+ is Indispensable for Activation of the Catalytic State. The inventors' previous work with Cas9 RuvC domain revealed that Mg2+ is able to induce the formation of the active state for cleaving the ntDNA (Zuo and Liu, Sci. Rep. 5, 2016). Likewise, beyond its catalytic role, Mg2+ could also facilitate conformational activation of the HNH domain. To confirm, the inventors removed the coordinated Mg2+ from the above catalytic conformation (FIG. 2 a ) and performed microsecond-level conventional MD simulations (G7, Table 2). In the absence of Mg2+, two distinct consequences on the HNH domain are envisioned, i.e., either departing from the tDNA or staying docked at the tDNA without noticeable reorganization.
  • The inventors first monitored the changes in the distance pair of +4P to His840 (d+4P−H840) and to Asp861 (d+4P−D861) at the cleavage interface (FIG. 4 a ). Their geometric mean increased from 6.0 Å in the catalytic state simulations to 10.5 Å on average, indicating detachment of the HNH domain from the tDNA. Further comparison with the cMD simulations starting from the pre-catalytic state clearly showed that the absence of Mg2+ leads the HNH domain to a transition state between the catalytic and pre-catalytic states (FIG. 4 b ). The inventors contemplated that with a longer sampling time, the HNH domain would ultimately reach the pre-catalytic state as observed in the crystal structure (Jiang et al., Science 351, 867-871, 2016). A similar trend was also observed with the FRET residue pairs (Ser867/Asn1054 and Ser355/Ser867)(FIG. 4 c ), yet the states are less distinguishable than with the reaction interfacial residues (FIG. 4 b ), probably because remote conformational relaxation needs a longer time. Consequently, the binding free energy of Cas9 to tDNA was reduced by ~30 kcal/mol compared to the catalytic state (not including the entropic contribution). More specifically, the non-bonded interaction energy of the HNH ββα motif with the scissile phosphate and flanking nucleotides decreased by ~64 kcal/mol. Given the stable Mg2+-mediated catalytic conformation, the inventors argue that the HNH domain is unlikely to separate from its opposite cleavage site until the reaction is over. Taken together, these results provide evidence that Mg2+ is essential for the formation and stability of the Cas9 HNH domain active state, as observed for the RuvC domain (Zuo and Liu, Sci. Rep. 5, 2016). The findings here are in good accordance with the most recent smFRET experiments (Dagdas et al., bioRxiv, 122242, 2017).
  • The Catalytic State Provides New Structural Information for Specificity Enhancement. Accompanying the active state formation, remarkably, the HNH domain established many new interactions with the REC lobe (including REC1, REC2 and REC3), bridge helix (BH), tDNA and sgRNA, predominantly involving the charged and polar residues (FIG. 5 , FIG. 14 and Table 4). In detail, the two basic residues of the HNH ββα motif, Lys862 and Lys866, formed alternative ionic interactions with the three acidic residues Glu370, Glu371 and Glu396 on REC1, respectively (FIG. 5 a and FIG. 14 a ). Meanwhile, Lys775, Arg778 and Glu779 (on HNH flanking linker 1, L1) competed for binding to Glu584, Asp585, Arg586 and Lys558 of REC3, respectively (FIG. 5 c and FIG. 14 c ). The HNH loop immediately preceding the ββα motif made numerous side chain and backbone hydrogen bonds with REC2, such as Asn831 with Thr249/Asn251, and Ser834 with Gly247/Thr249 (FIG. 5 b and FIG. 14 b ). Interestingly, Asp835 alone hydrogen-bonded to one helical turn of Ser217, Lys218 and Ser219 on REC2. Additionally, Arg832 and/or Arg859 (on the ββα motif) formed charged interactions with the REC2 Glu223 (FIG. 5 b and FIG. 14 b ). Lying on the long loop between the two β elements of the ββα motif, Gln844 and Lys848 were engaged to Glu60 on BH and Thr58 (on the loop linking BH and RuvC) via hydrogen-bond and ionic interactions, respectively (FIG. 5 d and FIG. 14 d ). Another adjacent residue, Ser845, was implicated in hydrogen-bonding to the +3P of tDNA, a position only 1 nt from the cleavable site (FIG. 5 e and FIG. 14 e ). Also, the HNH domain formed a number of polar contacts with the backbone of sgRNA (primarily at its middle guide segment).
Located on the N-terminal ββα motif flanking helices, the residue pair of Asn803 and Gln807, and the triplet of Arg780, Arg783 and Tyr812 firmly caught the two nucleotides 8 and 9 of sgRNA (numbered 1 from the most PAM-distal end), respectively, through hydrogen-bonds and/or salt bridges (FIG. 5 f and FIG. 14 f ). Meanwhile, the two basic residues, Lys848 and Arg895 (on the last C-terminal ββα motif flanking helix) participated in ionic interactions with the trinucleotide stretch from sites 11 to 13 (FIG. 5 d and FIG. 14 d ). Along with Mg2+, the identified Cas9 residues above definitely play a crucial role in locking the HNH domain onto the scissile phosphate on tDNA.
  • The structural information derived here can be exploited to minimize the off-target effects of CRISPR-Cas9. Guided by the “excess energy” hypothesis that Cas9-sgRNA is more energetic than needed for its optimal on-target recognition and cleavage, two recent works (Slaymaker et al., Science 351, 84-88, 2016; Kleinstiver et al., Nature 529, 490-495, 2016) reported several versions of high-fidelity Cas9 variants bearing multiple alanine substitutions, which were engineered based solely on an inactive DNA-bound crystal structure available at that time. The inventors noticed that four of the basic residues on the HNH domain identified here (viz. Lys775, Arg832, Lys848 and Lys862) have been tested experimentally (FIG. 7 c )(Slaymaker et al., Science 351, 84-88, 2016). Neutralization of these residues was demonstrated to improve Cas9 specificity to varying degrees, in which the single K848A mutant performed best, exhibiting remarkably reduced off-target cleavage at all tested sites while maintaining on-target efficiency (Slaymaker et al., Science 351, 84-88, 2016). From the catalytic Cas9 structure, the K848A conversion could destabilize the activated conformation of the HNH domain due to disruption of favorable interactions with the BH and sgRNA (FIG. 5 d and FIG. 14 d ), thereby requiring more stringent canonical base pairing between the guide RNA and tDNA. Likewise, with the new structural information, more Cas9 nucleases with enhanced specificity can be rationally designed by trying different single and combined mutations.
  • B. Methods
  • System Setup. The initial configurations of the two Cas9 complex systems, viz. Cas9-sgRNA-dsDNA (with ntDNA) and Cas9-sgRNA-tDNA (without ntDNA), were derived from the recently solved crystal structure at 3.4 Å resolution (PDB accession code: 5F9R (Jiang et al., Science 351, 867-871, 2016)). The ntDNA-free system was built by removing the entire non-target DNA strand from the intact structure, while for the dsDNA-bound system, the ntDNA 5′-end cleavage product was excluded based on a previous study (Zuo and Liu, Sci. Rep. 5, 2016). Following the two-metal-ion and one-metal-ion mechanisms proposed for Cas9 (Jiang et al., Science 351, 867-871, 2016; Nishimasu et al., Cell 156, 935-949, 2014; Jinek et al., Science 343, 1247997, 2014), two Mg2+ were placed around the RuvC active center with partial ntDNA or without ntDNA, and if applicable, one Mg2+ was introduced at the HNH active center (Table 1), as previously described (Zuo and Liu, Sci. Rep. 5, 2016). The missing heavy atoms and hydrogen atoms were added using the leap program within AmberTools16 (Salomon-Ferrer et al., Wiley Interdiscip. Rev. Comput. Mol. Sci. 3, 198-210, 2013) and the protonation states of protein titratable residues were assigned through the on-line tool H++ at a physiological pH of 7.5 (Gordon et al., Nucleic Acids Res. 33, W368-371, 2005), followed by a visual check. Each system above was then immersed in a cubic water box with a buffer thickness of 13.5 Å, leading to a simulation cell of approximately 139×124×187 Å3. To mimic the reaction buffer (Jinek et al., Science 337, 816-821, 2012; Jinek et al., Science 343, 1247997, 2014; Sternberg et al., Nature 507, 62-67, 2014; Sternberg et al., Nature 527, 110-113, 2015), an extra 7 or 8 Mg2+ were added into the water box to yield a concentration of 5 mM, and the ionic strength of KCl was set to 100 mM. The total atom counts of the Cas9-sgRNA-dsDNA and Cas9-sgRNA-tDNA solution systems are ~283,500 and ~281,800, respectively.
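  • As a rough cross-check of the ion counts quoted above, the number of ions needed for a given molar concentration can be estimated from the cell volume. The sketch below is illustrative only: the function name is hypothetical, and it assumes the whole 139×124×187 Å3 cell is available to the ions, which overestimates the solvent volume because the solute is ignored.

```python
# Minimal sketch (not the inventors' procedure): estimate how many ions
# give a target molar concentration in a rectangular simulation cell.
AVOGADRO = 6.02214076e23   # particles per mole
A3_TO_LITRE = 1e-27        # 1 cubic angstrom = 1e-30 m^3 = 1e-27 L


def ions_for_concentration(box_angstrom, conc_molar):
    """Return the (fractional) ion count for conc_molar in the given box.

    box_angstrom: (x, y, z) edge lengths in angstrom.
    Assumption: the full cell volume is used; solute volume is ignored.
    """
    x, y, z = box_angstrom
    volume_litre = x * y * z * A3_TO_LITRE
    return conc_molar * volume_litre * AVOGADRO


# Cell size and Mg2+ concentration as stated in the text above.
n_mg = ions_for_concentration((139.0, 124.0, 187.0), 5e-3)
print(f"Mg2+ ions for 5 mM: about {n_mg:.1f}")
```

  • Under these assumptions the estimate comes out at roughly ten Mg2+, the same order as the 7 or 8 bulk ions plus the ions placed at the active centers.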
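  • TABLE 1
    Summary of MD simulations for Cas9 complex systems without non-target DNA strand (w/o ntDNA) and with ntDNA
    System | Group | Simulation method* | Starting structure | Production time per run [ns] | No. of runs | Mg2+ present at HNH domain?
    w/o ntDNA | G1 | cMD | Crystal structure (PDB code: 5F9R) | 2500 | 2 |
    w/o ntDNA | G2 | cMD | Crystal structure (PDB code: 5F9R) | 1000 | 1 |
    w/o ntDNA | G3 | aMDEd | Extracted from G1 | 650 | 2 |
    w/o ntDNA | G4 | aMDdual | Extracted from G1 | 1000 | 2 |
    w/o ntDNA | G5 | tMD | Extracted from G1/G2 | 100 | 2 |
    w/o ntDNA | G6 | cMD | Extracted from G5 | 800 | 2 |
    w/o ntDNA | G7 | cMD | Extracted from G6 | 800 | 2 |
    w/o ntDNA | G8.1 | cMDens | Extracted from G1/G2 | 500 | 10 |
    w/o ntDNA | G8.2 | cMDens | Extracted from G8.1 | 500 | 10 |
    w/o ntDNA | G8.3 | cMDens | Extracted from G8.2 | 500 | 10 |
    w/o ntDNA | G8.4 | cMDens | Extracted from G8.3 | 500 | 10 |
    with ntDNA | G9 | cMD | Crystal structure (PDB code: 5F9R) | 1000 | 2 |
    with ntDNA | G9 | cMD | Crystal structure (PDB code: 5F9R) | 1500 | 2 |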
    *cMD, conventional unbiased MD; aMDEd, accelerated MD with dihedral boost only; aMDdual, accelerated MD with simultaneous dihedral and total potential boost; tMD, targeted MD; cMDens, ensemble cMD.
  • Conventional Molecular Dynamics Simulations. All simulations were performed with the GPU version of the AMBER16 pmemd engine (pmemd.cuda)(Salomon-Ferrer et al., Wiley Interdiscip. Rev. Comput. Mol. Sci. 3, 198-210, 2013) except the targeted MD simulations, which were carried out with NAMD2.10 (Phillips et al., J. Comput. Chem. 26, 1781-1802, 2005)(as described below). The Amber force fields ff14SBonlysc, ff99bsc0 and ff99bsc0_chiOL3 were used to describe paired interactions involving protein, DNA and RNA, respectively. The TIP3P model (Jorgensen et al., J. Chem. Phys. 79, 926-35, 1983) was selected for water and the recently developed ion parameter sets optimized in TIP3P water were employed for the mono- and divalent ions (Li et al., J. Chem. Theory Comput. 11, 1645-57, 2015; Li et al., J. Chem. Theory Comput. 9, 2733-48, 2013). It should be mentioned that none of the available non-bonded models for metal ions, especially the multivalent ions, is able to reproduce various experimental properties simultaneously (Panteva et al., J. Comput. Chem. 36, 970-82, 2015); the Mg2+ parameter set here, as previously used for the same enzyme (Zuo and Liu, Sci. Rep. 5, 2016), represents the best possible compromise targeting the experimental coordination number, Mg2+-O distance and hydration free energy (Li et al., J. Chem. Theory Comput. 9, 2733-48, 2013). The short-range non-bonded interactions were truncated at 10 Å, and the long-range electrostatics were treated via the particle mesh Ewald summation (PME) method (Darden et al., J. Chem. Phys. 98, 10089-92, 1993) using a grid spacing of 1 Å. The bonds involving hydrogens were constrained through the SHAKE algorithm (Miyamoto and Kollman, J. Comput. Chem. 13, 952-62, 1992).
Each system was subjected to a thorough energy minimization with the solute heavy atoms constrained, then followed by slow heating from 0 K to the target 310.15 K and 10-ns equilibration in the isothermal-isochoric (NVT) ensemble in which the backbone atoms were restrained. Finally, the production simulations (i.e. G1, G2 and G9 in Table 1) without any restraints were conducted under the isothermal-isobaric (NpT) condition and each independent run was extended to at least 1000 ns. The temperature was maintained at 310.15 K through the Langevin thermostat and the pressure was controlled at 1.013 bar via the Monte Carlo barostat. The integration time step was set to 1 fs during minimization and equilibration, and 2 fs in the production stage. The trajectory snapshots were saved at 10-ps intervals for analysis.
  • Accelerated Molecular Dynamics (aMD). aMD is an enhanced sampling technique that adds a non-negative boost potential [ΔV(r)] to the original potential energy surface [V(r)] when it falls below a threshold energy (E), as
  • $$ \Delta V(r) = \begin{cases} 0, & V(r) \ge E \\ \dfrac{\left(E - V(r)\right)^{2}}{\alpha + \left(E - V(r)\right)}, & V(r) < E \end{cases} \qquad (1) $$
  • where the acceleration factor α modulates the depth and local roughness of the energy basins in the modified potential (Hamelberg et al., J. Chem. Phys. 120, 11919-29, 2004; Pierce et al., J. Chem. Theory Comput. 8, 2997-3002, 2012). This simple formalism has several practical advantages: only two parameters (E, α) need to be specified and an a priori reaction coordinate is not required. Here, two acceleration levels were applied to the Cas9-sgRNA-tDNA system, i.e., boosting only the dihedral energy terms (dihedral aMD) and boosting the whole potential with an extra boost to the dihedrals (dual aMD) (G3 and G4, Table 1). Following previous works (Pierce et al., J. Chem. Theory Comput. 8, 2997-3002, 2012; de Oliveira et al., PLoS Comput. Biol. 7, e1002178, 2011), the boosting parameters for each aMD run were estimated from the corresponding 60-ns conventional MD simulations carried out in the NVT ensemble. The aMD simulations were started from the last snapshots of the above short cMD simulations and were also performed in the NVT ensemble, lasting 650 ns and 1000 ns for the dihedral and dual modes, respectively (G3 and G4, Table 1). In preliminary tests, the newer variant GaMD (Gaussian accelerated MD), which allows for improved reweighting, was also run (Miao et al., J. Chem. Theory Comput. 11, 3584-3595, 2015). However, appreciable loss of protein secondary structure was found, and this approach was therefore not applied herein.
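  • The boost potential of Equation (1) can be written as a short function. This is a minimal sketch for illustration, not the AMBER implementation; the function name and the numerical values are hypothetical.

```python
# Minimal sketch of the aMD boost potential of Equation (1).
# All energies share one unit (e.g. kcal/mol); values are illustrative.

def amd_boost(v, e_threshold, alpha):
    """Return the boost dV(r) for a potential energy v.

    No boost is applied when v is at or above the threshold E;
    below it, dV = (E - v)^2 / (alpha + (E - v)).
    """
    if v >= e_threshold:
        return 0.0
    diff = e_threshold - v
    return diff * diff / (alpha + diff)


# A basin 10 units below the threshold, with alpha = 10, is raised by 5.
print(amd_boost(-100.0, -90.0, 10.0))
# As alpha approaches 0, the boosted surface v + dV becomes flat at E.
print(-100.0 + amd_boost(-100.0, -90.0, 0.0))
```

  • A larger α gives a smaller boost and preserves more of the original landscape, which is why only E and α need to be chosen.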
  • Targeted Molecular Dynamics (tMD). tMD induces conformational transition between two known states by means of steering forces (Schlitter et al., J. Mol. Graphics 12, 84-89, 1994; Schlitter et al., Mol. Simul. 10, 291-308, 1993). At each time step, the root-mean-square deviation (RMSD) between the current coordinates and the target structure is calculated. The force exerted on each atom is given by the gradient of the potential,
  • $$ U_{tMD} = \frac{1}{2}\,\frac{k}{N}\,\left[\mathrm{RMSD}(t) - \mathrm{RMSD}^{*}(t)\right]^{2} \qquad (2) $$
  • where the spring constant k is scaled down by the number N of targeted atoms, RMSD(t) is the instantaneous best-fit RMSD of the current coordinates from the target conformation, and RMSD*(t) evolves linearly from the initial RMSD at the first tMD step to the final value at the last step. The two start structures for tMD were extracted from the replicated long cMD simulations (Table 1), based on the HNH domain closeness to the putative catalytic state modeled from the crystal structure of T4 endonuclease VII (Endo VII) complexed with a DNA Holliday junction (See below and FIG. 11 )(Biertumpfel et al., Nature 449, 616-U614, 2007). The guiding forces were imposed only on the backbone atoms of the HNH domain. The initial RMSDs of the biased atoms from the target states are around 10 Å, significantly lower than the 25 Å calculated directly from the pre-catalytic state structure. With the TclForces functionality in NAMD, an in-house Tcl (Tool Command Language) script was used to implement the mass-weighted partial tMD simulations. During tMD, the Cα atoms of the protein residues (excluding the HNH domain) exhibiting low fluctuations were weakly restrained with a force constant of 0.1 kcal/mol/Å2 to prevent solute drift. Based on previous experience (Zuo et al., J. Phys. Chem. B 120, 2145-2154, 2016), a small force constant of 0.25 kcal/mol/Å2 per targeted atom was adopted, and the simulation length reached up to 100 ns, representing a decreasing rate in RMSD of approximately 0.1 Å per ns. The tMD simulations were performed in the NVT ensemble with a time step of 1 fs. The above procedure was intended to minimize the perturbation of the system by the external forces applied in tMD.
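  • The tMD bias of Equation (2) and its linearly decreasing reference RMSD can be sketched as follows. The function names, the zero final RMSD and the numerical values of k and N are assumptions chosen for illustration; this is not the in-house Tcl script.

```python
# Minimal sketch of the tMD restraint energy of Equation (2).

def rmsd_target(t_ns, total_ns, rmsd_initial, rmsd_final=0.0):
    """Reference RMSD*(t): linear ramp from rmsd_initial to rmsd_final.

    rmsd_final = 0.0 is an assumption made for this example.
    """
    frac = min(max(t_ns / total_ns, 0.0), 1.0)
    return rmsd_initial + frac * (rmsd_final - rmsd_initial)


def tmd_energy(rmsd_now, rmsd_ref, k, n_atoms):
    """U_tMD = 1/2 * (k / N) * (RMSD(t) - RMSD*(t))^2."""
    return 0.5 * (k / n_atoms) * (rmsd_now - rmsd_ref) ** 2


# A 10 angstrom initial RMSD driven over 100 ns gives 0.1 angstrom per ns.
rate = (10.0 - 0.0) / 100.0
ref_at_50ns = rmsd_target(50.0, 100.0, 10.0)
# Hypothetical k and N, with the structure lagging the reference by 0.5.
energy = tmd_energy(5.5, ref_at_50ns, k=100.0, n_atoms=50)
print(rate, ref_at_50ns, energy)
```

  • The force on each targeted atom is the negative gradient of this energy, so the bias vanishes whenever the structure keeps pace with the moving reference.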
  • Post Targeted Molecular Dynamics Simulations. At the end of tMD, the RMSD difference was reduced to ~0.8 Å, indicating completion of the expected conformational transition. Two trajectory snapshots at ~90 ns of the above parallel tMD (G5, Table 1) were then extracted and subjected to 50-ns equilibration with gradually released restraints on the protein backbone atoms. The final structures were used to seed subsequent unbiased MD simulations (G6, Table 1), in which one Mg2+ was introduced between the HNH active site and the tDNA scissile phosphate according to the one-metal-ion mechanism. Each run was extended to 800 ns (G6, Table 1). Here, the inventors did not employ the tMD end structures (i.e., at 100 ns) as the start points for Mg2+ introduction, given that the modeled target coordinates used in tMD do not necessarily represent a true catalytic state, and importantly, that the Mg2+ might assist further conformational change to bridge the distance gap for catalysis, as the inventors previously demonstrated (Zuo and Liu, Sci. Rep. 5, 2016). This consideration allowed for spontaneous adaptation of the system to the catalytic conformation, thereby reducing the potential artifacts from tMD. To probe the role of Mg2+, the inventors proceeded to perform a set of conventional simulations started from the derived catalytic state, in which the above placed Mg2+ was moved from the active center to the bulk solution (G7, Table 1).
  • Trajectory Analysis Methods. Details of principal component analysis (PCA), cluster analysis, binding free energy and non-bonded interaction energy calculations and other analyses are presented below.
  • Principal Component Analysis (PCA). PCA is a technique for transforming a series of potentially coordinated observations into a set of orthogonal vectors called principal components (PCs) and is widely used to characterize the dominant modes of motion underlying protein dynamics (David and Jacobs, Methods Mol. Biol. 1084:193-226, 2014; Amadei et al., Proteins 17:412-25, 1993). The calculation of PCs involves two main steps: (i) the calculation of the covariance matrix, and (ii) the diagonalization of this matrix. With the goal of comparing the conformational dynamics of the HNH domain between different MD simulations, the whole simulation trajectories (G1-G4 and G9, Table 1) were first combined and superimposed onto the starting crystal structure using the Cas9 Cα atoms excluding those on the HNH domain. After that, the PCA calculations were performed only on the HNH domain to determine the eigenvectors and associated eigenvalues (referred to collectively as eigenmodes). The eigenvector with the largest eigenvalue corresponds to the lowest-frequency, largest-amplitude mode of motion. The PC analysis was done with the cpptraj module included within AmberTools16 (Salomon-Ferrer et al., Wiley Interdiscip. Rev. Comput. Mol. Sci. 3:198-210, 2013).
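  • The two PCA steps named above can be illustrated with a small standard-library sketch. It builds the covariance matrix from already superimposed, flattened coordinate frames and then uses power iteration as a simple stand-in for full diagonalization, so only the leading mode is recovered. The function names and toy data are hypothetical; the analysis described above used cpptraj.

```python
# Minimal PCA sketch: covariance matrix plus power iteration (leading mode).
import math


def covariance_matrix(frames):
    """frames: list of equally long, pre-superimposed coordinate vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    centered = [[f[i] - mean[i] for i in range(dim)] for f in frames]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def leading_mode(cov, n_iter=200):
    """Return (eigenvalue, unit eigenvector) for the largest eigenvalue."""
    dim = len(cov)
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(n_iter):
        new = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    cov_vec = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
    eigenvalue = sum(v * cv for v, cv in zip(vec, cov_vec))
    return eigenvalue, vec


# Toy "trajectory": large motion along the first coordinate, small along
# the second. Real input would hold 3N coordinates per frame.
toy_frames = [[-2.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
value, mode = leading_mode(covariance_matrix(toy_frames))
print(value, mode)
```

  • Projecting each centered frame onto the first two eigenvectors gives the two-dimensional subspace in which trajectories from different simulations can be compared.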
  • HNH Active State Modeling and HNH Pairwise RMSD Computation. Starting from the pre-catalytic Cas9 structure (PDB code: 5F9R; Jiang et al. Science 351:867-871, 2016), the detailed procedure for modeling the putative catalytic state of its HNH domain from the homologous T4 Endonuclease VII (Endo VII) complexed with a DNA Holliday junction (PDB code: 2QNC; Biertumpfel et al., Nature 449:616-U614, 2007) is illustrated in FIG. 11 . It should be mentioned that 2QNC represents a catalytically active state where one Mg2+ was coordinated at the interface between the enzyme ββα motif and scissile phosphate (see also FIG. 2 c-2 d ), making it the best candidate for active state modeling among the available ββα-metal nuclease structures.
  • The inventors took three steps to model the HNH active state. In step 1, the scissile phosphate and flanking nucleotides in the T4 Endo VII system (2QNC) were aligned to the corresponding tDNA stretch in the Cas9 complex of the pre-catalytic state (5F9R). In step 2, the Cas9 HNH domain was moved toward the tDNA with the transformation matrix calculated from the paired ββα motifs in the two nucleases, resulting in a model of the HNH domain docked at the cleavage site. Notably, the equivalent residues between the above ββα motifs for the transformation matrix calculation were determined based on topology-independent structure superposition by the CLICK algorithm (Nguyen et al., Nucleic Acids Res. 39:W24-W28, 2011) instead of the generally used sequence alignment. The backbone RMSD of the HNH domain between the pre-catalytic Cas9 state (5F9R) and the modeled “active” state is 25 Å (FIG. 11 f ). In step 3, the inventors repeated step 1 and step 2, replacing the crystal structure (5F9R) with snapshot structures from the sets of long cMD trajectories (G1 and G2, Table 1). A modeled “active” state was obtained for every snapshot of the simulations. The inventors calculated the RMSD between each snapshot structure and its corresponding “active” state and used it as a metric to evaluate how close the snapshot conformation is to its putative active state.
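  • The snapshot-ranking metric in step 3 can be sketched as a plain RMSD between each snapshot and its own modeled target, followed by selection of the minimum. The sketch assumes that both coordinate sets list the same atoms in the same order and already share a reference frame, so no further fitting is applied; the function names and toy coordinates are hypothetical.

```python
# Minimal sketch: rank snapshots by RMSD to their modeled "active" states.
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) tuples."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((xa - xb) ** 2
                  for a, b in zip(coords_a, coords_b)
                  for xa, xb in zip(a, b))
    return math.sqrt(squared / len(coords_a))


def closest_snapshot(snapshots, targets):
    """Return (index, rmsd) of the snapshot nearest to its own target."""
    scores = [rmsd(s, t) for s, t in zip(snapshots, targets)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Two toy snapshots of two atoms each, with their modeled targets.
snaps = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
         [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
models = [[(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)],
          [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]]
print(closest_snapshot(snaps, models))
```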
  • Details of Generating tMD-derived Catalytic State. The inventors employed the targeted molecular dynamics (tMD) method to drive the Cas9 conformational transition. The target structures for tMD were built by reference to the catalytically active T4 Endo VII system above (FIG. 11 ). To minimize the potential artificial effect of tMD, two snapshots were extracted from the sets of long cMD trajectories (G1 and G2, Table 1) as the starting structures that are closest to their respective modeled “active” states in terms of HNH domain conformation (FIG. 11 ). The backbone RMSD differences for the HNH domain from the target structures are about 10 Å, which are remarkably reduced compared with the 25 Å obtained when using the pre-catalytic crystal structure (5F9R) as the starting point. Accordingly, the tMD starting points were much closer to the corresponding end points in the subspace defined by the first two principal components with regard to the crystal structure (FIG. 1 d ). Docking of the HNH domain toward the putative catalytic state inevitably brings about numerous steric clashes with the other components in the complex system (FIG. 11 f ), indicating that considerable conformational rearrangements in Cas9 must be involved in the pre-catalytic to catalytic state transition. The inventors note that the trajectory snapshots from aMD were not employed, although they approached the target conformations more closely, with a minimum RMSD difference of ~5 Å, because the enhanced sampling via aMD appears to be accompanied by an appreciable distortion of the internal conformation of the HNH domain (FIG. 10 and Table 2). During tMD, the Cα atoms of the protein residues (excluding the HNH domain) exhibiting low fluctuations were weakly restrained with a force constant of 0.1 kcal/mol/Å2 to prevent global solute drift.
The guiding forces were exerted only on the HNH domain backbone atoms with a force constant of 0.5 kcal/mol/Å2, and the simulation time was set to 100 ns (G5, Table 1), representing an RMSD decreasing rate of approximately 0.1 Å/ns.
  • At the end of tMD, the RMSD between the initial and target coordinates declined to ~0.8 Å, indicating completion of the anticipated conformational change. The inventors selected two structure snapshots near the end of tMD for subsequent cMD (G6, Table 1), in which one Mg2+ was introduced at the interface between the HNH domain and tDNA in the framework of the one-metal-ion mechanism (FIG. 2 d )(Yang, Q. Rev. Biophys. 44:1-93, 2011; Yang, Nat. Struct. Mol. Biol. 15:1228-31, 2008). Here, the inventors did not employ the tMD end structures (i.e., at 100 ns) as the start points for Mg2+ introduction, given that the modeled target coordinates used in tMD do not necessarily represent a true catalytic state, and importantly, that the Mg2+ may assist further conformational change to bridge the distance gap for catalysis as previously demonstrated with the RuvC domain (Zuo and Liu, Sci. Rep. 5, 2016). This consideration allowed for spontaneous adaptation of the system to the catalytic conformation. The deliberate building procedures were intended to minimize perturbation of the system and hence reduce the potential artificial effects of tMD, which are readily open to question. After sufficient equilibration, the inventors obtained a reasonable catalytic conformation, featuring a stable Mg2+-involved coordination configuration (FIG. 2 b ) that matches well with that observed in the T4 Endo VII system (FIG. 2 c ).
  • Details of Generating cMDens-derived Catalytic State. The above tMD-based strategy to capture the catalytic state is in essence dependent on a modeled putative “target” state. One may question the reliability of the derived state and associated results, though the model was treated with careful considerations. To address these concerns, the inventors developed an ensemble sampling-based scheme that moves forward toward the active state. The basic idea is as follows: (i) pre-define an a priori metric (or multiple if necessary) such as a distance, angle or RMSD; (ii) use this metric to track the conformational transition and select the structure closest to the expected target state; (iii) perform ensemble conventional MD simulations (cMDens) starting from the above extracted structure; (iv) select the closest structure snapshot from the previous cMDens and initiate a new cycle of ensemble simulations. Ideally, the simulations get closer to or even reach the target conformation through several or more cycles, depending on the energetic barrier height between the initial and target states and the sampling length accessible to each independent run.
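  • The cycle described in steps (i) to (iv) can be sketched as a simple loop. In the sketch below, a one-dimensional random walk of the metric is a hypothetical stand-in for an unbiased MD run, and all names, step sizes and frame counts are assumptions for illustration; only the seeding logic reflects the scheme described above.

```python
# Minimal sketch of the iterative seeding scheme; the "MD run" is a toy
# random walk of the metric value, not a real simulation.
import random


def run_ensemble(seed_value, n_runs, n_frames, rng):
    """Hypothetical stand-in for an ensemble of independent unbiased runs."""
    trajectories = []
    for _ in range(n_runs):
        value = seed_value
        traj = [value]                    # the seed frame is frame 0
        for _ in range(n_frames):
            value = max(0.0, value + rng.gauss(0.0, 0.1))
            traj.append(value)
        trajectories.append(traj)
    return trajectories


def step_by_step(seed_value, n_cycles=4, n_runs=10, n_frames=500, rng=None):
    """Reseed each cycle from the frame with the lowest metric value."""
    rng = rng or random.Random(0)
    history = [seed_value]
    for _ in range(n_cycles):
        ensemble = run_ensemble(seed_value, n_runs, n_frames, rng)
        seed_value = min(min(traj) for traj in ensemble)
        history.append(seed_value)
    return history


# Seed metric of 9 (angstrom), four cycles of ten runs each.
print(step_by_step(9.0))
```

  • Because no biasing force enters the runs themselves, the only intervention is the choice of restart frame, which is what distinguishes this scheme from tMD.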
  • The inventors used the geometric mean of the distances of +4P (the scissile phosphate) to the two active residues His840 and Asp861 ($\sqrt{d_{+4P-H840} \cdot d_{+4P-D861}}$) as a metric to monitor the HNH domain conformational change: the smaller this value, the closer to the target active state (FIGS. 4 a and 4 b ). From the sets of long cMD trajectories (G1 and G2, Table 1), a structure bearing a minimum value of ~9 Å was extracted as the starting point for ensemble simulations (FIG. 3 a ), where one Mg2+ was placed around the reaction center as done for the post tMD simulations. In each cycle, a total of 10 independent runs were carried out and each run lasted 500 ns (G8.1-G8.4, Table 1). Through four cycles, the above geometric mean stabilized at ~6 Å (FIG. 3 a ), which is comparable to that observed for the tMD-derived catalytic state (FIG. 4 a ). Accordingly, the RMSD of the reaction interface from the tMD-derived catalytic state declined from an initial ~3 Å to ~1 Å (FIG. 4 b ). Moreover, the Mg2+-involved coordination composition and configuration here (FIG. 2 b ) are essentially the same as those derived from tMD (FIG. 2 a ), except that Tyr823 was engaged to Asp839 via an intercalated water molecule, again confirming the structural role of Tyr823 around the reaction center. These observations thus demonstrated formation of the cMDens-derived catalytic state.
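  • The geometric-mean metric can be computed from three atomic positions. In this minimal sketch the function name is hypothetical, and which atom of His840 and of Asp861 defines each distance is an assumption, since the text does not specify it.

```python
# Minimal sketch of the geometric-mean distance metric.
import math


def geometric_mean_distance(p_scissile, his840_atom, asp861_atom):
    """sqrt(d(+4P, His840) * d(+4P, Asp861)) for (x, y, z) positions.

    The choice of reference atom on each residue is an assumption.
    """
    d_his = math.dist(p_scissile, his840_atom)
    d_asp = math.dist(p_scissile, asp861_atom)
    return math.sqrt(d_his * d_asp)


# Toy coordinates in angstrom: distances of 5 and 20 give a mean of 10.
print(geometric_mean_distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 20.0)))
```

  • Unlike an arithmetic mean, the geometric mean stays large unless both distances are short, so a frame scores well only when both residues approach the scissile phosphate.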
  • Cluster Analysis. The simulation structures used for visualization and comparison were determined through cluster analysis with the package VMD (version 1.9.2)(Humphrey et al., J. Mol. Graph. Model. 14:33-38, 1996). Following previous experience with the same system (Zuo and Liu, Sci. Rep. 5, 2016), the reaction interface atoms were selected for calculations, involving the heavy atoms of the three active residues, Asp839, His840 and Asp861, the Cα atoms of the remaining residues on the HNH ββα motif, the backbone of the scissile stretch on the tDNA (+3P to +5P), and the coordinated Mg2+ between them. By varying the RMSD cutoff (0.6-1.0 Å here), four groups were obtained in which the first two account for >80% of the total population. The structure(s) closest to the centroid of the largest cluster were extracted for analysis.
  • Binding Free Energy Calculation and Per-residue Energy Decomposition. The end-point Molecular Mechanics-Generalized Born Surface Area (MM-GBSA) approach (Hou et al., J. Chem. Inf. Model. 51:69-82, 2011) was employed to estimate the per-residue energetic contribution to Mg2+ binding and the difference in the affinities of the tDNA to Cas9 with and without Mg2+ bound at the reaction interface. Compared to the alternative Molecular Mechanics-Poisson Boltzmann Surface Area (MM-PBSA), MM-GBSA is computationally more efficient and has been shown to give comparable or even better accuracy (Hou et al., J. Chem. Inf. Model. 51:69-82, 2011; Zuo et al., J. Phys. Chem. B 120:2145-54, 2016). All the MM-GBSA calculations were performed with the program MMPBSA.py in AmberTools16 (Miller et al., J. Chem. Theory Comput. 8:3314-21, 2012). The entropic contribution was not taken into account here, as omission of this term does not qualitatively affect the results (Hou et al., J. Chem. Inf. Model. 51:69-82, 2011; Zuo et al., J. Phys. Chem. B 120:2145-54, 2016). The last 400 ns of each set of simulation trajectories were used for calculations, with 50-ps intervals. Specifically, in the case of the Mg2+ binding free energy calculation, the three water molecules closest to the coordinated Mg2+ in each trajectory snapshot were retained and considered as part of the Cas9-sgRNA/tDNA “receptor”.
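  • For orientation, the end-point estimate described above can be written in its usual textbook form. The expression below is the standard MM-GBSA decomposition, given as a sketch and not copied from the source; the averages run over the trajectory snapshots, and 400 ns sampled every 50 ps corresponds to 8,000 snapshots per set.

```latex
\Delta G_{\mathrm{bind}} \approx
  \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{receptor}} \rangle
  - \langle G_{\mathrm{ligand}} \rangle,
\qquad
G = E_{\mathrm{MM}} + G_{\mathrm{GB}} + G_{\mathrm{SA}}
\quad (\text{entropic term } -TS \text{ omitted})
```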
  • Non-bonded Interaction Energy Calculation. The non-bonded interaction energies of the HNH ββα motif with the scissile phosphate and flanking nucleotides (+3P to +5P) were calculated with NAMD (version 2.12) (Phillips et al., J. Comput. Chem. 26:1781-1802, 2005), using the same structural ensemble as above. The truncation cutoff was set to 10 Å, consistent with that used in the MD simulations.
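A pairwise non-bonded energy between two atom groups with a distance cutoff can be sketched as follows. This is a simplified illustration, not NAMD's algorithm: it uses plain truncation at the cutoff (no switching function or long-range electrostatics), Lorentz-Berthelot combining rules, and made-up charges and Lennard-Jones parameters. The function name and all numeric inputs in the example are assumptions.

```python
import numpy as np

COULOMB = 332.0636  # kcal*Å/(mol*e^2), the conversion constant used by CHARMM-style codes

def nonbonded_energy(xyz_a, q_a, eps_a, rmin2_a,
                     xyz_b, q_b, eps_b, rmin2_b, cutoff=10.0):
    """Electrostatic and Lennard-Jones energy between groups A and B (kcal/mol).

    Pairs farther apart than `cutoff` (Å) are ignored (hard truncation).
    eps_* are well depths (positive), rmin2_* are Rmin/2 values in Å.
    """
    xyz_a, xyz_b = np.asarray(xyz_a, float), np.asarray(xyz_b, float)
    q_a, q_b = np.asarray(q_a, float), np.asarray(q_b, float)
    eps_a, eps_b = np.asarray(eps_a, float), np.asarray(eps_b, float)
    rmin2_a, rmin2_b = np.asarray(rmin2_a, float), np.asarray(rmin2_b, float)

    d = np.linalg.norm(xyz_a[:, None, :] - xyz_b[None, :, :], axis=-1)
    mask = d < cutoff
    r = np.where(mask, d, 1.0)  # placeholder distance for excluded pairs

    elec = COULOMB * q_a[:, None] * q_b[None, :] / r
    eps = np.sqrt(eps_a[:, None] * eps_b[None, :])   # geometric mean of well depths
    rmin = rmin2_a[:, None] + rmin2_b[None, :]       # arithmetic mean of Rmin
    x6 = (rmin / r) ** 6
    vdw = eps * (x6 ** 2 - 2.0 * x6)

    return float(elec[mask].sum()), float(vdw[mask].sum())

# Hypothetical example: one +1 atom at the origin; two -1 atoms at 3 Å and 12 Å.
elec, vdw = nonbonded_energy(
    [[0.0, 0.0, 0.0]], [1.0], [0.1], [1.5],
    [[3.0, 0.0, 0.0], [12.0, 0.0, 0.0]], [-1.0, -1.0], [0.1, 0.1], [1.5, 1.5],
)
print(elec, vdw)
```

Only the atom at 3 Å contributes, because the one at 12 Å lies beyond the 10 Å cutoff.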
  • The inventors have identified two states, the pseudo-active state and the active state, using computational techniques. These two states have similar global conformations; the major distinction lies in the local conformation involving residues N863 and D861. The active state of the Cas9 HNH domain identified by computer modeling and simulations is responsible for tDNA cleavage. The inventors performed site-directed mutagenesis experiments to validate this newly identified active state. Four single mutations (D837A, D839A, D861A, and N863A) and one double mutation (D861A/N863A) were made (FIG. 15). Remarkably, the combined experimental and computational data suggest that D839 and N863 are the essential residues for Cas9 activity, directly coordinating the catalytic Mg2+ at the interface between the HNH domain and tDNA, which validates the newly identified active state.
  • Both the pseudo-active and active states exist during the Cas9 conformational transition, and the relevant structural information could be exploited for the rational design of Cas9 variants with enhanced specificity. Further comparison of the two conformational states reveals that the major structural differences lie in the interactions of the HNH domain with the REC1 domain. Collectively, the data have identified two new interacting pairs, viz., Glu371 with Lys866 and Asp406 with Arg864. It is contemplated that alanine substitution at these sites can be beneficial and result in improved Cas9 specificity.
  • The initial model for the active Cas9 complex was constructed by replacing the α segment of the ββα-Me motif in the optimized catalytic Cas9 complex with the corresponding part in the Mg2+-bound apo-Cas9 structure (PDB code: 4CMP). The catalytic Cas9 complex structure was taken from the above production simulation, as described in ¶[137], near 100 ns (i.e., about half of the simulation time), and the Mg2+-bound apo-Cas9 structure from the simulation trajectory was selected based on the observation of reasonable bonding with the connecting residues and minimal steric clashes after replacement of the α segment. After thorough energy minimization, the structural model was subjected to multi-stage equilibration: an initial 20-ns relaxation of the α segment and surrounding residues, another 20-ns equilibration with the inter-atomic distances within the metal center restrained relative to the T4 Endo VII system, and a final 20-ns equilibration with the restraints gradually released. Subsequently, two independent replicas were run (250 ns/run) under the same simulation conditions set for the pseudo-active system above.
  • Ten Cas9 variants were designed and synthesized to test their activity and specificity (Table 5). The mutations in each variant follow a combination of five rationales: (1) weakening Cas9 binding affinity with tDNA; (2) weakening Cas9 binding affinity with ntDNA; (3) weakening Cas9 binding affinity with sgRNA; (4) raising the threshold energy for Cas9 HNH domain conformational activation; and (5) destabilizing the formation of the Cas9 HNH domain active conformation.
  • Two variants include mutations designed on all five rationales: N588A/R765A/D835A/K1246A (Mut1.8) and N14A/R447A/R765A/S845D (Mut1.9) (Table 5, FIGS. 16a-16c). Gene-editing activity and specificity assays were performed on these two tetramutant variants of Cas9 (FIGS. 16a-16c). In HEK293T-EGFP cells, the two tetramutants exhibit protein expression levels similar to, and gene-editing efficiency comparable with, wild-type Cas9 (FIGS. 16a-16c), indicating that these two designed variants do not significantly alter on-target activity.
  • TABLE 2a
    Pairwise RMSDs (Å) for the Cα atoms of the HNH domain among different Cas9 crystal structures [mean = 1.4 (0.6) Å]
    apo-Cas9 Cas9-sgRNA Cas9-sgRNA-DNA
    PDB code  4CMP_A  4CMP_B  4CMQ_A  4ZT0_A  4ZT0_C  4ZT9_A  4OO8_A  4UN3_B  5F9R_B
    4CMP_A*   -       0.5     0.5     1.9     2.4     1.8     1.8     1.8     1.8
    4CMP_B    0.5     -       0.5     1.9     2.5     1.9     1.9     1.9     1.9
    4CMQ_A    0.5     0.5     -       1.7     2.3     1.7     1.7     1.7     1.7
    4ZT0_A    1.9     1.9     1.7     -       1.5     0.6     0.7     0.7     0.8
    4ZT0_C    2.4     2.5     2.3     1.5     -       1.6     1.6     1.5     1.6
    4ZT9_A    1.8     1.9     1.7     0.6     1.6     -       0.8     0.8     0.9
    4OO8_A    1.8     1.9     1.7     0.7     1.6     0.8     -       0.5     0.7
    4UN3_B    1.8     1.9     1.7     0.7     1.5     0.8     0.5     -       0.7
    5F9R_B    1.8     1.9     1.7     0.8     1.6     0.9     0.7     0.7     -
  • TABLE 2b
    Pairwise RMSDs (Å) for the Cα atoms of the HNH ββα fold among different Cas9 crystal structures [mean = 1.4 (0.7) Å]
    apo-Cas9 Cas9-sgRNA Cas9-sgRNA-DNA
    PDB code  4CMP_A  4CMP_B  4CMQ_A  4ZT0_A  4ZT0_C  4ZT9_A  4OO8_A  4UN3_B  5F9R_B
    4CMP_A    -       0.3     0.3     2.1     2.2     2.0     2.1     2.0     2.0
    4CMP_B    0.3     -       0.4     2.2     2.3     2.1     2.2     2.1     2.1
    4CMQ_A    0.3     0.4     -       2.1     2.1     1.9     2.0     2.0     2.0
    4ZT0_A    2.1     2.2     2.1     -       0.7     0.3     1.0     1.0     1.1
    4ZT0_C    2.2     2.3     2.1     0.7     -       0.8     0.7     0.8     0.8
    4ZT9_A    2.0     2.1     1.9     0.3     0.8     -       1.0     1.0     1.0
    4OO8_A    2.1     2.2     2.0     1.0     0.7     1.0     -       0.4     0.5
    4UN3_B    2.0     2.1     2.0     1.0     0.8     1.0     0.4     -       0.4
    5F9R_B    2.0     2.1     2.0     1.1     0.8     1.0     0.5     0.4     -
    HNH domain: residues 781 to 905.
    HNH ββα fold: residues 837 to 867.
    *Chain identifier present in the PDB file.
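Pairwise Cα RMSDs of the kind tabulated above are usually computed after optimal rigid-body superposition. The sketch below uses the standard Kabsch algorithm with NumPy; it is an illustration under stated assumptions, not the procedure the inventors ran. The function names are hypothetical, the coordinates are random stand-ins for matched Cα atoms, and reading PDB files is left out.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition of P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)           # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                     # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def pairwise_rmsd(structures):
    """Symmetric matrix of Kabsch RMSDs plus the mean over distinct pairs."""
    n = len(structures)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            m[i, j] = m[j, i] = kabsch_rmsd(structures[i], structures[j])
    return m, float(m[np.triu_indices(n, k=1)].mean())

# Hypothetical coordinates: a random set, a rotated and shifted copy, and a noisy copy.
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 3))
c, s = np.cos(0.7), np.sin(0.7)
rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = ref @ rot_z.T + np.array([5.0, -2.0, 1.0])
noisy = ref + rng.normal(scale=0.3, size=ref.shape)

rmsd_same = kabsch_rmsd(ref, moved)
matrix, mean_rmsd = pairwise_rmsd([ref, moved, noisy])
print(rmsd_same, mean_rmsd)
```

A rotated and translated copy superimposes back onto the original with essentially zero RMSD, which is the property that makes the table entries independent of how each crystal structure is placed in space.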
  • TABLE 3
    Average pairwise Cα RMSDs of tMD-derived and cMDens-derived catalytic Cas9 aggregates
    relative to the crystal structure (upper) and between the two structural ensembles (lower) [Å]*
    Relative to the crystal structure [PDB code: 5F9R]
                                      ALL        RuvC       Topo       CTD        HNH         REC1       REC2       REC3
    tMD-derived catalytic state       5.6 (0.2)  3.0 (0.2)  3.8 (0.6)  7.2 (0.7)  10.6 (0.3)  4.0 (0.4)  7.0 (0.4)  3.4 (0.3)
    cMDens-derived catalytic state    5.7 (0.2)  2.7 (0.3)  3.0 (0.5)  6.8 (0.8)  11.5 (0.6)  3.2 (0.5)  7.9 (0.7)  2.9 (0.3)
    Relative to the crystal structure [PDB code: 5F9R]§
    tMD-derived catalytic state       5.6 (0.2)  1.7 (0.1)  2.6 (0.5)  3.6 (0.5)  1.3 (0.2)   1.6 (0.1)  1.9 (0.1)  2.5 (0.1)
    cMDens-derived catalytic state    5.7 (0.2)  1.7 (0.2)  2.3 (0.3)  3.5 (0.4)  1.4 (0.3)   1.5 (0.1)  1.9 (0.2)  2.2 (0.2)
    Between the two different derived catalytic states
                                      ALL        RuvC       Topo       CTD        HNH         REC1       REC2       REC3
                                      2.6 (0.1)  1.8 (0.2)  1.8 (0.4)  3.6 (0.3)  2.5 (0.5)   2.0 (0.2)  2.0 (0.3)  2.4 (0.1)
    Between the two different derived catalytic states§
                                      2.6 (0.1)  1.2 (0.1)  1.0 (0.2)  3.1 (0.1)  1.2 (0.4)   1.2 (0.1)  1.1 (0.2)  1.9 (0.1)
    *tMD, targeted MD; cMDens, ensemble conventional MD. See Table 1 in main text. An aggregate of 50 most populated structures were extracted for calculations based on cluster analysis (Supplementary Text)
    The whole protein
    Residues 1047-1071 and 1016-1031 excluded. Due to the absence of 5′-end ntDNA4, this local binding groove exhibits remarkable opening and closing mobility.
    Best-fit to the Cα atoms of the whole reference protein prior to RMSD calculations
    §Best-fit to the Cα atoms of individual protein domains prior to RMSD calculations
  • TABLE 4
    Summary of the interacting pairs between Cas9 HNH domain and other components in the complex system from biased
    (tMD) and unbiased ensemble (cMDens) simulations and comparison with the starting pre-catalytic structure
    Cas9 domain | HNH domain* | Interaction pattern§ | Catalytic state [tMD] (occurrence %) | Pre-catalytic state [5F9R] | Catalytic state [cMDens] | Suggested substitution#
    REC3 Glu584 Lys775 Salt bridge/H-bond 19
    Asp585 27
    Asp585 Arg778 Salt bridge/H-bond 16
    Lys558 Glu779 Salt bridge/H-bond 17 Arg586Ala
    Arg586 48 Glu779Ala
    REC2 Asp261 Gln805 H-bond 7
    Lys263 16
    Lys234 Asp829 Salt bridge/H-bond 15
    Asn235 H-bond 13
    Glu223 Arg832 Salt bridge/H-bond 91 Glu223Ala
    Arg859 Salt bridge/H-bond 18 Arg859Ala
    Thr249 Asn831 H-bond 27
    Asn251 46
    Gly247 Ser834 H-bond 44
    Thr249 16
    Ser217 Asp835 H-bond 43 Asp835Ala
    Lys218 55
    Ser219 99
    BH Thr58 Lys848 H-bond 57
    Glu60 Lys848 Salt bridge/H-bond 51
    Gln844 H-bond 20
    REC1 Glu370 Lys862 Salt bridge/H-bond 61 Glu370Ala
    Glu396 68 Glu396Ala
    Glu370 Lys866 Salt bridge/H-bond 25 Lys866Ala
    Glu371 18
    tDNA DT23 Arg765 Salt bridge/H-bond 100 Arg765Ala
    DA24 16
    DT25 Asn767 H-bond 92 Asn767Ala
    DG13 Ser845 H-bond 93 Ser845Asp
    sgRNA RG2 Arg765 Salt bridge/H-bond 99 Arg765Ala
    RA9 Arg780 Salt bridge/H-bond 100 Arg780Ala
    Arg783 72 Arg783Ala
    RA8 Asn803 H-bond 94 Asn803Ala
    Gln807 37
    RA9 Tyr812 H-bond 97 Tyr812Ala
    RA12 Lys848 Salt bridge/H-bond 89
    RU13 81
    RG11 Arg895 Salt bridge/H-bond 99 Arg895Ala
    The residues whose alanine substitution was experimentally shown to enhance Cas9 specificity are highlighted (see FIG. 7c). The promising candidate residues for further testing, determined based on our study, are in red boldface.
    *Part of HNH domain flanking link regions (L1&L2) included into statistics
    §A salt bridge is defined by a distance of less than 4 Å between the nitrogen and oxygen atoms. A hydrogen bond (H-bond) is defined by a distance of less than 3.5 Å between the donor and acceptor atoms and a donor-hydrogen-acceptor angle that deviates by less than 35° from 180°.
    Post targeted MD (tMD)-derived interactions (G6 in Table 1).
    Presence (✓) or not (—) in the initial pre-catalytic crystal structure (PDB code: 5F9R)
    Presence (✓) or not (—) in the ensemble conventional MD (cMDens)-derived catalytic state
    #Suggested amino acid mutations for further specificity improvement
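The geometric criteria in the table footnote (salt bridge: N-O distance below 4 Å; H-bond: donor-acceptor distance below 3.5 Å and a donor-hydrogen-acceptor angle within 35° of linear) and the "occurrence %" column can be expressed as a short sketch. The function names and coordinates are hypothetical; a real analysis would loop over trajectory frames and atom selections with a trajectory library.

```python
import numpy as np

def is_salt_bridge(n_xyz, o_xyz, max_dist=4.0):
    """True if the nitrogen-oxygen distance is below max_dist (Å)."""
    gap = np.asarray(n_xyz, float) - np.asarray(o_xyz, float)
    return bool(np.linalg.norm(gap) < max_dist)

def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, max_dev_deg=35.0):
    """True if donor-acceptor distance < max_dist and the D-H-A angle is within
    max_dev_deg of 180 degrees."""
    donor, hydrogen, acceptor = (np.asarray(v, float) for v in (donor, hydrogen, acceptor))
    if np.linalg.norm(donor - acceptor) >= max_dist:
        return False
    v1 = donor - hydrogen       # H -> D
    v2 = acceptor - hydrogen    # H -> A
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(180.0 - angle < max_dev_deg)

def occurrence_percent(flags):
    """Percentage of frames in which the interaction is present."""
    return 100.0 * float(np.mean(np.asarray(flags, dtype=float)))

# Hypothetical geometries (Å).
linear_hb = is_hbond([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.9, 0.0, 0.0])   # 180 degrees, 2.9 Å
bent_hb = is_hbond([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.9, 0.0])     # 90 degrees at H
close_sb = is_salt_bridge([0.0, 0.0, 0.0], [3.9, 0.0, 0.0])
far_sb = is_salt_bridge([0.0, 0.0, 0.0], [4.1, 0.0, 0.0])
occ = occurrence_percent([True, False, True, True])
print(linear_hb, bent_hb, close_sb, far_sb, occ)
```

Applying the per-frame test across an ensemble and averaging gives a number comparable in meaning to the occurrence percentages listed in the table.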
  • TABLE 5
    Rational Design of spCas9 Variants with Potentially Improved Specificity
    Index Version Combination Substitution Rationale*
    1 HF-spCas9(v1.0) K526A/N588A/R765A/N767A R1 + R3 + R4
    2 HF-spCas9(v1.1) N588A/K929A/H930A/Y1013A R1 + R3 + R4
    3 HF-spCas9(v1.2) R447A/K526A/K929A R1 + R3 + R4
    4 HF-spCas9(v1.3) N588A/N767A/Y1013A/K866A R1 + R3 + R4 + R5
    5 HF-spCas9(v1.4) N588A/N767A/Y1013A/S845D R1 + R3 + R4 + R5
    6 HF-spCas9(v1.5) K268A/K526A/N588A/N767A R1 + R3 + R4
    7 HF-spCas9(v1.6) N14A/K526A/K866A/K1246A R1 + R2 + R5
    8 HF-spCas9(v1.7) N14A/R447A/Y1013A/K1246A R1 + R2 + R3
    9 HF-spCas9(v1.8) N588A/R765A/D835A/K1246A R1 + R2 + R3 + R4 + R5
    10 HF-spCas9(v1.9) N14A/R447A/R765A/S845D R1 + R2 + R3 + R4 + R5
    *R1: weakening binding affinity with tDNA
    R2: weakening binding affinity with ntDNA
    R3: weakening binding affinity with sgRNA
    R4: raising threshold energy for Cas9 HNH domain conformational activation
    R5: destabilizing the formation of Cas9 HNH domain active conformation
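The substitution notation used in Table 5 (for example N588A/R765A/D835A/K1246A: wild-type residue, position, replacement) is easy to handle programmatically when building variant sequences. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the toy sequence is a placeholder, not SEQ ID NO:1.

```python
import re

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_variant(spec):
    """Parse 'N588A/R765A' into [('N', 588, 'A'), ('R', 765, 'A')]."""
    parsed = []
    for token in spec.split("/"):
        match = _SUBSTITUTION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        wild_type, position, replacement = match.groups()
        parsed.append((wild_type, int(position), replacement))
    return parsed

def apply_substitutions(sequence, spec):
    """Return a copy of `sequence` with the substitutions applied.

    Positions are 1-based. Raises ValueError if the wild-type residue in the
    spec does not match the residue found at that position.
    """
    residues = list(sequence)
    for wild_type, position, replacement in parse_variant(spec):
        if position > len(residues) or residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)

# Variant v1.8 from Table 5, parsed only (no sequence needed).
v18 = parse_variant("N588A/R765A/D835A/K1246A")

# Toy four-residue sequence, purely illustrative.
toy_mutant = apply_substitutions("MKNR", "K2A/R4A")
print(v18, toy_mutant)
```

The wild-type check is a cheap safeguard against off-by-one numbering errors when a combination is applied to a real sequence.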

Claims (19)

1. A method for engineering a modified Cas9 protein, the method comprising modeling an active state of a Cas9 HNH domain by:
(a) aligning a scissile phosphate and flanking nucleotides of a T4 Endo VII system (2QNC) to a corresponding tDNA stretch in a Cas9 pre-catalytic state complex (5F9R);
(b) calculating a tDNA transformation matrix from paired ββα-metal (ββα-Me) motifs in the two nucleases to produce a model of the Cas9-HNH domain docked at a cleavage site;
(c) repeating (a) and (b), replacing the Cas9 pre-catalytic state complex crystal structure (5F9R) with snapshot structures from sets of long conventional molecular dynamics (cMD) trajectories to obtain an optimized Cas9 complex;
(d) replacing an α segment of the ββα-Me motif in the optimized Cas9 complex from (c) with the corresponding part in a Mg2+-bound apo-Cas9 structure (4CMP); and
(e) performing long cMD simulations to obtain the active state of the Cas9 HNH domain.
2. The method of claim 1, further comprising identifying amino acid residues in the active state of the Cas9 HNH domain model.
3. The method of claim 2, wherein the identified amino acid residues are involved in non-specific DNA cleavage or off-target binding.
4. The method of claim 2, further comprising modifying one or more of the identified amino acid residues to amino acids having a lower propensity for non-specific DNA cleavage or off-target binding.
5. The method of claim 4, wherein the amino acids used for modification have lower hydrophobicity, lower positive charge, or higher polarizability than the corresponding wild-type residues.
6. The method of claim 4, wherein the modified amino acid residues comprise at least three or four modifications to the amino acid sequence corresponding to SEQ ID NO:1, the modifications comprising one or more of N588A/R765A/N767A;
N588A/Q695A/R765A/N767A; N588A/N692A/R765A/N767A; N588A/N692A/R765A/R925A;
N588A/N692A/N767A/R925A; N692A/R765A/N767A/R925A; Q695A/R765A/N767A/R925A;
N588A/N692A/R765A/K929A; N588A/N692A/N767A/K929A;
N692A/R765A/N767A/K929A; Q695A/R765A/N767A/K929A;
N497A/Q695A/R765A/N767A; K526A/K528A/N497A/Q926A; K526A/K528A/K929A;
K526A/R765A/N767A/Y1013A; K528A/R765A/N767A/Y1013A;
K526A/R765A/N767A/Q926A; N497A/K526A/R765A/N767A;
N497A/K528A/R765A/N767A; N497A/K526A/R765A/Q926A;
N497A/K528A/R765A/Q926A; N588A/R765A/N767A/S845D; N588A/R765A/N767A/R832A;
N588A/R765A/N767A/K862A; N588A/R765A/N767A/K866A; N588A/R765A/N767A/R859A;
N588A/R765A/N767A/Q844A; N588A/R765A/N767A/K810A;
N588A/R765A/N767A/K848A; N588A/R765A/N767A/E370A; N588A/R765A/N767A/E223A;
N497A/N692A/K1031A/S845D; N497A/N692A/K1031A/R832A;
N497A/N692A/K1031A/K862A; N497A/N692A/K1031A/K866A;
N497A/N692A/K1031A/R859A; N497A/N692A/K1031A/Q844A;
N497A/N692A/K1031A/K810A; N497A/N692A/K1031A/K848A;
N497A/N692A/K1031A/E370A; N497A/N692A/K1031A/E223A;
N497A/N695A/K1031A/S845D; N497A/N695A/K1031A/R832A;
N497A/N695A/K1031A/K862A; N497A/N695A/K1031A/K866A;
N497A/N695A/K1031A/R859A; N497A/N695A/K1031A/Q844A;
N497A/N695A/K1031A/K810A; N497A/N695A/K1031A/K848A;
N497A/N695A/K1031A/E370A; N497A/N695A/K1031A/E223A;
K526A/N695A/K1031A/S845D; K526A/N695A/K1031A/R832A;
K526A/N695A/K1031A/K862A; K526A/N695A/K1031A/K866A;
K526A/N695A/K1031A/R859A; K526A/N695A/K1031A/Q844A;
K526A/N695A/K1031A/K810A; K526A/N695A/K1031A/K848A;
K526A/N695A/K1031A/E370A; K526A/N695A/K1031A/E223A;
K528A/N695A/K1031A/S845D; K528A/N695A/K1031A/R832A;
K528A/N695A/K1031A/K862A; K528A/N695A/K1031A/K866A;
K528A/N695A/K1031A/R859A; K528A/N695A/K1031A/Q844A;
K528A/N695A/K1031A/K810A; K528A/N695A/K1031A/K848A;
K528A/N695A/K1031A/E370A; K528A/N695A/K1031A/E223A; N692A/R765A/Y1013A;
N692A/R765A/S845D/Y1013A; N692A/R765A/R832A/Y1013A;
N692A/R765A/K862A/Y1013A; N692A/R765A/K866A/Y1013A;
N692A/R765A/R859A/Y1013A; N692A/R765A/Q844A/Y1013A;
N692A/R765A/K810A/Y1013A; N692A/R765A/K848A/Y1013A;
N692A/R765A/E370A/Y1013A; N692A/R765A/E223A/Y1013A; N692A/R765A/Y1013A;
N692A/Q695A/K810A/Y1013A; N692A/Q695A/K848A/Y1013A; K526A/K528A/Y1013A;
K526A/K528A/K268A/Y1013A; R447A/K526A/K528A/Y1013A; R765A/K929A/H930A;
R765A/K929A/S845D/Y1013A; R765A/K929A/R832A/Y1013A;
R765A/K929A/K862A/Y1013A; R765A/K929A/K866A/Y1013A;
R765A/K929A/R859A/Y1013A; R765A/K929A/Q844A/Y1013A;
R765A/K929A/K810A/Y1013A; R765A/K929A/K848A/Y1013A;
R765A/K929A/E370A/Y1013A; R765A/K929A/E223A/Y1013A;
R765A/Q926A/K929A/H930A; R447A/K500A/R661A; K500A/N695A/K929A/S845D;
K500A/N695A/K929A/R832A; K500A/N695A/K929A/K862A;
K500A/N695A/K929A/K866A; K500A/N695A/K929A/R859A;
K500A/N695A/K929A/Q844A; K500A/N695A/K929A/K810A;
K500A/N695A/K929A/K848A; K500A/N695A/K929A/E370A; K500A/N695A/K929A/E223A;
R765A/R925/Q926A; R765A/R925/Q926/Y1013A; N14A/K961A/K968A;
N14A/K961A/K968A/S845D; N14A/K961A/K968A/K848A; R447A/R765A/Y1013A;
K526A/N588A/R765A/N767A; N588A/K929A/H930A/Y1013A; R447A/K526A/K929A;
N588A/N767A/Y1013A/K866A; N588A/N767A/Y1013A/S845D;
K268A/K526A/N588A/N767A; N14A/K526A/K866A/K1246A;
N14A/R447A/Y1013A/K1246A; N588A/R765A/D835A/K1246A;
N14A/R447A/R765A/S845D; K1244A/K1246A/K848A; K1244A/K1246A/K810A;
K1244A/K1246A/R832A; K1244A/K1246A/K862A; K1244A/K1246A/K866A;
K1244A/K1246A/R859A; K1244A/K1246A/E370A; K1244A/K1246A/E223A;
K1244A/K1246A/S845D; K1244A/K1246A/Q844A; K1244A/K1246A/Q844A/K1031A;
K1244A/K1246A/Q844A/Y1013A; K1244A/K1246A/Q844A/N695A;
K1244A/K1246A/Q844A/N692A; K1244A/K1246A/Q844A/N588A;
K1244A/K1246A/Q844A/N767A; K1244A/K1246A/Q844A/Q926A;
K268A/R447A/Y450A/K1031A; K268A/R447A/Y450A/Y1013A;
K268A/R447A/Y450A/N695A; K268A/R447A/Y450A/N692A;
K268A/R447A/Y450A/N588A; K268A/R447A/Y450A/N767A;
K268A/R447A/Y450A/Q926A; N14A/K268A/R447A/Y450A; N14A/Y450A/K526A/K528A;
N14A/Y450A/R765A/S845D; N14A/Y450A/R765A/R832A; N14A/Y450A/R765A/K862A;
N14A/Y450A/R765A/K866A; N14A/Y450A/R765A/R859A; N14A/Y450A/R765A/Q844A;
N14A/Y450A/R765A/K810A; N14A/Y450A/R765A/K848A; N14A/Y450A/R765A/E370A;
N14A/Y450A/R765A/E223A; R447A/Y450A/R765A/S845D; R447A/Y450A/R765A/R832A;
R447A/Y450A/R765A/K862A; R447A/Y450A/R765A/K866A; R447A/Y450A/R765A/R859A;
R447A/Y450A/R765A/Q844A; R447A/Y450A/R765A/K810A; R447A/Y450A/R765A/K848A;
R447A/Y450A/R765A/E370A; R447A/Y450A/R765A/E223A; K268A/R447A/R765A/S845D;
K268A/R447A/R765A/R832A; K268A/R447A/R765A/K862A; K268A/R447A/R765A/K866A;
K268A/R447A/R765A/R859A; K268A/R447A/R765A/Q844A; K268A/R447A/R765A/K810A;
K268A/R447A/R765A/K848A; K268A/R447A/R765A/E370A; K268A/R447A/R765A/E223A;
Q805A/D829A/N831A/D835A; R765A/D829A/D835A/Y1013A;
R918A/D829A/D835A/Y1013A; R895A/D829A/D835A/Y1013A;
K500A/D829A/D835A/Y1013A; K929A/D829A/D835A/Y1013A;
R780A/D829A/D835A/Y1013A; R783A/D829A/D835A/Y1013A;
R765A/D829A/D835A/N695A; R918A/D829A/D835A/N695A;
R895A/D829A/D835A/N695A; K500A/D829A/D835A/N695A;
K929A/D829A/D835A/N695A; R780A/D829A/D835A/N695A;
R783A/D829A/D835A/N695A; N695A/R780A/R783A/S845D; N695A/R780A/R783A/R832A;
N695A/R780A/R783A/K862A; N695A/R780A/R783A/K866A; N695A/R780A/R783A/R859A;
N695A/R780A/R783A/Q844A; N695A/R780A/R783A/K810A; N695A/R780A/R783A/K848A;
N695A/R780A/R783A/E370A; N695A/R780A/R783A/E223A; N692A/R780A/R783A/S845D;
N692A/R780A/R783A/R832A; N692A/R780A/R783A/K862A; N692A/R780A/R783A/K866A;
N692A/R780A/R783A/R859A; N692A/R780A/R783A/Q844A; N692A/R780A/R783A/K810A;
N692A/R780A/R783A/K848A; N692A/R780A/R783A/E370A; N692A/R780A/R783A/E223A;
N692A/R780A/N803A/S845D; N692A/R780A/N803A/R832A; N692A/R780A/N803A/K862A;
N692A/R780A/N803A/K866A; N692A/R780A/N803A/R859A; N692A/R780A/N803A/Q844A;
N692A/R780A/N803A/K810A; N692A/R780A/N803A/K848A; N692A/R780A/N803A/E370A;
N692A/R780A/N803A/E223A; N692A/R783A/N803A/S845D; N692A/R783A/N803A/R832A;
N692A/R783A/N803A/K862A; N692A/R783A/N803A/K866A; N692A/R783A/N803A/R859A;
N692A/R783A/N803A/Q844A; N692A/R783A/N803A/K810A;
N692A/R783A/N803A/K848A; N692A/R783A/N803A/E370A; N692A/R783A/N803A/E223A;
N695A/R783A/N803A/S845D; N695A/R783A/N803A/R832A; N695A/R783A/N803A/K862A;
N695A/R783A/N803A/K866A; N695A/R783A/N803A/R859A; N695A/R783A/N803A/Q844A;
N695A/R783A/N803A/K810A; N695A/R783A/N803A/K848A; N695A/R783A/N803A/E370A;
N695A/R783A/N803A/E223A; N695A/R783A/Y812A/S845D; N695A/R783A/Y812A/R832A;
N695A/R783A/Y812A/K862A; N695A/R783A/Y812A/K866A; N695A/R783A/Y812A/R859A;
N695A/R783A/Y812A/Q844A; N695A/R783A/Y812A/K810A;
N695A/R783A/Y812A/K848A; N695A/R783A/Y812A/E370A; N695A/R783A/Y812A/E223A;
K500A/N588A/S845D/Y1013A; K500A/N588A/R832A/Y1013A;
K500A/N588A/K862A/Y1013A; K500A/N588A/K866A/Y1013A;
K500A/N588A/R859A/Y1013A; K500A/N588A/Q844A/Y1013A;
K500A/N588A/K810A/Y1013A; K500A/N588A/K848A/Y1013A;
K500A/N588A/E370A/Y1013A; K500A/N588A/E223A/Y1013A;
K500A/N588A/S845D/Y1013A; N588A/N692A/K1244A/K1246A; R447A/R765A/N497A;
R447A/R765A/K929A; R447A/R765A/N767A; R447A/R765A/N767A/K558A;
R447A/R765A/N767A/R586A; R447A/R765A/N767A/K1244A;
R447A/R765A/N767A/K1246A; R447A/R765A/N767A; R447A/N695A/R765A/N767A;
R447A/R765A/N695A/K558A; R447A/R765A/N695A/R586A;
R447A/R765A/N695A/K1244A; R447A/R765A/N695A/K1246A;
R447A/R765A/N767A/K1246A; R447A/N695A/R765A/N767A;
R447A/R765A/N695A/K558A; R447A/R765A/N695A/R586A;
R447A/R765A/N695A/K1244A; R447A/R765A/N695A/K1246A;
R447A/N692A/R765A/N767A; R447A/R765A/N692/K558A; R447A/R765A/N692/R586A;
R447A/R765A/N692/K1244A; R447A/R765A/N692/K1246A; or N14A/R447A/R765A/S845A.
7. The method of claim 6, wherein the modifications comprise:
K526A/N588A/R765A/N767A; N588A/K929A/H930A/Y1013A; R447A/K526A/K929A;
N588A/N767A/Y1013A/K866A; N588A/N767A/Y1013A/S845D;
K268A/K526A/N588A/N767A; N14A/K526A/K866A/K1246A;
N14A/R447A/Y1013A/K1246A; N588A/R765A/D835A/K1246A; or
N14A/R447A/R765A/S845D.
8. The method of claim 6, wherein the modifications comprise:
N588A/R765A/D835A/K1246A or N14A/R447A/R765A/S845D.
9. The method of claim 6, further comprising one or more modifications including modification of Asn14, Lys268, Glu370, Arg447, Tyr450, Asn497, Lys500, Lys526, Lys528, Lys558, Asn588, Arg661, Asn692, Gln695, Arg780, Arg783, Asn803, Gln805, Lys810, Tyr812, Asp829, Asn831, Arg832, Asp835, Gln844, Lys848, Lys862, Arg925, Gln926, Lys929, His930, Lys961, Lys968, Tyr1013, Lys1031, Lys1244, or Lys1246 corresponding to SEQ ID NO:1.
10. The method of claim 9, wherein the modifications comprise substitution to an alanine, glycine, lysine, arginine, aspartic acid, or glutamic acid.
11. The method of claim 6, wherein the modifications comprise at least four amino acid modifications.
12. The method of claim 1, further comprising constructing 3D structure models of the engineered modified Cas9 proteins.
13. The method of claim 1, further comprising evaluating the specificity of the engineered modified Cas9 proteins.
14. The method of claim 13, wherein the specificity of the engineered modified Cas9 proteins is evaluated by comparing them with the specificity of a wild-type Cas9 protein or a Cas9 protein having other modifications.
15. The method of claim 13, wherein the engineered modified Cas9 proteins have improved specificity compared to the specificity of the wild-type Cas9 protein.
16. The method of claim 15, wherein the wild-type Cas9 protein is a Streptococcus pyogenes Cas9 protein.
17. The method of claim 13, wherein the specificity is evaluated using a cleavage assay or a human cell-based enhanced GFP disruption assay.
18. The method of claim 1, further comprising selecting one or more of the engineered modified Cas9 proteins for use in genome editing.
19. A method for engineering a modified Cas9 protein, the method comprising:
modeling an active state of a Cas9 HNH domain by:
(a) aligning a scissile phosphate and flanking nucleotides of a T4 Endo VII system (2QNC) to a corresponding tDNA stretch in a Cas9 pre-catalytic state complex (5F9R);
(b) calculating a tDNA transformation matrix from paired ββα-metal (ββα-Me) motifs in the two nucleases to produce a model of the Cas9-HNH domain docked at a cleavage site;
(c) repeating (a) and (b), replacing the Cas9 pre-catalytic state complex crystal structure (5F9R) with snapshot structures from sets of long conventional molecular dynamics (cMD) trajectories to obtain an optimized Cas9 complex;
(d) replacing an α segment of the ββα-Me motif in the optimized Cas9 complex from (c) with the corresponding part in a Mg2+-bound apo-Cas9 structure (4CMP); and
(e) performing long cMD simulations to obtain the active state of the Cas9 HNH domain;
identifying amino acid residues in the active state of the Cas9 HNH domain model involved in non-specific DNA cleavage or off-target binding;
engineering the modified Cas9 protein by modifying one or more of the identified amino acid residues to amino acids having lower propensity for non-specific DNA cleavage or off-target binding.
US18/201,537 2017-09-08 2023-05-24 Engineered cas9 variants Pending US20230357737A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/201,537 US20230357737A1 (en) 2017-09-08 2023-05-24 Engineered cas9 variants

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762555873P 2017-09-08 2017-09-08
PCT/US2018/050279 WO2019051419A1 (en) 2017-09-08 2018-09-10 MODIFIED CASE VARIANTS9
US202016645254A 2020-03-06 2020-03-06
US18/201,537 US20230357737A1 (en) 2017-09-08 2023-05-24 Engineered cas9 variants

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
PCT/US2018/050279 Continuation WO2019051419A1 (en) 2017-09-08 2018-09-10 MODIFIED CASE VARIANTS9
US16/645,254 Continuation US11713452B2 (en) 2017-09-08 2018-09-10 Engineered CAS9 variants

Publications (1)

Publication Number Publication Date
US20230357737A1 true US20230357737A1 (en) 2023-11-09

Family

ID=65634528

Family Applications (2)

Application Number Title Priority Date Filing Date
US16/645,254 Active 2039-12-28 US11713452B2 (en) 2017-09-08 2018-09-10 Engineered CAS9 variants
US18/201,537 Pending US20230357737A1 (en) 2017-09-08 2023-05-24 Engineered cas9 variants

Country Status (2)

Country Link
US (2) US11713452B2 (en)
WO (1) WO2019051419A1 (en)

Also Published As

Publication number Publication date
US11713452B2 (en) 2023-08-01
US20200208129A1 (en) 2020-07-02
WO2019051419A1 (en) 2019-03-14

Legal Events

Date Code Title Description
AS Assignment

Owner name: UNIVERSITY OF NORTH TEXAS HEALTH SCIENCE CENTER, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LIU, JIN;ZUO, ZHICHENG;REEL/FRAME:063751/0883

Effective date: 20171103

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION