WO2025021702A1 - Cas9 orthologue nuclease and uses thereof - Google Patents

Cas9 orthologue nuclease and uses thereof

Info

Publication number
WO2025021702A1
WO2025021702A1 PCT/EP2024/070604 EP2024070604W WO2025021702A1 WO 2025021702 A1 WO2025021702 A1 WO 2025021702A1 EP 2024070604 W EP2024070604 W EP 2024070604W WO 2025021702 A1 WO2025021702 A1 WO 2025021702A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
cluster
identity
crispr
polynucleotide
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2024/070604
Other languages
English (en)
Inventor
Antonio CASINI
Anna CERESETO
Nicola SEGATA
Matteo CICIANI
Laura PEZZÉ
Veronica PINAMONTI
Inês MACIEIRA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alia Therapeutics Srl
Original Assignee
Alia Therapeutics Srl
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alia Therapeutics Srl
Publication of WO2025021702A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C12N15/625 DNA sequences coding for fusion proteins containing a sequence coding for a signal sequence
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/09 Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Definitions

  • BACKGROUND: The ability to make precise and deliberate changes at targeted genomic locations and/or modify (e.g., cleave or edit) specific endogenous chromosomal sequences, thus altering the genomic complement of living cells, has been a long-standing goal in biomedical research and development.
  • CRISPR Clustered Regularly Interspaced Short Palindromic Repeat
  • Cas CRISPR-associated proteins
  • Cas9 functions as an RNA-guided endonuclease that uses a dual-guide RNA consisting of crRNA and trans-activating crRNA (tracrRNA) for target recognition and cleavage by a mechanism involving two nuclease active sites that together generate double-stranded DNA breaks (DSBs) (Barrangou and May, Expert Opin Biol Ther 15, 311-314 (2015); Jinek et al., Science 337, 816-821 (2012)).
  • DSBs double-stranded DNA breaks
  • the Cas9 system derived from Streptococcus pyogenes (Barrangou and Doudna, Nat Biotechnol 34(9), 933-941 (2016)), which leaves blunt ends and effects gene editing via the recognition of a Protospacer Adjacent Motif (PAM) sequence of “NGG” on the target polynucleotide.
  • PAM Protospacer Adjacent Motif
  • PAMs are short nucleotide sequences recognized by the CRISPR-Cas9 complex where this complex directs editing of the target sequence.
  • the precise PAM sequence and PAM length requirements for different nucleic acid-guided nucleases vary; however, PAMs typically are 2-8 base-pair sequences adjacent or in proximity to the target sequence and, depending on the nuclease, can be 5’ or 3’ to the target sequence.
  • this technology is still limited by major constraints, mainly the small number of CRISPR-Cas systems active in mammalian cells, which can hardly meet the complexity of gene therapy applications.
  • Cas endonucleases or Cas polypeptides and their respective Protospacer Adjacent Motif (PAM) requirements, site-directed modifying polypeptides, and ribonucleoproteins comprising the polypeptides.
  • the site-directed modifying polypeptides are useful in a variety of methods for target nucleic acid modification. In certain embodiments, the site-directed modifying polypeptides are modified for passive entry into target cells.
  • the identification of the PAM sequences of Cas9 nucleases is important for their exploitation as genome editing tools.
  • polypeptides comprising: a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas orthologue nuclease having an amino acid sequence with at least 80% identity to any one of SEQ ID NO: 76, SEQ ID NO: 362, SEQ ID NO: 444, SEQ ID NO: 721, SEQ ID NO: 768, SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387.
  • polypeptides comprising: a CRISPR-Cas orthologue nuclease having an amino acid sequence with at least 80% identity to any nuclease capable of complexing with a guide RNA as shown in Table 34.
  • nuclease comprises an amino acid sequence having at least 80% identity with any one of SEQ ID NO.: 1, SEQ ID NO.: 28, SEQ ID NO.: 55, SEQ ID NO.: 106, SEQ ID NO.: 129, SEQ ID NO.: 163, SEQ ID NO.: 217, SEQ ID NO.: 242, SEQ ID NO.: 254, SEQ ID NO.: 270, SEQ ID NO.: 297, SEQ ID NO.: 325, SEQ ID NO.: 341, SEQ ID NO.: 373, SEQ ID NO.: 391, SEQ ID NO.: 409, SEQ ID NO.: 426, SEQ ID NO.: 462, SEQ ID NO.: 481, SEQ ID NO.: 503, SEQ ID NO.: 523, SEQ ID NO.: 552, SEQ ID NO.: 593, SEQ ID NO.: 646, SEQ ID NO.: 686, SEQ ID NO.: 735, SEQ ID NO.: 833
  • the polypeptide is a fusion protein.
  • the polypeptide comprises a nuclear localization signal peptide, a tag peptide, or both fused to the CRISPR-Cas orthologue nuclease.
  • the polypeptide comprises two or more nuclear localization signals fused to the CRISPR-Cas orthologue nuclease.
  • the nuclear localization signal comprises an N-terminal nuclear localization signal, a C-terminal nuclear localization signal, or both.
  • the nuclear localization signal comprises an amino acid sequence having at least 90% identity with any one of SEQ ID NOs: 1800-1822.
  • the tag peptide comprises an amino acid sequence having at least 90% identity with SEQ ID NO: 1823 or SEQ ID NO: 1824.
  • the CRISPR-Cas orthologue nuclease has at least 90% identity with any one of SEQ ID NO: 76, SEQ ID NO: 362, SEQ ID NO: 444, SEQ ID NO: 721, SEQ ID NO: 768, SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387.
  • the CRISPR-Cas orthologue nuclease has at least 95% identity with any one of SEQ ID NO: 76, SEQ ID NO: 362, SEQ ID NO: 444, SEQ ID NO: 721, SEQ ID NO: 768, SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387.
  • the CRISPR-Cas orthologue nuclease is any one of SEQ ID NO: 76, SEQ ID NO: 362, SEQ ID NO: 444, SEQ ID NO: 721, SEQ ID NO: 768, SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387.
  • the CRISPR-Cas orthologue nuclease is selected from any one of SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387.
  • the CRISPR-Cas orthologue nuclease is SEQ ID NO: 76 or SEQ ID NO: 1387.
  • the CRISPR-Cas orthologue nuclease is SEQ ID NO: 362 or SEQ ID NO: 1384.
  • the CRISPR-Cas orthologue nuclease is SEQ ID NO: 444 or SEQ ID NO: 1385.
  • the CRISPR-Cas orthologue nuclease is SEQ ID NO: 721 or SEQ ID NO: 1383.
  • the CRISPR-Cas orthologue nuclease is SEQ ID NO: 768 or SEQ ID NO: 1386.
  • the CRISPR-Cas orthologue nuclease is capable of complexing with a guide RNA.
  • the guide RNA comprises: a crRNA comprising a spacer and a crRNA scaffold wherein the spacer is positioned 5’ to the crRNA scaffold; and a tracrRNA.
  • the guide RNA is a single guide RNA (sgRNA) comprising a spacer and a sgRNA scaffold, wherein the spacer is positioned 5’ to the sgRNA scaffold.
  • the guide RNA comprises a spacer.
  • the spacer comprises a nucleic acid sequence that is partially or fully complementary to a target mammalian genomic sequence.
  • the target mammalian genomic sequence is upstream of a Protospacer Adjacent Motif (PAM) sequence in the non-target strand recognized by the polypeptide.
  • the PAM polynucleotide comprises a nucleic acid sequence as set forth in Table 32.
  • the guide RNA comprises separate crRNA and tracrRNA molecules.
  • the crRNA comprises, in 5’ to 3’ order, the spacer and a crRNA scaffold.
  • the guide RNA comprises, in 5’ to 3’ order, the crRNA molecule comprising at least 20 consecutive nucleotides of SEQ ID NO: 1362, and the tracrRNA molecule comprising at least 50 consecutive nucleotides of SEQ ID NO: 1367; the crRNA molecule comprising at least 20 consecutive nucleotides of SEQ ID NO: 1363, and the tracrRNA molecule comprising at least 50 consecutive nucleotides of SEQ ID NO: 1368; the crRNA molecule comprising at least 20 consecutive nucleotides of SEQ ID NO: 1364, and the tracrRNA molecule comprising at least 50 consecutive nucleotides of SEQ ID NO: 1369; the crRNA molecule comprising at least 20 consecutive nucleotides of
  • the guide RNA comprises, in 5’ to 3’ order, the crRNA molecule comprising at least 30 consecutive nucleotides of SEQ ID NO: 1362, and the tracrRNA molecule comprising at least 80 consecutive nucleotides of SEQ ID NO: 1367; the crRNA molecule comprising at least 30 consecutive nucleotides of SEQ ID NO: 1363, and the tracrRNA molecule comprising at least 80 consecutive nucleotides of SEQ ID NO: 1368; the crRNA molecule comprising at least 30 consecutive nucleotides of SEQ ID NO: 1364, and the tracrRNA molecule comprising at least 80 consecutive nucleotides of SEQ ID NO: 1369; the crRNA molecule comprising at least 30 consecutive nucleotides of SEQ ID NO: 1365, and the tracrRNA molecule comprising at least 80 consecutive nucleotides of SEQ ID NO: 1370; or the crRNA molecule comprising at least 30 consecutive nucleotides of SEQ ID NO: 1370; or
  • the crRNA molecule has at least 90% identity with SEQ ID NO: 1362, and the tracrRNA molecule has at least 90% identity with SEQ ID NO: 1367; the crRNA molecule has at least 90% identity with SEQ ID NO: 1363, and the tracrRNA molecule has at least 90% identity with SEQ ID NO: 1368; the crRNA molecule has at least 90% identity with SEQ ID NO: 1364, and the tracrRNA molecule has at least 90% identity with SEQ ID NO: 1369; the crRNA molecule has at least 90% identity with SEQ ID NO: 1365, and the tracrRNA molecule has at least 90% identity with SEQ ID NO: 1370; or the crRNA molecule has at least 90% identity with SEQ ID NO: 1366, and the tracrRNA molecule has at least 90% identity with SEQ ID NO: 1371.
  • the crRNA molecule has at least 95% identity with SEQ ID NO: 1362, and the tracrRNA molecule has at least 95% identity with SEQ ID NO: 1367; the crRNA molecule has at least 95% identity with SEQ ID NO: 1363, and the tracrRNA molecule has at least 95% identity with SEQ ID NO: 1368; the crRNA molecule has at least 95% identity with SEQ ID NO: 1364, and the tracrRNA molecule has at least 95% identity with SEQ ID NO: 1369; the crRNA molecule has at least 95% identity with SEQ ID NO: 1365, and the tracrRNA molecule has at least 95% identity with SEQ ID NO: 1370; or the crRNA molecule has at least 95% identity with SEQ ID NO: 1366, and the tracrRNA molecule has at least 95% identity with SEQ ID NO: 1371.
  • the guide RNA comprises a crRNA molecule comprising at least 20, at least 30, at least 40, at least 50, or at least 60 consecutive nucleotides of any one of SEQ ID NOs: 1388-1418 as shown in Table 34, and the crRNA is capable of complexing with a tracrRNA molecule comprising at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, or at least 80 consecutive nucleotides of any one of SEQ ID NOs: 1419-1449 as shown in Table 34.
  • the guide RNA comprises a crRNA molecule comprising a polynucleotide sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 98% identity with any one of SEQ ID NOs: 1388-1418 as shown in Table 34, and the crRNA is capable of complexing with a tracrRNA molecule comprising a polynucleotide sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 98% identity with any one of SEQ ID NOs: 1419-1449 as shown in Table 34.
  • the guide RNA is a sgRNA comprising at least 80%, at least 85%, at least 90%, at least 95%, or at least 98% identity with any one of SEQ ID NOs: 1450-1483 and SEQ ID NO: 1825.
  • the guide RNA is a sgRNA.
  • the sgRNA comprises, in 5’ to 3’ order, a spacer and a sgRNA scaffold.
  • the sgRNA comprises at least one modified nucleotide or modified internucleoside linkage.
  • the sgRNA comprises at least one modified nucleotide selected from 2'-fluoro, 2'-amino or 2'-O-methyl modification on the ribose of pyrimidines, abasic residues, and an inverted base at the 3' end of the RNA.
  • the sgRNA comprises at least one modified internucleoside linkage selected from phosphorothioates, phosphotriesters, methyl phosphonates, short chain alkyl or cycloalkyl inter-sugar linkages, and short chain heteroatomic or heterocyclic inter-sugar linkages.
  • the sgRNA comprises at least 50 consecutive nucleotides of any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377.
  • the sgRNA comprises at least 80 consecutive nucleotides of any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377.
  • the sgRNA has at least 90% identity with any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377.
  • the sgRNA has at least 95% identity with any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377.
  • the guide RNA is a sgRNA comprising, in 5’ to 3’ order, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, or at least 80 consecutive nucleotides of any one of SEQ ID NOs: 1450-1483 and SEQ ID NO: 1825 as shown in Table 34.
  • systems comprising: a polypeptide comprising the CRISPR-Cas9 orthologue nuclease in accordance with any one of the embodiments disclosed herein, and a guide RNA in accordance with any one of the embodiments disclosed herein, wherein the CRISPR-Cas9 orthologue nuclease is capable of complexing with the guide RNA.
  • the CRISPR-Cas9 orthologue nuclease has at least 90% identity with SEQ ID NO: 76 or SEQ ID NO: 1387, and wherein the guide RNA comprises a crRNA polynucleotide comprising, in 5’ to 3’ order, at least 20 consecutive nucleotides of SEQ ID NO: 1366, and a tracrRNA polynucleotide comprising, in 5’ to 3’ order, at least 50 consecutive nucleotides of SEQ ID NO: 1371.
  • the crRNA polynucleotide has at least 90% identity with SEQ ID NO: 1366, and the tracrRNA polynucleotide has at least 90% identity with SEQ ID NO: 1371.
  • the CRISPR-Cas9 orthologue nuclease has at least 90% identity with SEQ ID NO: 76 or SEQ ID NO: 1387, and wherein the guide RNA is a sgRNA comprising a polynucleotide having at least 50 consecutive nucleotides of SEQ ID NO: 1377.
  • the sgRNA comprises a polynucleotide having at least 90% identity with SEQ ID NO: 1377.
  • the CRISPR-Cas9 orthologue nuclease has at least 90% identity with SEQ ID NO: 362 or SEQ ID NO: 1384, and wherein the guide RNA comprises a crRNA polynucleotide comprising, in 5’ to 3’ order, at least 20 consecutive nucleotides of SEQ ID NO: 1363, and a tracrRNA polynucleotide comprising, in 5’ to 3’ order, at least 50 consecutive nucleotides of SEQ ID NO: 1368.
  • the crRNA polynucleotide has at least 90% identity with SEQ ID NO: 1363; and the tracrRNA polynucleotide has at least 90% identity to SEQ ID NO: 1368.
  • the CRISPR-Cas9 orthologue nuclease has at least 90% identity with SEQ ID NO: 362 or SEQ ID NO: 1384, and wherein the guide RNA is a sgRNA comprising a polynucleotide having at least 50 consecutive nucleotides of SEQ ID NO: 1373.
  • the sgRNA comprises a polynucleotide having at least 90% identity to SEQ ID NO: 1373.
  • the CRISPR-Cas9 orthologue nuclease has at least 90% identity with SEQ ID NO: 444 or SEQ ID NO: 1385, and wherein the guide RNA comprises a crRNA polynucleotide comprising, in 5’ to 3’ order, at least 20 consecutive nucleotides of SEQ ID NO: 1364, and a tracrRNA polynucleotide comprising, in 5’ to 3’ order, at least 50 consecutive nucleotides of SEQ ID NO: 1369.
  • the crRNA polynucleotide has at least 90% identity with SEQ ID NO: 1364; and the tracrRNA polynucleotide has at least 90% identity with SEQ ID NO: 1369.
  • the CRISPR-Cas9 orthologue nuclease has at least 90% identity with SEQ ID NO: 444 or SEQ ID NO: 1385, and wherein the guide RNA is a sgRNA comprising a polynucleotide having at least 50 consecutive nucleotides of SEQ ID NO: 1374.
  • the sgRNA comprises a polynucleotide having at least 90% identity to SEQ ID NO: 1374.
  • the CRISPR-Cas9 orthologue nuclease has at least 90% identity with SEQ ID NO: 721 or SEQ ID NO: 1383, and wherein the guide RNA comprises a crRNA polynucleotide comprising, in 5’ to 3’ order, at least 20 consecutive nucleotides of SEQ ID NO: 1362, and a tracrRNA polynucleotide comprising, in 5’ to 3’ order, at least 50 consecutive nucleotides of SEQ ID NO: 1367.
  • the crRNA polynucleotide has at least 90% identity with SEQ ID NO: 1362, and the tracrRNA polynucleotide has at least 90% identity with SEQ ID NO: 1367.
  • the CRISPR-Cas9 orthologue nuclease has at least 90% identity with SEQ ID NO: 721 or SEQ ID NO: 1383, and wherein the guide RNA is a sgRNA comprising a polynucleotide having at least 50 consecutive nucleotides of SEQ ID NO: 1372.
  • the sgRNA comprises a polynucleotide having at least 90% identity with SEQ ID NO: 1372.
  • the CRISPR-Cas9 orthologue nuclease has at least 90% identity with SEQ ID NO: 768 or SEQ ID NO: 1386, and wherein the guide RNA comprises a crRNA polynucleotide comprising, in 5’ to 3’ order, at least 20 consecutive nucleotides of SEQ ID NO: 1365, and a tracrRNA polynucleotide comprising, in 5’ to 3’ order, at least 50 consecutive nucleotides of SEQ ID NO: 1370.
  • the crRNA polynucleotide has at least 90% identity with SEQ ID NO: 1365, and the tracrRNA polynucleotide has at least 90% identity with SEQ ID NO: 1370.
  • the CRISPR-Cas9 orthologue nuclease has at least 90% identity with SEQ ID NO: 768 or SEQ ID NO: 1386, and wherein the guide RNA is a sgRNA comprising a polynucleotide having at least 50 consecutive nucleotides of SEQ ID NO: 1375 or SEQ ID NO: 1376.
  • the sgRNA comprises a polynucleotide having at least 90% identity with SEQ ID NO: 1375 or SEQ ID NO: 1376.
  • the polynucleotide has at least 80% identity with any one of SEQ ID NO: 1378, SEQ ID NO: 1379, SEQ ID NO: 1380, SEQ ID NO: 1381, and SEQ ID NO: 1382.
  • nucleic acids comprising a polynucleotide encoding the guide RNA in accordance with any of the embodiments, wherein the guide RNA comprises: a polynucleotide encoding the crRNA comprising at least 90% identity with SEQ ID NO: 1362, and a polynucleotide encoding the tracrRNA comprising at least 90% identity with SEQ ID NO: 1367; a polynucleotide encoding the crRNA comprising at least 90% identity with SEQ ID NO: 1363, and a polynucleotide encoding the tracrRNA comprising at least 90% identity with SEQ ID NO: 1368; a polynucleotide encoding the crRNA comprising at least 90% identity with SEQ ID NO: 1364, and a polynucleotide encoding the tracrRNA comprising at least 90% identity with SEQ ID NO: 1369; a polynucleotide encoding the crRNA comprising at least 90% identity with SEQ ID
  • nucleic acids comprising a polynucleotide encoding the guide RNA in accordance with any of the embodiments, wherein the guide RNA is a sgRNA comprising a polynucleotide having at least 90% identity with any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377.
  • the nucleic acid comprising the polynucleotide encoding the polypeptide in accordance with any of the embodiments, or the nucleic acids encoding the crRNA in accordance with any of the embodiments; or the nucleic acids encoding the tracrRNA in accordance with any of the embodiments; or the nucleic acids encoding the sgRNA in accordance with any of the embodiments, wherein the nucleic acid is obtained by in vitro transcription.
  • the nucleic acid comprising the polynucleotide encoding the polypeptide in accordance with any of the embodiments, or the nucleic acids encoding the crRNA in accordance with any of the embodiments; or the nucleic acids encoding the tracrRNA in accordance with any of the embodiments; or the nucleic acids encoding the sgRNA in accordance with any of the embodiments, wherein the nucleic acid is operably linked to a promoter.
  • the nucleic acid comprising the polynucleotide encoding the polypeptide in accordance with any of the embodiments, or the nucleic acids encoding the crRNA in accordance with any of the embodiments; or the nucleic acids encoding the tracrRNA in accordance with any of the embodiments; or the nucleic acids encoding the sgRNA in accordance with any of the embodiments, wherein the nucleic acid is synthetic.
  • the nucleic acid comprising the polynucleotide encoding the polypeptide in accordance with any of the embodiments, or the nucleic acids encoding the crRNA in accordance with any of the embodiments; or the nucleic acids encoding the tracrRNA in accordance with any of the embodiments; or the nucleic acids encoding the sgRNA in accordance with any of the embodiments, wherein the nucleic acid encoding the polypeptide is human codon optimized, and the polynucleotide has a sequence of SEQ ID NO: 1378, SEQ ID NO: 1379, SEQ ID NO: 1380, SEQ ID NO: 1381, or SEQ ID NO: 1382.
  • the human codon optimized nucleic acid encodes a Cas orthologue polypeptide having a sequence of SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, or SEQ ID NO: 1387 as shown in Table 33.
  • compositions comprising: the polypeptide in accordance with any of the embodiments, or the polypeptide in the system in accordance with any of the embodiments; and/or the guide RNA in accordance with any of the embodiments, wherein the guide RNA comprises separate crRNA and tracrRNA, or wherein the guide RNA comprises a sgRNA; or the system in accordance with any of the embodiments; or the nucleic acid in accordance with any of the embodiments; or the expression vector in accordance with any of the embodiments; and a pharmaceutically acceptable excipient.
  • Described are methods of site-specifically cleaving a DNA target in a cell comprising contacting the cell with: the polypeptide in accordance with any of the embodiments or the polypeptide in the system in accordance with any of the embodiments, and the guide RNA in the system in accordance with any of the embodiments; or the system in accordance with any of the embodiments; or the nucleic acid in accordance with any of the embodiments; the expression vector in accordance with any of the embodiments; and/or the pharmaceutical composition in accordance with any of the embodiments.
  • FIG.1 shows a schematic of the pipeline used to generate PAM predictions.
  • FIG.2 illustrates an exemplary PAM prediction method.
  • FIGs.3A-3E illustrate exemplary predicted PAM logos or motifs as described in Tables 1A (FIG.3A), Table 1B (FIG.3B), Table 1C (FIG.3C), Table 1D (FIG.3D), and Table 1E (FIG.3E).
  • FIGs.4A-4C illustrate exemplary predicted PAM logos or motifs as described in Tables 2A (FIG.4A), Table 2B (FIG.4B), and Table 2C (FIG.4C).
  • FIGs.5A-5B illustrate exemplary predicted PAM logos or motifs as described in Table 3A (FIG.5A), and Table 3B (FIG.5B).
  • FIGs.6A-6D illustrate exemplary predicted PAM logos or motifs as described in Tables 4A (FIG.6A), Table 4B (FIG.6B), Table 4C (FIG.6C), and Table 4D (FIG.6D).
  • FIGs.7A-7C illustrate exemplary predicted PAM logos or motifs as described in Tables 5A (FIG.7A), Table 5B (FIG.7B), and Table 5C (FIG.7C).
  • FIGs.8A-8B illustrate exemplary predicted PAM logos or motifs as described in Tables 6A (FIG.8A), and Table 6B (FIG.8B).
  • FIGs.9A-9D illustrate exemplary predicted PAM logos or motifs as described in Tables 7A (FIG.9A), Table 7B (FIG.9B), Table 7C (FIG.9C), and Table 7D (FIG.9D).
  • FIG.10 illustrates exemplary predicted PAM logos or motifs as described in Table 8.
  • FIGs.11A-11B illustrate exemplary predicted PAM logos or motifs as described in Tables 9A (FIG.11A), and Table 9B (FIG.11B).
  • FIGs.12A-12D illustrate exemplary predicted PAM logos or motifs as described in Tables 10A (FIG.12A), Table 10B (FIG.12B), Table 10C (FIG.12C), and Table 10D (FIG. 12D).
  • FIG.13 illustrates exemplary predicted PAM logos or motifs as described in Table 11.
  • FIGs.14A-14B illustrate exemplary predicted PAM logos or motifs as described in Tables 12A (FIG.14A), and Table 12B (FIG.14B).
  • FIG.15 illustrates exemplary predicted PAM logos or motifs as described in Table 13.
  • FIG.16 illustrates exemplary predicted PAM logos or motifs as described in Table 14.
  • FIG.17 illustrates exemplary predicted PAM logos or motifs as described in Table 15.
  • FIGs.18A-18B illustrate exemplary predicted PAM logos or motifs as described in Tables 16A (FIG.18A), and Table 16B (FIG.18B).
  • FIG.19 illustrates exemplary predicted PAM logos or motifs as described in Table 17.
  • FIG.20 illustrates exemplary predicted PAM logos or motifs as described in Table 18.
  • FIG.21 illustrates exemplary predicted PAM logos or motifs as described in Table 19.
  • FIGs.22A-22B illustrate exemplary predicted PAM logos or motifs as described in Tables 20A (FIG.22A), and Table 20B (FIG.22B).
  • FIGs.23A-23B illustrate exemplary predicted PAM logos or motifs as described in Tables 21A (FIG.23A), and Table 21B (FIG.23B).
  • FIGs.24A-24B illustrate exemplary predicted PAM logos or motifs as described in Tables 22.
  • FIGs.25A-25C show predicted structures of sgRNAs designed for FRDB, DNDB, and GMIE CRISPR-Cas polypeptides. The capitalized four-letter identifier assigned to each CRISPR-Cas polypeptide is arbitrary and unique. Shown are schematic representations of the hairpin structures designed from the crRNAs and tracrRNAs identified for the FRDB CRISPR-Cas polypeptide (FIG.25A), the DNDB CRISPR-Cas polypeptide (FIG.25B), and the GMIE CRISPR-Cas polypeptide (FIG.25C).
  • FIGs.25A-25C disclose sgRNA scaffolds, corresponding to SEQ ID NOS 1372-1374, respectively, in order of appearance.
  • the sgRNAs are capable of complexing with FRDB, DNDB, and GMIE CRISPR-Cas polypeptides, respectively.
  • FIGs.26A-26B show predicted structures of sgRNAs designed for ARHP and DUUZ CRISPR-Cas polypeptides.
  • Shown are schematic representations of the hairpin structures designed from the crRNAs and tracrRNAs identified for the ARHP CRISPR-Cas polypeptide (FIG.26A) and the DUUZ CRISPR-Cas polypeptide (FIG.26B). These sgRNAs have been trimmed by the insertion of a GAAA tetraloop at the level of the repeat:anti-repeat loop to reduce their size.
  • FIGs.26A-26B disclose sgRNA scaffolds, corresponding to SEQ ID NOS 1375 and 1377, respectively, in order of appearance. The sgRNAs are capable of complexing with ARHP and DUUZ, respectively.
  • FIGs.27A-27F illustrate experimentally determined PAM specificities of the FRDB, DNDB and GMIE CRISPR-Cas polypeptides.
  • FIGs.27A, 27C and 27E show PAM sequence logos for FRDB CRISPR-Cas polypeptide (FIG.27A), DNDB CRISPR-Cas polypeptide (FIG. 27C), and GMIE CRISPR-Cas polypeptide (FIG.27E) as determined using an in vitro PAM discovery assay.
  • FIGs.27B, 27D and 27F show PAM enrichment heatmaps calculated from the same in vitro PAM discovery assay showing the nucleotide preferences at different positions along the PAM for FRDB CRISPR-Cas polypeptide (FIG.27B, positions 5,6 and 7,8), DNDB CRISPR-Cas polypeptide (FIG.27D, positions 3,4 and 5,6), GMIE CRISPR-Cas polypeptide (FIG.27F, positions 5,6 and 7,8).
  • FIGs.28A-28D illustrate experimentally determined PAM specificities for ARHP CRISPR-Cas polypeptide protein and DUUZ CRISPR-Cas polypeptide protein.
  • FIGs.28A and 28C show PAM sequence logos for ARHP CRISPR-Cas polypeptide (FIG.28A) and DUUZ CRISPR-Cas polypeptide (FIG.28C) as determined using the in vitro PAM discovery assay.
  • FIGs.28B and 28D show PAM enrichment heatmaps calculated from the same in vitro PAM discovery assay showing the nucleotide preferences at different positions along the PAM for ARHP CRISPR-Cas polypeptide (FIG.28B, positions 2,3 and 4,5) and DUUZ CRISPR-Cas polypeptide (FIG.28D, positions 4,5 and 6,7).
  • FIG.29 shows activity of CRISPR-Cas polypeptide proteins against an EGFP reporter in mammalian cells. Data presented as mean ± SEM of n ≥ 3 biologically independent experiments.
6. DETAILED DESCRIPTION
  • [0117] The present disclosure addresses the need for systems and methods for editing (e.g., cleaving, deleting, inserting, substituting) polynucleotides (e.g., DNA) using novel Cas9 orthologue nucleases.
6.1 Definitions
  • [0118] It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment.
  • a CRISPR-Cas protein cluster includes a combination of two CRISPR-Cas protein clusters, a combination of three CRISPR-Cas protein clusters, and the like.
  • an “or” conjunction is intended to be used in its correct sense as a Boolean logical operator, encompassing both the selection of features in the alternative (A or B, where the selection of A is mutually exclusive from B) and the selection of features in conjunction (A or B, where both A and B are selected).
  • the term “and/or” is used for the same purpose, which shall not be construed to imply that “or” is used with reference to mutually exclusive alternatives.
  • “Polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxynucleotides. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
  • the polynucleotide is DNA.
  • the polynucleotide is RNA.
  • polynucleotide and “nucleic acid” should be understood to include, as applicable to the embodiment being described, single-stranded (such as sense or antisense) and double-stranded polynucleotides.
  • Cas9 orthologue nuclease “Cas9 orthologous nuclease,” “Cas9 nuclease,” “Cas9 orthologue protein,” “Cas9 polypeptide,” “CRISPR-Cas9 polypeptide,” “CRISPR-Cas9 orthologue nuclease,” and “CRISPR-Cas enzymes” are used interchangeably herein, and they refer to a protein, polypeptide, or peptide that is capable of interacting with a nucleic acid, such as a guide nucleic acid, to form a complex (as used herein, “capable of complexing”; i.e.
  • the Cas9 orthologue nuclease pertains to wild-type Cas9 orthologue nuclease or Cas9 orthologue nuclease variants.
  • the Cas9 orthologue nuclease can have nuclease activity or be catalytically inactive (e.g., as in a dCas).
  • RNA comprises a sequence of nucleotides that enables it to non-covalently bind, i.e., form Watson-Crick base pairs and/or G/U base pairs, “anneal,” or “hybridize,” to another nucleic acid in a sequence-specific, antiparallel manner (i.e., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength.
  • standard Watson-Crick base-pairing includes: adenine (A) pairing with thymidine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C) [DNA, RNA].
  • G/U base-pairing is partially responsible for the degeneracy (i.e., redundancy) of the genetic code in the context of tRNA anti-codon base-pairing with codons in mRNA.
  • a guanine (G) of a protein-binding segment (dsRNA duplex) of a subject DNA-targeting RNA molecule is considered complementary to a uracil (U), and vice versa.
  • “Hybridize” or “complementary” refer to a first nucleotide sequence capable of non-covalently binding (hydrogen bonding) with at least a portion of a specified second nucleotide sequence.
  • the sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable or hybridizable.
  • a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure).
  • a polynucleotide can comprise at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it is targeted.
  • an antisense nucleic acid in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and would therefore specifically hybridize, would represent 90 percent complementarity.
  • the remaining noncomplementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides.
  • Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined routinely using BLAST programs (basic local alignment search tools) and PowerBLAST programs known in the art (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656) or by using the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison Wis.), using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489).
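The percent complementarity arithmetic described above (18 of 20 paired nucleotides giving 90 percent) can be sketched as follows. This is a minimal illustration, not the BLAST or Gap procedure cited in the text: it assumes two ungapped sequences of equal length, both written 5'→3', and the function name `percent_complementarity` and the example sequences are hypothetical.

```python
# Pairing rules taken from the text: A-T, A-U, G-C, plus optional G/U wobble.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def percent_complementarity(oligo, target_region, allow_wobble=False):
    """Percent of oligo positions that pair with the antiparallel target region.

    Both sequences are written 5'->3' and must have equal length (assumption of
    this sketch); the target is read in reverse so that position i of the oligo
    faces its antiparallel partner.
    """
    oligo, target_region = oligo.upper(), target_region.upper()
    if not oligo or len(oligo) != len(target_region):
        raise ValueError("sequences must be non-empty and of equal length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    paired = sum((a, b) in allowed for a, b in zip(oligo, reversed(target_region)))
    return 100.0 * paired / len(oligo)


# Illustrative 20-nt target and an antisense oligo carrying two mismatches.
target = "ATGCATGCATGCATGCATGC"
antisense = "AAATGCATGCATGCATGCAT"  # perfect complement would start with "GC"
print(percent_complementarity(antisense, target))  # 90.0
```

Setting `allow_wobble=True` additionally counts G/U pairs as complementary, which matches the treatment of G/U base pairs in RNA duplexes described earlier in this section.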
  • Binding refers to a non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid). While in a state of non-covalent interaction, the macromolecules are said to be “associated” or “interacting” or “binding” (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner).
  • Binding interactions are generally characterized by a dissociation constant (Kd) of less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, less than 10⁻⁹ M, less than 10⁻¹⁰ M, less than 10⁻¹¹ M, less than 10⁻¹² M, less than 10⁻¹³ M, less than 10⁻¹⁴ M, or less than 10⁻¹⁵ M.
  • Affinity refers to the strength of binding, increased binding affinity being correlated with a lower Kd.
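The inverse relationship between Kd and affinity can be illustrated with the standard equilibrium expression for simple 1:1 binding, fraction bound = [L] / ([L] + Kd). The sketch below is an assumption-laden illustration rather than anything taken from the disclosure: it assumes a single binding site and a known free ligand concentration, and the function name `fraction_bound` is hypothetical.

```python
def fraction_bound(free_ligand_molar, kd_molar):
    """Fraction of binding sites occupied at equilibrium for simple 1:1 binding."""
    if free_ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_ligand_molar / (free_ligand_molar + kd_molar)


# At a fixed 1 nM free ligand, a lower Kd (higher affinity) gives higher occupancy.
for kd in (1e-6, 1e-9, 1e-12):
    print(f"Kd = {kd:.0e} M -> fraction bound = {fraction_bound(1e-9, kd):.3f}")
```

When the free ligand concentration equals Kd, half of the sites are occupied, which is why Kd is a convenient single-number summary of binding strength.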
  • By “binding domain” it is meant a protein domain that is able to bind non-covalently to another molecule.
  • a binding domain can bind to, for example, a DNA molecule (a DNA-binding protein), an RNA molecule (an RNA-binding protein) and/or a protein molecule (a protein-binding protein).
  • in the case of a protein domain-binding protein, it can bind to itself (to form homodimers, homotrimers, etc.) and/or it can bind to one or more molecules of a different protein or proteins.
  • “CRISPR-Cas protein cluster” refers to a set of CRISPR-Cas protein sequences grouped (or clustered) according to sequence identity level (e.g., a user-defined sequence identity level). Algorithms for clustering protein sequences are known in the art and include, for example, UCLUST, e.g., version 11.0.667 (Edgar, 2010, Bioinformatics 26:2460–2461), and CD-HIT (Li & Godzik, 2006, Bioinformatics 22(13):1658-1659).
  • Some clustering algorithms, such as UCLUST, define a cluster by a sequence known as the centroid or representative sequence, and each sequence in the cluster has a percent identity to the centroid sequence above a user-defined percent identity threshold.
  • the user-defined percent identity threshold can be set at 90%.
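The centroid idea can be sketched with a toy greedy procedure. This is not the UCLUST or CD-HIT implementation: the identity measure below is a deliberately naive position-by-position comparison without alignment, the longest-first ordering is an assumption, and all names and example peptides are hypothetical.

```python
def percent_identity(a, b):
    """Toy identity: matching positions over the longer length (no alignment)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / longest


def greedy_centroid_clusters(sequences, threshold=90.0):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    Returns a dict mapping each centroid to its list of member sequences.
    """
    clusters = {}
    # Longest first, so that longer sequences become centroids (assumed ordering).
    for seq in sorted(sequences, key=len, reverse=True):
        for centroid in clusters:
            if percent_identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters


# Illustrative peptides: the first two differ at one of ten positions (90%).
example = ["MKKLAGIDLG", "MKKLAGIDLA", "QQQQQQQQQQ"]
for centroid, members in greedy_centroid_clusters(example, threshold=90.0).items():
    print(centroid, members)
```

With a 90% threshold the first two peptides share one cluster and the third forms its own; a real tool would compute identity from an alignment rather than from raw positions.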
  • a CRISPR-Cas locus refers to a prokaryotic genomic region having (i) a sequence encoding CRISPR-Cas proteins and (ii) a CRISPR array having spacer sequences, which are short sequences that originate from and correspond to viral DNA sequences called protospacers.
  • CRISPR arrays are composed of repeat sequences flanking spacer sequences.
  • a CRISPR array can have at least one spacer, each flanked by two repeats.
  • CRISPR arrays are typically located in the vicinity of cas genes but are sometimes located distant to cas genes (Butiuc-Keul et al., 2022, Microb Physiol 32:2-17; Shmakov et al., 2020, CRISPR J. 3(6):535-549).
  • Exemplary tools for identifying and annotating CRISPR-Cas loci and CRISPR arrays include CRISPRCasTyper (Russel et al., 2020, CRISPR J. 3:462–469) and CRISPRDetect (Biswas et al., 2016, BMC Genomics 17:356).
  • a CRISPR array “corresponds to” a CRISPR-Cas protein when the sequence encoding the CRISPR-Cas protein and the CRISPR array are from the same genome and are found in close proximity.
  • PAM Protospacer Adjacent Motif
  • the PAM may refer to a DNA sequence downstream (e.g., in the case of Type II Cas proteins such as Cas9) of a target sequence recognized by a CRISPR-Cas protein.
  • the PAM may refer to a DNA sequence upstream (e.g., in the case of Type V Cas proteins such as Cas12a) of the target sequence.
  • the sequence and length of a PAM herein can differ depending on the Cas protein or Cas protein complex used.
  • the PAM sequence can be of any length but is typically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides long.
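For a Type II (Cas9-like) system, where the PAM lies downstream of the target sequence, candidate sites can be located with a simple scan. The sketch below is illustrative only: it searches the forward strand, uses a 20-nt protospacer length as an assumed default, and uses "NGG" (the widely reported SpCas9 PAM) purely as an example, not as the PAM of any polypeptide disclosed here. The function name `find_targets` is hypothetical.

```python
# IUPAC nucleotide codes, so degenerate PAMs such as "NGG" can be matched.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_targets(dna, pam, protospacer_len=20):
    """Return (start, protospacer, pam_found) for forward-strand sites whose
    protospacer is immediately followed (3') by a sequence matching the PAM."""
    dna, pam = dna.upper(), pam.upper()
    hits = []
    for start in range(len(dna) - protospacer_len - len(pam) + 1):
        pam_start = start + protospacer_len
        site_pam = dna[pam_start:pam_start + len(pam)]
        if all(base in IUPAC[code] for base, code in zip(site_pam, pam)):
            hits.append((start, dna[start:pam_start], site_pam))
    return hits


# Illustrative sequence: 20 A's followed by TGG, which matches NGG.
print(find_targets("A" * 20 + "TGG" + "CCC", "NGG"))
```

For a system with an upstream PAM (for example, Type V), the same scan would instead test the bases immediately 5' of each candidate protospacer.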
  • “Putative protospacer” refers to a nucleotide sequence in a viral genome that aligns with a spacer sequence with no or only a small number of mismatches or gaps.
  • a nucleotide sequence in a viral genome is identified as a putative protospacer if the spacer aligns with the nucleotide sequence in the viral genome with no more than four, three, two, one, or zero nucleotide mismatches or gaps.
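The mismatch criterion above can be sketched as a sliding-window comparison. This is a simplified illustration under stated assumptions: it counts substitutions only (gaps are not modeled), scans one strand, and uses hypothetical names and toy sequences; real analyses would typically use an alignment tool.

```python
def find_putative_protospacers(spacer, viral_genome, max_mismatches=4):
    """Return (start, mismatches) for every window of the viral genome that
    matches the spacer with at most max_mismatches substitutions (no gaps)."""
    spacer, viral_genome = spacer.upper(), viral_genome.upper()
    hits = []
    for start in range(len(viral_genome) - len(spacer) + 1):
        window = viral_genome[start:start + len(spacer)]
        mismatches = sum(a != b for a, b in zip(spacer, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Toy example: the genome contains the spacer with a single substitution.
spacer = "ACGTACGTAC"
genome = "TTTT" + "ACGTACGAAC" + "GGGG"
print(find_putative_protospacers(spacer, genome, max_mismatches=1))  # [(4, 1)]
```

Tightening `max_mismatches` from four toward zero corresponds to the progressively stricter definitions of a putative protospacer listed above.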
  • the terms “guide nucleic acid” or “guide RNA” or “guide RNA molecule” or “single guide RNA” or “sgRNA” or “guide polynucleotide” or “gRNA” are used interchangeably and refer to a polynucleotide comprising 1) a guide sequence capable of hybridizing to a genomic target locus, and 2) a scaffold sequence capable of interacting or complexing with a nucleic acid-guided nuclease.
  • gRNAs of the disclosure are in some embodiments single guide RNAs (sgRNAs), which typically comprise a crRNA sequence fused to a tracrRNA sequence.
  • the “guide sequence” or “spacer” or “spacer sequence” refers to the polynucleotide on a gRNA that is capable of hybridizing to a genomic target locus (e.g., a target polynucleotide, a protospacer sequence).
  • the guide RNA is a sgRNA.
  • the guide RNA comprises separate crRNA and tracrRNA.
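The fusion of a crRNA to a tracrRNA into a single guide RNA can be sketched as simple string assembly. Everything specific in the sketch is an assumption for illustration: the 5'-spacer/repeat/linker/tracrRNA order follows the common Cas9 sgRNA layout, the GAAA linker mirrors the tetraloop mentioned in the figure descriptions, and the short sequences are placeholders, not the scaffolds of any SEQ ID NO in this disclosure.

```python
def build_sgrna(spacer, crrna_repeat, tracrrna, linker="GAAA"):
    """Join spacer + crRNA repeat + linker + tracrRNA (5'->3') into one RNA string.

    DNA-style input (T) is converted to RNA (U); any other character is rejected.
    """
    parts = [spacer, crrna_repeat, linker, tracrrna]
    rna = "".join(part.upper().replace("T", "U") for part in parts)
    unexpected = set(rna) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return rna


# Placeholder sequences, chosen only to show the assembly order.
print(build_sgrna(spacer="ACGT", crrna_repeat="GTTT", tracrrna="AAAC"))
```

Keeping the spacer as a separate argument reflects the definition above: the guide sequence changes with each target, while the scaffold that complexes with the nuclease stays fixed.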
  • trans-activating CRISPR (cr) RNA or “trans-activating crRNA (tracrRNA)” are used interchangeably and refer to a distinct RNA species that interacts with the CRISPR RNA (crRNA) to form the dual guide RNA (gRNA) in type II and subtype V-B CRISPR-Cas systems.
  • the tracrRNA-crRNA interaction is essential for pre-crRNA processing as well as target recognition and cleavage.
  • target genomic DNA sequence or “target DNA” or “target sequence”, or “genomic target locus” or “target site” are used interchangeably and refer to any locus in vitro or in vivo, or in a nucleic acid (e.g., genome) of a cell or population of cells, in which a change of at least one nucleotide is desired using a nucleic acid-guided nuclease editing system.
  • the target sequence can be a genomic locus or extrachromosomal locus.
  • a target sequence may be referred to as “protospacer DNA” or “protospacer sequence” or “protospacer polynucleotides” present at any locus in a genomic polynucleotide (e.g., genomic locus) of a cell or population of cells.
  • By “site-directed modifying polypeptide” or “RNA-binding site-directed polypeptide” or “RNA-binding site-directed modifying polypeptide” or “site-directed polypeptide” it is meant a polypeptide that binds RNA and is targeted to a specific DNA sequence.
  • a site-directed modifying polypeptide as described herein is targeted to a specific DNA sequence by the RNA molecule to which it is bound.
  • RNA molecule comprises a sequence that is complementary to a target sequence within the target DNA, thus targeting the bound polypeptide to a specific location within the target DNA (the target sequence).
  • By “cleavage” or “cleaving” or “cleave” it is meant the breakage of the covalent backbone of a DNA molecule. Cleavage can be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded cleavage and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events.
  • DNA cleavage can result in the production of either blunt ends or staggered ends.
  • a complex comprising a DNA-targeting RNA and a site-directed modifying polypeptide is used for targeted double-stranded DNA cleavage.
  • “Nuclease” and “endonuclease” are used interchangeably herein to mean an enzyme which possesses catalytic activity for DNA cleavage.
  • By “cleavage domain” or “active domain” or “nuclease domain” of a nuclease it is meant the polypeptide sequence or domain within the nuclease which possesses the catalytic activity for DNA cleavage.
  • a cleavage domain can be contained in a single polypeptide chain or cleavage activity can result from the association of two (or more) polypeptides.
  • a single nuclease domain may consist of more than one isolated stretch of amino acids within a given polypeptide.
  • “Recombinant,” as used herein, means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction, and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems.
  • DNA sequences encoding the structural coding sequence can be assembled from cDNA fragments and short oligonucleotide linkers, or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid which is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system.
  • Such sequences can be provided in the form of an open reading frame uninterrupted by internal non-translated sequences, or introns, which are typically present in eukaryotic genes.
  • Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit.
  • sequences of non-translated DNA may be present 5′ or 3′ from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions and may indeed act to modulate production of a desired product by various mechanisms (see “DNA regulatory sequences”, below).
  • the term “recombinant” polynucleotide or “recombinant” nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention. This artificial combination is often accomplished by either chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques.
  • a “recombinant” polypeptide refers to a polypeptide which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of amino acid sequence through human intervention.
  • a polypeptide that comprises a heterologous amino acid sequence is recombinant.
  • DNA regulatory sequences refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate expression of a coding sequence and/or production of an encoded polypeptide in a host cell.
  • transformation is used interchangeably herein with “genetic modification” and refers to a permanent or transient genetic change induced in a cell following introduction of new nucleic acid (i.e., DNA exogenous to the cell).
  • Genetic change can be accomplished either by incorporation of the new DNA into the genome of the host cell, or by transient or stable maintenance of the new DNA as an episomal element.
  • a permanent genetic change is generally achieved by introduction of the DNA into the genome of the cell.
  • permanent changes can be introduced into the chromosome or via extrachromosomal elements such as plasmids and expression vectors, which may contain one or more selectable markers to aid in their maintenance in the recombinant host cell.
  • “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner.
  • a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression.
  • heterologous promoter and “heterologous control regions” refer to promoters and other control regions that are not normally associated with a particular nucleic acid in nature.
  • a “transcriptional control region heterologous to a coding region” is a transcriptional control region that is not normally associated with the coding region in nature.
  • a “promoter” or “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a polynucleotide or polypeptide coding sequence such as messenger RNA, ribosomal RNA, small nuclear or nucleolar RNA, guide RNA, or any kind of RNA transcribed by any class of any RNA polymerase I, II or III. Promoters may be constitutive or inducible and, in some embodiments—particularly many embodiments in which selection is employed—the transcription of at least one component of the nucleic acid-guided nuclease editing system is under the control of an inducible promoter.
  • Exemplary promoters include, but are not limited to, the SV40 early promoter, mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter, a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE), a Rous sarcoma virus (RSV) promoter, a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)), an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep. 1; 31(17)), a human H1 promoter (H1), and the like.
  • a “host cell,” as used herein, denotes an in vivo or in vitro eukaryotic cell, a prokaryotic cell, or a cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which eukaryotic or prokaryotic cells can be, or have been, used as recipients for a nucleic acid (e.g., an expression vector that comprises a nucleotide sequence encoding one or more biosynthetic pathway gene products such as mevalonate pathway gene products), and include the progeny of the original cell which has been genetically modified by the nucleic acid.
  • a “recombinant host cell” (also referred to as a “genetically modified host cell”) is a host cell into which has been introduced a heterologous nucleic acid, e.g., an expression vector.
  • a subject prokaryotic host cell is a genetically modified prokaryotic host cell (e.g., a bacterium), by virtue of introduction into a suitable prokaryotic host cell of a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to (not normally found in nature in) the prokaryotic host cell, or a recombinant nucleic acid that is not normally found in the prokaryotic host cell; and a subject eukaryotic host cell is a genetically modified eukaryotic host cell, by virtue of introduction into a suitable eukaryotic host cell of a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to the eukaryotic host cell, or a recombinant nucleic acid that is not normally found in the eukaryotic host cell.
  • a host cell refers to a mammalian cell, including but not limited to a human cell, mouse cell, rat cell, rabbit cell, dog cell, or non-human primate cell; bacterial cell, yeast cell, or plant cell.
  • the invention does not comprise a process for modifying the germ line genetic identity of human beings, and/or the use of human embryos for industrial or commercial purposes.
  • conservative amino acid substitution refers to the interchangeability in proteins of amino acid residues having similar side chains.
  • a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains consists of cysteine and methionine.
  • Exemplary conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine. Conservative substitutions, such as exchanging one amino acid with another having similar properties, may be preferable. Conservative deletions, insertions, and amino acid substitutions are not expected to produce radical changes in the characteristics of the protein, and the effect of any substitution, deletion, insertion, or combination thereof can be evaluated by routine screening assays.
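The side-chain groupings listed above lend themselves to a simple lookup. The sketch below encodes those groups with one-letter amino acid codes and tests whether a substitution stays within a group; the group labels and the function name `is_conservative` are hypothetical, and the check is an illustration of the definition, not a predictor of functional effect.

```python
# Side-chain groups as listed in the text, in one-letter amino acid codes.
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),          # glycine, alanine, valine, leucine, isoleucine
    "aliphatic-hydroxyl": set("ST"),    # serine, threonine
    "amide-containing": set("NQ"),      # asparagine, glutamine
    "aromatic": set("FYW"),             # phenylalanine, tyrosine, tryptophan
    "basic": set("KRH"),                # lysine, arginine, histidine
    "sulfur-containing": set("CM"),     # cysteine, methionine
}


def is_conservative(original, replacement):
    """True if two different residues belong to the same side-chain group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # not a substitution at all
    return any(original in group and replacement in group
               for group in SIDE_CHAIN_GROUPS.values())


print(is_conservative("K", "R"))  # True: both basic
print(is_conservative("K", "D"))  # False: aspartate is in none of the listed groups
```

Residues not named in any listed group (for example aspartate, glutamate, and proline) are never reported as conservative by this lookup, which reflects only the groups enumerated above.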
  • nucleic acids or polypeptide sequences refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same as measured using a sequence comparison algorithm. Alignment for purposes of determining percent sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, ALIGN, ALIGN-2 or Megalign (DNASTAR) software. Appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full-length of the sequences being compared can be determined by known methods.
  • HSPs high scoring sequence pairs
  • the BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment.
  • Another alignment algorithm is FASTA, available in the Genetics Computing Group (GCG) package, from Madison, Wis., USA, a wholly owned subsidiary of Oxford Molecular Group, Inc.
  • Other techniques for alignment are described in Methods in Enzymology, vol.266: Computer Methods for Macromolecular Sequence Analysis (1996), ed. Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, Calif., USA.
  • alignment programs that permit gaps in the sequence.
  • the Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments. See Meth. Mol. Biol. 70: 173-187 (1997).
  • the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences.
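A gapped global alignment in the style of Needleman and Wunsch, followed by a percent identity calculation, can be sketched as below. This is a teaching sketch and not the GAP, BLAST, or FASTA implementation: the scoring values (match +1, mismatch -1, linear gap -2) are arbitrary assumptions, and identity is reported over alignment columns, which is only one of several denominators used in practice.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns percent identity computed as
    identical columns divided by total alignment columns (gaps included)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, counting columns as we go.
    i, j, identical, columns = len(a), len(b), 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 100.0


print(global_identity("ACGT", "ACGT"))  # 100.0
print(global_identity("ACGT", "ACT"))   # 75.0 (aligned as ACGT / AC-T)
```

A value computed this way could then be compared against a threshold such as the "at least 90%" identity levels recited elsewhere in this disclosure, bearing in mind that different tools and settings can yield different percentages for the same pair.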
  • a “vector” or “expression vector” is any of a variety of nucleic acids that comprise a desired sequence or sequences to be delivered to and/or expressed in a cell.
  • Vectors are typically composed of DNA, although RNA vectors are also available.
  • Vectors include, but are not limited to, plasmids, fosmids, phagemids, virus genomes, synthetic chromosomes, and the like.
  • the “vector” comprises a coding sequence for a nuclease to be used in the nucleic acid-guided nuclease systems and methods of the present disclosure.
  • the vector may also comprise, in a bacterial system, the λ Red recombineering system or an equivalent thereto.
  • Vectors also typically comprise a selectable marker.
  • the “vector” may comprise a donor nucleic acid, optionally including an alteration to the target sequence that prevents nuclease binding at a PAM or spacer in the target sequence after editing has taken place, and a coding sequence for a gRNA.
  • the vector may also comprise a selectable marker and/or a barcode.
  • the vector may comprise control sequences operably linked to, e.g., the nuclease coding sequence, recombineering system coding sequences (if present), donor nucleic acid, guide nucleic acid, and selectable marker(s).
6.2 Compositions for Editing in Nucleic Acid-Guided Nuclease Genome Systems
  • Described herein are compositions for editing polynucleotides (e.g., DNA) in a cell or genome using nucleic acid-guided nuclease systems.
  • the nucleic acid-guided nuclease systems utilize an RNA-guided nuclease.
  • the composition comprises a nuclease system comprising the components: a polypeptide or an amino acid sequence having at least about 90% identity to any one of the amino acid sequences set forth as Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99),
  • the nucleic acid-guided nuclease systems utilize an RNA-guided nuclease.
  • the composition comprises a nuclease system comprising the components: a polypeptide or an amino acid sequence having at least about 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to any one of the amino acid sequences set forth as Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Table
  • the polypeptide or amino acid sequence of the composition is capable of recognizing or hybridizing to at least one protospacer adjacent motif (PAM) polynucleotide sequence as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22
  • the polypeptide or amino acid sequence of the composition and the corresponding PAM polynucleotide nucleic acid sequence are as shown in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99).
  • the composition further comprises a guide polynucleotide such as a guide RNA (gRNA).
  • nucleic acid-guided nuclease systems are CRISPR-Cas (CRISPR associated) systems.
  • the nuclease is a Cas9 orthologue endonuclease.
  • the nucleic acid-guided nuclease systems are capable of editing, such as cleaving a nucleic acid molecule (e.g., double stranded or single stranded), or inserting, deleting, or substituting one or more polynucleotides in a target polynucleotide sequence.
  • a “functional fragment” in relation to a nucleic acid sequence or amino acid sequence refers to a partial or truncated sequence relative to a full-length reference sequence, which is derived from and/or shares high sequence identity (e.g., at least 80%, 85%, 90%, 95%, 97%, or 99% sequence identity) with the full-length reference sequence and retains the same functionality or activity (e.g., a functional fragment of a Cas endonuclease sequence may encompass a polypeptide comprising a truncated form of that reference sequence and retaining endonuclease activity).
  • the skilled person will be able to readily determine the structure and activity of
  • the composition comprises a polypeptide comprising: a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas orthologue nuclease having an amino acid sequence with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity to any one of SEQ ID NO: 76, SEQ ID NO: 362, SEQ ID NO: 444, SEQ ID NO: 721, SEQ ID NO: 768, SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387.
  • the CRISPR-Cas orthologue nuclease has an amino acid sequence of any one of SEQ ID NO: 76, SEQ ID NO: 362, SEQ ID NO: 444, SEQ ID NO: 721, SEQ ID NO: 768, SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387. In some embodiments, the CRISPR-Cas orthologue nuclease has an amino acid sequence of any one of SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387.
  • the CRISPR-Cas orthologue nuclease is capable of complexing with a sgRNA comprising a polynucleotide sequence having at least 80% identity with any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377.
  • the CRISPR-Cas orthologue nuclease is capable of complexing with a crRNA comprising a polynucleotide sequence having at least 80% identity with any one of SEQ ID NO: 1362, SEQ ID NO: 1363, SEQ ID NO: 1364, SEQ ID NO: 1365, and SEQ ID NO: 1366.
  • the crRNA is capable of complexing with a tracrRNA comprising a polynucleotide sequence having at least 80% identity with any one of SEQ ID NO: 1367, SEQ ID NO: 1368, SEQ ID NO: 1369, SEQ ID NO: 1370, and SEQ ID NO: 1371.
  • the composition comprises a polypeptide comprising: a CRISPR-Cas orthologue nuclease having an amino acid sequence with at least 80% identity to any nuclease capable of complexing with a guide RNA as shown in Table 34.
  • nuclease comprises an amino acid sequence having at least 80% identity with any one of SEQ ID NO.: 1, SEQ ID NO.: 28, SEQ ID NO.: 55, SEQ ID NO.: 106, SEQ ID NO.: 129, SEQ ID NO.: 163, SEQ ID NO.: 217, SEQ ID NO.: 242, SEQ ID NO.: 254, SEQ ID NO.: 270, SEQ ID NO.: 297, SEQ ID NO.: 325, SEQ ID NO.: 341, SEQ ID NO.: 373, SEQ ID NO.: 391, SEQ ID NO.: 409, SEQ ID NO.: 426, SEQ ID NO.: 462, SEQ ID NO.: 481, SEQ ID NO.: 503, SEQ ID NO.: 523, SEQ ID NO.: 552, SEQ ID NO.: 593, SEQ ID NO.: 646, SEQ ID NO.: 686, SEQ ID NO.: 735, SEQ ID NO.: 833
  • the CRISPR-Cas orthologue nuclease is capable of complexing with a sgRNA having a polynucleotide sequence having at least 80% identity with any one of SEQ ID NOs: 1450-1483 and SEQ ID NO: 1825 as shown in Table 34.
  • the CRISPR-Cas orthologue nuclease is capable of complexing with a guide RNA comprising separate crRNA and tracrRNA.
  • the crRNA comprises a nucleic acid sequence having at least 80% identity with SEQ ID NOs: 1388-1418 as shown in Table 34
  • the crRNA is capable of complexing with a tracrRNA comprising a nucleic acid sequence having at least 80% identity with any one of SEQ ID NOs: 1419-1449 as shown in Table 34.
  • the CRISPR-Cas orthologue nuclease is a fusion protein, in which the CRISPR-Cas orthologue nuclease protein sequence is fused to one or more amino acid sequences, such as one or more nuclear localization signal peptides, one or more non-native tags, or both.
  • a nuclear localization sequence refers to an amino acid sequence that tags the polypeptide for importation into the cell nucleus by nuclear transport.
  • the nuclear localization signal comprises an amino acid sequence having at least 90% identity with any one of SEQ ID NOs: 1800-1822.
  • the tag comprises an amino acid sequence having at least 90% identity with SEQ ID NO: 1823 or SEQ ID NO: 1824.
  • the composition comprises a guide RNA capable of complexing with a polypeptide (e.g., a CRISPR-Cas nuclease), wherein the guide RNA comprises a crRNA capable of complexing with a tracrRNA as shown in Table 29 and Table 30, or Table 34; or a sgRNA as shown in Table 31 or Table 34.
  • the composition comprises a system, the system comprising: a polypeptide comprising a CRISPR-Cas9 orthologue nuclease capable of complexing with a guide RNA, such as a crRNA capable of complexing with a tracrRNA as shown in Table 29 and Table 30, or Table 34; or a sgRNA as shown in Table 31 or Table 34.
  • the composition comprises a nucleic acid comprising a polynucleotide encoding a polypeptide capable of complexing with a guide RNA, optionally, the polypeptide is a Cas orthologue polypeptide.
  • the polynucleotide encoding the Cas orthologue polypeptide has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1378, SEQ ID NO: 1379, SEQ ID NO: 1380, SEQ ID NO: 1381, and SEQ ID NO: 1382.
  • the polynucleotide encoding the Cas orthologue polypeptide is human codon optimized and the polypeptide has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387.
  • the composition comprises a nucleic acid comprising a polynucleotide encoding a guide RNA.
  • the nucleic acid comprises a polynucleotide encoding a sgRNA having a sequence at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377; or any one of SEQ ID NOs: 1388-1418 as shown in Table 34, and the crRNA is capable of complexing with a tracrRNA comprising a polynucleotide having a sequence at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NOs: 1419-1449 as shown in Table 34.
  • the nucleic acid comprising a polynucleotide encoding a CRISPR-Cas orthologue nuclease, a sgRNA, a crRNA, or a tracrRNA can be obtained by in vitro transcription.
  • the nucleic acid is operably linked to a promoter.
  • the nucleic acid is a natural molecule.
  • the nucleic acid comprises the polynucleotide encoding a CRISPR-Cas orthologue nuclease, a sgRNA, a crRNA, or a tracrRNA that is artificially synthesized.
  • the composition comprises one or more expression vectors comprising nucleic acids encoding a CRISPR-Cas orthologue nuclease, a guide RNA (e.g., a sgRNA, a crRNA, or a tracrRNA) as described herein.
  • the CRISPR-Cas orthologue nuclease and guide RNA are expressed in a single expression vector.
  • the CRISPR-Cas orthologue nuclease and guide RNA are expressed in separate vectors.
  • the composition comprises a cell comprising the CRISPR-Cas orthologue nuclease system and the guide RNA as disclosed herein.
  • the cell is a prokaryotic cell or eukaryotic cell.
  • the prokaryotic cell is a bacterial cell.
  • the eukaryotic cell is a mammalian cell.
  • the cell is a human cell.
  • the composition comprises a pharmaceutical composition comprising the CRISPR-Cas orthologue nuclease system and the guide RNA, and a pharmaceutically acceptable excipient as disclosed herein.
  • RNA-guided nucleases have rapidly become the foundational tools for genome engineering of prokaryotes and eukaryotes.
  • MGEs mobile genetic elements
  • RGNs are a major part of this defense system because they identify and destroy MGEs.
  • RGNs can be repurposed for genome editing in various organisms by reprogramming the CRISPR RNA (crRNA) that guides the RGN to a specific target DNA.
  • a number of different RGNs have been identified to date for various applications; however, there are various properties that make some RGNs more desirable than others for specific applications.
  • RGNs can be used for creating specific double strand breaks (DSBs) or specific nicks of one strand of DNA, or for guiding another moiety to a specific DNA sequence.
  • RGN RNA-guided nuclease
  • PAM protospacer adjacent motif
  • Type V RGNs such as MAD7, AsCas12a and LbCas12a tend to access DNA targets that contain YTTN/TTTN on the 5′ end whereas type II RGNs target DNA sequences containing a specific short motif on the 3′ end.
  • An example well known in the art for a type II RGN is SpCas9 which requires an NGG on the 3′ end of the target DNA.
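  • The PAM placement described above (a 3′ NGG for SpCas9, a 5′ TTTN for type V nucleases) can be sketched as a simple sequence scan. This is a minimal illustration only: the function names, the default 20-nt protospacer length, and the single-strand search are assumptions made for the example, not parameters taken from the disclosure.

```python
import re

# IUPAC codes needed for the PAMs named in the text (N = any base, Y = C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "Y": "[CT]", "R": "[AG]"}


def pam_regex(pam: str) -> str:
    """Translate an IUPAC PAM such as 'NGG' into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_targets(seq: str, pam: str, spacer_len: int = 20, pam_side: str = "3prime"):
    """List (protospacer_start, protospacer, pam) hits on the given strand.

    pam_side="3prime": PAM immediately follows the protospacer (type II style).
    pam_side="5prime": PAM immediately precedes the protospacer (type V style).
    Positions are 0-based; only the strand passed in is scanned.
    """
    seq = seq.upper()
    pam_re = pam_regex(pam)
    if pam_side == "3prime":
        pattern = f"(?=([ACGT]{{{spacer_len}}})({pam_re}))"
    elif pam_side == "5prime":
        pattern = f"(?=({pam_re})([ACGT]{{{spacer_len}}}))"
    else:
        raise ValueError("pam_side must be '3prime' or '5prime'")
    hits = []
    # The lookahead makes matches zero-width, so overlapping sites are all reported.
    for m in re.finditer(pattern, seq):
        if pam_side == "3prime":
            hits.append((m.start(), m.group(1), m.group(2)))
        else:
            hits.append((m.start() + len(pam), m.group(2), m.group(1)))
    return hits
```

  • To cover both strands, the same scan can be repeated on the reverse complement of the input sequence.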
  • Type II RGNs have substantially different domain architecture relative to type V RGNs.
  • type II RGNs also require a transactivating RNA (tracrRNA) in addition to a crRNA for optimal function.
  • the type II RGNs create a double-strand break closer to the PAM sequence, which is highly desirable for precise genome editing applications.
  • a number of type II RGNs have been discovered so far; however, their use in widespread applications is limited by restrictive PAMs. For example, the PAM of SpCas9 occurs less frequently in AT-rich regions of the genome. New RGNs with new and less restrictive PAMs are beneficial for the field. Further, not all type II nucleases are active in multiple organisms.
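  • The statement that an NGG PAM is rarer in AT-rich regions can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a deliberately simple model in which bases are independent and G and C are equally likely; real genomes deviate from this, and the function name `expected_pam_frequency` is illustrative.

```python
def expected_pam_frequency(pam: str, gc_fraction: float) -> float:
    """Probability that a random position matches the PAM on one strand.

    Assumed model: independent bases, P(G) = P(C) = gc_fraction / 2,
    P(A) = P(T) = (1 - gc_fraction) / 2. Supports A, C, G, T and N only.
    """
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    p_gc = gc_fraction / 2.0
    p_at = (1.0 - gc_fraction) / 2.0
    base_prob = {"A": p_at, "T": p_at, "G": p_gc, "C": p_gc, "N": 1.0}
    prob = 1.0
    for base in pam.upper():
        prob *= base_prob[base]
    return prob
```

  • Under this model an NGG PAM is expected at about 1 in 16 positions at 50% GC content but only about 1 in 44 positions at 30% GC content, which is the constraint the text describes.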
  • compositions, formulations and medicaments comprising a CRISPR-Cas orthologue nuclease, gRNA (e.g., sgRNA, crRNA and tracrRNA complex), nucleic acid or plurality of nucleic acids, system, particles, cells, or plurality of particles of the disclosure together with a pharmaceutically acceptable excipient.
  • Suitable excipients include, but are not limited to, salts, diluents (e.g., Tris-HCl, acetate, phosphate), preservatives (e.g., thimerosal, benzyl alcohol, parabens), binders, fillers, solubilizers, disintegrants, sorbents, solvents, pH modifying agents, antioxidants, anti-infective agents, suspending agents, wetting agents, viscosity modifiers, tonicity agents, stabilizing agents, and other components and combinations thereof.
  • Suitable pharmaceutically acceptable excipients can be selected from materials which are generally recognized as safe (GRAS), and may be administered to an individual without causing undesirable biological side effects or unwanted interactions.
  • compositions can be complexed with polyethylene glycol (PEG) or metal ions, or incorporated into polymeric compounds such as polylactic acid, polyglycolic acid, hydrogels, etc., or incorporated into liposomes, microemulsions, micelles, unilamellar or multilamellar vesicles, erythrocyte ghosts, or spheroplasts.
  • Suitable dosage forms for administration include solutions, suspensions, and emulsions.
  • compositions can be dissolved or suspended in a suitable solvent such as, for example, water, Ringer's solution, phosphate buffered saline (PBS), or isotonic sodium chloride.
  • the formulation may also be a sterile solution, suspension, or emulsion in a nontoxic, parenterally acceptable diluent or solvent such as 1,3-butanediol.
  • compositions can include one or more tonicity agents to adjust the isotonic range of the formulation. Suitable tonicity agents are well known in the art and include glycerin, mannitol, sorbitol, sodium chloride, and other electrolytes.
  • the formulations can be buffered with an effective amount of buffer necessary to maintain a pH suitable for parenteral administration.
  • Suitable buffers are well known by those skilled in the art and some examples of useful buffers are acetate, borate, carbonate, citrate, and phosphate buffers.
  • the compositions can be distributed or packaged in a liquid form, or alternatively, as a solid, obtained, for example by lyophilization of a suitable liquid formulation, which can be reconstituted with an appropriate carrier or diluent prior to administration.
  • the compositions can comprise a guide RNA and a CRISPR-Cas orthologue nuclease in a pharmaceutically effective amount sufficient to edit a gene in a cell.
  • nuclease system refers collectively to transcripts and other elements involved in the expression of or directing the activity of Cas genes, including sequences encoding a Cas9 orthologue gene, a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous nuclease system), a guide sequence (also referred to as a “spacer” in the context of an endogenous nuclease system), or other sequences and transcripts from a CRISPR locus.
  • one or more elements of a nuclease system is derived from a type I, type II, or type III CRISPR system.
  • one or more elements of a nuclease system is derived from a particular organism comprising an endogenous CRISPR system, such as Streptococcus pyogenes.
  • a nuclease system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous nuclease system).
  • target sequence or “protospacer sequence” refers to a sequence to which a guide sequence or “spacer sequence” is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. Full complementarity is not necessarily required, provided there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex.
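  • The notion of partial complementarity between a guide (spacer) sequence and its target can be sketched as follows. This is a minimal, ungapped illustration: it assumes both sequences are written 5′ to 3′, have equal length, and pair in antiparallel orientation by Watson-Crick rules only; the function name and the choice to report a simple fraction are assumptions for the example.

```python
# Watson-Crick partner on the DNA strand for each RNA base of the guide.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def complementarity(guide_rna: str, target_strand_dna: str) -> float:
    """Fraction of guide positions that base-pair with the DNA target strand.

    Both inputs are written 5'->3'. Because hybridization is antiparallel,
    the first guide base is compared with the last base of the target strand.
    """
    guide = guide_rna.upper()
    target = target_strand_dna.upper()
    if len(guide) != len(target):
        raise ValueError("guide and target must have equal length (ungapped comparison)")
    if not guide:
        raise ValueError("sequences must not be empty")
    paired = sum(1 for g, t in zip(guide, reversed(target))
                 if RNA_TO_DNA_PARTNER.get(g) == t)
    return paired / len(guide)
```

  • A value of 1.0 corresponds to full complementarity; how much lower a value still supports hybridization and complex formation is an experimental question that this sketch does not answer.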
  • a target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides.
  • a target sequence is located in the nucleus or cytoplasm of a cell.
  • the target sequence may be within an organelle of a eukaryotic cell, for example, mitochondrion or chloroplast.
  • a sequence or template that may be used for recombination into the targeted locus comprising the target sequences is referred to as an “editing template” or “editing polynucleotide” or “editing sequence”.
  • an exogenous template polynucleotide may be referred to as an editing template.
  • the recombination is homologous recombination.
  • formation of a CRISPR complex (comprising a guide sequence hybridized to a target sequence and complexed with one or more nucleases such as Cas9 orthologue proteins) results in cleavage of one or both strands in or near (e.g. within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence.
  • the tracrRNA sequence, which may comprise or consist of all or a portion of a wild-type tracrRNA sequence, may also form part of a CRISPR complex, such as by hybridization along at least a portion of the tracrRNA sequence to all or a portion of a tracrRNA mate sequence that is operably linked to the guide sequence.
  • the tracrRNA sequence has sufficient complementarity to a tracrRNA mate sequence to hybridize and participate in formation of a CRISPR complex. As with the target sequence, it is believed that complete complementarity is not needed, provided there is sufficient complementarity to be functional.
  • the tracrRNA sequence has at least 50%, 60%, 70%, 80%, 90%, 95% or 99% of sequence complementarity along the length of the tracrRNA mate sequence when optimally aligned.
  • one or more vectors driving expression of one or more elements of a nuclease system are introduced into a host cell such that expression of the elements of the nuclease system directs formation of a CRISPR complex at one or more target sites.
  • a Cas9 orthologue enzyme, a guide sequence linked to a tracrRNA-mate sequence, and a tracrRNA sequence could each be operably linked to separate regulatory elements on separate vectors.
  • two or more of the elements expressed from the same or different regulatory elements may be combined in a single vector, with one or more additional vectors providing any components of the nuclease system not included in the first vector.
  • Nuclease system elements that are combined in a single vector may be arranged in any suitable orientation, such as one element located 5′ with respect to (“upstream” of) or 3′ with respect to (“downstream” of) a second element.
  • the coding sequence of one element may be located on the same or opposite strand of the coding sequence of a second element, and oriented in the same or opposite direction.
  • a single promoter drives expression of a transcript encoding a CRISPR enzyme and one or more of the guide sequence, tracrRNA mate sequence (optionally operably linked to the guide sequence), and a tracrRNA sequence embedded within one or more intron sequences (e.g. each in a different intron, two or more in at least one intron, or all in a single intron).
  • the CRISPR enzyme, guide sequence, tracrRNA mate sequence, and tracrRNA sequence are operably linked to and expressed from the same promoter.
  • CRISPR-Cas systems are protein-RNA complexes that use an RNA molecule (sgRNA) as a guide to localize the complex to a target DNA sequence via base-pairing.
  • a Cas protein then acts as an endonuclease to cleave the targeted DNA sequence.
  • the target DNA sequence typically is complementary to the sgRNA and also contains a protospacer-adjacent motif (PAM) at the 3′-end of the complementary region in order for the system to function.
  • a CRISPR-Cas system comprises, at a minimum, a CRISPR RNA (crRNA) molecule and at least one CRISPR-associated (Cas) endonuclease to form crRNA ribonucleoprotein (crRNP) effector complexes.
  • the Cas endonuclease is guided by a CRISPR RNA (crRNA) through direct RNA-DNA base-pairing to recognize a DNA target site that is in close vicinity to a protospacer adjacent motif (PAM) (Jore, M.M. et al, 2011, Nat. Struct. Mol. Biol.18:529-536, Westra, E.R.
  • a CRISPR-Cas system refers to a polypeptide of the Cas endonuclease and a guide polynucleotide including a crRNA polynucleotide and a tracrRNA polynucleotide.
  • the crRNA comprises a spacer region complementary to one strand of the double strand DNA target and a region that base pairs with the tracrRNA (trans-activating CRISPR RNA) forming a RNA duplex that directs the Cas endonuclease to cleave the DNA target.
  • Endonucleases are enzymes that cleave the phosphodiester bond within a polynucleotide chain.
  • Cas endonucleases either as single effector proteins or in an effector complex with other components, unwind the DNA duplex at the target sequence and optionally cleave at least one DNA strand, as mediated by recognition of the target sequence by a polynucleotide (such as, but not limited to, a crRNA or guide RNA) that is in complex with the Cas endonuclease.
  • Such recognition and cutting of a target sequence by a Cas endonuclease typically occurs if the correct protospacer-adjacent motif (PAM) is located at or adjacent to the 3' end of the DNA target sequence.
  • Cas endonucleases that have been described include, but are not limited to, for example: Cas3, Cas9, and Cas12 (Cpf1).
  • S. pyogenes Cas9 (formerly referred to as Cas5, Csn1, or Csx12) has been most widely used as a tool for genome engineering.
  • This Cas9 protein is a large, multi-domain protein containing two distinct nuclease domains.
  • Cas9 forms a complex with a crRNA and a tracrRNA, or with a single guide polynucleotide (sgRNA), for specifically recognizing and cleaving all or part of a DNA target sequence.
  • a Cas9 protein comprises a RuvC nuclease with an HNH (H-N-H) nuclease adjacent to the RuvC-II domain.
  • the RuvC nuclease and HNH nuclease each can cleave a single DNA strand at a target sequence (the concerted action of both domains leads to DNA double strand cleavage, whereas activity of one domain leads to a nick).
  • the RuvC domain comprises subdomains I, II and III, where subdomain I is located near the N-terminus of Cas9 and subdomains II and III are located in the middle of the protein, flanking the HNH domain (Hsu et al., 2014, Cell 157:1262-1278).
  • the nuclease is a Cas9 nickase variant or a dCas9.
  • the CRISPR-Cas orthologue nuclease or polypeptide disclosed herein has one or more mutations in a residue homologous to D10 and H840 of SpCas9 and N580 of SaCas9, identified by multisequence alignment using Clustal Omega.
  • the mutations can generate nickases or deactivated nucleases. For example, mutation of these residues to A (alanine) can abolish the catalytic activity of the Cas9 RuvC domain, Cas9 HNH domain, or both.
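  • The relationship between catalytic-residue substitutions and the resulting enzyme class can be sketched as below. The peptide used in the usage note is a short toy sequence, the helper names are hypothetical, and the mapping from active domains to outcome simply restates the text (both domains active gives a double-strand break, one gives a nick, none gives a deactivated nuclease).

```python
def apply_substitutions(protein: str, substitutions: list[str]) -> str:
    """Apply point substitutions written like 'D10A' (1-based positions).

    Each substitution is checked against the input sequence so that a
    mis-numbered residue raises an error instead of silently editing it.
    """
    residues = list(protein)
    for sub in substitutions:
        expected, position, replacement = sub[0], int(sub[1:-1]), sub[-1]
        if not 1 <= position <= len(residues):
            raise ValueError(f"{sub}: position outside the sequence")
        if residues[position - 1] != expected:
            raise ValueError(
                f"{sub}: found {residues[position - 1]} at position {position}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


def cleavage_outcome(ruvc_active: bool, hnh_active: bool) -> str:
    """Classify the expected cleavage from which nuclease domains remain active."""
    if ruvc_active and hnh_active:
        return "double-strand break"
    if ruvc_active or hnh_active:
        return "nick"
    return "no cleavage"
```

  • For example, `apply_substitutions("MDKKYSIGLDIGT", ["D10A"])` replaces the aspartate at position 10 of the toy peptide with alanine, and `cleavage_outcome(False, True)` reports a nick, corresponding to a nickase with only one active domain.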
  • the CRISPR-Cas orthologue nuclease or polypeptide has one or more mutations at the position D20, H631, or N654 of SEQ ID NO: 721 or SEQ ID NO: 1383.
  • In some embodiments, the CRISPR-Cas orthologue nuclease or polypeptide has one or more mutations at the position D10, H797, or N819 of SEQ ID NO: 362 or SEQ ID NO: 1384.
  • the CRISPR-Cas orthologue nuclease or polypeptide has one or more mutations at the position D9, H687, or N710 of SEQ ID NO: 444 or SEQ ID NO: 1385.
  • In some embodiments, the CRISPR-Cas orthologue nuclease or polypeptide has one or more mutations at the position D9, H565, or N588 of SEQ ID NO: 768 or SEQ ID NO: 1386.
  • In some embodiments, the CRISPR-Cas orthologue nuclease or polypeptide has one or more mutations at the position D8, H681, or N704 of SEQ ID NO: 76 or SEQ ID NO: 1387.
  • Cas endonucleases can be used for targeted genome editing (e.g., via simplex and multiplex double-strand breaks and nicks) and targeted genome regulation (e.g., via tethering of epigenetic effector domains to either the Cas protein or sgRNA).
  • the most widely used Cas9 recognizes a 3’ GC-rich PAM sequence on the target dsDNA, typically comprising an NGG motif.
  • the Cas9 orthologues described herein may recognize additional PAM sequences and be used to modify target sites with different recognition sequence specificity.
  • the Cas9 orthologues described herein are found in a bacterial or archaeal genome.
  • the Cas9 orthologues described herein are isolated from a bacterial or archaeal genome.
  • the Cas9 orthologues are found, identified, or isolated from (i) publicly available sequences in the NCBI database (Benson et al., 2013, Nucleic Acids Res. 41:D36–D42; available as of January 2021), (ii) 771,529 metagenome-assembled genomes (MAGs) from an unpublished study, and (iii) 54,169 additional MAGs obtained with a validated assembly-based pipeline similar to that of Pasolli et al., 2019, Cell 176:649–662.e20.
  • a Cas9 orthologue protein as described herein may comprise a polypeptide having an amino acid sequence or a functional fragment thereof having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% amino acid sequence identity with any one of the amino acid sequences set forth in Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28),
  • the Cas9 orthologue protein comprises a polypeptide having an amino acid sequence or a functional fragment thereof having at least 90% amino acid sequence identity with any one of the amino acid sequences set forth in Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99).
  • the Cas9 orthologue protein comprises a polypeptide having an amino acid sequence or a functional fragment thereof having at least 95% amino acid sequence identity with any one of the amino acid sequences set forth in Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99).
  • the Cas9 orthologue protein comprises a polypeptide having an amino acid sequence or a functional fragment thereof having at least 98% amino acid sequence identity with any one of the amino acid sequences set forth in Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99).
  • a Cas9 orthologue protein as described herein may comprise a polypeptide having an amino acid sequence or a functional fragment thereof having between 90% and 95%, between 95% and 96%, between 96% and 99%, between 97% and 98%, between 97% and 99%, between 98% and 99%, between 99% and 100%, or 100% amino acid sequence identity with any one of the amino acid sequences set forth in Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17
  • the Cas9 orthologue protein comprises a polypeptide having an amino acid sequence or a functional fragment thereof having between 90% and 100%, or between 95% and 99% amino acid sequence identity with any one of the amino acid sequences set forth in Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41
  • a Cas9 orthologue protein or Cas9 polypeptide, or a functional fragment thereof may recognize at least one PAM sequence categorized as any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99).
  • the Cas9 orthologue protein or Cas9 polypeptide, or a functional fragment thereof recognizes at least one PAM sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% nucleic acid sequence identity with at least one PAM nucleic acid sequence as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25
  • the Cas9 orthologue protein or Cas9 polypeptide, or a functional fragment thereof recognizes at least one PAM sequence having at least 90%, at least 95%, or at least 98% nucleic acid sequence identity with at least one PAM nucleic acid sequence as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A
  • a Cas9 orthologue protein or Cas9 polypeptide, or a functional fragment thereof may recognize at least one PAM sequence as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99).
  • the Cas9 orthologue protein or Cas9 polypeptide, or a functional fragment thereof recognizes at least one PAM sequence having between 90% and 95%, between 95% and 96%, between 96% and 99%, between 97% and 98%, between 97% and 99%, between 98% and 99%, between 99% and 100%, or 100% nucleic acid sequence identity with at least one PAM nucleic acid sequence as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25),
  • the Cas9 orthologue protein or Cas9 polypeptide, or a functional fragment thereof recognizes at least one PAM sequence having between 90% and 95%, between 90% and 99%, or between 95% and 99% nucleic acid sequence identity with at least one PAM nucleic acid sequence as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33),
  • a Cas9 orthologue protein as described herein comprises a polypeptide having an amino acid sequence set forth as Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99), or a functional fragment thereof, that recognizes (hybri
  • the present inventors have surprisingly identified novel CRISPR-Cas orthologue nucleases which retain strong endonuclease activity and recognize hitherto-untargeted PAM sequences. Therefore, these novel nucleases are capable of being directed towards targets which were previously inaccessible for genome editing, thus addressing a major constraint of existing CRISPR-Cas systems.
  • the CRISPR-Cas polypeptide encoding the Cas orthologue nuclease comprises an amino acid sequence with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity to any one of SEQ ID NO: 76, SEQ ID NO: 362, SEQ ID NO: 444, SEQ ID NO: 721, SEQ ID NO: 768, SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387, or any one as shown in Table 34.
  • the CRISPR-Cas orthologue nuclease has an amino acid sequence of any one of SEQ ID NO: 76, SEQ ID NO: 362, SEQ ID NO: 444, SEQ ID NO: 721, SEQ ID NO: 768, SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387, or any one as shown in Table 34. [0200] In one embodiment, the CRISPR-Cas orthologue nuclease comprises an amino acid sequence of SEQ ID NO: 721 or SEQ ID NO: 1383.
  • the CRISPR-Cas orthologue nuclease comprises an amino acid sequence of SEQ ID NO: 362 or SEQ ID NO: 1384. [0202] In one embodiment, the CRISPR-Cas orthologue nuclease comprises an amino acid sequence of SEQ ID NO: 444 or SEQ ID NO: 1385. [0203] In one embodiment, the CRISPR-Cas orthologue nuclease comprises an amino acid sequence of SEQ ID NO: 768 or SEQ ID NO: 1386. [0204] In one embodiment, the CRISPR-Cas orthologue nuclease comprises an amino acid sequence of SEQ ID NO: 76 or SEQ ID NO: 1387.
  • the CRISPR-Cas orthologue nuclease comprises an amino acid sequence of any one of the nucleases having a SEQ ID NO as shown in Table 34.
  • the nuclease comprises an amino acid sequence having at least 80% identity with any one of SEQ ID NO.: 1, SEQ ID NO.: 28, SEQ ID NO.: 55, SEQ ID NO.: 106, SEQ ID NO.: 129, SEQ ID NO.: 163, SEQ ID NO.: 217, SEQ ID NO.: 242, SEQ ID NO.: 254, SEQ ID NO.: 270, SEQ ID NO.: 297, SEQ ID NO.: 325, SEQ ID NO.: 341, SEQ ID NO.: 373, SEQ ID NO.: 391, SEQ ID NO.: 409, SEQ ID NO.: 426, SEQ ID NO.: 462, SEQ ID NO.: 481, SEQ ID NO.: 503, SEQ ID NO.
  • the CRISPR-Cas polypeptide is a fusion protein, in which the CRISPR-Cas orthologue nuclease protein sequence is fused to one or more amino acid sequences, such as one or more nuclear localization signal peptides, one or more non-native tags, or both.
  • the polypeptide comprises two or more nuclear localization signals fused to the CRISPR-Cas orthologue nuclease.
  • the nuclear localization signal comprises an N-terminal nuclear localization signal, a C-terminal nuclear localization signal, or both.
  • the nuclear localization signal comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NOs: 1800-1822. In some embodiments, the nuclear localization signal comprises an amino acid sequence of any one of SEQ ID NOs: 1800-1822.
  • Fusion proteins can also comprise an amino acid sequence of, for example, a nucleoside deaminase, a reverse transcriptase, a transcriptional activator, a transcriptional repressor, a histone-modifying protein, an integrase, or a recombinase.
  • the CRISPR-Cas orthologue nuclease protein is fused to one or more protein tags such as SV5 or V5-tags.
  • the tag peptide comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1823 or SEQ ID NO: 1824. In some embodiments, the tag peptide comprises an amino acid sequence of SEQ ID NO: 1823 or SEQ ID NO: 1824.
  • Additional exemplary fusion partners include, FLAG-tag, myc-tag, HA-tag, GST-tag, polyHis-tag, MBP-tag, protein domains, transcription modulators, enzymes acting on small molecule substrates, DNA, RNA and protein modification enzymes (e.g., adenosine deaminase, cytidine deaminase, guanosyl transferase, DNA methyltransferase, RNA methyltransferases, DNA demethylases, RNA demethylases, dioxygenases, polyadenylate polymerases, pseudouridine synthases, acetyltransferases, deacetylase, ubiquitin-ligases, deubiquitinases, kinases, phosphatases, NEDD8-ligases, de-NEDDylases, SUMO-ligases, deSUMOylases, histone deacetylases, reverse transcriptases, histone acetylases
  • a fusion partner is an adenosine deaminase.
  • An exemplary adenosine deaminase is the tRNA adenosine deaminase (TadA) moiety contained in the adenine base editor ABE8e.
  • 6.3.1.2. Guide polynucleotide [0212] The guide polynucleotide enables target recognition, binding, and optionally cleavage by the Cas endonuclease, and can be a single molecule or a double molecule.
  • a guide polynucleotide is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence.
  • the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g. the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
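As an illustration of how a degree of complementarity could be computed from an optimal alignment, the following is a minimal sketch of a Needleman-Wunsch global alignment in Python. It is not the method used in this disclosure. The function names, the scoring values (match +1, mismatch -1, gap -1), and the convention of dividing matches by the number of alignment columns are all assumptions chosen for the example. Complementarity is obtained by aligning the guide against the reverse complement of the target strand.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def percent_identity(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over one optimal global (Needleman-Wunsch) alignment.

    Identity is defined here as matching columns divided by all alignment
    columns (an assumed convention; other definitions exist).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting matches and alignment columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                matches += a[i - 1] == b[j - 1]
                i, j, columns = i - 1, j - 1, columns + 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


def percent_complementarity(guide: str, target: str) -> float:
    """Identity between the guide and the reverse complement of the target."""
    return percent_identity(guide, reverse_complement(target))
```

For example, `percent_complementarity("AACG", "CGTT")` evaluates a guide that pairs perfectly with its target, while a single mismatch in a four-nucleotide alignment lowers the value to 75%.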
  • a guide polynucleotide is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 140, 150, 180, 200, 220, 250, 300, or more nucleotides in length, including or excluding the spacer sequence.
  • a guide polynucleotide is less than about 300, 250, 200, 180, 150, 120, 115, 110, 105, 100, 95, 90, 85, 80, 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length, including or excluding the spacer sequence. In some embodiments, a guide polynucleotide is between 60-130, 70-280, 80-110, 90-200, or 120-150 in length, including or excluding the spacer sequence. In some embodiments, a guide polynucleotide is between 80-110 nucleotides in length excluding the spacer sequence.
  • the guide polynucleotide sequence can be an RNA sequence, a DNA sequence, or a combination thereof (an RNA-DNA combination sequence).
  • a guide polynucleotide that solely comprises ribonucleic acids is also referred to as a “guide RNA” or “gRNA” (US20150082478, US20150059010, WO2019165168A1, WO2021118626A1, each of which is incorporated herein by reference in its entirety).
  • a guide polynucleotide may be engineered or synthetic.
  • the guide polynucleotide as described herein is a double molecule such as a guide RNA comprising a crRNA and a tracrRNA.
  • the crRNA may comprise a polynucleotide sequence (e.g., a spacer) that recognizes (or hybridizes to, or is complementary to) a target polynucleotide sequence (e.g., protospacer) present on the target molecule (e.g., genomic locus).
  • the target polynucleotide sequence (e.g., protospacer) is located adjacent to a PAM such as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99).
  • the guide RNA comprises a spacer that is partially or fully complementary to a target mammalian genomic sequence upstream of a Protospacer Adjacent Motif (PAM) sequence in the non-target strand recognized by the polypeptide.
  • Non-limiting examples of polynucleotides are listed in Table 32 and Table 34.
  • the guide RNA capable of complexing with the CRISPR-Cas orthologue nuclease is a sgRNA shown in Table 31 or Table 34.
  • the sgRNA comprises at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, or at least 100 consecutive nucleotides of any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377, or any one of SEQ ID NOs: 1450-1483, and SEQ ID NO: 1825 in Table 34.
  • the sgRNA has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377.
  • the sgRNA comprises the polynucleotide sequence of any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377.
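The "at least N consecutive nucleotides" criterion above can be checked mechanically. The sketch below is one way to do so in Python, under the assumption that both sequences are plain strings over the same alphabet. The function name and the toy sequences in the usage note are hypothetical and are not taken from the sequence listing.

```python
def shares_consecutive_run(candidate: str, reference: str, n: int) -> bool:
    """True if `candidate` contains at least `n` consecutive nucleotides
    that also occur contiguously in `reference`."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Collect every length-n window of the reference once.
    reference_windows = {reference[i:i + n]
                         for i in range(len(reference) - n + 1)}
    # Slide the same window length across the candidate.
    return any(candidate[i:i + n] in reference_windows
               for i in range(len(candidate) - n + 1))
```

With the toy reference `"ACGUACGGUC"` and candidate `"UUACGGAA"`, the two strings share the run `UACGG`, so the check succeeds for `n = 5` and fails for `n = 6`.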
  • the sgRNA comprises the polynucleotide sequence of any one of SEQ ID NOs: 1450-1483, and SEQ ID NO: 1825 in Table 34. [0218] In some embodiments, the sgRNA comprises at least one modified nucleotide, at least one modified internucleoside linkage, or both. In some embodiments, the sgRNA comprises at least one modified nucleotide selected from 2'-fluoro, 2'-amino or 2'-O-methyl modification on the ribose of pyrimidines, abasic residues, and an inverted base at the 3' end of the RNA.
  • the sgRNA comprises at least one modified internucleoside linkage selected from phosphorothioates, phosphotriesters, methyl phosphonates, short chain alkyl or cycloalkyl inter-sugar linkages, and short chain heteroatomic or heterocyclic inter-sugar linkages.
  • the guide RNA capable of complexing with the CRISPR-Cas orthologue nuclease is a guide RNA having separate crRNA and tracrRNA as shown in Table 29, Table 30, or Table 34.
  • the guide RNA comprises, in 5’ to 3’ order, the crRNA molecule or the tracrRNA molecule comprising at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, or at least 100 consecutive nucleotides of any crRNA or tracrRNA as shown in Table 29, Table 30, or Table 34.
  • the crRNA comprises at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, or at least 100 consecutive nucleotides of any one of SEQ ID NOs: 1362-1366 in Table 29, or any one of SEQ ID NOs: 1388-1418 in Table 34.
  • the tracrRNA comprises at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, or at least 100 consecutive nucleotides of any one of SEQ ID NOs: 1367-1371 in Table 30, or any one of SEQ ID NOs: 1419-1449 in Table 34.
  • the crRNA molecule or the tracrRNA molecule has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one crRNA or tracrRNA as shown in Table 29, Table 30, or Table 34.
  • the guide RNA comprises a crRNA molecule and a tracrRNA molecule having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1362 or SEQ ID NO: 1367.
  • the guide RNA comprises a crRNA molecule of SEQ ID NO: 1362 and a tracrRNA of SEQ ID NO: 1367. [0224] In some embodiments, the guide RNA comprises a crRNA molecule and a tracrRNA molecule having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1363 or SEQ ID NO: 1368. In some embodiments, the guide RNA comprises a crRNA molecule of SEQ ID NO: 1363 and a tracrRNA of SEQ ID NO: 1368.
  • the guide RNA comprises a crRNA molecule and a tracrRNA molecule having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1364 or SEQ ID NO: 1369.
  • the guide RNA comprises a crRNA molecule of SEQ ID NO: 1364 and a tracrRNA of SEQ ID NO: 1369.
  • the guide RNA comprises a crRNA molecule and a tracrRNA molecule having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1365 or SEQ ID NO: 1370.
  • the guide RNA comprises a crRNA molecule of SEQ ID NO: 1365 and a tracrRNA of SEQ ID NO: 1370.
  • the guide RNA comprises a crRNA molecule and a tracrRNA molecule having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1366 or SEQ ID NO: 1371.
  • the guide RNA comprises a crRNA molecule of SEQ ID NO: 1366 and a tracrRNA of SEQ ID NO: 1371.
  • the guide RNA comprises a crRNA molecule having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NOs: 1388-1418, and their corresponding tracrRNA molecule has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NOs: 1419-1449 as shown in Table 34. 6.3.2.2.1.
  • the guide RNA is synthetic (i.e., non-naturally occurring).
  • the guide RNA is modified.
  • Guide RNAs can be readily synthesized by chemical means, enabling a number of modifications to be readily incorporated, as described in the relevant field.
  • the disclosed gRNA (e.g., sgRNA) molecules can be unmodified or can contain any one or more of an array of chemical modifications.
  • One approach that can be used for generating chemically modified RNAs of greater length is to produce two or more molecules that are ligated together. Much longer RNAs, such as those encoding a CRISPR-Cas orthologue nuclease, are more readily generated enzymatically. While fewer types of modifications are available for use in enzymatically produced RNAs, there are still modifications that can be used to, for instance, enhance stability, reduce the likelihood or degree of innate immune response, and/or enhance other attributes, as described herein and in the art.
  • modifications can comprise one or more nucleotides modified at the 2' position of the sugar, for instance a 2'-O-alkyl, 2'-O-alkyl-O-alkyl, or 2'-fluoro-modified nucleotide.
  • RNA modifications can comprise 2'-fluoro, 2'-amino or 2'-O-methyl modifications on the ribose of pyrimidines, abasic residues, or an inverted base at the 3' end of the RNA.
  • modified oligonucleotides include those comprising modified backbones, for example, phosphorothioates, phosphotriesters, methyl phosphonates, short chain alkyl or cycloalkyl intersugar linkages or short chain heteroatomic or heterocyclic intersugar linkages.
  • oligonucleotides are oligonucleotides with phosphorothioate backbones and those with heteroatom backbones, particularly CH2-NH-O-CH2, CH2-N(CH3)-O-CH2 (known as a methylene(methylimino) or MMI backbone), CH2-O-N(CH3)-CH2, CH2-N(CH3)-N(CH3)-CH2 and O-N(CH3)-CH2-CH2 backbones (wherein the native phosphodiester backbone is represented as O-P-O-CH2); amide backbones; morpholino backbone structures; peptide nucleic acid (PNA) backbone (wherein the phosphodiester backbone of the oligonucleotide is replaced with a polyamide backbone, the nucleotides being bound directly or indirectly to the aza nitrogen atoms of the polyamide backbone).
  • Phosphorus-containing linkages include, but are not limited to, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates comprising 3'alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates comprising 3'-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3'-5' to 5'-3' or 2'-5' to 5'-2'.
  • Modified oligonucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages.
  • These comprise those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S, and CH2 component parts.
  • One or more substituted sugar moieties can also be included, e.g., one of the following at the 2' position: OH, SH, SCH3, F, OCN, OCH3, OCH3O(CH2)nCH3, O(CH2)nNH2, or O(CH2)nCH3, where n is from 1 to about 10; C1 to C10 lower alkyl, alkoxyalkoxy, substituted lower alkyl, alkaryl or aralkyl; Cl; Br; CN; CF3; OCF3; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; SOCH3; SO2CH3; ONO2; NO2; N3; NH2; heterocycloalkyl; heterocycloalkaryl; aminoalkylamino; polyalkylamino; substituted silyl; an RNA cleaving group; a reporter group;
  • a modification includes 2'-methoxyethoxy (2'-O-CH2CH2OCH3, also known as 2'-O-(2-methoxyethyl)) (Martin et al., 1995, Helv. Chim. Acta, 78, 486).
  • Other modifications include 2'-methoxy (2'-O-CH3), 2'-propoxy (2'-OCH2CH2CH3) and 2'-fluoro (2'-F). Similar modifications can also be made at other positions on the oligonucleotide, particularly the 3' position of the sugar on the 3' terminal nucleotide and the 5' position of the 5' terminal nucleotide.
  • Oligonucleotides can also have sugar mimetics, such as cyclobutyls in place of the pentofuranosyl group.
  • both a sugar and an internucleoside linkage (in the backbone) of the nucleotide units can be replaced with novel groups.
  • the base units can be maintained for hybridization with an appropriate nucleic acid target compound.
  • RNAs such as guide RNAs can also include, additionally or alternatively, nucleobase (often referred to in the art simply as "base") modifications or substitutions.
  • unmodified or “natural” nucleobases include adenine (A), guanine (G), thymine (T), cytosine (C), and uracil (U).
  • Modified nucleobases include nucleobases found only infrequently or transiently in natural nucleic acids, e.g., hypoxanthine, 6-methyladenine, 5-Me pyrimidines, particularly 5-methylcytosine (also referred to as 5-methyl-2'-deoxycytosine and often referred to in the art as 5-Me-C), 5-hydroxymethylcytosine (HMC), glycosyl HMC and gentobiosyl HMC, as well as synthetic nucleobases, e.g., 2-aminoadenine, 2-(methylamino)adenine, 2-(imidazolylalkyl)adenine, 2-(aminoalkylamino)adenine or other heterosubstituted alkyladenines, 2-thiouracil, 2-thiothymine, 5-bromouracil, 5-hydroxymethyluracil, 8-azaguanine, 7-deazaguanine, N6 (6-amino
  • Modified nucleobases can comprise other synthetic and natural nucleobases, such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, cytosine and thymine, 5-uracil (pseudo-uracil), 4-thiouracil,
  • the modified guide RNA is a crRNA comprising a nucleic acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1362, SEQ ID NO: 1363, SEQ ID NO: 1364, SEQ ID NO: 1365, and SEQ ID NO: 1366, or any one of SEQ ID NOs: 1388-1418 as shown in Table 34.
  • the modified crRNA is capable of complexing with a modified tracrRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1367, SEQ ID NO: 1368, SEQ ID NO: 1369, SEQ ID NO: 1370, and SEQ ID NO: 1371, or any one of SEQ ID NOs: 1419-1449 as shown in Table 34.
  • the modified guide RNA is a sgRNA comprising a nucleic acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, or SEQ ID NO: 1376, or any one of SEQ ID NOs: 1450-1483 and SEQ ID NO: 1825 as shown in Table 34. 6.3.1.3.
  • a “protospacer adjacent motif” refers to a polynucleotide sequence located adjacent (e.g., downstream or upstream) to a target polynucleotide sequence.
  • a PAM sequence is necessary for a CRISPR-Cas protein to bind and cut target genomic DNA.
  • the Cas endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not adjacent to, or near, a PAM sequence.
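To make the PAM requirement concrete, the sketch below scans one DNA strand for candidate protospacers that are immediately followed by a PAM, which is the 3' arrangement typical of Cas9 nucleases. It is an illustration only. The function names, the 20-nucleotide spacer length, and the default `NGG` motif (the well-known Streptococcus pyogenes Cas9 PAM) are assumptions standing in for the PAM sequences of Tables 1A-22, which are not reproduced here. Degenerate positions use the standard IUPAC nucleotide codes.

```python
# Standard IUPAC nucleotide codes mapped to the bases they permit.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(site: str, pam: str) -> bool:
    """True if `site` satisfies the (possibly degenerate) `pam` motif."""
    return len(site) == len(pam) and all(
        base in IUPAC[code] for base, code in zip(site, pam))


def find_protospacers(seq: str, pam: str = "NGG", spacer_len: int = 20):
    """Return (start, protospacer, pam_site) for every position on `seq`
    where a `spacer_len` window is directly followed by a matching PAM.

    Only the given strand is scanned; pass the reverse complement as well
    to cover the opposite strand.
    """
    hits = []
    for start in range(len(seq) - spacer_len - len(pam) + 1):
        pam_start = start + spacer_len
        site = seq[pam_start:pam_start + len(pam)]
        if pam_matches(site, pam):
            hits.append((start, seq[start:pam_start], site))
    return hits
```

A target without any matching PAM yields an empty list, which mirrors the statement above that a nuclease may not recognize a target sequence lacking an adjacent PAM. Swapping the `pam` argument for a different consensus motif changes which sites are reported.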
  • a PAM polynucleotide as described herein, or a functional fragment thereof, may comprise a nucleic acid sequence having at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% nucleic acid sequence identity with at least one PAM nucleic acid sequence as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29),
  • the PAM polynucleotide, or a functional fragment thereof, has not more than 90%, 95%, 98%, 99%, or 100% nucleic acid sequence identity with at least one PAM nucleic acid sequence as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables
  • a PAM polynucleotide as described herein, or a functional fragment thereof, may comprise a nucleic acid sequence having between 85% and 90%, between 90% and 95%, between 95% and 96%, between 96% and 99%, between 97% and 98%, between 97% and 99%, between 98% and 99%, between 99% and 100%, or 100% nucleic acid sequence identity with at least one PAM nucleic acid sequence as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23),
  • the PAM polynucleotide, or a functional fragment thereof, has between 90% and 95% or between 95% and 99% nucleic acid sequence identity with at least one PAM nucleic acid sequence as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 9), Tables 2
  • a PAM polynucleotide as described herein, or a functional fragment thereof comprises a consensus nucleic acid sequence as categorized in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99).
  • a PAM polynucleotide as described herein, or a functional fragment thereof, comprises a PAM logo or motif shown in FIGs. 3A-3E, FIGs. 4A-4C, FIGs. 5A-5B, FIGs. 6A-6D, FIGs. 7A-7C, FIGs. 8A-8B, FIGs. 9A-9B, FIG. 10, FIGs. 11A-11B, FIGs. 12A-12D, FIG. 13, FIGs. 14A-14B, FIG. 15, FIG. 16, FIG. 17, FIGs. 18A-18B, FIG. 19, FIG. 20, FIG. 21, FIGs. 22A-22B, FIGs. 23A-23B, or FIGs. 24A-24B.
  • the PAM is recognized by one or more CRISPR-Cas proteins in a CRISPR-Cas cluster.
  • a PAM polynucleotide as described herein, or a functional fragment thereof comprises a consensus nucleic acid sequence as categorized in any one of Table 32 and Table 34.
  • the PAM is recognized by one or more Cas9 orthologues as described herein.
  • the PAM is predicted and validated according to methods as described in Ciciani et al., 2022, which is incorporated herein by reference in its entirety. [0246] In various embodiments, the PAM is found in one or more viral genomes.
  • the PAM is identified in the viral genomes that have matching or near matching sequences to the spacer sequences of one or more of the Cas9 orthologues as described herein.
  • the PAM has a polynucleotide sequence found, identified, or isolated from a viral genome in the Human Gut virome datasets as described in Ciciani et al., 2022, which is incorporated herein by reference in its entirety. 6.3.1.4. Nucleic acids [0247] A nucleic acid comprising a polynucleotide encoding the polypeptide capable of complexing with a guide RNA, the sgRNA, the crRNA, or the tracrRNA can be synthesized by chemical means or by in vitro transcription.
  • the human codon optimized polynucleotide encodes a polypeptide, the polynucleotide having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1378, SEQ ID NO: 1379, SEQ ID NO: 1380, SEQ ID NO: 1381, and SEQ ID NO: 1382.
  • the polynucleotide is a human codon optimized polynucleotide encoding a CRISPR-Cas orthologue nuclease.
  • the human codon optimized polynucleotide encodes a polypeptide, the polypeptide having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387.
  • the polynucleotide encodes a sgRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377.
  • the polynucleotide encodes a sgRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NOs: 1450-1483, and SEQ ID NO: 1825 as shown in Table 34.
  • the polynucleotide encodes a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1362 complexed with a tracrRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1367.
  • the polynucleotide encodes a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1363 complexed with a tracrRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1368.
  • the polynucleotide encodes a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1364 complexed with a tracrRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1369.
  • the polynucleotide encodes a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1365 complexed with a tracrRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1370.
  • the polynucleotide encodes a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1366 complexed with a tracrRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1371.
  • the polynucleotide encodes a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NOs: 1388-1418 as shown in Table 34, complexed with a tracrRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NOs: 1419-1449 as shown in Table 34. 6.3.1.4.1.
  • the system comprises a CRISPR-Cas9 orthologue nuclease having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 76 or SEQ ID NO: 1387, capable of complexing with a sgRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1377, or with a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1366 complexed
  • the system comprises a CRISPR-Cas9 orthologue nuclease of SEQ ID NO: 76 or SEQ ID NO: 1387, capable of complexing with a sgRNA of SEQ ID NO: 1377, or with a crRNA of SEQ ID NO: 1366 complexed with a tracrRNA of SEQ ID NO: 1371.
  • the system comprises a CRISPR-Cas9 orthologue nuclease having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 362 or SEQ ID NO: 1384, capable of complexing with a sgRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1373, or with a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1363 complexed with a tracrRNA having at least 70%, at least 75%, at least 80%,
  • the system comprises a CRISPR-Cas9 orthologue nuclease of SEQ ID NO: 362 or SEQ ID NO: 1384, capable of complexing with a sgRNA of SEQ ID NO: 1373, or with a crRNA of SEQ ID NO: 1363 complexed with a tracrRNA of SEQ ID NO: 1368.
  • the system comprises a CRISPR-Cas9 orthologue nuclease having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 444 or SEQ ID NO: 1385, capable of complexing with a sgRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1374, or with a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1364 complexed with a tracrRNA having at least 70%, at least 75%, at least 80%,
  • the system comprises a CRISPR-Cas9 orthologue nuclease of SEQ ID NO: 444 or SEQ ID NO: 1385, capable of complexing with a sgRNA of SEQ ID NO: 1374, or with a crRNA of SEQ ID NO: 1364 complexed with a tracrRNA of SEQ ID NO: 1369.
  • the system comprises a CRISPR-Cas9 orthologue nuclease having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 721 or SEQ ID NO: 1383, capable of complexing with a sgRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1372, or with a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1362 complexed with a tracrRNA having at least 70%, at least 75%, at least 80%,
  • the system comprises a CRISPR-Cas9 orthologue nuclease of SEQ ID NO: 721 or SEQ ID NO: 1383, capable of complexing with a sgRNA of SEQ ID NO: 1372, or with a crRNA of SEQ ID NO: 1362 complexed with a tracrRNA of SEQ ID NO: 1367.
  • the system comprises a CRISPR-Cas9 orthologue nuclease having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 768 or SEQ ID NO: 1386, capable of complexing with a sgRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1375 or SEQ ID NO: 1376, or with a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with SEQ ID NO: 1365 complexed with a tracrRNA having at least 70%, at least
  • the system comprises a CRISPR-Cas9 orthologue nuclease of SEQ ID NO: 768 or SEQ ID NO: 1386, capable of complexing with a sgRNA of SEQ ID NO: 1375 or SEQ ID NO: 1376, or with a crRNA of SEQ ID NO: 1365 complexed with a tracrRNA of SEQ ID NO: 1370.
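  • The "at least N% identity" thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (equal length, gaps written as "-") and counts identical columns over all aligned columns; the text above does not specify an alignment method or an identity formula, so both the function names and the choice of denominator are assumptions made for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences (gaps as '-').

    Assumption: identity = identical columns / aligned columns, where
    columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0) -> bool:
    """True if the pair reaches the given identity threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

A candidate crRNA, tracrRNA, sgRNA or nuclease sequence would first be aligned to the reference SEQ ID NO with an alignment tool of choice, and the aligned strings then passed to `meets_identity_threshold` with the relevant cut-off (70, 75, ..., 99).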
  • the mined Cas9 orthologue cluster nucleases may be delivered to cells to be edited as polypeptides; alternatively, a polynucleotide sequence encoding the mined Cas9 orthologue cluster nucleases is transformed or transfected into the cells to be edited.
  • the polynucleotide sequence encoding the mined Cas9 orthologue cluster nuclease may be codon optimized for expression in particular cells, such as archaeal, prokaryotic or eukaryotic cells.
  • Eukaryotic cells can be yeast, fungi, algae, plant, animal, or human cells.
  • Eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human mammals including non-human primates.
  • the choice of the mined Cas9 orthologue cluster nuclease to be employed depends on many factors, such as what type of edit is to be made in the target sequence and whether an appropriate PAM is located close to the desired target sequence.
  • the Cas9 orthologue cluster series nuclease may be encoded by a DNA sequence on a vector (e.g., the engine vector) and be under the control of a constitutive or inducible promoter.
  • the sequence encoding the nuclease is under the control of an inducible promoter, which may be separate from, but of the same type as, the inducible promoter controlling transcription of the guide nucleic acid; that is, separate inducible promoters may drive transcription of the nuclease and of the guide nucleic acid sequence(s), but the two may be the same type of inducible promoter (e.g., both are pL promoters).
  • the inducible promoter controlling expression of the nuclease may be different from the inducible promoter controlling transcription of the guide nucleic acid; that is, e.g., the nuclease may be under the control of the pBAD inducible promoter, and the guide nucleic acid may be under the control of the pL inducible promoter.
  • a guide nucleic acid complexes with a compatible nucleic acid-guided nuclease and can then hybridize with a target sequence, thereby directing the nuclease to the target sequence.
  • the nucleic acid-guided nuclease editing system uses two separate guide nucleic acid components that combine and function as a guide nucleic acid; that is, a CRISPR RNA (crRNA) and a transactivating CRISPR RNA (tracrRNA).
  • the gRNA may be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid or linear construct, or the coding sequence may reside within an editing cassette, and is under the control of a constitutive promoter or, in some embodiments, an inducible promoter as described herein.
  • the components of the guide polynucleotide are provided as a sequence to be expressed from a plasmid or vector, comprising both the guide sequence and the scaffold sequence as a single transcript under the control of a promoter and, in some embodiments, an inducible promoter.
  • the gRNA/nuclease complex binds to a target sequence as determined by the guide RNA, and the nuclease recognizes a protospacer adjacent motif (PAM) sequence adjacent to the target sequence.
  • the target sequence can be any polynucleotide endogenous or exogenous to a prokaryotic or eukaryotic cell, or in vitro.
  • the target sequence can be a polynucleotide residing in the nucleus of a eukaryotic cell.
  • a target sequence can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide, an intron, a PAM, or “junk” DNA).
  • the guide polynucleotide may be part of an editing cassette that encodes the donor nucleic acid. Alternatively, the guide polynucleotide may not be part of the editing cassette and instead may be encoded on the engine or editing vector backbone.
  • a sequence coding for a guide polynucleotide can be assembled or inserted into a vector backbone first, followed by insertion of the donor nucleic acid in, e.g., the editing cassette.
  • the donor nucleic acid in, e.g., an editing cassette can be inserted or assembled into a vector backbone first, followed by insertion of the sequence coding for the guide nucleic acid.
  • the sequence encoding the guide polynucleotide and the donor nucleic acid are simultaneously but separately inserted or assembled into a vector.
  • the sequence encoding the guide polynucleotide and the sequence encoding the donor nucleic acid are both included in the editing cassette.
  • the target sequence is associated with a PAM, which is a short nucleotide sequence recognized by the gRNA/nuclease complex.
  • PAMs typically are 2-8 base-pair sequences adjacent or in proximity to the target sequence and, depending on the nuclease, can be 5′ or 3′ to the target sequence.
  • Engineering of the PAM-interacting domain of a nucleic acid-guided nuclease may allow for alteration of PAM specificity, improve fidelity, or decrease fidelity.
  • the genome editing of a target sequence both introduces a desired DNA change to a target sequence, e.g., the genomic DNA of a cell, and removes, mutates, or renders inactive a protospacer adjacent motif (PAM) region in the target sequence. Rendering the PAM at the target sequence inactive precludes additional editing of the cell genome at that target sequence, e.g., upon subsequent exposure to a nucleic acid-guided nuclease complexed with a synthetic guide nucleic acid in later rounds of editing.
  • nucleases can recognize some PAMs very well (e.g., canonical PAMs), and other PAMs less well or poorly (e.g., non-canonical PAMs). Because the mined Cas9 orthologue cluster nucleases disclosed herein may recognize different PAMs, the mined Cas9 orthologue cluster nucleases increase the number of target sequences that can be targeted for editing; that is, mined Cas9 orthologue cluster nucleases decrease the regions of “PAM deserts” in the genome. Thus, the mined Cas9 orthologue cluster nucleases expand the scope of target sequences that may be edited by increasing the number (variety) of PAM sequences recognized.
  • cocktails of mined Cas9 orthologue cluster nucleases may be delivered to cells such that target sequences adjacent to several different PAMs may be edited in a single editing run.
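  • The relationship between a PAM and the set of editable target sequences can be sketched as a simple scan. The example below assumes a PAM located 3′ of the target, a 20-nucleotide spacer, and the pattern "NGG" purely as an illustrative default; the PAMs recognized by the mined Cas9 orthologue cluster nucleases are not stated in this passage, and the function name and parameters are hypothetical. Only the forward strand is scanned.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_targets(seq: str, pam: str = "NGG", spacer_len: int = 20):
    """Return (start, spacer, pam_found) for each forward-strand site where
    a spacer_len-nt target is immediately followed (3') by the PAM.

    'NGG' and 20 nt are illustrative defaults, not values from the text.
    """
    seq = seq.upper()
    pam_re = re.compile("".join(IUPAC[c] for c in pam.upper()))
    hits = []
    for i in range(spacer_len, len(seq) - len(pam) + 1):
        if pam_re.fullmatch(seq, i, i + len(pam)):
            hits.append((i - spacer_len, seq[i - spacer_len:i], seq[i:i + len(pam)]))
    return hits


def targets_for_cocktail(seq: str, pams, spacer_len: int = 20):
    """Union of sites reachable by several nucleases with different PAMs."""
    return {pam: find_targets(seq, pam, spacer_len) for pam in pams}
```

Running `targets_for_cocktail` with several PAM patterns shows how a cocktail of nucleases with different PAM preferences enlarges the set of addressable sites compared with any single PAM.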
  • Another component of the nucleic acid-guided nuclease system is the donor nucleic acid.
  • the donor nucleic acid is on the same polynucleotide (e.g., editing vector or editing cassette) as the guide nucleic acid and may be (but not necessarily) under the control of the same promoter as the guide nucleic acid (e.g., a single promoter driving the transcription of both the guide nucleic acid and the donor nucleic acid).
  • the donor nucleic acid is designed to serve as a template for homologous recombination with a target sequence nicked or cleaved by the nucleic acid-guided nuclease as a part of the gRNA/nuclease complex.
  • a donor nucleic acid polynucleotide may be of any suitable length, such as about or more than about 20, 25, 50, 75, 100, 150, 200, 500, or 1000 nucleotides in length.
  • the donor nucleic acid can be provided as an oligonucleotide of between 20-300 nucleotides, more preferably between 50-250 nucleotides.
  • the donor nucleic acid comprises a region that is complementary to a portion of the target sequence (e.g., a homology arm). When optimally aligned, the donor nucleic acid overlaps with (is complementary to) the target sequence by, e.g., about 20, 25, 30, 35, 40, 50, 60, 70, 80, 90 or more nucleotides.
  • the donor nucleic acid comprises two homology arms (regions complementary to the target sequence) flanking the mutation or difference between the donor nucleic acid and the target template.
  • the donor nucleic acid comprises at least one mutation or alteration compared to the target sequence, such as an insertion, deletion, modification, or any combination thereof compared to the target sequence.
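  • The donor layout described above (two homology arms flanking the intended change) can be sketched as follows. The sketch works on a single strand of the target, takes the arms directly from the sequence on either side of the edited interval, and uses a hypothetical function name and parameters; the arm length is an input, not a value prescribed by the text.

```python
def build_donor(target: str, edit_start: int, edit_end: int,
                replacement: str, arm_len: int) -> str:
    """Assemble left arm + replacement + right arm.

    target      : target sequence (one strand, 5'->3')
    edit_start  : 0-based index of the first nucleotide to replace
    edit_end    : 0-based index one past the last nucleotide to replace
                  (edit_start == edit_end gives a pure insertion)
    replacement : sequence to place between the arms ('' gives a deletion)
    arm_len     : length of each homology arm
    """
    if not (0 <= edit_start <= edit_end <= len(target)):
        raise ValueError("edit interval lies outside the target")
    if edit_start - arm_len < 0 or edit_end + arm_len > len(target):
        raise ValueError("target too short for the requested arm length")
    left_arm = target[edit_start - arm_len:edit_start]
    right_arm = target[edit_end:edit_end + arm_len]
    return left_arm + replacement + right_arm
```

The same helper covers insertions, deletions and substitutions by varying the interval and the replacement string.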
  • the donor nucleic acid is provided as an editing cassette, which is inserted into a vector backbone where the vector backbone may comprise a promoter driving transcription of the gRNA and the coding sequence of the gRNA, or the vector backbone may comprise a promoter driving the transcription of the gRNA but not the gRNA itself.
  • each guide nucleic acid may be under the control of separate different promoters or separate like promoters, or all guide nucleic acid/donor nucleic acid pairs may be under the control of a single promoter.
  • the promoter driving transcription of the gRNA and the donor nucleic acid is an inducible promoter.
  • Inducible editing is advantageous in that isolated cells can be grown for several to many cell doublings to establish colonies before editing is initiated, which increases the likelihood that cells with edits will survive, as the double-strand cuts caused by active editing are largely toxic to the cells. This toxicity results both in cell death in the edited colonies, as well as a lag in growth for the edited cells that do survive but must repair and recover following editing. However, once the edited cells have a chance to recover, the size of the colonies of the edited cells will eventually catch up to the size of the colonies of unedited cells. See, e.g., U.S. Pat. Nos. 10,533,152; 10,550,363; 10,532,324; and U.S. Ser. No. 16/597,826, filed 9 Oct. 2019; Ser.
  • a guide nucleic acid may be efficacious in directing the edit of more than one donor nucleic acid in an editing cassette; e.g., if the desired edits are close to one another in a target sequence.
  • an editing cassette may comprise one or more primer sites. The primer sites can be used to amplify the editing cassette by using oligonucleotide primers; for example, if the primer sites flank one or more of the other components of the editing cassette.
  • the editing cassette may comprise a barcode.
  • a barcode is a unique DNA sequence that corresponds to the donor DNA sequence such that the barcode can identify the edit made to the corresponding target sequence.
  • the barcode typically comprises four or more nucleotides.
  • the editing cassettes comprise a collection of donor nucleic acids representing, e.g., gene-wide or genome-wide libraries of donor nucleic acids. The library of editing cassettes is cloned into vector backbones where, e.g., each different donor nucleic acid is associated with a different barcode.
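  • The barcode-to-donor association in a cassette library can be sketched as a simple lookup table. The assignment scheme below (enumerating fixed-length barcodes in alphabetical order) and all names are assumptions for illustration; the text only requires that each barcode be unique, correspond to one donor, and typically comprise four or more nucleotides.

```python
from itertools import product


def assign_barcodes(donor_ids, length: int = 4) -> dict:
    """Map a unique DNA barcode of the given length to each donor identifier.

    Barcodes are enumerated in A, C, G, T order; this is an arbitrary,
    illustrative scheme rather than a recommended barcode design.
    """
    if length < 4:
        raise ValueError("this sketch uses barcodes of four or more nucleotides")
    donor_ids = list(donor_ids)
    if len(donor_ids) > 4 ** length:
        raise ValueError("not enough distinct barcodes of this length")
    barcodes = ("".join(p) for p in product("ACGT", repeat=length))
    return dict(zip(barcodes, donor_ids))


def identify_edit(barcode_to_donor: dict, observed_barcode: str):
    """Return the donor associated with a sequenced barcode, or None."""
    return barcode_to_donor.get(observed_barcode.upper())
```

Sequencing the barcode region of an edited cell and passing it to `identify_edit` then recovers which donor nucleic acid the cell received.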
  • an expression vector or cassette encoding components of the nucleic acid-guided nuclease system further encodes one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs.
  • the nuclease comprises NLSs at or near the amino-terminus of the mined Cas9 orthologue cluster RGN, NLSs at or near the carboxy-terminus of the mined Cas9 orthologue cluster RGN, or a combination.
  • the engine and editing vectors comprise control sequences operably linked to the component sequences to be transcribed.
  • the promoters driving transcription of one or more components of the mined Cas9 orthologue cluster nuclease editing system may be inducible, and an inducible system is likely employed if selection is to be performed.
  • a number of gene regulation control systems have been developed for the controlled expression of genes in plant, microbe, and animal cells, including mammalian cells; these include the pL promoter (induced by heat inactivation of the CI857 repressor), the pBAD promoter (induced by the addition of arabinose to the cell growth medium), and the rhamnose inducible promoter (induced by the addition of rhamnose to the cell growth medium).
  • the cells may be transformed simultaneously with separate engine and editing vectors; the cells may already be expressing the mined Cas9 orthologue cluster nuclease (e.g., the cells may have already been transformed with an engine vector or the coding sequence for the mined Cas9 orthologue cluster nuclease may be stably integrated into the cellular genome) such that only the editing vector needs to be transformed into the cells; or the cells may be transformed with a single vector comprising all components required to perform nucleic acid-guided nuclease genome editing.
  • a variety of delivery systems can be used to introduce (e.g., transform or transfect) nucleic acid-guided nuclease editing system components into a host cell.
  • These delivery systems include the use of yeast systems, lipofection systems, microinjection systems, biolistic systems, virosomes, liposomes, immunoliposomes, polycations, lipid:nucleic acid conjugates, virions, artificial virions, viral vectors, electroporation, cell permeable peptides, nanoparticles, nanowires, and exosomes.
  • molecular trojan horse liposomes may be used to deliver nucleic acid-guided nuclease components across the blood brain barrier.
  • electroporation particularly flow-through electroporation (either as a stand-alone instrument or as a module in an automated multi-module system) as described in, e.g., U.S. Pat. Nos. 10,435,713; 10,443,074; 10,323,1223; and 10,415,058.
  • the cells are cultured under conditions that promote editing.
  • the transformed cells need only be cultured in a typical culture medium under typical conditions (e.g., temperature, CO2 atmosphere, etc.)
  • editing is inducible—by, e.g., activating inducible promoters that control transcription of one or more of the components needed for nucleic acid-guided nuclease editing, such as, e.g., transcription of the gRNA, donor DNA, nuclease, or, in the case of bacteria, a recombineering system—the cells are subjected to inducing conditions.
  • Vectors can be designed for expression of guide polynucleotides, Cas polypeptides (e.g., Cas endonucleases), or guide polynucleotide/Cas polypeptide systems disclosed herein, or any combination thereof, in prokaryotic or eukaryotic cells.
  • guide polynucleotides, Cas polypeptides (e.g., Cas endonucleases), or guide polynucleotide/Cas polypeptide systems disclosed herein, or any combination thereof, can be expressed in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression vectors), yeast cells, mammalian cells, or plant cells.
  • mammalian cells include, but are not limited to, human cells, mouse cells, rat cells, rabbit cells, dog cells, or non-human primate cells.
  • the recombinant expression vector can be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase.
  • Vectors and constructs include circular plasmids, and linear polynucleotides, comprising a polynucleotide of interest and optionally other components including linkers, adapters, or regulatory sequences.
  • a recognition site and/or target site can be comprised within an intron, coding sequence, 5' UTRs, 3' UTRs, and/or regulatory regions.
  • Vectors may be introduced and propagated in a prokaryote.
  • a prokaryote is used to amplify copies of a vector to be introduced into a eukaryotic cell or as an intermediate vector in the production of a vector to be introduced into a eukaryotic cell (e.g., amplifying a plasmid as part of a viral vector packaging system).
  • a prokaryote is used to amplify copies of a vector and express one or more nucleic acids, such as to provide a source of one or more proteins for delivery to a host cell or host organism.
  • Expression of proteins in prokaryotes is most often carried out in Escherichia coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion proteins.
  • Fusion vectors add a number of amino acids to a protein encoded therein, such as to the amino terminus of the recombinant protein.
  • Such fusion vectors may serve one or more purposes, such as: (i) to increase expression of recombinant protein; (ii) to increase the solubility of the recombinant protein; and (iii) to aid in the purification of the recombinant protein by acting as a ligand in affinity purification.
  • a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant protein to enable separation of the recombinant protein from the fusion moiety subsequent to purification of the fusion protein.
  • Such enzymes, and their cognate recognition sequences, include Factor Xa, thrombin, and enterokinase.
  • Example fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith and Johnson, 1988. Gene 67: 31-40), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) that fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant protein.
  • a vector comprises one or more insertion sites, such as a restriction endonuclease recognition sequence (also referred to as a “cloning site”). In some embodiments, one or more insertion sites (e.g.
  • a vector comprises an insertion site upstream of a tracr mate sequence, and optionally downstream of a regulatory element operably linked to the tracr mate sequence, such that following insertion of a guide sequence into the insertion site and upon expression the guide sequence directs sequence-specific binding of a CRISPR complex to a target sequence in a eukaryotic cell.
  • a vector comprises two or more insertion sites, each insertion site being located between two tracr mate sequences so as to allow insertion of a guide sequence at each site.
  • the two or more guide sequences may comprise two or more copies of a single guide sequence, two or more different guide sequences, or combinations of these.
  • a single expression construct may be used to target CRISPR activity to multiple different, corresponding target sequences within a cell.
  • a single vector may comprise about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or more guide sequences. In some embodiments, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more such guide-sequence-containing vectors may be provided, and optionally delivered to a cell.
  • a vector comprises a regulatory element operably linked to an enzyme-coding sequence encoding one of the Cas9 orthologue proteins described in any one of Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99), or functional fragments
  • the CRISPR enzyme directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the CRISPR enzyme directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence.
  • the expression vector comprises a human codon optimized polynucleotide encoding a polypeptide, the polynucleotide having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1378, SEQ ID NO: 1379, SEQ ID NO: 1380, SEQ ID NO: 1381, and SEQ ID NO: 1382.
  • the polynucleotide is a human codon optimized polynucleotide encoding a CRISPR-Cas orthologue nuclease.
  • the expression vector comprises a human codon optimized polynucleotide encoding a polypeptide having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1386, and SEQ ID NO: 1387.
  • the expression vector comprises a human codon optimized polynucleotide encoding a polypeptide having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of the nucleases having a SEQ ID NO as shown in Table 34.
  • the expression vector comprises a sgRNA comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NO: 1372, SEQ ID NO: 1373, SEQ ID NO: 1374, SEQ ID NO: 1375, SEQ ID NO: 1376, and SEQ ID NO: 1377, or any one of SEQ ID NOs: 1450-1483, and SEQ ID NO: 1825 as shown in Table 34.
  • the expression vector comprises a crRNA comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NOs: 1362-1366, and the crRNA is capable of complexing with a tracrRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NOs: 1367-1371.
  • the polynucleotide encodes a crRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NOs: 1388-1418 as shown in Table 34, and the crRNA is capable of complexing with a tracrRNA having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with any one of SEQ ID NOs: 1419-1449 as shown in Table 34.
  • an enzyme coding sequence of a Cas9 orthologue is codon optimized for expression in particular cells, such as eukaryotic cells.
  • the eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human primate.
  • codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g. about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence.
  • Codon bias refers to differences in codon usage between organisms.
  • mRNA messenger RNA
  • tRNA transfer RNA
  • the predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the “Codon Usage Database”, and these tables can be adapted in a number of ways.
  • codon optimization tools are used (e.g., Gensmart, IDT).
  • one or more codons (e.g., 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in a sequence encoding a CRISPR enzyme correspond to the most frequently used codon for a particular amino acid.
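  • The codon replacement step described above can be sketched as follows: each codon is swapped for a preferred synonymous codon while the encoded amino acid sequence is left unchanged. The sketch uses the standard genetic code; the `preferred` mapping is supplied by the caller (for instance, derived from a codon usage table for the host of interest), and the two entries used in the usage note are illustrative placeholders rather than data from this document. Function names are hypothetical, and real optimization tools typically weigh additional factors not modelled here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence (DNA, length divisible by 3)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str, preferred: dict) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given.

    preferred maps one-letter amino acid -> preferred codon for the host.
    Codons for amino acids absent from the mapping are left unchanged.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        new_codon = preferred.get(aa, codon).upper()
        if CODON_TABLE[new_codon] != aa:
            raise ValueError(f"preferred codon {new_codon} does not encode {aa}")
        out.append(new_codon)
    return "".join(out)
```

For example, with the placeholder mapping `{"L": "CTG", "A": "GCC"}`, `codon_optimize("ATGTTAGCATAA", ...)` changes the leucine and alanine codons only, and `translate` returns the same protein for the input and the output.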
  • a method of altering a cell comprises contacting a eukaryotic cell (e.g., a human cell) with a polypeptide, nucleic acid, particle, system or pharmaceutical composition described herein.
  • Contacting a cell with a disclosed nucleic acid, particle, system or pharmaceutical composition can be achieved by any method known in the art and can be performed in vivo, ex vivo, or in vitro.
  • the methods can include obtaining one or more cells from a subject prior to contacting the cell(s) with a herein disclosed nucleic acid, particle, system or pharmaceutical composition. In some embodiments, the methods can further comprise returning or implanting the contacted cell or a progeny thereof to the subject.
  • CRISPR-Cas orthologue nucleases, as well as nucleic acids encoding CRISPR-Cas orthologue nucleases and gRNAs, can be delivered to a cell by, for example, viral or non-viral delivery vehicles, electroporation or lipid nanoparticles.
  • a polynucleotide encoding CRISPR-Cas orthologue nuclease and a gRNA can be delivered to a cell (ex vivo or in vivo) by a lipid nanoparticle (LNP).
  • LNPs can have, for example, a diameter of less than 1000 nm, 500 nm, 250 nm, 200 nm, 150 nm, 100 nm, 75 nm, 50 nm, or 25 nm.
  • a nanoparticle can range in size from 1-1000 nm, 1-500 nm, 1-250 nm, 25-200 nm, 25-100 nm, 35-75 nm, or 25-60 nm.
  • LNPs can be made from cationic, anionic, neutral lipids, and combinations thereof.
  • Neutral lipids such as the fusogenic phospholipid DOPE or the membrane component cholesterol, can be included in LNPs as 'helper lipids' to enhance transfection activity and nanoparticle stability.
  • LNPs can also be comprised of hydrophobic lipids, hydrophilic lipids, or both hydrophobic and hydrophilic lipids. Lipids and combinations of lipids that are known in the art can be used to produce a LNP.
  • Examples of lipids used to produce LNPs are: DOTMA, DOSPA, DOTAP, DMRIE, DC-cholesterol, DOTAP-cholesterol, GAP-DMORIE-DPyPE, and GL67A-DOPE-DMPE-polyethylene glycol (PEG).
  • Examples of cationic lipids are: 98N12-5, C12-200, DLin-KC2-DMA (KC2), DLin-MC3-DMA (MC3), XTC, MD1, and 7C1.
  • Examples of neutral lipids are: DPSC, DPPC, POPC, DOPE, and SM.
  • Examples of PEG-modified lipids are: PEG-DMG, PEG-CerC14, and PEG-CerC20.
  • Lipids can be combined in any number of molar ratios to produce a LNP.
  • the polynucleotide(s) can be combined with lipid(s) in a wide range of molar ratios to produce a LNP.
  • CRISPR-Cas orthologue nuclease and/or gRNAs can be delivered to a cell via an adeno-associated viral vector (e.g., of an AAV2, AAV5, AAV7m8, AAV8, AAV9, AAVrh8r, or AAVrh10 serotype), or by another viral vector.
  • viral vectors include, but are not limited to lentivirus, adenovirus, alphavirus, enterovirus, pestivirus, baculovirus, herpesvirus, Epstein Barr virus, papovavirus, poxvirus, vaccinia virus, and herpes simplex virus.
  • a CRISPR-Cas orthologue nuclease mRNA is formulated in a lipid nanoparticle, while a sgRNA is delivered to a cell in an AAV or other viral vector.
  • one or more AAV vectors are used to deliver both a sgRNA and a CRISPR-Cas orthologue nuclease.
  • a CRISPR-Cas orthologue nuclease and a sgRNA are delivered using separate vectors.
  • a CRISPR-Cas orthologue nuclease and a sgRNA are delivered using a single vector.
  • CRISPR-Cas orthologue nucleases, with their relatively small size, can be delivered with a gRNA (e.g., sgRNA) using a single AAV vector.
  • Compositions and methods for delivering CRISPR-Cas orthologue nuclease and gRNAs to a cell and/or subject are further described in PCT Patent Application Publications WO 2019/102381, WO 2020/012335, and WO 2020/053224, each of which is incorporated by reference herein in its entirety.
  • DNA cleavage can result in a single-strand break (SSB) or double-strand break (DSB) at particular locations within the DNA molecule.
  • Such breaks can be and regularly are repaired by natural, endogenous cellular processes, such as homology-dependent repair (HDR) and non-homologous end joining (NHEJ). These repair processes can edit the targeted polynucleotide by introducing a mutation, thereby resulting in a polynucleotide having a sequence which differs from the polynucleotide’s sequence prior to cleavage by a CRISPR-Cas orthologue nuclease.
  • HDR DNA repair processes consist of a family of alternative pathways.
  • Non-homologous end joining refers to the natural, cellular process in which a double-stranded DNA break is repaired by the direct joining of two non-homologous DNA segments. See, e.g., Cahill et al., 2006, Front. Biosci. 11:1958-1976. DNA repair by non-homologous end joining is error-prone and frequently results in the untemplated addition or deletion of DNA sequences at the site of repair. Thus, NHEJ repair mechanisms can introduce mutations into the coding sequence which can disrupt gene function.
  • the donor can be an exogenous nucleic acid, such as a plasmid, a single-strand oligonucleotide, a double-stranded oligonucleotide, a duplex oligonucleotide or a virus, that has regions of high homology with the nuclease-cleaved locus, but which can also contain additional sequence or sequence changes including deletions that can be incorporated into the cleaved target locus.
  • a third repair mechanism includes microhomology-mediated end joining (MMEJ), also referred to as “Alternative NHEJ (ANHEJ)”, in which the genetic outcome is similar to NHEJ in that small deletions and insertions can occur at the cleavage site.
  • MMEJ can make use of homologous sequences of a few base pairs flanking the DNA break site to drive a more favored DNA end joining repair outcome. In some instances, it may be possible to predict likely repair outcomes based on analysis of potential microhomologies at the site of the DNA break.
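  • The microhomology analysis mentioned above can be illustrated with a minimal sketch. The function name `microhomologies`, the search window, and the k-mer length bounds are assumptions chosen for illustration and are not taken from the disclosure; the sketch only lists short sequences shared by both sides of a break and does not rank or predict repair outcomes.

```python
def microhomologies(sequence, cut, min_len=2, max_len=6, window=20):
    """List k-mers that occur on both sides of a break at index `cut`.

    `sequence` is a DNA string and `cut` is the 0-based index of the first
    base to the right of the break. All parameter defaults are illustrative.
    """
    left = sequence[max(0, cut - window):cut]
    right = sequence[cut:cut + window]
    found = set()
    for k in range(min_len, max_len + 1):
        # Every k-mer present in the left flank.
        left_kmers = {left[i:i + k] for i in range(len(left) - k + 1)}
        # Keep the right-flank k-mers that also occur on the left.
        for j in range(len(right) - k + 1):
            kmer = right[j:j + k]
            if kmer in left_kmers:
                found.add(kmer)
    # Longest candidates first, then alphabetical.
    return sorted(found, key=lambda s: (-len(s), s))


# Toy example: "ACGT" sits on both sides of the break.
print(microhomologies("TTACGTAACCACGTGG", cut=8))
```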
  • Modifications of a cleaved polynucleotide by HDR, NHEJ, and/or ANHEJ can result in, for example, mutations, deletions, alterations, integrations, gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, translocations and/or gene mutation.
  • the aforementioned process outcomes are examples of editing a polynucleotide.
  • Advantages of ex vivo cell therapy approaches include the ability to conduct a comprehensive analysis of the therapeutic prior to administration. Nuclease-based therapeutics can have some level of off-target effects.
  • Performing gene correction ex vivo allows a method user to characterize the corrected cell population prior to implantation, including identifying any undesirable off-target effects. Where undesirable effects are observed, a method user may opt not to implant the cells or cell progeny, may further edit the cells, or may select new cells for editing and analysis.
  • Other advantages include ease of genetic correction in iPSCs compared to other primary cell sources. iPSCs are prolific, making it easy to obtain the large number of cells that will be required for a cell-based therapy. Furthermore, iPSCs are an ideal cell type for performing clonal isolations. This allows screening for the correct genomic correction, without risking a decrease in viability.
  • In vivo treatment can eliminate problems and losses from ex vivo treatment and engraftment.
  • An advantage of in vivo gene therapy can be the ease of therapeutic production and administration. The same therapeutic approach and therapy has the potential to be used to treat more than one patient, for example a number of patients who share the same or similar genotype or allele.
  • ex vivo cell therapy typically requires using a subject’s own cells, which are isolated, manipulated and returned to the same patient. 6.5.1.
  • the disclosure further provides particles comprising a CRISPR-Cas polypeptide protein of the disclosure (e.g., a Cas9 orthologue nuclease), particles comprising a gRNA of the disclosure, particles comprising a system of the disclosure, and particles comprising a nucleic acid or plurality of nucleic acids of the disclosure.
  • the particles can in some embodiments comprise or further comprise a gRNA, or a nucleic acid encoding the gRNA (e.g., DNA or mRNA).
  • the particles can comprise an RNP of the disclosure.
  • Exemplary particles include lipid nanoparticles, vesicles, viral-like particles (VLPs) and gold nanoparticles.
  • the disclosure provides particles (e.g., virus particles) comprising a nucleic acid encoding a CRISPR-Cas orthologue nuclease protein of the disclosure.
  • the particles can further comprise a nucleic acid encoding a gRNA.
  • a nucleic acid encoding a CRISPR-Cas orthologue nuclease protein can further encode a gRNA.
  • the disclosure further provides pluralities of particles (e.g., pluralities of virus particles).
  • pluralities can include a particle encoding a CRISPR-Cas orthologue nuclease protein and a different particle encoding a gRNA.
  • a plurality of particles can comprise a virus particle (e.g., an AAV2, AAV5, AAV7m8, AAV8, AAV9, AAVrh8r, or AAVrh10 virus particle) encoding a CRISPR-Cas orthologue nuclease protein and a second virus particle (e.g., an AAV2, AAV5, AAV7m8, AAV8, AAV9, AAVrh8r, or AAVrh10 virus particle) encoding a gRNA.
  • a plurality of particles can comprise a plurality of virus particles where each particle encodes a CRISPR-Cas orthologue nuclease protein and a gRNA.
  • the disclosure further provides cells and populations of cells (e.g., ex vivo cells and populations of cells) that can comprise a CRISPR-Cas orthologue nuclease protein (e.g., introduced to the cell as an RNP) or a nucleic acid encoding the CRISPR-Cas orthologue nuclease protein (e.g., DNA or mRNA) (optionally also encoding a gRNA).
  • the disclosure further provides cells and populations of cells comprising a gRNA of the disclosure (optionally complexed with a CRISPR-Cas orthologue nuclease protein) or a nucleic acid encoding the gRNA (e.g., DNA or mRNA) (optionally also encoding a CRISPR-Cas orthologue nuclease protein).
  • the cells and populations of cells can be, for example, human cells such as a stem cell, e.g., a hematopoietic stem cell (HSC), a pluripotent stem cell, an induced pluripotent stem cell (IPS), or an embryonic stem cell.
  • an RNP can be produced by mixing a CRISPR-Cas orthologue nuclease protein and one or more guide RNAs in an appropriate buffer.
  • An RNP can be introduced to a cell, for example, via electroporation and other methods known in the art.
  • the cell populations of the disclosure can be cells in which gene editing by the systems of the disclosure has taken place, or cells in which the components of a system of the disclosure have been introduced or expressed but gene editing has not taken place, or a combination thereof.
  • a cell population can comprise, for example, a population in which at least 1%, at least 5%, at least 10%, at least 15%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, or at least 70% of the cells have undergone gene editing by a system of the disclosure.
  • Methods for CRISPR-Cas Protein PAM Identification
  • [0319] In certain aspects, methods for predicting a PAM sequence recognized by one or more (e.g., some or all) CRISPR-Cas proteins in a CRISPR-Cas protein cluster are described.
  • Predicting a PAM sequence can comprise (a) mapping spacer sequences of CRISPR arrays corresponding to a CRISPR-Cas protein cluster to a set of viral genome sequences to identify putative protospacer sequences in the viral genomes and their flanking upstream and downstream sequences in the viral genomes; (b) for each spacer sequence mapped to one or more viral genome sequences, aligning the putative protospacer sequences and their flanking upstream and downstream viral genome sequences to generate a set of aligned putative protospacer and flanking sequences for each mapped spacer sequence; and (c) predicting a PAM sequence recognized by one or more CRISPR-Cas proteins in the cluster from the sets of aligned putative protospacer and flanking sequences.
  • the method is described in Ciciani et al., 2022, which is incorporated herein by reference in its entirety.
  • the method can be computer implemented. Computer-implemented methods can comprise executing, in a computer system having one or more processors coupled to a memory storing one or more computer readable instructions for execution by the one or more processors, the one or more computer readable instructions comprising instructions for steps (a)-(c) as described in the preceding paragraph.
  • 6.6.1 PAM predictions for Cas9 clusters
  • [0321] Interrogation of massive metagenomic datasets combined with a newly developed computational method allows for the identification of a vast number of unreported CRISPR-Cas loci and their respective PAM requirements.
  • FIG.1 depicts an exemplary schematic description of the pipeline used to generate PAM predictions using proprietary methods and systems as described in Ciciani et al. 2022, which is incorporated herein by reference in its entirety.
  • Referring to FIG.1, from a compendium of 825,698 bacterial and archaeal genomes reconstructed via metagenomic assembly of human, host-associated, and non-host-associated environmental microbiomes (see Methods section: Catalog of reference and metagenomic-assembled genomes) and 1222,670 genomes from microbial isolates retrieved from the NCBI database, the sequences of 92,140 CRISPR-Cas9 loci can be identified.
  • the pipeline can cluster Cas9 orthologue proteins at multiple sequence identity levels (from 95% to 100%) and apply a PAM-prediction procedure to each cluster.
  • sequences adjacent to protospacers are identified, for instance, by aligning 613,478 unique spacers to phage genomes of the human microbiome: 142,809 viral genomes from the Gut Phage Database (Camarillo-Guerrero et al. Cell. 2021;184:1098–1109.e9), 189,680 from the Metagenomic Gut Viral catalog (Nayfach S, et al. Nat. Microbiol. 2021;6:960–970), and de novo assembled gut phages from highly enriched viromes as profiled via ViromeQC (Zolfo M, et al. Detecting contamination in viromes using ViromeQC. Nat. Biotechnol. 2019;37:1408–1412).
  • This procedure mostly focused on human-associated bacterial and viral genomes as they are largely overrepresented in the dataset, and in particular the human gut has been sufficiently densely sampled to permit multiple reliable host-virus associations via spacer matching (Supplementary FIG.1 in Ciciani et al. 2022).
  • the set of viral genome sequences can comprise, for example, a set of phage genomes, for example phage genomes of the human microbiome.
  • the set of viral genomes contains at least 100,000 viral genomes, at least 200,000 viral genomes, or at least 300,000 viral genomes, and/or up to 1 million viral genomes, up to 750,000 viral genomes, up to 500,000 viral genomes, or up to 400,000 viral genomes.
  • Spacer sequences of CRISPR arrays corresponding to one or more CRISPR-Cas proteins of a CRISPR-Cas protein cluster can be mapped to a set of viral genome sequences by performing sequence alignments of a given spacer sequence to the viral genome sequences, for example by using a sequence alignment algorithm such as BLASTN.
  • a given spacer sequence can be considered mapped to a viral genome sequence when, for example, the spacer sequence aligns over its full length to a viral genome sequence with a near-perfect or perfect nucleotide match.
  • a spacer is considered mapped to a viral genome sequence when there are no more than four nucleotide mismatches or gaps (e.g., 4, 3, 2, 1, or 0 nucleotide mismatches or gaps) in an alignment of the full-length spacer sequence and a viral genome sequence.
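  • The mapping criterion above can be sketched as a simple check on a pairwise alignment. This is a minimal illustration, assuming the alignment (for example, one produced by BLASTN) has already been converted to two equal-length gapped strings; the function names `count_differences` and `is_mapped` are hypothetical and not part of the disclosure.

```python
def count_differences(aligned_spacer, aligned_target):
    """Count alignment columns that are mismatches or gaps ('-')."""
    if len(aligned_spacer) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(aligned_spacer, aligned_target)
        if a != b or a == "-"
    )


def is_mapped(aligned_spacer, aligned_target, spacer_length, max_differences=4):
    """Return True if the full-length spacer aligns with few differences.

    `max_differences=4` follows the 'no more than four mismatches or gaps'
    embodiment; other thresholds (3, 2, 1, 0) can be passed instead.
    """
    # Full-length requirement: every spacer base must be in the alignment.
    covered = sum(1 for a in aligned_spacer if a != "-")
    if covered != spacer_length:
        return False
    return count_differences(aligned_spacer, aligned_target) <= max_differences


# One mismatch and one gap: still considered mapped.
print(is_mapped("ACGTACGTAC", "ACGAACG-AC", spacer_length=10))
```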
  • the viral genome sequences to which a given spacer sequence is mapped can be identified as putative protospacer sequences. Following identification of putative protospacer sequences corresponding to a given spacer sequence, the viral genomic sequences flanking the putative protospacer sequences both upstream and downstream can be retrieved. The upstream and downstream sequences are retrieved because true CRISPR-Cas protospacer sequences are adjacent to a PAM sequence.
  • flanking nucleotides up to 40 nucleotides upstream and up to 40 nucleotides downstream of the putative protospacer sequences are retrieved. In some embodiments, up to 30 nucleotides upstream and up to 30 nucleotides downstream of the putative protospacer sequences are retrieved. In some embodiments, up to 20 nucleotides upstream and up to 20 nucleotides downstream of the putative protospacer sequences are retrieved. In some embodiments, up to 10 nucleotides upstream and up to 10 nucleotides downstream of the putative protospacer sequences are retrieved.
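  • Retrieval of flanking sequences can be sketched as follows. This is an illustrative sketch only: the function name `flanking_sequences` and the 0-based, half-open coordinate convention are assumptions, and strand handling (reverse-complementing hits on the minus strand) is deliberately left out.

```python
def flanking_sequences(genome, start, end, flank=10):
    """Return (upstream, protospacer, downstream) around genome[start:end].

    Coordinates are 0-based and half-open. Fewer than `flank` nucleotides
    are returned when the hit lies close to either end of the genome,
    which matches the 'up to N nucleotides' wording.
    """
    if not 0 <= start < end <= len(genome):
        raise ValueError("invalid protospacer coordinates")
    upstream = genome[max(0, start - flank):start]
    downstream = genome[end:end + flank]
    return upstream, genome[start:end], downstream


# Toy genome: the protospacer is the central CCCCGGGG block.
print(flanking_sequences("AAAACCCCGGGGTTTT", 4, 12, flank=3))
```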
  • the CRISPR-Cas protein cluster is shown as having sequences for three CRISPR-Cas proteins, “Cas protein 1”, “Cas protein 2”, and “Cas protein 3”. It should be understood that the features illustrated in FIG.2 are only exemplary.
  • a CRISPR-Cas cluster can have a few to many CRISPR-Cas protein sequences, for example at least one, at least 10, or at least 100 sequences, and/or up to 100, 1000, 2000, 4000, 5000, 10,000 or more than 10,000 sequences.
  • FIG.2 shows three spacers of CRISPR arrays corresponding to the CRISPR-Cas protein cluster (specifically, “Spacer 1”, “Spacer 2”, and “Spacer 3” from CRISPR arrays corresponding to Cas protein 1, Cas protein 2, and Cas protein 3, respectively).
  • the spacer sequences can be mapped to a set of viral genome sequences.
  • FIG.2 shows that the spacer sequences map to viral genomes 1-9 (specifically, Spacer 1 maps to viral genomes 1-3; Spacer 2 maps to viral genomes 4-6; and Spacer 3 maps to viral genomes 7-9).
  • putative protospacer sequences in the viral genomes (illustrated in FIG.2 as PP1 to PP9) and their flanking upstream and downstream sequences (illustrated in FIG.2 as black bars flanking PP1 to PP9) can be identified. Then, the putative protospacer sequences and their flanking sequences can be aligned, and consensus sequences generated for each set of alignments.
  • PP1, PP2, and PP3 and their flanking sequences are aligned and consensus sequence C1 is generated from the alignment because Spacer 1 maps to the viral genome sequences corresponding to PP1, PP2, and PP3.
  • PP4, PP5, and PP6 and their flanking sequences are aligned and consensus sequence C2 is generated from the alignment because Spacer 2 maps to the viral genome sequences corresponding to PP4, PP5, and PP6; and PP7, PP8, and PP9 and their flanking sequences are aligned and a consensus sequence C3 is generated because Spacer 3 maps to the viral genome sequences corresponding to PP7, PP8, and PP9.
  • Consensus sequences can be generated, for example, by taking the most frequent base at each position in an alignment. Columns in an alignment that contain many gaps (e.g., >30% gaps, >40% gaps, >50% gaps) can, in some embodiments, be discarded.
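  • Consensus generation as described above can be sketched in a few lines. This is a minimal sketch, assuming the aligned sequences are equal-length strings with '-' for gaps; the function name `consensus` and the default 50% gap cutoff are illustrative choices among the thresholds mentioned.

```python
from collections import Counter


def consensus(aligned, max_gap_fraction=0.5):
    """Most frequent base per column, skipping gap-rich columns."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        # Discard columns whose gap fraction exceeds the cutoff.
        if column.count("-") / len(column) > max_gap_fraction:
            continue
        bases = [b for b in column if b != "-"]
        if not bases:
            continue
        # Ties are resolved in favor of the base seen first in the column.
        out.append(Counter(bases).most_common(1)[0][0])
    return "".join(out)


# The fourth column is two-thirds gaps and is dropped.
print(consensus(["ACG-T", "ACGAT", "TCG-T"]))
```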
  • the PAM sequence for the CRISPR-Cas protein cluster can be predicted by identifying conserved nucleotides in the consensus sequences, as one or more conserved nucleotides in either the upstream or downstream flanking sequences are indicative of a PAM sequence.
  • nucleotides conserved among consensus sequences C1, C2, and C3 are identified by asterisks.
  • conserved nucleotides can be identified, for example, by calculating nucleotide frequencies at each position for the consensus sequences and identifying positions where a particular nucleotide is present at a relatively high frequency.
  • a nucleotide can be considered conserved at a position if its frequency at the position is larger (e.g., significantly larger) than the frequencies of nucleotides at other positions in the upstream and downstream flanking sequences.
  • a sequence logo can be generated (e.g., by Logomaker (Tareen & Kinney, 2020, Bioinformatics 36:2272–2274)) to represent the nucleotide frequencies at each position.
  • 6.6.3 Predicting PAM sequences from putative protospacer sequences and flanking sequences
  • [0328] Putative protospacer sequences and their flanking upstream and downstream sequences can be aligned to generate a set of aligned putative protospacer and flanking sequences.
  • a multiple sequence alignment tool such as MUSCLE (Edgar, 2004, Nucleic Acids Research 32(5):1792-97; Edgar, 2004, BMC Bioinformatics 5:113) or MAFFT (Katoh et al., 2002, Nucleic Acids Research 30(14):3059–3066) can be used.
  • MUSCLE is used.
  • MAFFT is used.
  • protospacer sequences are adjacent to PAM sequences in viral genomes
  • the sets of aligned putative protospacer and flanking sequences generated for the different spacer sequences can be used to predict the PAM sequence for the CRISPR-Cas proteins of a CRISPR-Cas protein cluster.
  • a consensus sequence can be generated from each set of aligned putative protospacer and flanking sequences (each set corresponding to a single spacer), for example by taking the most frequent base at each position in the alignment, discarding columns composed of many or composed mostly of (e.g., >50%) gaps.
  • the spacer sequence can be aligned to the consensus sequence to define the upstream and downstream regions of the consensus sequence.
  • a consensus sequence can be generated from each set of aligned putative protospacer and flanking sequences, for example as illustrated in FIG.2.
  • the nucleotide frequencies at each position in the upstream and downstream flanking sequences in the consensus sequences can be computed to identify one or more conserved nucleotides, which are indicative of a PAM sequence for a CRISPR-Cas protein cluster.
  • CRISPR-Cas protein clusters generated at a sequence homology of at least 98%, 99%, or 100% having the same predicted PAM are used for generating the PAM consensus sequence.
  • a PAM sequence for a cluster (and, by extension, each CRISPR-Cas protein whose sequence is in the cluster) can be predicted by analyzing the nucleotide frequencies at nucleotide positions in the upstream and downstream sequences to identify nucleotides present at a relatively high frequency (e.g., outliers), because nucleotides present at a relatively high frequency (e.g., outliers) are likely to correspond to nucleotides of the PAM.
  • a sequence logo can be generated from the nucleotide frequencies at positions in the upstream and downstream flanking sequences, which can provide a graphical representation of the sequence conservation.
  • a nucleotide in a sequence logo can be considered a conserved nucleotide when it has more than one bit of information. In some embodiments, a nucleotide is considered conserved when it has more than one bit of information and its bit level is larger than the median bit level in both flanking regions plus 1.5 times the interquartile range of bit levels in both flanking regions.
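  • The conservation test above can be sketched as follows. This is a rough illustration under stated assumptions: the information content of a column is taken as 2 minus the Shannon entropy in bits (no small-sample correction), a nucleotide's "bit level" is taken as its frequency multiplied by the column information, and the median and interquartile range are computed over the per-position heights of the most frequent nucleotide. The function names are hypothetical, and a logo tool such as Logomaker may compute heights differently.

```python
import math
import statistics
from collections import Counter


def position_heights(flanks):
    """For each column, return (most frequent base, its height in bits)."""
    results = []
    for column in zip(*flanks):
        counts = Counter(column)
        total = sum(counts.values())
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        information = 2.0 - entropy  # maximum is 2 bits for DNA
        base, count = counts.most_common(1)[0]
        results.append((base, (count / total) * information))
    return results


def conserved_positions(flanks, min_bits=1.0):
    """Positions whose top base exceeds 1 bit and median + 1.5 * IQR."""
    heights = position_heights(flanks)
    values = [h for _, h in heights]
    q1, _, q3 = statistics.quantiles(values, n=4)
    cutoff = statistics.median(values) + 1.5 * (q3 - q1)
    return [
        (i, base)
        for i, (base, h) in enumerate(heights)
        if h > min_bits and h > cutoff
    ]


# Toy downstream flanks from four consensus sequences: positions 1 and 2
# are always G, every other position is maximally variable.
flanks = ["AGGTACAGTC", "CGGATGCTGA", "GGGCGATACT", "TGGGCTGCAG"]
print(conserved_positions(flanks))
```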
  • Tools for generating sequence logos include Logomaker (Tareen & Kinney, 2020, Bioinformatics 36:2272–2274). In some embodiments, the methods comprise generating a report with the sequence logo in a computerized system.
  • nucleotide frequencies in the consensus flanking sequences are computed and represented as sequence logos, as illustrated in FIGs.3A-3E, FIGs.4A-4C, FIGs.5A-5B, FIGs.6A-6D, FIGs.7A-7C, FIGs.8A-8B, FIGs.9A-9B, FIG.10, FIGs.11A-11B, FIGs.12A-12D, FIG.13, FIGs.14A-14B, FIG.15, FIG.16, FIG.17, FIGs.18A-18B, FIG.19, FIG.20, FIG.21, FIGs.22A-22B, FIGs.23A-23B, or FIGs.24A-24B.
  • a PAM is predicted for a Cas9 cluster if there is at least one conserved position in either the upstream or the downstream flanking sequence. When the PAM-prediction procedure is repeated at the different Cas9 clustering identity thresholds, the highest reliability is obtained at 98% identity clustering. At this clustering stringency, PAM predictions for 2546 out of 2779 Cas9 clusters (representing 61,095 Cas9 sequences) with more than 10 mapped spacers are obtained. Detailed methods of PAM predictions for Cas9 clusters are described in the Methods section of Ciciani et al. 2022, which is incorporated herein by reference in its entirety.
  • 6.6.4 Cas9s sequence predictions corresponding to the predicted PAMs
  • [0332] The PAM prediction approach can be validated.
  • the method can be further validated by cross checking the PAM predictions obtained with the pipeline with the sequences experimentally identified and recently reported and characterized in Gasiunas et al., 2020, Nat. Commun. 11:5512. Of the 79 Cas9s reported, 21 can be used for the evaluation here as they have a close orthologue in the dataset (>98% identity), and for them the accuracy of the prediction strategy was confirmed by obtaining PAM logos with high identity (assessed by Jensen-Shannon distance on nucleotide frequencies). The accuracy of PAM predictions generated by the method is as high as 85%.
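  • A Jensen-Shannon distance between the nucleotide frequencies of a predicted and an experimentally determined PAM position can be computed as sketched below. This is a generic textbook formulation using base-2 logarithms, so the distance lies between 0 and 1; the function name is hypothetical, and how per-position distances were aggregated across a logo in the cited evaluation is not stated here.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance between two nucleotide frequency dicts.

    `p` and `q` map bases (A, C, G, T) to frequencies that each sum to 1;
    missing bases are treated as frequency 0.
    """
    bases = "ACGT"
    mixture = {b: 0.5 * (p.get(b, 0.0) + q.get(b, 0.0)) for b in bases}

    def kl_to_mixture(dist):
        # Kullback-Leibler divergence from `dist` to the mixture, in bits.
        return sum(
            dist[b] * math.log2(dist[b] / mixture[b])
            for b in bases
            if dist.get(b, 0.0) > 0
        )

    divergence = 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(divergence, 0.0))


predicted = {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05}
measured = {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05}
print(jensen_shannon_distance(predicted, measured))
```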
  • Cas9 subtypes capable of recognizing the predicted PAM [0333]
  • Cas9 candidates can be searched for using parameters favoring the identification of functionally active enzymes (with preserved domain structures and located in complete CRISPR-Cas loci) and with reduced molecular size (<1,100 amino acids), thus potentially more convenient for genome editing applications.
  • Their PAM logos can be predicted and subsequently validated through an in vitro assay (Karvelis et al., 2019, Methods in Enzymology 616:219–240).
  • the method allows PAM prediction for the vast majority of Cas9 orthologue proteins identified in the databank with 10 or more mapped spacers, across all Cas9 subtypes (93.6% for A, 93.0% for B and 87.9% for C).
  • 6.6.6 Generation of Cas9 orthologue protein clusters capable of recognizing the predicted PAMs [0334]
  • the PAM predictor method is then applied to the metagenomically extended set of 14,909 Cas9 orthologue protein families (98% identity clustering) to identify PAM requirements and explore whether specific PAM clusters may exist.
  • a PAM preference may be generated (FIGs.3A-3E, FIGs.4A-4C, FIGs.5A-5B, FIGs.6A-6D, FIGs.7A-7C, FIGs.8A-8B, FIGs.9A-9B, FIG.10, FIGs.11A-11B, FIGs.12A-12D, FIG.13, FIGs.14A-14B, FIG.15, FIG.16, FIG.17, FIGs.18A-18B, FIG.19, FIG.20, FIG.21, FIGs.22A-22B, FIGs.23A-23B, or FIGs.24A-24B).
  • the most prevalent PAM sequences represented only a small fraction of all possible PAMs.
  • CRISPR-Cas protein clusters can be generated from an initial set of CRISPR-Cas proteins (e.g., an initial set of CRISPR-Cas proteins having at least 10,000, at least 50,000, at least 75,000, at least 90,000 and/or up to 1 million, up to 750,000, up to 500,000, up to 200,000, or up to 100,000 CRISPR-Cas protein sequences) by use of a clustering algorithm that clusters amino acid sequences by percent identity.
  • UCLUST (Edgar, 2010, Bioinformatics 26:2460–2461)
  • CD-HIT (Li & Godzik, 2006, Bioinformatics 22(13):1658-1659)
  • UCLUST is used to cluster CRISPR-Cas proteins at the 95%, 96%, 97%, 98%, 99%, or 100% identity level.
  • UCLUST is used to cluster CRISPR-Cas proteins at the 98% identity level.
  • PAM prediction methods of the disclosure comprise a step of generating one or more CRISPR-Cas protein clusters. Alternatively, PAM prediction can be performed using one or more previously generated CRISPR-Cas protein clusters.
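  • The idea of clustering protein sequences by percent identity can be illustrated with a toy greedy centroid scheme. This sketch is not UCLUST or CD-HIT and makes strong simplifying assumptions: sequences are taken to be pre-aligned strings of equal length, identity is the fraction of identical positions, and the names `identity` and `greedy_cluster` are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sequences, threshold=0.98):
    """Assign each sequence to the first centroid it matches.

    A sequence that matches no existing centroid at or above `threshold`
    becomes the centroid of a new cluster. Returns a list of
    (centroid, members) tuples.
    """
    clusters = []
    for seq in sequences:
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


# Toy 10-residue "proteins" clustered at 90% identity.
toy = ["MKRNYILGLD", "MKRNYILGLA", "MSDLVLGLDI"]
for centroid, members in greedy_cluster(toy, threshold=0.9):
    print(centroid, len(members))
```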
  • An initial set of CRISPR-Cas proteins used to make a CRISPR-Cas protein cluster can be obtained from bacterial and/or archaeal genomes. For example, genomes (e.g., at least 100,000, 200,000, 500,000, 800,000, 1 million, or more, and/or up to 2 million genomes) can be retrieved from a database, for example NCBI, and an algorithm can be used to identify CRISPR-Cas loci, for example CRISPR-Cas9 loci.
  • CRISPR-CasTyper (Russel et al., 2020, CRISPR J.3:462–469).
  • PAM prediction methods of the disclosure comprise a step of generating an initial set of CRISPR-Cas protein sequences, which is then used to generate one or more CRISPR-Cas protein clusters.
  • a pre-existing set of CRISPR-Cas protein sequences can be used to generate one or more CRISPR-Cas protein clusters.
  • the methods of the disclosure can be performed using various types of CRISPR-Cas protein sequences.
  • the CRISPR-Cas proteins in a CRISPR-Cas protein cluster can comprise or consist of Class II Cas proteins, e.g., Type II Cas proteins (e.g., Cas9), Type V Cas proteins (e.g., Cas12a), or Type VI Cas proteins (e.g., Cas13) (for reviews of Class II Cas proteins, see Makarova et al., 2020, Nat Rev Microbiol 18(2):67-83; Tong et al., 2020, Front Cell Dev Biol. 8:622103; and Chylinski et al., 2014, Nucleic Acids Res. 42(10):6091-6105, the contents of which are incorporated herein by reference in their entireties).
  • Type II Cas proteins include, for example, Type II-A, Type II-B, and Type II-C.
  • Type V Cas proteins include, for example, Type V-A, Type V-B, Type V-C, Type V-D, Type V-E, Type V-F, Type V-G, Type V-H, Type V-I, Type V-J, and Type V-K.
  • the CRISPR-Cas proteins are Type II Cas proteins, for example Cas9 orthologue proteins.
  • the CRISPR-Cas proteins are Type V Cas proteins (e.g., Cas12a).
  • An initial set of CRISPR-Cas proteins contains protein sequences having a minimum and/or maximum sequence length.
  • a minimum length can be 150, 200, 400, 500, 600, 700, 800, 900, 950, 1000, 1100, 1200 or 1300 amino acids.
  • a maximum length can be, for example, 2200, 2100, 2000, 1900, 1800, 1232, 1600, 1500, 1400, 1300, 1200 or 1100 amino acids. Selecting a maximum length, for example of 1100 amino acids, can be used, for example, to limit a set of CRISPR-Cas proteins to those which can be packaged together with a gRNA in a single AAV vector genome.
  • an initial set of CRISPR-Cas proteins is not limited to those having a particular amino acid sequence length.
  • the number of CRISPR-Cas protein sequences in a CRISPR-Cas protein cluster can vary, for example depending on the number of CRISPR-Cas protein sequences in an initial set of CRISPR-Cas proteins subjected to clustering and/or depending on how divergent the sequences in the initial set of CRISPR-Cas protein are from each other.
  • a CRISPR-Cas protein cluster can contain, for example, at least 1, at least 10, or at least 100 CRISPR-Cas protein sequences.
  • Members in a CRISPR-Cas protein cluster may have at least one PAM preference.
  • clusters having a low number of mapped spacers can in some embodiments be discarded. For example, in some embodiments, clusters having fewer than 50, fewer than 25, or fewer than 10 mapped spacers are discarded. In some embodiments, clusters having fewer than 10 mapped spacers are discarded. In some embodiments, clusters having fewer than 5 mapped spacers are discarded.
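  • Discarding clusters with too few mapped spacers amounts to a simple filter, sketched below. The function name `filter_clusters`, the dictionary layout, and the cluster identifiers are hypothetical; the default cutoff of 10 corresponds to one of the embodiments listed above.

```python
def filter_clusters(mapped_spacers_per_cluster, min_spacers=10):
    """Keep clusters that have at least `min_spacers` mapped spacers.

    `mapped_spacers_per_cluster` maps a cluster identifier to the number
    of its spacers that mapped to viral genomes.
    """
    return {
        cluster_id: count
        for cluster_id, count in mapped_spacers_per_cluster.items()
        if count >= min_spacers
    }


# Hypothetical counts: only the first cluster falls below the cutoff.
counts = {"cluster_1": 3, "cluster_2": 10, "cluster_3": 57}
print(filter_clusters(counts))
```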
  • a PAM prediction method of the disclosure is repeated using a different CRISPR-Cas protein cluster(s) and/or a different set of viral genomes from those used initially. Repeating the method using a different CRISPR-Cas protein cluster(s) and/or a different set of viral genomes can be useful, for example, to update a CRISPR-Cas protein library with additional CRISPR-Cas protein sequences and/or to update a CRISPR-Cas protein library after a new set of viral genomes becomes available.
  • the methods can further comprise a step of generating a report with the predicted PAM for a CRISPR-Cas protein cluster.
  • the report can include, for example, identifying information for one or more CRISPR-Cas proteins (e.g., CRISPR-Cas protein name and/or ID number and/or CRISPR-Cas protein amino acid sequences and/or species information) and their corresponding predicted PAM sequences.
  • the information in a report can be included in a CRISPR-Cas protein library of the disclosure.
  • the predicted PAM can optionally be validated, for example by an in vitro assay using one or more CRISPR-Cas proteins of the cluster, for example as described in Ciciani et al., 2022, which is incorporated herein by reference in its entirety.
  • tracrRNA sequences for the one or more CRISPR-Cas proteins to be used for PAM validation can be determined so that an appropriate gRNA for the in vitro assay can be designed.
  • tracrRNA sequences can be identified computationally, for example, as described in Ciciani et al., 2022, which is incorporated herein by reference in its entirety.
  • tracrRNA sequences which contain an anti-repeat and a Rho-independent terminator (RIT) can be identified by aligning CRISPR repeats to sequences flanking a CRISPR-Cas locus (e.g., up to 1000 nucleotides) (for example, using BLASTN) to identify putative anti-repeats.
  • RITs can be predicted using RNIE software (Gardner et al., 2011, Nucleic Acids Res.39:5845–5852).
  • CRISPR-Cas protein sequences of the CRISPR-Cas protein clusters can be clustered according to their predicted PAM sequences to generate PAM clusters.
  • PAM clustering can be performed, for example, by computing an all-to-all PAM prediction distance matrix using a clustering tool such as usearch (Edgar, 2010, Bioinformatics 26:2460–2461).
  • Hierarchical clustering can be performed on a PAM cluster, for example as described in Example 1.
  • PAM clustering can be useful for measuring the diversity of a library, and for studying the hierarchical relationship between CRISPR-Cas proteins in a library.
  • the disclosure provides methods of selecting a CRISPR-Cas protein of interest, for example a previously uncharacterized CRISPR-Cas protein.
  • the method is based on the prediction and validation that a CRISPR-Cas protein (e.g., Cas9 orthologue) is capable of recognizing a putative PAM sequence (e.g., a consensus PAM sequence) as described in Ciciani et al., 2022, which is incorporated herein by reference in its entirety.
  • Such methods can comprise identifying a CRISPR-Cas protein in a library of the disclosure having relatively high (or highest) amino acid sequence identity with the CRISPR-Cas protein of interest (e.g., greater than 90%, greater than 91%, greater than 92%, greater than 93%, greater than 94%, greater than 95%, greater than 96%, greater than 97%, greater than 98%, greater than 99%, or 100% identity), and predicting that the PAM sequence of the CRISPR-Cas protein of interest is the same as the PAM sequence of the CRISPR-Cas protein in the library.
  • such methods are computer implemented. After a PAM is predicted for the CRISPR-Cas protein of interest, the prediction can optionally be validated in vitro.
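  • By way of non-limiting illustration, a computer-implemented PAM prediction of this kind can be sketched in Python as follows. The sketch assumes that the protein of interest has already been aligned to each library member (equal-length strings with "-" for gaps); the alignment step itself is not shown. The function names, the library layout and the example sequences are hypothetical.

```python
# Minimal sketch (assumption: sequences are pre-aligned; '-' marks a gap).
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns, ignoring gap-gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    columns = sum(not (x == "-" and y == "-") for x, y in zip(a, b))
    return 100.0 * matches / columns if columns else 0.0


def predict_pam(query: str, library: dict, min_identity: float = 90.0):
    """Return (name, pam, identity) for the most similar library member,
    or None if no member exceeds `min_identity` percent identity.

    `library` maps a nuclease name to (aligned_sequence, predicted_pam)."""
    best = None
    for name, (seq, pam) in library.items():
        identity = percent_identity(query, seq)
        if best is None or identity > best[2]:
            best = (name, pam, identity)
    if best is not None and best[2] > min_identity:
        return best
    return None


if __name__ == "__main__":
    # Hypothetical toy library, for illustration only.
    toy_library = {
        "libA": ("MKRTADGSEF", "NNGG"),
        "libB": ("MKRTAAAAAA", "NNNNCC"),
    }
    print(predict_pam("MKRTADGSEF", toy_library))
```

  • A prediction returned by such a sketch would remain a hypothesis until validated, for example in vitro as described above.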
  • the disclosure provides methods for selecting a CRISPR-Cas protein for editing a genomic sequence comprising a PAM of interest, comprising (a) identifying, in a CRISPR-Cas library of the disclosure, one or more CRISPR-Cas proteins whose predicted PAM sequences correspond to the PAM sequence of interest; (b) evaluating the ability of the one or more CRISPR-Cas proteins to edit the genomic sequence (e.g., in an in vitro gene editing assay); and (c) selecting a CRISPR-Cas protein from the one or more CRISPR-Cas proteins following the evaluation.
  • the selected CRISPR-Cas protein(s) can be, for example, the CRISPR-Cas protein(s) having the highest editing activity and/or highest fidelity in an in vitro gene editing assay.
  • step (a) is computer implemented.
  • the disclosure provides methods for selecting a genomic sequence having a pathogenic mutation for editing with a CRISPR-Cas protein, comprising (a) identifying, in a CRISPR-Cas library of the disclosure, a PAM sequence associated with a pathogenic mutation (e.g., a PAM sequence created by the pathogenic mutation); and (b) selecting a genomic sequence that includes that PAM sequence for editing with a CRISPR-Cas protein.
  • predicted PAM sequences in a library of the disclosure can be aligned with a set of genomic sequences having pathogenic or likely pathogenic mutations, for example mutations in the ClinVar database (Landrum et al., 2018, Nucleic Acids Res. 46:D1062–D1067) to identify genomic sequences that include a PAM sequence present in a library of the disclosure.
  • Genomic sequences that include a sequence that matches exactly to a PAM sequence in the library can be selected for editing.
  • CRISPR-Cas proteins in the library having that PAM sequence can be selected and evaluated for their ability to edit the selected genomic sequence. For autosomal dominant diseases, if the PAM sequence is present in the mutant allele but not the wild-type allele, allele-specific editing can be achieved.
  • the disclosure provides methods of designing a guide RNA (gRNA) molecule for a CRISPR-Cas protein in a CRISPR-Cas protein library of the disclosure.
  • the gRNAs can be designed, for example, for a CRISPR-Cas protein selected by a method of the disclosure to edit a genomic sequence of interest, for example, a genomic sequence having a PAM of interest, as described herein.
  • Design of a gRNA can be computer implemented.
  • design of a gRNA can comprise identifying a tracrRNA sequence for a selected CRISPR-Cas protein, and selecting a targeting sequence that is complementary to a target genomic sequence and adjacent to the PAM for the selected CRISPR-Cas protein.
  • the designed gRNA can comprise separate crRNA and tracrRNA or can comprise an sgRNA.
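  • By way of non-limiting illustration, a computer-implemented gRNA design step can be sketched in Python as follows. The sketch takes the position of a PAM in the PAM-containing strand of the target, uses the nucleotides immediately upstream of the PAM as the spacer, and assembles either a separate crRNA and tracrRNA or a single sgRNA. The function names, the default spacer length of 20 nt, the GAAA linker and the example repeat and tracrRNA sequences are all assumptions made for illustration and are not taken from the disclosure.

```python
# Minimal sketch (assumptions: DNA input on the PAM-containing strand,
# spacer immediately 5' of the PAM, GAAA linker joining crRNA and tracrRNA).
def to_rna(dna: str) -> str:
    """Transcribe a DNA string to RNA (T -> U) without changing strand."""
    return dna.upper().replace("T", "U")


def design_guide(target: str, pam_start: int, repeat: str, tracr: str,
                 spacer_len: int = 20, linker: str = "GAAA") -> dict:
    """Return spacer, crRNA, tracrRNA and sgRNA for a PAM at `pam_start`."""
    if pam_start < spacer_len:
        raise ValueError("not enough sequence upstream of the PAM")
    spacer = to_rna(target[pam_start - spacer_len:pam_start])
    crrna = spacer + to_rna(repeat)        # 5' spacer, 3' repeat portion
    tracrrna = to_rna(tracr)               # 5' end pairs with the repeat
    return {
        "spacer": spacer,
        "crRNA": crrna,
        "tracrRNA": tracrrna,
        "sgRNA": crrna + linker + tracrrna,
    }


if __name__ == "__main__":
    # Hypothetical target: 5 nt of context, a 20 nt protospacer, then a PAM.
    target = "TTTTT" + "GACGTTACCGATTCAGCTAA" + "ATGG"
    guide = design_guide(target, pam_start=25,
                         repeat="GTTTTAGA", tracr="TCTAAAACTTTT")
    print(guide["sgRNA"])
```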
  • the disclosure provides methods for editing a genomic sequence. Such methods can comprise contacting a cell (e.g., a mammalian cell such as a human cell) with a system comprising a CRISPR-Cas protein selected according to a method described herein and a gRNA for editing the genomic sequence. In some embodiments, the contacting is performed ex vivo. In other embodiments, the contacting is performed in vivo. In some embodiments, the method comprises contacting a cell with a nucleic acid (e.g., an AAV genome) encoding both the CRISPR-Cas protein and the gRNA.
  • the method comprises contacting a cell with a nucleic acid (e.g., an AAV genome) encoding the CRISPR-Cas protein and a different nucleic acid (e.g., an AAV genome) encoding the gRNA.
  • the method can comprise contacting the cell with an AAV particle comprising the AAV genome.
  • the method comprises contacting a cell with a ribonucleoprotein complex comprising the CRISPR-Cas protein and gRNA.
  • the invention provides a method of treating a subject in need thereof, the method comprising administering a polypeptide (e.g., a CRISPR-Cas orthologue nuclease and/or a gRNA polypeptide), system, nucleic acid, expression vector, or pharmaceutical composition as described herein to the subject.
  • a polypeptide (e.g., a CRISPR-Cas orthologue nuclease and/or a gRNA polypeptide), system, nucleic acid, expression vector, or pharmaceutical composition as described herein for use in therapy.
  • invention provides a polypeptide (e.g., a CRISPR-Cas orthologue nuclease and/or a gRNA polypeptide), system, nucleic acid, expression vector, or pharmaceutical composition as described herein for use in a method of treating or preventing a disease or condition caused by a genetic defect (e.g., a disease or condition caused by a protein deficiency or over-abundance), the method comprising administering said polypeptide (e.g., a CRISPR-Cas orthologue nuclease and/or a gRNA polypeptide), system, nucleic acid, expression vector, or pharmaceutical composition to the subject.
  • the invention does not comprise a process for modifying the germ line genetic identity of human beings, and/or the use of human embryo for industrial or commercial purposes. 7. FURTHER EMBODIMENTS [0354] Further embodiments are provided in the following numbered embodiments. 1.
  • CRISPR-Cas polypeptide comprising the following components: an amino acid sequence having at least about 90% identity to any one of the amino acid sequences set forth in Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A- 10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15(Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41
  • the guide polynucleotide comprises: (a) a CRISPR RNA (crRNA) comprising a 3' region and a 5' region, wherein the 3' region comprises at least about 10 consecutive nucleotides of a CRISPR repeat comprising the nucleotide sequence of the crRNA from the CRISPR-Cas polypeptide, and/or a functional fragment thereof, and the 5' region comprises at least about 10 consecutive nucleotides of a spacer sequence located upstream of the repeat; and (b) a trans-activating crRNA (tracrRNA) comprising a 5' and 3' region wherein at least a portion of the 5' region of the tracrRNA is complementary to the 3' region of the crRNA, wherein the tracrRNA comprises the nucleotide sequence of the tracrRNA from the CRISPR-Cas polypeptide.
  • a protein-polynucleotide complex comprising: a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas polypeptide comprising an amino acid sequence having at least about 90% identity to any one of the amino acid sequences set forth in Tables 1A-1E(Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A- 9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15(Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (C
  • the spacer sequence recognizes/hybridizes with the portion of the target polynucleotide located upstream of the PAM polynucleotide.
  • the guide polynucleotide comprises: (a) a CRISPR RNA (crRNA) comprising a 3' region and a 5' region, wherein the 3' region comprises at least about 10 consecutive nucleotides of a CRISPR repeat comprising the nucleotide sequence of the crRNA from the CRISPR-Cas polypeptide, and/or a functional fragment thereof, and the 5' region comprises at least about 10 consecutive nucleotides of a spacer sequence located upstream of the repeat; and (b) a trans-activating crRNA (tracrRNA) comprising a 5' and 3' region wherein at least a portion of the 5' region of the tracrRNA is complementary to the 3' region of the crRNA, wherein the tracrRNA comprises the nucleotide sequence of the tracrRNA from the CRISPR-Cas polypeptide.
  • a method of cleaving a polynucleotide comprising: contacting a target polynucleotide having a target nucleotide sequence with the protein-polynucleotide complex according to any one of embodiments 7-11, wherein the PAM polynucleotide is recognized by the protein-polynucleotide complex; wherein the guide polynucleotide comprises a spacer sequence that is complementary to a portion of a target polynucleotide located upstream of the PAM polynucleotide and the spacer nucleotide sequence hybridizes with the portion of a target polynucleotide sequence to form a double stranded polynucleotide; wherein hybridization of the spacer sequence with the portion of the target polynucleotide sequence allows for site specific cleaving of the target polynucleotide; and thereby cleaving the target polynucleotide with the CRISPR-Ca
  • a method of cleaving a polynucleotide comprising: contacting a target polynucleotide with the protein-polynucleotide complex according to any one of embodiments 7-11, wherein the target polynucleotide comprises (a) a protospacer sequence comprising a nucleotide sequence that is at least 80% complementary to the spacer sequence in the crRNA of the protein-polynucleotide complex; and (b) a PAM polynucleotide comprising a nucleotide sequence as categorized in any one of Tables 1A-1E(Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster
  • a nuclease system for performing nucleic acid-guided nuclease site specific genome targeting of a target polynucleotide comprising: (a) a nuclease having amino acid sequence identity of at least 90% to any one of the amino acid sequences set forth in Tables 1A-1E(Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15(Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster
  • nuclease system wherein the nuclease has an amino acid sequence of 92% identity to any one of the amino acid sequences set forth in Tables 1A- 1E(Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15(Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99), and/or a functional
  • nuclease system according to embodiment 14 or 15, wherein the nuclease has an amino acid sequence of 95% identity to any one of the amino acid sequences set forth in Tables 1A- 1E(Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15(Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99), and/or
  • nuclease system according to any one of embodiments 14-16, wherein the nuclease has an amino acid sequence of 99% identity to any one of the amino acid sequences set forth in Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99), and
  • nuclease system according to any one of embodiments 14-17, wherein the nuclease is encoded by a human codon optimized sequence.
  • the guide RNA molecule recognizes (hybridizes with) a portion of the target polynucleotide upstream of the PAM polynucleotide.
  • hybridization of the guide RNA molecule with the target polynucleotide allows for site specific genome editing or cleaving of the target polynucleotide. 21.
  • nuclease system according to embodiment 20, wherein the nuclease is a Cas9 nickase variant or a dCas9. 22.
  • sgRNA single guide RNA
  • crRNA CRISPR RNA
  • tracrRNA trans-activating small RNA
  • the crRNA and tracrRNA respectively comprises: (a) the crRNA comprises a 3' region and a 5' region, wherein the 3' region comprises at least about 10 consecutive nucleotides of a CRISPR repeat comprising the nucleotide sequence of the crRNA from the nuclease, and/or a functional fragment thereof, and the 5' region comprises at least about 10 consecutive nucleotides of a spacer sequence located upstream of the repeat; and (b) the tracrRNA comprises a 5' and 3' region wherein at least a portion of the 5' region of the tracrRNA is complementary to the 3' region (CRISPR repeat) of the crRNA, wherein the tracrRNA comprises the nucleotide sequence of the tracrRNA from the nuclease.
  • nuclease system according to any one of embodiments 22-24, wherein the sgRNA has a sequence with a length of from about 10 to about 250 nucleotides (nt), or from about 80 to about 110 nt, or from about 15 to about 30 nt, or from about 20 to about 53 nt, or from about 25 to about 53 nt, or from about 29 to about 53 nt, or from about 40 to about 50 nt.
  • the system is configured to perform specific genome targeting of a target polynucleotide in a prokaryotic cell or a eukaryotic cell.
  • nuclease system according to embodiment 26 wherein the prokaryotic cell is a bacterial cell.
  • the eukaryotic cell is a mammalian cell.
  • 29. The nuclease system according to embodiment 28, wherein the mammalian cell is a human cell.
  • An expression vector comprising a nucleic acid molecule encoding the polypeptide according to any one of embodiments 1-2, a nucleic acid molecule encoding the polypeptide according to any one of embodiments 3-6, a nucleic acid molecule encoding the protein of the protein-polynucleotide complex according to any one of embodiments 7-11, or a nucleic acid molecule encoding the nuclease according to any one of embodiments 14-29. 31.
  • An expression vector comprising a nucleic acid molecule encoding the guide polynucleotide according to any one of embodiments 1-2, a nucleic acid molecule encoding the guide polynucleotide according to any one of embodiments 3-6, a nucleic acid molecule encoding the guide polynucleotide according to any one of embodiments 7-11, or a nucleic acid molecule encoding the guide RNA molecule according to any one of embodiments 14-29.
  • a cell comprising the expression vector according to embodiment 30 or embodiment 31.
  • 33. The cell according to embodiment 32, wherein the cell is a prokaryotic cell or a eukaryotic cell. 34.
  • the method according to embodiment 37 comprising contacting a target molecule with the nuclease system, wherein the target molecule comprises a protospacer sequence; the nuclease recognizes a protospacer adjacent motif (PAM) polynucleotide sequence (downstream) of the protospacer sequence; the guide RNA molecule is a sgRNA sequence comprising a spacer sequence that is complementary to the protospacer sequence, wherein the spacer sequence hybridizes with the protospacer sequence to form a double stranded nucleotide sequence; and the nuclease cleaves the target molecule.
  • a method of modulating translation in a cell comprising administering to the cell an effective amount of the nuclease system according to any one of embodiments 14-29.
  • An in vivo method for site-specifically editing a cellular genome at a desired target sequence comprising: administering to a subject the system of any one of embodiments 14-29.
  • 41. A method for site-specifically editing a cellular genome at a desired target sequence comprising: introducing into the cell the system of any one of embodiments 14-29. 8.
  • Example 1: Identification of Cas9 orthologue proteins and corresponding PAMs
  • This Example illustrates identification of Cas9 orthologue proteins and corresponding PAMs, using the novel nucleases (e.g., Cas9 orthologue proteins) mined/identified using a version of the strategy outlined in FIGs. 1 and 2 above.
  • the mined/identified Cas9 orthologue proteins were compared by BLAST against known Cas9 orthologues, and any Cas9 orthologue proteins having greater than 90% identity to an orthologue previously disclosed in the literature were discarded, which reduced the number of hits to 9554 families, each sequence having less than 90% identity to known Cas9 orthologue proteins.
  • the remaining Cas9 orthologue protein families were clustered by amino acid sequence identity (e.g., at least 90%) into 5897 clusters.
  • the Cas9 clusters having a single member were discarded, resulting in 1406 clusters.
  • the Cas9 orthologue/PAM pairs can be validated using an in vitro assay.
  • Exemplary amino acid sequences of the mined/identified Cas9 orthologue proteins were determined and are set forth in Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38), Tables 21A-21B (Cluster 41), and Table 22 (Cluster 99).
  • Each of the Cas9 orthologue proteins was found to have at least one PAM polynucleotide (e.g., a PAM ID) having a nucleic acid sequence (consensus sequence, e.g., motif) as shown in Tables 1A-1E (Cluster 9), Tables 2A-2C (Cluster 10), Tables 3A-3B (Cluster 11), Tables 4A-4D (Cluster 12), Tables 5A-5C (Cluster 13), Tables 6A-6B (Cluster 14), Tables 7A-7D (Cluster 15), Table 8 (Cluster 16), Tables 9A-9B (Cluster 17), Tables 10A-10D (Cluster 18), Table 11 (Cluster 20), Tables 12A-12B (Cluster 21), Table 13 (Cluster 22), Table 14 (Cluster 23), Table 15 (Cluster 25), Tables 16A-16B (Cluster 28), Table 17 (Cluster 29), Table 18 (Cluster 32), Table 19 (Cluster 33), Tables 20A-20B (Cluster 38
  • FIGs. 3A-3E correspond to the Nuclease IDs shown in Tables 1A-1E, respectively
  • FIGs. 4A-4C correspond to the Nuclease IDs shown in Tables 2A-2C, respectively
  • FIGs. 5A-5B correspond to the Nuclease IDs shown in Tables 3A-3B, respectively
  • FIGs. 6A-6D correspond to the Nuclease IDs shown in Tables 4A-4D, respectively
  • FIGs. 7A-7C correspond to the Nuclease IDs shown in Tables 5A-5C, respectively
  • FIGs. 8A-8B correspond to the Nuclease IDs shown in Tables 6A-6B, respectively
  • FIGs. 9A-9D correspond to the Nuclease IDs shown in Tables 7A-7D, respectively
  • FIG. 10 corresponds to the Nuclease IDs shown in Table 8
  • FIGs. 11A-11B correspond to the Nuclease IDs shown in Table
  • Table 1A (Cluster 9_1)
    Nuclease ID | PAM Motif (5’-3’) | Cas9 orthologue nuclease amino acid sequence SEQ ID NO.:
    AJCV | NNNNAA | SEQ ID NO.: 1
    AJGW | ANATACNNT | SEQ ID NO.: 2
    DOEQ | NNANAC | SEQ ID NO.: 3
    DKQW | - | SEQ ID NO.: 4
    GBWG | - | SEQ ID NO.: 5
    GCCE | NNGNAA | SEQ ID NO.: 6
    AJDO | - | SEQ ID NO.: 7
    DKQC | - | SEQ ID NO.: 8
    DKRM | - | SEQ ID NO.: 9
  • Table 1B (Cluster 9_7)
    Nuclease ID | PAM Motif (5’-3’) | SEQ ID NO.:
    BCIM | NTGANNA | SEQ ID NO.: 10
    CFXD | NTGANNA | SEQ ID NO.: 11
    AWDL | NTGANNA | SEQ ID NO.: 12
    DFBQ | NTGANNA | SEQ ID NO.: 13
    GEDZ | N
  • BLAST version 2.2.31 (with parameters -task blastn-short -gapopen 2 -gapextend 1 -penalty -1 -reward 1 -evalue 1 -word_size 8) was used to identify anti-repeats within a 3000 bp window flanking the CRISPR-Cas9 locus.
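  • By way of non-limiting illustration, the BLAST invocation described above can be assembled in Python as follows. The scoring parameters are those stated in this Example. The `-query`, `-subject` and `-outfmt 6` options are standard BLAST+ options that are not stated in this Example, and the file names, function names, and the choice to apply the 3000 bp window on each side of the locus are assumptions made for illustration.

```python
import subprocess

# Minimal sketch. Parameters after "-task" are taken from the Example;
# -query/-subject/-outfmt and the file names are illustrative assumptions.
def blastn_short_args(query_fasta: str, subject_fasta: str) -> list:
    """Build the blastn argument list for the anti-repeat search."""
    return [
        "blastn",
        "-task", "blastn-short",
        "-gapopen", "2",
        "-gapextend", "1",
        "-penalty", "-1",
        "-reward", "1",
        "-evalue", "1",
        "-word_size", "8",
        "-query", query_fasta,      # CRISPR repeats (assumed file)
        "-subject", subject_fasta,  # flanking windows (assumed file)
        "-outfmt", "6",             # tabular output
    ]


def flank_windows(contig_len: int, locus_start: int, locus_end: int,
                  window: int = 3000):
    """Return (start, end) of the upstream and downstream flanks,
    clipped to the contig boundaries (0-based, end-exclusive)."""
    upstream = (max(0, locus_start - window), locus_start)
    downstream = (locus_end, min(contig_len, locus_end + window))
    return upstream, downstream


if __name__ == "__main__":
    args = blastn_short_args("repeats.fa", "flanks.fa")
    print(" ".join(args))
    # Uncomment to run, assuming BLAST+ is installed and the files exist:
    # subprocess.run(args, check=True)
```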
  • a custom version of RNIE was used to predict Rho-independent transcription terminators (RITs) near anti-repeats.
  • sgRNA scaffolds
  • The secondary structure of sgRNA scaffolds was predicted using RNAsubopt version 2.4.14 (with parameters --noLP -e 5). sgRNAs lacking the functional modules, namely the repeat:anti-repeat duplex, nexus and 3’ hairpin-like folds, were discarded.
  • 8.2.1.3. Mammalian expression plasmids and constructs
  • [0360] CRISPR-Cas polypeptides (i.e., Cas9 orthologs) were expressed in mammalian cells from a plasmid vector characterized by an EF1alpha-driven cassette.
  • Each CRISPR-Cas polypeptide coding sequence was human codon-optimized and modified by the addition of an SV5 tag at the N-terminus and two bipartite nuclear localization signals (1 at the N-term and 1 at the C-term).
  • the corresponding sgRNAs were expressed from a U6-driven cassette located on an independent plasmid construct.
  • the human codon-optimized coding sequence of the CRISPR-Cas polypeptide, as well as the sgRNA scaffolds, were obtained as synthetic fragments from Twist Bioscience.
  • Spacer sequences were cloned into the sgRNA plasmid as annealed DNA oligonucleotides (Eurofins Genomics) using double BsaI sites present in the plasmid.
  • the spacer sequences and corresponding cloning oligonucleotides used in the present example are listed in Table 23. In all cases in which a spacer did not contain a matching native 5’-G, this nucleotide was appended upstream of the targeting sequence in order to allow efficient transcription from a U6 promoter. Table 23.
  • nucleotides highlighted in bold represent 5’-G appended to favor transcription from canonical U6 Pol III promoters.
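  • By way of non-limiting illustration, the 5’-G rule described above can be sketched in Python as follows. The function name and the example spacers are hypothetical; the cloning overhangs for the BsaI sites are not shown because they are not stated in this passage.

```python
# Minimal sketch of the 5'-G rule for U6-driven transcription.
def add_five_prime_g(spacer: str) -> str:
    """Return the spacer unchanged if it already starts with G,
    otherwise prepend a single G."""
    spacer = spacer.upper()
    return spacer if spacer.startswith("G") else "G" + spacer


if __name__ == "__main__":
    # Hypothetical spacers, for illustration only.
    for s in ("GACGTTACCGATTCAGCTAA", "TACGTTACCGATTCAGCTAA"):
        print(s, "->", add_five_prime_g(s))
```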
  • Exemplary amino acid sequences of nuclear localization signal or SV5 tag are listed in Table 24 and Table 25 below. Table 24.
  • Exemplary amino acid sequences of nuclear localization signals
    Sequence | SEQ ID NO:
    RTADGSEFESPKKKRKV | SEQ ID NO: 1800
    KRTADGSEFESPKKKRKV | SEQ ID NO: 1801
    Table 24.
  • PKKKRKV nuclear localization signals
    Sequence | SEQ ID NO:
    PKKKRKV | SEQ ID NO: 1802
    PKKKRRV | SEQ ID NO: 1803
    KRPAATKKAGQAKKKK | SEQ ID NO: 1804
    YGRKKRRQRRR | SEQ ID NO: 1805
    RKKRRQRRR | SEQ ID NO: 1806
    PAAKRVKLD | SEQ ID NO: 1807
    RQRRNELKRSP | SEQ ID NO: 1808
    VSRKRPRP | SEQ ID NO: 1809
    PPKKARED | SEQ ID NO: 1810
    PQPKKKPL | SEQ ID NO: 1811
    SALIKKKKKMAP | SEQ ID NO: 1812
    PKQKKRK | SEQ ID NO: 1813
    RKLKKKIKKL | SEQ ID NO: 1814
    REKKKFLKRR | SEQ ID NO: 1815
    KRKGDEVDGVDEVAKKKSKK | SEQ ID NO: 1816
    RKCLQAGMNLEARKTKK | SEQ ID NO: 1817
    NQSSNFGPMKGGNFGGRSSGPYGGGGQYFA
  • the identified Cas9 orthologs coding sequences were modified by the addition of an SV5 tag at the N-terminus and two nuclear localization signals (1 at the N-term and 1 at the C-term) and were human codon-optimized.
  • the constructs for the coding sequences were obtained as synthetic fragments from Twist Bioscience and cloned into an expression vector for in vitro transcription and translation (IVT) (pT7-N-His-GST, Thermo Fisher Scientific).
  • the sgRNAs used to identify Cas9 PAMs were in vitro transcribed (HighYield T7 RNA Synthesis Kit, Jena Bioscience) starting from the amplification (Phusion HF DNA polymerase, Thermo Fisher Scientific) of the plasmid bearing sgRNA.
  • the primers used to generate the IVT templates are listed in Table 24.
  • In vitro transcribed sgRNAs were subsequently purified using the MEGAClear Transcription Clean-up kit (Thermo Fisher Scientific). The in vitro transcription and translation reaction for Cas9 expression was performed according to the manufacturer’s protocol (1-Step Human High-Yield Mini IVT Kit, Thermo Fisher Scientific).
  • the hairpin structure of the sgRNA was generated for visualization after in silico folding using RNA folding form v2.3 (unafold.org) of the sgRNA scaffolds (not including the spacer sequence).
  • the Cas9 orthologous nuclease-guide RNA ribonucleoprotein (RNP) complex was assembled by combining 20 µL of the supernatant containing soluble Cas9 orthologous polypeptides with 1 µL of RiboLock RNase Inhibitor (Thermo Fisher Scientific) and 2 µg of guide RNA (sgRNA, previously transcribed in vitro).
  • the RNP complex was used to digest 1 µg of a PAM plasmid DNA library (containing a defined target sequence flanked at the 3’-end by a randomized 8 nucleotide PAM sequence) for 1 hour at 37°C. Sequences of the primers used for PCR amplification of gRNAs used as templates for in vitro transcription are listed in Table 26. Table 26.
  • PCR products were purified using the GeneJet PCR Purification Kit (Thermo Fisher Scientific). The library was sequenced as 71-bp single reads, using a flow cell v2 micro, on an Illumina MiSeq sequencer. Table 28.
  • PAM heatmaps were used to display PAM enrichment, computed by dividing the frequency of PAM sequences in the cleaved library by the frequency of the same sequences in a control uncleaved library.
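  • By way of non-limiting illustration, the PAM enrichment calculation described above can be sketched in Python as follows. The sketch assumes that the randomized PAM has already been extracted from each sequencing read, and it reports a value only for PAMs observed in the control library (to avoid division by zero). The function names and the toy read lists are hypothetical.

```python
from collections import Counter

# Minimal sketch: enrichment = frequency in cleaved / frequency in control.
def pam_frequencies(pam_reads) -> dict:
    """Return the relative frequency of each PAM in a list of reads."""
    counts = Counter(pam_reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pam: n / total for pam, n in counts.items()}


def pam_enrichment(cleaved_reads, control_reads) -> dict:
    """Return the enrichment of each PAM seen in the control library."""
    cleaved = pam_frequencies(cleaved_reads)
    control = pam_frequencies(control_reads)
    return {pam: cleaved.get(pam, 0.0) / freq for pam, freq in control.items()}


if __name__ == "__main__":
    # Toy data, for illustration only.
    cleaved = ["AAGG"] * 3 + ["TTTT"]
    control = ["AAGG", "TTTT"]
    print(pam_enrichment(cleaved, control))
```

  • Values above 1 in such a sketch would indicate PAMs over-represented among cleaved molecules relative to the uncleaved control.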
  • 8.2.1.5 Cell lines and cell culture conditions [0368] U2OS-EGFP cells, harboring a single integrated copy of an EGFP reporter gene, were cultured in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies), 2 mM L-Glutamine (Life Technologies) and penicillin/streptomycin (Life Technologies). All cells were incubated at 37°C and 5% CO2 in a humidified atmosphere. All cells tested mycoplasma negative (PlasmoTest, Invivogen). 8.2.1.6.
  • the discovered Cas9 orthologs were filtered based on: i) the length of their coding sequence, discarding those amino acid sequences having fewer than 950 amino acids; ii) their origin from putative unknown species, and iii) the presence of intact nuclease domains.
  • Cas9 orthologous proteins with high sequence similarity were clustered together and the orthologs with the greatest sequence representation in the original metagenomic library were selected for each cluster.
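  • By way of non-limiting illustration, the length and nuclease-domain filters and the per-cluster selection described above can be sketched in Python as follows. The record fields, the function name and the toy records are hypothetical; the species-of-origin criterion and the clustering step itself are not modelled, and cluster labels are assumed to be supplied.

```python
from dataclasses import dataclass

# Minimal sketch (assumptions: cluster labels and abundance counts are
# already available for each ortholog; field names are illustrative).
@dataclass
class Ortholog:
    name: str
    length_aa: int
    cluster: str
    abundance: int          # representation in the metagenomic library
    intact_domains: bool    # nuclease domains judged intact


def select_representatives(orthologs, min_length: int = 950) -> dict:
    """Apply the length and nuclease-domain filters, then keep the most
    abundant remaining ortholog in each cluster."""
    best = {}
    for o in orthologs:
        if o.length_aa < min_length or not o.intact_domains:
            continue
        current = best.get(o.cluster)
        if current is None or o.abundance > current.abundance:
            best[o.cluster] = o
    return best


if __name__ == "__main__":
    # Toy records, for illustration only.
    records = [
        Ortholog("X1", 1100, "c1", 12, True),
        Ortholog("X2", 1200, "c1", 30, True),
        Ortholog("X3", 900, "c2", 50, True),
        Ortholog("X4", 1000, "c2", 5, False),
        Ortholog("X5", 1050, "c3", 7, True),
    ]
    print({c: o.name for c, o in select_representatives(records).items()})
```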
  • a set of particular interest was identified.
  • each CRISPR-Cas polypeptide is arbitrary and unique:
    • FRDB CRISPR-Cas polypeptide, originating from an unclassified bacterium from the Lachnospiraceae family, 1159 aa long.
    • DNDB CRISPR-Cas polypeptide, originating from Eubacterium sp. AF22-8LB, 1355 aa long.
    • GMIE CRISPR-Cas polypeptide, originating from Alloprevotella tannerae, 1223 aa long.
    • ARHP CRISPR-Cas polypeptide, originating from Catenibacterium sp. AM22-15, 1113 aa long.
  • the in vitro PAM assay uses in vitro translated CRISPR-Cas polypeptides coupled with an in vitro synthesized sgRNA to generate a functional ribonucleoprotein complex to cleave a plasmid library characterized by a defined target sequence followed by a randomized 8 nt stretch corresponding to the putative PAMs. Cleaved PAMs were identified and sequenced by next generation sequencing.
  • Table 32 below shows the PAM preferences as determined based on the assay outcome.
  • the PAM logos and the PAM heatmaps depicting the nucleotide preferences for specific positions along the PAMs are shown in FIGs. 27A-27F and 28A-28D. Table 32.
  • Cas9 orthologue nuclease ID and corresponding PAM sequences
    Cas9 orthologue nuclease ID | SEQ ID NO | PAM sequences
    FRDB | SEQ ID NO: 721 | NNNNRYA, NNNNGYA, NNNNACY
    DNDB | SEQ ID NO: 362 | NNGG, NNGGNG
    GMIE | SEQ ID NO: 444 | NNNNNCC
    ARHP | SEQ ID NO: 768 | NYAAR, NYAGG
    DUUZ
  • the human codon optimized nucleic acid sequences encoding the Cas9 orthologue nucleases, identified by nuclease ID, and their amino acid sequences are listed in Table 33.
  • exemplary human codon optimized nucleic acid sequences and amino acid sequences of Cas9 orthologue nucleases Cas9 Human codon optimized sequence (including SEQ ID Amino acid sequence SEQ ID orthologue 2xNLS, 1 C-term + 1 N-term and detection NO: NO: nuclease ID tag) FRDB ATGGGCAAACCTATCCCCAATCCCCTGCT SEQ ID MGKPIPNPLLGLDS SEQ ID GGGCCTGGACAGCACCAAGCGGACTGCC NO: TKRTADGSEFESPK NO: GACGGCAGTGAGTTCGAGTCTCCCAAGA 1378 KKRKVGSGEMSKT 1383 AAAAGAGGAAAGTGggatccGGCGAGATGT NMDRHRSYRIGLDI CCAAAACCAACATGGACAGACACAGAA GIASVGWAVLENN GCTACAGAATCGGCCTGGATATCGGCAT SQDEPIRILDLGVRI TGCCAGCGTGGGATGGGCCGTGTTAGAG FEAAEDSQ
  • Exemplary human codon optimized nucleic acid sequences and amino acid sequences of Cas9 orthologue nucleases Cas9 Human codon optimized sequence (including SEQ ID Amino acid sequence SEQ ID orthologue 2xNLS, 1 C-term + 1 N-term and detection NO: NO: nuclease ID tag) CCATGAAAAAGATCATGCCCTACATGAA REFKERNLNDTKYI AGAAGGCATGGTGTACGACAAGGCCTGC TRVIYNMIRQNLEM GAGGCCGCTGGCTACGATTTCAAGAACG EPLNREDRKKQVF ACAACCACGGCAAGAAAATGAAGCTGCT AVNGSITSYLRKR GAAAGGGGAGAACATCACCGAGATCATC WGLPTKDRSTDTH AACGAGATCACAAACCCTGTGGTTAAGA HAMDAVVVACCT GAAGTGTGTCCCAGACCGTGAAGGTCAT DGMIQKISRSTQYR CAATGCCATCATCCTGAAGTACGGCTCTCTC EVLYKGQSH
  • Exemplary human codon optimized nucleic acid sequences and amino acid sequences of Cas9 orthologue nucleases Cas9 Human codon optimized sequence (including SEQ ID Amino acid sequence SEQ ID orthologue 2xNLS, 1 C-term + 1 N-term and detection NO: NO: nuclease ID tag) ACAAGCCCAAGGCCGACGGCACCCAGGG CCCTGTGGTGAAAAAGGTGAAAACCTAC AAGAAAATGAGCCTGGGAGTGGAAATCA ATAAGGACGAGAATGGCATCGGTAGAGG CATCGCCAAAAACGGCGACATGATCAGA ATCGACATCTTCAGAGAAAACGGCAAGT ACTATTTCGTTCCTATCTACATCGCCGAT GCCCTGAAGAAGG CTGCCACCCACAGCAAGCCTTACAGCGA GTGGAAGGAGATGAAGGACGAATTTC CTGTTCAGCCTGTACAGTAGAGATCTGAT CGGATTCAAGAACCCCAAGGGTAAGAAG GTGAAG
  • Exemplary human codon optimized nucleic acid sequences and amino acid sequences of Cas9 orthologue nucleases Cas9 Human codon optimized sequence (including SEQ ID Amino acid sequence SEQ ID orthologue 2xNLS, 1 C-term + 1 N-term and detection NO: NO: nuclease ID tag) TCCTGAACAACGCCGTGGAACTGCATGA FKYGYEYCKDALV TGTGATCGCCGTGTCCCAGATCCTGAAA NEEESIQNWKLNMI GATCACAATTACATCTGTGAAGTTGCTGT SHCTYLPGEYALPK GGAAAAGCTGAACGAGGTGCTGAAGGTG GSFIAETFNLLNEL CAGAAGCTGGAATATTCCGATCCCGACA NILTAIDKDENRYY AGTACAAGGAGGAAAAAAAGATTATCCA LSREDKLKVFDELF GTCTAAGATGAGCAAAGCTATCGGAGAA LKRTDYVSHKEVA AGACTGCGGGTGGTGAAGAATATGGAAA QLLDL
  • Exemplary human codon optimized nucleic acid sequences and amino acid sequences of Cas9 orthologue nucleases Cas9 Human codon optimized sequence (including SEQ ID Amino acid sequence SEQ ID orthologue 2xNLS, 1 C-term + 1 N-term and detection NO: NO: nuclease ID tag)
  • AGAACAAGAGAAAAGTGGAACTGTACAT ENGNVKEYYFRMR CAGACAGAACGGAAGGGACATGGTGAC VFKNDLFYETESNT
  • SHVYCFSYNDIYKD CTAGAGGCTTTGGAGATAATAGCATGGA LNYLKRELADKFD TAACCTGATGCTGATTAAGAAAGAGATC FKEYIVNPTGHHKF AACGGCAAAAAGGCCGACCGGCTGCCAC TDYTIDELLEFCLD TGGAATACATCGAGGCCGAAACCGTGA
  • TGGAAACCATCGTGTACCTGAACCTCAT CGTGAACCGGGAATGTAGTCCCCTGGAC AGATACAGACCCGGCACAGACATCGTGA AAAAAAAAAACAACGAGGATGCTCAGT ACATCAAGATTAAGAGCAGCATTCTGGG CATCCGGTATACCTACAATGAGAACGGC AAGCTGCTGATCAGCGGCCCTAGAAAGG CCCCTGGCAAATACTCTAAAATCAAGAA GGAGAAGTTCAGCTGGAAGATCTGCAGC GACGTGCTGCAGtctagaAAGCGGACAGCA GACGGCTCCGAATTTGAAAGCCCTAAGA AAAAGAGAAAGGTGTGA GMIE ATGGGCAAACCTATCCCCA
  • GAAGCCCCTGCCTGCCCCACAACGGCTC AYPDAKRNERFDC AGCTGATTAAGGTGTTCGGAGAGGATTG YQLGSPKTNAVRN GAAGGCCGCCCTGGCAGAGACATACACC PMAMRSLHMVRK ATGAGCGCTAAACAAGACGGCACCCTGA VINHLLRKHIIDEKT AAACCGAGGACGAGATCTGTACCGACAT EIHIEYARELNDAN CTGGAACGTCCTGTACTCCTTCAGCTCCA KRQAIADWQRELS AAGAGAAGCTGAAAGAGTTCGGCCTGAA KRHTAYAQNIRQL GAAGCTGCAGCTGGACGAGGCCCAGGCC YKAETGLE
  • TATACCATAGCTTGCATCGGCCCTTGTGA ATACGCCGCCTTAGCCGCATACTACAGA TCTGATGAGGAGTTTAAGTACGGCAGAC GGAGAGAAGCCTCAGTTCGAGAAACC TTGGCCCACCTTCACAGAGGATCTGCTG AAACTTCAGGAAGAACTGCTGATCGTGC ATCAGACCACCGACAAGCTGGGCAAGAG AGATAGAAGAAAGGTGAAAACGCCTAG AGGCAAGTTCCTCACCGGCGGCGACTCC GCTAGAGGCAGACTTCATCAGGAAACCT ACTACGGCGCGATCAATTACGACGGAAA TATCAAATACGTGGTGAGAAAACCCCTG GACTCCCTCACAGAAAAGG
  • TTAGAGTGAAGGGCCTGAAAGAAAAGCT DGKFICEVQLELLN GAGCAACGATGAGCTGGCCACTGCCATC NGESIRGHHNNFKT CTCCACATCACCAAACGGAGAGGCTCTA KDYIKELNEILSHQ CCATTGAGACTGTGGATGACACCGAGGA ELPEEACEAIVQIVS AGCCAGCAAGGAAACCGGCGAGCTGAA RKRAYYEGPGSQK GGAAATCCTTCAGGCCAATAGCAAGAAG SPTPYGRYIEANQK CTGGCCGACGGAAAATTCATCTGCGAGG EPIDLIEKMRGKCS TGCAGCTGGAACTGCTGAACAACGGCGA V
  • GGAAACAGAACCCCTATCGATGCCTACA KLKGENPLTAYKE ACAAGCATCTGTTTGACAATCCCACACT EFGKIRKFSKKGNG GGGCCTGTCCGGAGATCTGCAGACCTAC PEITCIKYYAEKLG ATCAGCGTGGTGCAGAGCAGCAAGGT NHIDISKNYDTKHG TCAGCAGACGCAAGAAGAACTACCTGCT KKVILKQISPYRTD GTACAAGGATGACATCACCAAAATGGAT FYRDTDGKIKMVTI GTAGTGCAGAAATTCATCGCAAGAAACC RYKDVSFKESLGLY TGATCGATACATCATACGCCAACAGAGT CIDRDWYQSE
  • TGTTCTGGGCAACATGTACGAGGTGAAG GACAACAACCTGAAACTGGAATTTAAGtc tagaAAGCGGACAGCAGACGGCTCCGAAT TTGAAAGCCCTAAGAAAAAGAGAAAGGT GTGA DUUZ ATGGGCAAACCTATCCCCAATCCCCTGCT SEQ ID MGKPIPNPLLGLDS SEQ ID GGGCCTGGACAGCACCAAGCGGACTGCC NO: TKRTADGSEFESPK NO: GACGGCAGTGAGTTCGAGTCTCCCAAGA 1382 KKRKVGSTKILGID 1387 AAAAGAGGAAAGTGggatccACCAAGATTC TGTNSLGWAIVENT TGGGAATCGATACAG
  • CGAGGCCAAGATCTTCAGCGAAATCAGC TDKDILKYVLWEE CTGCCCGCCGACTACGCCTCTCTGAGCCT QNHICLYTGKQIRIS GAAAACCATCTGCAAGATCCTGCCCTAC DFIGTNPKFDIEHTI CTGCGCAGAGGCCTGATCTATTCCCACG PRSVGGDSTKMNL CCGTGTTCCTGGGCAACCTGTGCGAGGT TLCDSRFNREVKKT CATGCCCAAATACGAATGGAGCATCAAG KLPVELPSQDEIME GAGATGCGGGAAACCATCATCGACAACAACA RIKEWKKKYELLDI TCATCATGGAAATGGATAGATGCGACAC QIRKL
  • AGGGCGACGCCGCTAGAGGCTCTCTTCA TAATGACACCTACTACGGCGCCATCGAG AACGACGGCGCCGTGAAGTACGTGAAAC GGATCGACCTGGCTTCCCTGGAAGAGAA GGACGTGAAGAACATCGTGGACGACACA GTGAGGTCTATCATAGAAGCCGCCATCA AAGAAAAGGGCTTCAAGGACGCTATGGC CAGCACCATCTGGATGAATGAGGAAG AGAATCCCTATCAAGAAGGTGCGTTGCT TCACCTGCGTGAAAAATCCCCTCAACTTC GAAAACCGCAAACCCAGAGACATTAGCG ACAAGGAGTACAAGCGCAACTATGT GACAACCGAAGGCAACT
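The table above pairs each human codon optimized nucleic acid sequence with its amino acid sequence. As a rough consistency check, the opening nucleotides of the DUUZ row can be translated and compared with the opening residues of its amino acid column. The sketch below assumes the standard genetic code and reading frame 1; the names `translate`, `DUUZ_DNA_START` and `DUUZ_PROTEIN_START` are illustrative and do not come from the patent, and the fragments were rejoined by hand from the flattened table.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; a trailing partial codon is ignored."""
    dna = dna.upper()  # the table mixes upper and lower case (e.g. 'ggatcc')
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[pos:pos + 3]] for pos in range(0, usable, 3))


# Opening nucleotides of the DUUZ row, rejoined from the table fragments.
DUUZ_DNA_START = (
    "ATGGGCAAACCTATCCCCAATCCCCTGCT"
    "GGGCCTGGACAGCACCAAGCGGACTGCC"
    "GACGGCAGTGAGTTCGAGTCTCCCAAGA"
    "AAAAGAGGAAAGTGggatccACCAAGATTC"
    "TGGGAATCGATACAG"
)

# Opening residues of the DUUZ amino acid column, rejoined the same way.
DUUZ_PROTEIN_START = (
    "MGKPIPNPLLGLDS" "TKRTADGSEFESPK" "KKRKVGSTKILGID" "TGTNSLGWAIVENT"
)

translated = translate(DUUZ_DNA_START)
print(translated)
print(DUUZ_PROTEIN_START.startswith(translated))
```

The same check could be applied to any other row once its nucleotide and amino acid fragments have been separated, which the flattened layout here does not always allow.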
  • a Cas9 orthologue nuclease having a sequence as disclosed in Table 34.
  • a system comprising a Cas9 orthologue nuclease and a crRNA having a sequence as disclosed in Table 34.
  • a system comprising a Cas9 orthologue nuclease and a tracrRNA having a sequence as disclosed in Table 34.
  • a system comprising a Cas9 orthologue nuclease and a sgRNA having a sequence as disclosed in Table 34.
  • a system comprising a Cas9 orthologue nuclease and a crRNA, tracrRNA and/or sgRNA having a sequence as disclosed in Table 34.
  • Table 34 Exemplary Cas9 orthologue nuclease/sgRNA/PAM systems crRNA (full SEQ tracrRNA SEQ sgRNA SEQ PAM length) ID (full length) ID (trimmed) ID (experime NO: NO: NO: ntal) Cas9 orthologous nuclease ID AKKH GUUUUGU SEQ UUAUAUUC SEQ GUUUUGUUA SEQ NYAAC (SEQ ID UACCAUA ID UAGCAAAA ID CCAUAGAAA ID NO.: 886) UGAAUUU NO: AUUUUAU NO: UAUGACCUA NO: UUGCUAG 1388 GACCUAAC 1419 ACAAAACAA 1450 AACAGAA AAAACAAG GGGUUUAUC C GGUUUAUC
  • Cas9 orthologous nuclease ID GCGGCUCC UUUUUU BCRS GUUGUGA SEQ CCUUAUCU SEQ GUUGUGAUU SEQ Prediction SEQ ID UUUGCUU ID UUGCAGUA ID UGCGAAAGC ID : NO.: 254)
  • AGUAUCU 1393 UAAAGCAU 1424 AAGGAUAAU 1455 W UUGCAUA UCACAAUA UCCGUUGUG GGCACAU AGGAUAAU AAAACAUCA ACAAC UCCGUUGU GGUUACCUC GAAAACAU GUCCUAUAA CAGGUUAC AACGGGGCU
  • Cas9 orthologous nuclease ID AAGZ GUUGUGG SEQ UUCCUACA SEQ GUUGUGGUU SEQ NNNYCC (SEQ ID UUUGAUG ID UCUUAUCA ID UGAGAAAUC ID NO.: 297) UAGGAAU NO: CAAUAAGG NO: UUAUCACAA NO: CAAAAGA 1397 CAGUAAUU 1428 UAAGGCAGU 1459 UAUACAA GCCGAAGG AAUUGCCGA C GUAAAACC AGGGUAAAA NNNCCC UAUAGUCC CCUAUAGUC CGCUUCGG CCGCUUCGG UGGGAUUU UGGGAUUUU UU BAGF GUUGUGG SEQ AAUUCUAC SEQ GU
  • Cas9 orthologous nuclease ID APSS GUUAUAG SEQ UUUUCAGA SEQ GUUAUAGUU SEQ NARNC SEQ ID UUGACCG ID GAUUAAUU ID GGAAACAAC ID NO.: 242
  • Cas9 orthologous nuclease ID CGGAUUCG UCUUCGGAG NCGAV GCUCUUCG CCUUUUCUU GAGCCUUU UUUU U CFWW GUUUGAG SEQ GUCUCAAA SEQ GUUUGAGAA SEQ NNYAAT (SEQ ID AAUAGUG ID ACCUUAUG ID UAGAAAUAU ID NO.: 833)
  • Cas9 orthologous nuclease ID GGGUUUUU U AVWK GUUUGAG SEQ UUUGUAGA SEQ GUUUGAGAG SEQ NRHABT SEQ ID AGCAGUG ID CCCCUAUG ID CAGGAAACU ID NO.: 373) UAAUUCC NO: GAAUUACG NO: GUUGAGUUC NO: AUAGGGG 1413 CUGUUGAG 1444 AAAUAAAAG 1475 UCUCAAA UUCAAAUA UUUACUCAG C AAAGUUUA ACCGUCGGC NRHATT CUCAGACC UUUGACCGA GUCGGCUU CUGCACAGU UGACCGAC GUGUGCGCA UGCACAGU GAGGACUUC GUGUGC
  • Cas9 orthologous nuclease ID GCGCCUCU CUAGCGCCU NRNNHT CUUCGGAG CUCUUCGGA TA AGGCUUUU GAGGCUUUU UU UU
  • [0376] The amino acid sequence of nuclease ID CEBN is SEQ ID NO: 1826 as shown: MDNRFVLGIDIGVVSCGYGVIDLVTGRFVDYGVRLFKEGTAEENEKRRGARSRRRLTSRR HTRIMDMQKLLKENGIMSDDYHPLQNVYELRCKGLSEKLTNDELTAVILHITKHRGSVIE TVEEDAKKSDDELSLKATLQHNEQLIKQGKYICQIQLDNLNNKNHIRGHENNFSTKDYV
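The PAM column of Table 34 is written with IUPAC degenerate nucleotide codes (for example NYAAC, NNNYCC, NARNC and NRHABT), so each entry stands for a set of concrete sequences. The sketch below shows one way to test whether a candidate DNA site satisfies such a PAM. The function names `pam_matches` and `pam_degeneracy` are illustrative and not taken from the patent, and the candidate sites in the usage lines are made-up examples, not data from the disclosure.

```python
# IUPAC degenerate nucleotide codes for the DNA alphabet.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(pam: str, site: str) -> bool:
    """Return True if the concrete DNA `site` satisfies the degenerate `pam`."""
    if len(pam) != len(site):
        return False
    return all(
        base in IUPAC[code] for code, base in zip(pam.upper(), site.upper())
    )


def pam_degeneracy(pam: str) -> int:
    """Count how many concrete sequences a degenerate PAM stands for."""
    count = 1
    for code in pam.upper():
        count *= len(IUPAC[code])
    return count


# Hypothetical candidate sites checked against PAMs listed in Table 34.
print(pam_matches("NYAAC", "GTAAC"))    # Y allows C or T
print(pam_matches("NYAAC", "GAAAC"))    # A is not allowed at the Y position
print(pam_matches("NRHABT", "TGAACT"))
print(pam_degeneracy("NNNYCC"))
```

A longer degenerate PAM is not necessarily more restrictive: what matters is how many positions are fixed, which `pam_degeneracy` makes explicit.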
  • sgRNAs targeting the EGFP coding sequence (three for each tested CRISPR-Cas polypeptide) were designed for all of the CRISPR-Cas polypeptides and introduced by electroporation into U2OS cells stably expressing a single copy of an EGFP reporter.
  • Three different sgRNAs for each nuclease were amplified using the primers listed in Table 23 and were tested in three replicate experiments.
  • Most of the tested guides (e.g., sgRNA guides), in combination with their respective CRISPR-Cas polypeptides, significantly reduced EGFP expression in target cells, to levels similar to the SpCas9 benchmark.
  • The FRDB, DNDB and ARHP CRISPR-Cas polypeptides showed high knock-out activity (>80% EGFP KO) with multiple guides; the remaining CRISPR-Cas polypeptides showed appreciable knock-out activity (>60% EGFP KO) with at least one of the tested sgRNAs.
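The two activity tiers described above (>80% EGFP KO with multiple guides, >60% EGFP KO with at least one guide) can be expressed as a small classification rule. This is only a sketch: the function name `classify_knockout` is illustrative, "multiple guides" is read here as at least two, and the per-guide percentages in the usage lines are hypothetical placeholders, not measurements from the patent.

```python
from typing import Sequence


def classify_knockout(
    ko_percentages: Sequence[float],
    high_threshold: float = 80.0,         # ">80% EGFP KO" in the text
    appreciable_threshold: float = 60.0,  # ">60% EGFP KO" in the text
    min_high_guides: int = 2,             # assumption: "multiple" means >= 2
) -> str:
    """Assign a nuclease to an activity tier from its per-guide EGFP KO values."""
    n_high = sum(1 for ko in ko_percentages if ko > high_threshold)
    if n_high >= min_high_guides:
        return "high"
    if any(ko > appreciable_threshold for ko in ko_percentages):
        return "appreciable"
    return "below threshold"


# Hypothetical per-guide KO percentages for three sgRNAs each (not patent data).
print(classify_knockout([85.0, 90.0, 40.0]))
print(classify_knockout([65.0, 30.0, 10.0]))
print(classify_knockout([20.0, 15.0, 5.0]))
```

Both thresholds are strict inequalities, matching the ">" used in the text, so a guide at exactly 80% or 60% does not count toward the corresponding tier.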

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmaceuticals Containing Other Organic And Inorganic Compounds (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)

Abstract

The invention relates to compositions and methods for novel Cas9 orthologues, including, among other things, systems comprising novel Cas9 nucleases and at least one guide RNA molecule. The present disclosure also describes methods of cleaving or editing a target polynucleotide and methods of genomically modifying a target sequence.
PCT/EP2024/070604 2023-07-21 2024-07-19 Cas9 orthologue nuclease and uses thereof Pending WO2025021702A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363515022P 2023-07-21 2023-07-21
US63/515,022 2023-07-21

Publications (1)

Publication Number Publication Date
WO2025021702A1 (fr) 2025-01-30

Family

ID=92106990

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2024/070604 Pending WO2025021702A1 (fr) 2023-07-21 2024-07-19 Cas9 orthologue nuclease and uses thereof

Country Status (1)

Country Link
WO (1) WO2025021702A1 (fr)

Patent Citations (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4833080A (en) 1985-12-12 1989-05-23 President And Fellows Of Harvard College Regulation of eucaryotic gene expression
US10323123B2 (en) 2012-11-30 2019-06-18 Cambridge Display Technology Limited Method of forming polymers
US8697359B1 (en) 2012-12-12 2014-04-15 The Broad Institute, Inc. CRISPR-Cas systems and methods for altering expression of gene products
US20150059010A1 (en) 2013-08-22 2015-02-26 Pioneer Hi-Bred International Inc Genome modification using guide polynucleotide/cas endonuclease systems and methods of use
US20150082478A1 (en) 2013-08-22 2015-03-19 E I Du Pont De Nemours And Company Plant genome modification using guide rna/cas endonuclease systems and methods of use
US9982278B2 (en) 2014-02-11 2018-05-29 The Regents Of The University Of Colorado, A Body Corporate CRISPR enabled multiplexed genome engineering
US10240167B2 (en) 2014-02-11 2019-03-26 Inscripta, Inc. CRISPR enabled multiplexed genome engineering
US10266849B2 (en) 2014-02-11 2019-04-23 The Regents Of The University Of Colorado, A Body Corporate CRISPR enabled multiplexed genome engineering
US10465207B2 (en) 2014-02-11 2019-11-05 The Regents Of The University Of Colorado, A Body Corporate CRISPR enabled multiplexed genome engineering
US10351877B2 (en) 2014-02-11 2019-07-16 The Regents Of The University Of Colorado, A Body Corporate CRISPR enabled multiplexed genome engineering
US10364442B2 (en) 2014-02-11 2019-07-30 The Regents Of The University Of Colorado, A Body Corporate CRISPR enabled multiplexed genome engineering
US10435715B2 (en) 2014-02-11 2019-10-08 The Regents Of The University Of Colorado, A Body Corporate CRISPR enabled multiplexed genome engineering
US10415058B2 (en) 2017-09-30 2019-09-17 Inscripta, Inc. Automated nucleic acid assembly and introduction of nucleic acids into cells
US10435713B2 (en) 2017-09-30 2019-10-08 Inscripta, Inc. Flow through electroporation instrumentation
US10443074B2 (en) 2017-09-30 2019-10-15 Inscripta, Inc. Modification of cells by introduction of exogenous material
WO2019102381A1 (fr) 2017-11-21 2019-05-31 Casebia Therapeutics Llp Materials and methods for the treatment of autosomal dominant retinitis pigmentosa
WO2019165168A1 (fr) 2018-02-23 2019-08-29 Pioneer Hi-Bred International, Inc. Novel Cas9 orthologues
WO2020012335A1 (fr) 2018-07-10 2020-01-16 Alia Therapeutics S.R.L. Vesicles for traceless delivery of guide RNA molecules and/or of guide RNA molecule/RNA-guided nuclease complex(es), and a production method thereof
WO2020027982A1 (fr) * 2018-08-02 2020-02-06 Editas Medicine, Inc. Compositions and methods for treating CEP290-associated disease
US10533152B1 (en) 2018-08-14 2020-01-14 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US10532324B1 (en) 2018-08-14 2020-01-14 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US10550363B1 (en) 2018-08-14 2020-02-04 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
WO2020053224A1 (fr) 2018-09-11 2020-03-19 INSERM (Institut National de la Santé et de la Recherche Médicale) Methods for increasing fetal hemoglobin content in eukaryotic cells and uses thereof for the treatment of hemoglobinopathies
WO2021118626A1 (fr) 2019-12-10 2021-06-17 Inscripta, Inc. Novel MAD nucleases
WO2023118349A1 (fr) 2021-12-21 2023-06-29 Alia Therapeutics Srl Type II Cas proteins and applications thereof

Non-Patent Citations (67)

* Cited by examiner, † Cited by third party
Title
"Methods in Enzymology", vol. 266, 1996, ACADEMIC PRESS, INC., article "Computer Methods for Macromolecular Sequence Analysis"
"Remington's Pharmaceutical Sciences", 1980, MACK PUBLISHING CO.
ALIREZA EDRAKI ET AL: "A Compact, High-Accuracy Cas9 with a Dinucleotide PAM for In Vivo Genome Editing", MOLECULAR CELL, vol. 73, no. 4, 20 December 2018 (2018-12-20), AMSTERDAM, NL, pages 714 - 726.e4, XP055585186, ISSN: 1097-2765, DOI: 10.1016/j.molcel.2018.12.003 *
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410
ANTONIO CASINI ET AL: "A highly specific SpCas9 variant is identified by in vivo screening in yeast", NATURE BIOTECHNOLOGY, vol. 36, no. 3, 29 January 2018 (2018-01-29), New York, pages 265 - 271, XP055619847, ISSN: 1087-0156, DOI: 10.1038/nbt.4066 *
BARRANGOUDOUDNA, NAT BIOTECHNOL., vol. 34, no. 9, 2016, pages 933 - 941
BARRANGOUMAY, EXPERT OPIN BIOL THER, vol. 15, 2015, pages 311 - 314
BENSON ET AL., NUCLEIC ACIDS RES., vol. 41, 2013, pages 36 - 42
BISWAS ET AL., BMC GENOMICS, vol. 17, 2016, pages 356
BUJARDGOSSEN, PNAS, vol. 89, no. 12, 1992, pages 5547 - 5551
BUTIUC-KEUL ET AL., MICROB PHYSIOL, vol. 32, 2022, pages 2 - 17
CAHILL ET AL., FRONT. BIOSCI., vol. 11, 2006, pages 1958 - 1976
CAMARILLO-GUERRERO ET AL., CELL, vol. 184, 2021, pages 1098 - 1109
CHYLINKSI ET AL., NUCLEIC ACIDS RES., vol. 42, no. 10, 2014, pages 6091 - 6105
CHYOURBROWN, RNA BIOL., vol. 16, no. 4, 2019, pages 423 - 434
CICIANI ET AL., NAT COMMUN, vol. 13, 2022, pages 6474
CICIANI MATTEO ET AL: "Automated identification of sequence-tailored Cas9 proteins using massive metagenomic data", NATURE COMMUNICATIONS, vol. 13, no. 1, 29 October 2022 (2022-10-29), XP093028371, Retrieved from the Internet <URL:https://www.nature.com/articles/s41467-022-34213-9.pdf> DOI: 10.1038/s41467-022-34213-9 *
DATABASE Geneseq [online] 16 July 2015 (2015-07-16), CHARPENTIER E: "Type II CRISPR-Cas system tracrRNA, SEQ ID 2746.", XP093216667, retrieved from EBI accession no. GSN:BCA35771 Database accession no. BCA35771 *
DATABASE Genseq [online] 9 October 2014 (2014-10-09), ZHANG F: "Roseburia intestinalis guide RNA, SEQ ID 154.", XP093216675, Database accession no. BBM45310 *
DATABASE UniProt [online] 7 November 2018 (2018-11-07), CHAUMEIL R A: "CRISPR-associated endonuclease Cas9", XP093216219, Database accession no. A0A354D3T4 *
DOOLEY ET AL., CRISPR J., vol. 4, no. 3, 2021, pages 438 - 447
DOUDNA., NATURE, vol. 578, 2020, pages 229 - 236
DOUDNACHARPENTIER, SCIENCE, vol. 28, no. 6213, 2014, pages 11223096
DUCOEUR ET AL., STRATEGIES, vol. 5, no. 3, 1992, pages 70 - 72
EDGAR, BIOINFORMATICS, vol. 26, 2010, pages 2460 - 2461
EDGAR, BMC BIOINFORMATICS, vol. 5, 2004, pages 113
EDGAR, NUCLEIC ACIDS RESEARCH, vol. 32, no. 5, 2004, pages 1792 - 97
ERIN R. BURNIGHT ET AL: "CRISPR-Cas9 genome engineering: Treating inherited retinal degeneration", PROGRESS IN RETINAL AND EYE RESEARCH, vol. 65, 1 July 2018 (2018-07-01), GB, pages 28 - 49, XP055751294, ISSN: 1350-9462, DOI: 10.1016/j.preteyeres.2018.03.003 *
GARDNER ET AL., NUCLEIC ACIDS RES., vol. 39, 2011, pages 5845 - 5852
GARNEAU ET AL., NATURE, vol. 468, 2010, pages 67 - 71
GASIUNAS ET AL., NAT. COMMUN., vol. 11, 2020, pages 5512
GASIUNAS GIEDRIUS ET AL: "A catalogue of biochemically diverse CRISPR-Cas9 orthologs", NATURE COMMUNICATIONS, vol. 11, no. 1, 2 November 2020 (2020-11-02), pages 1 - 10, XP055929454, Retrieved from the Internet <URL:http://www.nature.com/articles/s41467-020-19344-1> DOI: 10.1038/s41467-020-19344-1 *
HENIKOFFHENIKOFF, PROC. NATL. ACAD. SCI. USA, vol. 89, 1989, pages 10915
HORVATH ET AL., J. BACTERIOL., vol. 190, 2008, pages 1401 - 1412
HSU ET AL., CELL, vol. 157, 2013, pages 11228 - 1278
J. MOL. BIOL., vol. 48, 1970, pages 443 - 453
JINEK ET AL., SCIENCE, vol. 337, 2012, pages 816 - 821
JINEK ET AL., SCIENCE, vol. 337, 2021, pages 816 - 821
JORE, M.M. ET AL., NAT. STRUCT. MOL. BIOL., vol. 18, 2011, pages 529 - 536
KARVELIS ET AL.: "A pipeline for characterization of novel Cas9 orthologs", METHODS IN ENZYMOLOGY, vol. 616, 2019, pages 219 - 240, XP055902982, DOI: 10.1016/bs.mie.2018.10.021
KATOH ET AL., NUCLEIC ACIDS RESEARCH, vol. 30, no. 14, 2002, pages 3059 - 3066
LANDRUM ET AL., NUCLEIC ACIDS RES., vol. 46, 2018, pages 1062 - 1067
MAKAROVA ET AL., NAT REV MICROBIOL, vol. 18, no. 2, 2020, pages 67 - 83
MARTIN ET AL., HELV. CHIM. ACTA, vol. 78, 1995, pages 486
METH. MOL. BIOL., vol. 70, 1997, pages 173 - 187
MIYAGISHI ET AL., NATURE BIOTECHNOLOGY, vol. 20, 2002, pages 497 - 500
MULLICK ET AL., BMC BIOTECHNOLOGY, vol. 6, 2006, pages 43
NAKAMURA, Y. ET AL.: "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", NUCL. ACIDS RES., vol. 28, 2000, pages 292, XP002941557, DOI: 10.1093/nar/28.1.292
NAYFACH S ET AL., NAT. MICROBIOL., vol. 6, 2021, pages 960 - 970
NO ET AL., PNAS, vol. 93, no. 8, 1996, pages 3346 - 3351
PASOLLI ET AL., CELL, vol. 176, 2019, pages 649 - 662
RAN ET AL., NATURE, vol. 520, 2015, pages 186 - 191
RUSSEL ET AL., CRISPR J, vol. 3, 2020, pages 462 - 469
SHIELDS ET AL., PLOS PATHOG., vol. 16, 2020, pages e1008344
SHMAKOV ET AL., CRISPR J., vol. 3, no. 6, 2020, pages 462 - 469
SINKUNAS, T. ET AL., EMBO J., vol. 32, 2013, pages 385 - 394
SMITHJOHNSON, GENE, vol. 67, 1988, pages 31 - 40
SMITHWATERMAN, ADV. APPL. MATH, vol. 2, 1981, pages 482 - 489
TAREENKINNEY, BIOINFORMATICS, vol. 36, 2020, pages 2272 - 2274
TONG ET AL., FRONT CELL DEV BIOL, vol. 8, 2020, pages 622103
WEIZHONGGODZIK, BIOINFORMATICS, vol. 22, no. 13, 2006, pages 1658 - 1659
WESTRA, E.R. ET AL., MOLECULAR CELL, vol. 46, 2012, pages 595 - 605
WYBORSKI ET AL., ENVIRON MOL MUTAGEN, vol. 28, no. 4, 1996, pages 447 - 58
XIA ET AL., NUCLEIC ACIDS RES., vol. 17, 1 September 2003 (2003-09-01), pages 31
ZHANG ET AL., NUCLEIC ACIDS RESEARCH, vol. 24, 1996, pages 543 - 548
ZHANGMADDEN, GENOME RES., vol. 7, 1997, pages 649 - 656
ZOLFO M ET AL.: "Detecting contamination in viromes using ViromeQC.", NAT. BIOTECHNOL., vol. 37, 2019, pages 1408 - 1412, XP036954237, DOI: 10.1038/s41587-019-0334-5

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24748310

Country of ref document: EP

Kind code of ref document: A1