WO2025132815A1 - Novel cas nucleases and polynucleotides encoding the same - Google Patents
Novel cas nucleases and polynucleotides encoding the same
- Publication number
- WO2025132815A1 (PCT/EP2024/087441)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- seq
- nuclease
- cell
- sequence
- polypeptide
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/52—Genes encoding for enzymes or proenzymes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/80—Vectors or expression systems specially adapted for eukaryotic hosts for fungi
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
- C12N9/222—Clustered regularly interspaced short palindromic repeats [CRISPR]-associated [CAS] enzymes
- C12N9/226—Class 2 CAS enzyme complex, e.g. single CAS protein
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12R—INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
- C12R2001/00—Microorganisms ; Processes using microorganisms
- C12R2001/645—Fungi ; Processes using fungi
- C12R2001/66—Aspergillus
- C12R2001/685—Aspergillus niger
Definitions
- the present invention relates to novel CRISPR-associated (Cas) nucleases, variants thereof, and polynucleotides encoding the same.
- the invention also relates to nucleic acid constructs, vectors, and host cells comprising the polynucleotides and Cas nuclease, as well as fusion polypeptides, gene editing methods, methods of producing the Cas nuclease, formulations comprising the Cas nuclease, and use of the Cas nuclease.
- RNA-guided nucleases particularly CRISPR-associated (Cas) proteins.
- CRISPR-associated (Cas) proteins allow specific targeting of genetic sequences using guide RNA, streamlining genome editing by eliminating the need for custom-engineered nucleases.
- RNA-guided nuclease systems, including CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA), offer versatile genome editing options, from introducing mutations via non-homologous end-joining (NHEJ) to precise base editing when fused with deaminases.
- NHEJ non-homologous end-joining
- RNA-guided nucleases bind and cleave nucleic acids with sequence-specificity. They exhibit activities like cis cleavage or nickase activity, guided by specialized RNA molecules. These nucleases can be engineered to reduce catalytic activity while maintaining sequence specificity, expanding their utility.
- CRISPR systems in bacterial and archaeal adaptive immunity display diverse characteristics. Differences in size, PAM site, on-target activity, and cleavage pattern offer unique advantages for various applications, but can also represent limitations (e.g., low frequency of PAM sites in the target cell genome, or low expressability). Novel Cas nucleases are essential to address evolving genome engineering demands.
Summary of the Invention
- the inventors of instant invention have identified novel Cas nucleases which showed nuclease activities across different organisms, including bacterial and fungal species.
- the novel nucleases possess several advantages over the nucleases known from the prior art.
- the herein identified novel Cas nucleases are compatible with more flexible PAM sites, resulting in a higher number of possible target sites per genome.
- the herein identified Cas nucleases have a relatively small size and have thus a promising outlook to be used in pharmaceutical approaches.
- the relatively small nuclease size is beneficial for high expression in recombinant cells, particularly bacterial cells.
- Cas nucleases that have different PAM specificities.
- Cas nucleases, such as Cas9 from S. pyogenes (spCas9), require a canonical “nGG” PAM sequence to bind a particular nucleic acid region. This may limit the ability to target desired bases within a genome.
- the Cas nucleases provided herein may need to be placed at a precise location, for example where a target base is placed within a 4-base region (e.g., an “editing window”), which is approximately 15 bases upstream of the PAM.
- the invention, in a 2nd aspect, relates to a fusion polypeptide comprising the Cas nuclease of the 1st aspect, and one or more second polypeptide.
- the present invention relates to a non-naturally occurring composition
- a non-naturally occurring composition comprising (i) the Cas nuclease of the 1st aspect and/or the fusion polypeptide of the 2nd aspect, or (ii) a nucleic acid molecule comprising a sequence encoding the Cas nuclease of the 1st aspect and/or the fusion polypeptide of the 2nd aspect.
- the present invention relates to a method of modifying a nucleotide sequence at a DNA target site in the genome of a cell, comprising introducing into the cell the Cas nuclease of the 1st aspect, the fusion polypeptide according to the 2nd aspect, the composition of the 3rd aspect, the polynucleotide of the 5th aspect, and/or the nucleic acid construct or expression vector of the 6th aspect.
- the present invention relates to a polynucleotide encoding the Cas nuclease of the 1st aspect, and/or the fusion polypeptide of the 2nd aspect.
- the present invention relates to a nucleic acid construct or expression vector comprising the polynucleotide of the 5th aspect, operably linked to one or more control sequences that direct the production of the polypeptide in a cell.
- the present invention relates to a formulation comprising (i) the Cas nuclease according to the 1st aspect, the fusion polypeptide according to the 2nd aspect, a composition according to the 3rd aspect, the polynucleotide according to the 5th aspect, the nucleic acid construct or expression vector according to the 6th aspect, the cell according to the 7th aspect, or the cell according to the 8th aspect, and optionally, (ii) one or more of a lipid, a liposome, a hydrogel, a microparticle, a nanoparticle, or a block copolymer micelle.
- Figure 1 shows the bioinformatic pipeline developed to identify novel CRISPR-Cas systems in genome sequences.
- Genome sequences from various sources are processed with three state-of-the-art sequence mining tools to identify Cas nuclease genes, CRISPR arrays, and tracrRNA sequences.
- the pool of potential Cas proteins is further enriched by scanning the genomes with custom HMMs and filtered for the presence of domains and residues required for endonuclease activity.
- the three functional elements Cas, CRISPR array, and tracrRNA are mapped to each other via their genomic locus as well as the complementarity of the CRISPR repeats and the tracrRNAs.
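The complementarity check between a CRISPR repeat and a candidate tracrRNA can be illustrated with a minimal Python sketch. It is not the pipeline of Figure 1: the function names and the toy sequences are hypothetical, and it only considers perfect Watson-Crick pairing on DNA-level sequences, whereas a real pipeline would likely tolerate mismatches and G-U wobble pairs.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def longest_antirepeat_match(repeat: str, tracr: str) -> int:
    """Length of the longest stretch of the repeat whose reverse complement
    occurs in the tracrRNA-encoding DNA (perfect pairing only)."""
    repeat_rc = reverse_complement(repeat)
    tracr = tracr.upper()
    best = 0
    for start in range(len(repeat_rc)):
        # Only test substrings longer than the best match found so far.
        for end in range(start + best + 1, len(repeat_rc) + 1):
            if repeat_rc[start:end] in tracr:
                best = end - start
            else:
                break  # extending a missing substring cannot produce a hit
    return best


# Hypothetical toy sequences, not taken from the SEQ ID listing.
toy_repeat = "GTTTTAGAGC"
full_pairing = longest_antirepeat_match(toy_repeat, "AAGCTCTAAAACGG")
partial_pairing = longest_antirepeat_match(toy_repeat, "GGCTAAAACGG")
```

A candidate tracrRNA whose longest paired stretch covers most of the repeat would then be assigned to the same locus as that CRISPR array; the cutoff to use is a design choice the source does not state.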
- Figures 3-6 show ratios of white/black colonies after transformation of A. niger with CRISPR nucleases.
- Figures 7-10 show insertions/deletions generated by the CRISPR nucleases in the A. niger genome.
- Figures 11-13 show CRISPR plasmids used in A. niger.
- Figure 14 shows a schematic drawing of the plasmid pTNA665 (PamyL-NZ0076).
- Figure 15 shows a schematic drawing of the plasmid pTNA666 (Pgrac-NZ0076).
- Figure 16 shows a schematic drawing of the plasmid pTNA669 (separate guide RNA for NZ0076).
- Figure 17 shows a schematic drawing of the plasmid pTNA670 (single guide RNA for NZ0076).
- Figure 18 shows the killing effect based on nuclease activity in E. coli.
- Figure 19 shows gene editing efficiency in E. coli.
- Figure 20 shows a ribbon alignment between the protein structures from nucleases 0076 and 0172.
- Figure 21 shows a ribbon alignment between the protein structures from nucleases 0100 and 0172.
- Figure 22 shows a ribbon alignment between the protein structures from nucleases 0076 and 0102.
- Figure 23 shows a ribbon alignment between the protein structures from nucleases 0076 and S. pyogenes Cas9.
- Figure 24 shows a ribbon alignment between the protein structures from nucleases 0076 and 0100.
- the nuclease comprises one or more functional RuvC domains.
- the RuvC domain comprises or consists of an amino acid sequence having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the amino acid sequence of SEQ ID NOs: 105-107, 111-113, 108-110, 135-137, or 313 - 318.
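As a rough illustration of how a sequence identity threshold of this kind can be evaluated, the Python sketch below computes percent identity from two sequences that have already been aligned. The identity definition used here (identical residues divided by the number of alignment columns without gaps) is an assumption for illustration; this excerpt does not state which alignment program or formula applies, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Both inputs must come from the same alignment and have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the identity is at least the given percentage."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy alignment: 4 identical residues in 5 gap-free columns.
toy_identity = percent_identity("MKT-LV", "MKTALI")
```

With these toy inputs, `meets_threshold("MKT-LV", "MKTALI", 80.0)` holds while a 90% threshold does not.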
- the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus subtilis cell.
- the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus subtilis cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 528, 549 or 550.
- the polynucleotide encoding the nuclease is codon-optimized for expression in a S. thermophilus cell. In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a filamentous fungal cell.
- the polynucleotide encoding the nuclease is codon-optimized for expression in a P. pastoris cell.
- the polynucleotide encoding the nuclease is codon-optimized for expression in an Aspergillus oryzae cell.
- the polynucleotide encoding the nuclease is codon-optimized for expression in a Trichoderma reesei cell.
- the polynucleotide encoding the nuclease is codon-optimized for expression in a Lactobacillus cell.
- the polynucleotide encoding the nuclease is codon-optimized for expression in a probiotic cell.
- the polynucleotide encoding the nuclease is codon-optimized for expression in a S. cerevisiae cell.
- the nuclease is a Class 2 Cas nuclease.
- the nuclease is a Class 2 Type II Cas nuclease.
- the nuclease is a Class 2 Type II-A Cas nuclease.
- the nuclease is a Class 2 Type II-B Cas nuclease.
- the nuclease is a Class 2 Type II-C Cas nuclease.
- the nuclease utilizes a protospacer adjacent motif (PAM) sequence provided for the nuclease in Table 1.
- PAM protospacer adjacent motif
- the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “nnAY”.
- the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “nnGHMA”.
- the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “nnGTA”.
- the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “nnAMA”.
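The PAM sequences above use IUPAC degenerate nucleotide codes (n = any base, Y = C or T, H = A, C or T, M = A or C). A minimal Python sketch of how such motifs could be located in a target sequence follows. The function names and the toy sequence are hypothetical, and only the given strand is scanned; a practical tool would also scan the reverse complement and relate each PAM to its protospacer.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam: str) -> str:
    """Translate a degenerate PAM such as 'nnGHMA' into a regex string."""
    return "".join(IUPAC[code] for code in pam.upper())


def find_pam_sites(sequence: str, pam: str) -> list[int]:
    """Return 0-based start positions of every PAM match on the given strand.

    A lookahead is used so that overlapping matches are all reported.
    """
    pattern = re.compile("(?=(" + pam_to_regex(pam) + "))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Hypothetical toy target sequence.
toy_target = "CCGTATTAC"
sites_nngta = find_pam_sites(toy_target, "nnGTA")
sites_nnay = find_pam_sites(toy_target, "nnAY")
```

Counting the matches for each motif over a genome gives a direct estimate of the number of candidate target sites, which is the quantity the more flexible PAMs are said to increase.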
- the nuclease is non-naturally occurring, e.g., wherein the nuclease is engineered and comprises unnatural or synthetic amino acids.
- the HNH domain of SEQ ID NO:1 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid K640 corresponding to position 640 of SEQ ID NO: 1.
- the HNH domain of SEQ ID NO:21 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D593 corresponding to position 593 of SEQ ID NO: 21.
- the HNH domain of SEQ ID NO:21 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid N617 corresponding to position 617 of SEQ ID NO: 21.
- the HNH domain of SEQ ID NO:21 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid K620 corresponding to position 620 of SEQ ID NO: 21.
- the HNH domain of SEQ ID NO:40 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid H611 corresponding to position 611 of SEQ ID NO: 40.
- the HNH domain of SEQ ID NO:40 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid N634 corresponding to position 634 of SEQ ID NO: 40.
- the HNH domain of SEQ ID NO:40 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid K637 corresponding to position 637 of SEQ ID NO: 40.
- the HNH domain of SEQ ID NO:39 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D601 corresponding to position 601 of SEQ ID NO: 39. In one embodiment, the HNH domain of SEQ ID NO:39 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid H602 corresponding to position 602 of SEQ ID NO: 39.
- the HNH domain of SEQ ID NO:39 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid N625 corresponding to position 625 of SEQ ID NO: 39.
- the HNH domain of SEQ ID NO:39 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid K628 corresponding to position 628 of SEQ ID NO: 39.
- the HNH domain of SEQ ID NO:48 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D605 corresponding to position 605 of SEQ ID NO: 48.
- the HNH domain of SEQ ID NO:48 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid H606 corresponding to position 606 of SEQ ID NO: 48.
- the HNH domain of SEQ ID NO:48 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid N629 corresponding to position 629 of SEQ ID NO: 48.
- the HNH domain of SEQ ID NO:48 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid K632 corresponding to position 632 of SEQ ID NO: 48.
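The position-based notation used above (for example K640, meaning lysine at position 640 of the reference sequence) can be made concrete with a small Python sketch that applies a single substitution after checking that the expected residue is actually present. The helper name, the toy sequence, and the choice of alanine as replacement are assumptions for illustration; the text only specifies deletion, insertion or substitution at the position, not which residue to introduce.

```python
def substitute_residue(sequence: str, position: int,
                       expected: str, replacement: str) -> str:
    """Replace the residue at a 1-based position, e.g. K640 -> A.

    Raises ValueError if the position is out of range or if the residue
    found there is not the expected one, which guards against numbering
    errors relative to the reference sequence.
    """
    index = position - 1  # patent numbering is 1-based
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != expected:
        raise ValueError(
            f"expected {expected} at position {position}, "
            f"found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


# Hypothetical 5-residue toy sequence, not one of SEQ ID NOs: 1-52.
toy_variant = substitute_residue("MKHNK", 3, "H", "A")
```

The residue check matters because the same catalytic residue sits at different positions in each nuclease (for example H602, H606 and H611 in the embodiments above).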
- the polypeptide is derived from SEQ ID NO: 1 - 52 by substitution, deletion or addition of one or several amino acids.
- the polypeptide is a variant of SEQ ID NOs: 1 - 52 comprising a substitution, deletion, and/or insertion at one or more positions.
- the number of amino acid substitutions, deletions and/or insertions introduced into the polypeptide of SEQ ID NOs: 1-52 is up to 15, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15.
- the amino acid changes may be of a minor nature, that is conservative amino acid substitutions or insertions that do not significantly affect the folding of the Cas nuclease; small deletions, typically of 1-30 amino acids; small amino or carboxyl-terminal extensions, such as an amino-terminal methionine residue; a small linker peptide of up to 20-25 residues; or a small extension that facilitates purification by changing net charge or another function, such as a polyhistidine tract, an antigenic epitope or a binding module.
- the nuclease is a nickase having one or more inactivated RuvC domain created by an amino acid substitution, insertion, or deletion at a position provided for the nuclease in column 3 of Table 2.
- the RuvC domain is derived from an amino acid sequence provided for in column 2 of Table 2, and/or at one or more positions provided for in column 3 of Table 2, by substitution, deletion or addition of one or several amino acids.
- the number of amino acid substitutions, deletions and/or insertions introduced into the RuvC domain is up to 15, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15.
- the Cas nuclease is a polypeptide obtained from a Lentihominibacter cell, e.g., a Lentihominibacter hominis cell.
- the Cas nuclease is a polypeptide obtained from a Vagococcus cell, e.g., a Vagococcus penaei cell.
- a cell from an Alicyclobacillus cell, e.g., an Alicyclobacillus sacchari cell, from an Enterococcus cell, e.g., an Enterococcus gilvus, an Enterococcus hermanniensis, or an Enterococcus asini cell, from a Companilactobacillus cell, e.g., a Companilactobacillus zhachilii, a Companilactobacillus halodurans, a Companilactobacillus keshanensis, a Companilactobacillus suantsaicola, or a Companilactobacillus hulinensis cell, from a Bombilactobacillus cell, e.g., a Bombilactobacillus apium cell, or from a Vagococcus cell, e.g., a Vagococcus penaei cell
- the Cas nuclease is obtained from or obtainable from a Lactobacillus cell, e.g., Lactobacillus sp., Lactobacillus farciminis (DSM 20184), Lactobacillus farciminis, Lactobacillus murinus, Lactobacillus ruminis, Lactobacillus salivarius, Lactobacillus jensenii, Lactobacillus hamster, Lactobacillus delbrueckii, Lactobacillus johnsonii, Lactobacillus plantarum, Lactobacillus rhamnosus, or Lactobacillus gallinarum cell.
- the invention encompasses both the perfect and imperfect states, and other taxonomic equivalents, e.g., anamorphs, regardless of the species name by which they are known. Those skilled in the art will readily recognize the identity of appropriate equivalents.
- TM-align is applied.
- TM-score is integrated in the TM-align software, which is available from the author’s website.
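For orientation, the TM-score itself can be sketched in Python from its published definition: a length-normalized sum over aligned residue pairs, with a distance scale d0 that depends on the length of the reference protein. This is only an illustration of the scoring formula for one fixed superposition, not a substitute for TM-align, which additionally searches for the alignment and superposition that maximize the score. The function names and the lower bound of 0.5 on d0 are assumptions made here.

```python
def tm_d0(reference_length: int) -> float:
    """Distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8 for reference length L.

    A floor of 0.5 is applied for very short chains (an assumption made
    here so that the formula stays well defined).
    """
    if reference_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (reference_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: list[float], reference_length: int) -> float:
    """TM-score for one superposition.

    `distances` holds the distance (in angstroms) between each aligned
    residue pair; unaligned residues contribute nothing to the sum.
    """
    d0 = tm_d0(reference_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / reference_length


# Toy inputs: a perfect superposition, and half the residues aligned at d0.
perfect = tm_score([0.0] * 100, 100)
half_at_d0 = tm_score([tm_d0(100)] * 50, 100)
```

Because the sum is divided by the reference length, the score depends on which protein is treated as the reference, which is why the text distinguishes the reference from the query protein.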
- the version of TM-align is preferably updated 2019-08-22 or later, and the TM-score between a reference and query protein is determined by running this command:
- the one or more RNA molecule is a single-molecule RNA (sgRNA), e.g., wherein the crRNA and the tracrRNA are part of the same RNA molecule.
- sgRNA single-molecule RNA
- the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO
- the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 53.
- the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 92.
- the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 100.
- the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 81.
- the one or more RNA molecule comprises a trans activating RNA (tracrRNA) encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NOs: 157 - 208.
- tracrRNA trans activating RNA
- the one or more RNA molecule comprises a trans activating RNA (tracrRNA) encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NO: 157, SEQ ID NO: 177, SEQ ID NO: 196, SEQ ID NO: 195, SEQ ID NO: 204, or SEQ ID NO: 185.
- At least one of the one or more RNA molecule comprises a CRISPR RNA (crRNA) molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NOs: 209 - 260.
- crRNA CRISPR RNA
- At least one of the one or more RNA molecule comprises a CRISPR RNA (crRNA) molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NO: 209, SEQ ID NO: 229, SEQ ID NO: 248, SEQ ID NO: 247, SEQ ID NO: 256, or SEQ ID NO: 237.
- At least one of the one or more RNA molecule comprises or consists of a RNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NOs: 261 - 312.
- At least one of the one or more RNA molecule comprises or consists of an RNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NO: 261, SEQ ID NO: 281, SEQ ID NO: 300, SEQ ID NO: 299, SEQ ID NO: 308, or SEQ ID NO: 289.
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any amino acid sequence of column 1 in Table 4, and the at least one RNA molecule is a RNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 1
- the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleot
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 21
- the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleot
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 40, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleo
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 39
- the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleot
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 48, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleo
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 1
- the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 21
- the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleot
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 40, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 39
- the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 48, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 29, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of S
- the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any amino acid sequence of column 1 in Table 4, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any of the polynucleotide sequences of column
- the composition further comprises a base editor enzyme.
- the base editor enzyme is an adenosine deaminase or a cytidine deaminase.
- the composition further comprises a reverse transcriptase enzyme.
- Table 4 discloses which crRNA and tracrRNA coding sequences are associated with each of the novel Cas nucleases.
- the Cas nuclease 0076 with SEQ ID NO: 1 utilizes the crRNA sequence encoded by SEQ ID NO: 209, and the tracrRNA sequence encoded by SEQ ID NO: 157.
- the Cas nuclease with SEQ ID NO: 1 utilizes a gRNA or sgRNA sequence encoded by SEQ ID NO: 261, comprising both the crRNA sequence encoded by SEQ ID NO: 209 and the tracrRNA sequence encoded by SEQ ID NO: 157.
- the DNA-break is a double-strand break.
- the method is carried out under conditions that are permissive for non-homologous end joining (NHEJ) and homology-directed repair (HDR).
- the method is carried out under conditions that are permissive for non-homologous end joining (NHEJ).
- the method is carried out under conditions that are permissive for homology-directed repair (HDR).
- the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to a PAM sequence, e.g., adjacent to the PAM sequence “nnAY”, or adjacent to any one of the PAM sequences mentioned in Table 1.
- the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to the PAM sequence “nnGHMA”, e.g., “nnGHCA”, “nnGGCA”, or “nnGCMA”.
- the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand, e.g., an Aspergillus DNA strand, adjacent to the PAM sequence “nnGTA”.
- the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to the PAM sequence “nnAMA”, e.g., “nnAAA”.
- the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to the PAM sequence “nnRHRD”, e.g., “nnACAG”, “nnACAR”, “nnATGT”, or “nnAYRD”.
- the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand, e.g., a Bacillus subtilis DNA strand, adjacent to the PAM sequence “ATGTCA”, “CCATA”, “TTACA”, or “TTACAA”. In one embodiment, the Cas nuclease or fusion polypeptide effects a DNA-break in an Aspergillus DNA strand adjacent to the PAM sequence “nnGHCA”, “nnGTA”, or “nnACAG”.
- the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to a sequence that is complementary to the PAM sequence.
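The degenerate PAM sequences listed above appear to use the standard IUPAC nucleotide codes (n = any base, Y = C/T, H = A/C/T, M = A/C, R = A/G, D = A/G/T). The following is a minimal sketch, under that assumption, of how such a PAM could be located in a DNA sequence. The names `pam_to_regex` and `find_pam_sites` are illustrative and do not come from the disclosure.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]", "S": "[CG]",
    "W": "[AT]", "H": "[ACT]", "B": "[CGT]", "V": "[ACG]", "D": "[AGT]",
}


def pam_to_regex(pam):
    """Translate a degenerate PAM such as 'nnGHCA' into a regex string."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_pam_sites(seq, pam):
    """Return 0-based start positions of all (overlapping) PAM matches."""
    # A lookahead group lets overlapping matches be reported.
    pattern = re.compile("(?=(" + pam_to_regex(pam) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]
```

The sketch scans only the strand it is given; a site on the opposite strand would require scanning the reverse complement as well.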
- the target site is within a coding region of a protein.
- the target site is within a non-coding region of a protein.
- the target site is within a regulatory region of a protein, e.g., a promoter.
- the cell is a eukaryotic cell.
- the cell is a prokaryotic cell.
- the cell is a eukaryotic cell, such as a mammalian cell, a human cell, or a non-human mammalian cell, e.g., a BHK cell, a CHO cell, a mouse cell, a hamster cell, or a rat cell.
- the cell is a fungal cell, such as a filamentous fungal cell, or a yeast cell.
- the fungal cell is a Pichia cell, e.g., a Pichia pastoris cell.
- the cell is a yeast cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
- the cell is a filamentous fungal cell e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger
- the cell is a Trichoderma cell.
- the cell is a Trichoderma reesei cell.
- the cell is an Aspergillus cell.
- the cell is an Aspergillus niger cell.
- the cell is an Aspergillus oryzae cell.
- the cell is a plant cell.
- the plant cell is one or more of a maize, rice, sorghum, rye, barley, wheat, millet, oats, sugarcane, turfgrass, switchgrass, soybean, canola, alfalfa, sunflower, cotton, tobacco, peanut, potato, Arabidopsis, vegetable, or safflower cell.
- Lacticaseibacillus casei, Lacticaseibacillus paracasei, Lacticaseibacillus rhamnosus, Lactiplantibacillus plantarum, Levilactobacillus brevis, Ligilactobacillus salivarius, Limosilactobacillus fermentum, Limosilactobacillus reuteri, Lactobacillus acidophilus, Lactobacillus bulgaricus, Lactobacillus crispatus, Lactobacillus gasseri, Lactobacillus johnsonii, Lactobacillus helveticus, Corynebacterium glutamicum, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausi
- the cell is a Bacillus cell.
- the cell is a Bacillus subtilis cell. In one embodiment, the cell is a Bacillus licheniformis cell.
- the cell is a Lacticaseibacillus paracasei cell.
- the cell is a Streptococcus thermophilus cell.
- the cell is an E. coli cell.
- Upon target recognition, Cas nucleases induce double-strand breaks in the target sequence, which, when repaired by non-homologous end joining (NHEJ), can result in frameshift mutations and gene knockdown.
- the frameshift mutation caused by error-prone NHEJ may include nucleotide insertions or deletions (indels).
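Whether an indel shifts the reading frame depends only on its length: an insertion or deletion whose length is not a multiple of three changes the frame of every downstream codon. A minimal sketch of that check follows; the helper names are illustrative. By this rule, the 67-bp DsRED deletion described elsewhere in this document would be frame-shifting.

```python
def is_frameshift(indel_length):
    """True when an indel of this length (in bp) shifts the reading frame."""
    if indel_length <= 0:
        raise ValueError("indel length must be a positive number of base pairs")
    return indel_length % 3 != 0


def apply_deletion(seq, start, length):
    """Return seq with `length` bases removed starting at 0-based `start`."""
    return seq[:start] + seq[start + length:]
```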
- HDR refers to a mechanism for repairing DNA damage in cells, for example, during repair of double-stranded and single-stranded breaks in DNA.
- HDR requires nucleotide sequence homology and uses a “nucleic acid template” (nucleic acid template and donor template are used interchangeably herein) to repair the sequence where the double-stranded or single-stranded break occurred (e.g., DNA target sequence). This results in the transfer of genetic information from, for example, the nucleic acid template to the DNA target sequence.
- HDR may result in alteration of the DNA target sequence (e.g., insertion, deletion, mutation) if the nucleic acid template sequence differs from the DNA target sequence and part or all of the nucleic acid template polynucleotide or oligonucleotide is incorporated into the DNA target sequence.
- an entire nucleic acid template polynucleotide, a portion of the nucleic acid template polynucleotide, or a copy of the nucleic acid template is integrated at the site of the DNA target sequence.
- the terms “nucleic acid template” and “donor” refer to a nucleotide sequence that is inserted or copied into a genome.
- the nucleic acid template comprises a nucleotide sequence, e.g., of one or more nucleotides, that will be added to or will template a change in the target nucleic acid or may be used to modify the target sequence.
- a nucleic acid template sequence may be of any length, for example between 2 and 10,000 nucleotides in length (or any integer value there between or there above), preferably between about 100 and 1,000 nucleotides in length (or any integer there between), more preferably between about 200 and 500 nucleotides in length.
- a nucleic acid template may be a single-stranded nucleic acid or a double-stranded nucleic acid.
- the nucleic acid template comprises a nucleotide sequence, e.g., of one or more nucleotides, that corresponds to wild type sequence of the target nucleic acid, e.g., of the target position.
- the nucleic acid template comprises a ribonucleotide sequence, e.g., of one or more ribonucleotides, that corresponds to wild type sequence of the target nucleic acid, e.g., of the target position.
- the nucleic acid template comprises modified ribonucleotides.
- donor sequence, also called a “donor sequence,” “donor template” or “donor”
- donor sequence can be carried out.
- the donor sequence is typically not identical to the genomic sequence where it is placed.
- a donor sequence can contain a non-homologous sequence flanked by two regions of homology to allow for efficient HDR at the location of interest.
- donor sequences can comprise a vector molecule containing sequences that are not homologous to the region of interest in cellular chromatin.
- a donor molecule can contain several, discontinuous regions of homology to cellular chromatin. For example, for targeted insertion of sequences not normally present in a region of interest, said sequences can be present in a donor nucleic acid molecule and flanked by regions of homology to sequence in the region of interest.
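The donor layout described above (a non-homologous sequence flanked by two regions of homology) can be sketched as simple string assembly. This is an idealized illustration that assumes exact, unique matches of both arms in the target. The function names and the toy sequences in the test are hypothetical, and real HDR outcomes are not this deterministic.

```python
def build_donor(left_arm, insert, right_arm):
    """Assemble a donor: left homology arm + payload + right homology arm."""
    return left_arm + insert + right_arm


def integrate_by_hdr(target, left_arm, insert, right_arm):
    """Idealized HDR product: the stretch between the two arms in `target`
    is replaced by `insert`. Raises if either arm cannot be located."""
    left_pos = target.find(left_arm)
    if left_pos == -1:
        raise ValueError("left homology arm not found in target")
    after_left = left_pos + len(left_arm)
    right_pos = target.find(right_arm, after_left)
    if right_pos == -1:
        raise ValueError("right homology arm not found downstream of left arm")
    return target[:after_left] + insert + target[right_pos:]
```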
- Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino group(s) and the use of modified internucleotide linkages such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues.
- embodiments of the present invention using a donor template for repair may use a DNA or RNA, single-stranded and/or double-stranded donor template that can be introduced into a cell in linear or circular form.
- a donor sequence may also be an oligonucleotide and be used for gene correction or targeted alteration of an endogenous sequence.
- the oligonucleotide may be introduced to the cell on a vector, may be electroporated into the cell, or may be introduced via other methods known in the art.
- the oligonucleotide can be used to correct a mutated sequence in an endogenous gene (e.g., the sickle mutation in beta globin), or may be used to insert sequences with a desired purpose into an endogenous locus.
- a polynucleotide can be introduced into a cell as part of a vector molecule having additional sequences such as, for example, replication origins, promoters and genes encoding antibiotic resistance.
- donor polynucleotides can be introduced as naked nucleic acid, as nucleic acid complexed with an agent such as a liposome or poloxamer, or can be delivered by recombinant viruses (e.g., adenovirus, AAV, herpesvirus, retrovirus, lentivirus and integrase defective lentivirus (IDLV)).
- the donor is generally inserted so that its expression is driven by the endogenous promoter at the integration site, namely the promoter that drives expression of the endogenous gene into which the donor is inserted.
- the donor may comprise a promoter and/or enhancer, for example a constitutive promoter or an inducible or tissue specific promoter.
- the donor molecule may be inserted into an endogenous gene such that all, some or none of the endogenous gene is expressed.
- a transgene as described herein may be inserted into an endogenous locus such that some (N-terminal and/or C-terminal to the transgene) or none of the endogenous sequences are expressed, for example as a fusion with the transgene.
- the transgene (e.g., with or without additional coding sequences such as for the endogenous gene) is integrated into any endogenous locus, for example a safe-harbor locus, for example a CCR5 gene, a CXCR4 gene, a PPP1R12C (also known as AAVS1) gene, an albumin gene or a Rosa gene.
- When endogenous sequences (endogenous or part of the transgene) are expressed with the transgene, the endogenous sequences may be full-length sequences (wild-type or mutant) or partial sequences. Preferably the endogenous sequences are functional. Non-limiting examples of the function of these full length or partial sequences include increasing the serum half-life of the polypeptide expressed by the transgene (e.g., therapeutic gene) and/or acting as a carrier.
- exogenous sequences may also include transcriptional or translational regulatory sequences, for example, promoters, enhancers, insulators, internal ribosome entry sites, sequences encoding 2A peptides and/or polyadenylation signals.
- the donor molecule comprises a sequence selected from the group consisting of a gene encoding a protein (e.g., a coding sequence encoding a protein that is lacking in the cell or in the individual or an alternate version of a gene encoding a protein), a regulatory sequence and/or a sequence that encodes a structural nucleic acid such as a microRNA or siRNA.
- the present invention also relates to polynucleotides encoding the Cas nuclease of the 1st aspect, and/or the fusion polypeptide of the 2nd aspect.
- the polynucleotide comprises or consists of a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polypeptide coding sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61 , SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 53
- the polynucleotide comprises or consists of a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any polypeptide coding sequence of SEQ ID NO: 53, SEQ ID NO: 73, SEQ ID NO: 92, SEQ ID NO: 91, SEQ ID NO: 100, or SEQ ID NO: 81.
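The identity thresholds recited above presuppose a way of computing percent identity from a pairwise alignment. The sketch below assumes one common convention: identical positions divided by the number of aligned (non-gap) columns, on two sequences that have already been aligned with `-` as the gap symbol. The definition given elsewhere in the specification governs; this convention and the function names are assumptions made only for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    gap_columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            gap_columns += 1      # columns with a gap are excluded
        elif a == b:
            matches += 1
    compared = len(aligned_a) - gap_columns
    if compared == 0:
        raise ValueError("alignment has no gap-free columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, threshold=60.0):
    """True when the aligned pair reaches the given identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence length) give different values for the same alignment, so the denominator should always be stated alongside the number.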
- the polynucleotide is obtained from a Bacillus cell, e.g., a Bacillus sp-63030 cell.
- the polynucleotide is obtained from a Ruminococcus cell, e.g., a Ruminococcus sp. cell.
- the polynucleotide is obtained from an Alicyclobacillus cell, e.g., an Alicyclobacillus sacchari cell.
- the polynucleotide is obtained from an Enterococcus cell, e.g., an Enterococcus gilvus, an Enterococcus hermanniensis, or an Enterococcus asini cell.
- the polynucleotide is obtained from a Companilactobacillus cell, e.g., a Companilactobacillus zhachilii, a Companilactobacillus halodurans, a Companilactobacillus keshanensis, a Companilactobacillus suantsaicola, or a Companilactobacillus hulinensis cell.
- the polynucleotide is obtained from a Bombilactobacillus cell, e.g., a Bombilactobacillus apium cell.
- the polynucleotide is obtained from a Vagococcus cell, e.g., a Vagococcus penaei cell.
- the polynucleotide is obtained from a Lactobacillus cell.
- the Lactobacillus cell is a Lactobacillus sp., Lactobacillus farciminis (DSM 20184), Lactobacillus farciminis, Lactobacillus murinus, Lactobacillus ruminis, Lactobacillus salivarius, Lactobacillus jensenii, Lactobacillus hamsteri, Lactobacillus delbrueckii, Lactobacillus johnsonii, Lactobacillus plantarum, Lactobacillus rhamnosus, or Lactobacillus gallinarum cell.
- the polynucleotide encoding the Cas nuclease is isolated from a Lactobacillus cell.
- the polynucleotide is a subsequence encoding a fragment having Cas nuclease activity and/or DNA binding activity of the present invention.
- the polynucleotide may also be mutated by introduction of nucleotide substitutions that do not result in a change in the amino acid sequence of the polypeptide, but which correspond to the codon usage of the host organism intended for production of the enzyme, or by introduction of nucleotide substitutions that may give rise to a different amino acid sequence.
- for a general description of nucleotide substitutions, see, e.g., Ford et al., 1991, Protein Expression and Purification 2: 95-107.
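Codon-usage adaptation as described above relies on substitutions that leave the encoded amino acid sequence unchanged. A minimal sketch of how a recoded sequence could be checked against the original follows, assuming the standard genetic code. The function names are illustrative, and any host-specific codon preference table would be an additional input not shown here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a coding sequence codon by codon ('*' marks a stop codon).
    Trailing bases that do not fill a codon are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_synonymous(original, recoded):
    """True when both sequences encode the same amino acid sequence."""
    return translate(original) == translate(recoded)
```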
- the present invention relates to a nucleic acid construct or expression vector comprising the polynucleotide according to the 5th aspect of the invention, operably linked to one or more control sequences that direct the production of the nuclease or fusion polypeptide in a cell.
- the present invention also relates to nucleic acid constructs or expression vectors comprising a polynucleotide of the present invention, wherein the polynucleotide is operably linked to one or more control sequences that direct the expression of the coding sequence in a suitable host cell under conditions compatible with the control sequences.
- the polynucleotide may be manipulated in a variety of ways to provide for expression of the polypeptide. Manipulation of the polynucleotide prior to its insertion into a vector may be desirable or necessary depending on the expression vector. Techniques for modifying polynucleotides utilizing recombinant DNA methods are well known in the art.
- the genome of the recombinant cell comprises at least two copies, e.g., three, four, or five, or more copies of a polynucleotide encoding the Cas nuclease of the 1st aspect or fusion polypeptide of the 2nd aspect, of a polynucleotide of the 5th aspect, or of a nucleic acid construct or expression vector of the 6th aspect.
- the invention relates to cells comprising a genome which was modified by the Cas nuclease of the 1st aspect, the fusion polypeptide of the 2nd aspect, the composition of the 3rd aspect, the method of the 4th aspect, the polynucleotide of the 5th aspect, and/or the nucleic acid construct or expression vector of the 6th aspect.
- the cell is a eukaryotic cell.
- the cell is a prokaryotic cell.
- the cell is a fungal cell, such as a filamentous fungal cell, or a yeast cell.
- the cell is a Pichia pastoris cell.
- the cell is a Saccharomyces cerevisiae cell.
- the cell is a yeast cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
- the cell is a filamentous fungal cell e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger
- the bacterial host cell may also be any Streptococcus cell including, but not limited to, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp. zooepidemicus cells.
- a fwnA (wA) knockout-induced spore colour change assay was used.
- 21-bp spacer sequences with expected activity targeting A. niger fwnA were designed based on the hypothetical PAMs of the respective nucleases.
- Screening plasmids were designed to contain two expression cassettes to express the sgRNA (each targeting a different region of fwnA) and nuclease.
- sgRNAs were derived from concatenation of the spacer, direct repeat and tracrRNA, while the nuclease gene sequences were codon optimized for A. niger.
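The sgRNA design stated above is a concatenation of spacer, direct repeat and tracrRNA, in that order. A minimal sketch of that assembly at the DNA-template level follows. The function name, the input validation and the placeholder sequences in the test are hypothetical. The actual sequences are those referenced by SEQ ID NO in the specification, and any linker between direct repeat and tracrRNA is not described in this passage and is therefore not modelled.

```python
def build_sgrna_template(spacer, direct_repeat, tracr):
    """Concatenate spacer + direct repeat + tracrRNA coding sequences.

    All inputs are DNA strings (A/C/G/T); the result is upper-cased.
    """
    parts = (("spacer", spacer), ("direct repeat", direct_repeat), ("tracrRNA", tracr))
    for name, part in parts:
        if not part:
            raise ValueError(name + " must not be empty")
        unexpected = set(part.upper()) - set("ACGT")
        if unexpected:
            raise ValueError(name + " contains non-DNA characters: " + "".join(sorted(unexpected)))
    return (spacer + direct_repeat + tracr).upper()
```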
- Transformation of nuclease 076 with SEQ ID NO: 1 was done using a thiamine-regulatable system, with three different thiamine concentrations tested (0.02, 0.4, and 5 µM).
- Figure 4 shows colony number and ratio of white/black phenotypes obtained after 076 transformation with thiamine supplementation at concentrations of 0.02 µM (A), 0.4 µM (B) and 5 µM (C).
- Figure 11 shows the plasmid vectors used for S. pyogenes CRISPR/Cas9 positive control 1: (A) the intermediate plasmid vector pHUda2351 containing the S. pyogenes Cas9 system used for control plasmid construction, and (B) positive control 1 containing Cas9 and spacers targeting regions of the spacer 32 or spacer 112. ampR, E. coli ampicillin resistance gene; AMA1, A. nidulans AMA1 origin of replication.
- the plasmid vector pBKHM0001 (Figure 12) was first digested with AscI and SbfI, and the plasmid backbone fragment of 11097 bp was recovered by gel extraction.
- Figure 12 shows the plasmid vector pBKHM0001 used as a backbone sequence.
- A. oryzae U6 promoter, A. oryzae U6 promoter; Af tRNA gly, A. fumigatus glycine tRNA; A. oryzae U6 term, A. oryzae U6 terminator; AnPtef1, A. niger tef1 promoter; CRISPR-CasΦ, CasΦ nuclease gene; Ttef(nid), A. nidulans tef1 terminator; Ptef1, A. niger tef1 promoter; NATr, Nourseothricin resistance gene; TniaD, niaD terminator; pUC, pUC plasmid backbone sequence; ampR, E. coli ampicillin resistance gene; AMA1, A. nidulans AMA1 origin of replication.
- the gene sequences encoding the nucleases 0076, 0100, 0105 and 0149 were first codon optimized for A. niger, and the nucleoplasmin and SV40 nuclear localization signals were added on to the 5’ and 3’ ends respectively. Their respective hypothetical scaffold sequences without spacer (Direct repeat) were used without codon-optimization.
- the nuclease sequences used and their corresponding sgRNAs without spacers can be found in Table 6. Table 6: Sequences of nuclease genes and corresponding sgRNAs used.
- Insert fragments for 0100 and 0149 contained constitutive expression cassettes for their respective gRNAs and nucleases.
- insert fragments for 076 and 0105 contained a constitutive expression cassette encoding their gRNAs and a thiamine-inducible cassette encoding their nucleases. 25-bp regions homologous with the plasmid backbone insert site were added to the 5’ and 3’ flanks to aid future plasmid construction, and the resulting fragments were synthesized using commercial gene synthesis services.
- the plasmid backbones and inserts were joined to form their corresponding intermediate plasmids using HiFi DNA Assembly (New England Biolabs, USA). These intermediate plasmids were designated pBKHM-076-thiA-int, pBKHM-0100-int, pBKHM-0105-thiA-int and pBKHM-0149-int (Figures 13A-13D). These plasmids were used directly as negative controls.
- Figure 13 shows the intermediate plasmid/negative control vectors (A) pBKHM-076-thiA-int, (B) pBKHM-0100-int, (C) pBKHM-0105-thiA-int and (D) pBKHM-0149-int.
- A. oryzae U6 promoter, A. oryzae U6 promoter; Af tRNA gly, A. fumigatus glycine tRNA; A. oryzae U6 term, A. oryzae U6 terminator; DR, direct repeat; tracr RNA, tracr RNA; sgRNA, concatenation of the respective direct repeat and tracrRNA sequence; AnPtef1, A. niger tef1 promoter; Ttef(nid), A. nidulans tef1 terminator; Ptef1, A. niger tef1 promoter; NATr, Nourseothricin resistance gene; TniaD, niaD terminator; pUC, pUC plasmid backbone sequence; ampR, E. coli ampicillin resistance gene; AMA1, A. nidulans AMA1 origin of replication.
- Aspergillus niger strain MBin118 (NN049549) was used for all procedures.
- COVE-N top agar solution: 342.3 g/L sucrose, 20 ml/L COVE salt solution, 3 g/L NaNO2, 10 g/L Nippon Gene Agarose L (low-melt agarose), 6 drops/L 5N NaOH
- STC: 0.8 M sorbitol, 50 mM Tris pH 8, 50 mM CaCl2. SPTC: 40% PEG4000 in STC buffer.
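The STC recipe above is given in molar terms, so the mass of each solute follows from mass = molarity × molar mass × volume. A minimal sketch of that arithmetic follows. It assumes Tris base and anhydrous CaCl2 with the molar masses shown; a hydrated salt or Tris-HCl would need a different molar mass. pH adjustment is not covered, and the dictionary and function names are illustrative.

```python
# Molar masses in g/mol (assumed reagent forms: Tris base, anhydrous CaCl2).
MOLAR_MASS = {
    "sorbitol": 182.17,
    "Tris base": 121.14,
    "CaCl2 (anhydrous)": 110.98,
}

# STC composition in mol/L, as stated in the recipe above.
STC = {
    "sorbitol": 0.8,
    "Tris base": 0.050,
    "CaCl2 (anhydrous)": 0.050,
}


def grams_needed(recipe, volume_l):
    """Grams of each solute for `volume_l` litres of buffer."""
    return {
        name: round(molarity * MOLAR_MASS[name] * volume_l, 3)
        for name, molarity in recipe.items()
    }
```

For example, `grams_needed(STC, 0.5)` gives the masses for 500 ml of buffer.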
- An agar slant (COVE-N-glyX) was inoculated with spores of MBin118, and the strain was grown at 30°C until completely sporulated. 9 ml of 0.1% Tween 20 in water was added to the slant, and the spores were suspended manually. The spore suspension was transferred to baffled shake flasks (500 ml) containing 100 ml YPG medium. The flask was incubated at 30 or 32°C for 15-20 hrs (60-80 rpm). Mycelia were collected by filtering through Miracloth. Mycelia were washed 2-3 times with 0.6 M KCl or 0.7 M KCl + 10 mM CaCl2.
- Mycelia were resuspended in 20-30 ml 0.6 M KCl or 0.7 M KCl + 10 mM CaCl2 with 20-48 mg/ml Glucanex and 1.2 mg/ml BSA in a 50 ml centrifuge tube. The sample was incubated for 1-1.5 hrs at 30 or 32 °C, 80 rpm, and the protoplasting was monitored frequently by microscopy. After protoplasting was observed, the solution was filtered through Miracloth into a 25 ml Universal container (Nunc 364211). The solution was then centrifuged at 2000 rpm for 10 minutes with slow acceleration.
- the transforming DNA was added to 100 µl of protoplasts in a 14 ml Falcon tube, mixed gently and incubated on ice for more than 30 minutes. 1 ml SPTC buffer was added and the solution was mixed gently, then incubated at 37 °C in a water bath for 20 minutes. 10-15 ml COVE-N top agar solution containing 50 µg/ml of Nourseothricin was added to the solution, mixed and poured onto transformation plates. After the agar solidified, the plates were incubated at 34 °C until colonies were clearly visible. The transforming DNA volume was less than 10 µl (1-10 µg).
- Colonies were picked for each transformation and isolated to COVE-N-glyX agar. The colonies were allowed to sporulate by incubation at 30 °C for 1 week.
- DNA purification was done by gel electrophoresis, excision of DNA with correct band length and purification by silica column (QIAquick Gel Extraction Kit, QIAGEN) using standard molecular biology procedures. Purified DNA samples were sequenced by commercial Sanger sequencing services.
- A deletion cassette for the DsRED gene (SEQ ID NO: 384) was designed to delete 67 bp in the middle of the DsRED coding sequence.
- the protospacer sequences for disruption of the DsRED gene are shown in SEQ ID NOs: 385 - 404.
- Protospacers were designed as 21-bp long DNAs within the region to be deleted after disruption.
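Selecting 21-bp protospacers inside a region that is to be deleted can be sketched as a window scan. The sketch assumes that the PAM lies immediately 3' of the protospacer on the scanned strand (as in many type II systems) and that the PAM uses IUPAC codes. Both are assumptions, as are the function name and the toy sequence in the test; only the given strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]", "S": "[CG]",
    "W": "[AT]", "H": "[ACT]", "B": "[CGT]", "V": "[ACG]", "D": "[AGT]",
}


def find_protospacers(seq, pam, region_start, region_end, length=21):
    """List (start, protospacer) pairs that lie fully inside the half-open
    interval [region_start, region_end) and are directly followed by a PAM."""
    seq = seq.upper()
    pam_re = re.compile("".join(IUPAC[base] for base in pam.upper()))
    hits = []
    for start in range(region_start, region_end - length + 1):
        end = start + length
        # Pattern.match(string, pos) anchors the match at index `end`.
        if pam_re.match(seq, end):
            hits.append((start, seq[start:end]))
    return hits
```

In this sketch the PAM itself may fall outside the region; only the protospacer is required to lie within it.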
- DNA maps representing a typical nuclease vector and a guide RNA expression vector are shown in Figs. 14 - 17.
- synthetic DNAs were PCR-amplified using the oligo DNAs shown in Table 8 (SEQ ID NOs: 485 - 507) and gel-purified before SOE PCR and subsequent transformation into B. subtilis host PP3724. Finally, transformants were isolated by tetracycline or erythromycin resistance.
- Nuclease 0076 was expressed by the IPTG-inducible Pgrac promoter.
- the green fluorescent colonies were found from the double resistant medium without IPTG (EXP_08 in Table 9), indicating that leaky expression from Pgrac was sufficient for introducing desired modification to the chromosome.
- Genomic PCR and Sanger sequencing were performed on selected colonies from EXP_08, _18, _20, _22, _27, _33, _37, _38, _43, _48, _49, _53, _55, _58, and _61.
- Oligo DNAs for amplification of the genomic region and for Sanger sequencing are shown in Table 10 (SEQ ID NOs: 508 - 511).
- Chemicals used as buffers and substrates were commercial products of at least reagent grade.
- To select for erythromycin resistance, agar and liquid media were supplemented with 5 µg/ml erythromycin. To select for tetracycline resistance, agar and liquid media were supplemented with 15 µg/ml tetracycline. Where needed, IPTG (isopropyl β-D-1-thiogalactopyranoside) was added to a final concentration of 1 mM.
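The supplement concentrations above translate into stock volumes through the usual dilution relation C1·V1 = C2·V2. The sketch below shows the arithmetic; the stock concentrations and the 1 litre batch size are assumptions chosen for illustration and are not given in the source.

```python
def stock_volume_ml(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """Volume of stock (ml) needed so that final_volume_ml reaches final_conc.

    final_conc and stock_conc must use the same unit (e.g. both µg/ml or both mM).
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final medium")
    return final_conc * final_volume_ml / stock_conc


# Final concentrations are from the text; stock concentrations are hypothetical.
supplements = {
    "erythromycin (µg/ml)": (5.0, 5000.0),    # assumed 5 mg/ml stock
    "tetracycline (µg/ml)": (15.0, 15000.0),  # assumed 15 mg/ml stock
    "IPTG (mM)": (1.0, 1000.0),               # assumed 1 M stock
}

for name, (final, stock) in supplements.items():
    volume = stock_volume_ml(final, stock, final_volume_ml=1000.0)
    print(f"{name}: add {volume:.2f} ml stock per litre")
```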
- Oligonucleotide primers were obtained from Macrogen, Korea. DNA manipulations (plasmid and genomic DNA preparation, restriction digestion, purification, ligation, DNA sequencing) were performed using standard textbook procedures with commercially available kits and reagents.
- DNA was introduced into B. subtilis rendered naturally competent, either using a two-step procedure (Yasbin et al., 1975, J. Bacteriol. 121: 296-304) or a one-step procedure, in which cell material from an agar plate was resuspended in Spizizen 1 medium (WO 2014/052630), 12 ml was shaken at 200 rpm for approximately 4 hours at 37 °C, DNA was added to 400 microliter aliquots, and these were further shaken at 150 rpm for 1 hour at the desired temperature before plating on selective agar plates.
- DNA was introduced into B. licheniformis by conjugation from B. subtilis, essentially as previously described (EP2029732 B1), using a modified B. subtilis donor strain PP3724, containing pLS20, wherein the methylase gene M.bli1904II (US20130177942) is expressed from a triple promoter at the amyE locus, the pBC16-derived orf beta and the B. subtilis comS gene (and a kanamycin resistance gene) are expressed from a triple promoter at the alr locus (making the strain D-alanine requiring), and the B. subtilis comS gene (and a cat gene) are expressed from a triple promoter at the pel locus.
- PP3724: This strain is a B. subtilis derivative, containing pLS20, wherein the methylase gene M.bli1904II (US20130177942) is expressed from a triple promoter at the amyE locus, the pBC16-derived orf beta and the B. subtilis comS gene (and a kanamycin resistance gene) are expressed from a triple promoter at the alr locus (making the strain D-alanine requiring), and the B. subtilis comS gene (and a cat gene) are expressed from a triple promoter at the pel locus.
- SJ1904: This strain is a B. licheniformis derivative, described in WO 2008/066931.
- Fig. 19 shows the editing efficiency of each CRISPR system compared to the traditional method using only the λ-Red recombinase system. All four conditions were supplied with the donor DNA fragment. Expression of nuclease 0149 (both OPT and WT) was induced overnight with 0.4 µg/ml anhydrotetracycline. 0149 WT Ctrl contains the 0149 nuclease (encoded by the WT sequence) but no guide plasmid. pSIM6 Ctrl was the λ-Red recombinase system only control, without any CRISPR nuclease. Both conditions (pSIM6 Ctrl and 0149 WT Ctrl) served as negative controls for the experiment.
- 0149 OPT, encoded by the codon-optimized sequence, showed 100% editing efficiency, with 79 out of 79 clones positive for the desired insert.
- 0149 encoded by the WT sequence exhibited 62.2% editing efficiency, with 28 out of 45 clones positive for the desired insert.
- The insert was sequence-verified in selected clones (data not shown).
- Both the 0149 WT Ctrl and pSIM6 controls (considered traditional editing methods) demonstrated only 2% (1 out of 48 clones) and 3% (1 out of 33 clones) editing efficiency, respectively.
- The novel CRISPR nuclease 0149 significantly improved editing efficiency compared to the classic homologous recombination method. Additionally, codon optimization of the 0149 nuclease further improved editing efficiency.
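The efficiencies quoted above follow directly from the clone counts, and the size of the difference can be gauged with a one-sided Fisher exact test. The sketch below recomputes both from the reported counts. The choice of Fisher's test is an assumption made here for illustration; the source does not state which statistical test, if any, was used.

```python
from math import comb


def efficiency(positive: int, total: int) -> float:
    """Editing efficiency in percent."""
    return 100.0 * positive / total


def fisher_one_sided(a_pos: int, a_total: int, b_pos: int, b_total: int) -> float:
    """P(condition A has >= a_pos positives) under the hypergeometric null."""
    n_all = a_total + b_total
    k_all = a_pos + b_pos
    upper = min(k_all, a_total)
    tail = sum(comb(k_all, k) * comb(n_all - k_all, a_total - k)
               for k in range(a_pos, upper + 1))
    return tail / comb(n_all, a_total)


# Clone counts as reported in the text: (positive, total).
counts = {
    "0149 OPT": (79, 79),
    "0149 WT": (28, 45),
    "0149 WT Ctrl": (1, 48),
    "pSIM6 Ctrl": (1, 33),
}

for name, (pos, total) in counts.items():
    print(f"{name}: {efficiency(pos, total):.1f}% ({pos}/{total})")

p_wt_vs_psim6 = fisher_one_sided(28, 45, 1, 33)
print(f"0149 WT vs pSIM6 Ctrl, one-sided Fisher p = {p_wt_vs_psim6:.2e}")
```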
- Nucleases 0100 and 0149 demonstrated nuclease activity in E. coli.
- The codon-optimized 0149 nuclease showed 100% editing efficiency at the adhE locus, while the 0149 nuclease encoded by the wild-type DNA sequence showed 62.2% efficiency.
- E. coli BL21 (DE3) strain (Novagen) was used in this study.
- The CRISPR plasmid was constructed on the pSIM6 vector [Datta et al. (2006), Gene, 379, 109-115] with an inducible Ptet promoter controlling the expression of different CRISPR nucleases.
- The CRISPR plasmid carries an ampicillin resistance marker and was synthesized by Twist Bioscience.
- The gRNA sequence is under the control of a synthetic promoter.
- The guide plasmid carries a kanamycin resistance marker.
- the gRNA and the PAM sequence are indicated in Table 1.
- The gRNA sequence for the spCas9 adhE locus was ordered according to Shukal et al. [Shukal et al. (2022), Microbial Cell Factories, 21(1), 19]. The donor DNA fragment PJ23104-mCherry, containing a 100 bp homology flanking area from the adhE locus, was synthesized and further amplified via PCR. It was then purified with the NucleoSpin Gel and PCR Clean-up Mini kit (Macherey-Nagel) for further experiments.
- All the cells were grown in 2x Yeast Extract Tryptone medium. Briefly, cells from overnight culture or 2xYT plate were inoculated into 20 mL fresh media in shaking flask. Cells were incubated at 30 °C for 16 h or longer before harvest. The media were supplemented with appropriate antibiotics (100 mg/L ampicillin and 30 mg/L kanamycin) to maintain corresponding plasmids.
- BL21 electrocompetent cells were prepared for transforming the corresponding plasmids.
- 10 ng of the plasmid was mixed with 60 µl of the competent cells in a 1 mm Gene Pulser cuvette (Bio-Rad) and electroporated at 1.8 kV.
- The cells were recovered in 600 µl of SOC medium at 30 °C, 120 rpm for 1 hour before spreading onto a 2xYT agar plate containing ampicillin (100 µg/ml) and incubated overnight at 30 °C.
- The method is modified from Shukal et al. [Shukal et al. (2022), Microbial Cell Factories, 21(1), 19].
- The guide plasmid was then introduced into the corresponding BL21 strain harboring the CRISPR plasmid using the same electroporation set-up. After recovery, 100 µl of the culture was spread onto an antibiotic-containing 2xYT agar plate (100 µg/ml ampicillin and 30 µg/ml kanamycin) with or without 0.4 µg/ml anhydrotetracycline and incubated at 30 °C for at least 1-2 days to observe the killing effect.
- The cells were induced for 15 minutes at 42 °C for λ-Red recombinase expression and subsequently made electrocompetent [Datta et al. (2006), Gene, 379, 109-115]. Then, 300-600 ng of the donor DNA fragment PJ23104-mCherry was introduced following the same electroporation protocol as described above.
- The cells were recovered in 600 µl of SOC medium at 30 °C, 120 rpm for 1 hour. 100 µl of the recovered culture was taken out, serially diluted, and then spread onto a 2xYT agar plate supplied with 100 µg/ml ampicillin and 30 µg/ml kanamycin and incubated at 30 °C overnight as a non-induced control. Additionally, 200 µl of the recovered culture was transferred into 20 ml of 2xYT medium supplied with 0.4 µg/ml anhydrotetracycline, 100 µg/ml ampicillin, and 30 µg/ml kanamycin (to maintain the plasmids) and shaken at 30 °C, 120 rpm overnight.
- Example 6 Novel nucleases for gene editing in Bacillus subtilis
- The pJOE8999 plasmid, as described by Sachla et al. (2021) in "A simplified method for CRISPR-Cas9 engineering of Bacillus subtilis" (Microbiol Spectr 9:e00754-21), was obtained from the Bacillus Genetic Stock Center (BGSC #ECE358).
- This plasmid expresses the Cas9 nuclease under a mannose-inducible promoter and includes the Pvan promoter for constitutive expression of the guide sgRNA when cloned into the vector.
- The following derivative plasmids of pJOE8999 (SEQ ID NO: 542), including pJOE_Cas9_001, pJOE_NZ0149_002, and pJOE_NZ0149_empty, were generated using standard Golden Gate Assembly of either two or three DNA fragments with the BsaI restriction enzyme and T4 DNA ligase. The Golden Gate assemblies were then transformed into E. coli TOP10 with selection on kanamycin.
- A B. subtilis strain containing the full-length dsRed gene (SEQ ID NO:544) integrated in the Pel locus was constructed by homologous recombination. This strain was then made competent and transformed with the pJOE plasmids and repair DNA. Preparation of repair DNA of the dsRed gene containing a 130 bp deletion:
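Golden Gate Assembly with BsaI generally requires that each fragment carry BsaI recognition sites (5'-GGTCTC-3') only where cutting is intended, since internal sites would also be cleaved. A minimal sketch of a site scan on both strands follows; the fragment is a toy sequence rather than one of the pJOE fragments, and the function names are hypothetical.

```python
BSAI_SITE = "GGTCTC"  # BsaI recognition sequence (Type IIS; cuts outside the site)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_bsai_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, strand) for each BsaI site in seq."""
    seq = seq.upper()
    rc_site = reverse_complement(BSAI_SITE)  # GAGACC, i.e. site on the bottom strand
    hits = []
    for i in range(len(seq) - len(BSAI_SITE) + 1):
        window = seq[i:i + len(BSAI_SITE)]
        if window == BSAI_SITE:
            hits.append((i, "+"))
        elif window == rc_site:
            hits.append((i, "-"))
    return hits


# Toy fragment (hypothetical) with one site on each strand.
toy_fragment = "AAGGTCTCATTTTGAGACCAA"
print(find_bsai_sites(toy_fragment))
```

A fragment reporting more hits than the intended flanking sites would normally be redesigned, for example by silent mutation of the internal site, before assembly.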
- the synthetic repair DNA (SEQ ID NO:545) was amplified by PCR using forward and reverse oligonucleotides identical to the 5’ and 3’ sequences. The PCR-amplified product was then used to co-transform B. subtilis with pJOE plasmids.
- nuclease is a fragment of SEQ ID NO: 1 , SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11 , SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21 , SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31 , SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 34, S
- nuclease of any one of paragraphs 1-6 which is encoded by a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the mature polypeptide coding sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61 , SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 53
- nuclease of any one of paragraphs 1-7 comprising an N-terminal extension and/or C-terminal extension of 1-10 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids, preferably an extension of 1-10 amino acid residues in the N-terminus and/or 1-10 amino acids in the C-terminus, such as 1-5 amino acids.
- Streptococcus pacificus, Streptococcus orisratti DSM 15617, Streptococcus salivarius, or Streptococcus ruminantium cell, from a Bacillus cell, e.g., a Bacillus sp-63030 cell, from a Turicibacter cell, e.g., a Turicibacter sp.
- a cell from a Ureibacillus cell, e.g., a Ureibacillus thermosphaericus cell, from a Lentihominibacter cell, e.g., a Lentihominibacter hominis cell, from a Clostridia cell, from a Ruminococcus cell, e.g., a Ruminococcus sp.
- a cell from an Alicyclobacillus cell, e.g., an Alicyclobacillus sacchari cell, from an Enterococcus cell, e.g., an Enterococcus gilvus, an Enterococcus hermanniensis, or an Enterococcus asini cell, from a Companilactobacillus cell, e.g., a Companilactobacillus zhachilii, a Companilactobacillus halodurans, a Companilactobacillus keshanensis, a Companilactobacillus suantsaicola, or a Companilactobacillus hulinensis cell, from a Bombilactobacillus cell, e.g., a Bombilactobacillus apium cell, or from a Vagococcus cell, e.g., a Vagococcus penaei cell
- nuclease of any one of paragraphs 1-10 which is obtained from or obtainable from a Lactobacillus cell, e.g., Lactobacillus sp., Lactobacillus farciminis (DSM 20184), Lactobacillus farciminis, Lactobacillus murinus, Lactobacillus ruminis, Lactobacillus salivarius, Lactobacillus jensenii, Lactobacillus hamsteri, Lactobacillus delbrueckii, Lactobacillus johnsonii, Lactobacillus plantarum, Lactobacillus rhamnosus, or Lactobacillus gallinarum cell.
- nuclease of any one of paragraphs 1-14 comprising one or more domains selected from the group consisting of:
- a RuvC domain having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID NOs: 105-107, 111-113, 108-110, 135-137, or 313 - 318;
- a HNH domain having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319 or 320;
- a RuvC domain derived from any one of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID NOs: 105-107, 111-113, 108-110, 135-137, or 313 - 318, by substitution, deletion or addition of one or several amino acids of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID NOs: 105-107, 111-113, 108-110, 135-137, or 313 - 318;
- nuclease a fragment of the catalytic domain of (a), (b), (c), or (d); preferably wherein the nuclease has nuclease activity, or wherein the nuclease has nickase activity.
- HNH domain has at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320.
- HNH domain comprises or consists of an amino acid sequence having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 144, 146, 145, 154, 319, or 320.
- HNH domain is a variant of any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320, comprising a substitution, such as a conservative amino acid substitution, a deletion, and/or an insertion at one or more positions.
- HNH domain differs from any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320, by at most 15 amino acids, such as at most 1 , 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 , 12, 13, 14 or 15 amino acids.
- the HNH domain is a fragment of any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320, wherein the fragment preferably contains at least 20 amino acid residues (e.g., amino acids 613 to 640 of SEQ ID NO: 1), or at least 27 amino acid residues (e.g., amino acids 613 to 640 of SEQ ID NO: 1).
- HNH domain comprises, consists essentially of, or consists of any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320.
- the RuvC domain comprises or consists of an amino acid sequence having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the amino acid sequence of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID Nos: 105-107, 111-113, 108-110, 135-137, or 313 - 318.
- nuclease of any one of paragraphs 1-31 wherein the nuclease is a nickase having one or more inactivated HNH domain created by an amino acid substitution, insertion or deletion at a position provided for the nuclease in column 3 of Table 3.
- nuclease of any one of paragraphs 1 -32 wherein the nuclease has a single-stranded break activity towards a DNA target site.
- nuclease is a catalytically dead nuclease.
- nuclease of any one of paragraphs 34-35, wherein the catalytically dead nuclease comprises one or more inactivated RuvC domain and one or more inactivated HNH domain created by one or more amino acid substitution, deletion or insertion at the positions provided for the nuclease in column 3 of Table 2 or column 3 of Table 3.
- nuclease of any one of the preceding paragraphs wherein the polynucleotide encoding the nuclease is codon-optimized for expression in an E. coli cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 512-520.
- nuclease of any one of the preceding paragraphs wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus licheniformis cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 405, 416, 417, 434, 449, 465, or 466.
- nuclease of any one of the preceding paragraphs wherein the polynucleotide encoding the nuclease is codon-optimized for expression in an Aspergillus niger cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 347, 349, 351, or 353.
- nuclease of any one of the preceding paragraphs wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus subtilis cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 528, 549 or 550.
- nuclease of any one of paragraphs 1-51 wherein the nuclease is a Class 2 Type II Cas nuclease.
- nuclease is a Class 2 Type II-B Cas nuclease.
- PAM protospacer adjacent motif
- a fusion polypeptide comprising the Cas nuclease of any one of paragraphs 1-58, and one or more second polypeptide.
- fusion polypeptide according to any one of paragraphs 59-60, wherein one or more second polypeptide is a nuclear localization sequence (NLS), a cell penetrating peptide, and/or an affinity tag.
- fusion polypeptide according to any one of paragraphs 59-61 , wherein the fusion polypeptide comprises 1-10 or more NLS at or near the amino-terminus, 1-10 or more NLS at or near the carboxy-terminus, or a combination of 1-10 or more NLS at or near the amino-terminus and 1-10 or more NLS at or near the carboxy-terminus.
- the fusion polypeptide according to paragraph 68, wherein the linker between the first NLS and the second NLS comprises at least 1 , at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 amino acids.
- fusion polypeptide according to any one of paragraphs 59-71 , wherein the fusion polypeptide comprises a linker between the Cas nuclease and the base-editing polypeptide.
- fusion polypeptide according to any one of paragraphs 59-72, wherein the base-editing polypeptide comprises a deaminase, e.g., a cytidine deaminase, such as an APOBEC3A deaminase, or an adenosine deaminase.
- fusion polypeptide according to any one of paragraphs 59-73, wherein the one or more second polypeptide comprises a reverse transcriptase, the reverse transcriptase preferably comprising a reverse transcriptase domain.
- fusion polypeptide according to any one of paragraphs 59-74, wherein the nuclease is fused to one or more NLS of sufficient strength to drive accumulation of a CRISPR complex comprising the Cas nuclease in a detectable amount in the nucleus of a eukaryotic cell.
- The nuclease or fusion polypeptide according to any one of paragraphs 1-77, wherein sequence identity is determined by the method described in the definition section under “Sequence Identity”.
- a non-naturally occurring composition comprising (i) the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, and/or (ii) a nucleic acid molecule comprising a sequence encoding the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78.
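The “Sequence Identity” definition referred to above is not reproduced in this section, so the following is only an illustrative sketch of how a percent-identity threshold such as "at least 60%" might be checked. It assumes two sequences that have already been globally aligned (gaps written as "-") and assumes the convention of identical residues divided by alignment length minus gapped columns; both the convention and the function name are assumptions, not the patent's stated method.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Assumed convention: 100 * identical residues / (alignment length - gapped columns).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(aligned_a.upper(), aligned_b.upper()))
    gapped = sum(1 for x, y in pairs if x == gap or y == gap)
    identical = sum(1 for x, y in pairs if x == y and x != gap)
    ungapped = len(pairs) - gapped
    if ungapped == 0:
        raise ValueError("alignment contains no ungapped columns")
    return 100.0 * identical / ungapped


# Toy alignment (hypothetical peptides, not taken from any SEQ ID NO).
identity = percent_identity("MKT-AYI", "MKTLAHI")
print(f"{identity:.1f}% identity; meets a 60% threshold: {identity >= 60.0}")
```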
- composition according to paragraph 79, wherein the nucleic acid molecule is a chemically modified nucleic acid molecule.
- composition according to any one of paragraphs 79-81 wherein the nucleic acid molecule is RNA.
- RNA is an mRNA comprising one or more of a 5’ untranslated region (UTR), an open reading frame (ORF) encoding the Cas nuclease or fusion polypeptide, a 3’ UTR, and a poly-adenylyl (polyA) tail.
- nucleic acid molecule is linear.
- composition according to any one of paragraphs 79-87 further comprising one or more RNA molecules, or a DNA polynucleotide encoding one or more of the one or more RNA molecules, wherein the one or more RNA molecules and the Cas nuclease or fusion polypeptide do not naturally occur together, and the one or more RNA molecules are configured to form a complex with the Cas nuclease or fusion polypeptide and/or target the complex to a target site.
- RNA molecule comprises a guide RNA (gRNA), which gRNA comprises a CRISPR RNA (crRNA) and a trans activating RNA (tracrRNA).
- sgRNA single-molecule RNA
- composition according to any one of paragraphs 79-91 further comprising a donor template for homology directed repair (HDR).
- composition according to any one of paragraphs 79-92, wherein the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61 , SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ
- RNA molecule comprises a trans activating RNA (tracrRNA) sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NOs: 157 - 208, preferably any of SEQ ID NO: 157, SEQ ID NO: 177, SEQ ID NO: 196, SEQ ID NO: 195, SEQ ID NO: 204, or SEQ ID NO: 185.
- composition according to any one of paragraphs 79-93, wherein at least one of the one or more RNA molecules comprises a CRISPR RNA (crRNA) molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NOs: 209 - 260, preferably any of SEQ ID NO: 209, SEQ ID NO: 229, SEQ ID NO: 248, SEQ ID NO: 247, SEQ ID NO: 256, or SEQ ID NO: 237.
- composition according to any one of paragraphs 79-95 wherein at least one of the one or more RNA molecules comprises or consists of an RNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NOs: 261 - 312, preferably of any of SEQ ID NO: 261, SEQ ID NO: 281, SEQ ID NO: 300, SEQ ID NO: 299, SEQ ID NO: 308, or SEQ ID NO: 289.
- composition according to any one of paragraphs 79-96 wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any amino acid sequence of column 1 in Table 4, and the at least one RNA molecule is a RNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at
- composition according to any one of paragraphs 79-97 wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 1
- the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 9
- composition according to any one of paragraphs 79-98e wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any amino acid sequence of column 1 in Table 4, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at
- the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least
- composition according to any one of paragraphs 79-100e wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any amino acid sequence of column 1 in Table 4, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%
- composition according to any one of paragraphs 79-101 , wherein the composition further comprises a base editor enzyme.
- composition according to any one of paragraphs 79-103, wherein the composition further comprises a reverse transcriptase enzyme.
- a method of modifying a nucleotide sequence at a DNA target site in the genome of a cell comprising introducing into the cell the Cas nuclease or fusion polypeptide according to any one of paragraphs 1-78, a polynucleotide encoding the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, and/or the composition of any one of paragraphs 79-104.
- the cell is a eukaryotic cell, such as a mammalian cell, a human cell, or a non-human mammalian cell, e.g., a BHK cell, a CHO cell, a mouse cell, a hamster cell, or a rat cell.
- the cell is a yeast cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
- the cell is a filamentous fungal cell, e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus,
- the plant cell is one or more of a maize, rice, sorghum, rye, barley, wheat, millet, oats, sugarcane, turfgrass, switchgrass, soybean, canola, alfalfa, sunflower, cotton, tobacco, peanut, potato, Arabidopsis, vegetable, or safflower cell.
- the cell is a prokaryotic cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Corynebacterium, Enterococcus, Geobacillus, Lactobacillus, Lacticaseibacillus, Lactiplantibacillus, Levilactobacillus, Ligilactobacillus, Limosilactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, or Streptomyces cells, or a Gram-negative bacteria selected from the group consisting of Campylobacter, E.
- Lacticaseibacillus casei, Lacticaseibacillus paracasei, Lacticaseibacillus rhamnosus, Lactiplantibacillus plantarum, Levilactobacillus brevis, Ligilactobacillus salivarius, Limosilactobacillus fermentum, Limosilactobacillus reuteri, Lactobacillus acidophilus, Lactobacillus bulgaricus, Lactobacillus crispatus, Lactobacillus gasseri, Lactobacillus johnsonii, Lactobacillus helveticus, Corynebacterium glutamicum, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausi
- the polynucleotide of paragraph 132 which comprises or consists of a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polypeptide coding sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61 , SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO:
- polynucleotide according to any one of paragraphs 132-133, wherein the polynucleotide is a chemically modified nucleic acid molecule.
- RNA is an mRNA comprising one or more of a 5’ untranslated region (UTR), an open reading frame (ORF) encoding the Cas nuclease or fusion polypeptide, a 3’ UTR, and a poly-adenylyl (polyA) tail.
- UTR untranslated regions
- ORF open reading frame
- polyA poly-adenylyl
- poly-A sequence comprises non-adenine nucleotides.
- poly-A sequence comprises 100-400 nucleotides.
- a nucleic acid construct or expression vector comprising the polynucleotide according to any one of paragraphs 132-147, operably linked to one or more control sequences that direct the production of the nuclease or fusion polypeptide in a cell.
- a cell comprising the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, a polynucleotide of any one of paragraphs 132-147, the nucleic acid construct or expression vector of paragraph 148, or the composition of any one of paragraphs 79-104.
- a cell comprising a genome modified by the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, by a polynucleotide encoding the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, by the composition of any one of paragraphs 79-104, by the polynucleotide of any one of paragraphs 132-147, by the nucleic acid construct or expression vector of paragraph 148, and/or by the method of any one of paragraphs 105-131.
- an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell,
- the cell of paragraph 158 wherein the cell is a eukaryotic cell, such as a mammalian cell, a human cell, or a non-human mammalian cell, e.g., a BHK cell, a CHO cell, a mouse cell, a hamster cell, or a rat cell.
- the cell of paragraph 161 wherein the cell is a yeast cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
- the cell of paragraph 161 wherein the cell is a filamentous fungal cell e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans,
- the cell is a prokaryotic cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Corynebacterium, Enterococcus, Geobacillus, Lactobacillus, Lacticaseibacillus, Lactiplantibacillus, Levilactobacillus, Ligilactobacillus, Limosilactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, or Streptomyces cells, or a Gram-negative bacteria selected from the group consisting of Campylobacter, E.
- coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma cells, such as Lacticaseibacillus casei, Lacticaseibacillus paracasei, Lacticaseibacillus rhamnosus, Lactiplantibacillus plantarum, Levilactobacillus brevis, Ligilactobacillus salivarius, Limosilactobacillus fermentum, Limosilactobacillus reuteri, Lactobacillus acidophilus,
- 171a. The cell according to paragraph 171, wherein the cell is a Lacticaseibacillus paracasei cell. 171b. The cell according to paragraph 171, wherein the cell is a Streptococcus thermophilus cell. 171c. The cell according to paragraph 171, wherein the cell is an E. coli cell.
- a method of producing a Cas nuclease or fusion polypeptide comprising cultivating the recombinant host cell of any one of paragraphs 149-176 under conditions conducive for production of the Cas nuclease or fusion polypeptide.
- the targeted cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, a non-human animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a non-human mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
- a formulation comprising (i) the Cas nuclease according to any one of paragraphs 1-58, the fusion polypeptide according to any one of paragraphs 59-78, a composition according to any one of paragraphs 79-104, the polynucleotide according to any one of paragraphs 132-147, the nucleic acid construct or expression vector according to paragraph 148, or the cell according to any one of paragraphs 149-176, and optionally, (ii) one or more of a lipid, a liposome, a hydrogel, a microparticle, a nanoparticle, or a block copolymer micelle.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Genetics & Genomics (AREA)
- Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biomedical Technology (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Plant Pathology (AREA)
- Medicinal Chemistry (AREA)
- Mycology (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
Abstract
The present invention relates to novel Cas nucleases and polynucleotides encoding the same. The invention also relates to nucleic acid constructs, vectors, and host cells comprising the polynucleotides and Cas nuclease, as well as fusion polypeptides, gene editing methods, methods of producing the Cas nuclease, formulations comprising the Cas nuclease, and use of the Cas nuclease.
Description
NOVEL CAS NUCLEASES AND POLYNUCLEOTIDES ENCODING THE SAME
Reference to a Sequence Listing
This application contains a Sequence Listing in computer readable form, which is incorporated herein by reference.
Background of the Invention
Field of the Invention
The present invention relates to novel CRISPR-associated (Cas) nucleases, variants thereof, and polynucleotides encoding the same. The invention also relates to nucleic acid constructs, vectors, and host cells comprising the polynucleotides and Cas nuclease, as well as fusion polypeptides, gene editing methods, methods of producing the Cas nuclease, formulations comprising the Cas nuclease, and use of the Cas nuclease.
Description of the Related Art
In recent years, genome editing has emerged as a pivotal tool for research and applications. Early methods required complex engineering of nucleases, such as meganucleases, zinc finger fusion proteins, or TALENs, tailored for each target sequence. This process was time-consuming and costly, posing scalability and efficiency challenges.
A transformative breakthrough occurred with RNA-guided nucleases, particularly CRISPR-associated (Cas) proteins. These RNA-guided nucleases allow specific targeting of genetic sequences using guide RNA, streamlining genome editing by eliminating the need for custom-engineered nucleases. RNA-guided nuclease systems, which rely on guide RNAs such as CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA), offer versatile genome editing options, from introducing mutations via non-homologous end-joining (NHEJ) to precise base editing when fused with deaminases.
Programmable nucleases, core components of RNA-guided nucleases, bind and cleave nucleic acids with sequence-specificity. They exhibit activities like cis cleavage or nickase activity, guided by specialized RNA molecules. These nucleases can be engineered to reduce catalytic activity while maintaining sequence specificity, expanding their utility.
CRISPR systems in bacterial and archaeal adaptive immunity display diverse characteristics. Differences in size, PAM site, on-target activity, and cleavage pattern offer unique advantages for various applications, but can also represent limitations (e.g., low frequency of PAM sites in the target cell genome, or low expressibility). Novel Cas nucleases are essential to address evolving genome engineering demands.
Summary of the Invention
The inventors of the instant invention have identified novel Cas nucleases that show nuclease activity across different organisms, including bacterial and fungal species. Surprisingly, the novel nucleases possess several advantages over the nucleases known from the prior art. The novel Cas nucleases identified herein are compatible with more flexible PAM sites, resulting in a higher number of possible target sites per genome. Furthermore, the Cas nucleases identified herein are relatively small and are thus promising candidates for pharmaceutical approaches. The relatively small nuclease size is also beneficial for high expression in recombinant cells, particularly bacterial cells.
In a 1st aspect the present invention relates to Cas nucleases selected from the group consisting of:
(a) a polypeptide having at least 70% sequence identity to any of the amino acid sequences of SEQ ID NOs: 21, 48, 1, 40, 39, 29, 2-20, 22-28, 30-38, 41-47, or 49-52;
(b) a polypeptide encoded by a polynucleotide having at least 70% sequence identity to any of the polypeptide coding sequences of SEQ ID NOs: 73, 100, 53, 92, 91, or 81, or to any of SEQ ID NOs: 53-104, or to any one of SEQ ID NOs: 347, 349, 351, 353, 405, 416, 417, 434, 449, 465, 466, 512-520, 528, 549 or 550;
(c) a polypeptide derived from any one of SEQ ID NOs: 21, 48, 1, 40, 39, 29, 2-20, 22-28, 30-38, 41-47, or 49-52, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations), in particular substitutions, such as conservative amino acid substitutions;
(d) a polypeptide having a TM-score of at least 0.80 compared to the three-dimensional structure of the polypeptide of any one of SEQ ID NOs: 21, 48, 1, 40, 39, 29, 2-20, 22-28, 30-38, 41-47, or 49-52, wherein the three-dimensional structure is calculated using AlphaFold;
(e) a polypeptide derived from the polypeptide of (a), (b), (c), or (d), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and
(f) a fragment of the polypeptide of (a), (b), (c), (d), or (e).
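The thresholds in items (a) and (d) can be illustrated with a minimal sketch. It assumes that two sequences have already been aligned (gaps written as "-") and that identity is counted as identical residues divided by the alignment length minus gap columns; this convention is an assumption, since this section does not state how identity is calculated. The TM-score function uses the commonly published formula for a fixed superposition, takes residue-pair distances in Ångström as input, and clamps d0 at 0.5 for very short chains (also an assumption). All function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical residues * 100 / (alignment length - gap columns).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    gap_columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            gap_columns += 1
        elif a == b:
            identical += 1
    compared = len(aligned_a) - gap_columns
    if compared == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * identical / compared


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distance (in Angstrom) between each aligned residue pair.
    target_length: number of residues in the reference structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if target_length > 15:
        d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    else:
        d0 = 0.5  # assumed lower bound for very short chains
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    identity = percent_identity("MKT-LLA", "MKTALIA")
    print(f"identity: {identity:.1f}%  meets 70% threshold: {identity >= 70.0}")
    print(f"TM-score: {tm_score([0.0] * 100, 100):.2f}")
```

A full TM-score calculation additionally searches for the superposition that maximizes this sum; the sketch only evaluates a superposition that is already given.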
Some aspects of the disclosure provide Cas nucleases that have different PAM specificities. Typically, Cas nucleases, such as Cas9 from S. pyogenes (spCas9), require a canonical “nGG” PAM sequence to bind a particular nucleic acid region. This may limit the ability to target desired bases within a genome. In some embodiments, the Cas nucleases provided herein may need to be placed at a precise location, for example where a target base is placed within a 4-base region (e.g., an “editing window”), which is approximately 15 bases upstream of the PAM. See Komor, A.C., et al., “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage” Nature 533, 420-424 (2016). Accordingly, in some embodiments, any of the Cas nucleases provided herein may contain a CRISPR nuclease that is capable of binding a nucleotide sequence that does not contain a canonical (e.g., nGG) PAM sequence. In one embodiment, the Cas nuclease of the invention utilizes a “nnAY” PAM sequence which allows a more flexible selection of target sites compared to the canonical “nGG” PAM sequence of spCas9.
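As a rough illustration of why a “nnAY” PAM may offer more candidate sites than “nGG”, the sketch below estimates the chance that a PAM matches at a given position of a random sequence. It assumes equal base frequencies, which is a simplification because real genomes are compositionally biased. The function name and the IUPAC lookup table are illustrative and not part of the disclosure.

```python
from fractions import Fraction

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_match_probability(pam: str) -> Fraction:
    """Probability that a PAM matches at one position of a random sequence.

    Assumes each base occurs with probability 1/4 (a simplification).
    """
    probability = Fraction(1)
    for symbol in pam.upper():
        probability *= Fraction(len(IUPAC[symbol]), 4)
    return probability


if __name__ == "__main__":
    for pam in ("nGG", "nnAY"):
        print(pam, pam_match_probability(pam))
```

Under this assumption “nGG” matches at 1 in 16 positions and “nnAY” at 1 in 8, i.e., about twice as often per strand.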
In a 2nd aspect, the invention relates to a fusion polypeptide comprising the Cas nuclease of the 1st aspect, and one or more second polypeptides.
In a 3rd aspect the present invention relates to a non-naturally occurring composition comprising (i) the Cas nuclease of the 1st aspect and/or the fusion polypeptide of the 2nd aspect, or (ii) a nucleic acid molecule comprising a sequence encoding the Cas nuclease of the 1st aspect and/or the fusion polypeptide of the 2nd aspect.
In a 4th aspect the present invention relates to a method of modifying a nucleotide sequence at a DNA target site in the genome of a cell, comprising introducing into the cell the Cas nuclease of the 1st aspect, the fusion polypeptide according to the 2nd aspect, the composition of the 3rd aspect, the polynucleotide of the 5th aspect, and/or the nucleic acid construct or expression vector of the 6th aspect.
In a 5th aspect, the present invention relates to a polynucleotide encoding the Cas nuclease of the 1st aspect, and/or the fusion polypeptide of the 2nd aspect.
In a 6th aspect the present invention relates to a nucleic acid construct or expression vector comprising the polynucleotide of the 5th aspect, operably linked to one or more control sequences that direct the production of the polypeptide in a cell.
In a 7th aspect, the present invention relates to a cell comprising the Cas nuclease of the 1st aspect, the fusion polypeptide of the 2nd aspect, the composition of the 3rd aspect, the polynucleotide of the 5th aspect, or the nucleic acid construct or expression vector of the 6th aspect.
In an 8th aspect, the present invention relates to a cell comprising a genome modified by the Cas nuclease of the 1st aspect, the fusion polypeptide of the 2nd aspect, the composition of the 3rd aspect, the method of the 4th aspect, the polynucleotide of the 5th aspect, and/or the nucleic acid construct or expression vector of the 6th aspect.
In a 9th aspect, the present invention relates to a method of producing a Cas nuclease of the 1st aspect, or a fusion polypeptide of the 2nd aspect, comprising cultivating the host cell of the 7th aspect under conditions conducive for production of the Cas nuclease or fusion polypeptide.
In a 10th aspect, the present invention relates to the use of the Cas nuclease of the 1st aspect, the fusion polypeptide of the 2nd aspect, the composition of the 3rd aspect, the method of the 4th aspect, the polynucleotide of the 5th aspect, or the nucleic acid construct or expression vector of the 6th aspect for modifying a target sequence in a cell, e.g., a target gene.
In an 11th aspect, the present invention relates to the use of the Cas nuclease of the 1st aspect, the fusion polypeptide of the 2nd aspect, the composition of the 3rd aspect, the method of the 4th aspect, the polynucleotide of the 5th aspect, the nucleic acid construct or expression vector of the 6th aspect, the cell of the 7th aspect, or the cell of the 8th aspect for the manufacture of a medicament for modifying a target sequence in a cell, e.g., a target gene.
In a 12th aspect, the present invention relates to a formulation comprising (i) the Cas nuclease according to the 1st aspect, the fusion polypeptide according to the 2nd aspect, a composition according to the 3rd aspect, the polynucleotide according to the 5th aspect, the nucleic acid construct or expression vector according to the 6th aspect, the cell according to the 7th aspect, or the cell according to the 8th aspect, and optionally, (ii) one or more of a lipid, a liposome, a hydrogel, a microparticle, a nanoparticle, or a block copolymer micelle.
Brief Description of the Drawings
Figure 1 shows the bioinformatic pipeline developed to identify novel CRISPR-Cas systems in genome sequences. Genome sequences from various sources are processed with three state-of-the-art sequence mining tools to identify Cas nuclease genes, CRISPR arrays, and tracrRNA sequences. The pool of potential Cas proteins is further enriched by scanning the genomes with custom HMMs and filtered for the presence of domains and residues required for endonuclease activity. The three functional elements Cas, CRISPR array, and tracrRNA are mapped to each other via their genomic locus as well as the complementarity of the CRISPR repeats and the tracrRNAs.
Figure 2 shows a sequence homology tree of the novel Cas nucleases based on their amino acid sequences (SEQ ID NOs: 1 - 52).
Figures 3-6 show ratios of white/black colonies after transformation of A. niger with CRISPR nucleases.
Figures 7-10 show insertions/deletions generated by the CRISPR nucleases in the A. niger genome.
Figures 11-13 show CRISPR plasmids used in A. niger.
Figure 14 shows a schematic drawing of the plasmid pTNA665 (PamyL-NZ0076).
Figure 15 shows a schematic drawing of the plasmid pTNA666 (Pgrac-NZ0076).
Figure 16 shows a schematic drawing of the plasmid pTNA669 (separate guide RNA for NZ0076).
Figure 17 shows a schematic drawing of the plasmid pTNA670 (single guide RNA for NZ0076).
Figure 18 shows the killing effect based on nuclease activity in E. coli.
Figure 19 shows gene editing efficiency in E. coli.
Figure 20 shows a ribbon alignment between the protein structures from nucleases 0076 and 0172.
Figure 21 shows a ribbon alignment between the protein structures from nucleases 0100 and 0172.
Figure 22 shows a ribbon alignment between the protein structures from nucleases 0076 and 0102.
Figure 23 shows a ribbon alignment between the protein structures from nucleases 0076 and S. pyogenes Cas9.
Figure 24 shows a ribbon alignment between the protein structures from nucleases 0076 and 0100.
Definitions
In accordance with this detailed description, the following definitions apply. Note that the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise.
Unless defined otherwise or clearly indicated by context, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Base-editing polypeptide: The term “base-editing polypeptide” means a polypeptide comprising a base editor domain capable of chemically altering the target DNA sequence without introducing a double-strand break. Non-limiting examples for a base-editing polypeptide include a deaminase, e.g., a cytidine deaminase, and an adenosine deaminase. The base-editing polypeptide may be fused to an inactivated Cas nuclease of the invention to enable single-nucleotide changes to a specific DNA target sequence. Using a base-editing polypeptide, single nucleotide polymorphisms (SNPs) can be introduced at the DNA target site without generating a double-strand break.
Catalytic domain: The term “catalytic domain” means the region of an enzyme containing the catalytic machinery of the enzyme. The catalytic domain of a Cas nuclease comprises an HNH domain, and one or more RuvC domain(s).
Catalytically dead nuclease: The term “catalytically dead nuclease”, “catalytically inactive nuclease” or “dCas” means a mutated Cas nuclease which has reduced, no, or substantially no cleavage activity. The catalytically dead nuclease has reduced, no, or substantially no cleavage activity for single- and double-strand DNA. In other words, the dCas may not cleave either strand of a target DNA.
For example, the catalytically dead nuclease may comprise an inactivated RuvC domain and an inactivated HNH domain. In some embodiments, the Cas nuclease is a catalytically dead nuclease, e.g., having substantially no nuclease activity, e.g., no more than 5 percent nuclease activity as compared with a wild-type Cas nuclease not having an inactivated RuvC domain and not having an inactivated HNH domain.
As a non-limiting example, in some cases, the dCas harbors both the D10A and the H840A mutations in S. pyogenes Cas9, or corresponding mutations in any Cas nuclease of the invention. Additional suitable nuclease-inactive dCas can be apparent to those of skill in the art based on this disclosure and knowledge in the field, and are within the scope of this disclosure. Such additional exemplary suitable dCas include, but are not limited to, D10A/H840A, D10A/D839A/H840A, and D10A/D839A/H840A/N863A mutant domains (See, e.g., Prashant et al., “CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering” Nature Biotechnology. 2013; 31(9): 833-838, the entire contents of which are incorporated herein by reference).
cDNA: The term "cDNA" means a DNA molecule that can be prepared by reverse transcription from a mature, spliced, mRNA molecule obtained from a eukaryotic or prokaryotic cell. cDNA lacks intron sequences that may be present in the corresponding genomic DNA. The initial, primary RNA transcript is a precursor to mRNA that is processed through a series of steps, including splicing, before appearing as mature, spliced mRNA.
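The substitution notation used for the dCas examples above (e.g., D10A: aspartate at position 10 replaced by alanine) can be applied programmatically. The following is a minimal sketch on a toy sequence; the helper name and the toy sequence are hypothetical, and positions corresponding to D10 or H840 in a given Cas nuclease of the invention would have to be determined by alignment, which is not shown here.

```python
import re


def apply_substitutions(sequence: str, substitutions) -> str:
    """Apply substitutions written as <wild-type residue><1-based position><new residue>.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the stated wild-type residue is not found at that position.
    """
    residues = list(sequence)
    for substitution in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
        if match is None:
            raise ValueError(f"malformed substitution: {substitution!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {substitution}")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    toy_sequence = "MKDHGS"  # toy example, not a real Cas sequence
    print(apply_substitutions(toy_sequence, ["D3A", "H4A"]))
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors when transferring mutations between homologous sequences.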
Coding sequence: The term “coding sequence” means a polynucleotide, which directly specifies the amino acid sequence of a polypeptide. The boundaries of the coding sequence are generally determined by an open reading frame, which begins with a start codon, such as ATG, GTG, or TTG, and ends with a stop codon, such as TAA, TAG, or TGA. The coding sequence may be a genomic DNA, cDNA, synthetic DNA, or a combination thereof.
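The boundary rule in this definition can be sketched in a few lines. The example scans one reading frame of one strand for the first start codon (ATG, GTG, or TTG) and returns the sequence up to and including the next in-frame stop codon (TAA, TAG, or TGA). The function name is hypothetical, and real gene prediction involves more than this (both strands, all frames, ribosome binding sites, and so on).

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_sequence(dna: str, frame: int = 0):
    """Return the first start-to-stop open reading frame in the given frame.

    The returned string includes the start and the stop codon.
    Returns None if no complete open reading frame is found.
    """
    dna = dna.upper()
    start = None
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if start is None and codon in START_CODONS:
            start = i
        elif start is not None and codon in STOP_CODONS:
            return dna[start:i + 3]
    return None


if __name__ == "__main__":
    print(find_coding_sequence("CCATGAAATTTTAGCC", frame=2))
```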
Codon-optimized gene: The term "codon-optimized gene" means a gene having its frequency of codon usage optimized to the frequency of preferred codon usage of a host cell. The nucleic acid changes made to codon-optimize a gene do not change the amino acid sequence of the encoded polypeptide of the parent gene.
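The key property stated here, that codon optimization changes the nucleotide sequence but not the encoded amino acid sequence, can be checked directly. The sketch below uses the standard genetic code and a small table of preferred codons; that table is hypothetical and purely illustrative, as actual preferences depend on the host cell.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str, preferred: dict) -> str:
    """Replace each codon by the preferred synonymous codon, if one is given.

    preferred maps a one-letter amino acid (or '*') to a codon. The result is
    verified to encode the same amino acid sequence as the input.
    """
    cds = cds.upper()
    protein = translate(cds)
    optimized = "".join(
        preferred.get(amino_acid, cds[3 * i:3 * i + 3])
        for i, amino_acid in enumerate(protein)
    )
    if translate(optimized) != protein:
        raise ValueError("preferred codon table changes the encoded protein")
    return optimized


if __name__ == "__main__":
    hypothetical_preferences = {"L": "CTG", "K": "AAA", "*": "TAA"}
    print(codon_optimize("ATGTTAAAGTAG", hypothetical_preferences))
```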
Control sequences: The term “control sequences” means nucleic acid sequences involved in regulation of expression of a polynucleotide in a specific organism or in vitro. Each control sequence may be native (i.e., from the same gene) or heterologous (i.e., from a different gene) to the polynucleotide encoding the polypeptide, and native or heterologous to each other. Such control sequences include, but are not limited to leader peptide, polyadenylation signal, prepeptide, propeptide, signal peptide, promoter, terminator, enhancer, and transcription or translation initiator and terminator sequences. At a minimum, the control sequences include a promoter, and transcriptional and translational stop signals. The control sequences may be provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the polynucleotide encoding a polypeptide.
Cas nuclease: The term “Cas nuclease” means an RNA-guided DNA endonuclease associated with CRISPR which is capable of cleaving a target DNA sequence when coupled with a guide RNA. The Cas nuclease is guided by guide RNA(s) to recognize and cleave a specific target site in double-stranded DNA in the genome of a cell. CRISPR-Cas systems are currently classified into 2 classes, 6 types, and 33 subtypes (Makarova et al., 2020, Nat Rev Microbiol 18: 67-83).
In one embodiment, the Cas nuclease is a Class 2 Type II_A CRISPR-Cas system employing the Cas nuclease of SEQ ID NO: 1 or variant thereof (including, for example, a CRISPR nickase). In one embodiment, the Cas nuclease is a Class 2 Type II_C CRISPR-Cas system employing the Cas nuclease of SEQ ID NO: 21 or variant thereof (including, for example, a CRISPR nickase). In one embodiment, the Cas nuclease is a Class 2 Type II_A CRISPR-Cas system employing the Cas nuclease of SEQ ID NO: 40 or variant thereof (including, for example, a CRISPR nickase). In one embodiment, the Cas nuclease is a Class 2 Type II_A CRISPR-Cas system employing the Cas nuclease of SEQ ID NO: 39 or variant thereof (including, for example, a CRISPR nickase). In one embodiment, the Cas nuclease is a Class 2 Type II_A CRISPR-Cas system employing the Cas nuclease of SEQ ID NO: 48 or variant thereof (including, for example, a CRISPR nickase). Typically, Cas nucleases of type II comprise two nuclease domains, an HNH nuclease domain that cleaves the complementary DNA strand, and a RuvC-like nuclease domain that cleaves the non-complementary DNA strand.
Target recognition and cleavage by the Cas nuclease requires a chimeric RNA consisting of a fusion of crRNA (comprising a guide sequence and a partial direct repeat) and tracrRNA (trans-activating crRNA) and a short, conserved sequence motif downstream of the crRNA binding region, called a protospacer adjacent motif (PAM). In one embodiment, target recognition and cleavage by the Cas nuclease takes place with a separate crRNA molecule and a separate tracrRNA molecule, i.e., where the crRNA is not fused to the tracrRNA. In one embodiment, the Cas nuclease (e.g., SEQ ID NO: 21) targets the target DNA immediately adjacent to a 5’-nnGHMA PAM sequence.
In one embodiment, the Cas nuclease (e.g., SEQ ID NO: 1) derived from the bacterium Lactobacillus sp. targets the target DNA immediately adjacent to a 5’-nnAY PAM sequence.
In one embodiment, the Cas nuclease (e.g., SEQ ID NO: 40) derived from the bacterium Enterococcus asini targets the target DNA immediately adjacent to a 5’-nnAMA PAM sequence.
In one embodiment, the Cas nuclease (e.g., SEQ ID NO: 39) derived from the bacterium Enterococcus hermanniensis targets the target DNA immediately adjacent to a 5’-nnGTA PAM sequence.
In one embodiment, the Cas nuclease (e.g., SEQ ID NO: 48) derived from the bacterium Vagococcus penaei targets the target DNA immediately adjacent to a 5’-nnRHRD PAM sequence.
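The PAM sequences listed above use IUPAC degenerate codes (n = any base, Y = C or T, R = A or G, M = A or C, H = A, C or T, D = A, G or T). The sketch below converts such a PAM into a regular expression and lists candidate protospacers on one strand. It assumes that the PAM lies immediately 3’ of the protospacer on the scanned strand and uses a default protospacer length of 20 nucleotides; both are assumptions made for the illustration, and the function names and test sequence are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_to_regex(pam: str) -> str:
    """Convert an IUPAC PAM such as 'nnAY' into a regular expression."""
    parts = []
    for symbol in pam.upper():
        bases = IUPAC[symbol]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_target_sites(dna: str, pam: str, spacer_len: int = 20):
    """List (protospacer_start, protospacer, pam_sequence) on the given strand.

    Assumes the PAM sits directly 3' of the protospacer. Only the strand
    passed in is scanned; call again with the reverse complement if needed.
    """
    dna = dna.upper()
    # A lookahead lets overlapping PAM occurrences be reported as well.
    pattern = re.compile(f"(?=({pam_to_regex(pam)}))")
    sites = []
    for match in pattern.finditer(dna):
        pam_start = match.start()
        if pam_start >= spacer_len:
            protospacer = dna[pam_start - spacer_len:pam_start]
            sites.append((pam_start - spacer_len, protospacer, match.group(1)))
    return sites


if __name__ == "__main__":
    toy_dna = "ACGTACGTACGTACGTACGT" + "TTAC" + "GG"
    for site in find_target_sites(toy_dna, "nnAY"):
        print(site)
```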
The RNA-guided Cas nuclease activity creates site-specific double strand breaks, which are then repaired by either non-homologous end joining (NHEJ) or homology-directed repair (HDR). It is understood that the term "Cas nuclease" encompasses variants thereof.
crRNA: CRISPR RNA (crRNA) serves as the molecular guide for Cas nucleases, providing sequence specificity for the Cas nuclease to target and/or edit and/or regulate specific DNA and/or RNA sequences. The crRNA sequence comprises spacers, which recognize a distinct DNA sequence (protospacer). The crRNA associates with the Cas nuclease and creates a ribonucleoprotein complex called the CRISPR-Cas effector complex. Due to the match of the spacer to the complementary target DNA sequence, the Cas nuclease introduces a DNA break at the target site. The crRNA can be reprogrammed, allowing precise and customizable genome editing.
DNA target site: The term “target sequence”, “target site”, or “DNA target site” means one or more DNA (e.g., genomic DNA) or RNA target sequence of interest that may be subject to a single-strand cut, or double-strand cut by the Cas nuclease, and/or induced or repressed by the Cas nuclease. Typically, the target site (protospacer sequence) is at least 15-20 nucleotides in length in order to allow its hybridization to the corresponding spacer sequence of the guide RNA. The target site can be located anywhere in the genome but will often be within a coding sequence or open reading frame. Non-limiting examples for target sites include genes, promoters, and other regulatory sequences such as enhancers, silencers, insulators, splicing sites, and untranslated regions (UTRs), including UTRs of genes and 5’-UTRs.
Preferably, the target site of interest is flanked by a functional PAM sequence for the selected Cas nuclease.
Expression: The term “expression” means any step involved in the production of a polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, folding of the translated polypeptide into a functional structure, post-translational modification, and secretion.
Expression vector: An "expression vector" refers to a linear or circular DNA construct comprising a DNA sequence encoding a polypeptide, which coding sequence is operably linked to a suitable control sequence capable of effecting expression of the DNA in a suitable host. Such control sequences may include a promoter to effect transcription, an optional operator sequence to control transcription, a sequence encoding suitable ribosome binding sites on the mRNA, enhancers and sequences which control termination of transcription and translation.
Extension: The term “extension” means an addition of one or more amino acids to the amino and/or carboxyl terminus of a polypeptide, wherein the “extended” polypeptide has nuclease activity and/or DNA-binding activity.
Fragment: The term “fragment” means a polypeptide, a catalytic domain, or a DNA-binding module having one or more amino acids absent from the amino and/or carboxyl terminus of the mature polypeptide, catalytic domain, or binding module, wherein the fragment has nuclease or DNA-binding activity.
Fusion polypeptide: The term “fusion polypeptide” means a polypeptide in which one or more polypeptides are fused at the N-terminus and/or the C-terminus of a Cas nuclease of the present invention. A fusion polypeptide is produced by fusing a polynucleotide encoding another polypeptide to a polynucleotide of the present invention, or by fusing two or more polynucleotides of the present invention together. Techniques for producing fusion polypeptides are known in the art, and include ligating the coding sequences encoding the polypeptides so that they are in frame and that expression of the fusion polypeptide is under control of the same promoter(s) and terminator. Fusion polypeptides may also be constructed using intein technology in which fusion polypeptides are created post-translationally (Cooper et al., 1993, EMBO J. 12: 2575-2583; Dawson et al., 1994, Science 266: 776-779).
Genomic DNA: As used herein, “genomic DNA” refers to linear and/or chromosomal DNA and/or to plasmid or other extrachromosomal DNA sequences present in the cell or cells of interest. In some embodiments, the cell of interest is a eukaryotic cell. In some embodiments, the cell of interest is a prokaryotic cell. In some embodiments, the methods produce double-stranded breaks (DSBs) at pre-determined target sites in a genomic DNA sequence, resulting in mutation, insertion, and/or deletion of DNA sequences at the target site(s) in a genome.
Guide sequence portion: The “guide sequence portion” or “spacer” of an RNA molecule refers to a nucleotide sequence that is capable of hybridizing to a specific target DNA sequence (protospacer), e.g., the guide sequence portion has a nucleotide sequence which is partially or fully complementary to the DNA sequence being targeted along the length of the guide sequence portion. In some embodiments, the guide sequence portion is 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 or 50 nucleotides in length, preferably at least 18 nucleotides in length, such as at least 23 nucleotides in length, or approximately 17-50, 17-49, 17-48, 17-47, 17-46, 17-45, 17-44, 17-43, 17-42, 17-41, 17-40, 17-39, 17-38, 17-37, 17-36, 17-35, 17-34, 17-33, 17-31, 17-30, 17-29, 17-28, 17-27, 17-26, 17-25, 17-24, 17-22, 17-21, 18-25, 18-24, 18-23, 18-22, 18-21, 19-25, 19-24, 19-23, 19-22, 19-21, 19-20, 20-22, 18-20, 20-21, 21-22, or 17-20 nucleotides in length. The entire length of the guide sequence portion is fully complementary to the DNA sequence being targeted along the length of the guide sequence portion. The guide sequence portion may be part of an RNA molecule that can form a complex with a Cas nuclease with the guide sequence portion serving as the DNA targeting portion of the CRISPR complex. When the RNA molecule having the guide sequence portion is present contemporaneously with the CRISPR molecule, the RNA molecule is capable of targeting the Cas nuclease to the specific target DNA or RNA sequence. Each possibility represents a separate embodiment. An RNA molecule can be custom designed to target any desired sequence. Accordingly, a molecule comprising a “guide sequence portion” is a type of targeting molecule.
Throughout this application, the terms “guide molecule,” “RNA guide molecule,” “guide RNA molecule,” and “gRNA molecule” are synonymous with a molecule comprising a guide sequence portion, and the term “spacer” is synonymous with a “guide sequence portion.”
In embodiments of the present invention, the Cas nuclease has its greatest cleavage activity when used with an RNA molecule comprising a guide sequence portion having 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides.
A single-guide RNA (sgRNA) molecule may be used to direct a Cas nuclease to a desired target site. The single-guide RNA comprises a guide sequence portion as well as a scaffold portion. The scaffold portion interacts with a Cas nuclease and, together with a guide sequence
portion, activates and targets the Cas nuclease to a desired target site. A scaffold portion may be further engineered, for example, to have a reduced size.
The gRNA in CRISPR-Cas genome editing constitutes the re-programmable part that makes the system so versatile. In most natural Cas systems, the gRNA is actually a complex of two RNA polynucleotides: a crRNA containing about 15-30 nucleotides that determine the specificity of the Cas nuclease, and a tracrRNA which hybridizes to the crRNA to form an RNA complex that interacts with the Cas nuclease (see Jinek et al., 2012, A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity, Science 337: 816-821).
Since the discovery of the CRISPR-Cas system, single polynucleotide gRNAs have been developed and applied as effectively as the natural two-part gRNA complex.
The spacer may be part of a targeting guide RNA molecule that can form a complex with a Cas nuclease with the spacer sequence serving as the targeting portion of the CRISPR complex. When the molecule having the spacer sequence is present contemporaneously with the CRISPR molecule, the RNA molecule is capable of targeting the Cas nuclease to the specific target sequence. Each possibility represents a separate embodiment. A targeting RNA molecule can be custom designed to target any desired sequence.
The term “targets” as used herein, refers to preferential hybridization of a spacer sequence to a nucleic acid having a targeted nucleotide sequence (protospacer). It is understood that the term “targets” encompasses variable hybridization efficiencies, such that there is preferential targeting of the nucleic acid having the targeted nucleotide sequence, but unintentional off-target hybridization in addition to on-target hybridization might also occur. It is understood that where an RNA molecule targets a sequence, a complex of the RNA molecule and a Cas nuclease molecule targets the sequence for nuclease activity.
In the context of targeting a DNA sequence that is present in a plurality of cells, it is understood that the targeting encompasses hybridization of the guide sequence portion of the RNA molecule with the sequence in one or more of the cells, and also encompasses hybridization of the RNA molecule with the target sequence in fewer than all of the cells in the plurality of cells. Accordingly, it is understood that where an RNA molecule targets a sequence in a plurality of cells, a complex of the RNA molecule and a Cas nuclease is understood to hybridize with the target sequence in one or more of the cells, and also may hybridize with the target sequence in fewer than all of the cells. Accordingly, it is understood that the complex of the RNA molecule and the Cas nuclease introduces a double strand break in relation to hybridization with the target sequence in one or more cells and may also introduce a double strand break in relation to hybridization with the target sequence in fewer than all of the cells. As used herein, the term “modified cells” or “cell comprising a genome modified by the Cas nuclease” refers to cells in which a double strand break is effected by a complex of an RNA molecule and the Cas nuclease as a result of hybridization with the target sequence, i.e., on-target hybridization.
Heterologous: The term "heterologous" means, with respect to a host cell, that a polypeptide or nucleic acid does not naturally occur in the host cell. The term "heterologous" means, with respect to a polypeptide or nucleic acid, that a control sequence, e.g., promoter, of a polypeptide or nucleic acid is not naturally associated with the polypeptide or nucleic acid, i.e., the control sequence is from a gene other than the gene encoding the mature polypeptide.
HNH sequence: The HNH sequence comprises the HNH domain of a Cas nuclease. The HNH domain in Cas nucleases stands for "Histidine-Asparagine-Histidine." These conserved amino acid residues play a crucial role in the nuclease activity of this domain. The HNH domain is one of the two main types of nuclease domains found in CRISPR-associated (Cas) proteins. In the context of CRISPR systems, the HNH domain is responsible for cleaving the DNA strand that is complementary to the RNA guide strand, thereby creating a cut in the target DNA. This break is a key step in the CRISPR-Cas gene editing process, allowing for precise DNA modification.
Together with the RuvC domain, the HNH domain creates a double-strand break in the target DNA, allowing for gene editing or modification.
Host Strain or Host Cell: A "host strain" or "host cell" is an organism into which an expression vector, phage, virus, or other DNA construct, including a polynucleotide encoding a polypeptide of the present invention, has been introduced. Exemplary host strains are microorganism cells (e.g., bacteria, filamentous fungi, and yeast) capable of expressing the Cas nuclease. The term "host cell" includes protoplasts created from cells.
Introduced: The term "introduced" in the context of inserting a nucleic acid sequence into a cell, means "transfection", "transformation" or "transduction," as known in the art.
Isolated: The term “isolated” means a polypeptide, nucleic acid, cell, or other specified material or component that has been separated from at least one other material or component, including but not limited to, other proteins, nucleic acids, cells, etc. An isolated polypeptide, nucleic acid, cell or other material is thus in a form that does not occur in nature. An isolated polypeptide includes, but is not limited to, a culture broth containing the secreted polypeptide expressed in a host cell.
Mature polypeptide: The term “mature polypeptide” means a polypeptide in its mature form following N-terminal and/or C-terminal processing (e.g., removal of signal peptide).
Mature polypeptide coding sequence: The term “mature polypeptide coding sequence” means a polynucleotide that encodes a mature Cas nuclease having nuclease activity and/or DNA binding activity.
Native: The term "native" means a nucleic acid or polypeptide naturally occurring in a host cell.
Nickase: The term “Nickase”, “CRISPR nickase”, or “nCas” means a nuclease having an inactivated RuvC domain, or an inactivated HNH domain. It is understood that a Cas nuclease, rather than losing nuclease activity to cleave all DNA, may lose the ability to cleave only the target
strand or only the non-target strand of a double-stranded DNA, thereby being functional as a nickase (see, Gao et al. (2016) CELL RES., 26: 901). Accordingly, in certain embodiments, a Cas nuclease is an nCas. In certain embodiments, an nCas has the activity to cleave the non-complementary strand but substantially lacks the activity to cleave the complementary strand, e.g., by a mutation in the HNH domain. For example, an nCas can have a mutation that reduces the function of the HNH domain, such as an H840A mutation in S. pyogenes Cas9 or a corresponding mutation in any Cas nuclease of the invention. In certain embodiments, an nCas has the activity to cleave the complementary strand but substantially lacks the activity to cleave the non-complementary strand, e.g., by a mutation in the RuvC domain. For example, the nCas can have a mutation that reduces the function of the RuvC domain, such as a D10A mutation in S. pyogenes Cas9 or a corresponding mutation in any Cas nuclease of the invention.
Nuclear Localization Sequence: The terms "nuclear localization sequence" and "NLS" are used interchangeably to indicate an amino acid sequence/peptide that directs the transport of a protein with which it is associated from the cytoplasm of a cell across the nuclear envelope barrier. The term "NLS" is intended to encompass not only the nuclear localization sequence of a particular peptide, but also derivatives thereof that are capable of directing translocation of a cytoplasmic polypeptide across the nuclear envelope barrier. NLSs are capable of directing nuclear translocation of a Cas nuclease when attached to the N-terminus, the C-terminus, or both the N- and C-termini of the Cas nuclease. In addition, a polypeptide having an NLS coupled by its N- or C-terminus to amino acid side chains located randomly along the amino acid sequence of the polypeptide will be translocated. Typically, an NLS consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface, but other types of NLS are known. Non-limiting examples of NLSs include an NLS sequence derived from: the SV40 virus large T-antigen, nucleoplasmin, c-myc, the hnRNPA1 M9 NLS, the IBB domain from importin-alpha, myoma T protein, human p53, mouse c-abl IV, influenza virus NS1, Hepatitis virus delta antigen, mouse Mx1 protein, human poly(ADP-ribose) polymerase, and the steroid hormone receptors (human) glucocorticoid.
Nucleic acid: The term "nucleic acid" encompasses DNA, RNA, heteroduplexes, and synthetic molecules capable of encoding a polypeptide. Nucleic acids may be single-stranded or double-stranded, and may have chemical modifications. The terms "nucleic acid" and "polynucleotide" are used interchangeably. Because the genetic code is degenerate, more than one codon may be used to encode a particular amino acid, and the present compositions and methods encompass nucleotide sequences that encode a particular amino acid sequence. Unless otherwise indicated, nucleic acid sequences are presented in 5'-to-3' orientation.
Nucleic acid construct: The term "nucleic acid construct" means a nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally occurring gene or is modified to contain segments of nucleic acids in a manner that would not otherwise exist in nature or which
is synthetic, and which comprises one or more control sequences operably linked to the nucleic acid sequence.
Operably linked: The term "operably linked" means that specified components are in a relationship (including but not limited to juxtaposition) permitting them to function in an intended manner. For example, a regulatory sequence is operably linked to a coding sequence such that expression of the coding sequence is under control of the regulatory sequence.
PAM: The term “PAM” or “protospacer adjacent motif” as used herein refers to a nucleotide sequence of a target DNA located in proximity to the targeted DNA sequence (protospacer) and recognized by the Cas nuclease, i.e., by the guide RNA forming a complex with the Cas nuclease and the target DNA. The PAM sequence may differ depending on the Cas nuclease identity. In some instances, a PAM is required for a complex of a Cas nuclease and a guide RNA to hybridize to and edit the target sequence. In some instances, the complex does not require a PAM to edit the target sequence.
Commonly accepted abbreviations that are used in the art as well as herein to represent ambiguity in nucleotide bases of the PAM include the following: R=G or A; Y=C or T; M=A or C; K=G or T; S=G or C; W=A or T; H=A or C or T; B=G or T or C; V=G or C or A; D=G or A or T; N=A or C or G or T. Non-limiting examples of suitable PAM sequences for the Cas nucleases of the present invention are shown in Table 1.
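The ambiguity codes listed above can be applied mechanically to test whether a stretch of DNA matches a PAM pattern. The following Python sketch is illustrative only; the function names are hypothetical, and the pattern "NGG" used in the usage note is the widely known S. pyogenes Cas9 PAM, chosen as an example rather than taken from Table 1.

```python
# Illustrative sketch only: function names are hypothetical.
# Ambiguity codes exactly as listed above, plus the four unambiguous bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "GA", "Y": "CT", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "ACGT",
}


def pam_matches(pam_pattern: str, dna: str) -> bool:
    """Return True if `dna` matches `pam_pattern` position by position."""
    if len(pam_pattern) != len(dna):
        return False
    return all(
        base in IUPAC[code]
        for code, base in zip(pam_pattern.upper(), dna.upper())
    )


def find_pam_sites(pam_pattern: str, dna: str) -> list[int]:
    """Return 0-based start positions of every window matching the pattern."""
    k = len(pam_pattern)
    return [
        i for i in range(len(dna) - k + 1)
        if pam_matches(pam_pattern, dna[i:i + k])
    ]
```

For example, `find_pam_sites("NGG", "ATGGCGGA")` scans one strand only; a practical search would also scan the reverse complement.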
Purified: The term “purified” means a nucleic acid, polypeptide or cell that is substantially free from other components as determined by analytical techniques well known in the art (e.g., a purified polypeptide or nucleic acid may form a discrete band in an electrophoretic gel, chromatographic eluate, and/or a media subjected to density gradient centrifugation). A purified nucleic acid or polypeptide is at least about 50% pure, usually at least about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, about 99.5%, about 99.6%, about 99.7%, about 99.8% or more pure (e.g., percent by weight or on a molar basis). In a related sense, a composition is enriched for a molecule when there is a substantial increase in the concentration of the molecule after application of a purification or enrichment technique. The term "enriched" refers to a compound, polypeptide, cell, nucleic acid, amino acid, or other specified material or component that is present in a composition at a relative or absolute concentration that is higher than a starting composition.
In one aspect, the term "purified" as used herein refers to the polypeptide or cell being essentially free from components (especially insoluble components) from the production organism. In other aspects, the term "purified" refers to the Cas nuclease being essentially free of insoluble components from the native organism from which it is obtained. In one aspect, the Cas nuclease is separated from some of the soluble components of the organism and culture medium from which it is recovered. The polypeptide may be purified (i.e., separated) by one or more of the unit operations filtration, precipitation, or chromatography.
Accordingly, the Cas nuclease may be purified such that only minor amounts of other proteins, in particular, other polypeptides, are present. The term "purified" as used herein may refer to removal of other components, particularly other proteins and most particularly other enzymes present in the cell of origin of the polypeptide. The Cas nuclease may be "substantially pure", i.e., free from other components from the organism in which it is produced, e.g., a host organism for recombinantly produced polypeptide. In one aspect, the polypeptide is at least 40% pure by weight of the total polypeptide material present in the preparation. In one aspect, the polypeptide is at least 50%, 60%, 70%, 80% or 90% pure by weight of the total polypeptide material present in the preparation. As used herein a "substantially pure polypeptide" may denote a Cas nuclease preparation that contains at most 10%, preferably at most 8%, more preferably at most 6%, more preferably at most 5%, more preferably at most 4%, more preferably at most 3%, even more preferably at most 2%, most preferably at most 1 %, and even most preferably at most 0.5% by weight of other polypeptide material with which the Cas nuclease is natively or recombinantly associated.
It is, therefore, preferred that the substantially pure Cas nuclease or fusion polypeptide is at least 92% pure, preferably at least 94% pure, more preferably at least 95% pure, more preferably at least 96% pure, more preferably at least 97% pure, more preferably at least 98% pure, even more preferably at least 99% pure, most preferably at least 99.5% pure by weight of the total polypeptide material present in the preparation. The polypeptide of the present invention is preferably in a substantially pure form (i.e., the preparation is essentially free of other polypeptide material with which it is natively or recombinantly associated). This can be accomplished, for example, by preparing the polypeptide by well-known recombinant methods or by classical purification methods.
Recombinant: The term "recombinant" is used in its conventional meaning to refer to the manipulation, e.g., cutting and rejoining, of nucleic acid sequences to form constellations different from those found in nature. The term recombinant refers to a cell, nucleic acid, polypeptide or vector that has been modified from its native state. Thus, for example, recombinant cells express genes that are not found within the native (non-recombinant) form of the cell, or express native genes at different levels or under different conditions than found in nature. The term “recombinant” is synonymous with “genetically modified” and “transgenic”.
Recover: The terms "recover" or “recovery” means the removal of a polypeptide from at least one fermentation broth component selected from the list of a cell, a nucleic acid, or other specified material, e.g., recovery of the polypeptide from the whole fermentation broth, or from the cell-free fermentation broth, by polypeptide crystal harvest, by filtration, e.g., depth filtration (by use of filter aids or packed filter medias, cloth filtration in chamber filters, rotary-drum filtration, drum filtration, rotary vacuum-drum filters, candle filters, horizontal leaf filters or similar, using sheet or pad filtration in framed or modular setups) or membrane filtration (using sheet filtration, module filtration, candle filtration, microfiltration, ultrafiltration in either cross flow, dynamic cross
flow or dead end operation), or by centrifugation (using decanter centrifuges, disc stack centrifuges, hydro cyclones or similar), or by precipitating the polypeptide and using relevant solid-liquid separation methods to harvest the polypeptide from the broth media by use of classification separation by particle sizes. Recovery encompasses isolation and/or purification of the polypeptide.
Reverse transcriptase: The term "reverse transcriptase" or “RT” describes a class of polymerases characterized as RNA-dependent DNA polymerases. All known reverse transcriptases require a primer to synthesize a DNA transcript from an RNA template. Historically, reverse transcriptase has been used primarily to transcribe mRNA into cDNA which can then be cloned into a vector for further manipulation. Avian myeloblastosis virus (AMV) reverse transcriptase was the first widely used RNA-dependent DNA polymerase (Verma, Biochim. Biophys. Acta 473:1 (1977)). The enzyme has 5'-3' RNA-directed DNA polymerase activity, 5'-3' DNA-directed DNA polymerase activity, and RNase H activity. RNase H is a processive 5' and 3' ribonuclease specific for the RNA strand of RNA-DNA hybrids (Perbal, A Practical Guide to Molecular Cloning, New York: Wiley & Sons (1984)). Errors in transcription cannot be corrected by reverse transcriptase because known viral reverse transcriptases lack the 3'-5' exonuclease activity necessary for proofreading (Saunders and Saunders, Microbial Genetics Applied to Biotechnology, London: Croom Helm (1987)). A detailed study of the activity of AMV reverse transcriptase and its associated RNase H activity has been presented by Berger et al., Biochemistry 22:2365-2372 (1983). Another reverse transcriptase which is used extensively in molecular biology is reverse transcriptase originating from Moloney murine leukemia virus (M-MLV). See, e.g., Gerard, G. R., DNA 5:271-279 (1986) and Kotewicz, M. L., et al., Gene 35:249-258 (1985). M-MLV reverse transcriptase substantially lacking in RNase H activity has also been described. See, e.g., U.S. Pat. No. 5,244,797.
Reverse transcriptase, when fused with a Cas nuclease, offers a versatile and precise approach to gene editing. It allows for the conversion of RNA targets into DNA, enhancing the specificity and accuracy of Cas-mediated gene editing while enabling simultaneous manipulation of both DNA and RNA molecules for a wide range of applications in genetics and molecular biology.
The disclosure contemplates any wild-type RT obtained from any naturally occurring organism or virus, or obtained from a commercial or non-commercial source. In addition, the reverse transcriptases usable in the fusion polypeptides of the disclosure can include any naturally occurring mutant RT, engineered mutant RT, or other variant RT, including truncated variants that retain function. The RTs may also be engineered to contain specific amino acid substitutions, such as those specifically disclosed herein.
RTs are multi-functional enzymes typically with three enzymatic activities including RNA- and DNA-dependent DNA polymerization activity, and an RNaseH activity that catalyzes the cleavage of RNA in RNA-DNA hybrids. Some mutants of RTs have disabled the RNaseH moiety
to prevent unintended damage to the mRNA. These enzymes that synthesize complementary DNA (cDNA) using mRNA as a template were first identified in RNA viruses.
Exemplary enzymes for use with the herein disclosed fusion polypeptide can include, but are not limited to, M-MLV reverse transcriptase and RSV reverse transcriptase. Enzymes having RT activity are commercially available. Some exemplary reverse transcriptases that can be fused to CRISPR nucleases or provided as individual proteins according to various embodiments of this disclosure are provided below.
A person of ordinary skill in the art will recognize that wild-type RTs, including but not limited to Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Human Immunodeficiency Virus (HIV) reverse transcriptase, and Avian Sarcoma-Leukosis Virus (ASLV) reverse transcriptase, which includes but is not limited to Rous Sarcoma Virus (RSV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, Avian Erythroblastosis Virus (AEV) Helper Virus MCAV reverse transcriptase, Avian Myelocytomatosis Virus MC29 Helper Virus MCAV reverse transcriptase, Avian Reticuloendotheliosis Virus (REV-T) Helper Virus REV-A reverse transcriptase, Avian Sarcoma Virus UR2 Helper Virus UR2AV reverse transcriptase, Avian Sarcoma Virus Y73 Helper Virus YAV reverse transcriptase, Rous Associated Virus (RAV) reverse transcriptase, and Myeloblastosis Associated Virus (MAV) reverse transcriptase, may be suitably used in the subject methods and compositions described herein. In some embodiments, the RT may be any RT described in WO 2020/191248, the contents of which are herein incorporated by reference.
In some embodiments, a suitable reverse transcriptase may be any reverse transcriptase described in WO2020191233, WO2020191243, WO2020191246, WO2020191245, WO2020191234, WO2020191241, US20200085066, US20200109398, WO2020191239, and WO2020191248, the contents of each of which are incorporated herein by reference in their entirety.
RuvC sequence: In the context of CRISPR-Cas systems, the RuvC sequence comprises or consists of a "RuvC" or "RuvC-like" domain. The domain is named for its similarity to the bacterial RuvC Holliday junction resolvase and is responsible for cleaving the DNA strand opposite to the DNA strand which is complementary to the RNA guide strand. Together with the HNH domain, the RuvC domain(s) creates a double-strand break in the target DNA, allowing for gene editing or modification. In some embodiments, the Cas nuclease comprises 3 RuvC domains, e.g., a RuvC I, a RuvC II and a RuvC III domain.
Sequence identity: The relatedness between two amino acid sequences or between two nucleotide sequences is described by the parameter “sequence identity”.
For purposes of the present invention, the sequence identity between two amino acid sequences is determined as the output of “longest identity” using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open
Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), preferably version 6.6.0 or later. The parameters used are a gap open penalty of 10, a gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix. In order for the Needle program to report the longest identity, the nobrief option must be specified in the command line. The output of Needle labelled “longest identity” is calculated as follows:
(Identical Residues x 100)/(Length of Alignment - Total Number of Gaps in Alignment)
For purposes of the present invention, the sequence identity between two polynucleotide sequences is determined as the output of “longest identity” using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, supra) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, supra), preferably version 6.6.0 or later. The parameters used are a gap open penalty of 10, a gap extension penalty of 0.5, and the EDNAFULL (EMBOSS version of NCBI NUC4.4) substitution matrix. In order for the Needle program to report the longest identity, the nobrief option must be specified in the command line. The output of Needle labelled “longest identity” is calculated as follows:
(Identical Deoxyribonucleotides x 100)/(Length of Alignment - Total Number of Gaps in Alignment)
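The “longest identity” formula given above for both amino acid and polynucleotide sequences can be illustrated with a short Python sketch that operates on an alignment already produced by a Needleman-Wunsch implementation. The function name and the example alignment are hypothetical, and the sketch assumes that the "Total Number of Gaps in Alignment" is the number of alignment columns containing a gap character in either sequence; it does not perform the alignment itself and does not reproduce the Needle program.

```python
# Illustrative sketch only: the function name and example are hypothetical.
def longest_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the non-gap columns of a pairwise alignment.

    Implements (identical residues x 100) / (alignment length - gap columns)
    for two equal-length aligned strings in which `gap` marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))
    # Assumption: a "gap" is any column with a gap character in either row.
    gap_columns = sum(a == gap or b == gap for a, b in columns)
    identical = sum(a == b and a != gap for a, b in columns)
    compared = len(columns) - gap_columns
    if compared == 0:
        raise ValueError("alignment contains no ungapped columns")
    return identical * 100 / compared
```

For the hypothetical alignment of `MKT-LLV` against `MKTALIV`, five of the six ungapped columns are identical, so the function returns 500/6, i.e., about 83.3%.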
Signal Peptide: A "signal peptide" is a sequence of amino acids attached to the N-terminal portion of a protein, which facilitates the secretion of the protein outside the cell. The mature form of an extracellular protein lacks the signal peptide, which is cleaved off during the secretion process.
Subsequence: The term “subsequence” means a polynucleotide having one or more nucleotides absent from the 5' and/or 3' end of a mature polypeptide-coding sequence, wherein the subsequence encodes a fragment having nuclease activity.
tracrRNA: Trans-activating CRISPR RNA (tracrRNA) is a class of RNA molecules forming an integral component of the CRISPR-Cas system. The tracrRNA serves as a scaffold or scaffold-like molecule that facilitates the binding of Cas nucleases to the CRISPR RNA (crRNA) molecule. In this complex, tracrRNA interacts with the Cas nuclease to form a ribonucleoprotein complex that recognizes and binds to the target DNA or RNA sequence, guiding the Cas nuclease to the precise site for cleavage or editing. Non-limiting examples of tracrRNA coding sequences are listed in column 3 of Table 4.
Variant: The term “variant” means a Cas nuclease having nuclease and/or DNA-binding activity comprising a man-made mutation, i.e., a substitution, insertion (including extension), and/or deletion (e.g., truncation), at one or more positions. A substitution means replacement of the amino acid occupying a position with a different amino acid; a deletion means removal of the amino acid occupying a position; and an insertion means adding amino acids (e.g., 1-5 amino acids, 1-3 amino acids, or, in particular, 1 amino acid) adjacent to and immediately following the amino acid occupying a position.
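The three kinds of alteration defined above can be illustrated on a toy sequence. The following Python sketch is illustrative only; the function names and the example peptide are hypothetical, and positions are treated as 1-based, which is the usual convention for amino acid numbering but is an assumption here.

```python
# Illustrative sketch only: names and the example peptide are hypothetical.
def _check_position(seq: str, position: int) -> None:
    """Validate a 1-based position within the sequence."""
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} outside 1..{len(seq)}")


def substitute(seq: str, position: int, new_residue: str) -> str:
    """Replace the amino acid occupying `position` with a different one."""
    _check_position(seq, position)
    return seq[:position - 1] + new_residue + seq[position:]


def delete(seq: str, position: int) -> str:
    """Remove the amino acid occupying `position`."""
    _check_position(seq, position)
    return seq[:position - 1] + seq[position:]


def insert_after(seq: str, position: int, residues: str) -> str:
    """Add residues adjacent to and immediately following `position`."""
    _check_position(seq, position)
    return seq[:position] + residues + seq[position:]
```

Applied to the toy peptide `MKTAYIAK`, a substitution at position 4 with G gives `MKTGYIAK`, a deletion at position 4 gives `MKTYIAK`, and inserting `GS` after position 4 gives `MKTAGSYIAK`.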
Wild-type: The term "wild-type" in reference to an amino acid sequence or nucleic acid sequence means that the amino acid sequence or nucleic acid sequence is a native or naturally occurring sequence. As used herein, the term "naturally occurring" refers to anything (e.g., proteins, amino acids, or nucleic acid sequences) that is found in nature. Conversely, the term "non-naturally occurring" refers to anything that is not found in nature (e.g., compositions produced in the laboratory or during manufacturing, and/or recombinant nucleic acids and protein sequences produced in the laboratory or by modification of the wild-type sequence). In embodiments of the present invention, an engineered Cas nuclease is a variant Cas nuclease comprising at least one amino acid modification (e.g., substitution, deletion, and/or insertion) compared to any of the Cas nucleases indicated in column 1 of Table 4.
Detailed Description of the Invention
Cas nucleases
In a 1st aspect, the invention relates to Cas nucleases selected from the group consisting of:
(a) a polypeptide having at least 70% sequence identity to any of the amino acid sequences of SEQ ID NOs: 21, 48, 1, 40, 39, 29, 2-20, 22-28, 30-38, 41-47, or 49-52;
(b) a polypeptide encoded by a polynucleotide having at least 70% sequence identity to any of the polypeptide coding sequences of SEQ ID NOs: 73, 100, 53, 92, 91, or 81, or to any of SEQ ID NOs: 53-104, or to any one of SEQ ID NOs: 347, 349, 351, 353, 405, 416, 417, 434, 449, 465, 466, 512-520, 528, 549 or 550;
(c) a polypeptide derived from any one of SEQ ID NOs: 21, 48, 1, 40, 39, or 29, or any one of SEQ ID NOs: 1-52, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations), in particular substitutions, such as conservative amino acid substitutions;
(d) a polypeptide having a TM-score of at least 0.80 compared to the three-dimensional structure of the polypeptide of any one of SEQ ID NOs: 21, 48, 1, 40, 39, or 29, or any one of SEQ ID NOs: 1-52, wherein the three-dimensional structure is calculated using AlphaFold;
(e) a polypeptide derived from the polypeptide of (a), (b), (c), or (d), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and
(f) a fragment of the polypeptide of (a), (b), (c), (d), or (e).
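For orientation, the TM-score referred to in item (d) is conventionally defined (Zhang and Skolnick, 2004) as the sum over aligned residue pairs of 1/(1 + (d_i/d0)^2), divided by the length L of the reference polypeptide, where d_i is the distance between the i-th pair of aligned residues after superposition and d0 = 1.24 x (L - 15)^(1/3) - 1.8. The following Python sketch evaluates that expression for one fixed superposition and alignment. It is illustrative only: this application does not specify the software used to obtain the score, the function names are hypothetical, and dedicated tools additionally search for the superposition that maximizes the score and handle very short chains differently.

```python
# Illustrative sketch only: function names are hypothetical, and the sketch
# scores a single given superposition rather than searching for the optimum.
def tm_d0(reference_length: int) -> float:
    """Length-dependent distance scale d0 of the TM-score definition."""
    if reference_length <= 15:
        raise ValueError("reference too short for this simple sketch")
    d0 = 1.24 * (reference_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("reference too short for this simple sketch")
    return d0


def tm_score(distances: list[float], reference_length: int) -> float:
    """TM-score for aligned residue-pair distances (in angstroms).

    `distances` holds one distance per aligned residue pair; unaligned
    reference residues contribute zero, so the sum is normalized by the
    full reference length.
    """
    if len(distances) > reference_length:
        raise ValueError("more aligned pairs than reference residues")
    d0 = tm_d0(reference_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / reference_length


def meets_threshold(distances: list[float], reference_length: int,
                    threshold: float = 0.80) -> bool:
    """Check the 'at least 0.80' criterion, with a small float tolerance."""
    return tm_score(distances, reference_length) >= threshold - 1e-12
```

Under this definition, a candidate that superimposes perfectly on 480 residues of a 600-residue reference and leaves the remainder unaligned scores 0.80.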
In one embodiment, the Cas nuclease is selected from the group consisting of:
(a) a polypeptide having at least 70%, e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the amino acid sequence
of SEQ ID NO: 1 , SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11 , SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21 , SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31 , SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41 , SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51 , or SEQ ID NO: 52;
(b) a polypeptide encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polypeptide coding sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61 , SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71 , SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81 , SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID NO: 86, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID NO: 90, SEQ ID NO: 91 , SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101 , SEQ ID NO: 102, SEQ ID NO: 103, SEQ ID NO: 104, or any one of SEQ ID NOs: 347, 349, 351 , 353, 405, 416, 417, 434, 449, 465, 466, 512-520,528, 549 or 550;
(c) a polypeptide derived from SEQ ID NO: 1 , SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11 , SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21 , SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31 , SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41 , SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51 , or SEQ ID NO: 52, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions, such as conservative amino acid substitutions;
(d) a polypeptide having a TM-score of at least 0.80, e.g., at least 0.85, at least 0.90, at least 0.91, at least 0.92, at least 0.93, at least 0.94, at least 0.95, at least 0.96, at least 0.97, at least 0.98, at least 0.99, or even 1.0, compared to the three-dimensional structure of the polypeptide of SEQ ID NO: 1, or of any one of SEQ ID NOs: 1-52, wherein the three-dimensional structure is calculated using AlphaFold;
(e) a polypeptide derived from the polypeptide of (a), (b), (c), or (d), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and
(f) a fragment of the polypeptide of (a), (b), (c), (d) or (e).
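For orientation, the TM-score referred to in item (d) is commonly defined as shown below (Zhang and Skolnick, 2004, Proteins 57: 702-710). The symbols are not defined in the present text and follow the usual convention, which is assumed here: L_ref is the length of the reference structure, L_ali is the number of aligned residue pairs, d_i is the distance between the i-th pair of aligned residues after superposition, and d_0 is a length-dependent normalization distance. A score of 1.0 indicates identical structures.

```latex
\mathrm{TM\text{-}score} \;=\; \max\left[\frac{1}{L_{\mathrm{ref}}}\sum_{i=1}^{L_{\mathrm{ali}}}\frac{1}{1+\left(\dfrac{d_i}{d_0(L_{\mathrm{ref}})}\right)^{2}}\right],
\qquad
d_0(L_{\mathrm{ref}}) \;=\; 1.24\,\sqrt[3]{L_{\mathrm{ref}}-15}\;-\;1.8
```

The maximum is taken over all superpositions of the two structures.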
In one embodiment, the nuclease has nuclease activity.
In one embodiment, the nuclease has DNA-binding activity.
In one embodiment, the nuclease has nuclease activity and DNA-binding activity.
In one embodiment, the nuclease comprises or consists of an amino acid sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NO: 1, SEQ ID NO: 21, SEQ ID NO: 40, SEQ ID NO: 39, SEQ ID NO: 48, or SEQ ID NO: 29.
In one embodiment, the nuclease comprises, consists essentially of, or consists of SEQ ID NO: 1, SEQ ID NO: 21, SEQ ID NO: 40, SEQ ID NO: 39, SEQ ID NO: 48, or SEQ ID NO: 29.
In one embodiment, the nuclease is a fragment of SEQ ID NO: 1 , SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11 , SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21 , SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31 , SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41 , SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51 , or SEQ ID NO: 52, wherein the fragment preferably contains at least 600 amino acid residues (e.g., amino acids 9 to 640 of SEQ ID NO: 1 , amino acids 13 to 628 of SEQ ID NO: 39, amino acids 16 to 637 of SEQ ID NO: 40, amino acids 10 to 637 of SEQ ID NO: 41 , amino acids 10 to 639 of SEQ ID NO: 42, amino acids 10 to 636 of SEQ ID NO: 43, amino acids 10 to 635 of SEQ ID NO: 44, amino acids 9 to 640 of SEQ ID NO: 45, amino acids 10 to 637 of SEQ ID NO: 46, amino acids 10 to 633 of SEQ ID NO: 47, amino acids 12 to 632 of SEQ ID NO: 48, amino acids 8 to 620 of SEQ ID NO: 21 , amino acids 9 to 640 of SEQ ID NO: 51 , or amino acids 9 to 640 of SEQ ID NO: 52).
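The residue ranges in the preceding embodiment are read here as 1-based and inclusive, which is an assumption about the numbering convention. Under that reading, the following minimal sketch checks that each quoted range spans at least 600 residues and shows how such a fragment could be sliced from a sequence string. Only the ranges are taken from the text; the function and variable names are hypothetical.

```python
# Residue ranges quoted in the embodiment above, keyed by SEQ ID NO.
# Assumption: positions are 1-based and inclusive.
FRAGMENT_RANGES = {
    1: (9, 640), 39: (13, 628), 40: (16, 637), 41: (10, 637),
    42: (10, 639), 43: (10, 636), 44: (10, 635), 45: (9, 640),
    46: (10, 637), 47: (10, 633), 48: (12, 632), 21: (8, 620),
    51: (9, 640), 52: (9, 640),
}


def fragment_length(start: int, end: int) -> int:
    """Number of residues in a 1-based, inclusive range."""
    return end - start + 1


def extract_fragment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range lies outside the sequence")
    return sequence[start - 1:end]


# Length of every quoted fragment, e.g. SEQ ID NO: 1 -> 640 - 9 + 1 residues.
FRAGMENT_LENGTHS = {
    seq_id: fragment_length(start, end)
    for seq_id, (start, end) in FRAGMENT_RANGES.items()
}
```

With these ranges, every listed fragment satisfies the stated preference of at least 600 amino acid residues.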
In one embodiment, the nuclease comprises, consists essentially of, or consists of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, or SEQ ID NO: 52.
In one embodiment, the nuclease is encoded by a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the mature polypeptide coding sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61 , SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71 , SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81 , SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID NO: 86, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID NO: 90, SEQ ID NO: 91 , SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101 , SEQ ID NO: 102, SEQ ID NO: 103, or SEQ ID NO: 104, or to any one of SEQ ID NOs: 347, 349, 351 , 353, 405, 416, 417, 434, 449, 465, 466, 512- 520,528, 549 or 550.
In one embodiment, the nuclease comprises an N-terminal extension and/or C-terminal extension of 1-10 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids, preferably an extension of 1-10 amino acid residues at the N-terminus and/or 1-10 amino acids at the C-terminus, such as 1-5 amino acids.
In one embodiment, the nuclease has at most 10%, at most 9%, at most 8%, at most 7%, at most 6%, at most 5%, at most 4%, at most 3%, at most 2% or at most 1% sequence differences to the polypeptide of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, or SEQ ID NO: 52.
In one embodiment, the nuclease differs from the polypeptide of SEQ ID NO: 1 , SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11 , SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21 , SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31 , SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41 , SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51 , or SEQ ID NO: 52 by at most 15 amino acids, such as at most 1 , 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 , 12, 13, 14 or 15 amino acids.
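As an illustration of the "at most N% sequence differences" and "at most 15 amino acids" criteria, the sketch below counts differing columns between two sequences that have already been aligned to equal length, with "-" marking gaps. This is not the sequence identity method of the definition section; the gap handling (each gap column counts as one difference) and the normalization by the ungapped reference length are assumptions made only for this example, and all names are hypothetical.

```python
def count_differences(reference: str, candidate: str) -> int:
    """Count alignment columns where two pre-aligned sequences differ.

    Assumption: both strings have equal length and use '-' for gaps;
    a gap opposite a residue counts as one difference.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(1 for r, c in zip(reference, candidate) if r != c)


def percent_difference(reference: str, candidate: str) -> float:
    """Differences as a percentage of the ungapped reference length."""
    reference_length = sum(1 for r in reference if r != "-")
    if reference_length == 0:
        raise ValueError("reference contains no residues")
    return 100.0 * count_differences(reference, candidate) / reference_length


def differs_by_at_most(reference: str, candidate: str, limit: int = 15) -> bool:
    """True if the candidate differs from the reference at no more than `limit` columns."""
    return count_differences(reference, candidate) <= limit
```

For example, a candidate that differs from a 640-residue reference at 15 aligned positions would show about 2.3% sequence difference under these assumptions.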
In one embodiment, the nuclease comprises one or more functional RuvC domains.
In one embodiment, the nuclease comprises two or more functional RuvC domains.
In one embodiment, the nuclease comprises three or more functional RuvC domains.
In one embodiment, the nuclease comprises one or more functional HNH domains.
In one embodiment, the nuclease comprises one or more domains selected from the group consisting of:
(a) a RuvC domain having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs: 105 - 143 or 313 - 318;
(b) a HNH domain having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs: 144 - 156 or 319-320;
(c) a RuvC domain derived from any one of SEQ ID NOs: 105 - 143 or 313 - 318 by substitution, deletion or addition of one or several amino acids of SEQ ID NOs: 105 - 143 or 313 - 318;
(d) a HNH domain derived from any one of SEQ ID NOs: 144 - 156 or 319-320 by substitution, deletion or addition of one or several amino acids of SEQ ID NOs: 144 - 156 or 319- 320; and
(e) a fragment of the catalytic domain of (a), (b), (c), or (d); preferably wherein the nuclease has nuclease activity, or wherein the nuclease has nickase activity.
In one embodiment, the HNH domain has at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs: 144 - 156 or 319-320.
In one embodiment, the HNH domain comprises or consists of an amino acid sequence having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NO: 144, 146, 145, 154, 319, or 320.
In one embodiment, the HNH domain is a variant of any one of SEQ ID NOs: 144 - 156 or 319-320 comprising a substitution, such as a conservative amino acid substitution, a deletion, and/or an insertion at one or more positions.
In one embodiment, the HNH domain differs from any one of SEQ ID NOs: 144 - 156 or 319-320 by at most 15 amino acids, such as at most 1 , 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 , 12, 13, 14 or 15 amino acids.
In one embodiment, the HNH domain is a fragment of any one of SEQ ID NOs: 144 - 156 or 319-320, wherein the fragment preferably contains at least 20 amino acid residues (e.g., amino acids 613 to 640 of SEQ ID NO: 1), or at least 27 amino acid residues (e.g., amino acids 613 to 640 of SEQ ID NO: 1).
In one embodiment, the HNH domain comprises, consists essentially of, or consists of any one of SEQ ID NOs: 144 - 156 or 319-320.
In one embodiment, the RuvC domain has at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs: 105 - 143 or 313 - 318.
In one embodiment, the RuvC domain comprises or consists of an amino acid sequence having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the amino acid sequence of SEQ ID NOs: 105-107, 111-113, 108-110, 135-137, or 313 - 318.
In one embodiment, the RuvC domain is a variant of any one of SEQ ID NOs: 105 - 143 or 313 - 318 comprising a substitution, such as a conservative amino acid substitution, a deletion, and/or an insertion at one or more positions.
In one embodiment, the RuvC domain differs from any one of SEQ ID NOs: 105 - 143 or 313 - 318 by at most 15 amino acids, such as at most 1 , 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 , 12, 13, 14 or 15 amino acids.
In one embodiment, the RuvC domain is a fragment of any one of SEQ ID NOs: 105 - 143 or 313 - 318, wherein the fragment preferably contains at least 10 amino acid residues (e.g., amino acids 5 to 20 of SEQ ID NO: 1).
In one embodiment, the RuvC domain comprises, consists essentially of, or consists of any one of SEQ ID NOs: 105 - 143 or 313 - 318.
In one embodiment, the nuclease has double-strand break activity towards a DNA target site.
In one embodiment, sequence identity is determined by the method described in the definition section under “Sequence Identity”.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a eukaryotic cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a mammalian cell, e.g., a non-human mammalian cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in an E. coli cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in an E. coli cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 512-520.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus subtilis cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus licheniformis cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus licheniformis cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 405, 416, 417, 434, 449, 465, or 466.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus subtilis cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 528, 549 or 550.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a Lb. paracasei (Lacticaseibacillus paracasei or Lactobacillus paracasei) cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a S. thermophilus cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a filamentous fungal cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in an Aspergillus niger cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in an Aspergillus niger cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 347, 349, 351, or 353.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a P. pastoris cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in an Aspergillus oryzae cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a Trichoderma reesei cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a Lactobacillus cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a probiotic cell.
In one embodiment, the polynucleotide encoding the nuclease is codon-optimized for expression in a S. cerevisiae cell.
In one embodiment, the nuclease is a Class 2 Cas nuclease.
In one embodiment, the nuclease is a Class 2 Type II Cas nuclease.
In one embodiment, the nuclease is a Class 2 Type II-A Cas nuclease.
In one embodiment, the nuclease is a Class 2 Type II-B Cas nuclease.
In one embodiment, the nuclease is a Class 2 Type II-C Cas nuclease.
In one embodiment, the nuclease utilizes a protospacer adjacent motif (PAM) sequence provided for the nuclease in Table 1.
In one embodiment, the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “nnAY”.
In one embodiment, the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “nnGHMA”.
In one embodiment, the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “nnGTA”.
In one embodiment, the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “nnAMA”.
In one embodiment, the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “nnRHRD”.
In one embodiment, the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “ATGTCA”.
In one embodiment, the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “OGATA”.
In one embodiment, the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “TTACA”.
In one embodiment, the nuclease utilizes a protospacer adjacent motif (PAM) sequence with the sequence “TTACAA”.
In one embodiment, the nuclease is non-naturally occurring, e.g., wherein the nuclease is engineered and comprises unnatural or synthetic amino acids.
In one embodiment, the nuclease is naturally occurring.
In an aspect, the Cas nuclease is isolated.
In another aspect, the Cas nuclease is purified.
Protospacer Adjacent Motif (PAM) Sequences
Cas nucleases of the present disclosure may cleave, nick, or bind to a target nucleic acid within or near a protospacer adjacent motif (PAM) sequence of the target nucleic acid. In some embodiments, the target nucleic acid is a double-stranded nucleic acid comprising a target strand and a non-target strand. In some embodiments, cleavage occurs within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides of a 5’ or 3’ terminus of a PAM sequence. In some embodiments, effector Cas nucleases described herein recognize a PAM sequence. In some embodiments, recognizing a PAM sequence comprises interacting with a sequence adjacent to the PAM. In some embodiments, a target nucleic acid comprises a target sequence that is adjacent to a PAM sequence. In some embodiments, the Cas nuclease does not require a PAM to bind and/or cleave a target nucleic acid. Examples of identified PAM sequences are shown in Table 1.
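Degenerate PAM strings such as "nnGHMA" or "nnRHRD" appear to use the standard IUPAC nucleotide ambiguity codes (e.g., n = any base, R = A or G, Y = C or T, M = A or C, H = A, C or T, D = A, G or T). Assuming that reading, the following minimal sketch shows how such a PAM could be matched against a DNA string. The function names and the example DNA strings are hypothetical and serve only as illustration.

```python
import re

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_to_pattern(pam: str) -> str:
    """Translate an IUPAC PAM string into a regular-expression pattern."""
    parts = []
    for code in pam.upper():
        bases = IUPAC[code]  # raises KeyError for non-IUPAC symbols
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def matches_pam(pam: str, site: str) -> bool:
    """True if `site` is a concrete instance of the degenerate PAM."""
    return re.fullmatch(pam_to_pattern(pam), site.upper()) is not None


def find_pam_sites(pam: str, dna: str) -> list:
    """0-based start positions of all (possibly overlapping) PAM matches."""
    # A lookahead is used so that overlapping matches are all reported.
    pattern = re.compile("(?=(" + pam_to_pattern(pam) + "))")
    return [match.start() for match in pattern.finditer(dna.upper())]
```

The sketch scans one strand only; a search on the complementary strand would additionally require reverse-complementing the input.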
In some embodiments, a target nucleic acid is a single-stranded target nucleic acid comprising a target sequence. Accordingly, in some embodiments, the single-stranded target nucleic acid comprises a PAM sequence described herein that is adjacent (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides) or directly adjacent to the target sequence. In some embodiments, a complex comprising the Cas nuclease and a guide RNA cleaves the single-stranded target nucleic acid.
In some embodiments, a target nucleic acid is a double-stranded nucleic acid comprising a target strand and a non-target strand, wherein the target strand comprises a target sequence. In some embodiments, the PAM sequence is located on the target strand. In some embodiments, the PAM sequence is located on the non-target strand. In some embodiments, the PAM sequence described herein is adjacent (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides) to the target sequence on the target strand or the non-target strand. In some embodiments, such a PAM described herein is directly adjacent to the target sequence on the target strand or the non-target strand. In some embodiments, a complex comprising the Cas nuclease and a guide RNA cleaves the target strand or the non-target strand. In some embodiments, the complex cleaves both the target strand and the non-target strand. In some embodiments, the complex recognizes the PAM sequence and hybridizes to a target sequence of the target nucleic acid. In some embodiments, the complex cleaves the target nucleic acid, wherein the complex has recognized the PAM sequence and is hybridized to the target sequence.
In some embodiments, a Cas nuclease described herein, or a multimeric complex thereof, recognizes a PAM on a target nucleic acid. In some embodiments, multiple Cas nucleases of the multimeric complex recognize a PAM on a target nucleic acid. In some embodiments, at least two of the multiple Cas nucleases recognize the same PAM sequence. In some embodiments, at least two of the multiple Cas nucleases recognize different PAM sequences. In some embodiments, only one Cas nuclease of the multimeric complex recognizes a PAM on a target nucleic acid.
A Cas nuclease of the present disclosure, or a multimeric complex thereof, may cleave or nick a target nucleic acid within or near a protospacer adjacent motif (PAM) sequence of the target nucleic acid. In some embodiments, cleavage occurs within 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides of a 5’ or 3’ terminus of a PAM sequence.
In some embodiments, compositions and methods described herein do not comprise a PAM sequence. In some embodiments, Cas nucleases do not recognize a PAM sequence. In some embodiments, compositions and methods described herein comprise a protospacer flanking site (PFS) sequence. A PFS sequence may be useful for the detection and/or modification of RNA.
Removal or Reduction of Cas nuclease Activity
The present invention also relates to methods of producing a variant of a Cas nuclease, comprising mutating (e.g., by deletion, insertion, or substitution) a polynucleotide encoding a Cas nuclease of the present invention, which results in a variant having reduced nuclease activity (e.g., only nickase activity) or no nuclease activity (e.g., a catalytically inactive Cas nuclease).
The variant may be constructed by mutation of the polynucleotide using methods well known in the art, for example, one or more nucleotide insertions, one or more nucleotide replacements, or one or more nucleotide deletions.
In one embodiment, the nuclease comprises an amino acid substitution, insertion, or deletion in one or more RuvC domains.
In one embodiment, the nuclease comprises an amino acid substitution, insertion, or deletion in one or more HNH domains.
In one embodiment, the nuclease has a single-stranded break activity towards a DNA target site.
In one embodiment, the nuclease is a catalytically dead nuclease, e.g., due to inactivation/mutation of at least one RuvC domain and at least one HNH domain.
In one embodiment, the catalytically dead nuclease comprises one or more inactivated RuvC domains and one or more inactivated HNH domains.
In one embodiment, the catalytically dead nuclease comprising one or more inactivated RuvC domains and one or more inactivated HNH domains is created by one or more amino acid substitutions, deletions or insertions at the positions provided for the nuclease in column 3 of Table 2 or column 3 of Table 3, respectively.
In one embodiment, the RuvC domain of SEQ ID NO:1 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D9 corresponding to position 9 of SEQ ID NO: 1.
In one embodiment, the RuvC domain of SEQ ID NO:21 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D8 corresponding to position 8 of SEQ ID NO: 21.
In one embodiment, the RuvC domain of SEQ ID NO:40 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D16 corresponding to position 16 of SEQ ID NO: 40.
In one embodiment, the RuvC domain of SEQ ID NO:39 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D13 corresponding to position 13 of SEQ ID NO: 39.
In one embodiment, the RuvC domain of SEQ ID NO:48 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D12 corresponding to position 12 of SEQ ID NO: 48.
In one embodiment, the HNH domain of SEQ ID NO:1 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D613 corresponding to position 613 of SEQ ID NO: 1.
In one embodiment, the HNH domain of SEQ ID NO:1 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid H614 corresponding to position 614 of SEQ ID NO: 1.
In one embodiment, the HNH domain of SEQ ID NO:1 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid N637 corresponding to position 637 of SEQ ID NO: 1.
In one embodiment, the HNH domain of SEQ ID NO:1 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid K640 corresponding to position 640 of SEQ ID NO: 1.
In one embodiment, the HNH domain of SEQ ID NO:21 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D593 corresponding to position 593 of SEQ ID NO: 21.
In one embodiment, the HNH domain of SEQ ID NO:21 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid H594 corresponding to position 594 of SEQ ID NO: 21.
In one embodiment, the HNH domain of SEQ ID NO:21 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid N617 corresponding to position 617 of SEQ ID NO: 21.
In one embodiment, the HNH domain of SEQ ID NO:21 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid K620 corresponding to position 620 of SEQ ID NO: 21.
In one embodiment, the HNH domain of SEQ ID NO:40 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D610 corresponding to position 610 of SEQ ID NO: 40.
In one embodiment, the HNH domain of SEQ ID NO:40 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid H611 corresponding to position 611 of SEQ ID NO: 40.
In one embodiment, the HNH domain of SEQ ID NO:40 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid N634 corresponding to position 634 of SEQ ID NO: 40.
In one embodiment, the HNH domain of SEQ ID NO:40 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid K637 corresponding to position 637 of SEQ ID NO: 40.
In one embodiment, the HNH domain of SEQ ID NO:39 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D601 corresponding to position 601 of SEQ ID NO: 39.
In one embodiment, the HNH domain of SEQ ID NO:39 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid H602 corresponding to position 602 of SEQ ID NO: 39.
In one embodiment, the HNH domain of SEQ ID NO:39 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid N625 corresponding to position 625 of SEQ ID NO: 39.
In one embodiment, the HNH domain of SEQ ID NO:39 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid K628 corresponding to position 628 of SEQ ID NO: 39.
In one embodiment, the HNH domain of SEQ ID NO:48 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid D605 corresponding to position 605 of SEQ ID NO: 48.
In one embodiment, the HNH domain of SEQ ID NO:48 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid H606 corresponding to position 606 of SEQ ID NO: 48.
In one embodiment, the HNH domain of SEQ ID NO:48 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid N629 corresponding to position 629 of SEQ ID NO: 48.
In one embodiment, the HNH domain of SEQ ID NO:48 is inactivated by mutation, e.g. deletion, insertion or substitution, of amino acid K632 corresponding to position 632 of SEQ ID NO: 48.
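The position-specific embodiments above name a residue and its position (e.g., D9 of SEQ ID NO: 1) but do not prescribe the replacement residue. The minimal sketch below illustrates one way a substitution at such a position could be applied to a sequence string, checking first that the expected residue is actually present. The choice of alanine as the replacement, the "D9A" label format, the short toy sequence, and all function names are assumptions made for illustration; the toy sequence is not any SEQ ID NO of the disclosure.

```python
import re


def parse_substitution(label: str):
    """Split a label such as 'D9A' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError("expected a label like 'D9A'")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def apply_substitution(sequence: str, label: str) -> str:
    """Apply a single substitution using 1-based residue numbering.

    Raises ValueError if the position is out of range or the residue
    found at that position is not the one named in the label.
    """
    original, position, replacement = parse_substitution(label)
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    found = sequence[position - 1]
    if found != original:
        raise ValueError(f"expected {original} at position {position}, found {found}")
    return sequence[:position - 1] + replacement + sequence[position:]


# Hypothetical toy sequence with an aspartate at position 9.
TOY_SEQUENCE = "MKKTYSIGDLIGTNS"
TOY_VARIANT = apply_substitution(TOY_SEQUENCE, "D9A")
```

Verifying the original residue before substituting guards against off-by-one numbering errors, which matter when the same catalytic residue sits at different positions in different sequences (e.g., D9, D8, D16, D13 and D12 above).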
In one aspect, the polypeptide is derived from any one of SEQ ID NOs: 1-52 by substitution, deletion or addition of one or several amino acids. In some embodiments, the polypeptide is a variant of any one of SEQ ID NOs: 1-52 comprising a substitution, deletion, and/or insertion at one or more positions. In one aspect, the number of amino acid substitutions, deletions and/or insertions introduced into the polypeptide of SEQ ID NOs: 1-52 is up to 15, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. The amino acid changes may be of a minor nature, that is, conservative amino acid substitutions or insertions that do not significantly affect the folding of the Cas nuclease; small deletions, typically of 1-30 amino acids; small amino- or carboxyl-terminal extensions, such as an amino-terminal methionine residue; a small linker peptide of up to 20-25 residues; or a small extension that facilitates purification by changing net charge or another function, such as a polyhistidine tract, an antigenic epitope or a binding module.
In one embodiment, the nuclease is a nickase having one or more inactivated RuvC domains created by an amino acid substitution, insertion, or deletion at a position provided for the nuclease in column 3 of Table 2.
In some embodiments, the RuvC domain is derived from an amino acid sequence provided for in column 2 of Table 2, and/or at one or more positions provided for in column 3 of Table 2, by substitution, deletion or addition of one or several amino acids. In one aspect, the number of amino acid substitutions, deletions and/or insertions introduced into the RuvC domain is up to 15, e.g., 1 , 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 , 12, 13, 14, or 15. The amino acid changes may be of a minor nature, that is conservative amino acid substitutions or insertions that do not significantly affect the folding of the protein; small deletions, typically of 1-30 amino acids; small amino- or carboxyl-terminal extensions, such as an amino-terminal methionine residue; a small linker peptide of up to 20-25 residues; or a small extension that facilitates purification by changing net charge or another function, such as a poly-histidine tag, an antigenic epitope or a binding module.
Table 2. RuvC domains of the novel Cas nucleases.
In one embodiment, the nuclease is a nickase having one or more inactivated HNH domains created by an amino acid substitution, insertion or deletion at a position provided for the nuclease in column 3 of Table 3.
Table 3. HNH domains of the novel Cas nucleases
In some embodiments, the HNH domain is derived from an amino acid sequence provided in column 2 of Table 3, and/or at one or more positions provided in column 3 of Table 3, by substitution, deletion or addition of one or several amino acids. In one aspect, the number of amino acid substitutions, deletions and/or insertions introduced into the HNH domain is up to 15, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. The amino acid changes may be of a minor nature, that is, conservative amino acid substitutions or insertions that do not significantly affect the folding of the protein; small deletions, typically of 1-30 amino acids; small amino- or carboxyl-terminal extensions, such as an amino-terminal methionine residue; a small linker peptide of up to 20-25 residues; or a small extension that facilitates purification by changing net charge or another function, such as a poly-histidine tract, an antigenic epitope or a binding module.
Essential amino acids in a polypeptide can be identified according to procedures known in the art, such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham and Wells, 1989, Science 244: 1081-1085). In the latter technique, single alanine mutations are introduced at each residue in the molecule, and the resultant molecules are tested for nuclease activity to identify amino acid residues that are critical to the activity of the molecule. See also, Hilton et al., 1996, J. Biol. Chem. 271: 4699-4708. The active site of the enzyme or other biological interaction can also be determined by physical analysis of structure, as determined by such techniques as nuclear magnetic resonance, crystallography, electron diffraction, or photoaffinity labeling, in conjunction with mutation of putative contact site amino acids. See, for example, de Vos et al., 1992, Science 255: 306-312; Smith et al., 1992, J. Mol. Biol. 224: 899-904; Wlodawer et al., 1992, FEBS Lett. 309: 59-64. The identity of essential amino acids can also be inferred from an alignment with a related polypeptide, and/or be inferred from sequence homology and conserved catalytic machinery with a related polypeptide or within a polypeptide or protein family with polypeptides/proteins descending from a common ancestor, typically having similar three-dimensional structures, functions, and significant sequence similarity. Additionally, or alternatively, protein structure prediction tools can be used for protein structure modelling to identify essential amino acids and/or active sites of polypeptides. See, for example, Jumper et al., 2021, “Highly accurate protein structure prediction with AlphaFold”, Nature 596: 583-589.
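As an illustration of the alanine-scanning step described above, the following Python sketch enumerates every single alanine substitution of a short sequence. The function name `alanine_scan` and the five-residue toy sequence are hypothetical and are not taken from the sequence listing; the activity assay that would follow each substitution is not modelled.

```python
from typing import Iterator, Tuple


def alanine_scan(sequence: str) -> Iterator[Tuple[str, str]]:
    """Yield (mutation label, mutant sequence) for each single alanine substitution.

    Positions that already hold alanine are skipped, because substituting
    them would reproduce the parent sequence.
    """
    for index, residue in enumerate(sequence):
        if residue == "A":
            continue
        # Labels use 1-based numbering, e.g. "D2A" for Asp at position 2.
        label = f"{residue}{index + 1}A"
        yield label, sequence[:index] + "A" + sequence[index + 1:]


# Hypothetical toy sequence, used only to show the shape of the output.
toy_sequence = "MDKHA"
for label, mutant in alanine_scan(toy_sequence):
    print(label, mutant)
```

Each variant produced this way would then be expressed and tested for nuclease activity, and positions where activity is lost would be candidate essential residues.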
Single or multiple amino acid substitutions, deletions, and/or insertions can be made and tested using known methods of mutagenesis, recombination, and/or shuffling, followed by a relevant screening procedure, such as those disclosed by Reidhaar-Olson and Sauer, 1988, Science 241: 53-57; Bowie and Sauer, 1989, Proc. Natl. Acad. Sci. USA 86: 2152-2156; WO 95/17413; or WO 95/22625. Other methods that can be used include error-prone PCR, phage display (e.g., Lowman et al., 1991, Biochemistry 30: 10832-10837; US 5,223,409; WO 92/06204), and region-directed mutagenesis (Derbyshire et al., 1986, Gene 46: 145; Ner et al., 1988, DNA 7: 127).
In a 2nd aspect, the invention relates to a fusion polypeptide comprising the Cas nuclease of the 1st aspect and one or more second polypeptides.
In one embodiment, the one or more second polypeptide comprises a polypeptide that localizes to one or more subcellular organelles.
In one embodiment, the one or more second polypeptide is a nuclear localization sequence (NLS), a cell penetrating peptide, and/or an affinity tag.
In one embodiment, the fusion polypeptide comprises 1-10 or more NLS at or near the amino-terminus, 1-10 or more NLS at or near the carboxy-terminus, or a combination of 1-10 or more NLS at or near the amino-terminus and 1-10 or more NLS at or near the carboxy-terminus.
In one embodiment, the fusion polypeptide comprises 1-4 NLS.
In one embodiment, the fusion polypeptide comprises one NLS.
In one embodiment, the one or more NLS is located within the open-reading frame (ORF) of the nuclease.
In one embodiment, the one or more NLS are in tandem repeats.
In one embodiment, the fusion polypeptide comprises a first NLS and a second NLS.
In one embodiment, the fusion polypeptide comprises a linker sequence between the first NLS and the second NLS.
In one embodiment, the linker between the first NLS and the second NLS comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 amino acids.
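The NLS arrangements in the preceding embodiments can be pictured as simple concatenation at the sequence level. The sketch below is only an illustration under stated assumptions: `build_fusion` is a hypothetical helper, the Cas sequence is a made-up placeholder, the linker `GGS` is an arbitrary choice, and the NLS shown is the well-known SV40 large T antigen motif, which the embodiments above do not name.

```python
from typing import Sequence


def build_fusion(cas_sequence: str,
                 n_terminal_nls: Sequence[str] = (),
                 c_terminal_nls: Sequence[str] = (),
                 linker: str = "") -> str:
    """Concatenate NLS elements at either terminus of a Cas sequence.

    NLS elements at the same terminus are placed in tandem, separated by
    the given linker (an empty linker yields direct tandem repeats).
    """
    n_block = linker.join(n_terminal_nls)
    c_block = linker.join(c_terminal_nls)
    return n_block + cas_sequence + c_block


SV40_NLS = "PKKKRKV"      # assumption: SV40 large T antigen NLS, for illustration only
TOY_CAS = "MDKKYS"        # hypothetical placeholder, not a sequence from the listing
TOY_LINKER = "GGS"        # hypothetical three-residue linker

fusion = build_fusion(TOY_CAS, c_terminal_nls=[SV40_NLS, SV40_NLS], linker=TOY_LINKER)
print(fusion)
```

Here the first NLS and the second NLS sit at the carboxy-terminus with a three-residue linker between them, which is one of the many arrangements the embodiments cover.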
In one embodiment, the one or more second polypeptide comprises a base-editing polypeptide.
In one embodiment, the base-editing polypeptide comprises a base editor domain.
In one embodiment, the fusion polypeptide comprises a linker between the Cas nuclease and the base-editing polypeptide.
In one embodiment, the base-editing polypeptide comprises a deaminase, e.g., a cytidine deaminase, such as an APOBEC3A deaminase, or an adenosine deaminase.
In one embodiment, the one or more second polypeptide comprises a reverse transcriptase, the reverse transcriptase preferably comprising a reverse transcriptase domain.
In one embodiment, the nuclease is fused to one or more NLS of sufficient strength to drive accumulation of a CRISPR complex comprising the Cas nuclease in a detectable amount in the nucleus of a eukaryotic cell.
In one embodiment, sequence identity is determined by the method described in the definition section under “Sequence Identity”.
In an aspect, the fusion polypeptide is isolated.
In another aspect, the fusion polypeptide is purified.
Sources of Cas nucleases
A Cas nuclease of the present invention may be obtained from microorganisms of any genus. For purposes of the present invention, the term “obtained from” as used herein in connection with a given source shall mean that the polypeptide encoded by a polynucleotide is produced by the source or by a strain in which the polynucleotide of the invention has been inserted. In one aspect, the polypeptide obtained from a given source is not secreted extracellularly.
In one aspect, the Cas nuclease is obtained from bacterial cells.
In one embodiment, the Cas nuclease is a polypeptide obtained from a Streptococcus cell. In one embodiment, the Streptococcus cell is a Streptococcus equinus, Streptococcus mutans, Streptococcus sp., Streptococcus henryi DSM 19005, Streptococcus sp. CCH8-G7, Streptococcus pacificus, Streptococcus orisratti DSM 15617, Streptococcus salivarius, or Streptococcus ruminantium cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from a Bacillus cell, e.g., a Bacillus sp-63030 cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from a Turicibacter cell, e.g., a Turicibacter sp. cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from a Ureibacillus cell, e.g., a Ureibacillus thermosphaericus cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from a Lentihominibacter cell, e.g., a Lentihominibacter hominis cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from a Clostridia cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from a Ruminococcus cell, e.g., a Ruminococcus sp. cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from an Alicyclobacillus cell, e.g., an Alicyclobacillus sacchari cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from an Enterococcus cell, e.g., an Enterococcus gilvus, an Enterococcus hermanniensis, or an Enterococcus asini cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from a Companilactobacillus cell, e.g., a Companilactobacillus zhachilii, a Companilactobacillus halodurans, a Companilactobacillus keshanensis, a Companilactobacillus suantsaicola, or a Companilactobacillus hulinensis cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from a Bombilactobacillus cell, e.g., a Bombilactobacillus apium cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from a Vagococcus cell, e.g., a Vagococcus penaei cell.
In one embodiment, the Cas nuclease is a polypeptide obtained from a Lactobacillus cell. In one embodiment, the Lactobacillus cell is a Lactobacillus sp., Lactobacillus farciminis (DSM 20184), Lactobacillus farciminis, Lactobacillus murinus, Lactobacillus ruminis, Lactobacillus salivarius, Lactobacillus jensenii, Lactobacillus hamster, Lactobacillus delbrueckii, Lactobacillus johnsonii, Lactobacillus plantarum, Lactobacillus rhamnosus, or Lactobacillus gallinarum cell.
In one embodiment, the Cas nuclease is obtained from or obtainable from a Streptococcus cell, e.g., a Streptococcus equinus, Streptococcus mutans, Streptococcus sp., Streptococcus henryi DSM 19005, Streptococcus sp. CCH8-G7, Streptococcus pacificus, Streptococcus orisratti DSM 15617, Streptococcus salivarius, or Streptococcus ruminantium cell, from a Bacillus cell, e.g., a Bacillus sp-63030 cell, from a Turicibacter cell, e.g., a Turicibacter sp. cell, from a Ureibacillus cell, e.g., a Ureibacillus thermosphaericus cell, from a Lentihominibacter cell, e.g., a Lentihominibacter hominis cell, from a Clostridia cell, from a Ruminococcus cell, e.g., a Ruminococcus sp. cell, from an Alicyclobacillus cell, e.g., an Alicyclobacillus sacchari cell, from an Enterococcus cell, e.g., an Enterococcus gilvus, an Enterococcus hermanniensis, or an Enterococcus asini cell, from a Companilactobacillus cell, e.g., a Companilactobacillus zhachilii, a Companilactobacillus halodurans, a Companilactobacillus keshanensis, a Companilactobacillus suantsaicola, or a Companilactobacillus hulinensis cell, from a Bombilactobacillus cell, e.g., a Bombilactobacillus apium cell, or from a Vagococcus cell, e.g., a Vagococcus penaei cell, preferably from an Enterococcus asini cell, an Enterococcus hermanniensis cell, a Vagococcus penaei cell, or a Lentihominibacter hominis cell.
In one embodiment, the Cas nuclease is obtained from or obtainable from a Lactobacillus cell, e.g., Lactobacillus sp., Lactobacillus farciminis (DSM 20184), Lactobacillus farciminis, Lactobacillus murinus, Lactobacillus ruminis, Lactobacillus salivarius, Lactobacillus jensenii, Lactobacillus hamster, Lactobacillus delbrueckii, Lactobacillus johnsonii, Lactobacillus plantarum, Lactobacillus rhamnosus, or Lactobacillus gallinarum cell.
It will be understood that for the aforementioned species, the invention encompasses both the perfect and imperfect states, and other taxonomic equivalents, e.g., anamorphs, regardless of the species name by which they are known. Those skilled in the art will readily recognize the identity of appropriate equivalents.
The Cas nuclease may be identified and obtained from other sources including microorganisms isolated from nature (e.g., soil, composts, water, etc.) or DNA samples obtained directly from natural materials (e.g., soil, composts, water, etc.) using the above-mentioned probes. Techniques for isolating microorganisms and DNA directly from natural habitats are well known in the art. A polynucleotide encoding the Cas nuclease may then be obtained by similarly screening a genomic DNA or cDNA library of another microorganism or mixed DNA sample. Once a polynucleotide encoding a Cas nuclease has been detected with the probe(s), the polynucleotide can be isolated or cloned by utilizing techniques that are known to those of ordinary skill in the art (see, e.g., Davis et al., 2012, Basic Methods in Molecular Biology, Elsevier).
AlphaFold structure prediction
AlphaFold is a computational method for predicting the three-dimensional structure of a polypeptide from its amino acid sequence (Jumper et al., Highly accurate protein structure prediction with AlphaFold. Nature, 2021). Predicted structures for millions of polypeptides deposited in the UniProt database have been deposited in the AlphaFold Protein Structure Database, using the AlphaFold Monomer v2.0 model (Varadi et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research, 2021). In the AlphaFold Protein Structure Database, the three-dimensional structure of a polypeptide can be obtained by searching for the UniProt accession number of the polypeptide.
In addition to the many three-dimensional structures that are already publicly available, code is available for reproducing and predicting structures of new polypeptides at source code repositories such as Github.com under deepmind/alphafold/, using notebooks/AlphaFold.ipynb, which uses AlphaFold v2.3.1 or newer. Additionally, it can be found in Github.com under sokrypton/ColabFold using v1.5.2 or newer, using AlphaFold2.ipynb. For technical details, please see Jumper et al. (vide supra).
AlphaFold produces a per-residue estimate of its confidence on a scale from 0 to 100. This confidence measure is called pLDDT and corresponds to the model’s predicted score on the lDDT-Cα metric. It is stored in the B-factor fields of the mmCIF and PDB files available for download (although unlike a B-factor, higher pLDDT is better). Regions with a pLDDT score of more than 90 are expected to be modelled to high accuracy. These should be suitable for any application that benefits from high accuracy (e.g., characterization of binding sites). Regions with a pLDDT score between 70 and 90 are expected to be modelled well, corresponding to a generally good backbone prediction.
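Because pLDDT is stored in the B-factor field, it can be read back with a few lines of code. The following Python sketch assumes the fixed-column layout of PDB-format ATOM records (B-factor in columns 61-66) and reads one value per residue from the Cα atom. The function names and the toy records are hypothetical, and the label for scores below 70 is only a placeholder, since the text above does not characterise that range.

```python
from typing import Dict, Iterable, Tuple


def read_plddt(pdb_lines: Iterable[str]) -> Dict[Tuple[str, int], float]:
    """Return {(chain, residue number): pLDDT}, read from the C-alpha B-factor field."""
    scores = {}
    for line in pdb_lines:
        # PDB fixed columns: atom name 13-16, chain 22, residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Classify a pLDDT value using the two ranges given in the text."""
    if plddt > 90:
        return "high accuracy"
    if plddt >= 70:
        return "modelled well"
    return "below 70"  # placeholder label; this range is not described in the text


def toy_ca_record(serial: int, resname: str, chain: str, resseq: int, plddt: float) -> str:
    """Build a toy C-alpha ATOM record with dummy coordinates (illustration only)."""
    head = f"ATOM  {serial:5d}  CA  {resname:>3} {chain}{resseq:4d}    "
    coords = f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"
    return head + coords + f"{1.0:6.2f}{plddt:6.2f}"


records = [toy_ca_record(1, "MET", "A", 1, 95.5),
           toy_ca_record(2, "ASP", "A", 2, 82.0),
           toy_ca_record(3, "LYS", "A", 3, 55.25)]
for residue, score in read_plddt(records).items():
    print(residue, score, confidence_band(score))
```

In practice the lines would come from a downloaded model file rather than from `toy_ca_record`, for example via `open(path).readlines()`.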
Structural Similarity
The relatedness between two amino acid sequences has conventionally been described by the parameter “sequence identity”. However, since the biological function of a polypeptide is defined by its three-dimensional structure rather than its amino acid sequence, a better way of assessing a functional relationship between polypeptides is by comparing their three-dimensional structures. Thus, for the purposes of the present invention, the relatedness between the three-dimensional structures of two polypeptides is described by the parameter “structural similarity”.
A three-dimensional structure of any polypeptide may be obtained experimentally via, e.g., X-ray crystallography or using in silico methods such as AlphaFold (vide supra). The structural similarity between three-dimensional structures may then be determined by the TM-score, which is calculated using the following general formula (Zhang & Skolnick, Proteins 57:702-710, 2004):
$$\text{TM-score} = \operatorname{Max}\left[\frac{1}{L_N}\sum_{i=1}^{L_T}\frac{1}{1+\left(\frac{d_i}{d_0}\right)^{2}}\right]$$
where $L_N$ is the length of the native structure, $L_T$ is the length of the aligned residues to the template structure, $d_i$ is the distance between the $i$-th pair of aligned residues and $d_0$ is a scale to normalize the match difference. ‘Max’ denotes the maximum value after optimal spatial superposition.
For the purposes of the present invention, LN is always the length of the reference protein, indicating the use of a fixed reference length L to prevent artificially large TM-scores from alignment of substructures:
A structural alignment of the three-dimensional structure of two polypeptides is necessary before the TM-score can be calculated. This is achieved via algorithms that optimize the structural overlap, and several methods are available, such as CEalign (Shindyalov and Bourne, Protein Eng., 11, 739-747, 1998), DALI (Holm and Sander, Trends Biochem. Sci., 20, 478-480, 1995), or TM-align (Nucleic Acids Res. 33:2302-2309, 2005).
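Once a structural alignment has produced the distances between aligned residue pairs, the sum inside the TM-score can be evaluated directly. The Python sketch below does this for a single superposition, normalizing by a fixed reference length. It relies on one assumption that is not stated in the text above: the distance scale is taken from the cited Zhang & Skolnick (2004) paper as d0 = 1.24·(L − 15)^(1/3) − 1.8. The function names are hypothetical, the search for the optimal superposition (‘Max’) is not performed, and very short chains, which the TM-align program handles in its own way, are simply rejected.

```python
from typing import Sequence


def d0_scale(reference_length: int) -> float:
    """Distance scale d0 as a function of the reference length (Zhang & Skolnick, 2004)."""
    if reference_length <= 15:
        raise ValueError("reference length must exceed 15 residues in this sketch")
    d0 = 1.24 * (reference_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0.0:
        raise ValueError("reference too short for a positive d0 in this sketch")
    return d0


def tm_score_for_superposition(distances: Sequence[float], reference_length: int) -> float:
    """Evaluate the TM-score sum for one superposition.

    `distances` holds one value per aligned residue pair; unaligned
    residues contribute nothing, but still count in the normalization
    because the sum is divided by the full reference length.
    """
    d0 = d0_scale(reference_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / reference_length


# A perfect match over half of a 100-residue reference scores 0.5.
print(tm_score_for_superposition([0.0] * 50, 100))
```

Dividing by the reference length rather than by the number of aligned pairs is what keeps a well-superposed fragment from scoring as highly as a full-length match.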
For the purposes of the present invention, TM-align is applied. For convenience, TM-score is integrated in the TM-align software, which is available from the author’s website. The version of TM-align used is preferably the one updated 2019-08-22 or a later one, and the TM-score between a reference and a query protein is determined by running this command:
TMalign <query.pdb> <reference.pdb> -L <length of reference>
where <query.pdb> is the name of the PDB file containing coordinates of the query polypeptide, <reference.pdb> is the name of the PDB file containing coordinates of the reference polypeptide, and <length of reference> is the length of the reference polypeptide. The TM-score is calculated and reported in the output, along with several other parameters from the alignment.
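The command can also be driven from a script. The following Python sketch wraps the invocation and pulls the reported scores out of the program output. It assumes that a `TMalign` executable is on the search path and that scores are reported on lines beginning with `TM-score=`; the exact wording that follows each value may differ between versions, so it is returned unparsed. The function names and the sample output text are hypothetical.

```python
import re
import subprocess
from typing import List, Tuple

# Matches lines such as "TM-score= 0.81234 (...)" and captures value and trailing note.
TM_SCORE_LINE = re.compile(r"^TM-score=[ \t]*([0-9]*\.?[0-9]+)[ \t]*(.*)$", re.MULTILINE)


def run_tmalign(query_pdb: str, reference_pdb: str, reference_length: int,
                executable: str = "TMalign") -> str:
    """Run TMalign <query.pdb> <reference.pdb> -L <length of reference> and return its output."""
    command = [executable, query_pdb, reference_pdb, "-L", str(reference_length)]
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


def extract_tm_scores(tmalign_output: str) -> List[Tuple[float, str]]:
    """Return every (TM-score, trailing note) pair found in the program output."""
    return [(float(value), note.strip())
            for value, note in TM_SCORE_LINE.findall(tmalign_output)]


# Made-up output fragment, used only to demonstrate the parser.
sample_output = (
    "Aligned length= 95, RMSD= 1.20\n"
    "TM-score= 0.81234 (toy note A)\n"
    "TM-score= 0.79000 (toy note B)\n"
)
print(extract_tm_scores(sample_output))
```

Since the program may report more than one TM-score line, the trailing note is kept so that the value normalized by the reference length can be identified by inspection.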
Compositions
In a 3rd aspect, the present invention also relates to a non-naturally occurring composition comprising (i) the Cas nuclease of the 1st aspect, or the fusion polypeptide of the 2nd aspect, and/or (ii) a nucleic acid molecule comprising a sequence encoding the Cas nuclease of the 1st aspect or the fusion polypeptide of the 2nd aspect.
In one embodiment, the nucleic acid molecule is a chemically modified nucleic acid molecule.
In one embodiment, the nucleic acid molecule is DNA.
In one embodiment, the nucleic acid molecule is RNA.
In one embodiment, the RNA is an mRNA comprising one or more of a 5’ untranslated region (UTR), an open reading frame (ORF) encoding the Cas nuclease or fusion polypeptide, a 3’ UTR, and a poly-adenylyl (polyA) tail.
In one embodiment, the ORF consists of nucleosides selected from adenosine, a modified adenosine, uridine, a modified uridine, guanosine, a modified guanosine, cytidine, and a modified cytidine.
In one embodiment, the ORF consists of nucleosides selected from adenosine, uridine, a modified uridine, guanosine, and cytidine.
In one embodiment, the nucleic acid molecule is linear.
In one embodiment, the nucleic acid molecule is circular.
In one embodiment, the composition further comprises one or more RNA molecules, or a DNA polynucleotide encoding one or more of the one or more RNA molecules, wherein the one or more RNA molecules and the Cas nuclease or fusion polypeptide do not naturally occur together, and the one or more RNA molecules are configured to form a complex with the Cas nuclease or fusion polypeptide and/or target the complex to a target site.
In one embodiment, the one or more RNA molecule comprises a guide RNA (gRNA), which gRNA comprises a CRISPR RNA (crRNA) and a trans activating RNA (tracrRNA).
In one embodiment, the one or more RNA molecule is a single-molecule RNA (sgRNA), e.g., wherein the crRNA and the tracrRNA are part of the same RNA molecule.
In another embodiment, the one or more RNA molecule is a dual-molecule RNA, e.g., wherein the crRNA and the tracrRNA are separate RNA molecules.
In one embodiment, the composition further comprises a donor template for homology directed repair (HDR).
In one embodiment, the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81, SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID NO: 86, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID NO: 90, SEQ ID NO: 91, SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101, SEQ ID NO: 102, SEQ ID NO: 103, or SEQ ID NO: 104, or any of SEQ ID NOs: 347, 349, 351, 353, 405, 416, 417, 434, 449, 465, 466, 512-520, 528, 549 or 550.
In one embodiment, the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of any one of SEQ ID NO: 53, SEQ ID NO: 73, SEQ ID NO: 92, SEQ ID NO: 91, SEQ ID NO: 100, or SEQ ID NO: 81.
In one embodiment, the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 53.
In one embodiment, the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 73.
In one embodiment, the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 92.
In one embodiment, the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 91.
In one embodiment, the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 100.
In one embodiment, the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 81.
In one embodiment, the one or more RNA molecule comprises a trans activating RNA (tracrRNA) encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NOs: 157 - 208.
In one embodiment, the one or more RNA molecule comprises a trans activating RNA (tracrRNA) encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NO: 157, SEQ ID NO: 177, SEQ ID NO: 196, SEQ ID NO: 195, SEQ ID NO: 204, or SEQ ID NO: 185.
In one embodiment, at least one of the one or more RNA molecule comprises a CRISPR RNA (crRNA) molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NOs: 209 - 260.
In one embodiment, at least one of the one or more RNA molecule comprises a CRISPR RNA (crRNA) molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NO: 209, SEQ ID NO: 229, SEQ ID NO: 248, SEQ ID NO: 247, SEQ ID NO: 256, or SEQ ID NO: 237.
In one embodiment, at least one of the one or more RNA molecule comprises or consists of an RNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NOs: 261 - 312.
In one embodiment, at least one of the one or more RNA molecule comprises or consists of an RNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NO: 261, SEQ ID NO: 281, SEQ ID NO: 300, SEQ ID NO: 299, SEQ ID NO: 308, or SEQ ID NO: 289.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any amino acid sequence of column 1 in Table 4, and the at least one RNA molecule is an RNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of column 4 in Table 4, e.g., any one of SEQ ID NOs: 261 - 312.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 1, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 209.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 21, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 229.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 40, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 248.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 39, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 247.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 48, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 256.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 29, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 237.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any amino acid sequence of column 1 in Table 4, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any of the polynucleotide sequences of column 2 in Table 4, e.g., any one of SEQ ID NOs: 209 - 260.
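The six nuclease/crRNA pairings recited individually above can be held in a small lookup table, which also makes a regularity in the numbering visible. The Python sketch below records only those six explicitly stated pairs; the names `CRRNA_FOR_CAS` and `crrna_seq_id` are hypothetical, and whether the same numbering offset applies to the remaining rows of Table 4 is an assumption that the sketch does not make.

```python
# Cas nuclease SEQ ID NO -> crRNA-encoding SEQ ID NO, as recited in the embodiments above.
CRRNA_FOR_CAS = {
    1: 209,
    21: 229,
    40: 248,
    39: 247,
    48: 256,
    29: 237,
}


def crrna_seq_id(cas_seq_id: int) -> int:
    """Look up the crRNA SEQ ID NO paired with a Cas nuclease SEQ ID NO.

    Only the six pairs stated explicitly in the text are known here;
    any other identifier raises KeyError rather than being extrapolated.
    """
    return CRRNA_FOR_CAS[cas_seq_id]


# For the listed pairs, the two identifiers differ by a constant.
offsets = {crrna - cas for cas, crrna in CRRNA_FOR_CAS.items()}
print(offsets)
```

A lookup of this kind is mainly useful for checking that a nuclease and a guide scaffold chosen for a composition belong to the same recited pairing.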
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 1 , and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 157.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 21, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 177.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 40, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 196.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 39, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 195.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 48, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 204.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 29, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 185.
In one embodiment, the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any amino acid sequence of column 1 in Table 4, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any of the polynucleotide sequences of column 3 in Table 4, e.g., to any one of SEQ ID NOs: 157 - 208.
In one embodiment, the composition further comprises a base editor enzyme.
In one embodiment, the base editor enzyme is an adenosine deaminase or a cytidine deaminase.
In one embodiment, the composition further comprises a reverse transcriptase enzyme.
Table 4 discloses which crRNA and tracrRNA coding sequences are associated with each of the novel Cas nucleases. For example, the Cas nuclease 0076 with SEQ ID NO: 1 utilizes the crRNA sequence encoded by SEQ ID NO: 209, and the tracrRNA sequence encoded by SEQ ID NO: 157. Additionally, and/or alternatively, the Cas nuclease with SEQ ID NO: 1 utilizes a gRNA or sgRNA sequence encoded by SEQ ID NO: 261, comprising both the crRNA sequence encoded by SEQ ID NO: 209 and the tracrRNA sequence encoded by SEQ ID NO: 157.
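The nuclease-to-RNA pairings named in this section can be held in a small lookup structure. The following Python sketch is illustrative only: it records just the SEQ ID NO pairings stated in the embodiments above (not the full Table 4), and the names `TRACR_FOR_NUCLEASE`, `CRRNA_FOR_NUCLEASE`, `SGRNA_FOR_NUCLEASE` and `guide_components` are hypothetical.

```python
# Cas nuclease SEQ ID NO -> tracrRNA-encoding SEQ ID NO, as stated in the
# embodiments above. Table 4 itself remains the authoritative source.
TRACR_FOR_NUCLEASE = {1: 157, 21: 177, 40: 196, 39: 195, 48: 204, 29: 185}

# crRNA- and sgRNA-encoding SEQ ID NOs are stated in the text only for SEQ ID NO: 1.
CRRNA_FOR_NUCLEASE = {1: 209}
SGRNA_FOR_NUCLEASE = {1: 261}


def guide_components(nuclease_seq_id):
    """Return the RNA-encoding SEQ ID NOs recorded for one nuclease.

    Entries not stated in the text are returned as None rather than guessed.
    """
    if nuclease_seq_id not in TRACR_FOR_NUCLEASE:
        raise KeyError(f"no pairing recorded for SEQ ID NO: {nuclease_seq_id}")
    return {
        "tracrRNA": TRACR_FOR_NUCLEASE[nuclease_seq_id],
        "crRNA": CRRNA_FOR_NUCLEASE.get(nuclease_seq_id),
        "sgRNA": SGRNA_FOR_NUCLEASE.get(nuclease_seq_id),
    }
```

For example, `guide_components(1)` returns the three SEQ ID NOs given for Cas nuclease 0076, whereas `guide_components(21)` returns only the tracrRNA entry because the other two are not stated in this section.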
Methods for modifying a DNA target site
In a 4th aspect, the present invention also relates to a method of modifying a nucleotide sequence at a DNA target site in the genome of a cell, comprising introducing into the cell the Cas nuclease of the 1st aspect or the fusion polypeptide of the 2nd aspect, a polynucleotide encoding the Cas nuclease of the 1st aspect or the fusion polypeptide of the 2nd aspect, and/or the composition of the 3rd aspect.
In one embodiment, the method comprises introducing a DNA-break at the DNA target site.
In one embodiment, the DNA-break is a single-strand break.
In one embodiment, the DNA-break is a double-strand break.
In one embodiment, the method is carried out under conditions that are permissive for non-homologous end joining (NHEJ) and homology-directed repair (HDR).
In one embodiment, the method is carried out under conditions that are permissive for non-homologous end joining (NHEJ).
In one embodiment, the method is carried out under conditions that are permissive for homology-directed repair (HDR).
In one embodiment, the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to a PAM sequence, e.g., adjacent to the PAM sequence “nnAY”, or adjacent to any one of the PAM sequences mentioned in Table 1.
In one embodiment, the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to the PAM sequence “nnGHMA”, e.g., “nnGHCA”, “nnGGCA”, or “nnGCMA”.
In one embodiment, the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand, e.g. Aspergillus DNA strand, adjacent to the PAM sequence “nnGTA”.
In one embodiment, the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to the PAM sequence “nnAMA”, e.g., “nnAAA”.
In one embodiment, the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to the PAM sequence “nnRHRD”, e.g., “nnACAG”, “nnACAR”, “nnATGT”, or “nnAYRD”.
In one embodiment, the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand, e.g. Bacillus subtilis DNA, adjacent to the PAM sequence “ATGTCA”, “CCATA”, “TTACA”, or “TTACAA”.
In one embodiment, the Cas nuclease or fusion polypeptide effects a DNA-break in an Aspergillus DNA strand adjacent to the PAM sequence “nnGHCA”, “nnGTA”, or “nnACAG”.
In one embodiment, the Cas nuclease or fusion polypeptide effects a DNA-break in a Bacillus DNA strand adjacent to the PAM sequence “nnGHMA”, “nnGGCA”, “nnACAR”, “nnATGT”, or “nnAMA”.
In one embodiment, the Cas nuclease or fusion polypeptide effects a DNA-break in an E. coli DNA strand adjacent to the PAM sequence “nnGCMA”, “nnAAA”, or “nnAYRD”.
In one embodiment, the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to a sequence that is complementary to the PAM sequence.
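The PAM sequences above are written with degenerate nucleotide letters. Assuming these follow the standard IUPAC nucleotide codes (n = any base, R = A/G, Y = C/T, M = A/C, H = A/C/T, D = A/G/T), candidate PAM occurrences on one strand can be located as in the following Python sketch. The function names are hypothetical, and the sketch only locates PAM matches; it makes no statement about where the protospacer or the DNA-break lies relative to the PAM.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def pam_to_regex(pam):
    """Translate a degenerate PAM such as 'nnGHMA' into a regular expression."""
    return "".join("[" + IUPAC[ch.upper()] + "]" for ch in pam)


def find_pam_sites(sequence, pam):
    """Return (start_index, matched_text) for every PAM match, overlaps included."""
    # The lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=(" + pam_to_regex(pam) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]
```

To cover the complementary strand as well, the same scan can be run on the reverse complement of the input sequence.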
In one embodiment, the target site is within a coding region of a protein.
In one embodiment, the target site is within a non-coding region of a protein.
In one embodiment, the target site is within a regulatory region of a protein, e.g., a promoter.
In one embodiment, the cell is a eukaryotic cell.
In one embodiment, the cell is a prokaryotic cell.
In one embodiment, the cell is a eukaryotic cell, such as a mammalian cell, a human cell, or a non-human mammalian cell, e.g., a BHK cell, a CHO cell, a mouse cell, a hamster cell, or a rat cell.
In a preferred embodiment, the cell is a fungal cell, such as a filamentous fungal cell, or a yeast cell.
In one embodiment, the fungal cell is a Pichia cell, e.g., a Pichia pastoris cell.
In one embodiment, the cell is a yeast cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
In one embodiment, the cell is a filamentous fungal cell, e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bjerkandera adusta, Ceriporiopsis aneirina, Ceriporiopsis caregiea, Ceriporiopsis gilvescens, Ceriporiopsis pannocinta, Ceriporiopsis rivulosa, Ceriporiopsis subrufa, Ceriporiopsis subvermispora, Chrysosporium inops, Chrysosporium keratinophilum, Chrysosporium lucknowense, Chrysosporium merdarium, Chrysosporium pannicola, Chrysosporium queenslandicum, Chrysosporium tropicum, Chrysosporium zonatum, Coprinus cinereus, Coriolus hirsutus, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila, Neurospora crassa, Penicillium purpurogenum, Phanerochaete chrysosporium, Phlebia radiata, Pleurotus eryngii, Talaromyces emersonii, Thielavia terrestris, Trametes villosa, Trametes versicolor, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, or Trichoderma viride cell.
In one embodiment, the cell is a Trichoderma cell.
In one embodiment, the cell is a Trichoderma reesei cell.
In one embodiment, the cell is an Aspergillus cell.
In one embodiment, the cell is an Aspergillus niger cell.
In one embodiment, the cell is an Aspergillus oryzae cell.
In one embodiment, the cell is a plant cell.
In one embodiment, the plant cell is one or more of a maize, rice, sorghum, rye, barley, wheat, millet, oats, sugarcane, turfgrass, switchgrass, soybean, canola, alfalfa, sunflower, cotton, tobacco, peanut, potato, Arabidopsis, vegetable, or safflower cell.
In a preferred embodiment, the cell is a prokaryotic cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Corynebacterium, Enterococcus, Geobacillus, Lactobacillus, Lacticaseibacillus, Lactiplantibacillus, Levilactobacillus, Ligilactobacillus, Limosilactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, or Streptomyces cells, or a Gram-negative bacteria selected from the group consisting of Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma cells, such as Lacticaseibacillus casei, Lacticaseibacillus paracasei, Lacticaseibacillus rhamnosus, Lactiplantibacillus plantarum, Levilactobacillus brevis, Ligilactobacillus salivarius, Limosilactobacillus fermentum, Limosilactobacillus reuteri, Lactobacillus acidophilus, Lactobacillus bulgaricus, Lactobacillus crispatus, Lactobacillus gasseri, Lactobacillus johnsonii, Lactobacillus helveticus, Corynebacterium glutamicum, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, Bacillus thuringiensis, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp. zooepidemicus, Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, and Streptomyces lividans cells.
In one embodiment, the cell is a Bacillus cell.
In one embodiment, the cell is a Bacillus subtilis cell.
In one embodiment, the cell is a Bacillus licheniformis cell.
In one embodiment, the cell is a Lacticaseibacillus paracasei cell.
In one embodiment, the cell is a Streptococcus thermophilus cell.
In one embodiment, the cell is an E. coli cell.
DNA Repair by non-homologous end joining
Upon target recognition, Cas nucleases induce double-strand breaks in the target sequence, which when repaired by non-homologous end joining (NHEJ) can result in frameshift mutations and gene knockdown. The frameshift mutation caused by error-prone NHEJ may include nucleotide insertions or deletions (indels). Alternatively, homology-directed repair (HDR) at the double-strand break site can allow insertion of the desired sequence.
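Whether an indel shifts the reading frame depends on the net change in length: only a net change that is not a multiple of three alters the frame downstream of the break. The following Python sketch illustrates this arithmetic. It is a simplification that assumes the indel lies entirely within a coding sequence and ignores other effects (e.g., on splicing or regulatory elements); the function name is hypothetical.

```python
def causes_frameshift(inserted, deleted):
    """Return True if the net length change is not a multiple of three.

    inserted and deleted are counts of nucleotides added and removed at the
    repaired break site.
    """
    if inserted < 0 or deleted < 0:
        raise ValueError("nucleotide counts must be non-negative")
    return (inserted - deleted) % 3 != 0
```

For instance, a single-nucleotide insertion shifts the frame, while a three-nucleotide deletion removes one codon and leaves the downstream frame intact.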
DNA Repair by Homologous Recombination
The term "homology-directed repair" or "HDR" refers to a mechanism for repairing DNA damage in cells, for example, during repair of double-stranded and single-stranded breaks in DNA. HDR requires nucleotide sequence homology and uses a "nucleic acid template" (nucleic acid template or donor template used interchangeably herein) to repair the sequence where the double-stranded or single-stranded break occurred (e.g., DNA target sequence). This results in the transfer of genetic information from, for example, the nucleic acid template to the DNA target sequence. HDR may result in alteration of the DNA target sequence (e.g., insertion, deletion, mutation) if the nucleic acid template sequence differs from the DNA target sequence and part or all of the nucleic acid template polynucleotide or oligonucleotide is incorporated into the DNA target sequence. In some embodiments, an entire nucleic acid template polynucleotide, a portion of the nucleic acid template polynucleotide, or a copy of the nucleic acid template is integrated at the site of the DNA target sequence.
The terms "nucleic acid template" and “donor” refer to a nucleotide sequence that is inserted or copied into a genome. The nucleic acid template comprises a nucleotide sequence, e.g., of one or more nucleotides, that will be added to or will template a change in the target nucleic acid or may be used to modify the target sequence. A nucleic acid template sequence may be of any length, for example between 2 and 10,000 nucleotides in length (or any integer value there between or there above), preferably between about 100 and 1,000 nucleotides in length (or any integer there between), more preferably between about 200 and 500 nucleotides in length. A nucleic acid template may be a single-stranded nucleic acid or a double-stranded nucleic acid. In some embodiments, the nucleic acid template comprises a nucleotide sequence, e.g., of one or more nucleotides, that corresponds to wild type sequence of the target nucleic acid, e.g., of the target position. In some embodiments, the nucleic acid template comprises a ribonucleotide sequence, e.g., of one or more ribonucleotides, that corresponds to wild type sequence of the target nucleic acid, e.g., of the target position. In some embodiments, the nucleic acid template comprises modified ribonucleotides.
Insertion of an exogenous sequence (also called a "donor sequence," "donor template" or "donor"), for example, for correction of a mutant gene or for increased expression of a wild-type gene can also be carried out. It will be readily apparent that the donor sequence is typically not identical to the genomic sequence where it is placed. A donor sequence can contain a non-homologous sequence flanked by two regions of homology to allow for efficient HDR at the location of interest. Additionally, donor sequences can comprise a vector molecule containing sequences that are not homologous to the region of interest in cellular chromatin. A donor molecule can contain several, discontinuous regions of homology to cellular chromatin. For example, for targeted insertion of sequences not normally present in a region of interest, said sequences can be present in a donor nucleic acid molecule and flanked by regions of homology to sequence in the region of interest.
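The arrangement of a non-homologous sequence flanked by two regions of homology can be illustrated with a short Python sketch. It assumes, purely for illustration, that the homology arms are taken directly from the target sequence on either side of the intended break position and that both arms have the same length; the function and parameter names are hypothetical and no particular arm length is implied.

```python
def build_donor(target, cut_index, insert, arm_length):
    """Assemble left homology arm + insert + right homology arm.

    target     -- sequence of the region of interest (same strand throughout)
    cut_index  -- position of the break; the arms are taken on either side of it
    insert     -- non-homologous sequence to be placed at the break
    arm_length -- number of nucleotides in each homology arm
    """
    if arm_length <= 0:
        raise ValueError("arm_length must be positive")
    if cut_index - arm_length < 0 or cut_index + arm_length > len(target):
        raise ValueError("target is too short for the requested homology arms")
    left_arm = target[cut_index - arm_length:cut_index]
    right_arm = target[cut_index:cut_index + arm_length]
    return left_arm + insert + right_arm
```

With a target of `AAAACCCCGGGGTTTT`, a break at position 8 and arms of 4 nucleotides, the insert ends up between `CCCC` and `GGGG`.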
The donor polynucleotide can be DNA or RNA, single-stranded and/or double-stranded and can be introduced into a cell in linear or circular form. See, e.g., U.S. Patent Publication Nos. 2010/0047805; 2011/0281361; 2011/0207221; and 2019/0330620. If introduced in linear form, the ends of the donor sequence can be protected (e.g., from exonucleolytic degradation) by methods known to those of skill in the art. For example, one or more dideoxynucleotide residues are added to the 3' terminus of a linear molecule and/or self-complementary oligonucleotides are ligated to one or both ends. See, for example, Chang and Wilson, Proc. Natl. Acad. Sci. USA (1987); Nehls et al., Science (1996). Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino group(s) and the use of modified internucleotide linkages such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues.
Accordingly, embodiments of the present invention using a donor template for repair may use a DNA or RNA, single-stranded and/or double-stranded donor template that can be introduced into a cell in linear or circular form.
In embodiments of the present invention, a gene-editing composition comprises: (1) an RNA molecule comprising a guide sequence to effect a double-strand break in a gene prior to repair and (2) a donor RNA template for repair, wherein the RNA molecule comprising the guide sequence is a first RNA molecule and the donor RNA template is a second RNA molecule. In some embodiments, the guide RNA molecule and template RNA molecule are connected as part of a single molecule.
A donor sequence may also be an oligonucleotide and be used for gene correction or targeted alteration of an endogenous sequence. The oligonucleotide may be introduced to the cell on a vector, may be electroporated into the cell, or may be introduced via other methods known in the art. The oligonucleotide can be used to correct a mutated sequence in an endogenous gene (e.g., the sickle mutation in beta globin), or may be used to insert sequences with a desired purpose into an endogenous locus.
A polynucleotide can be introduced into a cell as part of a vector molecule having additional sequences such as, for example, replication origins, promoters and genes encoding antibiotic resistance. Moreover, donor polynucleotides can be introduced as naked nucleic acid, as nucleic acid complexed with an agent such as a liposome or poloxamer, or can be delivered by recombinant viruses (e.g., adenovirus, AAV, herpesvirus, retrovirus, lentivirus and integrase defective lentivirus (IDLV)).
The donor is generally inserted so that its expression is driven by the endogenous promoter at the integration site, namely the promoter that drives expression of the endogenous gene into which the donor is inserted. However, it will be apparent that the donor may comprise a promoter and/or enhancer, for example a constitutive promoter or an inducible or tissue specific promoter.
The donor molecule may be inserted into an endogenous gene such that all, some or none of the endogenous gene is expressed. For example, a transgene as described herein may be inserted into an endogenous locus such that some (N-terminal and/or C-terminal to the transgene) or none of the endogenous sequences are expressed, for example as a fusion with the transgene. In other embodiments, the transgene (e.g., with or without additional coding sequences such as for the endogenous gene) is integrated into any endogenous locus, for example a safe-harbor locus, for example a CCR5 gene, a CXCR4 gene, a PPP1R12C (also known as AAVS1) gene, an albumin gene or a Rosa gene. See, e.g., U.S. Patent Nos. 7,951,925 and 8,110,379; U.S. Publication Nos. 2008/0159996; 20100/0218264; 2010/0291048; 2012/0017290; 2011/0265198; 2013/0137104; 2013/0122591; 2013/0177983 and 2013/0177960 and U.S. Provisional Application No. 61/823,689.
When endogenous sequences (endogenous or part of the transgene) are expressed with the transgene, the endogenous sequences may be full-length sequences (wild-type or mutant) or partial sequences. Preferably the endogenous sequences are functional. Non-limiting examples of the function of these full length or partial sequences include increasing the serum half-life of the polypeptide expressed by the transgene (e.g., therapeutic gene) and/or acting as a carrier.
Furthermore, although not required for expression, exogenous sequences may also include transcriptional or translational regulatory sequences, for example, promoters, enhancers, insulators, internal ribosome entry sites, sequences encoding 2A peptides and/or polyadenylation signals.
In certain embodiments, the donor molecule comprises a sequence selected from the group consisting of a gene encoding a protein (e.g., a coding sequence encoding a protein that is lacking in the cell or in the individual or an alternate version of a gene encoding a protein), a regulatory sequence and/or a sequence that encodes a structural nucleic acid such as a microRNA or siRNA.
Polynucleotides encoding Cas nucleases
In a 5th aspect, the present invention also relates to polynucleotides encoding the Cas nuclease of the 1st aspect, and/or the fusion polypeptide of the 2nd aspect.
In one embodiment, the polynucleotide comprises or consists of a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polypeptide coding sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81, SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID NO: 86, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID NO: 90, SEQ ID NO: 91, SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101, SEQ ID NO: 102, SEQ ID NO: 103, SEQ ID NO: 104, or any of SEQ ID NOs: 347, 349, 351, 353, 405, 416, 417, 434, 449, 465, 466, 512-520, 528, 549 or 550.
In one embodiment, the polynucleotide comprises or consists of a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any polypeptide coding sequence of SEQ ID NO: 53, SEQ ID NO: 73, SEQ ID NO: 92, SEQ ID NO: 91, SEQ ID NO: 100, or SEQ ID NO: 81.
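A percent sequence identity value of the kind recited above can be illustrated with a short Python sketch. The sketch assumes that the two sequences have already been aligned (with `-` marking gaps) and that identity is counted as identical positions divided by the number of aligned columns without a gap, multiplied by 100. That convention is an assumption made here for illustration; the definition of sequence identity given elsewhere in the specification governs. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both inputs must be the two rows of one alignment, so they have equal
    length, and '-' marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded under this assumed convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("alignment contains no gap-free columns")
    return 100.0 * identical / compared
```

Under this convention, the alignment `ACGT-A` / `ACCTGA` has five gap-free columns, four of which are identical, giving 80% identity, which would satisfy an "at least 80%" threshold but not "at least 85%".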
In one embodiment, the polynucleotide is a chemically modified nucleic acid molecule.
In one embodiment, the polynucleotide is DNA.
In one embodiment, the polynucleotide is RNA.
In one embodiment, the RNA is an mRNA comprising one or more of a 5’ untranslated region (UTR), an open reading frame (ORF) encoding the Cas nuclease or fusion polypeptide, a 3’ UTR, and a poly-adenylyl (polyA) tail.
In one embodiment, the ORF consists of nucleosides selected from adenosine, a modified adenosine, uridine, a modified uridine, guanosine, a modified guanosine, cytidine, and a modified cytidine.
In one embodiment, the ORF consists of nucleosides selected from adenosine, uridine, a modified uridine, guanosine, and cytidine.
In one embodiment, the polynucleotide is linear.
In one embodiment, the polynucleotide is circular.
In one embodiment, the poly-A sequence comprises non-adenine nucleotides.
In one embodiment, the poly-A sequence comprises 100-400 nucleotides.
In one embodiment, the polynucleotide is operably linked to one or more heterologous control sequences.
In one embodiment, the heterologous control sequence is a heterologous promoter.
In one embodiment, the polynucleotide is isolated.
In one embodiment, the polynucleotide is purified.
The polynucleotide may be a genomic DNA, a cDNA, a synthetic DNA, a synthetic RNA, an mRNA, or a combination thereof. The polynucleotide may be cloned from a strain of Lactobacillus, Bacillus, Turicibacter, Ureibacillus, Lentihominibacter, Clostridia, Ruminococcus, Alicyclobacillus, Enterococcus, Companilactobacillus, Bombilactobacillus, Vagococcus, or a related organism and thus, for example, may be a polynucleotide sequence encoding a variant of the Cas nuclease of the invention.
In one embodiment, the polynucleotide is obtained from a Streptococcus cell. In one embodiment, the Streptococcus cell is a Streptococcus equinus, Streptococcus mutans, Streptococcus sp., Streptococcus henryi DSM 19005, Streptococcus sp. CCH8-G7, Streptococcus pacificus, Streptococcus orisratti DSM 15617, Streptococcus salivarius, or Streptococcus ruminantium cell.
In one embodiment, the polynucleotide is obtained from a Bacillus cell, e.g., a Bacillus sp-63030 cell.
In one embodiment, the polynucleotide is obtained from a Turicibacter cell, e.g., a Turicibacter sp. cell.
In one embodiment, the polynucleotide is obtained from a Ureibacillus cell, e.g., a Ureibacillus thermosphaericus cell.
In one embodiment, the polynucleotide is obtained from a Lentihominibacter cell, e.g., a Lentihominibacter hominis cell.
In one embodiment, the polynucleotide is obtained from a Clostridia cell.
In one embodiment, the polynucleotide is obtained from a Ruminococcus cell, e.g., a Ruminococcus sp. cell.
In one embodiment, the polynucleotide is obtained from an Alicyclobacillus cell, e.g., an Alicyclobacillus sacchari cell.
In one embodiment, the polynucleotide is obtained from an Enterococcus cell, e.g., an Enterococcus gilvus, an Enterococcus hermanniensis, or an Enterococcus asini cell.
In one embodiment, the polynucleotide is obtained from a Companilactobacillus cell, e.g., a Companilactobacillus zhachilii, a Companilactobacillus halodurans, a Companilactobacillus keshanensis, a Companilactobacillus suantsaicola, or a Companilactobacillus hulinensis cell.
In one embodiment, the polynucleotide is obtained from a Bombilactobacillus cell, e.g., a Bombilactobacillus apium cell.
In one embodiment, the polynucleotide is obtained from a Vagococcus cell, e.g., a Vagococcus penaei cell.
In one embodiment, the polynucleotide is obtained from a Lactobacillus cell. In one embodiment, the Lactobacillus cell is a Lactobacillus sp., Lactobacillus farciminis (DSM 20184), Lactobacillus farciminis, Lactobacillus murinus, Lactobacillus ruminis, Lactobacillus salivarius, Lactobacillus jensenii, Lactobacillus hamster, Lactobacillus delbrueckii, Lactobacillus johnsonii, Lactobacillus plantarum, Lactobacillus rhamnosus, or Lactobacillus gallinarum cell.
In a preferred embodiment the polynucleotide encoding the Cas nuclease is isolated from a Lactobacillus cell.
In an embodiment, the polynucleotide is a subsequence encoding a fragment having Cas nuclease activity and/or DNA binding activity of the present invention.
The polynucleotide may also be mutated by introduction of nucleotide substitutions that do not result in a change in the amino acid sequence of the polypeptide, but which correspond to the codon usage of the host organism intended for production of the enzyme, or by introduction of nucleotide substitutions that may give rise to a different amino acid sequence. For a general description of nucleotide substitution, see, e.g., Ford et al., 1991, Protein Expression and Purification 2: 95-107.
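Substitutions that change codons without changing the encoded amino acid sequence can be illustrated as follows. This Python sketch assumes the standard genetic code and a caller-supplied mapping from amino acid to a preferred codon; that mapping, and the names `translate` and `recode`, are hypothetical and do not represent codon-usage data for any particular host organism.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna, preferred):
    """Swap each codon for the preferred synonymous codon, if one is given."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        new_codon = preferred.get(amino_acid, codon)
        # Reject a mapping that would alter the encoded amino acid.
        if CODON_TABLE[new_codon] != amino_acid:
            raise ValueError(f"{new_codon} does not encode {amino_acid}")
        out.append(new_codon)
    return "".join(out)
```

Recoding `ATGAAATTA` with the illustrative mapping `{"K": "AAG", "L": "CTG"}` yields `ATGAAGCTG`; both sequences translate to the same peptide, so the substitution is silent at the protein level.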
Nucleic Acid Constructs
In a 6th aspect, the present invention relates to a nucleic acid construct or expression vector comprising the polynucleotide according to the 5th aspect of the invention, operably linked to one or more control sequences that direct the production of the nuclease or fusion polypeptide in a cell.
The present invention also relates to nucleic acid constructs or expression vectors comprising a polynucleotide of the present invention, wherein the polynucleotide is operably linked to one or more control sequences that direct the expression of the coding sequence in a suitable host cell under conditions compatible with the control sequences.
The polynucleotide may be manipulated in a variety of ways to provide for expression of the polypeptide. Manipulation of the polynucleotide prior to its insertion into a vector may be desirable or necessary depending on the expression vector. Techniques for modifying polynucleotides utilizing recombinant DNA methods are well known in the art.
Promoters
The control sequence may be a promoter, a polynucleotide that is recognized by a host cell for expression of a polynucleotide encoding the Cas nuclease. The promoter contains transcriptional control sequences that mediate the expression of the Cas nuclease. The promoter may be any polynucleotide that shows transcriptional activity in the host cell including mutant, truncated, and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell.
Examples of suitable promoters for directing transcription of the polynucleotide of the present invention in a bacterial host cell are described in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Lab., NY, Davis et al., 2012, supra, and Song et al., 2016, PLOS One 11(7): e0158447.
Examples of suitable promoters for directing transcription of the polynucleotide of the present invention in a filamentous fungal host cell are promoters obtained from Aspergillus, Fusarium, Rhizomucor and Trichoderma cells, such as the promoters described in Mukherjee et al., 2013, “Trichoderma: Biology and Applications”, and by Schmoll and Dattenböck, 2016, “Gene Expression Systems in Fungi: Advancements and Applications”, Fungal Biology.
For expression in a yeast host, examples of useful promoters are described by Smolke et al., 2018, “Synthetic Biology: Parts, Devices and Applications” (Chapter 6: Constitutive and Regulated Promoters in Yeast: How to Design and Make Use of Promoters in S. cerevisiae), and by Schmoll and Dattenböck, 2016, “Gene Expression Systems in Fungi: Advancements and Applications”, Fungal Biology.
Terminators
The control sequence may also be a transcription terminator, which is recognized by a host cell to terminate transcription. The terminator is operably linked to the 3’ terminus of the polynucleotide encoding the Cas nuclease. Any terminator that is functional in the host cell may be used in the present invention.
Preferred terminators for bacterial host cells may be obtained from the genes for Bacillus clausii alkaline protease (aprH), Bacillus licheniformis alpha-amylase (amyL), and Escherichia coli (E. coli) ribosomal RNA (rrnB).
Preferred terminators for filamentous fungal host cells may be obtained from Aspergillus or Trichoderma species, for example from the genes for Aspergillus niger glucoamylase, Trichoderma reesei beta-glucosidase, Trichoderma reesei cellobiohydrolase I, and Trichoderma reesei endoglucanase I, such as the terminators described in Mukherjee et al., 2013, “Trichoderma: Biology and Applications”, and by Schmoll and Dattenböck, 2016, “Gene Expression Systems in Fungi: Advancements and Applications”, Fungal Biology.
Preferred terminators for yeast host cells may be obtained from the genes for Saccharomyces cerevisiae enolase, Saccharomyces cerevisiae cytochrome C (CYC1), and Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase. Other useful terminators for yeast host cells are described by Romanos et al., 1992, Yeast 8: 423-488.
mRNA Stabilizers
The control sequence may also be an mRNA stabilizer region downstream of a promoter and upstream of the coding sequence of a gene which increases expression of the gene encoding the Cas nuclease.
Examples of suitable mRNA stabilizer regions are obtained from a Bacillus thuringiensis cryIIIA gene (WO 94/25612) and a Bacillus subtilis SP82 gene (Hue et al., 1995, J. Bacteriol. 177: 3465-3471).
Examples of mRNA stabilizer regions for fungal cells are described in Geisberg et al., 2014, Cell 156(4): 812-824, and in Morozov et al., 2006, Eukaryotic Cell 5(11): 1838-1846.
Leader Sequences
The control sequence may also be a leader, a non-translated region of an mRNA that is important for translation by the host cell. The leader is operably linked to the 5’-terminus of the polynucleotide encoding the Cas nuclease. Any leader that is functional in the host cell may be used.
Suitable leaders for bacterial host cells are described by Hambraeus et al., 2000, Microbiology 146(12): 3051-3059, and by Kaberdin and Blasi, 2006, FEMS Microbiol. Rev. 30(6): 967-979.
Preferred leaders for filamentous fungal host cells may be obtained from the genes for Aspergillus oryzae TAKA amylase and Aspergillus nidulans triose phosphate isomerase.
Suitable leaders for yeast host cells may be obtained from the genes for Saccharomyces cerevisiae enolase (ENO-1), Saccharomyces cerevisiae 3-phosphoglycerate kinase, Saccharomyces cerevisiae alpha-factor, and Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP).
Polyadenylation Sequences
The control sequence may also be a polyadenylation sequence, a sequence operably linked to the 3’-terminus of the polynucleotide encoding the Cas nuclease which, when transcribed, is recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA. Any polyadenylation sequence that is functional in the host cell may be used.
Preferred polyadenylation sequences for filamentous fungal host cells are obtained from the genes for Aspergillus nidulans anthranilate synthase, Aspergillus niger glucoamylase, Aspergillus niger alpha-glucosidase, Aspergillus oryzae TAKA amylase, and Fusarium oxysporum trypsin-like protease.
Useful polyadenylation sequences for yeast host cells are described by Guo and Sherman, 1995, Mol. Cellular Biol. 15: 5983-5990.
Signal Peptides
The control sequence may also be a signal peptide coding region that encodes a signal peptide linked to the N-terminus of the Cas nuclease and directs the Cas nuclease into the cell’s secretory pathway. The 5’-end of the coding sequence of the polynucleotide may inherently contain a signal peptide coding sequence naturally linked in translation reading frame with the segment of the coding sequence that encodes the polypeptide. Alternatively, the 5’-end of the coding sequence may contain a signal peptide coding sequence that is heterologous to the coding sequence. A heterologous signal peptide coding sequence may be required where the coding sequence does not naturally contain a signal peptide coding sequence. Alternatively, a heterologous signal peptide coding sequence may simply replace the natural signal peptide coding sequence to enhance secretion of the polypeptide. Any signal peptide coding sequence that directs the expressed polypeptide into the secretory pathway of a host cell may be used.
Effective signal peptide coding sequences for bacterial host cells are the signal peptide coding sequences obtained from the genes for Bacillus NCIB 11837 maltogenic amylase, Bacillus licheniformis subtilisin, Bacillus licheniformis beta-lactamase, Bacillus stearothermophilus alpha-amylase, Bacillus stearothermophilus neutral proteases (nprT, nprS, nprM), and Bacillus subtilis prsA. Further signal peptides are described by Freudl, 2018, Microbial Cell Factories 17: 52.
Effective signal peptide coding sequences for filamentous fungal host cells are the signal peptide coding sequences obtained from the genes for Aspergillus niger neutral amylase, Aspergillus niger glucoamylase, Aspergillus oryzae TAKA amylase, Humicola insolens cellulase, Humicola insolens endoglucanase V, Humicola lanuginosa lipase, and Rhizomucor miehei aspartic proteinase, such as the signal peptide described by Xu et al., 2018, Biotechnology Letters 40: 949-955.
Useful signal peptides for yeast host cells are obtained from the genes for Saccharomyces cerevisiae alpha-factor and Saccharomyces cerevisiae invertase. Other useful signal peptide coding sequences are described by Romanos et al., 1992, supra.
Propeptides
The control sequence may also be a propeptide coding sequence that encodes a propeptide positioned at the N-terminus of the Cas nuclease. The resultant polypeptide is known as a proenzyme or propolypeptide (or a zymogen in some cases). A propolypeptide is generally inactive and can be converted to an active polypeptide by catalytic or autocatalytic cleavage of the propeptide from the propolypeptide. The propeptide coding sequence may be obtained from the genes for Bacillus subtilis alkaline protease (aprE), Bacillus subtilis neutral protease (nprT), Myceliophthora thermophila laccase (WO 95/33836), Rhizomucor miehei aspartic proteinase, and Saccharomyces cerevisiae alpha-factor.
Where both signal peptide and propeptide sequences are present, the propeptide sequence is positioned next to the N-terminus of a polypeptide and the signal peptide sequence is positioned next to the N-terminus of the propeptide sequence. Additionally or alternatively, when both signal peptide and propeptide sequences are present, the polypeptide may comprise only a part of the signal peptide sequence and/or only a part of the propeptide sequence. Alternatively, the final or isolated polypeptide may comprise a mixture of mature polypeptides and polypeptides which comprise, either partly or in full length, a propeptide sequence and/or a signal peptide sequence.
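The N-terminal ordering described above (signal peptide, then propeptide, then the polypeptide) can be illustrated with a minimal Python sketch. The function name build_precursor and the placeholder strings are hypothetical and are not taken from the sequence listing; the sketch only shows the order in which the segments are joined.

```python
def build_precursor(mature: str, propeptide: str = "", signal_peptide: str = "") -> str:
    """Join the segments from N- to C-terminus.

    Order: signal peptide, then propeptide, then the mature polypeptide.
    Either optional segment may be left empty.
    """
    return signal_peptide + propeptide + mature


# Placeholder strings for illustration only; they are not real sequences.
precursor = build_precursor(mature="MATURE", propeptide="PRO", signal_peptide="SIG")
print(precursor)
```

Leaving either optional argument empty models a construct that lacks that segment.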
Regulatory Sequences
It may also be desirable to add regulatory sequences that regulate expression of the Cas nuclease relative to the growth of the host cell. Examples of regulatory sequences are those that cause expression of the gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound. Regulatory sequences in prokaryotic systems include the lac, tac, and trp operator systems. In yeast, the ADH2 system or GAL1 system may be used. In filamentous fungi, the Aspergillus niger glucoamylase promoter, Aspergillus oryzae TAKA alpha-amylase promoter, Aspergillus oryzae glucoamylase promoter, Trichoderma reesei cellobiohydrolase I promoter, and Trichoderma reesei cellobiohydrolase II promoter may be used. Other examples of regulatory sequences are those that allow for gene amplification. In fungal systems, these regulatory sequences include the dihydrofolate reductase gene that is amplified in the presence of methotrexate, and the metallothionein genes that are amplified with heavy metals.
Expression Vectors
The present invention also relates to recombinant expression vectors comprising a polynucleotide of the present invention, a promoter, and transcriptional and translational stop signals. The various nucleotide and control sequences may be joined together to produce a recombinant expression vector that may include one or more convenient restriction sites to allow for insertion or substitution of the polynucleotide encoding the Cas nuclease at such sites. Alternatively, the polynucleotide may be expressed by inserting the polynucleotide or a nucleic acid construct comprising the polynucleotide into an appropriate vector for expression. In creating the expression vector, the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression.
The recombinant expression vector may be any vector (e.g., a plasmid or virus) that can be conveniently subjected to recombinant DNA procedures and can bring about expression of the polynucleotide. The choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced. The vector may be a linear or closed circular plasmid.
The vector may be an autonomously replicating vector, i.e., a vector that exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome. The vector may contain any means for assuring self-replication. Alternatively, the vector may be one that, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated. Furthermore, a single vector or plasmid or two or more vectors or plasmids that together contain the total DNA to be introduced into the genome of the host cell, or a transposon, may be used.
The vector preferably contains one or more selectable markers that permit easy selection of transformed, transfected, transduced, or the like cells. A selectable marker is a gene the product of which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like.
The vector preferably contains at least one element that permits integration of the vector into the host cell's genome or autonomous replication of the vector in the cell independent of the genome.
For integration into the host cell genome, the vector may rely on the polynucleotide’s sequence encoding the polypeptide or any other element of the vector for integration into the genome by homologous recombination, such as homology-directed repair (HDR), or non-homologous recombination, such as non-homologous end-joining (NHEJ).
For autonomous replication, the vector may further comprise an origin of replication enabling the vector to replicate autonomously in the host cell in question. The origin of replication may be any plasmid replicator mediating autonomous replication that functions in a cell. The term “origin of replication” or “plasmid replicator” means a polynucleotide that enables a plasmid or vector to replicate in vivo.
More than one copy of a polynucleotide of the present invention may be inserted into a host cell to increase production of a Cas nuclease. For example, 2 or 3 or 4 or 5 or more copies are inserted into a host cell. An increase in the copy number of the polynucleotide can be obtained by integrating at least one additional copy of the sequence into the host cell genome or by including an amplifiable selectable marker gene with the polynucleotide where cells containing amplified copies of the selectable marker gene, and thereby additional copies of the polynucleotide, can be selected for by cultivating the cells in the presence of the appropriate selectable agent.
Host Cells
In a 7th aspect the invention relates to cells comprising the Cas nuclease of the 1st aspect, the fusion polypeptide of the 2nd aspect, the composition of the 3rd aspect, the polynucleotide of the 5th aspect, and/or the nucleic acid construct or expression vector of the 6th aspect.
In one embodiment, the cell is a recombinant cell.
In a preferred embodiment, the Cas nuclease is heterologous to the cell.
In one embodiment, the cell comprises at least two copies, e.g., three, four, or five or more copies of the polynucleotide of the 5th aspect or the vector or construct of the 6th aspect.
In one embodiment, the genome of the cell comprises a polynucleotide encoding the Cas nuclease of the 1st aspect or fusion polypeptide of the 2nd aspect, a polynucleotide of the 5th aspect, or a nucleic acid construct or expression vector of the 6th aspect.
In one embodiment, the genome of the recombinant cell comprises at least two copies, e.g., three, four, or five, or more copies of a polynucleotide encoding the Cas nuclease of the 1st aspect or fusion polypeptide of the 2nd aspect, of a polynucleotide of the 5th aspect, or of a nucleic acid construct or expression vector of the 6th aspect.
In an 8th aspect the invention relates to cells comprising a genome which was modified by the Cas nuclease of the 1st aspect, the fusion polypeptide of the 2nd aspect, the composition of the 3rd aspect, the method of the 4th aspect, the polynucleotide of the 5th aspect, and/or the nucleic acid construct or expression vector of the 6th aspect.
In one embodiment, the cell is a recombinant cell.
In one embodiment, the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a non-human mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
In one embodiment, the cell is a eukaryotic cell.
In one embodiment, the cell is a prokaryotic cell.
In one embodiment, the cell is a eukaryotic cell, such as a mammalian cell, a human cell, or a non-human mammalian cell, e.g., a BHK cell, a CHO cell, a mouse cell, a hamster cell, or a rat cell.
In one embodiment, the cell is a fungal cell, such as a filamentous fungal cell or a yeast cell.
In one embodiment the cell is a Pichia pastoris cell.
In one embodiment the cell is a Saccharomyces cerevisiae cell.
In one embodiment, the cell is a yeast cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
In one embodiment, the cell is a filamentous fungal cell, e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bjerkandera adusta, Ceriporiopsis aneirina, Ceriporiopsis caregiea, Ceriporiopsis gilvescens, Ceriporiopsis pannocinta, Ceriporiopsis rivulosa, Ceriporiopsis subrufa, Ceriporiopsis subvermispora, Chrysosporium inops, Chrysosporium keratinophilum, Chrysosporium lucknowense, Chrysosporium merdarium, Chrysosporium pannicola, Chrysosporium queenslandicum, Chrysosporium tropicum, Chrysosporium zonatum, Coprinus cinereus, Coriolus hirsutus, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila, Neurospora crassa, Penicillium purpurogenum, Phanerochaete chrysosporium, Phlebia radiata, Pleurotus eryngii, Talaromyces emersonii, Thielavia terrestris, Trametes villosa, Trametes versicolor, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, or Trichoderma viride cell.
In one embodiment, the cell is a Trichoderma cell.
In one embodiment, the cell is a Trichoderma reesei cell.
In one embodiment, the cell is an Aspergillus cell.
In one embodiment, the cell is an Aspergillus niger cell.
In one embodiment, the cell is an Aspergillus oryzae cell.
In one embodiment, the cell is a plant cell.
In one embodiment, the cell is one or more of a maize, rice, sorghum, rye, barley, wheat, millet, oats, sugarcane, turfgrass, switchgrass, soybean, canola, alfalfa, sunflower, cotton, tobacco, peanut, potato, Arabidopsis, vegetable, or safflower cell.
In one embodiment, the cell is a prokaryotic cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Corynebacterium, Enterococcus, Geobacillus, Lactobacillus, Lacticaseibacillus, Lactiplantibacillus, Levilactobacillus, Ligilactobacillus, Limosilactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, and Streptomyces cells, or a Gram-negative cell selected from the group consisting of Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma cells, such as Lacticaseibacillus casei, Lacticaseibacillus paracasei, Lacticaseibacillus rhamnosus, Lactiplantibacillus plantarum, Levilactobacillus brevis, Ligilactobacillus salivarius, Limosilactobacillus fermentum, Limosilactobacillus reuteri, Lactobacillus acidophilus, Lactobacillus bulgaricus, Lactobacillus crispatus, Lactobacillus gasseri, Lactobacillus johnsonii, Lactobacillus helveticus, Corynebacterium glutamicum, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, Bacillus thuringiensis, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, Streptococcus equi subsp. zooepidemicus, Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, and Streptomyces lividans cells.
In a preferred embodiment, the cell is a Bacillus cell.
In one embodiment, the cell is a Bacillus subtilis cell.
In one embodiment, the cell is a Bacillus licheniformis cell.
In one embodiment the cell is a Lacticaseibacillus paracasei cell.
In one embodiment the cell is a Streptococcus thermophilus cell.
In one embodiment the cell is an E. coli cell.
In one embodiment, the cell is isolated.
In one embodiment, the cell is purified.
The present invention also relates to recombinant host cells, comprising a polynucleotide of the present invention operably linked to one or more control sequences that direct the production of the Cas nuclease.
A construct or vector comprising a polynucleotide is introduced into a host cell so that the construct or vector is maintained as a chromosomal integrant or as a self-replicating extrachromosomal vector as described earlier. The choice of a host cell will to a large extent depend upon the gene encoding the polypeptide and its source. The Cas nuclease can be native or heterologous to the recombinant host cell. Also, at least one of the one or more control sequences can be heterologous to the polynucleotide encoding the Cas nuclease. The recombinant host cell may comprise a single copy, or at least two copies, e.g., three, four, five, or more copies of the polynucleotide of the present invention.
For purposes of this invention, Bacillus classes/genera/species shall be defined as described in Patel and Gupta, 2020, Int. J. Syst. Evol. Microbiol. 70: 406-438.
The bacterial host cell may also be any Streptococcus cell including, but not limited to, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp. zooepidemicus cells.
The bacterial host cell may also be any Streptomyces cell including, but not limited to, Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, and Streptomyces lividans cells.
Methods for introducing DNA into prokaryotic host cells are well-known in the art, and any suitable method can be used, including but not limited to protoplast transformation, competent cell transformation, electroporation, conjugation, and transduction, with DNA introduced as a linearized or circular polynucleotide. Persons skilled in the art will be readily able to identify a suitable method for introducing DNA into a given prokaryotic cell depending, e.g., on the genus. Methods for introducing DNA into prokaryotic host cells are described, for example, in Heinze et al., 2018, BMC Microbiology 18: 56, Burke et al., 2001, Proc. Natl. Acad. Sci. USA 98: 6289-6294, Choi et al., 2006, J. Microbiol. Methods 64: 391-397, and Donald et al., 2013, J. Bacteriol. 195(11): 2612-2620.
The host cell may be a fungal cell. “Fungi” as used herein includes the phyla Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota as well as the Oomycota and all mitosporic fungi (as defined by Hawksworth et al., In, Ainsworth and Bisby’s Dictionary of The Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK).
Fungal cells may be transformed by a process involving protoplast-mediated transformation, Agrobacterium-mediated transformation, electroporation, a biolistic method, or shock-wave-mediated transformation, as reviewed by Li et al., 2017, Microbial Cell Factories 16: 168, and by procedures described in EP 238023, Yelton et al., 1984, Proc. Natl. Acad. Sci. USA 81: 1470-1474, Christensen et al., 1988, Bio/Technology 6: 1419-1422, and Lubertozzi and Keasling, 2009, Biotechn. Advances 27: 53-75. However, any method known in the art for introducing DNA into a fungal host cell can be used, and the DNA can be introduced as a linearized or circular polynucleotide.
The fungal host cell may be a yeast cell. “Yeast” as used herein includes ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and yeast belonging to the Fungi Imperfecti (Blastomycetes). For purposes of this invention, yeast shall be defined as described in Biology and Activities of Yeast (Skinner, Passmore, and Davenport, editors, Soc. App. Bacteriol. Symposium Series No. 9, 1980).
In a preferred embodiment, the yeast host cell is a Pichia or Komagataella cell, e.g., a Pichia pastoris cell (Komagataella phaffii).
The fungal host cell may be a filamentous fungal cell. “Filamentous fungi” include all filamentous forms of the subdivision Eumycota and Oomycota (as defined by Hawksworth et al., 1995, supra). The filamentous fungi are generally characterized by a mycelial wall composed of chitin, cellulose, glucan, chitosan, mannan, and other complex polysaccharides. Vegetative growth is by hyphal elongation and carbon catabolism is obligately aerobic. In contrast, vegetative growth by yeasts such as Saccharomyces cerevisiae is by budding of a unicellular thallus and carbon catabolism may be fermentative.
The filamentous fungal host cell may be an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell.
In a preferred embodiment, the filamentous fungal host cell is an Aspergillus, Trichoderma or Fusarium cell.
In a further preferred embodiment, the filamentous fungal host cell is an Aspergillus niger, Aspergillus oryzae, Trichoderma reesei, or Fusarium venenatum cell.
In an 8th aspect the invention relates to plant cells comprising the Cas nuclease of the 1st aspect, the fusion polypeptide of the 2nd aspect, the composition of the 3rd aspect, the polynucleotide of the 5th aspect, and/or the nucleic acid construct or expression vector of the 6th aspect.
In one embodiment, the plant cell is one or more of a maize, rice, sorghum, rye, barley, wheat, millet, oats, sugarcane, turfgrass, switchgrass, soybean, canola, alfalfa, sunflower, cotton, tobacco, peanut, potato, Arabidopsis, vegetable, or safflower cell.
Methods of Production
In a 9th aspect, the present invention also relates to methods of producing a Cas nuclease of the 1st aspect, or a fusion polypeptide of the 2nd aspect, comprising (a) cultivating the host cell of the 7th aspect under conditions conducive for production of the Cas nuclease or fusion polypeptide; and optionally, (b) recovering the Cas nuclease and/or the fusion polypeptide.
In one aspect, the cell is a Bacillus cell. In another aspect, the cell is a Bacillus subtilis cell. In another aspect, the cell is a Bacillus licheniformis cell.
In one aspect, the cell is an Aspergillus cell. In another aspect, the cell is an Aspergillus niger cell. In another aspect, the cell is an Aspergillus oryzae cell.
In one aspect, the cell is a Trichoderma cell. In another aspect, the cell is a Trichoderma reesei cell.
In one embodiment the cell is a Pichia pastoris cell.
In one embodiment the cell is a Saccharomyces cerevisiae cell.
In one embodiment the cell is a Lacticaseibacillus paracasei cell.
In one embodiment the cell is a Streptococcus thermophilus cell.
In one embodiment the cell is an E. coli cell.
The host cell is cultivated in a nutrient medium suitable for production of the polypeptide using methods known in the art. For example, the cell may be cultivated by shake flask cultivation, or small-scale or large-scale fermentation (including continuous, batch, fed-batch, or solid-state, and/or microcarrier-based fermentations) in laboratory or industrial fermentors in a suitable medium and under conditions allowing the polypeptide to be expressed and/or isolated. Suitable media are available from commercial suppliers or may be prepared according to published compositions (e.g., in catalogues of the American Type Culture Collection). If the polypeptide is secreted into the nutrient medium, the polypeptide can be recovered directly from the medium. If the polypeptide is not secreted, it can be recovered from cell lysates.
The polypeptide may be detected using methods known in the art that are specific for the polypeptide, including, but not limited to, the use of specific antibodies, formation of an enzyme product, disappearance of an enzyme substrate, or an assay determining the relative or specific activity of the polypeptide.
The polypeptide may be recovered from the medium using methods known in the art, including, but not limited to, collection, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation. In one aspect, a whole fermentation broth comprising the polypeptide is recovered. In another aspect, a cell-free fermentation broth comprising the polypeptide is recovered.
The polypeptide may be purified by a variety of procedures known in the art to obtain substantially pure polypeptides and/or polypeptide fragments (see, e.g., Wingfield, 2015, Current Protocols in Protein Science 80(1): 6.1.1-6.1.35; Labrou, 2014, Protein Downstream Processing, 1129: 3-10).
In an alternative aspect, the polypeptide is not recovered.
Use of the Cas nucleases
In a 10th aspect, the present invention relates to the use of the Cas nuclease of the 1st aspect, the fusion polypeptide of the 2nd aspect, the composition of the 3rd aspect, the method of the 4th aspect, the polynucleotide of the 5th aspect, or the nucleic acid construct or expression vector of the 6th aspect for modifying a target sequence in a cell, e.g., a target gene.
In an 11th aspect, the present invention relates to the use of the Cas nuclease of the 1st aspect, the fusion polypeptide of the 2nd aspect, the composition of the 3rd aspect, the method of the 4th aspect, the polynucleotide of the 5th aspect, the nucleic acid construct or expression vector of the 6th aspect, the cell of the 7th aspect, or the cell of the 8th aspect for the manufacture of a medicament for modifying a target sequence in a cell, e.g., a target gene.
In one embodiment, the targeted cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, a non-human animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a non-human mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
Formulations
In a 12th aspect, the present invention also relates to a formulation comprising (i) the Cas nuclease according to the 1st aspect, the fusion polypeptide according to the 2nd aspect, a composition according to the 3rd aspect, the polynucleotide according to the 5th aspect, the nucleic acid construct or expression vector according to the 6th aspect, the cell according to the 7th aspect, or the cell according to the 8th aspect, and optionally, (ii) one or more of a lipid, a liposome, a hydrogel, a microparticle, a nanoparticle, or a block copolymer micelle.
In one embodiment, the lipid is a lipid nanoparticle.
In one embodiment, the Cas nuclease or fusion polypeptide is in a lyophilized formulation.
In one embodiment, the Cas nuclease or fusion polypeptide is in a liquid formulation.
In one embodiment, the Cas nuclease or fusion polypeptide is in a substantially endotoxin-free formulation.
Delivery
The Cas nuclease or CRISPR compositions described herein may be delivered as a protein, DNA molecules, RNA molecules, ribonucleoproteins (RNPs), nucleic acid vectors, or any combination thereof. In some embodiments, the RNA molecule comprises a chemical modification. Non-limiting examples of suitable chemical modifications include 2'-O-methyl (M), 2'-O-methyl 3'-phosphorothioate (MS), 2'-O-methyl 3'-thioPACE (MSP), pseudouridine, and 1-methylpseudouridine. Each possibility represents a separate embodiment of the present invention.
The Cas nucleases and/or polynucleotides encoding same described herein, and optionally additional proteins (e.g., ZFPs, TALENs, transcription factors, restriction enzymes) and/or nucleotide molecules such as guide RNA may be delivered to a target cell by any suitable means. The target cell may be any type of cell e.g., eukaryotic or prokaryotic, in any environment e.g., isolated or not, maintained in culture, in vitro, ex vivo, in vivo or in planta.
In some embodiments, the composition to be delivered includes mRNA of the nuclease and RNA of the guide. In some embodiments, the composition to be delivered includes mRNA of the nuclease, RNA of the guide and a donor template. In some embodiments, the composition to be delivered includes the Cas nuclease and guide RNA. In some embodiments, the composition to be delivered includes the Cas nuclease, guide RNA and a donor template for gene editing via, for example, homology directed repair (HDR). In some embodiments, the composition to be delivered includes mRNA of the nuclease, DNA-targeting RNA and the tracrRNA. In some embodiments, the composition to be delivered includes mRNA of the nuclease, DNA-targeting RNA and the tracrRNA and a donor template. In some embodiments, the composition to be delivered includes the Cas nuclease, DNA-targeting RNA and the tracrRNA. In some embodiments, the composition to be delivered includes the Cas nuclease, DNA-targeting RNA and the tracrRNA and a donor template for gene editing via, for example, homology directed repair.
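The eight compositions listed above follow a regular pattern: two nuclease forms (mRNA or protein), two guide formats (a guide RNA, or a DNA-targeting RNA together with a tracrRNA), each with or without a donor template. The following Python sketch simply enumerates that pattern. The variable and function names are illustrative assumptions and are not terms defined in this disclosure.

```python
from itertools import product

# Labels mirror the wording of the embodiments; the grouping into three
# independent choices is an illustrative reading, not a claim limitation.
NUCLEASE_FORMS = ["nuclease mRNA", "Cas nuclease protein"]
GUIDE_FORMS = [("guide RNA",), ("DNA-targeting RNA", "tracrRNA")]
DONOR_OPTIONS = [(), ("donor template",)]


def delivery_compositions():
    """Enumerate every nuclease form x guide format x donor option."""
    return [
        (nuclease, *guide, *donor)
        for nuclease, guide, donor in product(NUCLEASE_FORMS, GUIDE_FORMS, DONOR_OPTIONS)
    ]


for components in delivery_compositions():
    print(" + ".join(components))
```

Each printed line corresponds to one of the compositions recited in the preceding paragraph.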
For the foregoing embodiments, each embodiment disclosed herein is contemplated as being applicable to each of the other disclosed embodiment. For example, it is understood that any of the RNA molecules or compositions of the present invention may be utilized in any of the methods of the present invention.
As used herein, all headings are simply for organization and are not intended to limit the disclosure in any manner. The content of any individual section may be equally applicable to all sections.
The present invention is further described by the following examples that should not be construed as limiting the scope of the invention.
Examples
Example 1 : Identification of novel Cas nucleases
The Cas nucleases with SEQ ID NOs: 1 - 52 have been identified by mining bacterial genomes using a bioinformatic pipeline developed by the inventors of the instant invention. As shown in Fig. 1 , the pipeline includes several modules that are customized for specific tasks, including the identification of Cas nuclease genes, CRISPR arrays, and tracrRNA, as well as matching and ranking of these features. To identify Cas nuclease genes, the pipeline employs state-of-the-art tools with a large suite of Hidden Markov Models (HMMs) and a scoring scheme to predict the Cas nuclease subtype. Additionally, new HMMs have been built on proprietary data. Identified Cas enzymes were filtered for the presence of conserved domains as well as essential catalytic residues. To identify CRISPR arrays, the pipeline uses a combination of searching for repetitive sequences and aligning them to known repeats. A filtering process was applied to exclude repeats that do not meet CRISPR-specific criteria. A kmer-based machine learning approach (extreme gradient boosting trees) was applied to predict the subtype of the CRISPR arrays based on the consensus repeat. To identify tracrRNAs, the pipeline used a combination of scanning for anti-repeat sequences using the CRISPR repeat consensus sequences of identified CRISPR arrays as queries and sequence-structure covariance models derived from aligning sequences to experimentally validated tracrRNA tail structures. The complementarity of the CRISPR direct repeats and the tracrRNA was evaluated via alignment using a custom scoring system.
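The anti-repeat scanning step described above can be illustrated with a minimal sketch. It slides the reverse complement of a CRISPR repeat consensus along a candidate region and reports windows above an identity threshold. The function names, the 80% default threshold and the toy sequences are assumptions made for illustration; the sketch is ungapped, scans one strand only, and does not reproduce the custom scoring system or the covariance models of the pipeline.

```python
# Illustrative sketch only: names, threshold and sequences are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_anti_repeats(region, repeat, min_identity=0.8):
    """Scan `region` for ungapped windows resembling the anti-repeat.

    Returns a list of (start, identity) tuples, where identity is the
    fraction of positions matching the reverse complement of `repeat`.
    """
    target = reverse_complement(repeat)
    region = region.upper()
    width = len(target)
    hits = []
    for start in range(len(region) - width + 1):
        window = region[start:start + width]
        matches = sum(a == b for a, b in zip(window, target))
        identity = matches / width
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


if __name__ == "__main__":
    # Hypothetical repeat consensus and flanking region.
    print(find_anti_repeats("CCCCTAGCTCTAAAACGGGG", "GTTTTAGAGCTA", 1.0))
```

In practice, anti-repeats are only partially complementary to the repeat, so a gapped alignment with a tuned scoring scheme would be expected to perform better than this fixed-window comparison.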
Table 5 provides an overview of the sequences (SEQ ID NOs.) of the identified novel Cas nucleases, nuclease domains, tracrRNA coding sequences, and crRNA coding sequences.
Fig. 2 shows a phylogenetic tree of the Cas nucleases of the invention based on their amino acid sequences (SEQ ID NOs: 1-52). The full-length amino acid sequences were aligned using Clustal Omega and the tree was calculated with FastTree (using the Whelan-And-Goldman model). The scale bar shows 0.5 substitutions per site, indicating the evolutionary distance between the Cas nucleases. Most of the novel nucleases belong to Class 2 Type II Cas nucleases, including the nuclease of SEQ ID NO: 21, and nuclease 0076 (SEQ ID NO: 1) and its close homologs (SEQ ID NO: 52, SEQ ID NO: 50, SEQ ID NO: 51, and SEQ ID NO: 45, which all cluster very close to SEQ ID NO: 1 as shown in Fig. 2).
As also shown in Fig. 2, the novel nucleases of SEQ ID NO: 39, SEQ ID NO: 40 and SEQ ID NO: 48 cluster closely together.
Example 2: CRISPR activity of nucleases 0076, 0100, 0105, and 0149 in Aspergillus niger
To confirm that the CRISPR nucleases 0076 (SEQ ID NO:1), 0100 (SEQ ID NO:21), 0105 (SEQ ID NO:39) and 0149 (SEQ ID NO: 48) can be used to specifically induce indels at a target region for targeted gene editing, the Aspergillus niger fwnA (wA) knockout/spore colour assay was used. Using this assay, plasmids with different nucleases and spacer sequences targeting fwnA (wA) were transformed into Aspergillus niger MBin118 and screened for their effectiveness in inducing white phenotype colonies, which suggest the presence of insertions/deletions in fwnA (wA). As a result, one or more white phenotype transformants were found for all four nucleases, showing that all four nucleases are capable of gene editing in this fungal host. Sequencing of the respective targeted protospacer regions showed evidence of indels, suggesting nuclease-induced fwnA (wA) knockout. These results also suggest that, like Cas9, all four nucleases have their PAMs situated 3’ of the protospacer. This study confirms that 0076, 0100, 0105 and 0149 and their predicted gRNA scaffolds can be used for targeted gene editing in A. niger.
White spore assay
To validate the potential of the nucleases for targeted gene editing in Aspergillus niger, a fwnA (wA) knockout-induced spore colour change assay was used.
In A. niger, the knockout of polyketide synthase fwnA (wA) leads to a white/fawn spore colour phenotype, thought to be due to inactivation of PpfA-dependent lysine biosynthesis/siderophore biosynthesis (Jorgensen et al., 2015, Fungal Genetics and Biology, 48(5), pp.544-553). This spore colour assay can be used as an indicator of CRISPR nuclease activity. In the absence of repair DNA, DNA strand breaks in eukaryotes generally cause indels due to error-prone NHEJ DNA repair. Consequently, CRISPR nuclease-induced DNA breaks in fwnA (wA) are expected to cause indels, gene knockout, and subsequently, white spore colour. Therefore, the appearance of white colonies following transformations with putative CRISPR systems targeting fwnA shows targeted nuclease activity.
Control plasmid and screening plasmid design
To ascertain that the system functions correctly, two controls were used for each nuclease screening: As a positive control (+ctrl), the S. pyogenes CRISPR/Cas9 system was used to demonstrate that fwnA (wA) knockout by indel induction leads to detectable colour change under the tested experimental conditions. As a negative control (-Ctrl), a plasmid containing the nucleases and spacerless-sgRNA was used to confirm that the system itself did not cause untargeted indels in fwnA (wA).
For the screening, 21bp spacer sequences with expected activity targeting A. niger fwnA were designed based on the hypothetical PAMs of the respective nucleases. Screening plasmids were designed to contain two expression cassettes, one expressing the sgRNA (each targeting a different region of fwnA) and one expressing the nuclease. sgRNAs were derived from concatenation of the spacer, direct repeat and tracrRNA, while the nuclease gene sequences were codon optimized for A. niger. Constitutive expression was used for all expression cassettes, except that expression of the nuclease of SEQ ID NO:1 (0076) and the nuclease of SEQ ID NO: 39 (0105) utilized a thiamine-inducible expression system for fine-tuning of the expression level to prevent cytotoxicity (Shoji et al., 2005, FEMS microbiology letters, 244(1), pp.41-46). This system utilizes the transcriptionally/post-transcriptionally regulated promoter PthiA, which downregulates gene expression in the presence of thiamine.
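The spacer design described above, 21bp protospacers selected next to a 3’ PAM and then concatenated with the direct repeat and tracrRNA, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, only the sense strand is scanned, the PAM "NGG" in the usage example is the well-known S. pyogenes Cas9 PAM used purely as a placeholder (the PAMs of the novel nucleases are not given in this section), and the sgRNA is built by plain concatenation without any linker.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam):
    """Translate an IUPAC PAM string such as 'NGG' into a regex fragment."""
    return "".join(IUPAC[code] for code in pam.upper())


def find_protospacers(target, pam, spacer_len=21):
    """List (start, protospacer, pam_match) for sites with a 3' PAM.

    A lookahead is used so that overlapping candidate sites are all found.
    Only the given (sense) strand is scanned in this sketch.
    """
    pattern = re.compile(
        rf"(?=([ACGT]{{{spacer_len}}})({pam_to_regex(pam)}))"
    )
    return [
        (m.start(), m.group(1), m.group(2))
        for m in pattern.finditer(target.upper())
    ]


def build_sgrna(spacer, direct_repeat, tracr_rna):
    """Concatenate spacer, direct repeat and tracrRNA (no linker assumed)."""
    return spacer + direct_repeat + tracr_rna


if __name__ == "__main__":
    # Hypothetical target fragment; not the fwnA sequence.
    for site in find_protospacers("A" * 21 + "TGG" + "CCCC", "NGG"):
        print(site)
```

A complete design tool would also scan the antisense strand and filter candidates for off-target matches, which this sketch leaves out.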
Transformation of controls and screening plasmids
The control plasmids (+ctrl and -ctrl) and screening plasmids were transformed into A. niger MBin118 and a target of 12 random colonies was isolated for each transformation according to the methods documented in “Protoplast generation and transformation”. The ratios of white/black colonies for each transformation were then observed.
As expected, none of the -ctrl transformations induced white spore phenotypes for any nuclease (Figures 3 to 6, -Ctrl). Only black colonies resulted from 0100 and 0149 transformations, while 0076 and 0105 transformations showed a general lack of growth on transformation plates. In contrast, the S. pyogenes Cas9 +ctrl system produced almost solely white spore colonies (Figures 3 to 6, +ctrl). The results from these controls confirmed the validity of this assay for verification of fwnA knockout.
For the main screening for nuclease 0100 (SEQ ID NO: 21), we initially conducted transformation and cultivation at both 30°C and 34°C to evaluate the effect of temperature on the white colony ratio. Figure 3 shows the colony number and ratio of white/black phenotypes obtained after 0100 transformations at 30°C (Fig. 3A) and at 34°C (Fig. 3B).
White colonies were seen for 10 of 12 spacers tested at both temperatures (Figures 3A and 3B). The ratio of white colonies ranged from 17-100% for 30°C and 42-100% for 34°C. As white spore ratios were found to be higher at 34°C for all positive 0100 transformations, going forward 34°C was used for transformation and cultivation of nucleases 0076 (SEQ ID NO:1), 0105 (SEQ ID NO: 39), and 0149 (SEQ ID NO:48).
Transformation of nuclease 0076 (SEQ ID NO: 1) was done using a thiamine-regulatable system, with three different thiamine concentrations tested (0.02, 0.4, and 5 µM). Figure 4 shows the colony number and ratio of white/black phenotypes obtained after 0076 transformation with thiamine supplementation at concentrations of 0.02 µM (A), 0.4 µM (B) and 5 µM (C).
White colonies were seen for the spacer 76-T2 (Table 7) when the agar was supplemented with 0.02 µM and 0.4 µM thiamine (Figures 4A and 4B). The ratio of white colonies was 9% and 8%, respectively (Fig. 4A and 4B). As expected, no white colonies were seen for any transformation with 5 µM thiamine supplementation (Figure 4C), suggesting downregulated nuclease expression by PthiA repression.
Nuclease 0105 (SEQ ID NO: 39) was transformed using a similar thiamine-regulatable system. Figure 5 shows the colony number and ratio of white/black phenotypes obtained after 0105 transformations with thiamine supplementation at 0.02 µM (A), 0.4 µM (B) and 5 µM (C). Here, white colonies were seen for the spacer 105-T4 (Table 6) at all thiamine concentrations (Figure 5). The ratio of white colonies was 92% at 0.02 µM and 0.4 µM thiamine, but this decreased to 67% at 5 µM thiamine. Again, these results suggest downregulated nuclease expression when thiamine is supplemented at 5 µM.
For nuclease 0149 (SEQ ID NO: 48), transformation was done using a constitutive expression system. Figure 6 shows the colony number and ratio of white/black phenotypes obtained after 0149 transformations. White colonies were seen for spacer 149-T4 (Table 6) (11 of 12 picked colonies, 92%), while the other 3 screening spacers produced only black colonies (Figure 6).
Altogether, the results from transformation of nucleases 0076, 0100, 0105 and 0149 show that indel activity at the target site was seen for all four nucleases.
Spore PCR/ Sanger sequencing of selected transformants
To further confirm that the white spore phenotype was a result of targeted indel formation in fwnA, spore PCR/sequencing of 1, 6, 7 and 8 white colonies from the transformations of 0076, 0100, 0105 and 0149, respectively, was done according to the methods documented in “Spore PCR, DNA purification and Sanger sequencing”. Sequencing results showed that insertions and deletions of various lengths were present in all sequenced samples (Figures 7-10) and that these indels were all situated in close proximity to the targeted protospacer regions. This shows that DNA strand breaks were induced in the target regions by the expressed nucleases, leading to insertions/deletions and subsequent fwnA (wA) knockout. As targeted indel activity was seen for all four nucleases, these data also suggest that their PAMs are situated 3’ relative to their protospacers, and that the putative PAM sequences disclosed herein can be used for targeted gene editing.
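The comparison of a sequenced read against the wild type target sequence can be sketched with a short script that lists insertions and deletions. This is only an illustration: it uses Python's general-purpose `difflib.SequenceMatcher` rather than a dedicated sequence aligner, the function name and toy sequences are hypothetical, and it is not the analysis method used to prepare the figures.

```python
from difflib import SequenceMatcher


def call_indels(wild_type, read):
    """Return a list of (event, position_in_wild_type, length) tuples.

    Events are 'insertion', 'deletion' or 'substitution', derived from the
    edit operations that turn `wild_type` into `read`.
    """
    matcher = SequenceMatcher(None, wild_type.upper(), read.upper(),
                              autojunk=False)
    events = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            events.append(("deletion", i1, i2 - i1))
        elif tag == "insert":
            events.append(("insertion", i1, j2 - j1))
        elif tag == "replace":
            events.append(("substitution", i1, max(i2 - i1, j2 - j1)))
    return events


if __name__ == "__main__":
    # Hypothetical sequences, not taken from the sequence listing.
    print(call_indels("AAACCCGGGTTT", "AAACCCTTT"))
    print(call_indels("AAACCCTTT", "AAACCCGGGTTT"))
```

For real Sanger traces, a pairwise aligner with affine gap penalties and quality-aware base calls would be the more appropriate choice.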
Using the A. niger fwnA (wA) knockout/spore colour assay, the four putative nucleases 0076, 0100, 0105 and 0149 were able to induce the white spore phenotype. Subsequent spore PCR and sequencing confirmed that the white spore colonies were a result of fwnA knockout caused by nuclease-induced indels at the target regions.
Figure 7 shows the sequenced protospacer region of white spore colonies after transformation by plasmids targeting 76-T2 with 0.02 µM thiamine supplementation (SEQ ID NO: 322) aligned to the wild type fwnA target sequence (SEQ ID NO: 321).
Figure 8 shows the sequenced protospacer regions of white spore colonies after transformation by plasmids targeting spacer 100-T2 (A) and spacer 100-T5 (B) aligned to the wild type fwnA target sequence (top, SEQ ID NO: 323 and SEQ ID NO: 328). Insertions / deletions in Fig. 8A are shown as SEQ ID NOs: 324-327. Insertions / deletions in Fig. 8B are shown as SEQ ID NOs: 329-330.
Figure 9 shows the sequenced protospacer regions of white spore colonies after transformation by plasmids targeting spacer 105-T4 with 0.02 µM thiamine supplementation aligned to the original fwnA target sequence (top, SEQ ID NO: 331). Insertions / deletions in Fig. 9 are shown as SEQ ID NOs: 331-338.
Figure 10 shows the sequenced protospacer regions of white spore colonies after transformation by plasmids targeting 149-T4 at 34°C aligned to the original fwnA target sequence (top, SEQ ID NO: 339). Insertions / deletions in Fig. 10 are shown as SEQ ID NOs: 340-346. One sequence with a 346bp insertion was omitted from the diagram.
As can be seen in Figs. 7-10, each target locus was subject to either a (poly)nucleotide insertion or deletion.
Altogether, these results confirm that nucleases 0076, 0100, 0105 and 0149 and their corresponding direct repeats and gRNA scaffolds can be used for targeted gene editing in fungal hosts.
Expression plasmid construction
Positive control plasmid construction
An intermediate plasmid pHUda2351 containing S. pyogenes Cas9 (Figure 11A) was first digested with PmeI and the linearized vector was recovered by silica column purification (Nucleobond Xtra Midi, Takara). To satisfy the PAM requirement of Cas9, a 20bp gRNA spacer sequence was designed to target fwnA (position 37-56bp, sense strand). To facilitate plasmid assembly, additional 25bp regions homologous with the plasmid backbone insert site were added at the 5’ and 3’ ends. The spacer sequences with flanks were synthesized using commercial oligonucleotide synthesis services. The spacers were then joined to the linearized vector using HiFi DNA Assembly (New England Biolabs, USA) and the resulting plasmid was designated pBKHM0015-32 (Figure 11B).
Figure 11 shows the plasmid vectors used for S. pyogenes CRISPR/Cas9 positive control 1. (A) The intermediate plasmid vector pHUda2351 containing the S. pyogenes Cas9 system used for control plasmid construction and (B) positive control 1 containing Cas9 and spacers targeting regions of the spacer 32 or spacer 112. ampR, E. coli ampicillin resistance gene; AMA1, A. nidulans AMA1 origin of replication; P. oryzae RNAPIII U6-2 promoter, Magnaporthe grisea RNAPIII U6-2 promoter; Af tRNA gly, A. fumigatus glycine tRNA; gRNA backbone, Cas9 scaffold sequence; Spacer 32 or 112, spacer targeting fwnA; P. oryzae U6-2 terminator (long), Magnaporthe grisea U6-2 long terminator; Ptef (nid), A. nidulans tef promoter; Cas9, S. pyogenes Cas9; Ttef(nid), A. nidulans tef terminator; Ptef1, A. niger tef1 promoter; NATr, Nourseothricin resistance gene; TniaD, niaD terminator.
Intermediate Plasmid (negative control plasmid) construction
For construction of the intermediate plasmid for screening (which also functions as the negative control plasmid), the plasmid vector pBKHM0001 (Figure 12) was first digested with AscI and SbfI, and the plasmid backbone fragment of 11097bp was recovered by gel extraction.
Figure 12 shows the plasmid vector pBKHM0001 used as a backbone sequence. A. oryzae U6 promoter, A. oryzae U6 promoter; Af tRNA gly, A. fumigatus glycine tRNA; A. oryzae U6 term, A. oryzae U6 terminator; AnPtef1, A. niger tef1 promoter; CRISPR-CasΦ, CasΦ nuclease gene; Ttef(nid), A. nidulans tef1 terminator; Ptef1, A. niger tef1 promoter; NATr, Nourseothricin resistance gene; TniaD, niaD terminator; pUC, pUC plasmid backbone sequence; ampR, E. coli ampicillin resistance gene; AMA1, A. nidulans AMA1 origin of replication.
Before constructing the insert fragment, the gene sequences encoding the nucleases 0076, 0100, 0105 and 0149 were first codon optimized for A. niger, and the nucleoplasmin and SV40 nuclear localization signals were added on to the 5’ and 3’ ends respectively. Their respective hypothetical scaffold sequences without spacer (Direct repeat) were used without codon-optimization. The used nuclease sequences and their corresponding sgRNAs without spacers can be found in Table 6.
Table 6. Sequences of nuclease genes and corresponding sgRNAs used .
Insert fragments for 0100 and 0149 contained constitutive expression cassettes for their respective gRNAs and nucleases. For 0076 and 0105, insert fragments contained a constitutive expression cassette encoding their gRNAs and a thiamine-inducible cassette encoding their nucleases. 25bp regions homologous with the plasmid backbone insert site were added to the 5’ and 3’ flanks to aid future plasmid construction, and the resulting fragments were synthesized using commercial gene synthesis services.
The plasmid backbones and inserts were joined to form their corresponding intermediate plasmids using HiFi DNA Assembly (New England Biolabs, USA). These intermediate plasmids were designated pBKHM-076-thiA-int, pBKHM-0100-int, pBKHM-0105-thiA-int and pBKHM-0149-int (Figures 13A-13D). These plasmids were used directly as negative controls.
Figure 13 shows the intermediate plasmid/negative control vectors (A) pBKHM-076-thiA-int, (B) pBKHM-0100-int, (C) pBKHM-0105-thiA-int and (D) pBKHM-0149-int. A. oryzae U6 promoter, A. oryzae U6 promoter; Af tRNA gly, A. fumigatus glycine tRNA; A. oryzae U6 term, A. oryzae U6 terminator; DR, direct repeat; tracr RNA, tracr RNA; sgRNA, concatenation of the respective direct repeat and tracrRNA sequence; AnPtef1, A. niger tef1 promoter; Ttef(nid), A. nidulans tef1 terminator; Ptef1, A. niger tef1 promoter; NATr, Nourseothricin resistance gene; TniaD, niaD terminator; pUC, pUC plasmid backbone sequence; ampR, E. coli ampicillin resistance gene; AMA1, A. nidulans AMA1 origin of replication.
Screening plasmid construction
Prior to screening plasmid construction, the four plasmids pBKHM-076-thiA-int, pBKHM-0100-int, pBKHM-0105-thiA-int and pBKHM-0149-int were digested with AscI and the linearized vectors were recovered by silica column purification (Nucleobond Xtra Midi, Takara).
To construct the screening plasmids, various 21bp gRNA spacer oligo sequences targeting different regions of fwnA were used (Table 7). Flanking 20bp regions homologous to the plasmid backbone insert sites were added and the oligos were synthesized using commercial oligonucleotide synthesis services. Subsequently, the spacer sequences were joined to the four AscI-cut intermediate plasmids using HiFi DNA Assembly (New England Biolabs, USA) to form the final screening plasmids.
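The oligo layout described above, a 21bp spacer flanked on each side by a 20bp region homologous to the backbone insert site, can be checked with a small helper before ordering. The function name and the placeholder flank sequences are assumptions for illustration; the actual flank sequences depend on the AscI-cut backbone and are not given in this section.

```python
def build_assembly_oligo(spacer, left_flank, right_flank,
                         spacer_len=21, flank_len=20):
    """Return left_flank + spacer + right_flank after basic sanity checks.

    The default lengths follow the screening plasmid design; the positive
    control used a 20bp spacer with 25bp flanks and can be handled by
    passing spacer_len=20 and flank_len=25.
    """
    parts = (
        ("left_flank", left_flank.upper(), flank_len),
        ("spacer", spacer.upper(), spacer_len),
        ("right_flank", right_flank.upper(), flank_len),
    )
    for name, seq, expected in parts:
        if len(seq) != expected:
            raise ValueError(f"{name} must be {expected} nt, got {len(seq)}")
        if set(seq) - set("ACGT"):
            raise ValueError(f"{name} contains non-ACGT characters")
    return "".join(seq for _, seq, _ in parts)


if __name__ == "__main__":
    # Placeholder sequences only.
    oligo = build_assembly_oligo("A" * 21, "C" * 20, "G" * 20)
    print(len(oligo), oligo)
```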
Table 7. Target sequences for the various enzymes
Fungal strain
Aspergillus niger strain MBin118 (NN049549) was used for all procedures.
Culture media
The following media were used in this study:
COVE-N-glyX: 218 g/L xylitol, 10 g/L glycerol, 2.02 g/L KNO3, 50 ml/L COVE salt solution, 25 g/L agar BA10, pH 5.3
YPG: 4 g/L yeast extract, 1 g/L KH2PO4, 0.5 g/L MgSO4.7aq, 15 g/L glucose, pH 6.0
COVE salt solution: 26 g KCl, 26 g MgSO4.7aq, 76 g KH2PO4, 50 ml COVE trace metals /L
COVE trace metals: 0.04 g NaB4O7.10aq, 0.4 g CuSO4.5aq, 1.2 g FeSO4.7aq, 0.7 g MnSO4.aq, 0.8 g Na2MoO2.2aq, 10 g ZnSO4.7aq /L
COVE-N top agar solution: 342.3 g/L sucrose, 20 ml/L COVE salt solution, 3 g/L NaNO2, 10 g/L Nippon gene agarose L Low melt agarose, 6 drops/L 5N NaOH
STC: 0.8 M sorbitol, 50 mM Tris pH 8, 50 mM CaCl2
STPC: 40 % PEG4000 in STC buffer
Tween water: 1 g/L polyoxyethylene (20) sorbitan monolaurate (Tween 20)
LB: 10 g/L Bacto tryptone, 10 g/L NaCl, 5 g/L Bacto yeast extract, pH 7.0
Protoplast generation and transformation
Protoplast formation
An agar slant (COVE-N-glyX) was inoculated with spores of MBin118, and the strain was grown at 30°C until completely sporulated. 9 ml of 0.1 % Tween 20 water was added to the slant, and the spores were suspended manually. The spore suspension was transferred to a baffled shake flask (500 ml) containing 100 ml YPG medium. The flask was incubated at 30 or 32°C for 15-20 hrs (60-80 rpm). Mycelia were collected by filtering through Mira-cloth and washed 2-3 times with 0.6 M KCl or 0.7 M KCl + 10 mM CaCl2. The mycelia were resuspended in 20-30 ml of 0.6 M KCl or 0.7 M KCl + 10 mM CaCl2 with 20-48 mg/ml Glucanex and 1.2 mg/ml BSA in a 50 ml centrifuge tube. The sample was incubated for 1-1.5 hrs at 30 or 32 °C, 80 rpm, and protoplasting was monitored frequently by microscopy. After protoplasting was observed, the solution was filtered through Mira-cloth into a 25 ml Universal container (Nunc 364211). The solution was then centrifuged at 2000 rpm for 10 minutes with slow acceleration. The supernatant was discarded, and the pellet was washed with 5-15 ml STC buffer, then centrifuged at 2000 rpm for 10 minutes with slow acceleration. The protoplasts were resuspended in protoplast solution (STC/STPC/DMSO=8:2:0.1) to a concentration of approx. 2 x 10^7 protoplasts/ml. Mixing and pellet resuspension were done gently by pipetting.
Transformation
For each transformation, the transforming DNA was added to 100 µl of protoplasts in a 14 ml Falcon tube, mixed gently and incubated on ice for more than 30 minutes. 1 ml STPC buffer was added and the solution was mixed gently, then incubated at 37 °C in a water bath for 20 minutes. 10-15 ml COVE-N top agar solution containing 50 µg/ml of Nourseothricin was added to the solution, mixed and poured onto transformation plates. After the agar solidified, the plates were incubated at 34 °C until colonies were clearly visible. The transforming DNA volume was less than 10 µl (1 - 10 µg).
Strain isolation
Colonies were picked for each transformation and isolated to COVE-N-glyX agar. The colonies were allowed to sporulate by incubation at 30 °C for 1 week.
Spore PCR, DNA purification and Sanger sequencing
Reagents from the Phire Plant Direct PCR Kit (Thermofisher) were used. Spores from each fungal strain were picked using a 1 µl inoculating loop by first scraping off spores from the fungal colony and then dipping the loop in 10 µl Dilution Buffer. The sample was vortexed briefly and incubated at RT for 5 mins, then centrifuged briefly, diluted 10 times with sterile water and used as template in subsequent PCR. 20 µl PCR reactions were set up according to the manufacturer's instructions, using 0.5 µl of the prepared template. PCR amplification was done according to the manufacturer's instructions.
DNA purification was done by gel electrophoresis, excision of DNA with correct band length and purification by silica column (QIAquick Gel Extraction Kit, QIAGEN) using standard molecular biology procedures. Purified DNA samples were sequenced by commercial Sanger sequencing services.
Example 3: Activity of novel CRISPR nucleases in Bacillus licheniformis
Design of protospacers targeting a DsRED gene in B. licheniformis
To evaluate the editing activity of the CRISPR nucleases of the present invention in B. licheniformis, a knockout experiment was designed as follows. The B. licheniformis strain MDT545 (WO 2021/183622) was used as a host strain since it has a DsRED expression cassette (SEQ ID NO: 383) at the amyL locus of the chromosome. As MDT545 also has a GFP expression cassette at the xylA locus of the chromosome, DsRED knock-outs lose red fluorescence and show only green fluorescence, so edited cells can be screened by visible phenotypic changes. A deletion cassette for the DsRED gene (SEQ ID NO: 384) was designed to delete 67 bp in the middle of the DsRED coding sequence. The protospacer sequences for disruption of the DsRED gene are shown in SEQ ID NOs: 385 - 404. Protospacers were designed as 21-bp long DNAs within the region to be deleted after disruption.
Construction of plasmids for expression of CRISPR components targeting a DsRED gene in B. licheniformis
The full construction of plasmid DNAs was done in several consecutive steps by sequential DNA manipulations as follows.
The plasmids for expression of CRISPR components were assembled by PCR amplification of synthetic DNAs. The purified PCR products were used in a subsequent PCR reaction to create a single plasmid using SOE PCR as described in materials and methods. The PCR amplification reaction mixture contained 50 ng of each of the gel-purified PCR products, and a thermocycler was used to assemble and amplify the plasmids. The resulting SOE products were used directly for transformation of B. subtilis host PP3724 to establish the plasmids; the resulting transformants later served as donors for transfer of the established plasmids to the B. licheniformis host MDT545. Synthetic DNAs for construction of plasmids are shown as SEQ ID NOs: 405 - 484. Briefly, a synthetic DNA fragment encoding a given CRISPR nuclease was integrated into the region flanked upstream by the promoter PamyL4199 (U.S. Pat. No. 6,100,063) and downstream by the aprH transcription terminator of Bacillus clausii in a mobilizable plasmid vector pBC16, marked by tetracycline resistance (Bernhard et al. (1978), J Bacteriol. 133, 897-903). In the case of the 0076 nuclease (SEQ ID NO:1) and the 0102 nuclease (SEQ ID NO:40), the IPTG-inducible promoter Pgrac was used for expression of the nucleases instead of PamyL4199. In addition, a synthetic DNA fragment encoding a guide RNA expression cassette was integrated upstream of the DsRED deletion cassette in a plasmid vector based on the temperature-sensitive pAMβ1-derived plasmid pWT, marked by erythromycin resistance (Bidnenko et al. (1998), 28, 1005-1016), and comprising the origin of transfer oriT of plasmid pUB110 for mobilization by conjugation (Selinger et al. (1990), J. Bacteriol., 172, 3290-3297). DNA maps representing a typical nuclease vector and a guide RNA expression vector are shown in Figs. 14 - 17.
When necessary, synthetic DNAs were PCR-amplified by the oligo DNAs shown in Table 8 (SEQ ID NOs: 485 - 507) and gel-purified before SOE PCR and subsequent transformation to B. subtilis host PP3724. Finally, transformants were isolated by tetracycline or erythromycin resistance.
Activity of novel CRISPR nucleases in B. licheniformis
The purpose of this experiment was to demonstrate that the novel nucleases efficiently introduce a desired modification to the chromosome of B. licheniformis. As described in the above sections, the DsRED knock-out phenotype (green fluorescence only) was used to show the targeted nuclease activity.
First, a guide RNA expression vector was transferred to the B. licheniformis host MDT545 by conjugation with a B. subtilis PP3724 host carrying said plasmid. Since the B. subtilis PP3724 host requires D-alanine in the growth medium, the desired B. licheniformis conjugants could be isolated on erythromycin-containing medium without D-alanine. Second, the corresponding nuclease vector was transferred by conjugation to the B. licheniformis host MDT545 carrying said guide RNA vector. Finally, conjugants growing on medium containing both tetracycline and erythromycin were the ones carrying the vectors necessary for editing. The editing efficiency of a given set of nuclease and guide RNA expression vectors was calculated as follows: Editing Efficiency (%) = (G / N) x 100, where G indicates the number of colonies showing only green fluorescence on double resistant medium (agar plate), and N indicates the total number of colonies grown on said medium. The summary of the editing efficiency for a given set of nuclease and guide RNA is shown in Table 2. In the column “Guide RNA backbone”, “separate” indicates that crRNAs and tracrRNAs were separately expressed, while “single guide_a” or “single guide_b” indicates that crRNA and tracrRNA were concatenated in one of two different ways. In the column “Target”, “empty” indicates that no protospacer sequence (0-bp) was designed for targeting the DsRED gene, while protospacer-1 to -4 indicate the different protospacer sequences (21-bp) that were designed.
Nuclease 0076 was expressed by the IPTG-inducible Pgrac promoter. The green fluorescent colonies were found from the double resistant medium without IPTG (EXP_08 in Table 9), indicating that leaky expression from Pgrac was sufficient for introducing desired modification to the chromosome.
Nuclease 0102 was expressed by the IPTG-inducible Pgrac promoter. The total number of colonies was counted on the double resistant medium without IPTG, where all colonies showed red fluorescence. Then, up to eight colonies per plate were isolated to the medium with IPTG, where we checked the phenotypic changes of the isolates. Thus, editing efficiency was calculated with the number of isolates, not with the total number of colonies, as shown in parentheses of Table 9.
Next, to confirm that the desired modification (67-bp deletion within the DsRED gene) was introduced into the chromosome of the green colonies, genomic PCR and Sanger sequencing were performed on selected colonies from EXP_08, _18, _20, _22, _27, _33, _37, _38, _43, _48, _49, _53, _55, _58, and _61, respectively. Oligo DNAs for amplification of the genomic region and Sanger sequencing are shown in Table 10 (SEQ ID NO:508 - 511). All tested green colonies were found to have the desired deletion within the DsRED gene in the chromosome. Given that a small number of green colonies could emerge even with an empty protospacer (EXP_61 of Table 9, 1 % efficiency), possibly due to spontaneous homologous recombination, one percent should be taken as the criterion for concluding that a given nuclease helps increase the efficiency of the desired modification to the chromosome of B. licheniformis. In summary, the novel CRISPR nucleases 0076, 0100, 0102 and 0149 have been shown to be promising tools for genome engineering in B. licheniformis hosts.
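The editing efficiency calculation and the one-percent background criterion described in this example can be expressed as a short sketch. The function names are hypothetical and the colony counts in the usage lines are invented for illustration; they are not values from Table 9.

```python
def editing_efficiency(green_only, total):
    """Editing efficiency in percent: 100 * G / N.

    G (`green_only`) is the number of colonies showing only green
    fluorescence and N (`total`) is the total number of colonies on the
    double resistant medium.
    """
    if total <= 0:
        raise ValueError("total colony count must be positive")
    if not 0 <= green_only <= total:
        raise ValueError("green_only must lie between 0 and total")
    return 100.0 * green_only / total


def exceeds_background(efficiency_percent, background_percent=1.0):
    """True if the efficiency is above the empty-protospacer background."""
    return efficiency_percent > background_percent


if __name__ == "__main__":
    # Invented counts for illustration only.
    eff = editing_efficiency(green_only=3, total=12)
    print(eff, exceeds_background(eff))
```

The 1 % default reflects the efficiency observed with the empty protospacer; a statistical comparison against that control would be more robust than a fixed cut-off when colony numbers are small.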
Materials
Chemicals used as buffers and substrates were commercial products of at least reagent grade.
PCR amplifications were performed using standard textbook procedures, employing a commercial thermocycler and either PrimeStar GXL polymerase (TaKaRa, Japan) or KOD One polymerase (TOYOBO, Japan).
The following media were used for bacterial growth.
LB: See EP 0 506 780.
TY: See WO 94/14968, p. 16.
To select for erythromycin resistance, agar and liquid media were supplemented with 5 µg/ml erythromycin. To select for tetracycline resistance, agar and liquid media were supplemented with 15 µg/ml tetracycline. Where needed, IPTG (isopropyl β-D-1-thiogalactopyranoside) was added to a final concentration of 1 mM.
Oligonucleotide primers were obtained from Macrogen, Korea. DNA manipulations (plasmid and genomic DNA preparation, restriction digestion, purification, ligation, DNA sequencing) were performed using standard textbook procedures with commercially available kits and reagents.
DNA was introduced into B. subtilis rendered naturally competent, either using a two-step procedure (Yasbin et al., 1975, J. Bacteriol. 121: 296-304), or a one-step procedure, in which cell material from an agar plate was resuspended in Spizizen 1 medium (WO 2014/052630), 12 ml shaken at 200 rpm for approx. 4 hours at 37 °C, DNA added to 400 microliter aliquots, and these further shaken at 150 rpm for 1 hour at the desired temperature before plating on selective agar plates.
DNA was introduced into B. licheniformis by conjugation from B. subtilis, essentially as previously described (EP2029732 B1), using a modified B. subtilis donor strain PP3724, containing pLS20, wherein the methylase gene M.bli1904II (US20130177942) is expressed from a triple promoter at the amyE locus, the pBC16-derived orf beta and the B. subtilis comS gene (and a kanamycin resistance gene) are expressed from a triple promoter at the alr locus (making the strain D-alanine requiring), and the B. subtilis comS gene (and a cat gene) are expressed from a triple promoter at the pel locus.
All of the constructions described in the examples were assembled from synthetic DNA fragments ordered from TWIST Bioscience, USA. The fragments were assembled by sequence overlap extension (SOE) as described in the examples.
Strains
PP3724: This strain is a B. subtilis derivative, containing pLS20, wherein the methylase gene M.bli1904II (US20130177942) is expressed from a triple promoter at the amyE locus, the pBC16-derived orf beta and the B. subtilis comS gene (and a kanamycin resistance gene) are expressed from a triple promoter at the alr locus (making the strain D-alanine requiring), and the B. subtilis comS gene (and a cat gene) are expressed from a triple promoter at the pel locus.
SJ1904: This strain is a B. licheniformis derivative, described in WO 2008/066931.
MDT545: This strain is a SJ1904 derivative, described in WO 2021/183622.
Plasmids
pC194: Plasmid isolated from Staphylococcus aureus (Horinouchi and Weisblum, 1982).
pE194: Plasmid isolated from S. aureus (Horinouchi and Weisblum, 1982).
pUB110: Plasmid isolated from S. aureus (McKenzie et al., 1986).
Example 4: Novel nucleases are structurally highly distinct from S. pyogenes Cas9
Table 11 shows TM values comparing the three-dimensional structures of selected novel nucleases with the structure of S. pyogenes Cas9. The analyzed novel nucleases show a TM value of at most 0.62 when aligned to the three-dimensional structure of S. pyogenes Cas9, indicating that these novel nucleases are structurally very distinct from it. Table 11 further highlights the close relationship between the nucleases of SEQ ID NO: 21 and SEQ ID NO: 29, with a TM value of 0.98.
The close structural similarity between the novel nucleases 0076, 0100, and 0172 is also shown in the ribbon alignments of Figs. 20-22 and Fig. 24.
Figure 20 shows a ribbon alignment of the protein structures of nucleases 0076 (black) and 0172. Figure 21 shows a ribbon alignment of the protein structures of nucleases 0100 (black) and 0172. Figure 22 shows a ribbon alignment of the protein structures of nucleases 0076 (black) and 0102. Figure 24 shows a ribbon alignment of the protein structures of nucleases 0076 (black) and 0100.
Figure 23 shows a ribbon alignment of the protein structures of nuclease 0076 (black) and S. pyogenes Cas9, further evidencing the significant structural differences between the novel nucleases and S. pyogenes Cas9.
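The TM values discussed in this example can be illustrated with a minimal Python sketch of the standard TM-score formula. The sketch assumes that a structural superposition has already been computed and that the distances between aligned residue pairs are available; the alignment tool used to generate Table 11 is not stated in this section, and the function names `d0_scale` and `tm_score` are hypothetical. A full TM-score additionally maximizes this quantity over all superpositions, which is not done here.

```python
def d0_scale(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if target_length <= 21:
        # Assumption: this sketch only covers chains longer than 21 residues.
        raise ValueError("target_length must be greater than 21 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: residue count of the structure used for normalization.
    """
    d0 = d0_scale(target_length)
    # Each aligned pair contributes between 0 and 1; unaligned residues contribute 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

A value of 1.0 corresponds to a perfect superposition over the full normalizing length, and the score decreases as aligned pairs move apart or as fewer residues can be aligned.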
Example 5: Novel CRISPR nucleases with nuclease activity in E. coli
Killing effect confirming nuclease activity of novel CRISPR nucleases
In this experiment, the nuclease activity and efficiency of novel CRISPR nucleases 0100 and 0149 were tested in E. coli BL21 (DE3). A killing assay was used to evaluate the activity of the nucleases. Killing of E. coli by CRISPR nucleases involves the precise targeting and cleavage of bacterial DNA, leading to cell death due to the inability to repair critical genomic damage [Bikard et al. (2014), Nature Biotechnology, 32(11), 1146-1150].
Unlike eukaryotic cells, bacteria have limited mechanisms for repairing double-strand breaks (DSBs). The primary repair mechanism, non-homologous end joining (NHEJ), is error-prone and unavailable in the E. coli BL21 (DE3) strain. If no repair gene fragment is provided to the cells, it is very difficult for them to survive the DSBs. Therefore, the killing effect was used as a primary method to evaluate and identify the efficacy of specific gRNA sequences targeting the adhE and tam loci in the BL21 strain.
Neither adhE nor tam is an essential gene in the BL21 strain, so both loci can be used for knock-out and knock-in for different purposes. adhE is involved in metabolic processes related to fermentation and energy production, while tam is implicated in stress response mechanisms that enhance bacterial survival under adverse conditions [Membrillo-Hernandez et al. (2000), Journal of Biological Chemistry, 275(43), 33869-33875; and Ferla, M. and Patrick, W. (2014), Microbiology, 160(8), 1571-1584].
In this example, a dual plasmid system was designed for CRISPR. The genes encoding spCas9, novel nuclease 0100 (SEQ ID NO: 21), and novel nuclease 0149 (SEQ ID NO: 48) were constructed in a separate low-copy plasmid (CRISPR plasmid), and the gRNA sequence for the targeted gene, as well as the CRISPR gRNA scaffold specific to the respective nuclease, was constructed in another, high-copy plasmid (guide plasmid). The gRNA and PAM sequences are specified in Table 12. The sequence encoding the 0100 nuclease was codon optimized for the experiment (SEQ ID NO: 513). Both the wild-type (WT) coding sequence (SEQ ID NO: 100) and the codon-optimized sequence (OPT, SEQ ID NO: 518) for the 0149 nuclease were ordered and used in this experiment. For both codon-optimized sequences, the amino acid sequence of the nuclease did not change. An overview of codon-optimized sequences for expression in E. coli is shown in Table 13. The CRISPR plasmid and guide plasmid were sequentially introduced into the BL21 strain and induced following the procedures described below in “Material and Methods”. After transformation and recovery, the desired constructs were plated on antibiotic-containing agar plates with or without 0.4 µg/ml anhydrotetracycline and incubated at 30 °C for at least 1-2 days. The killing effect was verified by comparing the number of colonies on plates under induced conditions with those under non-induced conditions.
Table 13. Codon-optimized nuclease coding sequences for expression in E. coli.
In this experiment, the killing effect was observed for 0100, 0149 WT, and 0149 OPT at both the adhE and tam loci, as shown in Fig. 18 A-F. Fig. 18 shows a colony comparison between induced and uninduced conditions for the different constructs. The left plate was induced with 0.4 µg/ml anhydrotetracycline. The right plate remained uninduced. Fig. 18 A shows 0100 with adhE targeted gRNA. Fig. 18 B shows 0149 OPT with adhE targeted gRNA. Fig. 18 C shows 0149 WT with adhE targeted gRNA. Fig. 18 D shows 0100 with tam targeted gRNA. Fig. 18 E shows 0149 OPT with tam targeted gRNA. Fig. 18 F shows 0149 WT with tam targeted gRNA. Fig. 18 G shows spCas9 with adhE targeted gRNA (positive control).
SpCas9 was used as a positive control, as shown in Fig. 18 G. Some of the constructs, such as 0149 WT (Fig. 18 C and F) and 0149 OPT (Fig. 18 B and E), exhibited fewer colonies overall, even under uninduced conditions, likely due to leaky expression of the CRISPR nucleases. This result suggests that these nucleases could be very active even with leaky expression (small amounts of nuclease in the cell), thus leading to cell death during the process.
Editing efficiency validation in knock-in knock-out (KIKO) experiment
In E. coli, genetic modification involves the use of homologous recombination to introduce a specific DNA sequence into the bacterial chromosome. This recombination event can be facilitated by the expression of recombination proteins, such as those from the lambda phage (e.g., the λ-Red recombinase system), which enhances efficiency [Sabri et al. (2013), Microbial Cell Factories, 12(1), and Zhang et al. (2017), Current Microbiology, 74(8), 961-964]. Therefore, during the construction of the CRISPR plasmid, the pSIM6 vector was used as a backbone to provide the λ-Red recombination system in order to conduct efficient homologous recombination [Datta et al. (2006), Gene, 379, 109-115]. The process requires a DNA fragment that contains the desired genetic sequence flanked by regions of homology to the target site in the chromosome. The size of the homology arms is crucial as it facilitates the integration of DNA into the chromosome through homologous recombination.
In this experiment, the donor PJ23104-mCherry DNA fragment was constructed with very short homology arms (100 bp), which significantly reduces the recombination efficiency in the absence of a selection marker. The goal of this experiment was to test the efficiency of the novel CRISPR system as a counterselection method providing better editing results than the traditional method, which uses only the λ-Red recombinase system for genetic modification. Thus, the donor fragment PJ23104-mCherry with homology arms for the adhE locus was introduced as a repair template to perform knock-in, knock-out (KIKO) at the adhE locus in the BL21 strain. No additional antibiotic selection marker was provided on the repair template.
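The layout of such a donor fragment (left homology arm, insert, right homology arm) can be sketched as follows. This is an illustrative sketch only: the function name `build_donor`, the 0-based half-open coordinates, and the toy sequences in the usage note are assumptions and do not correspond to the actual adhE locus or PJ23104-mCherry sequences, which are not given in this section.

```python
def build_donor(genome: str, replace_start: int, replace_end: int,
                insert: str, arm_len: int = 100) -> str:
    """Assemble a knock-in/knock-out donor fragment.

    genome: chromosomal sequence around the target locus (hypothetical input).
    replace_start, replace_end: 0-based, half-open interval to be replaced.
    insert: sequence to knock in (for example a promoter-reporter cassette).
    arm_len: length of each homology arm in bp (100 bp in this example).
    """
    if replace_start < arm_len or replace_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    left_arm = genome[replace_start - arm_len:replace_start]
    right_arm = genome[replace_end:replace_end + arm_len]
    return left_arm + insert + right_arm
```

For a toy locus `"A" * 100 + "CCCC" + "G" * 100`, calling `build_donor(locus, 100, 104, "TT")` returns 100 A residues, the insert, and 100 G residues, so the donor replaces the four C residues with the insert.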
The tested constructs, which included the desired CRISPR and guide plasmids, were first heat-activated to induce the expression of λ-Red recombinase to facilitate the recombination of the donor fragment. Afterwards, the donor fragment was introduced via electroporation. After transformation and recovery, the tested candidate was induced with 0.4 µg/ml anhydrotetracycline and incubated at 30 °C in a shaking flask overnight. The induced culture was then plated out the next day to pick single colonies for characterization. The 0149 Ctrl had no guide plasmid in the strain and is thus considered a negative control, along with the pSIM6 plasmid. Both constructs have only the λ-Red recombinase system for conducting genetic modification. The uninduced controls (0149 WT Ctrl and pSIM6 Ctrl) were plated out immediately after recovery on agar plates with the corresponding antibiotics and incubated at 30 °C overnight. Colony PCR was performed to confirm the insertion of the donor fragment in all conditions.
Fig. 19 shows the editing efficiency of each CRISPR system compared to the traditional method using only the λ-Red recombinase system. All four conditions were supplied with the donor DNA fragment. Expression of nucleases 0149 (both OPT and WT) was induced overnight with 0.4 µg/ml anhydrotetracycline. 0149 WT Ctrl contains 0149 nuclease (encoded by WT sequence) but without any guide plasmid. pSIM6 Ctrl was the λ-Red recombinase system only control, without any CRISPR nuclease. Both conditions (pSIM6 and 0149 WT Ctrl) served as negative controls for the experiment.
As shown in Fig. 19, 0149 OPT, encoded by the codon-optimized sequence, showed 100% editing efficiency, with 79 out of 79 clones being positive for the desired insert. 0149 encoded by the WT sequence exhibited 62.2% editing efficiency, with 28 out of 45 clones being positive for the desired insert. The insert was sequence-verified from selected clones (data not shown). In contrast, the 0149 WT Ctrl and pSIM6 controls (considered traditional editing methods) demonstrated only 2% (1 out of 48 clones) and 3% (1 out of 33 clones) editing efficiency, respectively. The novel CRISPR nuclease 0149 thus significantly improved editing efficiency compared to the classic homologous recombination method. Additionally, codon optimization of the 0149 nuclease further improved editing efficiency.
The above results show that nucleases 0100 and 0149 have nuclease activity in E. coli. The codon-optimized 0149 nuclease showed 100% editing efficiency at the adhE locus, while the 0149 nuclease encoded by the wild-type DNA sequence showed 62.2% efficiency.
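The editing efficiencies reported above follow directly from the clone counts. A minimal Python sketch of the arithmetic is given below; the clone counts are taken from the text, while the function and variable names are hypothetical.

```python
# Clone counts from the text: (clones positive for the insert, clones screened).
clone_counts = {
    "0149 OPT": (79, 79),
    "0149 WT": (28, 45),
    "0149 WT Ctrl": (1, 48),
    "pSIM6 Ctrl": (1, 33),
}


def editing_efficiency(positive: int, screened: int) -> float:
    """Editing efficiency in percent: positive clones / screened clones."""
    if screened <= 0:
        raise ValueError("at least one clone must be screened")
    return 100.0 * positive / screened


# Efficiencies rounded to one decimal place, keyed by condition.
efficiencies = {
    name: round(editing_efficiency(positive, screened), 1)
    for name, (positive, screened) in clone_counts.items()
}
```

Rounded to one decimal place, the two controls come out at 2.1% and 3.0%, which the text reports as 2% and 3%.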
Material and Methods
Strain and plasmid construction
E. coli BL21 (DE3) strain (Novagen) was used in this study. The CRISPR plasmid was constructed on the pSIM6 vector [Datta et al. (2006), Gene, 379, 109-115] with an inducible Ptet promoter controlling the expression of the different CRISPR nucleases. The CRISPR plasmid carries an ampicillin resistance marker and was synthesized by Twist Bioscience. The guide plasmid, which contains the gRNA sequence and the corresponding gRNA scaffold for the respective nuclease, was synthesized on the pTwist Kan High Copy v2 vector. The gRNA sequence is under the control of a synthetic promoter. The guide plasmid carries a kanamycin resistance marker. The gRNA and the PAM sequences are indicated in Table 12. The gRNA sequence for the spCas9 adhE locus was ordered according to Shukal et al. [Shukal et al. (2022), Microbial Cell Factories, 21(1), 19]. The donor DNA fragment PJ23104-mCherry, containing 100 bp homology flanking regions from the adhE locus, was synthesized and further amplified via PCR. It was then purified with the NucleoSpin Gel and PCR Clean-up Mini kit (Macherey-Nagel) for further experiments.
Media and culture conditions
All cells were grown in 2x Yeast Extract Tryptone (2xYT) medium. Briefly, cells from an overnight culture or a 2xYT plate were inoculated into 20 mL of fresh medium in a shaking flask. Cells were incubated at 30 °C for 16 h or longer before harvest. The media were supplemented with appropriate antibiotics (100 mg/L ampicillin and 30 mg/L kanamycin) to maintain the corresponding plasmids.
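The antibiotic supplementation described above can be worked through as a simple dilution calculation. In the sketch below, the final concentrations (100 mg/L ampicillin, 30 mg/L kanamycin) and the 20 mL culture volume come from the text, whereas the stock concentrations in the usage note and the function name are assumptions chosen for illustration only.

```python
def stock_volume_ul(final_mg_per_l: float, culture_ml: float,
                    stock_mg_per_ml: float) -> float:
    """Volume of antibiotic stock (in microliters) to add to a culture.

    final_mg_per_l: desired final concentration in mg/L.
    culture_ml: culture volume in mL.
    stock_mg_per_ml: concentration of the stock solution in mg/mL.
    """
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    # Mass of antibiotic required in the culture (mg).
    mass_mg = final_mg_per_l * culture_ml / 1000.0
    # Stock volume in mL, converted to microliters.
    return mass_mg * 1000.0 / stock_mg_per_ml
```

With an assumed 100 mg/mL ampicillin stock, `stock_volume_ul(100, 20, 100)` gives about 20 microliters for a 20 mL culture; with an assumed 50 mg/mL kanamycin stock, `stock_volume_ul(30, 20, 50)` gives about 12 microliters.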
CRISPR-Cas mediated killing assay
To construct BL21 harboring the CRISPR plasmid, BL21 electrocompetent cells were prepared for transformation with the corresponding plasmids. For electroporation, 10 ng of the plasmid was mixed with 60 µl of the competent cells in a 1 mm Gene Pulser cuvette (Bio-Rad) and electroporated at 1.8 kV. The cells were recovered in 600 µl of SOC medium at 30 °C, 120 rpm for 1 hour before being spread onto a 2xYT agar plate containing ampicillin (100 µg/ml) and incubated overnight at 30 °C. The method is modified from Shukal et al. [Shukal et al. (2022), Microbial Cell Factories, 21(1), 19]. The guide plasmid was then introduced into the corresponding BL21 strain harboring the CRISPR plasmid with the same set-up for electroporation. After recovery, 100 µl of the culture was spread onto an antibiotic-containing 2xYT agar plate (100 µg/ml ampicillin and 30 µg/ml kanamycin) with or without 0.4 µg/ml anhydrotetracycline and incubated at 30 °C for at least 1-2 days to observe the killing effect.
CRISPR-Cas mediated gene knock-in/knock-out (KIKO)
A single colony from the killing assay, which contained both the CRISPR and guide plasmid, was picked and inoculated into 20 ml of 2xYT medium containing 100 µg/ml ampicillin and 30 µg/ml kanamycin and shaken at 30 °C, 120 rpm overnight. The cells were induced for 15 minutes at 42 °C for λ-Red recombinase expression and subsequently made electrocompetent [Datta et al. (2006), Gene, 379, 109-115]. Then, 300-600 ng of the donor DNA fragment PJ23104-mCherry was introduced following the same protocol for electroporation as described above.
The cells were recovered in 600 µl of SOC medium at 30 °C, 120 rpm for 1 hour. 100 µl of the recovered culture was taken out, serially diluted, and then spread onto a 2xYT agar plate supplied with 100 µg/ml ampicillin and 30 µg/ml kanamycin and incubated at 30 °C overnight as a non-induced control. Additionally, 200 µl of the recovered culture was transferred into 20 ml of 2xYT medium supplied with 0.4 µg/ml anhydrotetracycline, 100 µg/ml ampicillin, and 30 µg/ml kanamycin (to maintain the plasmids) and shaken at 30 °C, 120 rpm overnight. The next day, the cell culture was serially diluted to obtain single colonies for characterization. The method was modified from publications [Datta et al. (2006), Gene, 379, 109-115; and Shukal et al. (2022), Microbial Cell Factories, 21(1), 19]. For the pSIM6 control, the procedure was similar to that described in Datta et al. [Datta et al. (2006), Gene, 379, 109-115], and the same amount of donor fragment as described above was added. After recovery, the recovered culture was plated directly on the selection agar plate and incubated at 30 °C overnight.
For clone characterization, single colonies were picked, and colony PCR was performed to identify successful recombination events based on the size of the insert.
Example 6: Novel nucleases for gene editing in Bacillus subtilis
The pJOE8999 plasmid, as described by Sachla et al. (2021) in "A simplified method for CRISPR-Cas9 engineering of Bacillus subtilis" (Microbiol Spectr 9:e00754-21), was obtained from the Bacillus Genetic Stock Center (BGSC #ECE358). This plasmid expresses the Cas9 nuclease under a mannose-inducible promoter and includes the Pvan promoter for constitutive expression of the guide sgRNA when cloned into the vector.
Construction of plasmids with Golden Gate Assembly
The following derivative plasmids of pJOE8999 (SEQ ID NO: 542), including pJOE_Cas9_001, pJOE_NZ0149_002, and pJOE_NZ0149_empty, were generated using standard Golden Gate Assembly of either two or three DNA fragments with the BsaI restriction enzyme and T4 DNA ligase. The Golden Gate assemblies were then transformed into E. coli TOP10 with selection on kanamycin.
DNA fragments were either generated by PCR or ordered as synthetic genes from Twist BioScience. The DNA containing the NZ0149 nuclease gene was codon optimized for B. subtilis expression without BsaI restriction enzyme sites and ordered as synthetic DNA from Twist BioScience (SEQ ID NO: 528). DNA fragments containing the guide RNA for the CRISPR endonucleases Cas9 and novel nuclease 0149, sgRNA_Cas9_001 (SEQ ID NO: 530) and sgRNA_NZ0149_002 (SEQ ID NO: 531) respectively, were ordered as synthetic DNA from Twist BioScience. The DNA fragment containing the kanamycin (KAN) resistance and Cas9 genes was PCR amplified with oligonucleotides using the pJOE8999 plasmid as template DNA.
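Because Golden Gate Assembly with BsaI requires that the fragments contain no internal BsaI recognition sites, a codon-optimized sequence can be screened for such sites before ordering. The following minimal Python sketch scans both strands for the BsaI recognition sequence GGTCTC; the function names are hypothetical, and the sketch is not the procedure actually used for SEQ ID NO: 528.

```python
BSAI_SITE = "GGTCTC"  # BsaI recognition sequence (top strand)

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_bsai_sites(seq: str):
    """Return (0-based position, strand) for every BsaI site in seq."""
    seq = seq.upper()
    bottom_site = reverse_complement(BSAI_SITE)  # site as read on the top strand
    hits = []
    for i in range(len(seq) - len(BSAI_SITE) + 1):
        window = seq[i:i + len(BSAI_SITE)]
        if window == BSAI_SITE:
            hits.append((i, "+"))
        elif window == bottom_site:
            hits.append((i, "-"))
    return hits
```

An empty result indicates that the sequence is free of BsaI sites on both strands; any hit would have to be removed by a synonymous codon change before assembly.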
See Table 14 below for the assembly of DNA fragments and the oligonucleotides used for the PCR reactions and the corresponding template DNA.
B. subtilis strain with the dsRed gene integrated in the genome
A B. subtilis strain containing the full-length dsRed gene (SEQ ID NO: 544) integrated in the Pel locus was constructed by homologous recombination. This strain was then made competent and transformed with the pJOE plasmids and repair DNA.
Preparation of repair DNA of the dsRed gene containing a 130 bp deletion.
The repair DNA (SEQ ID NO: 545) containing the dsRed gene with a 130 bp deletion (SEQ ID NO: 543) was ordered from Twist BioScience as synthetic DNA.
The synthetic repair DNA (SEQ ID NO:545) was amplified by PCR using forward and reverse oligonucleotides identical to the 5’ and 3’ sequences. The PCR-amplified product was then used to co-transform B. subtilis with pJOE plasmids.
Table 14. DNA Fragments for Golden Gate Assembly
Plasmid preparation and transformation of B. subtilis containing the dsRed gene integrated in the Pel locus:
Each plasmid was sequence-verified in E. coli TOP10 and transformed into E. coli TG1, which generates multimeric plasmid DNA as described by Sachla et al. A plasmid miniprep was then prepared, and competent cells of the B. subtilis strain with dsRED integrated in the genome were co-transformed with 2 µg of plasmid DNA and 2 µg of repair template DNA (SEQ ID NO: 545) provided as a PCR product. The transformation was plated on Luria Broth (LB) plates with 0.5% mannose (#M2069; SIGMA) and 15 µg/ml kanamycin, and grown at 30 °C for 3 days. White and red colonies were counted. See results in Table 15.
White and red B. subtilis colonies were picked and streaked onto new LB+KAN+mannose plates and grown for 2 days at 30 °C. Colony PCR was then performed using oligonucleotides (SEQ ID NO: 540 and SEQ ID NO: 541) to amplify the dsRed gene and flanking regions from the Pel locus. The results showed that white colonies contained a smaller fragment corresponding to the dsRed gene with a deletion, whereas the red colonies had a larger PCR fragment corresponding to the full-length dsRed gene.
These findings, along with sequencing results, confirm that the functional dsRed gene in the Pel locus had been replaced by the non-functional dsRed gene from the repair DNA due to the action of either Cas9 or 0149 nucleases. Therefore, nuclease 0149 can be used for gene editing in the B. subtilis host.
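The size-based interpretation of the colony PCR can be sketched as a small classification routine. The 130 bp deletion is taken from the text; the full-length amplicon size passed to the function, the tolerance for gel size estimation, and the function name are hypothetical, since the actual amplicon sizes are not stated in this section.

```python
def classify_amplicon(observed_bp: int, full_length_bp: int,
                      deletion_bp: int = 130, tolerance_bp: int = 40) -> str:
    """Classify a colony PCR band as edited, unedited, or ambiguous.

    observed_bp: band size estimated from the gel.
    full_length_bp: expected amplicon size for the intact dsRed locus (assumed).
    deletion_bp: size of the deletion carried by the repair DNA.
    tolerance_bp: allowed deviation; must be below half the deletion size
                  so that the two size classes cannot overlap.
    """
    if tolerance_bp * 2 >= deletion_bp:
        raise ValueError("tolerance too large to separate the two band sizes")
    edited_bp = full_length_bp - deletion_bp
    if abs(observed_bp - edited_bp) <= tolerance_bp:
        return "edited"
    if abs(observed_bp - full_length_bp) <= tolerance_bp:
        return "unedited"
    return "ambiguous"
```

With a hypothetical 1000 bp full-length amplicon, a band near 870 bp would be classified as edited (white colony genotype) and a band near 1000 bp as unedited (red colony genotype); sequencing remains necessary to confirm the deletion, as described above.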
Construction of plasmids with POE-PCR
The following derivative plasmids of pJOE8999 (SEQ ID NO: 542), including pJOE_NZ0100_004, pJOE_NZ0100_empty, pJOE_NZ0102_003, pJOE_NZ0102_004 and pJOE_NZ0102_empty, were generated by assembling synthetic DNA (ordered from Twist BioScience; the DNA containing the 0100 nuclease gene and the 0102 nuclease gene was codon optimized for B. subtilis expression) and plasmid elements by POE-PCR (POE-PCR is described in WO24133344). See Table 16 for the assembly of fragments and SEQ IDs.
Table 16. DNA fragments for POE
Table 17 shows additional sequence information used within the B. subtilis experiments.
Transformation of POE-PCRs into B. subtilis containing the dsRed gene integrated in the Pel locus:
The POE-PCRs were transformed into the B. subtilis strain with dsRED integrated in the genome. The transformations were plated on Luria Broth (LB) plates with 0.5% mannose (#M2069; SIGMA) and 15 µg/ml kanamycin and incubated at 30 °C for 3 days. The efficiency of recombination was assessed by counting the number of red and white colonies as described previously. While only red colonies were obtained on plates transformed with the empty plasmids pJOE_NZ0102_empty and pJOE_NZ0100_empty, plates transformed with pJOE_NZ0100_004, pJOE_NZ0102_003 and pJOE_NZ0102_004 contained white as well as red colonies.
Red and white colonies were analyzed with PCR and sequencing as described previously to confirm that the deletion in dsRED had been introduced as intended. Therefore, novel nucleases 0100 and 0102, along with the corresponding sgRNA, can be used for gene editing in the B. subtilis host.
The invention described and claimed herein is not to be limited in scope by the specific aspects herein disclosed, since these aspects are intended as illustrations of several aspects of the invention. Any equivalent aspects are intended to be within the scope of this invention. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims. In the case of conflict, the present disclosure including definitions will control.
The invention is further defined by the following numbered paragraphs:
1. A Cas nuclease selected from the group consisting of:
(a) a polypeptide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1 , SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11 , SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21 , SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31 , SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41 , SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51 , or SEQ ID NO: 52, preferably having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1 , SEQ ID NO: 21 , SEQ ID NO: 40, SEQ ID NO: 39, SEQ ID NO: 48, or SEQ ID NO: 29;
(b) a polypeptide encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polypeptide coding sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61 , SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71 , SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81 , SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID NO: 86, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID NO: 90, SEQ ID NO: 91 , SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101 , SEQ ID NO: 102, SEQ ID NO: 103, or SEQ ID NO: 104, preferably by a polypeptide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polypeptide coding sequence of any of SEQ ID NO: 53, SEQ ID NO: 73, SEQ ID NO: 92, SEQ ID NO: 91 ,
SEQ ID NO: 100, or SEQ ID NO: 81 , or any one of SEQ ID NOs: 347, 349, 351 , 353, 405, 416, 417, 434, 449, 465, 466, 512-520,528, 549 or 550;
(c) a polypeptide derived from SEQ ID NO: 1 , SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11 , SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21 , SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31 , SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41 , SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51 , or SEQ ID NO: 52, preferably from SEQ ID NO: 1 , SEQ ID NO: 21 , SEQ ID NO: 40, SEQ ID NO: 39, SEQ ID NO: 48, or SEQ ID NO: 29, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions, such as conservative amino acid substitutions;
(d) a polypeptide having a TM-score of at least 0.80, e.g., at least 0.85, at least 0.90, at least 0.91, at least 0.92, at least 0.93, at least 0.94, at least 0.95, at least 0.96, at least 0.97, at least 0.98, at least 0.99, or even 1.0, compared to the three-dimensional structure of the polypeptide of any one of SEQ ID NOs: 1-52, preferably of SEQ ID NO: 1, SEQ ID NO: 21, SEQ ID NO: 40, SEQ ID NO: 39, SEQ ID NO: 48, or SEQ ID NO: 29, wherein the three-dimensional structure is calculated using AlphaFold;
(e) a polypeptide derived from the polypeptide of (a), (b), (c), or (d), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and
(f) a fragment of the polypeptide of (a), (b), (c), (d) or (e).
2. The nuclease of paragraph 1 having nuclease activity, and/or DNA-binding activity.
3. The nuclease according to any one of paragraphs 1-2, wherein the nuclease comprises or consists of an amino acid sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 1, SEQ ID NO: 21, SEQ ID NO: 40, SEQ ID NO: 39, SEQ ID NO: 48, or SEQ ID NO: 29.
4. The nuclease of any one of paragraphs 1-3, comprising, consisting essentially of, or consisting of SEQ ID NO: 1, SEQ ID NO: 21, SEQ ID NO: 40, SEQ ID NO: 39, SEQ ID NO: 48, or SEQ ID NO: 29.
5. The nuclease according to any one of paragraphs 1-4, wherein the nuclease is a fragment of SEQ ID NO: 1 , SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6,
SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11 , SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21 , SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31 , SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41 , SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51 , or SEQ ID NO: 52, preferably of SEQ ID NO: 1 , SEQ ID NO: 21 , SEQ ID NO: 40, SEQ ID NO: 39, SEQ ID NO: 48, or SEQ ID NO: 29, wherein the fragment preferably contains at least 600 amino acid residues (e.g., amino acids 9 to 640 of SEQ ID NO: 1 , amino acids 13 to 628 of SEQ ID NO: 39, amino acids 16 to 637 of SEQ ID NO: 40, amino acids 10 to 637 of SEQ ID NO: 41 , amino acids 10 to 639 of SEQ ID NO: 42, amino acids 10 to 636 of SEQ ID NO: 43, amino acids 10 to 635 of SEQ ID NO: 44, amino acids 9 to 640 of SEQ ID NO: 45, amino acids 10 to 637 of SEQ ID NO: 46, amino acids 10 to 633 of SEQ ID NO: 47, amino acids 12 to 632 of SEQ ID NO: 48, amino acids 9 to 640 of SEQ ID NO: 51 , amino acids 8 to 620 of SEQ ID NO: 21 , or amino acids 9 to 640 of SEQ ID NO: 52).
6. The nuclease of any one of paragraphs 1-5, comprising, consisting essentially of, or consisting of SEQ ID NO: 1 , SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11 , SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21 , SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31 , SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41 , SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51 , or SEQ ID NO: 52, preferably of SEQ ID NO: 1 , SEQ ID NO: 21 , SEQ ID NO: 40, SEQ ID NO: 39, SEQ ID NO: 48, or SEQ ID NO: 29.
7. The nuclease of any one of paragraphs 1-6, which is encoded by a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the mature polypeptide coding sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61 , SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71 , SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID
NO: 81 , SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID NO: 86, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID NO: 90, SEQ ID NO: 91 , SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101 , SEQ ID NO: 102, SEQ ID NO: 103, or SEQ ID NO: 104, or to any one of SEQ ID NOs: 347, 349, 351 , 353, 405, 416, 417, 434, 449, 465, 466, 512-520,528, 549 or 550, preferably to SEQ ID NO: 53, SEQ ID NO: 73, SEQ ID NO: 92, SEQ ID NO: 91 , SEQ ID NO: 100, or SEQ ID NO: 81.
8. The nuclease of any one of paragraphs 1-7, comprising an N-terminal extension and/or C-terminal extension of 1-10 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids, preferably an extension of 1-10 amino acid residues in the N-terminus and/or 1-10 amino acids in the C-terminus, such as 1-5 amino acids.
9. The nuclease of any one of paragraphs 1-8, having at most 10%, at most 9%, at most 8%, at most 7%, at most 6%, at most 5%, at most 4%, at most 3%, at most 2% or at most 1 % sequence differences to the polypeptide of SEQ ID NO: 1 , SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11 , SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21 , SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31 , SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41 , SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51 , or SEQ ID NO: 52, preferably of SEQ ID NO: 1 , SEQ ID NO: 21 , SEQ ID NO: 40, SEQ ID NO: 39, SEQ ID NO: 48, or SEQ ID NO: 29.
10. The nuclease of any one of paragraphs 1-9, which differs from the polypeptide of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, or SEQ ID NO: 52, preferably of SEQ ID NO: 1, SEQ ID NO: 21, SEQ ID NO: 40, SEQ ID NO: 39, SEQ ID NO: 48, or SEQ ID NO: 29, by at most 15 amino acids, such as at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 amino acids.
11. The nuclease of any one of paragraphs 1-10, which is obtained from or obtainable from a Streptococcus cell, e.g., a Streptococcus equinus, Streptococcus mutans, Streptococcus sp., Streptococcus henryi DSM 19005, Streptococcus sp. CCH8-G7, Streptococcus pacificus, Streptococcus orisratti DSM 15617, Streptococcus salivarius, or Streptococcus ruminantium cell, from a Bacillus cell, e.g., a Bacillus sp-63030 cell, from a Turicibacter cell, e.g., a Turicibacter sp. cell, from a Ureibacillus cell, e.g., a Ureibacillus thermosphaericus cell, from a Lentihominibacter cell, e.g., a Lentihominibacter hominis cell, from a Clostridia cell, from a Ruminococcus cell, e.g., a Ruminococcus sp. cell, from an Alicyclobacillus cell, e.g., an Alicyclobacillus sacchari cell, from an Enterococcus cell, e.g., an Enterococcus gilvus, an Enterococcus hermanniensis, or an Enterococcus asini cell, from a Companilactobacillus cell, e.g., a Companilactobacillus zhachilii, a Companilactobacillus halodurans, a Companilactobacillus keshanensis, a Companilactobacillus suantsaicola, or a Companilactobacillus hulinensis cell, from a Bombilactobacillus cell, e.g., a Bombilactobacillus apium cell, or from a Vagococcus cell, e.g., a Vagococcus penaei cell, preferably of an Enterococcus asini cell, an Enterococcus hermanniensis cell, a Vagococcus penaei cell, or a Lentihominibacter hominis cell.
12. The nuclease of any one of paragraphs 1-10, which is obtained from or obtainable from a Lactobacillus cell, e.g., Lactobacillus sp., Lactobacillus farciminis (DSM 20184), Lactobacillus farciminis, Lactobacillus murinus, Lactobacillus ruminis, Lactobacillus salivarius, Lactobacillus jensenii, Lactobacillus hamster, Lactobacillus delbrueckii, Lactobacillus johnsonii, Lactobacillus plantarum, Lactobacillus rhamnosus, or Lactobacillus gallinarum cell.
13. The nuclease of any one of paragraphs 1-12, comprising one or more functional RuvC domain.
14. The nuclease of any one of paragraphs 1-13, comprising one or more functional HNH domain.
15. The nuclease of any one of paragraphs 1-14, comprising one or more domain selected from the group consisting of:
(a) a RuvC domain having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID Nos: 105-107, 111-113, 108-110, 135-137, or 313 - 318;
(b) a HNH domain having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319 or 320;
(c) a RuvC domain derived from any one of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID Nos: 105-107, 111-113, 108-110, 135-137, or 313 - 318, by substitution, deletion or addition of one or several amino acids of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID Nos: 105-107, 111-113, 108-110, 135-137, or 313 - 318;
(d) a HNH domain derived from any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320, by substitution, deletion or addition of one or several amino acids of SEQ ID NOs: 144 - 156, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320; and
(e) a fragment of the catalytic domain of (a), (b), (c), or (d); preferably wherein the nuclease has nuclease activity, or wherein the nuclease has nickase activity.
16. The nuclease of any one of paragraphs 1-15, wherein the HNH domain has at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320.
17. The nuclease of any one of paragraphs 1-16, wherein the HNH domain comprises or consists of an amino acid sequence having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 144, 146, 145, 154, 319, or 320.
18. The nuclease of any one of paragraphs 1-17, wherein the HNH domain is a variant of any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320, comprising a substitution, such as a conservative amino acid substitution, a deletion, and/or an insertion at one or more positions.
19. The nuclease of any one of paragraphs 1-18, wherein the HNH domain differs from any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320, by at most 15 amino acids, such as at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 amino acids.
20. The nuclease of any one of paragraphs 1-19, wherein the HNH domain is a fragment of any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320, wherein the fragment preferably contains at least 20 amino acid residues (e.g., amino acids 613 to 640 of SEQ ID NO: 1), or at least 27 amino acid residues (e.g., amino acids 613 to 640 of SEQ ID NO: 1).
21. The nuclease of any one of paragraphs 1-20, wherein the HNH domain comprises, consists essentially of, or consists of any one of SEQ ID NOs: 144 - 156 or 319-320, preferably of SEQ ID NO: 144, 146, 145, 154, 319, or 320.
22. The nuclease of any one of paragraphs 1-21, wherein the RuvC domain has at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID Nos: 105-107, 111-113, 108-110, 135-137, or 313 - 318.
23. The nuclease of any one of paragraphs 1-22, wherein the RuvC domain comprises or consists of an amino acid sequence having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the amino acid sequence of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID Nos: 105-107, 111-113, 108-110, 135-137, or 313 - 318.
24. The nuclease of any one of paragraphs 1-23, wherein the RuvC domain is a variant of any one of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID Nos: 105-107, 111-113, 108-110, 135-137, or 313 - 318, comprising a substitution, such as a conservative amino acid substitution, a deletion, and/or an insertion at one or more positions.
25. The nuclease of any one of paragraphs 1-24, wherein the RuvC domain differs from any one of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID Nos: 105-107, 111-113, 108-110, 135-137, or 313 - 318, by at most 15 amino acids, such as at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 amino acids.
26. The nuclease of any one of paragraphs 1-25, wherein the RuvC domain is a fragment of any one of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID Nos: 105-107, 111-113, 108-110, 135-137, or 313 - 318, wherein the fragment preferably contains at least 10 amino acid residues (e.g., amino acids 5 to 20 of SEQ ID NO: 1).
27. The nuclease of any one of paragraphs 1-26, wherein the RuvC domain comprises, consists essentially of, or consists of any one of SEQ ID NOs: 105 - 143 or 313 - 318, preferably of SEQ ID Nos: 105-107, 111-113, 108-110, 135-137, or 313 - 318.
28. The nuclease of any one of paragraphs 1-27, wherein the nuclease has double-strand break activity towards a DNA target site.
29. The nuclease of any one of paragraphs 1-28, wherein the nuclease comprises an amino acid substitution, insertion, or deletion in the one or more RuvC domain.
30. The nuclease of any one of paragraphs 1-29, wherein the nuclease comprises an amino acid substitution, insertion, or deletion in the one or more HNH domain.
31. The nuclease of any one of paragraphs 1-30, wherein the nuclease is a nickase having one or more inactivated RuvC domain created by an amino acid substitution, insertion, or deletion at a position provided for the nuclease in column 3 of Table 2.
32. The nuclease of any one of paragraphs 1-31, wherein the nuclease is a nickase having one or more inactivated HNH domain created by an amino acid substitution, insertion or deletion at a position provided for the nuclease in column 3 of Table 3.
33. The nuclease of any one of paragraphs 1-32, wherein the nuclease has a single-stranded break activity towards a DNA target site.
34. The nuclease of any one of paragraphs 1-32, wherein the nuclease is a catalytically dead nuclease.
35. The nuclease of paragraph 34, wherein the catalytically dead nuclease comprises one or more inactivated RuvC domain and one or more inactivated HNH domain.
36. The nuclease of any one of paragraphs 34-35, wherein the catalytically dead nuclease comprising one or more inactivated RuvC domain and one or more inactivated HNH domain is created by one or more amino acid substitution, deletion or insertion at the positions provided for the nuclease in column 3 of Table 2 or column 3 of Table 3.
37. The nuclease of any one of paragraphs 1-36, wherein sequence identity is determined by the method described in the definition section under “Sequence Identity”.
38. The nuclease of any one of paragraphs 1-37, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a eukaryotic cell.
39. The nuclease of any one of paragraphs 1-38, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a mammalian cell, e.g., a non-human mammalian cell.
40. The nuclease of any one of paragraphs 1-37, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in an E. coli cell.
41. The nuclease of any one of paragraphs 1-37, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus cell.
42. The nuclease of any one of paragraphs 1-37, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus subtilis cell.
43. The nuclease of any one of paragraphs 1-37, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus licheniformis cell.
44. The nuclease of any one of paragraphs 1-38, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a filamentous fungal cell.
45. The nuclease of any one of paragraphs 1-38, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in an Aspergillus niger cell.
46. The nuclease of any one of paragraphs 1-38, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in an Aspergillus oryzae cell.
47. The nuclease of any one of paragraphs 1-38, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a Trichoderma reesei cell.
48. The nuclease of any one of paragraphs 1-37, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a Lactobacillus cell.
49. The nuclease of any one of paragraphs 1-37, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a probiotic cell.
The nuclease of any one of paragraphs 1-38, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a S. cerevisiae cell.
50. The nuclease of any one of the preceding paragraphs, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in P. pastoris.
50b. The nuclease of any one of the preceding paragraphs, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in Lb. paracasei (Lacticaseibacillus paracasei or Lactobacillus paracasei).
50c. The nuclease of any one of the preceding paragraphs, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in S. thermophilus.
50d. The nuclease of any one of the preceding paragraphs, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in an E. coli cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 512-520.
50e. The nuclease of any one of the preceding paragraphs, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus licheniformis cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 405, 416, 417, 434, 449, 465, or 466.
50f. The nuclease of any one of the preceding paragraphs, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in an Aspergillus niger cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 347, 349, 351, or 353.
50g. The nuclease of any one of the preceding paragraphs, wherein the polynucleotide encoding the nuclease is codon-optimized for expression in a Bacillus subtilis cell, wherein the polynucleotide comprises or consists of a sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the nucleotide sequence of any of SEQ ID NOs: 528, 549 or 550.
51. The nuclease of any one of paragraphs 1-50g, wherein the nuclease is a Class 2 Cas nuclease.
52. The nuclease of any one of paragraphs 1-51, wherein the nuclease is a Class 2 Type II Cas nuclease.
53. The nuclease of any one of paragraphs 1-52, wherein the nuclease is a Class 2 Type II-A Cas nuclease.
54. The nuclease of any one of paragraphs 1-52, wherein the nuclease is a Class 2 Type II-B Cas nuclease.
55. The nuclease of any one of paragraphs 1-52, wherein the nuclease is a Class 2 Type II-C Cas nuclease.
56. The nuclease of any one of paragraphs 1-55, wherein the nuclease utilizes a protospacer adjacent motif (PAM) sequence provided for the nuclease in Table 1.
57. The nuclease of any one of paragraphs 1-56, wherein the nuclease is non-naturally occurring, e.g., wherein the nuclease is engineered and comprises unnatural or synthetic amino acids.
58. The nuclease of any one of paragraphs 1-57, wherein the nuclease is naturally occurring.
59. A fusion polypeptide, comprising the Cas nuclease of any one of paragraphs 1-58, and one or more second polypeptide.
60. The fusion polypeptide of paragraph 59, wherein the one or more second polypeptide comprises a polypeptide that localizes to one or more subcellular organelles.
61. The fusion polypeptide according to any one of paragraphs 59-60, wherein one or more second polypeptide is a nuclear localization sequence (NLS), a cell penetrating peptide, and/or an affinity tag.
62. The fusion polypeptide according to any one of paragraphs 59-61 , wherein the fusion polypeptide comprises 1-10 or more NLS at or near the amino-terminus, 1-10 or more NLS at or near the carboxy-terminus, or a combination of 1-10 or more NLS at or near the amino-terminus and 1-10 or more NLS at or near the carboxy-terminus.
63. The fusion polypeptide according to any one of paragraphs 59-62, wherein the fusion polypeptide comprises 1-4 NLS.
64. The fusion polypeptide according to any one of paragraphs 59-63, wherein the fusion polypeptide comprises one NLS.
65. The fusion polypeptide according to any one of paragraphs 59-64, wherein the one or more NLS is located within the open-reading frame (ORF) of the nuclease.
66. The fusion polypeptide according to any one of paragraphs 59-65, wherein the one or more NLS are in tandem repeats.
67. The fusion polypeptide according to any one of paragraphs 59-66, wherein the fusion polypeptide comprises a first NLS and a second NLS.
68. The fusion polypeptide according to paragraph 67, wherein the fusion polypeptide comprises a linker sequence between the first NLS and the second NLS.
69. The fusion polypeptide according to paragraph 68, wherein the linker between the first NLS and the second NLS comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 amino acids.
70. The fusion polypeptide according to any one of paragraphs 59-69, wherein the one or more second polypeptide comprises a base-editing polypeptide.
71. The fusion polypeptide according to any one of paragraphs 59-70, wherein the base-editing polypeptide comprises a base editor domain.
72. The fusion polypeptide according to any one of paragraphs 59-71, wherein the fusion polypeptide comprises a linker between the Cas nuclease and the base-editing polypeptide.
73. The fusion polypeptide according to any one of paragraphs 59-72, wherein the base-editing polypeptide comprises a deaminase, e.g., a cytidine deaminase, such as an APOBEC3A deaminase, or an adenosine deaminase.
74. The fusion polypeptide according to any one of paragraphs 59-73, wherein the one or more second polypeptide comprises a reverse transcriptase, the reverse transcriptase preferably comprising a reverse transcriptase domain.
75. The fusion polypeptide according to any one of paragraphs 59-74, wherein the nuclease is fused to one or more NLS of sufficient strength to drive accumulation of a CRISPR complex comprising the Cas nuclease in a detectable amount in the nucleus of a eukaryotic cell.
76. The nuclease or fusion polypeptide according to any one of paragraphs 1-75, which is isolated.
77. The nuclease or fusion polypeptide according to any one of paragraphs 1-76, which is purified.
78. The nuclease or fusion polypeptide according to any one of paragraphs 1-77, wherein sequence identity is determined by the method described in the definition section under “Sequence Identity”.
79. A non-naturally occurring composition comprising (i) the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, and/or (ii) a nucleic acid molecule comprising a sequence encoding the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78.
80. The composition according to paragraph 79, wherein the nucleic acid molecule is a chemically modified nucleic acid molecule.
81. The composition according to any one of paragraphs 79-80, wherein the nucleic acid molecule is DNA.
82. The composition according to any one of paragraphs 79-81, wherein the nucleic acid molecule is RNA.
83. The composition according to any one of paragraphs 79-82, wherein the RNA is an mRNA comprising one or more of a 5’ untranslated region (UTR), an open reading frame (ORF) encoding the Cas nuclease or fusion polypeptide, a 3’ UTR, and a poly-adenylyl (polyA) tail.
84. The composition according to any one of paragraphs 79-83, wherein the ORF consists of nucleosides selected from adenosine, a modified adenosine, uridine, a modified uridine, guanosine, a modified guanosine, cytidine, and a modified cytidine.
85. The composition according to any one of paragraphs 79-84, wherein the ORF consists of nucleosides selected from adenosine, uridine, a modified uridine, guanosine, and cytidine.
86. The composition according to any one of paragraphs 79-86, wherein the nucleic acid molecule is linear.
87. The composition according to any one of paragraphs 79-85, wherein the nucleic acid molecule is circular.
88. The composition according to any one of paragraphs 79-87, further comprising one or more RNA molecules, or a DNA polynucleotide encoding one or more of the one or more RNA molecules, wherein the one or more RNA molecules and the Cas nuclease or fusion polypeptide do not naturally occur together, and the one or more RNA molecules are configured to form a complex with the Cas nuclease or fusion polypeptide and/or target the complex to a target site.
89. The composition according to any one of paragraphs 79-88, wherein the one or more RNA molecule comprises a guide RNA (gRNA), which gRNA comprises a CRISPR RNA (crRNA) and a trans activating RNA (tracrRNA).
90. The composition according to any one of paragraphs 79-89, wherein the one or more RNA molecule is a single-molecule RNA (sgRNA), e.g., wherein the crRNA and the tracrRNA are part of the same RNA molecule.
91. The composition according to any one of paragraphs 79-89, wherein the one or more RNA molecule is a dual-molecule RNA, e.g., wherein the crRNA and the tracrRNA are separate RNA molecules.
92. The composition according to any one of paragraphs 79-91, further comprising a donor template for homology directed repair (HDR).
93. The composition according to any one of paragraphs 79-92, wherein the sequence encoding the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81, SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID NO: 86, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID NO: 90, SEQ ID NO: 91, SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101, SEQ ID NO: 102, SEQ ID NO: 103, or SEQ ID NO: 104, preferably of SEQ ID NO: 53, SEQ ID NO: 73, SEQ ID NO: 92, SEQ ID NO: 91, SEQ ID NO: 100, or SEQ ID NO: 81, or any of SEQ ID NOs: 347, 349, 351, 353, 405, 416, 417, 434, 449, 465, 466, 512-520, 528, 549 or 550.
94. The composition according to any one of paragraphs 79-93, wherein the one or more RNA molecule comprises a trans activating RNA (tracrRNA) sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NO: 157 - 208, preferably any of SEQ ID NO: 157, SEQ ID NO: 177, SEQ ID NO: 196, SEQ ID NO: 195, SEQ ID NO: 204, or SEQ ID NO: 185.
95. The composition according to any one of paragraphs 79-93, wherein at least one of the one or more RNA molecule comprises a CRISPR RNA (crRNA) molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID Nos: 209 - 260, preferably any of SEQ ID NO: 209, SEQ ID NO: 229, SEQ ID NO: 248, SEQ ID NO: 247, SEQ ID NO: 256, or SEQ ID NO: 237.
96. The composition according to any one of paragraphs 79-95, wherein at least one of the one or more RNA molecule comprises or consists of a RNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of SEQ ID NOs: 261 - 312, preferably of any of SEQ ID NO: 261 , SEQ ID NO: 281 , SEQ ID NO: 300, SEQ ID NO: 299, SEQ ID NO: 308, or SEQ ID NO: 289.
97. The composition according to any one of paragraphs 79-96, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any amino acid sequence of column 1 in Table 4, and the at least one RNA molecule is a RNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any polynucleotide sequence of column 4 in Table 4, e.g. any one of SEQ ID NOs: 261 - 312, preferably of any of SEQ ID NO: 261 , SEQ ID NO: 281 , SEQ ID NO: 300, SEQ ID NO: 299, SEQ ID NO: 308, or SEQ ID NO: 289.
98. The composition according to any one of paragraphs 79-97, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 1, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 209.
98a. The composition according to any one of paragraphs 79-97, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 21 , and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 229.
98b. The composition according to any one of paragraphs 79-97, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 40, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 248.
98c. The composition according to any one of paragraphs 79-97, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 39, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 247.
98d. The composition according to any one of paragraphs 79-97, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 48, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 256.
98e. The composition according to any one of paragraphs 79-97, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 29, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 237.
99. The composition according to any one of paragraphs 79-98e, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any amino acid sequence of column 1 in Table 4, and the at least one RNA molecule comprises a crRNA molecule comprising a guide sequence portion and a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any of the polynucleotide sequences of column 2 in Table 4, e.g., any one of SEQ ID NOs: 209 - 260, preferably any of SEQ ID NO: 209, SEQ ID NO: 229, SEQ ID NO: 248, SEQ ID NO: 247, SEQ ID NO: 256, or SEQ ID NO: 237.
100. The composition according to any one of paragraphs 79-99, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 1, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 157.
100a. The composition according to any one of paragraphs 79-99, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 21 , and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 177.
100b. The composition according to any one of paragraphs 79-99, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 40, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 196.
100c. The composition according to any one of paragraphs 79-99, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 39, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 195.
100d. The composition according to any one of paragraphs 79-99, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91 %, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 48, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 204.
100e. The composition according to any one of paragraphs 79-99, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the amino acid sequence of SEQ ID NO: 29, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to the polynucleotide sequence of SEQ ID NO: 185.
101. The composition according to any one of paragraphs 79-100e, wherein the Cas nuclease or fusion polypeptide comprises a sequence having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any amino acid sequence of column 1 in Table 4, and the at least one RNA molecule comprises a tracrRNA molecule comprising a sequence encoded by a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to any of the polynucleotide sequences of column 3 in Table 4, e.g., to any one of SEQ ID NOs: 157 - 208, preferably any one of SEQ ID NOs: 157, 177, 196, 195, 204, or 185.
102. The composition according to any one of paragraphs 79-101, wherein the composition further comprises a base editor enzyme.
103. The composition according to any one of paragraphs 79-102, wherein the base editor enzyme is an adenosine deaminase or a cytidine deaminase.
104. The composition according to any one of paragraphs 79-103, wherein the composition further comprises a reverse transcriptase enzyme.
105. A method of modifying a nucleotide sequence at a DNA target site in the genome of a cell, comprising introducing into the cell the Cas nuclease or fusion polypeptide according to any one of paragraphs 1-78, a polynucleotide encoding the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, and/or the composition of any one of paragraphs 79-104.
106. The method according to paragraph 105, wherein the method comprises introducing a DNA-break at the DNA target site.
107. The method according to any one of paragraphs 105-106, wherein the DNA-break is a single-strand break.
108. The method according to any one of paragraphs 105-106, wherein the DNA-break is a double-strand break.
109. The method according to any one of paragraphs 105-108, wherein the method is carried out under conditions that are permissive for non-homologous end joining (NHEJ), and/or homology-directed repair (HDR).
110. The method according to any one of paragraphs 105-109, wherein the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to a PAM sequence, e.g., adjacent to the PAM sequence “nnRH”, “nnAY”, “nnGHMA”, “nnGTA”, “nnAMA”, or “nnRHRD”, or adjacent to any one of the PAM sequences mentioned in Table 1.
111. The method according to any one of paragraphs 105-110, wherein the Cas nuclease or fusion polypeptide effects a DNA-break in a DNA strand adjacent to a sequence that is complementary to the PAM sequence.
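A minimal sketch of how degenerate PAM motifs such as those listed in paragraph 110 could be located on either strand of a DNA sequence. It assumes the PAM strings use standard IUPAC nucleotide codes (n = any base, R = A/G, H = A/C/T, Y = C/T, M = A/C, D = A/G/T); the function names are hypothetical, and the sketch does not model on which side of the target the PAM lies or where the break is placed.

```python
import re

# Standard IUPAC nucleotide codes; lowercase "n" in a PAM string is read as N.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pam_to_regex(pam: str) -> re.Pattern:
    """Translate a degenerate PAM such as 'nnRH' into a compiled regex."""
    body = "".join(f"[{IUPAC[base]}]" for base in pam.upper())
    # A lookahead is used so that overlapping matches are all reported.
    return re.compile(f"(?=({body}))")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_pam_sites(seq: str, pam: str):
    """Return (strand, 0-based start on the given strand, matched bases)."""
    pattern = pam_to_regex(pam)
    seq = seq.upper()
    hits = [("+", m.start(), m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # Map the coordinate on the reverse complement back to the input strand.
        start = len(seq) - m.start() - len(pam)
        hits.append(("-", start, m.group(1)))
    return hits
```

For example, `find_pam_sites("TTGTA", "nnGTA")` reports a single hit on the given strand, while `find_pam_sites("GTCC", "nnAY")` reports a hit only on the complementary strand.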
112. The method according to any one of paragraphs 105-111, wherein the target site is within a coding region of a protein.
113. The method according to any one of paragraphs 105-111, wherein the target site is within a non-coding region of a protein.
114. The method according to any one of paragraphs 105-111, wherein the target site is within a regulatory region of a protein, e.g., a promoter.
115. The method according to any one of paragraphs 105-114, wherein the cell is a eukaryotic cell.
116. The method according to any one of paragraphs 105-114, wherein the cell is a prokaryotic cell.
117. The method according to any one of paragraphs 105-114, wherein the cell is a eukaryotic cell, such as a mammalian cell, a human cell, or a non-human mammalian cell, e.g., a BHK cell, a CHO cell, a mouse cell, a hamster cell, or a rat cell.
118. The method according to any one of paragraphs 105-114, wherein the cell is a fungal cell, such as a filamentous fungal cell, or a yeast cell.
118a. The method according to paragraph 118, wherein the fungal cell is a Pichia cell, e.g., a Pichia pastoris cell.
119. The method according to any one of paragraphs 105-114, wherein the cell is a yeast cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
120. The method according to any one of paragraphs 105-114, wherein the cell is a filamentous fungal cell, e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bjerkandera adusta, Ceriporiopsis aneirina, Ceriporiopsis caregiea, Ceriporiopsis gilvescens, Ceriporiopsis pannocinta, Ceriporiopsis rivulosa, Ceriporiopsis subrufa, Ceriporiopsis subvermispora, Chrysosporium inops, Chrysosporium keratinophilum, Chrysosporium lucknowense, Chrysosporium merdarium, Chrysosporium pannicola, Chrysosporium queenslandicum, Chrysosporium tropicum, Chrysosporium zonatum, Coprinus cinereus, Coriolus hirsutus, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila, Neurospora crassa, Penicillium purpurogenum, Phanerochaete chrysosporium, Phlebia radiata, Pleurotus eryngii, Talaromyces emersonii, Thielavia terrestris, Trametes villosa, Trametes versicolor, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, or Trichoderma viride cell.
121. The method according to paragraph 120, wherein the cell is a Trichoderma cell.
122. The method according to paragraph 121, wherein the cell is a Trichoderma reesei cell.
123. The method according to paragraph 120, wherein the cell is an Aspergillus cell.
124. The method according to paragraph 123, wherein the cell is an Aspergillus niger cell.
125. The method according to paragraph 123, wherein the cell is an Aspergillus oryzae cell.
126. The method according to any one of paragraphs 105-115, wherein the cell is a plant cell.
127. The method according to paragraph 126, wherein the plant cell is one or more of a maize, rice, sorghum, rye, barley, wheat, millet, oats, sugarcane, turfgrass, switchgrass, soybean, canola, alfalfa, sunflower, cotton, tobacco, peanut, potato, tobacco, Arabidopsis, vegetable, or safflower cell.
128. The method according to paragraph 116, wherein the cell is a prokaryotic cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Corynebacterium, Enterococcus, Geobacillus, Lactobacillus, Lacticaseibacillus, Lactiplantibacillus, Levilactobacillus, Ligilactobacillus, Limosilactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, or Streptomyces cells, or a Gram-negative bacteria selected from the group consisting of Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma cells, such as Lacticaseibacillus casei, Lacticaseibacillus paracasei, Lacticaseibacillus rhamnosus, Lactiplantibacillus plantarum, Levilactobacillus brevis, Ligilactobacillus salivarius, Limosilactobacillus fermentum, Limosilactobacillus reuteri, Lactobacillus acidophilus, Lactobacillus bulgaricus, Lactobacillus crispatus, Lactobacillus gasseri, Lactobacillus johnsonii, Lactobacillus helveticus, Corynebacterium glutamicum, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, Bacillus thuringiensis, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp. zooepidemicus, Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, and Streptomyces lividans cells.
129. The method according to paragraph 128, wherein the cell is a Bacillus cell.
130. The method according to paragraph 128, wherein the cell is a Bacillus subtilis cell.
131. The method according to paragraph 128, wherein the cell is a Bacillus licheniformis cell.
131a. The method according to paragraph 128, wherein the cell is a Lacticaseibacillus paracasei cell.
131b. The method according to paragraph 128, wherein the cell is a Streptococcus thermophilus cell.
131c. The method according to paragraph 128, wherein the cell is an E. coli cell.
132. A polynucleotide encoding the Cas nuclease or fusion polypeptide according to any one of paragraphs 1-78.
133. The polynucleotide of paragraph 132, which comprises or consists of a polynucleotide having at least 60%, e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polypeptide coding sequence of SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81, SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID NO: 86, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID NO: 90, SEQ ID NO: 91, SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101, SEQ ID NO: 102, SEQ ID NO: 103, or SEQ ID NO: 104, preferably of SEQ ID NO: 53, SEQ ID NO: 73, SEQ ID NO: 92, SEQ ID NO: 91, SEQ ID NO: 100, or SEQ ID NO: 81, or any of SEQ ID NOs: 347, 349, 351, 353, 405, 416, 417, 434, 449, 465, 466, 512-520, 528, 549 or 550.
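A minimal sketch of how a percent identity figure of the kind recited above could be computed from two sequences that have already been aligned. It assumes gaps are written as "-" and that gap columns are excluded from the denominator; that convention, the function names, and the threshold helper are illustrative assumptions, and the definition of sequence identity given elsewhere in the document governs.

```python
# Identity thresholds recited in the paragraphs above, in percent.
THRESHOLDS = (60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the non-gap columns of a pairwise alignment.

    Assumption: both strings come from the same alignment (equal length),
    gaps are '-', and columns containing a gap are not counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no non-gap columns")
    return 100.0 * identical / compared


def highest_threshold_met(identity: float):
    """Return the largest recited threshold satisfied, or None if below 60%."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None
```

With the toy alignment `percent_identity("ACGT-A", "ACCTGA")`, five columns are compared and four match, giving 80.0, so `highest_threshold_met` returns 80.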
134. The polynucleotide according to any one of paragraphs 132-133, wherein the polynucleotide is a chemically modified nucleic acid molecule.
135. The polynucleotide according to any one of paragraphs 132-134, wherein the polynucleotide is DNA.
136. The polynucleotide according to any one of paragraphs 132-134, wherein the polynucleotide is RNA.
137. The polynucleotide according to paragraph 136, wherein the RNA is an mRNA comprising one or more of a 5’ untranslated region (UTR), an open reading frame (ORF) encoding the Cas nuclease or fusion polypeptide, a 3’ UTR, and a poly-adenylyl (polyA) tail.
138. The polynucleotide according to paragraph 137, wherein the ORF consists of nucleosides selected from adenosine, a modified adenosine, uridine, a modified uridine, guanosine, a modified guanosine, cytidine, and a modified cytidine.
139. The polynucleotide according to any one of paragraphs 137-138, wherein the ORF consists of nucleosides selected from adenosine, uridine, a modified uridine, guanosine, and cytidine.
140. The polynucleotide according to any one of paragraphs 132-139, wherein the polynucleotide is linear.
141. The polynucleotide according to any one of paragraphs 132-139, wherein the polynucleotide is circular.
142. The polynucleotide according to any one of paragraphs 132-141, wherein the poly-A sequence comprises non-adenine nucleotides.
143. The polynucleotide according to any one of paragraphs 132-142, wherein the poly-A sequence comprises 100-400 nucleotides.
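A small sketch of how the two poly-A properties in paragraphs 142-143 (presence of non-adenine nucleotides and a length of 100-400 nucleotides) could be checked for a given tail sequence. It assumes the tail has already been delimited from the rest of the mRNA; the function name and the returned keys are hypothetical.

```python
def describe_poly_a(tail: str, min_len: int = 100, max_len: int = 400) -> dict:
    """Summarise a poly-A tail supplied as a separate string.

    Assumption: the caller has already isolated the tail; no attempt is made
    to find where the 3' UTR ends and the tail begins.
    """
    tail = tail.upper()
    non_adenine = sum(1 for nt in tail if nt != "A")
    return {
        "length": len(tail),
        "non_adenine": non_adenine,
        "in_range": min_len <= len(tail) <= max_len,
    }
```

For instance, a tail of 60 adenines, one guanine and 60 more adenines has length 121, one non-adenine nucleotide, and falls within the 100-400 range.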
144. The polynucleotide according to any one of paragraphs 132-143, wherein the polynucleotide is operably linked to one or more heterologous control sequences.
145. The polynucleotide according to any one of paragraphs 132-144, wherein the heterologous control sequence is a heterologous promoter.
146. The polynucleotide according to any one of paragraphs 132-145, which is isolated.
147. The polynucleotide according to any one of paragraphs 132-146, which is purified.
148. A nucleic acid construct or expression vector comprising the polynucleotide according to any one of paragraphs 132-147, operably linked to one or more control sequences that direct the production of the nuclease or fusion polypeptide in a cell.
149. A cell comprising the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, a polynucleotide of any one of paragraphs 132-147, the nucleic acid construct or expression vector of paragraph 148, or the composition of any one of paragraphs 79-104.
150. The cell according to paragraph 149, wherein the cell is a recombinant cell.
151. The cell according to any one of paragraphs 149-150, wherein the Cas nuclease is heterologous to the cell.
152. The cell according to any one of paragraphs 149-151, wherein the cell comprises at least two copies, e.g., three, four, or five or more copies of the polynucleotide or vector or construct of any one of paragraphs 132-148.
153. The cell according to any one of paragraphs 149-152, wherein the genome of the cell comprises a polynucleotide encoding the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, a polynucleotide of any one of paragraphs 132-147, or a nucleic acid construct or expression vector of paragraph 148.
154. The cell according to any one of paragraphs 149-153, wherein the genome of the recombinant cell comprises at least two copies, e.g., three, four, or five, or more copies of a polynucleotide encoding the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, of a polynucleotide of any one of paragraphs 132-147, or of a nucleic acid construct or expression vector of paragraph 148.
155. A cell comprising a genome modified by the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, by a polynucleotide encoding the Cas nuclease or fusion polypeptide of any one of paragraphs 1-78, by the composition of any one of paragraphs 79-104, by the polynucleotide of any one of paragraphs 132-147, by the nucleic acid construct or expression vector of paragraph 148, and/or by the method of any one of paragraphs 105-131.
156. The cell according to any one of paragraphs 149-155, wherein the cell is a recombinant cell.
157. The cell of any one of paragraphs 149-156, wherein the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a non-human mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
158. The cell of any one of paragraphs 149-157, wherein the cell is a eukaryotic cell.
159. The cell of any one of paragraphs 149-157, wherein the cell is a prokaryotic cell.
160. The cell of paragraph 158, wherein the cell is a eukaryotic cell, such as a mammalian cell, a human cell, or a non-human mammalian cell, e.g., a BHK cell, a CHO cell, a mouse cell, a hamster cell, or a rat cell.
161. The cell of any one of paragraphs 149-160, wherein the cell is a fungal cell, such as a filamentous fungal cell, or a yeast cell.
161a. The cell of paragraph 161, wherein the fungal cell is a Pichia pastoris cell.
161b. The cell of paragraph 161, wherein the fungal cell is a Saccharomyces cerevisiae cell.
162. The cell of paragraph 161, wherein the cell is a yeast cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
163. The cell of paragraph 161, wherein the cell is a filamentous fungal cell, e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bjerkandera adusta, Ceriporiopsis aneirina, Ceriporiopsis caregiea, Ceriporiopsis gilvescens, Ceriporiopsis pannocinta, Ceriporiopsis rivulosa, Ceriporiopsis subrufa, Ceriporiopsis subvermispora, Chrysosporium inops, Chrysosporium keratinophilum, Chrysosporium lucknowense, Chrysosporium merdarium, Chrysosporium pannicola, Chrysosporium queenslandicum, Chrysosporium tropicum, Chrysosporium zonatum, Coprinus cinereus, Coriolus hirsutus, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila, Neurospora crassa, Penicillium purpurogenum, Phanerochaete chrysosporium, Phlebia radiata, Pleurotus eryngii, Talaromyces emersonii, Thielavia terrestris, Trametes villosa, Trametes versicolor, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, or Trichoderma viride cell.
164. The cell of paragraph 163, wherein the cell is a Trichoderma cell.
165. The cell of paragraph 163, wherein the cell is a Trichoderma reesei cell.
166. The cell of paragraph 163, wherein the cell is an Aspergillus cell.
167. The cell of paragraph 163, wherein the cell is an Aspergillus niger cell.
168. The cell of paragraph 163, wherein the cell is an Aspergillus oryzae cell.
169. The cell of paragraph 158, wherein the cell is a plant cell.
170. The cell of paragraph 169, wherein the cell is one or more of a maize, rice, sorghum, rye, barley, wheat, millet, oats, sugarcane, turfgrass, switchgrass, soybean, canola, alfalfa, sunflower, cotton, tobacco, peanut, potato, tobacco, Arabidopsis, vegetable, or safflower cell.
171. The cell according to paragraph 159, wherein the cell is a prokaryotic cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Corynebacterium, Enterococcus, Geobacillus, Lactobacillus, Lacticaseibacillus, Lactiplantibacillus, Levilactobacillus, Ligilactobacillus, Limosilactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, or Streptomyces cells, or a Gram-negative bacteria selected from the group consisting of Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma cells, such as Lacticaseibacillus casei, Lacticaseibacillus paracasei, Lacticaseibacillus rhamnosus, Lactiplantibacillus plantarum, Levilactobacillus brevis, Ligilactobacillus salivarius, Limosilactobacillus fermentum, Limosilactobacillus reuteri, Lactobacillus acidophilus, Lactobacillus bulgaricus, Lactobacillus crispatus, Lactobacillus gasseri, Lactobacillus johnsonii, Lactobacillus helveticus, Corynebacterium glutamicum, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, Bacillus thuringiensis, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp. zooepidemicus, Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, and Streptomyces lividans cells.
171a. The cell according to paragraph 171, wherein the cell is a Lacticaseibacillus paracasei cell.
171b. The cell according to paragraph 171, wherein the cell is a Streptococcus thermophilus cell.
171c. The cell according to paragraph 171, wherein the cell is an E. coli cell.
172. The cell of paragraph 171, wherein the cell is a Bacillus cell.
173. The cell of paragraph 171, wherein the cell is a Bacillus subtilis cell.
174. The cell of paragraph 171, wherein the cell is a Bacillus licheniformis cell.
175. The cell of any one of paragraphs 149-174, which is isolated.
176. The cell of any one of paragraphs 149-175, which is purified.
177. A method of producing a Cas nuclease or fusion polypeptide comprising cultivating the recombinant host cell of any one of paragraphs 149-176 under conditions conducive for production of the Cas nuclease or fusion polypeptide.
178. The method of paragraph 177, further comprising recovering the Cas nuclease or the fusion polypeptide.
179. Use of the Cas nuclease according to any one of paragraphs 1-58, the fusion polypeptide of any one of paragraphs 59-78, the composition according to any one of paragraphs 79-104, the polynucleotide according to any one of paragraphs 132-147, or the nucleic acid construct or expression vector according to paragraph 148, for modifying a target sequence in a targeted cell.
180. Use of the Cas nuclease according to any one of paragraphs 1-58, the fusion polypeptide of any one of paragraphs 59-78, the composition according to any one of paragraphs 79-104, the polynucleotide according to any one of paragraphs 132-147, the nucleic acid construct or expression vector according to paragraph 148, or the cell according to any one of paragraphs 149-176, for the manufacture of a medicament for modifying a target sequence in a targeted cell.
181. Use according to any one of paragraphs 179-180, wherein the targeted cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, a non-human animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a non-human mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
182. A formulation comprising (i) the Cas nuclease according to any one of paragraphs 1-58, the fusion polypeptide according to any one of paragraphs 59-78, a composition according to any one of paragraphs 79-104, the polynucleotide according to any one of paragraphs 132-147, the nucleic acid construct or expression vector according to paragraph 148, or the cell according to any one of paragraphs 149-176, and optionally, (ii) one or more of a lipid, a liposome, a hydrogel, a microparticle, a nanoparticle, or a block copolymer micelle.
183. The formulation of paragraph 182, wherein the lipid is a lipid nanoparticle.
184. The formulation according to any one of paragraphs 182-183, wherein the Cas nuclease or fusion polypeptide is in a lyophilized formulation.
185. The formulation according to any one of paragraphs 182-184, wherein the Cas nuclease or fusion polypeptide is in a liquid formulation.
186. The formulation according to any one of paragraphs 182-185, wherein the Cas nuclease or fusion polypeptide is in a substantially endotoxin-free formulation.
Claims
1. A Cas nuclease selected from the group consisting of:
(a) a polypeptide having at least 70% sequence identity to any of the amino acid sequences of SEQ ID NOs: 21, 48, 1, 40, 39, 29, 2-20, 22-28, 30-38, 41-47, or 49-52;
(b) a polypeptide encoded by a polynucleotide having at least 70% sequence identity to any of the polypeptide coding sequences of SEQ ID NOs: 73, 100, 53, 92, 91, or 81, or to any one of SEQ ID NOs: 53-104, 347, 349, 351, 353, 405, 416, 417, 434, 449, 465, 466, 512-520, 528, 549 or 550;
(c) a polypeptide derived from any one of SEQ ID NOs: 21, 48, 1, 40, 39, 29, 2-20, 22-28, 30-38, 41-47, or 49-52, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations), in particular substitutions, such as conservative amino acid substitutions;
(d) a polypeptide having a TM-score of at least 0.80 compared to the three-dimensional structure of the polypeptide of any one of SEQ ID NOs: 21, 48, 1, 40, 39, 29, 2-20, 22-28, 30-38, 41-47, or 49-52, wherein the three-dimensional structure is calculated using AlphaFold;
(e) a polypeptide derived from the polypeptide of (a), (b), (c), or (d), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and
(f) a fragment of the polypeptide of (a), (b), (c), (d), or (e).
2. The nuclease according to claim 1, wherein the nuclease comprises one or more domain selected from the group consisting of:
(a) a RuvC domain having at least 80% sequence identity to any one of the amino acid sequences of SEQ ID NOs: 105 - 143 or 313 - 318;
(b) a HNH domain having at least 80% sequence identity to any one of the amino acid sequences of SEQ ID NOs: 144 - 156 or 319-320;
(c) a RuvC domain derived from any one of SEQ ID NOs: 105 - 143 or 313 - 318 by substitution, deletion or addition of one or several amino acids of SEQ ID NOs: 105 - 143 or 313 - 318;
(d) a HNH domain derived from any one of SEQ ID NOs: 144 - 156 or 319-320 by substitution, deletion or addition of one or several amino acids of SEQ ID NOs: 144 - 156; and
(e) a fragment of the catalytic domain of (a), (b), (c), or (d).
3. The nuclease of any one of claims 1-2, wherein the nuclease is a nickase having one or more inactivated RuvC domain created by an amino acid substitution, insertion, or deletion at a position provided for the nuclease in column 3 of Table 2.
4. The nuclease of any one of claims 1-3, wherein the nuclease is a nickase having one or more inactivated HNH domain created by an amino acid substitution, insertion or deletion at a position provided for the nuclease in column 3 of Table 3.
5. The nuclease of any one of claims 1-4, wherein the nuclease is a nickase and has a single-stranded break activity towards a DNA target site.
6. The nuclease of any one of claims 1-2, wherein the nuclease is a catalytically dead nuclease.
7. The nuclease of any one of claims 1-6, wherein the nuclease is a Class 2 Type II Cas nuclease.
8. The nuclease of any one of claims 1-7, wherein the nuclease utilizes a protospacer adjacent motif (PAM) sequence provided for the nuclease in Table 1.
9. A non-naturally occurring composition comprising (i) the Cas nuclease of any one of claims 1-8, or (ii) a nucleic acid molecule comprising a sequence encoding the Cas nuclease of any one of claims 1-8.
10. The composition according to claim 9, further comprising one or more RNA molecules, or a DNA polynucleotide encoding one or more of the one or more RNA molecules, wherein the one or more RNA molecules and the Cas nuclease do not naturally occur together, and the one or more RNA molecules are configured to form a complex with the Cas nuclease and/or target the complex to a target site.
11. The composition according to any one of claims 9-10, wherein the one or more RNA molecules comprise a guide RNA (gRNA), the gRNA comprising a CRISPR RNA (crRNA) and a trans-activating RNA (tracrRNA).
12. A method of modifying a nucleotide sequence at a DNA target site in the genome of a cell, comprising introducing into the cell the Cas nuclease according to any one of claims 1-8, a polynucleotide encoding the Cas nuclease of any one of claims 1-8, and/or the composition of any one of claims 9-11.
13. A polynucleotide encoding the Cas nuclease according to any one of claims 1-8.
14. A nucleic acid construct or expression vector comprising the polynucleotide according to claim 13, operably linked to one or more control sequences that direct the production of the Cas nuclease in a cell.
15. A cell comprising the Cas nuclease of any one of claims 1-8, the polynucleotide according to claim 13, the nucleic acid construct or expression vector according to claim 14, or the composition of any one of claims 9-11.
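Claim 1 relies on two numeric thresholds: at least 70% sequence identity in items (a) and (b), and a TM-score of at least 0.80 in item (d). The following is a minimal illustrative sketch of how such values could be computed, not the applicant's method. It assumes that identity is counted over the gap-free columns of a pairwise alignment that has already been computed elsewhere, and that the TM-score is evaluated with the standard formula of Zhang and Skolnick (listed under the non-patent citations) for one fixed superposition, without searching for the optimal superposition as dedicated tools do. The function names `percent_identity` and `tm_score` are hypothetical.

```python
from typing import Sequence


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Assumption: both inputs are rows of an existing alignment (equal length,
    gaps written as '-'). The alignment step itself is not performed here.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    ungapped = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns containing a gap are excluded from the count
        ungapped += 1
        if a == b:
            identical += 1
    if ungapped == 0:
        raise ValueError("alignment has no gap-free columns")
    return 100.0 * identical / ungapped


def tm_score(distances: Sequence[float], reference_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distances (in angstroms) between aligned residue pairs.
    reference_length: number of residues in the reference structure.
    """
    if reference_length <= 15:
        raise ValueError("reference_length must exceed 15 residues")
    # Length-dependent distance scale d0 from the TM-score definition.
    d0 = 1.24 * (reference_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("reference too short for a positive d0")
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / reference_length


# Hypothetical comparison against the thresholds recited in claim 1.
meets_identity = percent_identity("MKT-LV", "MKSALV") >= 70.0
meets_structure = tm_score([0.0] * 100, 100) >= 0.80
```

Under these assumptions, a candidate would satisfy item (a) or (b) when `percent_identity` returns 70.0 or more, and item (d) when the TM-score against the predicted reference structure is 0.80 or more. The precise identity definition and alignment parameters are those given in the description, which may differ from this sketch.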
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363612593P | 2023-12-20 | 2023-12-20 | |
| US63/612,593 | 2023-12-20 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025132815A1 (en) | 2025-06-26 |
Family
ID=94278841
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2024/087441 Pending WO2025132815A1 (en) | 2023-12-20 | 2024-12-19 | Novel cas nucleases and polynucleotides encoding the same |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025132815A1 (en) |
Citations (33)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0238023A2 (en) | 1986-03-17 | 1987-09-23 | Novo Nordisk A/S | Process for the production of protein products in Aspergillus oryzae and a promoter for use in Aspergillus |
| WO1992006204A1 (en) | 1990-09-28 | 1992-04-16 | Ixsys, Inc. | Surface expression libraries of heteromeric receptors |
| EP0506780A1 (en) | 1989-12-18 | 1992-10-07 | Novo Nordisk A/S | Stable integration of DNA in bacterial genomes |
| US5223409A (en) | 1988-09-02 | 1993-06-29 | Protein Engineering Corp. | Directed evolution of novel binding proteins |
| US5244797A (en) | 1988-01-13 | 1993-09-14 | Life Technologies, Inc. | Cloned genes encoding reverse transcriptase lacking RNase H activity |
| WO1994014968A1 (en) | 1992-12-22 | 1994-07-07 | Novo Nordisk A/S | Dna amplification |
| WO1994025612A2 (en) | 1993-05-05 | 1994-11-10 | Institut Pasteur | Nucleotide sequences for the control of the expression of dna sequences in a cellular host |
| WO1995017413A1 (en) | 1993-12-21 | 1995-06-29 | Evotec Biosystems Gmbh | Process for the evolutive design and synthesis of functional polymers based on designer elements and codes |
| WO1995022625A1 (en) | 1994-02-17 | 1995-08-24 | Affymax Technologies N.V. | Dna mutagenesis by random fragmentation and reassembly |
| WO1995033836A1 (en) | 1994-06-03 | 1995-12-14 | Novo Nordisk Biotech, Inc. | Phosphonyldipeptides useful in the treatment of cardiovascular diseases |
| US6100063A (en) | 1998-02-12 | 2000-08-08 | Novo Nordisk A/S | Procaryotic cell comprising at least two copies of a gene |
| WO2008066931A2 (en) | 2006-11-29 | 2008-06-05 | Novozymes, Inc. | Bacillus licheniformis chromosome |
| US20080159996A1 (en) | 2006-05-25 | 2008-07-03 | Dale Ando | Methods and compositions for gene inactivation |
| EP2029732B1 (en) | 2006-05-31 | 2009-09-23 | Novozymes A/S | Chloramphenicol resistance selection in bacillus licheniformis |
| US20100047805A1 (en) | 2008-08-22 | 2010-02-25 | Sangamo Biosciences, Inc. | Methods and compositions for targeted single-stranded cleavage and targeted integration |
| US20100218264A1 (en) | 2008-12-04 | 2010-08-26 | Sangamo Biosciences, Inc. | Genome editing in rats using zinc-finger nucleases |
| US20100291048A1 (en) | 2009-03-20 | 2010-11-18 | Sangamo Biosciences, Inc. | Modification of CXCR4 using engineered zinc finger proteins |
| US20110207221A1 (en) | 2010-02-09 | 2011-08-25 | Sangamo Biosciences, Inc. | Targeted genomic modification with partially single-stranded donor molecules |
| US20110265198A1 (en) | 2010-04-26 | 2011-10-27 | Sangamo Biosciences, Inc. | Genome editing of a Rosa locus using nucleases |
| US20110281361A1 (en) | 2005-07-26 | 2011-11-17 | Sangamo Biosciences, Inc. | Linear donor constructs for targeted integration |
| US8110379B2 (en) | 2007-04-26 | 2012-02-07 | Sangamo Biosciences, Inc. | Targeted integration into the PPP1R12C locus |
| US20130122591A1 (en) | 2011-10-27 | 2013-05-16 | The Regents Of The University Of California | Methods and compositions for modification of the hprt locus |
| US20130177942A1 (en) | 2006-11-29 | 2013-07-11 | Novozymes, Inc. | Methods of Improving the Introduction of DNA into Bacterial Cells |
| US20130177983A1 (en) | 2011-09-21 | 2013-07-11 | Sangamo Bioscience, Inc. | Methods and compositions for regulation of transgene expression |
| WO2014052630A1 (en) | 2012-09-27 | 2014-04-03 | Novozymes, Inc. | Bacterial mutants with improved transformation efficiency |
| WO2018172556A1 (en) * | 2017-03-24 | 2018-09-27 | Curevac Ag | Nucleic acids encoding crispr-associated proteins and uses thereof |
| US20190330620A1 (en) | 2016-10-14 | 2019-10-31 | Emendobio Inc. | Rna compositions for genome editing |
| US20200085066A1 (en) | 2015-05-06 | 2020-03-19 | Snipr Technologies Limited | Altering microbial populations & modifying microbiota |
| US20200109398A1 (en) | 2018-08-28 | 2020-04-09 | Flagship Pioneering, Inc. | Methods and compositions for modulating a genome |
| WO2020191234A1 (en) | 2019-03-19 | 2020-09-24 | The Broad Institute, Inc. | Methods and compositions for editing nucleotide sequences |
| WO2021183622A1 (en) | 2020-03-12 | 2021-09-16 | Novozymes A/S | Crispr-aid using catalytically inactive rna-guided endonuclease |
| WO2023092132A1 (en) * | 2021-11-22 | 2023-05-25 | Mammoth Biosciences, Inc. | Effector proteins and uses thereof |
| WO2024133344A1 (en) | 2022-12-20 | 2024-06-27 | Novozymes A/S | A method for providing a candidate biological sequence and related electronic device |
Patent Citations (45)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0238023A2 (en) | 1986-03-17 | 1987-09-23 | Novo Nordisk A/S | Process for the production of protein products in Aspergillus oryzae and a promoter for use in Aspergillus |
| US5244797B1 (en) | 1988-01-13 | 1998-08-25 | Life Technologies Inc | Cloned genes encoding reverse transcriptase lacking rnase h activity |
| US5244797A (en) | 1988-01-13 | 1993-09-14 | Life Technologies, Inc. | Cloned genes encoding reverse transcriptase lacking RNase H activity |
| US5223409A (en) | 1988-09-02 | 1993-06-29 | Protein Engineering Corp. | Directed evolution of novel binding proteins |
| EP0506780A1 (en) | 1989-12-18 | 1992-10-07 | Novo Nordisk A/S | Stable integration of DNA in bacterial genomes |
| WO1992006204A1 (en) | 1990-09-28 | 1992-04-16 | Ixsys, Inc. | Surface expression libraries of heteromeric receptors |
| WO1994014968A1 (en) | 1992-12-22 | 1994-07-07 | Novo Nordisk A/S | Dna amplification |
| WO1994025612A2 (en) | 1993-05-05 | 1994-11-10 | Institut Pasteur | Nucleotide sequences for the control of the expression of dna sequences in a cellular host |
| WO1995017413A1 (en) | 1993-12-21 | 1995-06-29 | Evotec Biosystems Gmbh | Process for the evolutive design and synthesis of functional polymers based on designer elements and codes |
| WO1995022625A1 (en) | 1994-02-17 | 1995-08-24 | Affymax Technologies N.V. | Dna mutagenesis by random fragmentation and reassembly |
| WO1995033836A1 (en) | 1994-06-03 | 1995-12-14 | Novo Nordisk Biotech, Inc. | Phosphonyldipeptides useful in the treatment of cardiovascular diseases |
| US6100063A (en) | 1998-02-12 | 2000-08-08 | Novo Nordisk A/S | Procaryotic cell comprising at least two copies of a gene |
| US20110281361A1 (en) | 2005-07-26 | 2011-11-17 | Sangamo Biosciences, Inc. | Linear donor constructs for targeted integration |
| US20080159996A1 (en) | 2006-05-25 | 2008-07-03 | Dale Ando | Methods and compositions for gene inactivation |
| US7951925B2 (en) | 2006-05-25 | 2011-05-31 | Sangamo Biosciences, Inc. | Methods and compositions for gene inactivation |
| EP2029732B1 (en) | 2006-05-31 | 2009-09-23 | Novozymes A/S | Chloramphenicol resistance selection in bacillus licheniformis |
| US20130177942A1 (en) | 2006-11-29 | 2013-07-11 | Novozymes, Inc. | Methods of Improving the Introduction of DNA into Bacterial Cells |
| WO2008066931A2 (en) | 2006-11-29 | 2008-06-05 | Novozymes, Inc. | Bacillus licheniformis chromosome |
| US8110379B2 (en) | 2007-04-26 | 2012-02-07 | Sangamo Biosciences, Inc. | Targeted integration into the PPP1R12C locus |
| US20100047805A1 (en) | 2008-08-22 | 2010-02-25 | Sangamo Biosciences, Inc. | Methods and compositions for targeted single-stranded cleavage and targeted integration |
| US20100218264A1 (en) | 2008-12-04 | 2010-08-26 | Sangamo Biosciences, Inc. | Genome editing in rats using zinc-finger nucleases |
| US20100291048A1 (en) | 2009-03-20 | 2010-11-18 | Sangamo Biosciences, Inc. | Modification of CXCR4 using engineered zinc finger proteins |
| US20110207221A1 (en) | 2010-02-09 | 2011-08-25 | Sangamo Biosciences, Inc. | Targeted genomic modification with partially single-stranded donor molecules |
| US20120017290A1 (en) | 2010-04-26 | 2012-01-19 | Sigma Aldrich Company | Genome editing of a Rosa locus using zinc-finger nucleases |
| US20110265198A1 (en) | 2010-04-26 | 2011-10-27 | Sangamo Biosciences, Inc. | Genome editing of a Rosa locus using nucleases |
| US20130177983A1 (en) | 2011-09-21 | 2013-07-11 | Sangamo Bioscience, Inc. | Methods and compositions for regulation of transgene expression |
| US20130177960A1 (en) | 2011-09-21 | 2013-07-11 | Sangamo Biosciences, Inc. | Methods and compositions for regulation of transgene expression |
| US20130122591A1 (en) | 2011-10-27 | 2013-05-16 | The Regents Of The University Of California | Methods and compositions for modification of the hprt locus |
| US20130137104A1 (en) | 2011-10-27 | 2013-05-30 | The Regents Of The University Of California | Methods and compositions for modification of the hprt locus |
| WO2014052630A1 (en) | 2012-09-27 | 2014-04-03 | Novozymes, Inc. | Bacterial mutants with improved transformation efficiency |
| US20200085066A1 (en) | 2015-05-06 | 2020-03-19 | Snipr Technologies Limited | Altering microbial populations & modifying microbiota |
| US20190330620A1 (en) | 2016-10-14 | 2019-10-31 | Emendobio Inc. | Rna compositions for genome editing |
| WO2018172556A1 (en) * | 2017-03-24 | 2018-09-27 | Curevac Ag | Nucleic acids encoding crispr-associated proteins and uses thereof |
| US20200109398A1 (en) | 2018-08-28 | 2020-04-09 | Flagship Pioneering, Inc. | Methods and compositions for modulating a genome |
| WO2020191234A1 (en) | 2019-03-19 | 2020-09-24 | The Broad Institute, Inc. | Methods and compositions for editing nucleotide sequences |
| WO2020191248A1 (en) | 2019-03-19 | 2020-09-24 | The Broad Institute, Inc. | Method and compositions for editing nucleotide sequences |
| WO2020191245A1 (en) | 2019-03-19 | 2020-09-24 | The Broad Institute, Inc. | Methods and compositions for editing nucleotide sequences |
| WO2020191243A1 (en) | 2019-03-19 | 2020-09-24 | The Broad Institute, Inc. | Methods and compositions for editing nucleotide sequences |
| WO2020191241A1 (en) | 2019-03-19 | 2020-09-24 | The Broad Institute, Inc. | Methods and compositions for editing nucleotide sequences |
| WO2020191239A1 (en) | 2019-03-19 | 2020-09-24 | The Broad Institute, Inc. | Methods and compositions for editing nucleotide sequences |
| WO2020191233A1 (en) | 2019-03-19 | 2020-09-24 | The Broad Institute, Inc. | Methods and compositions for editing nucleotide sequences |
| WO2020191246A1 (en) | 2019-03-19 | 2020-09-24 | The Broad Institute, Inc. | Methods and compositions for editing nucleotide sequences |
| WO2021183622A1 (en) | 2020-03-12 | 2021-09-16 | Novozymes A/S | Crispr-aid using catalytically inactive rna-guided endonuclease |
| WO2023092132A1 (en) * | 2021-11-22 | 2023-05-25 | Mammoth Biosciences, Inc. | Effector proteins and uses thereof |
| WO2024133344A1 (en) | 2022-12-20 | 2024-06-27 | Novozymes A/S | A method for providing a candidate biological sequence and related electronic device |
Non-Patent Citations (76)
| Title |
|---|
| "Soc. App. Bacteriol. Symposium Series", 1980 |
| BERGER ET AL., BIOCHEMISTRY, vol. 22, 1983, pages 2365 - 2372 |
| BERNHARD ET AL., J BACTERIOL., vol. 133, 1978, pages 897 - 903 |
| BIKARD ET AL., NATURE BIOTECHNOLOGY, vol. 32, no. 11, 2014, pages 1146 - 1150 |
| BOWIE AND SAUER, PROC. NATL. ACAD. SCI. USA, vol. 86, 1989, pages 2152 - 2156 |
| BURKE ET AL., PROC. NATL. ACAD. SCI. USA, vol. 98, 2001, pages 6289 - 6294 |
| CHANG AND WILSON, PROC. NATL. ACAD. SCI. USA, 1987 |
| CHOI ET AL., J. MICROBIOL. METHODS, vol. 64, 2006, pages 391 - 397 |
| CHRISTENSEN ET AL., BIO/TECHNOLOGY, vol. 6, 1988, pages 1419 - 1422 |
| COOPER ET AL., EMBO J., vol. 12, 1993, pages 2575 - 2583 |
| DATABASE Geneseq [online] 15 November 2018 (2018-11-15), "CRISPR-mediated genome editing related Cas9 protein, SEQ:645.", retrieved from EBI accession no. GS_PROT:BFS13816 Database accession no. BFS13816 * |
| DATABASE Geneseq [online] 20 July 2023 (2023-07-20), "Nucleic acid detection related cas effector protein, SEQ 28.", retrieved from EBI accession no. GS_PROT:BNA26376 Database accession no. BNA26376 * |
| DATABASE PROTEIN [online] 2 June 2024 (2024-06-02), HAFT, D.H. ET AL.: "A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes", XP002813179, retrieved from NCBI accession no. REFSEQ:WP_329384251 Database accession no. WP_329384251 * |
| DATABASE PROTEIN [online] 2 June 2024 (2024-06-02), HAFT, D.H. ET AL.: "A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes", XP002813180, retrieved from NCBI accession no. REFSEQ:WP_138269342 Database accession no. WP_138269342 * |
| DATABASE UniProt [online] 12 April 2017 (2017-04-12), "RecName: Full=CRISPR-associated endonuclease Cas9 {ECO:0000256|HAMAP-Rule:MF_01480}; EC=3.1.-.- {ECO:0000256|HAMAP-Rule:MF_01480};", XP002813330, retrieved from EBI accession no. UNIPROT:A0A1Q2D796 Database accession no. A0A1Q2D796 * |
| DATABASE UniProt [online] 15 March 2017 (2017-03-15), "RecName: Full=CRISPR-associated endonuclease Cas9 {ECO:0000256|HAMAP-Rule:MF_01480}; EC=3.1.-.- {ECO:0000256|HAMAP-Rule:MF_01480};", XP002813331, retrieved from EBI accession no. UNIPROT:A0A1L8RGR8 Database accession no. A0A1L8RGR8 * |
| DATABASE UniProt [online] 24 July 2013 (2013-07-24), "RecName: Full=CRISPR-associated endonuclease Cas9 {ECO:0000256|HAMAP-Rule:MF_01480}; EC=3.1.-.- {ECO:0000256|HAMAP-Rule:MF_01480};", XP002813181, retrieved from EBI accession no. UNIPROT:R6DVD3 Database accession no. R6DVD3 * |
| DATTA ET AL., GENE, vol. 379, 2006, pages 109 - 115 |
| DAVIS ET AL.: "Basic Methods in Molecular Biology", 2012, ELSEVIER |
| DAWSON ET AL., SCIENCE, vol. 266, 1994, pages 776 - 779 |
| DERBYSHIRE ET AL., GENE, vol. 46, 1986, pages 145 |
| DONALD ET AL., J. BACTERIOL., vol. 195, no. 11, 2013, pages 2612 - 2620 |
| FERLA, M., PATRICK, W., MICROBIOLOGY, vol. 160, no. 8, 2014, pages 1571 - 1584 |
| FREUDL, MICROBIAL CELL FACTORIES, vol. 17, 2018, pages 52 |
| GAO ET AL., CELL RES., vol. 26, 2016, pages 901 |
| GEISBERG ET AL., CELL, vol. 156, no. 4, 2014, pages 812 - 824 |
| GERARD, G. R., DNA, vol. 5, 1986, pages 271 - 279 |
| GUO AND SHERMAN, MOL. CELLULAR BIOL., vol. 15, 1995, pages 5983 - 5990 |
| HAMBRAEUS ET AL., MICROBIOLOGY, vol. 146, no. 12, 2000, pages 3051 - 3059 |
| HAWKSWORTH ET AL.: "Ainsworth and Bisby's Dictionary of The Fungi", 1995, CAB INTERNATIONAL, UNIVERSITY PRESS |
| HEINZE ET AL., BMC MICROBIOLOGY, vol. 18, 2018, pages 56 |
| HILTON ET AL., J. BIOL. CHEM., vol. 271, 1996, pages 4699 - 4708 |
| HOLM AND SANDER, TRENDS BIOCHEM. SCI., vol. 20, 1995, pages 478 - 480 |
| HUE ET AL., J. BACTERIOL., vol. 177, 1995, pages 3465 - 3471 |
| JORGENSEN ET AL., FUNGAL GENETICS AND BIOLOGY, vol. 48, no. 5, 2015, pages 544 - 553 |
| JUMPER ET AL.: "Highly accurate protein structure prediction with AlphaFold", NATURE, vol. 596, 2021, pages 583 - 589, XP037990370, DOI: 10.1038/s41586-021-03819-2 |
| KABERDIN AND BLÄSI, FEMS MICROBIOL. REV., vol. 30, no. 6, 2006, pages 967 - 979 |
| KOMOR, A.C. ET AL.: "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage", NATURE, vol. 533, 2016, pages 420 - 424, XP037965728, DOI: 10.1038/nature17946 |
| KOTEWICZ, M. L. ET AL., GENE, vol. 35, 1985, pages 249 - 258 |
| LABROU, PROTEIN DOWNSTREAM PROCESSING, vol. 1129, 2014, pages 3 - 10 |
| LI ET AL., MICROBIAL CELL FACTORIES, vol. 16, 2017, pages 168 |
| LOWMAN ET AL., BIOCHEMISTRY, vol. 30, 1991, pages 10832 - 10837 |
| LUBERTOZZI AND KEASLING, BIOTECHN. ADVANCES, vol. 27, 2009, pages 53 - 75 |
| MAKAROVA ET AL., NAT REV MICROBIOL, vol. 18, 2020, pages 67 - 83 |
| MEMBRILLO-HERNANDEZ ET AL., JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 275, no. 43, 2000, pages 33869 - 33875 |
| MOROZOV ET AL., EUKARYOTIC CELL, vol. 5, no. 11, 2006, pages 1838 - 1846 |
| MUKHERJEE ET AL., TRICHODERMA: BIOLOGY AND APPLICATIONS, 2013 |
| NEEDLEMAN AND WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443 - 453 |
| NEHLS ET AL., SCIENCE, 1996 |
| NER ET AL., DNA, vol. 7, 1988, pages 127 |
| NUCLEIC ACIDS RES., vol. 33, 2005, pages 2302 - 2309 |
| PATEL AND GUPTA, INT. J. SYST. EVOL. MICROBIOL., vol. 70, 2020, pages 406 - 438 |
| PRASHANT ET AL.: "CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering", NATURE BIOTECHNOLOGY, vol. 31, no. 9, 2013, pages 833 - 838, XP055693153, DOI: 10.1038/nbt.2675 |
| REIDHAAR-OLSON AND SAUER, SCIENCE, vol. 241, 1988, pages 53 - 57 |
| RICE ET AL., TRENDS GENET., vol. 16, 2000, pages 276 - 277 |
| ROMANOS ET AL., YEAST, vol. 8, 1992, pages 423 - 488 |
| SACHLA ET AL.: "A simplified method for CRISPR-Cas9 engineering of Bacillus subtilis", MICROBIOL SPECTR, vol. 9, 2021, pages 00754 - 21 |
| SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LAB. |
| SAUNDERS AND SAUNDERS, MICROBIAL GENETICS APPLIED TO BIOTECHNOLOGY, 1987 |
| SCHMOLL AND DATTENBÖCK: "Gene Expression Systems in Fungi: Advancements and Applications", FUNGAL BIOLOGY, 2016 |
| SELINGER ET AL., J. BACTERIOL., vol. 172, 1990, pages 3290 - 3297 |
| SHINDYALOV AND BOURNE, PROTEIN ENG., vol. 11, 1998, pages 739 - 747 |
| SHOJI ET AL., FEMS MICROBIOLOGY LETTERS, vol. 244, no. 1, 2005, pages 41 - 46 |
| SHUKAL ET AL., MICROBIAL CELL FACTORIES, vol. 21, no. 1, 2022, pages 19 |
| SMITH ET AL., J. MOL. BIOL., vol. 224, 1992, pages 899 - 904 |
| SMOLKE ET AL.: "Synthetic Biology: Parts, Devices and Applications", YEAST: HOW TO DESIGN AND MAKE USE OF PROMOTERS IN S. CEREVISIAE, 2018 |
| SONG ET AL., PLOS ONE, vol. 11, no. 7, 2016, pages 0158447 |
| VARADI ET AL.: "AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models", NUCLEIC ACIDS RESEARCH, 2021 |
| VERMA, BIOCHIM. BIOPHYS. ACTA, vol. 473, 1977, pages 1 |
| VOS ET AL., SCIENCE, vol. 255, 1992, pages 306 - 312 |
| WINGFIELD, CURRENT PROTOCOLS IN PROTEIN SCIENCE, vol. 80, no. 1, 2015 |
| WLODAVER ET AL., FEBS LETT., vol. 309, 1992, pages 59 - 64 |
| XU ET AL., BIOTECHNOLOGY LETTERS, vol. 40, 2018, pages 949 - 955 |
| YASBIN ET AL., J. BACTERIOL., vol. 121, 1975, pages 296 - 304 |
| YELTON ET AL., PROC. NATL. ACAD. SCI. USA, vol. 81, 1984, pages 1470 - 1474 |
| ZHANG AND SKOLNICK, PROTEINS, vol. 57, 2004, pages 702 - 710 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24837334; Country of ref document: EP; Kind code of ref document: A1 |